Some contributed Python packages: Quick overview, comparisons, and links

Some quick notes on some key Python packages, comparison of libraries, and links. This guide is by no means complete.


PyPI

The primary (but not only) resource for information on Python packages is PyPI (the Python Package Index).


NumPy

Perhaps the most famous contributed Python package is NumPy, which offers support for numerical and scientific computing: fast N-dimensional arrays, vectorised mathematical functions, and basic linear algebra (vector and matrix operations). Under the hood, most of NumPy is optimised, compiled C code. There is a dedicated info page here:
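
As a rough illustration of the array and linear algebra support mentioned above, here is a minimal sketch. The numbers are made up purely for the example.

```python
import numpy as np

# Element-wise arithmetic on whole arrays, with no explicit Python loop.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
elementwise = a * b + 1.0

# A small linear system M @ x = rhs, solved with numpy.linalg.
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
x = np.linalg.solve(M, rhs)

print(elementwise)
print(x)
```

The loop-free arithmetic is where the compiled code does the work, which is the main reason NumPy arrays tend to be preferred over plain Python lists for numerical data.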


Matplotlib

The Matplotlib library offers plotting, interactive graphics, and animations. It depends on NumPy, which is conventionally imported under the alias np.

The former MATLAB-like pylab convenience module has been superseded by pyplot.

The Matplotlib docs use a slightly unusual definition of the term 'backend'. The Python code is described as the 'frontend', and whatever is used for rendering and displaying the Matplotlib figures is described as the 'backend' (including "user interface backends", "interactive backends", and "hardcopy backends").

From the Matplotlib Quick Start:

Plotting functions expect numpy.array or numpy.ma.masked_array as input, or objects that can be passed to numpy.asarray. Classes that are similar to arrays ('array-like') such as pandas data objects and numpy.matrix may not work as intended. Common convention is to convert these to numpy.array objects prior to plotting.
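
A minimal sketch of both points above: selecting a hardcopy backend explicitly, and converting array-like input with numpy.asarray before plotting. The data is invented, and the figure is written to an in-memory buffer so that no window or file is needed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # hardcopy (non-interactive) backend: renders to images only
import matplotlib.pyplot as plt
import numpy as np

# Array-like input (a plain list of lists) converted explicitly before plotting.
raw = [[0, 1, 2, 3], [0, 1, 4, 9]]
data = np.asarray(raw, dtype=float)

fig, ax = plt.subplots()
(line,) = ax.plot(data[0], data[1], label="y = x**2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Render the figure as PNG bytes in memory.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

With an interactive backend one would typically call plt.show() instead of saving to a buffer.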

Pandas

Pandas ("Panel Data") for data analysis and data manipulation is another of the best known Python projects. It can import data from spreadsheets and a wide range of SQL databases and other data sources such as HDF5, and has strong support for working with JSON, XML, and HTML.
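
As a small sketch of the JSON support, the following reads a JSON string into a DataFrame and aggregates it. The city and temperature values are made up for the example.

```python
import io

import pandas as pd

# Invented sample data: a JSON array of records.
json_text = (
    '[{"city": "Oslo", "temp_c": 4.0},'
    ' {"city": "Oslo", "temp_c": 6.0},'
    ' {"city": "Rome", "temp_c": 15.0}]'
)

# read_json accepts a file-like object, so wrap the string in StringIO.
df = pd.read_json(io.StringIO(json_text))

# Mean temperature per city.
mean_temp = df.groupby("city")["temp_c"].mean()
print(mean_temp)
```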

For an expanded description visit:

Visit also: Python Pandas

SciPy

The SciPy libraries for scientific computing, data analysis, image processing, signal processing, and mathematics (including more advanced linear algebra than NumPy) are also amongst the best known Python libraries. SciPy is mostly composed of wrappers for optimised, compiled Fortran, C, and C++ code.

For some nice examples visit also Scientific Programming with Python.

The scipy.linalg module contains all the functions in numpy.linalg plus some more advanced ones. SciPy is always built with BLAS/LAPACK support, whereas for NumPy this is optional, so the SciPy variants might be faster.

Parts of SciPy still rely on numpy.matrix (which apparently might "die" at some stage: the NumPy docs no longer recommend the class and say it may be removed in the future).
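
To illustrate the overlap between numpy.linalg and scipy.linalg, here is a minimal sketch using plain NumPy arrays (not numpy.matrix). The matrix values are arbitrary.

```python
import numpy as np
from scipy import linalg

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# The determinant is available in both modules.
det_numpy = np.linalg.det(A)
det_scipy = linalg.det(A)

# LU decomposition is one of the extras found in scipy.linalg
# but not in numpy.linalg. It returns P, L, U with A = P @ L @ U.
P, L, U = linalg.lu(A)

print(det_numpy, det_scipy)
print(P @ L @ U)
```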


SymPy

SymPy is a Python library for symbolic mathematics. Dr Darren says:

Why anyone would use SymPy for symbolic algebra instead of Wolfram Mathematica (or Maple) is beyond me. Maybe the argument is cost, but if your time is worth $$$ it might end up costing you more.

There are a number of other projects that do use SymPy, some of which are more or less trying to create clones of (selected aspects of) Wolfram Mathematica, Maple, or MATLAB.

To get a feel for the extent to which SymPy counts as symbolic algebra, visit What is Symbolic Computation?. Some very basic examples:

>>> import sympy
>>> from sympy import symbols
>>> sympy.sqrt(8)
2*sqrt(2)
>>> x, y = symbols('x y')
>>> expr = x + 2*y
>>> expr
x + 2*y

SymPy is clearly better than nothing, it's free, and it's lightweight, but it isn't the Wolfram Language, and it can't match a Mathematica Notebook.


The Wolfram Client Library for Python

Did you know that you can call the Wolfram Engine (which is free for pre-production software development) and evaluate Wolfram Language code from Python? Did you know that it can directly handle Pandas DataFrames?


SageMath

The SageMath project is another free, open-source mathematics system built on Python, with some degree of symbolic algebra support. It also has a web interface, SageMathCell, for embedding Sage computations in web pages.


urllib vs urllib3 vs Requests

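urllib is part of the Python standard library. urllib3 is a separate third-party package (not a newer version of urllib) that adds connection pooling, retries, and thread safety; it is written in pure Python rather than as a C extension. Requests is a higher-level library built on top of urllib3, so it offers a friendlier API but is not expected to be faster than urllib3.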
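
For comparison purposes, here is a minimal sketch of how a GET request is assembled with the standard-library urllib. The endpoint and the User-Agent string are hypothetical, and the request is only built, not sent, so the sketch runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only to show how a request is assembled.
base_url = "https://example.org/search"

# With urllib the query string has to be encoded by hand.
query = urlencode({"q": "numpy", "page": 2})
request = Request(f"{base_url}?{query}", headers={"User-Agent": "demo-client"})

print(request.full_url)
print(request.get_method())

# Sending it would be: urllib.request.urlopen(request, timeout=10)
```

With Requests the same call would typically be written as requests.get(base_url, params={"q": "numpy", "page": 2}), which is a large part of its appeal.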

requests[security]

Requests has a security extra, requests[security], which historically ensured that pyOpenSSL, ndg-httpsclient, and pyasn1 were also installed. Mostly it works transparently. Since Requests 2.26 the extra has been converted to a no-op install.


BeautifulSoup4

BeautifulSoup4 is a contributed Python library designed for parsing HTML and XML documents and web scraping.
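
A minimal parsing sketch, assuming the built-in "html.parser" backend (so no extra parser library is needed). The HTML snippet is invented for the example.

```python
from bs4 import BeautifulSoup

# Invented HTML fragment standing in for a downloaded page.
html = """
<html><body>
  <h1>Packages</h1>
  <ul>
    <li><a href="https://numpy.org">NumPy</a></li>
    <li><a href="https://pandas.pydata.org">Pandas</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of the first <h1> element.
title = soup.h1.get_text()

# Map each link's text to its href attribute.
links = {a.get_text(): a["href"] for a in soup.find_all("a")}

print(title)
print(links)
```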


PyYAML

YAML config file parser and emitter.
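
A small round-trip sketch: parsing a config string and emitting it again. The config keys are hypothetical. safe_load and safe_dump are used here because they restrict YAML to plain data types.

```python
import yaml

# Hypothetical config file contents.
config_text = """
server:
  host: localhost
  port: 8080
debug: true
plugins:
  - auth
  - cache
"""

# Parse YAML text into plain Python dicts, lists, and scalars.
config = yaml.safe_load(config_text)

# Emit the data structure back out as YAML text.
emitted = yaml.safe_dump(config)

print(config["server"]["port"])
print(emitted)
```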


PySpark

PySpark is the Python API for the Apache Spark open-source, distributed computing framework for large-scale data processing and analytics. The primary data structures are Resilient Distributed Datasets (RDD) and Spark DataFrames.

It plays nicely with Pandas, although the PySpark version of a data frame (intended for use across a distributed system and for very large datasets) is not exactly the same as a Pandas DataFrame (intended for in-memory use). The pyspark.pandas (Pandas API on Spark) module provides a pandas-like interface for working with PySpark DataFrames.

PySpark also has direct support for parallel processing of data. Pandas does not directly have support for parallel processing, but there are some additional libraries that offer parallel processing with Pandas-like code.

And PySpark also has built-in support for big data tools like Hadoop and Hive.

Spark itself is written in Scala, so it runs on a Java Virtual Machine (JVM), and PySpark is a Python layer on top of it. Pandas, by contrast, is written in Python (although it has some C and Cython optimisations, and is built on top of NumPy, which has its own compiled optimisations).


dateutil

The python-dateutil module extends the standard datetime module. For an overview of features visit the docs.
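
Two of the more commonly used extras, sketched with made-up dates: flexible string parsing and calendar-aware date arithmetic.

```python
from datetime import datetime

from dateutil import parser
from dateutil.relativedelta import relativedelta

# Parse a free-form date string without specifying a format.
parsed = parser.parse("12 March 2024 14:30")

# Add one calendar month. The day is clamped to the end of the
# shorter month, which datetime.timedelta cannot express.
end_of_jan = datetime(2024, 1, 31)
one_month_later = end_of_jan + relativedelta(months=1)

print(parsed)
print(one_month_later)
```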


Scikit-learn

Scikit-learn has libraries for machine learning, and works together with NumPy and SciPy. It is written primarily in Python, with Cython optimisations for some core algorithms.

According to this DigitalOcean tutorial it is 'ideal for traditional machine learning models, while TensorFlow and PyTorch excel in deep learning and large-scale AI applications.'
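
A minimal sketch of the usual fit/predict pattern with a traditional model, using invented data that lies exactly on the line y = 2x + 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training data: one feature, targets on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Estimators share the same interface: fit on data, then predict.
model = LinearRegression().fit(X, y)
prediction = model.predict(np.array([[4.0]]))

print(model.coef_, model.intercept_)
print(prediction)
```

Other estimators are swapped in by changing the import and the class name, while the fit and predict calls stay the same.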


Natural Language Toolkit (NLTK)

The Natural Language Toolkit (NLTK) is for working with human language data. It provides interfaces to corpora and lexical resources such as WordNet, and a suite of processing libraries for parsing, tokenisation, stemming etc. and for semantic reasoning. Its default part-of-speech tagger is the PerceptronTagger.


Statsmodels

Statsmodels is a module for estimation of statistical models built on NumPy, SciPy, and Pandas. It supports specifying models using R-style formulas and pandas DataFrames.
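
A minimal sketch of the R-style formula interface with a pandas DataFrame. The column names x and y and the data values are made up for the example.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented data, roughly following y = 2x + 1 with some noise.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0],
    "y": [1.1, 2.9, 5.2, 6.8, 9.0],
})

# "y ~ x" means: model y as a function of x, with an intercept.
results = smf.ols("y ~ x", data=df).fit()

slope = results.params["x"]
intercept = results.params["Intercept"]
print(slope, intercept)
```

Calling results.summary() gives the familiar regression table with standard errors and confidence intervals.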

There is a known version incompatibility with SciPy 1.16.0, caused by the removal of _lazywhere from SciPy.

PyCrypto and PyCryptodome

PyCrypto is no longer maintained and contains security vulnerabilities; however, it is still listed on PyPI as pycrypto 2.6.1.

Use PyCryptodome instead, which can optionally be installed as an (almost) drop-in replacement for the old PyCrypto library. PyCryptodome is a fork of PyCrypto (version 2.6.1). It does not require OpenSSL, making it a lightweight option for standalone cryptographic solutions.


cryptography

cryptography 'includes both high level recipes and low level interfaces to common cryptographic algorithms such as symmetric ciphers, message digests, and key derivation functions'.

The main two encryption techniques offered are the simple to use Fernet (symmetric encryption) method (which requires keeping a single shared key secret) and the far more complicated but far more secure certificate-based X.509 method (typically using an RSA key). For local testing one can use a self-signed certificate.
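
A minimal sketch of the Fernet method mentioned above. The message is invented, and in real use the key would need to be stored securely rather than generated on each run.

```python
from cryptography.fernet import Fernet, InvalidToken

# Generate the single shared key that must be kept secret.
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt and then decrypt a message with the same key.
token = fernet.encrypt(b"secret message")
plaintext = fernet.decrypt(token)

# A token cannot be decrypted with a different key.
try:
    Fernet(Fernet.generate_key()).decrypt(token)
    wrong_key_rejected = False
except InvalidToken:
    wrong_key_rejected = True

print(plaintext)
print(wrong_key_rejected)
```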

Visit also Cryptography vs PyCryptodome: Understand the Difference:

The Cryptography package provides a high-level API, making it easier to use for developers. On the other hand, PyCryptodome offers a comprehensive range of cryptographic primitives and is known for its speed and efficiency.
