Python Pandas


Pandas (the name derives from "panel data") is one of the best-known Python projects for data analysis and data manipulation. It can import data from spreadsheets, a wide range of SQL databases, and other data sources such as HDF5, and has strong support for working with JSON, XML, and HTML.
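As a minimal sketch of the import paths mentioned above, the following reads a small JSON payload and round-trips it through an in-memory SQLite database. The payload, the table name weather, and the column names are made up for illustration; a real project would point at its own files or database connection.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical JSON payload, standing in for a file or a web response.
json_text = '[{"city": "Oslo", "temp_c": 4.5}, {"city": "Rome", "temp_c": 15.0}]'
df_json = pd.read_json(io.StringIO(json_text))

# Write the frame to an in-memory SQLite database, then query it back.
conn = sqlite3.connect(":memory:")
df_json.to_sql("weather", conn, index=False)
df_sql = pd.read_sql("SELECT city, temp_c FROM weather WHERE temp_c > 10", conn)
conn.close()

print(df_json)
print(df_sql)
```

Other readers such as read_csv(), read_excel(), and read_hdf() follow the same pattern, although some of them need optional dependencies to be installed.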

The primary data structures are the Series (a one-dimensional labeled array holding data of any type) and the DataFrame (a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns).
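The two structures can be illustrated with a short sketch. The labels and values here are invented sample data.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

# A DataFrame: a table whose columns may have different types.
df = pd.DataFrame(
    {"name": ["Ada", "Bob", "Cy"], "age": [36, 41, 29]},
    index=["r1", "r2", "r3"],
)

# Selecting a single column of a DataFrame yields a Series.
age_column = df["age"]

print(s["b"])
print(df.shape)
print(type(age_column).__name__)
```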

Pandas works together with NumPy, and uses NumPy constructs such as np.nan (Not-a-Number) to represent missing values. The two libraries are typically imported as:

import numpy as np
import pandas as pd
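A small sketch of how np.nan behaves as a missing-value marker in a Series, using invented values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

missing = s.isna()      # boolean mask marking the missing entry
total = s.sum()         # NaN is skipped by default (skipna=True)
filled = s.fillna(0.0)  # replace missing values with a default

print(missing.tolist())
print(total)
print(filled.tolist())
```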

One of the main differences between Pandas and NumPy is the ability to work with labelled data and spreadsheet-like tabular data. Another is that Pandas has dedicated features for dealing with time series and very large data sets, as well as advanced data analytics and data cleansing tools.
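As one example of the time series features, the sketch below builds a daily series over two weeks and sums it into 7-day bins. The dates and values are arbitrary sample data.

```python
import pandas as pd

# Fourteen consecutive days carrying the values 0..13.
days = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(range(14), index=days)

# Aggregate into 7-day bins, starting at the first timestamp.
weekly_totals = daily.resample("7D").sum()

# Date strings can be used directly as labels; both ends are included.
first_week = daily.loc["2024-01-01":"2024-01-07"]

print(weekly_totals)
print(len(first_week))
```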

The Pandas DataFrame is more flexible than a NumPy ndarray:

NumPy arrays have one dtype for the entire array while pandas DataFrames have one dtype per column. When you call DataFrame.to_numpy(), pandas will find the NumPy dtype that can hold all of the dtypes in the DataFrame.
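The dtype behaviour described above can be checked with a short sketch. The column names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})
print(df.dtypes)  # one dtype per column: integer and float

# Integer and float columns share a common float dtype.
numeric = df.to_numpy()

# Adding a string column forces the catch-all object dtype.
mixed = df.assign(label=["x", "y"]).to_numpy()

print(numeric.dtype)
print(mixed.dtype)
```

Because an object array loses most of NumPy's speed advantages, it is often worth selecting only the numeric columns before converting.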

Pandas also offers plotting through the plot() method of Series and DataFrame, which uses Matplotlib by default.

Pandas has support for multi-level hierarchical indexing with MultiIndex.
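A minimal sketch of hierarchical indexing, assuming a made-up sales series indexed by year and quarter:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
sales = pd.Series([100, 120, 130, 150], index=index)

# Selecting on the outer level returns a Series indexed by quarter.
sales_2024 = sales.loc["2024"]

# A full tuple selects a single value.
q1_2024 = sales.loc[("2024", "Q1")]

# Aggregate over one level of the hierarchy.
per_year = sales.groupby(level="year").sum()

print(sales_2024)
print(q1_2024)
print(per_year)
```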

To get a feel for the DataFrame and how to access its data, read "10 minutes to pandas". Note its emphasis: although selection with direct [start:stop:step] slicing notation is supported, production code should use the optimised data access indexers .at[], .iat[], .loc[], and .iloc[], which take square brackets rather than call parentheses.
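The four accessors can be sketched as follows. Note that they are indexers used with square brackets; the frame contents are invented sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ada", "Bob", "Cy"], "age": [36, 41, 29]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2", "age"]   # label-based selection
by_position = df.iloc[0, 1]      # position-based selection

# Label slices include both endpoints; positional slices exclude the stop.
label_slice = df.loc["r1":"r2", ["name"]]
position_slice = df.iloc[0:2]

# .at and .iat read or write a single scalar.
df.at["r3", "age"] = 30
scalar = df.iat[2, 1]

print(by_label, by_position, scalar)
```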

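Pandas provides vectorised operations (arithmetic, comparisons, aggregations, and similar methods) that operate on entire arrays or columns at once and leverage C optimisations, so they are much faster than Python loops. The apply() and map() methods also process a whole column in a single call, but they generally invoke a Python function once per element or row, so they are usually slower than a true vectorised expression and are best kept for logic that cannot be expressed otherwise.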
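A small sketch comparing a whole-column arithmetic expression with apply() and map(). The data and the 1.2 tax factor are arbitrary; both forms give the same result, and on large data the whole-column expression is usually the faster one.

```python
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0])

# Whole-column arithmetic, evaluated in compiled code.
with_tax_vectorised = prices * 1.2

# apply() calls the Python lambda once per element.
with_tax_apply = prices.apply(lambda p: p * 1.2)

# map() can also translate values through a dictionary.
labels = pd.Series(["a", "b"]).map({"a": "alpha", "b": "beta"})

print(with_tax_vectorised.tolist())
print(labels.tolist())
```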

Pandas does not directly support parallel processing, but additional libraries such as pandarallel, parallel-pandas, and Modin enable parallel processing with a Pandas-like API.

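The RAPIDS cuDF pandas accelerator mode (cudf.pandas) offers GPU acceleration with zero code change, falling back to regular pandas on the CPU for operations it does not support.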


Visit also about PySpark
