Stats: statistics 

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

agg(func=None, axis: Axis = 0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters:

func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.

Returns:

The return can be:

scalar : when Series.agg is called with single function
Series : when DataFrame.agg is called with a single function
DataFrame : when DataFrame.agg is called with several functions

Return type:

scalar, Series or DataFrame

See also

DataFrame.apply: Perform any type of operations.
DataFrame.transform: Perform transformation type operations.
pandas.DataFrame.groupby: Perform operations over groups.
pandas.DataFrame.resample: Perform operations over resampled bins.
pandas.DataFrame.rolling: Perform operations over rolling window.
pandas.DataFrame.expanding: Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow: Perform operation over exponential weighted window.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64

aggregate(func=None, axis: Axis = 0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters:

func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.

Returns:

The return can be:

scalar : when Series.agg is called with single function
Series : when DataFrame.agg is called with a single function
DataFrame : when DataFrame.agg is called with several functions

Return type:

scalar, Series or DataFrame

See also

DataFrame.apply: Perform any type of operations.
DataFrame.transform: Perform transformation type operations.
pandas.DataFrame.groupby: Perform operations over groups.
pandas.DataFrame.resample: Perform operations over resampled bins.
pandas.DataFrame.rolling: Perform operations over rolling window.
pandas.DataFrame.expanding: Perform operations over expanding window.
pandas.core.window.ewm.ExponentialMovingWindow: Perform operation over exponential weighted window.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64

all(axis: Axis | None = 0, bool_only: bool = False, skipna: bool = True, **kwargs) → Series | bool

Return whether all elements are True, potentially over an axis.

Returns True unless there at least one element within a series or along a Dataframe axis that is False or equivalent (e.g. zero or empty).

Parameters:

axis ({0 or 'index', 1 or 'columns', None}, default 0) –
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.
- 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
- 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
- None : reduce all axes, return a scalar.
bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.
skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
**kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Return type:

See also

Series.all: Return True if all elements are True.
DataFrame.any: Return True if one (or more) elements are True.

Examples

Series

>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([], dtype="float64").all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True

DataFrames

Create a dataframe from a dictionary.

>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
   col1   col2
0  True   True
1  True  False

Default behaviour checks if values in each column all return True.

>>> df.all()
col1     True
col2    False
dtype: bool

Specify axis='columns' to check if values in each row all return True.

>>> df.all(axis='columns')
0     True
1    False
dtype: bool

Or axis=None for whether every value is True.

>>> df.all(axis=None)
False

any(*, axis: Axis | None = 0, bool_only: bool = False, skipna: bool = True, **kwargs) → Series | bool

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters:

axis ({0 or 'index', 1 or 'columns', None}, default 0) –
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.
- 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
- 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
- None : reduce all axes, return a scalar.
bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.
skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
**kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

If level is specified, then, DataFrame is returned; otherwise, Series is returned.

Return type:

See also

numpy.any: Numpy version of this method.
Series.any: Return whether any element is True.
Series.all: Return whether all elements are True.
DataFrame.any: Return whether any element is True over requested axis.
DataFrame.all: Return whether all elements are True over requested axis.

Examples

Series

For Series input, the output is a scalar indicating whether any element is True.

>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([], dtype="float64").any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True

DataFrame

Whether each column contains at least one True element (the default).

>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
   A  B  C
0  1  0  0
1  2  2  0

>>> df.any()
A     True
B     True
C    False
dtype: bool

Aggregating over the columns.

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2

>>> df.any(axis='columns')
0    True
1    True
dtype: bool

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
       A  B
0   True  1
1  False  0

>>> df.any(axis='columns')
0    True
1    False
dtype: bool

Aggregating over the entire DataFrame with axis=None.

>>> df.any(axis=None)
True

any for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)

apply(func: AggFuncType, axis: Axis = 0, raw: bool = False, result_type: Literal['expand', 'reduce', 'broadcast'] | None = None, args=(), by_row: Literal[False, 'compat'] = 'compat', engine: Literal['python', 'numba'] = 'python', engine_kwargs: dict[str, bool] | None = None, **kwargs)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

Parameters:

func (function) – Function to apply to each column or row.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Axis along which the function is applied:
- 0 or ‘index’: apply function to each column.
- 1 or ‘columns’: apply function to each row.
raw (bool, default False) –
Determines if row or column is passed as a Series or ndarray object:
- False : passes each row or column as a Series to the function.
- True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
result_type ({'expand', 'reduce', 'broadcast', None}, default None) –
These only act when axis=1 (columns):
- ’expand’ : list-like results will be turned into columns.
- ’reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
- ’broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
by_row (False or "compat", default "compat") –
Only has an effect when func is a listlike or dictlike of funcs and the func isn’t a string. If “compat”, will if possible first translate the func into pandas methods (e.g. Series().apply(np.sum) will be translated to Series().sum()). If that doesn’t work, will try call to apply again with by_row=True and if that fails, will call apply again with by_row=False (backward compatible). If False, the funcs will be passed the whole Series at once.

Added in version 2.1.0.
engine ({'python', 'numba'}, default 'python') –
Choose between the python (default) engine or the numba engine in apply.

The numba engine will attempt to JIT compile the passed function, which may result in speedups for large DataFrames. It also supports the following engine_kwargs :
- nopython (compile the function in nopython mode)
- nogil (release the GIL inside the JIT compiled function)
- parallel (try to apply the function in parallel over the DataFrame)
  
  Note: Due to limitations within numba/how pandas interfaces with numba, you should only use this if raw=True
Note: The numba compiler only supports a subset of valid Python/numpy operations.

Please read more about the supported python features and supported numpy features in numba to learn what you can or cannot use in the passed function.

Added in version 2.2.0.
engine_kwargs (dict) – Pass keyword arguments to the engine. This is currently only used by the numba engine, see the documentation for the engine argument for more information.
**kwargs – Additional keyword arguments to pass as keywords arguments to func.

Returns:

Result of applying func along the given axis of the DataFrame.

Return type:

See also

DataFrame.map: For elementwise operations.
DataFrame.aggregate: Only perform aggregating type operations.
DataFrame.transform: Only perform transforming type operations.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

Examples

>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a numpy universal function (in this case the same as np.sqrt(df)):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a Dataframe

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
1
1  2
1  2
1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2

applymap(func: PythonFuncType, na_action: NaAction | None = None, **kwargs) → DataFrame

Apply a function to a Dataframe elementwise.

Deprecated since version 2.1.0: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:

func (callable) – Python function, returns a single value from a single value.
na_action ({None, 'ignore'}, default None) – If ‘ignore’, propagate NaN values, without passing them to func.
**kwargs – Additional keyword arguments to pass as keywords arguments to func.

Returns:

Transformed DataFrame.

Return type:

See also

DataFrame.apply: Apply a function along input axis of DataFrame.
DataFrame.map: Apply a function along input axis of DataFrame.
DataFrame.replace: Replace values given in to_replace with value.

Examples

>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567

>>> df.map(lambda x: len(str(x)))
1
3  4
5  5

assign(**kwargs) → DataFrame

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters:: **kwargs (dict of {str: callable or Series}) – The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.
Returns:: A new DataFrame with the new columns in addition to all the existing columns.
Return type:: DataFrame

Notes

Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.

Examples

>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

Where the value is a callable, evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:

>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15

property axes: list[Index]

Return a list representing the axes of the DataFrame.

It has the row axis labels and column axis labels as the only members. They are returned in that order.

Examples

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'],
dtype='object')]

boxplot(column=None, by=None, ax=None, fontsize: int | None = None, rot: int = 0, grid: bool = True, figsize: tuple[float, float] | None = None, layout=None, return_type=None, backend=None, **kwargs)

Make a box plot from DataFrame columns.

Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.

For further details see Wikipedia’s entry for boxplot.

Parameters:

column (str or list of str, optional) – Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby().
by (str or array-like, optional) – Column in the DataFrame to pandas.DataFrame.groupby(). One box-plot will be done per value of columns in by.
ax (object of class matplotlib.axes.Axes, optional) – The matplotlib axes to be used by boxplot.
fontsize (float or str) – Tick label font size in points or as a string (e.g., large).
rot (float, default 0) – The rotation angle of labels (in degrees) with respect to the screen coordinate system.
grid (bool, default True) – Setting this to True will show the grid.
figsize (A tuple (width, height) in inches) – The size of the figure to create in matplotlib.
layout (tuple (rows, columns), optional) – For example, (3, 5) will display the subplots using 3 rows and 5 columns, starting from the top-left.
return_type ({'axes', 'dict', 'both'} or None, default 'axes') –
The kind of object to return. The default is axes.
- ’axes’ returns the matplotlib axes the boxplot is drawn on.
- ’dict’ returns a dictionary whose values are the matplotlib Lines of the boxplot.
- ’both’ returns a namedtuple with the axes and dict.
- when grouping with by, a Series mapping columns to return_type is returned.
  
  If return_type is None, a NumPy array of axes with the same shape as layout is returned.
backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.
**kwargs – All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().

Returns:

See Notes.

Return type:

result

See also

pandas.Series.plot.hist: Make a histogram.
matplotlib.pyplot.boxplot: Matplotlib equivalent plot.

Notes

The return type depends on the return_type parameter:

‘axes’ : object of class matplotlib.axes.Axes
‘dict’ : dict of matplotlib.lines.Line2D objects
‘both’ : a namedtuple with structure (ax, lines)

For data grouped with by, return a Series of the above or a numpy array:

Series
array (for return_type = None)

Use return_type='dict' when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.

Examples

Boxplots can be created for every column in the dataframe by df.boxplot() or indicating the columns to be used:

Boxplots of variables distributions grouped by the values of a third variable can be created using the option by. For instance:

A list of strings (i.e. ['X', 'Y']) can be passed to boxplot in order to group the data by combination of the variables in the x-axis:

The layout of boxplot can be adjusted giving a tuple to layout:

Additional formatting can be done to the boxplot, like suppressing the grid (grid=False), rotating the labels in the x-axis (i.e. rot=45) or changing the fontsize (i.e. fontsize=15):

The parameter return_type can be used to select the type of element returned by boxplot. When return_type='axes' is selected, the matplotlib axes on which the boxplot is drawn are returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)
<class 'matplotlib.axes._axes.Axes'>

When grouping with by, a Series mapping columns to return_type is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)
<class 'pandas.core.series.Series'>

If return_type is None, a NumPy array of axes with the same shape as layout is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type=None)
>>> type(boxplot)
<class 'numpy.ndarray'>

columns

The column labels of the DataFrame.

Examples

>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df
     A  B
0    1  3
1    2  4
>>> df.columns
Index(['A', 'B'], dtype='object')

combine(other: DataFrame, func: Callable[[Series, Series], Series | Hashable], fill_value=None, overwrite: bool = True) → DataFrame

Perform column-wise combine with another DataFrame.

Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.

Parameters:

other (DataFrame) – The DataFrame to merge column-wise.
func (function) – Function that takes two series as inputs and return a Series or a scalar. Used to merge the two dataframes column by columns.
fill_value (scalar value, default None) – The value to fill NaNs with prior to passing any column to the merge func.
overwrite (bool, default True) – If True, columns in self that do not exist in other will be overwritten with NaNs.

Returns:

Combination of the provided DataFrames.

Return type:

See also

DataFrame.combine_first: Combine two DataFrame objects and default to non-null values in frame calling the method.

Examples

Combine using a simple function that chooses the smaller column.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using fill_value fills Nones prior to passing the column to the merge function.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None is preserved

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
    A    B
0  0 -5.0
1  0  3.0

Example that demonstrates the use of overwrite and behavior when the axis differ between the dataframes.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0

>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN

>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0

combine_first(other: DataFrame) → DataFrame

Update null elements with value in the same location in other.

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two. The resulting dataframe contains the ‘first’ dataframe values and overrides the second one values where both first.loc[index, col] and second.loc[index, col] are not missing values, upon calling first.combine_first(second).

Parameters:: other (DataFrame) – Provided DataFrame to use to fill null values.
Returns:: The result of combining the provided DataFrame with the other object.
Return type:: DataFrame

See also

DataFrame.combine: Perform series-wise operation on two DataFrames using a given function.

Examples

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Null values still persist if the location of that null value does not exist in other

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0

compare(other: DataFrame, align_axis: Axis = 1, keep_shape: bool = False, keep_equal: bool = False, result_names: Suffixes = ('self', 'other')) → DataFrame

Compare to another DataFrame and show the differences.

Parameters:

other (DataFrame) – Object to compare with.
align_axis ({0 or 'index', 1 or 'columns'}, default 1) –
Determine which axis to align the comparison on.
- 0, or ‘index’Resulting differences are stacked vertically
  with rows drawn alternately from self and other.
- 1, or ‘columns’Resulting differences are aligned horizontally
  with columns drawn alternately from self and other.
keep_shape (bool, default False) – If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.
keep_equal (bool, default False) – If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.
result_names (tuple, default ('self', 'other')) –
Set the dataframes names in the comparison.

Added in version 1.5.0.

Returns:

DataFrame that shows the differences stacked side by side.

The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.

Return type:

Raises:

ValueError – When the two DataFrames don’t have identical labels or shape.

See also

Series.compare: Compare with another Series and show differences.
DataFrame.equals: Test whether two objects contain the same elements.

Notes

Matching NaNs will not appear as a difference.

Can only compare identically-labeled (i.e. same shape, identical row and column labels) DataFrames

Examples

>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

Align the differences on columns

>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Assign result_names

>>> df.compare(df2, result_names=("left", "right"))
  col1       col3
  left right left right
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Stack the differences on rows

>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0

Keep the equal values

>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

Keep all original rows and columns

>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN

Keep all original rows and columns and also all original values

>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0

corr(method: CorrelationMethod = 'pearson', min_periods: int = 1, numeric_only: bool = False) → DataFrame

Compute pairwise correlation of columns, excluding NA/null values.

Parameters:

method ({'pearson', 'kendall', 'spearman'} or callable) –
Method of correlation:
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- callable: callable with input two 1d ndarrays
  and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
min_periods (int, optional) – Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.
numeric_only (bool, default False) –
Include only float, int or boolean data.

Added in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:

Correlation matrix.

Return type:

See also

DataFrame.corrwith: Compute pairwise correlation with another DataFrame or Series.
Series.corr: Compute the correlation between two Series.

Notes

Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.

Examples

>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0

>>> df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
...                   columns=['dogs', 'cats'])
>>> df.corr(min_periods=3)
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0

corrwith(other: DataFrame | Series, axis: Axis = 0, drop: bool = False, method: CorrelationMethod = 'pearson', numeric_only: bool = False) → Series

Compute pairwise correlation.

Pairwise correlation is computed between rows or columns of DataFrame with rows or columns of Series or DataFrame. DataFrames are first aligned along both axes before computing the correlations.

Parameters:

other (DataFrame, Series) – Object with which to compute correlations.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for column-wise.
drop (bool, default False) – Drop missing indices from result.
method ({'pearson', 'kendall', 'spearman'} or callable) –
Method of correlation:
- pearson : standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- callable: callable with input two 1d ndarrays
  and returning a float.
numeric_only (bool, default False) –
Include only float, int or boolean data.

Added in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:

Pairwise correlations.

Return type:

See also

DataFrame.corr: Compute pairwise correlation of columns.

Examples

>>> index = ["a", "b", "c", "d", "e"]
>>> columns = ["one", "two", "three", "four"]
>>> df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
>>> df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)
>>> df1.corrwith(df2)
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64

>>> df2.corrwith(df1, axis=1)
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64

count(axis: Axis = 0, numeric_only: bool = False)

Count non-NA cells for each column or row.

The values None, NaN, NaT, pandas.NA are considered NA.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.
numeric_only (bool, default False) – Include only float, int or boolean data.

Returns:

For each column/row the number of non-NA/null entries.

Return type:

See also

Series.count: Number of non-NA elements in a Series.
DataFrame.value_counts: Count unique combinations of columns.
DataFrame.shape: Number of DataFrame rows and columns (including NA elements).
DataFrame.isna: Boolean same-sized DataFrame showing places of NA elements.

Examples

Constructing DataFrame from a dictionary:

>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2   Lewis  21.0    True
3    John  33.0    True
4    Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

Counts for each row:

>>> df.count(axis='columns')
  3
  2
  3
  3
  3
dtype: int64

cov(min_periods: int | None = None, ddof: int | None = 1, numeric_only: bool = False) → DataFrame

Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

Parameters:

min_periods (int, optional) – Minimum number of observations required per pair of columns to have a valid result.
ddof (int, default 1) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when no nan is in the dataframe.
numeric_only (bool, default False) –
Include only float, int or boolean data.

Added in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:

The covariance matrix of the series of the DataFrame.

Return type:

See also

Series.cov: Compute covariance with another Series.
core.window.ewm.ExponentialMovingWindow.cov: Exponential weighted sample covariance.
core.window.expanding.Expanding.cov: Expanding sample covariance.
core.window.rolling.Rolling.cov: Rolling sample covariance.

Notes

Returns the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-ddof.

For DataFrames that have Series that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.

However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.

Examples

>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
...                   columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795

Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
...                   columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202

cummax(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative maximum of Series or DataFrame.

Return type:

See also

core.window.expanding.Expanding.max: Similar functionality but ignores NaN values.
DataFrame.max: Return the maximum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
  2.0
  NaN
  5.0
  5.0
  5.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummax(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row, use axis=1

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0

cummin(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative minimum of Series or DataFrame.

Return type:

See also

core.window.expanding.Expanding.min: Similar functionality but ignores NaN values.
DataFrame.min: Return the minimum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
  2.0
  NaN
  2.0
 -1.0
 -1.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummin(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row, use axis=1

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

cumprod(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative product of Series or DataFrame.

Return type:

See also

core.window.expanding.Expanding.prod: Similar functionality but ignores NaN values.
DataFrame.prod: Return the product over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
   2.0
   NaN
  10.0
 -10.0
  -0.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumprod(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row, use axis=1

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0

cumsum(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative sum of Series or DataFrame.

Return type:

See also

core.window.expanding.Expanding.sum: Similar functionality but ignores NaN values.
DataFrame.sum: Return the sum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
  2.0
  NaN
  7.0
  6.0
  6.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumsum(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row, use axis=1

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0

diff(periods: int = 1, axis: Axis = 0) → DataFrame

First discrete difference of element.

Calculates the difference of a DataFrame element compared with another element in the DataFrame (default is element in previous row).

Parameters:

periods (int, default 1) – Periods to shift for calculating difference, accepts negative values.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Take difference over rows (0) or columns (1).

Returns:

First differences of the Series.

Return type:

See also

DataFrame.pct_change: Percent change over given number of periods.
DataFrame.shift: Shift index by desired number of periods with an optional time freq.
Series.diff: First discrete difference of object.

Notes

For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in DataFrame, however dtype of the result is always float64.

Examples

Difference with previous row

>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36

>>> df.diff()
     a    b     c
NaN  NaN   NaN
1.0  0.0   3.0
1.0  1.0   5.0
1.0  1.0   7.0
1.0  2.0   9.0
1.0  3.0  11.0

Difference with previous column

>>> df.diff(axis=1)
    a  b   c
NaN  0   0
NaN -1   3
NaN -1   7
NaN -1  13
NaN  0  20
NaN  2  28

Difference with 3rd previous row

>>> df.diff(periods=3)
     a    b     c
NaN  NaN   NaN
NaN  NaN   NaN
NaN  NaN   NaN
3.0  2.0  15.0
3.0  4.0  21.0
3.0  6.0  27.0

Difference with following row

>>> df.diff(periods=-1)
     a    b     c
-1.0  0.0  -3.0
-1.0 -1.0  -5.0
-1.0 -1.0  -7.0
-1.0 -2.0  -9.0
-1.0 -3.0 -11.0
NaN  NaN   NaN

Overflow in input dtype

>>> df = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
>>> df.diff()
       a
0    NaN
1  255.0

div(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

divide(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

dot(other: Series) → Series

dot(other: DataFrame | Index | ArrayLike) → DataFrame

Compute the matrix multiplication between the DataFrame and other.

This method computes the matrix product between the DataFrame and the values of an other Series, DataFrame or a numpy array.

It can also be called using self @ other.

Parameters:: other (Series, DataFrame or array-like) – The other object to compute the matrix product with.
Returns:: If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame or a numpy.array, return the matrix product of self and other in a DataFrame of a np.array.
Return type:: Series or DataFrame

See also

Series.dot: Similar method for Series.

Notes

The dimensions of DataFrame and other must be compatible in order to compute the matrix multiplication. In addition, the column names of DataFrame and the index of other must contain the same values, as they will be aligned prior to the multiplication.

The dot method for Series computes the inner product, instead of the matrix product here.

Examples

Here we multiply a DataFrame with a Series.

>>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0    -4
1     5
dtype: int64

Here we multiply a DataFrame with another DataFrame.

>>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)
    0   1
0   1   4
1   2   2

Note that the dot method give the same result as @

>>> df @ other
 1
 1   4
 2   2

The dot method works also if other is an np.array.

>>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr)
    0   1
0   1   4
1   2   2

Note how shuffling of the objects does not change the result.

>>> s2 = s.reindex([1, 0, 2, 3])
>>> df.dot(s2)
0    -4
1     5
dtype: int64

drop(labels: IndexLabel = None, *, axis: Axis = 0, index: IndexLabel = None, columns: IndexLabel = None, level: Level = None, inplace: Literal[True], errors: IgnoreRaise = 'raise') → None

drop(labels: IndexLabel = None, *, axis: Axis = 0, index: IndexLabel = None, columns: IndexLabel = None, level: Level = None, inplace: Literal[False] = False, errors: IgnoreRaise = 'raise') → DataFrame

drop(labels: IndexLabel = None, *, axis: Axis = 0, index: IndexLabel = None, columns: IndexLabel = None, level: Level = None, inplace: bool = False, errors: IgnoreRaise = 'raise') → DataFrame | None

Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by directly specifying index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide for more information about the now unused levels.

Parameters:

labels (single label or list-like) – Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
index (single label or list-like) – Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columns (single label or list-like) – Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
level (int or level name, optional) – For MultiIndex, level from which the labels will be removed.
inplace (bool, default False) – If False, return a copy. Otherwise, do operation in place and return None.
errors ({'ignore', 'raise'}, default 'raise') – If ‘ignore’, suppress error and only existing labels are dropped.

Returns:

Returns DataFrame or None DataFrame with the specified index or column labels removed or None if inplace=True.

Return type:

DataFrame or None

Raises:

KeyError – If any of the labels is not found in the selected axis.

See also

DataFrame.loc: Label-location based indexer for selection by label.
DataFrame.dropna: Return DataFrame with labels on given axis omitted where (all or any) data are missing.
DataFrame.drop_duplicates: Return DataFrame with duplicate rows removed, optionally only considering certain columns.
Series.drop: Return Series with specified index labels removed.

Examples

>>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
...                   columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Drop columns

>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11

>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11

Drop a row by index

>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

Drop columns and/or rows of MultiIndex DataFrame

>>> midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
...                   data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                         [250, 150], [1.5, 0.8], [320, 250],
...                         [1, 0.8], [0.3, 0.2]])
>>> df
                big     small
llama   speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        weight  1.0     0.8
        length  0.3     0.2

Drop a specific index combination from the MultiIndex DataFrame, i.e., drop the combination 'falcon' and 'weight', which deletes only the corresponding row

>>> df.drop(index=('falcon', 'weight'))
                big     small
llama   speed   45.0    30.0
        weight  200.0   100.0
        length  1.5     1.0
cow     speed   30.0    20.0
        weight  250.0   150.0
        length  1.5     0.8
falcon  speed   320.0   250.0
        length  0.3     0.2

>>> df.drop(index='cow', columns='small')
                big
llama   speed   45.0
        weight  200.0
        length  1.5
falcon  speed   320.0
        weight  1.0
        length  0.3

>>> df.drop(index='length', level=1)
                big     small
llama   speed   45.0    30.0
        weight  200.0   100.0
cow     speed   30.0    20.0
        weight  250.0   150.0
falcon  speed   320.0   250.0
        weight  1.0     0.8

drop_duplicates(subset: Hashable | Sequence[Hashable] | None = None, *, keep: DropKeep = 'first', inplace: Literal[True], ignore_index: bool = False) → None

drop_duplicates(subset: Hashable | Sequence[Hashable] | None = None, *, keep: DropKeep = 'first', inplace: Literal[False] = False, ignore_index: bool = False) → DataFrame

drop_duplicates(subset: Hashable | Sequence[Hashable] | None = None, *, keep: DropKeep = 'first', inplace: bool = False, ignore_index: bool = False) → DataFrame | None

Return DataFrame with duplicate rows removed.

Considering certain columns is optional. Indexes, including time indexes are ignored.

Parameters:

subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates, by default use all of the columns.
keep ({‘first’, ‘last’, False}, default ‘first’) –
Determines which duplicates (if any) to keep.
- ’first’ : Drop duplicates except for the first occurrence.
- ’last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.

Returns:

DataFrame with duplicates removed or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.value_counts: Count unique combinations of columns.

Examples

Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0

dropna(*, axis: Axis = 0, how: AnyAll | lib.NoDefault = <no_default>, thresh: int | lib.NoDefault = <no_default>, subset: IndexLabel = None, inplace: Literal[False] = False, ignore_index: bool = False) → DataFrame

dropna(*, axis: Axis = 0, how: AnyAll | lib.NoDefault = <no_default>, thresh: int | lib.NoDefault = <no_default>, subset: IndexLabel = None, inplace: Literal[True], ignore_index: bool = False) → None

Remove missing values.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) –
Determine if rows or columns which contain missing values are removed.
- 0, or ‘index’ : Drop rows which contain missing values.
- 1, or ‘columns’ : Drop columns which contain missing value.
Only a single axis is allowed.
how ({'any', 'all'}, default 'any') –
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
- ’any’ : If any NA values are present, drop that row or column.
- ’all’ : If all values are NA, drop that row or column.
thresh (int, optional) – Require that many non-NA values. Cannot be combined with how.
subset (column label or sequence of labels, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
ignore_index (bool, default False) –
If True, the resulting axis will be labeled 0, 1, …, n - 1.

Added in version 2.0.0.

Returns:

DataFrame with NA entries dropped from it or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.isna: Indicate missing values.
DataFrame.notna: Indicate existing (non-missing) values.
DataFrame.fillna: Replace missing values.
Series.dropna: Drop missing values.
Index.dropna: Drop missing indices.

Examples

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

duplicated(subset: Hashable | Sequence[Hashable] | None = None, keep: DropKeep = 'first') → Series

Return boolean Series denoting duplicate rows.

Considering certain columns is optional.

Parameters:

subset (column label or sequence of labels, optional) – Only consider certain columns for identifying duplicates, by default use all of the columns.
keep ({'first', 'last', False}, default 'first') –
Determines which duplicates (if any) to mark.
- first : Mark duplicates as True except for the first occurrence.
- last : Mark duplicates as True except for the last occurrence.
- False : Mark all duplicates as True.

Returns:

Boolean series for each duplicated rows.

Return type:

See also

Index.duplicated: Equivalent method on index.
Series.duplicated: Equivalent method on Series.
Series.drop_duplicates: Remove duplicate values from Series.
DataFrame.drop_duplicates: Remove duplicate values from DataFrame.

Examples

Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, for each set of duplicated values, the first occurrence is set on False and all others on True.

>>> df.duplicated()
  False
   True
  False
  False
  False
dtype: bool

By using ‘last’, the last occurrence of each set of duplicated values is set on False and all others on True.

>>> df.duplicated(keep='last')
   True
  False
  False
  False
  False
dtype: bool

By setting keep on False, all duplicates are True.

>>> df.duplicated(keep=False)
   True
   True
  False
  False
  False
dtype: bool

To find duplicates on specific column(s), use subset.

>>> df.duplicated(subset=['brand'])
  False
   True
  False
   True
   True
dtype: bool

eq(other, axis: Axis = 'columns', level=None) → DataFrame

Get Equal to of dataframe and other, element-wise (binary operator eq).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:

other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False

eval(expr: str, *, inplace: Literal[False] = False, **kwargs) → Any

eval(expr: str, *, inplace: Literal[True], **kwargs) → None

Evaluate a string describing operations on DataFrame columns.

Operates on columns only, not specific rows or elements. This allows eval to run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.

Parameters:

expr (str) – The expression string to evaluate.
inplace (bool, default False) – If the expression contains an assignment, whether to perform the operation inplace and mutate the existing DataFrame. Otherwise, a new DataFrame is returned.
**kwargs – See the documentation for eval() for complete details on the keyword arguments accepted by query().

Returns:

The result of the evaluation or None if inplace=True.

Return type:

ndarray, scalar, pandas object, or None

See also

DataFrame.query: Evaluates a boolean expression to query the columns of a frame.
DataFrame.assign: Can evaluate an expression or function to create new values for a column.
eval: Evaluate a Python expression as a string using various backends.

Notes

For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.

Examples

>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
>>> df.eval('A + B')
0    11
1    10
2     9
3     8
4     7
dtype: int64

Assignment is allowed though by default the original DataFrame is not modified.

>>> df.eval('C = A + B')
   A   B   C
1  10  11
2   8  10
3   6   9
4   4   8
5   2   7
>>> df
   A   B
1  10
2   8
3   6
4   4
5   2

Multiple columns can be assigned to using multi-line expressions:

>>> df.eval(
...     '''
... C = A + B
... D = A - B
... '''
... )
   A   B   C  D
0  1  10  11 -9
1  2   8  10 -6
2  3   6   9 -3
3  4   4   8  0
4  5   2   7  3

explode(column: IndexLabel, ignore_index: bool = False) → DataFrame

Transform each element of a list-like to a row, replicating index values.

Parameters:

column (IndexLabel) –
Column(s) to explode. For multiple columns, specify a non-empty list with each element be str or tuple, and all specified columns their list-like data on same row of the frame must have matching length.

Added in version 1.3.0: Multi-column explode
ignore_index (bool, default False) – If True, the resulting index will be labeled 0, 1, …, n - 1.

Returns:

Exploded lists to rows of the subset columns; index will be duplicated for these rows.

Return type:

Raises:

ValueError : –

If columns of the frame are not unique. * If specified columns to explode is empty list. * If specified columns to explode have not matching count of elements rowwise in the frame.

See also

DataFrame.unstack: Pivot a level of the (necessarily hierarchical) index labels.
DataFrame.melt: Unpivot a DataFrame from wide format to long format.
Series.explode: Explode a DataFrame from list-like columns to long format.

Notes

This routine will explode list-likes including lists, tuples, sets, Series, and np.ndarray. The result dtype of the subset rows will be object. Scalars will be returned unchanged, and empty list-likes will result in a np.nan for that row. In addition, the ordering of rows in the output will be non-deterministic when exploding sets.

Reference the user guide for more examples.

Examples

>>> df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]],
...                    'B': 1,
...                    'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
>>> df
           A  B          C
0  [0, 1, 2]  1  [a, b, c]
1        foo  1        NaN
2         []  1         []
3     [3, 4]  1     [d, e]

Single-column explode.

>>> df.explode('A')
     A  B          C
  0  1  [a, b, c]
  1  1  [a, b, c]
  2  1  [a, b, c]
foo  1        NaN
NaN  1         []
  3  1     [d, e]
  4  1     [d, e]

Multi-column explode.

>>> df.explode(list('AC'))
     A  B    C
  0  1    a
  1  1    b
  2  1    c
foo  1  NaN
NaN  1  NaN
  3  1    d
  4  1    e

floordiv(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

classmethod from_dict(data: dict, orient: FromDictOrient = 'columns', dtype: Dtype | None = None, columns: Axes | None = None) → DataFrame

Construct DataFrame from dict of array-like or dicts.

Creates DataFrame object from dictionary by columns or by index allowing dtype specification.

Parameters:

data (dict) – Of the form {field : array-like} or {field : dict}.
orient ({'columns', 'index', 'tight'}, default 'columns') –
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’. If ‘tight’, assume a dict with keys [‘index’, ‘columns’, ‘data’, ‘index_names’, ‘column_names’].

Added in version 1.4.0: ‘tight’ as an allowed value for the orient argument
dtype (dtype, default None) – Data type to force after DataFrame construction, otherwise infer.
columns (list, default None) – Column labels to use when orient='index'. Raises a ValueError if used with orient='columns' or orient='tight'.

Return type:

See also

DataFrame.from_records: DataFrame from structured ndarray, sequence of tuples or dicts, or DataFrame.
DataFrame: DataFrame object creation using constructor.
DataFrame.to_dict: Convert the DataFrame to a dictionary.

Examples

By default the keys of the dict become the DataFrame columns:

>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Specify orient='index' to create the DataFrame using dictionary keys as rows:

>>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data, orient='index')
       0  1  2  3
row_1  3  2  1  0
row_2  a  b  c  d

When using the ‘index’ orientation, the column names can be specified manually:

>>> pd.DataFrame.from_dict(data, orient='index',
...                        columns=['A', 'B', 'C', 'D'])
       A  B  C  D
row_1  3  2  1  0
row_2  a  b  c  d

Specify orient='tight' to create the DataFrame using a ‘tight’ format:

>>> data = {'index': [('a', 'b'), ('a', 'c')],
...         'columns': [('x', 1), ('y', 2)],
...         'data': [[1, 3], [2, 4]],
...         'index_names': ['n1', 'n2'],
...         'column_names': ['z1', 'z2']}
>>> pd.DataFrame.from_dict(data, orient='tight')
z1     x  y
z2     1  2
n1 n2
a  b   1  3
   c   2  4

classmethod from_records(data, index=None, exclude=None, columns=None, coerce_float: bool = False, nrows: int | None = None) → DataFrame

Convert structured or record ndarray to DataFrame.

Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.

Parameters:

data (structured ndarray, sequence of tuples or dicts, or DataFrame) –
Structured input data.

Deprecated since version 2.1.0: Passing a DataFrame is deprecated.
index (str, list of fields, array-like) – Field of array to use as the index, alternately a specific set of input labels to use.
exclude (sequence, default None) – Columns or fields to exclude.
columns (sequence, default None) – Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).
coerce_float (bool, default False) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
nrows (int, default None) – Number of rows to read if data is an iterator.

Return type:

See also

DataFrame.from_dict: DataFrame from dict of array-like or dicts.
DataFrame: DataFrame object creation using constructor.

Examples

Data can be provided as a structured ndarray:

>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of dicts:

>>> data = [{'col_1': 3, 'col_2': 'a'},
...         {'col_1': 2, 'col_2': 'b'},
...         {'col_1': 1, 'col_2': 'c'},
...         {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

Data can be provided as a list of tuples with corresponding columns:

>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d

ge(other, axis: Axis = 'columns', level=None) → DataFrame

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:

other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False

groupby(by=None, axis: Axis | lib.NoDefault = <no_default>, level: IndexLabel | None = None, as_index: bool = True, sort: bool = True, group_keys: bool = True, observed: bool | lib.NoDefault = <no_default>, dropna: bool = True) → DataFrameGroupBy

Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:

by (mapping, function, label, pd.Grouper or list of such) – Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
level (int, level name, or sequence of such, default None) – If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
as_index (bool, default True) – Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).
sort (bool, default True) –
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
group_keys (bool, default True) –
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.

Changed in version 2.0.0: group_keys now defaults to True.
observed (bool, default False) –
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
dropna (bool, default True) – If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:

Returns a groupby object that contains information about the groups.

Return type:

pandas.api.typing.DataFrameGroupBy

See also

resample: Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.

Examples

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Hierarchical Indexes

We can groupby different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
...                   index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level="Type").mean()
         Max Speed
Type
Captive      210.0
Wild         185.0

We can also choose to include NA in group keys or not by setting dropna parameter, the default setting is True.

>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])

>>> df.groupby(by=["b"]).sum()
    a   c
b
1.0 2   3
2.0 2   5

>>> df.groupby(by=["b"], dropna=False).sum()
    a   c
b
1.0 2   3
2.0 2   5
NaN 1   4

>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])

>>> df.groupby(by="a").sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0

>>> df.groupby(by="a", dropna=False).sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
NaN 12.3   33.0

When using .apply(), use group_keys to include or exclude the group keys. The group_keys argument defaults to True (include).

>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
          Max Speed
Animal
Falcon 0      380.0
       1      370.0
Parrot 2       24.0
       3       26.0

>>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
   Max Speed
0      380.0
1      370.0
2       24.0
3       26.0

gt(other, axis: Axis = 'columns', level=None) → DataFrame

Get Greater than of dataframe and other, element-wise (binary operator gt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:

other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False

hist(column: IndexLabel | None = None, by=None, grid: bool = True, xlabelsize: int | None = None, xrot: float | None = None, ylabelsize: int | None = None, yrot: float | None = None, ax=None, sharex: bool = False, sharey: bool = False, figsize: tuple[int, int] | None = None, layout: tuple[int, int] | None = None, bins: int | Sequence[int] = 10, backend: str | None = None, legend: bool = False, **kwargs)

Make a histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data. This function calls matplotlib.pyplot.hist(), on each series in the DataFrame, resulting in one histogram per column.

Parameters:

data (DataFrame) – The pandas object holding the data.
column (str or sequence, optional) – If passed, will be used to limit data to a subset of columns.
by (object, optional) – If passed, then used to form histograms for separate groups.
grid (bool, default True) – Whether to show axis grid lines.
xlabelsize (int, default None) – If specified changes the x-axis label size.
xrot (float, default None) – Rotation of x axis labels. For example, a value of 90 displays the x labels rotated 90 degrees clockwise.
ylabelsize (int, default None) – If specified changes the y-axis label size.
yrot (float, default None) – Rotation of y axis labels. For example, a value of 90 displays the y labels rotated 90 degrees clockwise.
ax (Matplotlib axes object, default None) – The axes to plot the histogram on.
sharex (bool, default True if ax is None else False) – In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure.
sharey (bool, default False) – In case subplots=True, share y axis and set some y axis labels to invisible.
figsize (tuple, optional) – The size in inches of the figure to create. Uses the value in matplotlib.rcParams by default.
layout (tuple, optional) – Tuple of (rows, columns) for the layout of the histograms.
bins (int or sequence, default 10) – Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin. In this case, bins is returned unmodified.
backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.
legend (bool, default False) – Whether to show the legend.
**kwargs – All other plotting keyword arguments to be passed to matplotlib.pyplot.hist().

Return type:

matplotlib.AxesSubplot or numpy.ndarray of them

See also

matplotlib.pyplot.hist: Plot a histogram using matplotlib.

Examples

This example draws a histogram based on the length and width of some animals, displayed in three bins

idxmax(axis: Axis = 0, skipna: bool = True, numeric_only: bool = False) → Series

Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
numeric_only (bool, default False) –
Include only float, int or boolean data.

Added in version 1.5.0.

Returns:

Indexes of maxima along the specified axis.

Return type:

Raises:

ValueError –

If the row/column is empty

See also

Series.idxmax: Return index of the maximum element.

Notes

This method is the DataFrame version of ndarray.argmax.

Examples

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                     'co2_emissions': [37.2, 19.66, 1712]},
...                   index=['Pork', 'Wheat Products', 'Beef'])

>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the maximum value in each column.

>>> df.idxmax()
consumption     Wheat Products
co2_emissions             Beef
dtype: object

To return the index for the maximum value in each row, use axis="columns".

>>> df.idxmax(axis="columns")
Pork              co2_emissions
Wheat Products     consumption
Beef              co2_emissions
dtype: object

idxmin(axis: Axis = 0, skipna: bool = True, numeric_only: bool = False) → Series

Return index of first occurrence of minimum over requested axis.

NA/null values are excluded.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
numeric_only (bool, default False) –
Include only float, int or boolean data.

Added in version 1.5.0.

Returns:

Indexes of minima along the specified axis.

Return type:

Raises:

ValueError –

If the row/column is empty

See also

Series.idxmin: Return index of the minimum element.

Notes

This method is the DataFrame version of ndarray.argmin.

Examples

Consider a dataset containing food consumption in Argentina.

>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                     'co2_emissions': [37.2, 19.66, 1712]},
...                   index=['Pork', 'Wheat Products', 'Beef'])

>>> df
                consumption  co2_emissions
Pork                  10.51         37.20
Wheat Products       103.11         19.66
Beef                  55.48       1712.00

By default, it returns the index for the minimum value in each column.

>>> df.idxmin()
consumption                Pork
co2_emissions    Wheat Products
dtype: object

To return the index for the minimum value in each row, use axis="columns".

>>> df.idxmin(axis="columns")
Pork                consumption
Wheat Products    co2_emissions
Beef                consumption
dtype: object

index

The index (row labels) of the DataFrame.

The index of a DataFrame is a series of labels that identify each row. The labels can be integers, strings, or any other hashable type. The index is used for label-based access and alignment, and can be accessed or modified using this attribute.

Returns:: The index labels of the DataFrame.
Return type:: pandas.Index

See also

DataFrame.columns: The column labels of the DataFrame.
DataFrame.to_numpy: Convert the DataFrame to a NumPy array.

Examples

>>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
...                    'Age': [25, 30, 35],
...                    'Location': ['Seattle', 'New York', 'Kona']},
...                   index=([10, 20, 30]))
>>> df.index
Index([10, 20, 30], dtype='int64')

In this example, we create a DataFrame with 3 rows and 3 columns, including Name, Age, and Location information. We set the index labels to be the integers 10, 20, and 30. We then access the index attribute of the DataFrame, which returns an Index object containing the index labels.

>>> df.index = [100, 200, 300]
>>> df
    Name  Age Location
100  Alice   25  Seattle
200    Bob   30 New York
300  Aritra  35    Kona

In this example, we modify the index labels of the DataFrame by assigning a new list of labels to the index attribute. The DataFrame is then updated with the new labels, and the output shows the modified DataFrame.

Print a concise summary of a DataFrame.

This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.

Parameters:

verbose (bool, optional) – Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.
buf (writable buffer, defaults to sys.stdout) – Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.
max_cols (int, optional) – When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.
memory_usage (bool, str, optional) –
Specifies whether total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting.

True always show memory usage. False never shows memory usage. A value of ‘deep’ is equivalent to “True with deep introspection”. Memory usage is shown in human-readable units (base-2 representation). Without deep introspection a memory estimation is made based in column dtype and number of rows assuming values consume the same memory amount for corresponding dtypes. With deep memory introspection, a real memory usage calculation is performed at the cost of computational resources. See the Frequently Asked Questions for more details.
show_counts (bool, optional) – Whether to show the non-null counts. By default, this is shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.

Returns:

This method prints a summary of a DataFrame and returns None.

Return type:

None

See also

DataFrame.describe: Generate descriptive statistics of DataFrame columns.
DataFrame.memory_usage: Memory usage of DataFrame columns.

Examples

>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values,
...                   "float_col": float_values})
>>> df
    int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

Prints information of all columns:

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Prints a summary of columns count and its dtypes but not per column information:

>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Pipe output of DataFrame.info to buffer instead of sys.stdout, get buffer content and writes to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
...           encoding="utf-8") as f:  
...     f.write(s)
260

The memory_usage parameter allows deep introspection mode, specially useful for big DataFrames and fine-tune memory optimization:

>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> df = pd.DataFrame({
...     'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 22.9+ MB

>>> df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 165.9 MB

insert(loc: int, column: Hashable, value: Scalar | AnyArrayLike, allow_duplicates: bool | lib.NoDefault = <no_default>) → None

Insert column into DataFrame at specified location.

Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.

Parameters:

loc (int) – Insertion index. Must verify 0 <= loc <= len(columns).
column (str, number, or hashable object) – Label of the inserted column.
value (Scalar, Series, or array-like) – Content of the inserted column.
allow_duplicates (bool, optional, default lib.no_default) – Allow duplicate column labels to be created.

See also

Index.insert: Insert new item by index.

Examples

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.insert(1, "newcol", [99, 99])
>>> df
   col1  newcol  col2
0     1      99     3
1     2      99     4
>>> df.insert(0, "col1", [100, 100], allow_duplicates=True)
>>> df
   col1  col1  newcol  col2
0   100     1      99     3
1   100     2      99     4

Notice that pandas uses index alignment in case of value from type Series:

>>> df.insert(0, "col0", pd.Series([5, 6], index=[1, 2]))
>>> df
   col0  col1  col1  newcol  col2
0   NaN   100     1      99     3
1   5.0   100     2      99     4

isetitem(loc, value) → None

Set the given value in the column with position loc.

This is a positional analogue to __setitem__.

Parameters:

loc (int or sequence of ints) – Index position for the column.
value (scalar or arraylike) – Value(s) for the column.

Notes

frame.isetitem(loc, value) is an in-place method as it will modify the DataFrame in place (not returning a new object). In contrast to frame.iloc[:, i] = value which will try to update the existing values in place, frame.isetitem(loc, value) will not update the values of the column itself in place, it will instead insert a new array.

In cases where frame.columns is unique, this is equivalent to frame[frame.columns[i]] = value.

isin(values: Series | DataFrame | Sequence | Mapping) → DataFrame

Whether each element in the DataFrame is contained in values.

Parameters:: values (iterable, Series, DataFrame or dict) – The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
Returns:: DataFrame of booleans showing whether each element in the DataFrame is contained in values.
Return type:: DataFrame

See also

DataFrame.eq: Equality test for DataFrame.
Series.isin: Equivalent method on Series.
Series.str.contains: Test if pattern or regex is contained within a string of a Series or Index.

Examples

>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0

When values is a list check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings)

>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True

To check if values is not in the DataFrame, use the ~ operator:

>>> ~df.isin([0, 2])
        num_legs  num_wings
falcon     False      False
dog         True      False

When values is a dict, we can pass values to check for each column separately:

>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True

When values is a Series or DataFrame the index and column must match. Note that ‘falcon’ does not match based on the number of legs in other.

>>> other = pd.DataFrame({'num_legs': [8, 3], 'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon     False       True
dog        False      False

isna() → DataFrame

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:: Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
Return type:: DataFrame

See also

DataFrame.isnull: Alias of isna.
DataFrame.notna: Boolean inverse of isna.
DataFrame.dropna: Omit axes labels with missing values.
isna: Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.isna()
0    False
1    False
2     True
dtype: bool

isnull() → DataFrame

DataFrame.isnull is an alias for DataFrame.isna.

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:: Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
Return type:: DataFrame

See also

DataFrame.isnull: Alias of isna.
DataFrame.notna: Boolean inverse of isna.
DataFrame.dropna: Omit axes labels with missing values.
isna: Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.isna()
0    False
1    False
2     True
dtype: bool

items() → Iterable[tuple[Hashable, Series]]

Iterate over (column name, Series) pairs.

Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.

Yields:

label (object) – The column names for the DataFrame being iterated over.
content (Series) – The column entries belonging to each label, as a Series.

See also

DataFrame.iterrows: Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples: Iterate over DataFrame rows as namedtuples of the values.

Examples

>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                   'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
        species   population
panda   bear      1864
polar   bear      22000
koala   marsupial 80000
>>> for label, content in df.items():
...     print(f'label: {label}')
...     print(f'content: {content}', sep='\n')
...
label: species
content:
panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content:
panda     1864
polar    22000
koala    80000
Name: population, dtype: int64

iterrows() → Iterable[tuple[Hashable, Series]]

Iterate over DataFrame rows as (index, Series) pairs.

Yields:

index (label or tuple of label) – The index of the row. A tuple for a MultiIndex.
data (Series) – The data of the row as a Series.

See also

DataFrame.itertuples: Iterate over DataFrame rows as namedtuples of the values.
DataFrame.items: Iterate over (column name, Series) pairs.

Notes

Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames).

To preserve dtypes while iterating over the rows, it is better to use itertuples() which returns namedtuples of the values and which is generally faster than iterrows.
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

Examples

>>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
>>> row = next(df.iterrows())[1]
>>> row
int      1.0
float    1.5
Name: 0, dtype: float64
>>> print(row['int'].dtype)
float64
>>> print(df['int'].dtype)
int64

itertuples(index: bool = True, name: str | None = 'Pandas') → Iterable[tuple[Any, ...]]

Iterate over DataFrame rows as namedtuples.

Parameters:

index (bool, default True) – If True, return the index as the first element of the tuple.
name (str or None, default "Pandas") – The name of the returned namedtuples or None to return regular tuples.

Returns:

An object to iterate over namedtuples for each row in the DataFrame with the first field possibly being the index and following fields being the column values.

Return type:

iterator

See also

DataFrame.iterrows: Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.items: Iterate over (column name, Series) pairs.

Notes

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore.

Examples

>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)

By setting the index parameter to False we can remove the index as the first element of the tuple:

>>> for row in df.itertuples(index=False):
...     print(row)
...
Pandas(num_legs=4, num_wings=0)
Pandas(num_legs=2, num_wings=2)

With the name parameter set we set a custom name for the yielded namedtuples:

>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Parameters:

other (DataFrame, Series, or a list containing any combination of them) – Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
on (str, list of str, or array-like, optional) – Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'left') –
How to handle the operation of the two objects.
- left: use calling frame’s index (or column if on is specified)
- right: use other’s index.
- outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.
- inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.
- cross: creates the cartesian product from both frames, preserves the order of the left keys.
lsuffix (str, default '') – Suffix to use from left frame’s overlapping columns.
rsuffix (str, default '') – Suffix to use from right frame’s overlapping columns.
sort (bool, default False) – Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
validate (str, optional) –
If specified, checks if join is of specified type.
- ”one_to_one” or “1:1”: check if join keys are unique in both left and right datasets.
- ”one_to_many” or “1:m”: check if join keys are unique in left dataset.
- ”many_to_one” or “m:1”: check if join keys are unique in right dataset.
- ”many_to_many” or “m:m”: allowed, but does not result in checks.
Added in version 1.5.0.

Returns:

A dataframe containing columns from both the caller and other.

Return type:

See also

DataFrame.merge: For column(s)-on-column(s) operations.

Notes

Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.

Examples

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})

>>> df
  key   A
K0  A0
K1  A1
K2  A2
K3  A3
K4  A4
K5  A5

>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
...                       'B': ['B0', 'B1', 'B2']})

>>> other
  key   B
0  K0  B0
1  K1  B1
2  K2  B2

Join DataFrames using their indexes.

>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
       K0  A0        K0   B0
       K1  A1        K1   B1
       K2  A2        K2   B2
       K3  A3       NaN  NaN
       K4  A4       NaN  NaN
       K5  A5       NaN  NaN

If we want to join using the key columns, we need to set key to be the index in both df and other. The joined DataFrame will have key as its index.

>>> df.set_index('key').join(other.set_index('key'))
      A    B
key
K0   A0   B0
K1   A1   B1
K2   A2   B2
K3   A3  NaN
K4   A4  NaN
K5   A5  NaN

Another option to join using the key columns is to use the on parameter. DataFrame.join always uses other’s index but we can use any column in df. This method preserves the original DataFrame’s index in the result.

>>> df.join(other.set_index('key'), on='key')
  key   A    B
K0  A0   B0
K1  A1   B1
K2  A2   B2
K3  A3  NaN
K4  A4  NaN
K5  A5  NaN

Using non-unique key values shows how they are matched.

>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})

>>> df
  key   A
K0  A0
K1  A1
K1  A2
K3  A3
K0  A4
K1  A5

>>> df.join(other.set_index('key'), on='key', validate='m:1')
  key   A    B
K0  A0   B0
K1  A1   B1
K1  A2   B1
K3  A3  NaN
K0  A4   B0
K1  A5   B1

kurt(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64

Return type:

Series or scalar

kurtosis(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64

Return type:

Series or scalar

le(other, axis: Axis = 'columns', level=None) → DataFrame

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:

other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False

lt(other, axis: Axis = 'columns', level=None) → DataFrame

Get Less than of dataframe and other, element-wise (binary operator lt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:

other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False

map(func: PythonFuncType, na_action: str | None = None, **kwargs) → DataFrame

Apply a function to a Dataframe elementwise.

Added in version 2.1.0: DataFrame.applymap was deprecated and renamed to DataFrame.map.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:

func (callable) – Python function, returns a single value from a single value.
na_action ({None, 'ignore'}, default None) – If ‘ignore’, propagate NaN values, without passing them to func.
**kwargs – Additional keyword arguments to pass as keywords arguments to func.

Returns:

Transformed DataFrame.

Return type:

See also

DataFrame.apply: Apply a function along input axis of DataFrame.
DataFrame.replace: Replace values given in to_replace with value.
Series.map: Apply a function elementwise on a Series.

Examples

>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567

>>> df.map(lambda x: len(str(x)))
1
3  4
5  5

Like Series.map, NA values can be ignored:

>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = pd.NA
>>> df_copy.map(lambda x: len(str(x)), na_action='ignore')
     0  1
0  NaN  4
1  5.0  5

It is also possible to use map with functions that are not lambda functions:

>>> df.map(round, ndigits=1)
  1
1.0  2.1
3.4  4.6

Note that a vectorized version of func often exists, which will be much faster. You could square each number elementwise.

>>> df.map(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

But it’s better to avoid map in that case.

>>> df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

max(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.max()
8

mean(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the mean of the values over the requested axis.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.mean()
2.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.mean()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.mean(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.mean(numeric_only=True)
a   1.5
dtype: float64

Return type:

Series or scalar

median(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the median of the values over the requested axis.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.median()
2.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.median()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.median(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.median(numeric_only=True)
a   1.5
dtype: float64

Return type:

Series or scalar

melt(id_vars=None, value_vars=None, var_name=None, value_name: Hashable = 'value', col_level: Level | None = None, ignore_index: bool = True) → DataFrame

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

Parameters:

id_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to use as identifier variables.
value_vars (scalar, tuple, list, or ndarray, optional) – Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name (scalar, default None) – Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
value_name (scalar, default 'value') – Name to use for the ‘value’ column, can’t be an existing column label.
col_level (scalar, optional) – If columns are a MultiIndex then use this level to melt.
ignore_index (bool, default True) – If True, original index is ignored. If False, the original index is retained. Index labels will be repeated as necessary.

Returns:

Unpivoted DataFrame.

Return type:

See also

melt: Identical method.
pivot_table: Create a spreadsheet-style pivot table as a DataFrame.
DataFrame.pivot: Return reshaped DataFrame organized by given index / column values.
DataFrame.explode: Explode a DataFrame from list-like columns to long format.

Notes

Reference the user guide for more examples.

Examples

>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> df.melt(id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> df.melt(id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
a        B      1
b        B      3
c        B      5
a        C      2
b        C      4
c        C      6

The names of ‘variable’ and ‘value’ columns can be customized:

>>> df.melt(id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

Original index values can be kept around:

>>> df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
   A variable  value
a        B      1
b        B      3
c        B      5
a        C      2
b        C      4
c        C      6

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6

>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

memory_usage(index: bool = True, deep: bool = False) → Series

Return the memory usage of each column in bytes.

The memory usage can optionally include the contribution of the index and elements of object dtype.

This value is displayed in DataFrame.info by default. This can be suppressed by setting pandas.options.display.memory_usage to False.

Parameters:

index (bool, default True) – Specifies whether to include the memory usage of the DataFrame’s index in returned Series. If index=True, the memory usage of the index is the first item in the output.
deep (bool, default False) – If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned values.

Returns:

A Series whose index is the original column names and whose values is the memory usage of each column in bytes.

Return type:

See also

numpy.ndarray.nbytes: Total bytes consumed by the elements of an ndarray.
Series.memory_usage: Bytes consumed by a Series.
Categorical: Memory-efficient array for string values with many repeated values.
DataFrame.info: Concise summary of a DataFrame.

Notes

See the Frequently Asked Questions for more details.

Examples

>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t))
...              for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
   int64  float64            complex128  object  bool
0      1      1.0              1.0+0.0j       1  True
1      1      1.0              1.0+0.0j       1  True
2      1      1.0              1.0+0.0j       1  True
3      1      1.0              1.0+0.0j       1  True
4      1      1.0              1.0+0.0j       1  True

>>> df.memory_usage()
Index           128
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

>>> df.memory_usage(index=False)
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

The memory footprint of object dtype columns is ignored by default:

>>> df.memory_usage(deep=True)
Index            128
int64          40000
float64        40000
complex128     80000
object        180000
bool            5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df['object'].astype('category').memory_usage(deep=True)
5244

Merge DataFrame or named Series objects with a database-style join.

A named Series object is treated as a DataFrame with a single named column.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Warning

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.

Parameters:

right (DataFrame or named Series) – Object to merge with.
how ({'left', 'right', 'outer', 'inner', 'cross'}, default 'inner') –
Type of merge to be performed.
- left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
- right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
- outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
- inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
- cross: creates the cartesian product from both frames, preserves the order of the left keys.
on (label or list) – Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
left_on (label or list, or array-like) – Column or index level names to join on in the left DataFrame. Can also be an array or list of arrays of the length of the left DataFrame. These arrays are treated as if they are columns.
right_on (label or list, or array-like) – Column or index level names to join on in the right DataFrame. Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.
left_index (bool, default False) – Use the index from the left DataFrame as the join key(s). If it is a MultiIndex, the number of keys in the other DataFrame (either the index or a number of columns) must match the number of levels.
right_index (bool, default False) – Use the index from the right DataFrame as the join key. Same caveats as left_index.
sort (bool, default False) – Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword).
suffixes (list-like, default is ("_x", "_y")) – A length-2 sequence where each element is optionally a string indicating the suffix to add to overlapping column names in left and right respectively. Pass a value of None instead of a string to indicate that the column name from left or right should be left as-is, with no suffix. At least one of the values must not be None.
copy (bool, default True) –
If False, avoid copy if possible.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True
indicator (bool or str, default False) – If True, adds a column to the output DataFrame called “_merge” with information on the source of each row. The column can be given a different name by providing a string argument. The column will have a Categorical type with the value of “left_only” for observations whose merge key only appears in the left DataFrame, “right_only” for observations whose merge key only appears in the right DataFrame, and “both” if the observation’s merge key is found in both DataFrames.
validate (str, optional) –
If specified, checks if merge is of specified type.
- ”one_to_one” or “1:1”: check if merge keys are unique in both left and right datasets.
- ”one_to_many” or “1:m”: check if merge keys are unique in left dataset.
- ”many_to_one” or “m:1”: check if merge keys are unique in right dataset.
- ”many_to_many” or “m:m”: allowed, but does not result in checks.

Returns:

A DataFrame of the two merged objects.

Return type:

See also

merge_ordered: Merge with optional filling/interpolation.
merge_asof: Merge on nearest keys.
DataFrame.join: Similar method using indices.

Examples

>>> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df1
    lkey value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
>>> df2
    rkey value
0   foo      5
1   bar      6
2   baz      7
3   foo      8

Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.

>>> df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
foo        1  foo        5
foo        1  foo        8
bar        2  bar        6
baz        3  baz        7
foo        5  foo        5
foo        5  foo        8

Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey',
...           suffixes=('_left', '_right'))
  lkey  value_left rkey  value_right
0  foo           1  foo            5
1  foo           1  foo            8
2  bar           2  bar            6
3  baz           3  baz            7
4  foo           5  foo            5
5  foo           5  foo            8

Merge DataFrames df1 and df2, but raise an exception if the DataFrames have any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
    Index(['value'], dtype='object')

>>> df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1
      a  b
0   foo  1
1   bar  2
>>> df2
      a  c
0   foo  3
1   baz  4

>>> df1.merge(df2, how='inner', on='a')
      a  b  c
0   foo  1  3

>>> df1.merge(df2, how='left', on='a')
      a  b  c
0   foo  1  3.0
1   bar  2  NaN

>>> df1 = pd.DataFrame({'left': ['foo', 'bar']})
>>> df2 = pd.DataFrame({'right': [7, 8]})
>>> df1
    left
0   foo
1   bar
>>> df2
    right
0   7
1   8

>>> df1.merge(df2, how='cross')
   left  right
0   foo      7
1   foo      8
2   bar      7
3   bar      8

min(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.min()
0

mod(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Modulo of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

mode(axis: Axis = 0, numeric_only: bool = False, dropna: bool = True) → DataFrame

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) –
The axis to iterate over while searching for the mode:
- 0 or ‘index’ : get mode of each column
- 1 or ‘columns’ : get mode of each row.
numeric_only (bool, default False) – If True, only apply to numeric columns.
dropna (bool, default True) – Don’t consider counts of NaN/NaT.

Returns:

The modes of each column or row.

Return type:

See also

Series.mode: Return the highest frequency value in a Series.
Series.value_counts: Return the counts of values in a Series.

Examples

>>> df = pd.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN

By default, missing values are not considered, and the mode of wings are both 0 and 2. Because the resulting DataFrame has two rows, the second row of species and legs contains NaN.

>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0

Setting dropna=False NaN values are considered and they can be the mode (like for wings).

>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN

Setting numeric_only=True, only the mode of numeric columns is computed, and columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0

To compute the mode over columns and not rows, use the axis parameter:

>>> df.mode(axis='columns', numeric_only=True)
           0    1
falcon   2.0  NaN
horse    4.0  NaN
spider   0.0  8.0
ostrich  2.0  NaN

mul(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

multiply(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

ne(other, axis: Axis = 'columns', level=None) → DataFrame

Get Not equal to of dataframe and other, element-wise (binary operator ne).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:

other (scalar, sequence, Series, or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}, default 'columns') – Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

Result of the comparison.

Return type:

DataFrame of bool

See also

DataFrame.eq: Compare DataFrames for equality elementwise.
DataFrame.ne: Compare DataFrames for inequality elementwise.
DataFrame.le: Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt: Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge: Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt: Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False

nlargest(n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = 'first') → DataFrame

Return the first n rows ordered by columns in descending order.

Return the first n rows with the largest values in columns, in descending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.

Parameters:

n (int) – Number of rows to return.
columns (label or list of labels) – Column label(s) to order by.
keep ({'first', 'last', 'all'}, default 'first') –
Where there are duplicate values:
- first : prioritize the first occurrence(s)
- last : prioritize the last occurrence(s)
- all : keep all the ties of the smallest item even if it means selecting more than n items.

Returns:

The first n rows ordered by the given columns in descending order.

Return type:

See also

DataFrame.nsmallest: Return the first n rows ordered by columns in ascending order.
DataFrame.sort_values: Sort DataFrame by the values.
DataFrame.head: Return the first n rows without re-ordering.

Notes

This function cannot be used with all column types. For example, when specifying columns with object or category dtypes, TypeError is raised.

Examples

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 11300,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru          11300      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nlargest to select the three rows having the largest values in column “population”.

>>> df.nlargest(3, 'population')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Malta       434000    12011      MT

When using keep='last', ties are resolved in reverse order:

>>> df.nlargest(3, 'population', keep='last')
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN

When using keep='all', the number of element kept can go beyond n if there are duplicate values for the smallest element, all the ties are kept:

>>> df.nlargest(3, 'population', keep='all')
          population      GDP alpha-2
France      65000000  2583560      FR
Italy       59000000  1937894      IT
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN

However, nlargest does not keep n distinct largest elements:

>>> df.nlargest(5, 'population', keep='all')
          population      GDP alpha-2
France      65000000  2583560      FR
Italy       59000000  1937894      IT
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN

To order by the largest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nlargest(3, ['population', 'GDP'])
        population      GDP alpha-2
France    65000000  2583560      FR
Italy     59000000  1937894      IT
Brunei      434000    12128      BN

notna() → DataFrame

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns:: Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.
Return type:: DataFrame

See also

DataFrame.notnull: Alias of notna.
DataFrame.isna: Boolean inverse of notna.
DataFrame.dropna: Omit axes labels with missing values.
notna: Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.notna()
0     True
1     True
2    False
dtype: bool

notnull() → DataFrame

DataFrame.notnull is an alias for DataFrame.notna.

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns:: Mask of bool values for each element in DataFrame that indicates whether an element is not an NA value.
Return type:: DataFrame

See also

DataFrame.notnull: Alias of notna.
DataFrame.isna: Boolean inverse of notna.
DataFrame.dropna: Omit axes labels with missing values.
notna: Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.notna()
0     True
1     True
2    False
dtype: bool

nsmallest(n: int, columns: IndexLabel, keep: NsmallestNlargestKeep = 'first') → DataFrame

Return the first n rows ordered by columns in ascending order.

Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=True).head(n), but more performant.

Parameters:

n (int) – Number of items to retrieve.
columns (list or str) – Column name or names to order by.
keep ({'first', 'last', 'all'}, default 'first') –
Where there are duplicate values:
- first : take the first occurrence.
- last : take the last occurrence.
- all : keep all the ties of the largest item even if it means selecting more than n items.

Return type:

See also

DataFrame.nlargest: Return the first n rows ordered by columns in descending order.
DataFrame.sort_values: Sort DataFrame by the values.
DataFrame.head: Return the first n rows without re-ordering.

Examples

>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
...                                   434000, 434000, 337000, 337000,
...                                   11300, 11300],
...                    'GDP': [1937894, 2583560 , 12011, 4520, 12128,
...                            17036, 182, 38, 311],
...                    'alpha-2': ["IT", "FR", "MT", "MV", "BN",
...                                "IS", "NR", "TV", "AI"]},
...                   index=["Italy", "France", "Malta",
...                          "Maldives", "Brunei", "Iceland",
...                          "Nauru", "Tuvalu", "Anguilla"])
>>> df
          population      GDP alpha-2
Italy       59000000  1937894      IT
France      65000000  2583560      FR
Malta         434000    12011      MT
Maldives      434000     4520      MV
Brunei        434000    12128      BN
Iceland       337000    17036      IS
Nauru         337000      182      NR
Tuvalu         11300       38      TV
Anguilla       11300      311      AI

In the following example, we will use nsmallest to select the three rows having the smallest values in column “population”.

>>> df.nsmallest(3, 'population')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS

When using keep='last', ties are resolved in reverse order:

>>> df.nsmallest(3, 'population', keep='last')
          population  GDP alpha-2
Anguilla       11300  311      AI
Tuvalu         11300   38      TV
Nauru         337000  182      NR

When using keep='all', the number of element kept can go beyond n if there are duplicate values for the largest element, all the ties are kept.

>>> df.nsmallest(3, 'population', keep='all')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Nauru         337000    182      NR

However, nsmallest does not keep n distinct smallest elements:

>>> df.nsmallest(4, 'population', keep='all')
          population    GDP alpha-2
Tuvalu         11300     38      TV
Anguilla       11300    311      AI
Iceland       337000  17036      IS
Nauru         337000    182      NR

To order by the smallest values in column “population” and then “GDP”, we can specify multiple columns like in the next example.

>>> df.nsmallest(3, ['population', 'GDP'])
          population  GDP alpha-2
Tuvalu         11300   38      TV
Anguilla       11300  311      AI
Nauru         337000  182      NR

nunique(axis: Axis = 0, dropna: bool = True) → Series

Count number of distinct elements in specified axis.

Return Series with number of distinct elements. Can ignore NaN values.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
dropna (bool, default True) – Don’t include NaN in the counts.

Return type:

See also

Series.nunique: Method nunique for Series.
DataFrame.count: Count non-NA cells for each column or row.

Examples

>>> df = pd.DataFrame({'A': [4, 5, 6], 'B': [4, 1, 1]})
>>> df.nunique()
A    3
B    2
dtype: int64

>>> df.nunique(axis=1)
0    1
1    2
2    2
dtype: int64

pivot(*, columns, index=<no_default>, values=<no_default>) → DataFrame

Return reshaped DataFrame organized by given index / column values.

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. See the User Guide for more on reshaping.

Parameters:

columns (str or object or a list of str) – Column to use to make new frame’s columns.
index (str or object or a list of str, optional) – Column to use to make new frame’s index. If not given, uses existing index.
values (str, object or a list of the previous, optional) – Column(s) to use for populating new frame’s values. If not specified, all remaining columns will be used and the result will have hierarchically indexed columns.

Returns:

Returns reshaped DataFrame.

Return type:

Raises:

ValueError: – When there are any index, columns combinations with multiple values. DataFrame.pivot_table when you need to aggregate.

See also

DataFrame.pivot_table: Generalization of pivot that can handle duplicate values for one index/column pair.
DataFrame.unstack: Pivot based on the index values instead of a column.
wide_to_long: Wide panel to long format. Less flexible but more user-friendly than melt.

Notes

For finer-tuned control, see hierarchical indexing documentation along with the related stack/unstack methods.

Reference the user guide for more examples.

Examples

>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t

>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A   B   C
foo
one  1   2   3
two  4   5   6

>>> df.pivot(index='foo', columns='bar')['baz']
bar  A   B   C
foo
one  1   2   3
two  4   5   6

>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
      baz       zoo
bar   A  B  C   A  B  C
foo
one   1  2  3   x  y  z
two   4  5  6   q  w  t

You could also assign a list of column names or a list of index names.

>>> df = pd.DataFrame({
...        "lev1": [1, 1, 1, 2, 2, 2],
...        "lev2": [1, 1, 2, 1, 1, 2],
...        "lev3": [1, 2, 1, 2, 1, 2],
...        "lev4": [1, 2, 3, 4, 5, 6],
...        "values": [0, 1, 2, 3, 4, 5]})
>>> df
    lev1 lev2 lev3 lev4 values
0   1    1    1    1    0
1   1    1    2    2    1
2   1    2    1    3    2
3   2    1    2    4    3
4   2    1    1    5    4
5   2    2    2    6    5

>>> df.pivot(index="lev1", columns=["lev2", "lev3"], values="values")
lev2    1         2
lev3    1    2    1    2
lev1
1     0.0  1.0  2.0  NaN
2     4.0  3.0  NaN  5.0

>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
      lev3    1    2
lev1  lev2
   1     1  0.0  1.0
         2  2.0  NaN
   2     1  4.0  3.0
         2  NaN  5.0

A ValueError is raised if there are any duplicates.

>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
...                    "bar": ['A', 'A', 'B', 'C'],
...                    "baz": [1, 2, 3, 4]})
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4

Notice that the first two rows are the same for our index and columns arguments.

>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
   ...
ValueError: Index contains duplicate entries, cannot reshape

pivot_table(values=None, index=None, columns=None, aggfunc: AggFuncType = 'mean', fill_value=None, margins: bool = False, dropna: bool = True, margins_name: Level = 'All', observed: bool | lib.NoDefault = <no_default>, sort: bool = True) → DataFrame

Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

Parameters:

values (list-like or scalar, optional) – Column or columns to aggregate.
index (column, Grouper, array, or list of the previous) – Keys to group by on the pivot table index. If a list is passed, it can contain any of the other types (except list). If an array is passed, it must be the same length as the data and will be used in the same manner as column values.
columns (column, Grouper, array, or list of the previous) – Keys to group by on the pivot table column. If a list is passed, it can contain any of the other types (except list). If an array is passed, it must be the same length as the data and will be used in the same manner as column values.
aggfunc (function, list of functions, dict, default "mean") – If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If a dict is passed, the key is column to aggregate and the value is function or list of functions. If margin=True, aggfunc will be used to calculate the partial aggregates.
fill_value (scalar, default None) – Value to replace missing values with (in the resulting pivot table, after aggregation).
margins (bool, default False) – If margins=True, special All columns and rows will be added with partial group aggregates across the categories on the rows and columns.
dropna (bool, default True) – Do not include columns whose entries are all NaN. If True, rows with a NaN value in any column will be omitted before computing margins.
margins_name (str, default 'All') – Name of the row / column that will contain the totals when margins is True.
observed (bool, default False) –
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.2.0: The default value of False is deprecated and will change to True in a future version of pandas.
sort (bool, default True) –
Specifies if the result should be sorted.

Added in version 1.3.0.

Returns:

An Excel style pivot table.

Return type:

See also

DataFrame.pivot: Pivot without aggregation that can handle non-numeric data.
DataFrame.melt: Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
wide_to_long: Wide panel to long format. Less flexible but more user-friendly than melt.

Notes

Reference the user guide for more examples.

Examples

>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

This first example aggregates values by taking the sum.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum")
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

We can also fill missing values using the fill_value parameter.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc="sum", fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6

The next example aggregates by taking the mean across multiple columns.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean", 'E': "mean"})
>>> table
                D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

We can also calculate multiple types of aggregations for any given value column.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': "mean",
...                                 'E': ["min", "max", "mean"]})
>>> table
                  D   E
               mean max      mean  min
A   C
bar large  5.500000   9  7.500000    6
    small  5.500000   9  8.500000    8
foo large  2.000000   5  4.500000    4
    small  2.333333   6  4.333333    2

plot: alias of PlotAccessor

pop(item: Hashable) → Series

Return item and drop from frame. Raise KeyError if not found.

Parameters:: item (label) – Label of column to be popped.
Return type:: Series

Examples

>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=('name', 'class', 'max_speed'))
>>> df
     name   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN

>>> df.pop('class')
0      bird
1      bird
2    mammal
3    mammal
Name: class, dtype: object

>>> df
     name  max_speed
0  falcon      389.0
1  parrot       24.0
2    lion       80.5
3  monkey        NaN

pow(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Exponential power of dataframe and other, element-wise (binary operator pow).

Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

prod(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, min_count: int = 0, **kwargs)

Return the product of the values over the requested axis.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([], dtype="float64").prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([], dtype="float64").prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0

>>> pd.Series([np.nan]).prod(min_count=1)
nan

product(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, min_count: int = 0, **kwargs)

Return the product of the values over the requested axis.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([], dtype="float64").prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([], dtype="float64").prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0

>>> pd.Series([np.nan]).prod(min_count=1)
nan

quantile(q: float = 0.5, axis: Axis = 0, numeric_only: bool = False, interpolation: QuantileInterpolation = 'linear', method: Literal['single', 'table'] = 'single') → Series

quantile(q: AnyArrayLike | Sequence[float], axis: Axis = 0, numeric_only: bool = False, interpolation: QuantileInterpolation = 'linear', method: Literal['single', 'table'] = 'single') → Series | DataFrame

quantile(q: float | AnyArrayLike | Sequence[float] = 0.5, axis: Axis = 0, numeric_only: bool = False, interpolation: QuantileInterpolation = 'linear', method: Literal['single', 'table'] = 'single') → Series | DataFrame

Return values at the given quantile over requested axis.

Parameters:

q (float or array-like, default 0.5 (50% quantile)) – Value between 0 <= q <= 1, the quantile(s) to compute.
axis ({0 or 'index', 1 or 'columns'}, default 0) – Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
numeric_only (bool, default False) –
Include only float, int or boolean data.

Changed in version 2.0.0: The default value of numeric_only is now False.
interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) –
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:
- linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j.
- lower: i.
- higher: j.
- nearest: i or j whichever is nearest.
- midpoint: (i + j) / 2.
method ({'single', 'table'}, default 'single') – Whether to compute quantiles per-column (‘single’) or over all columns (‘table’). When ‘table’, the only allowed interpolation methods are ‘nearest’, ‘lower’, and ‘higher’.

Returns:

If q is an array, a DataFrame will be returned where the: index is q, the columns are the columns of self, and the values are the quantiles.
If q is a float, a Series will be returned where the: index is the columns of self and the values are the quantiles.

Return type:

See also

core.window.rolling.Rolling.quantile: Rolling quantile.
numpy.percentile: Numpy function to compute the percentile.

Examples

>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0

Specifying method=’table’ will compute the quantile over all columns.

>>> df.quantile(.1, method="table", interpolation="nearest")
a    1
b    1
Name: 0.1, dtype: int64
>>> df.quantile([.1, .5], method="table", interpolation="nearest")
     a    b
0.1  1    1
0.5  3  100

Specifying numeric_only=False will also compute the quantile of datetime and timedelta data.

>>> df = pd.DataFrame({'A': [1, 2],
...                    'B': [pd.Timestamp('2010'),
...                          pd.Timestamp('2011')],
...                    'C': [pd.Timedelta('1 days'),
...                          pd.Timedelta('2 days')]})
>>> df.quantile(0.5, numeric_only=False)
A                    1.5
B    2010-07-02 12:00:00
C        1 days 12:00:00
Name: 0.5, dtype: object

query(expr: str, *, inplace: Literal[False] = False, **kwargs) → DataFrame

query(expr: str, *, inplace: Literal[True], **kwargs) → None

query(expr: str, *, inplace: bool = False, **kwargs) → DataFrame | None

Query the columns of a DataFrame with a boolean expression.

Parameters:

expr (str) –
The query string to evaluate.

You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.

You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuations (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2)” would be referenced as `Area (cm^2)`). Column names which are Python keywords (like “list”, “for”, “import”, etc) cannot be used.

For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.
inplace (bool) – Whether to modify the DataFrame rather than creating a new one.
**kwargs – See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().

Returns:

DataFrame resulting from the provided query expression or None if inplace=True.

Return type:

DataFrame or None

See also

eval: Evaluate a string describing operations on DataFrame columns.
DataFrame.eval: Evaluate a string describing operations on DataFrame columns.

Notes

The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.

For further details and examples see the query documentation in indexing.

Backtick quoted variables

Backtick quoted variables are parsed as literal Python code and are converted internally to a Python valid identifier. This can lead to the following problems.

During parsing a number of disallowed characters inside the backtick quoted string are replaced by strings that are allowed as a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign. For other characters that fall outside the ASCII range (U+0001..U+007F) and those that are not further specified in PEP 3131, the query parser will raise an error. This excludes whitespace different than the space character, but also the hashtag (as it is used for comments) and the backtick itself (backtick can also not be escaped).

In a special case, quotes that make a pair around a backtick can confuse the parser. For example, `it's` > `that's` will raise an error, as it forms a quoted string ('s > `that') with a backtick inside.

See also the Python documentation about lexical analysis (https://docs.python.org/3/reference/lexical_analysis.html) in combination with the source code in pandas.core.computation.parsing.

Examples

>>> df = pd.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6
>>> df.query('A > B')
   A  B  C C
4  5  2    6

The previous expression is equivalent to

>>> df[df.A > df.B]
   A  B  C C
4  5  2    6

For columns with spaces in their name, you can use backtick quoting.

>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10

The previous expression is equivalent to

>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10

radd(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Addition of dataframe and other, element-wise (binary operator radd).

Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

rdiv(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

Conform DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:

labels (array-like, optional) – New labels / index to conform the axis specified by ‘axis’ to.
index (array-like, optional) – New labels for the index. Preferably an Index object to avoid duplicating data.
columns (array-like, optional) – New labels for the columns. Preferably an Index object to avoid duplicating data.
axis (int or str, optional) – Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).
method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}) –
Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.
- None (default): don’t fill gaps
- pad / ffill: Propagate last valid observation forward to next valid.
- backfill / bfill: Use next valid observation to fill gap.
- nearest: Use nearest valid observations to fill gap.
copy (bool, default True) –
Return a new object, even if the passed indexes are the same.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (scalar, default np.nan) – Value to use for missing values. Defaults to NaN, but can be any “compatible” value.
limit (int, default None) – Maximum number of consecutive elements to forward or backward fill.
tolerance (optional) –
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations most satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Return type:

DataFrame with changed index.

See also

DataFrame.set_index: Set row labels.
DataFrame.reset_index: Remove row labels or move them to new columns.
DataFrame.reindex_like: Change to same indices as other DataFrame.

Examples

DataFrame.reindex supports two calling conventions

(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Create a dataframe with some fictional data.

>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                   index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

Create a new index and reindex the dataframe. By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.

>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02

>>> df.reindex(new_index, fill_value='missing')
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

Or we can use “axis-style” keyword arguments

>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).

>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

Suppose we decide to expand the dataframe to cover a wider date range.

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

The index entries that did not have a value in the original data frame (for example, ‘2009-12-29’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to back-propagate the last valid value to fill the NaN values, pass bfill as an argument to the method keyword.

>>> df2.reindex(date_index2, method='bfill')
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

Please note that the NaN value present in the original dataframe (at index value 2010-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.

See the user guide for more.

rename(mapper: Renamer | None = None, *, index: Renamer | None = None, columns: Renamer | None = None, axis: Axis | None = None, copy: bool | None = None, inplace: Literal[False] = False, level: Level = None, errors: IgnoreRaise = 'ignore') → DataFrame

Rename columns or index labels.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

See the user guide for more.

Parameters:

mapper (dict-like or function) – Dict-like or function transformations to apply to that axis’ values. Use either mapper and axis to specify the axis to target with mapper, or index and columns.
index (dict-like or function) – Alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper).
columns (dict-like or function) – Alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper).
axis ({0 or 'index', 1 or 'columns'}, default 0) – Axis to target with mapper. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
copy (bool, default True) –
Also copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one. If True then value of copy is ignored.
level (int or level name, default None) – In case of a MultiIndex, only rename labels in the specified level.
errors ({'ignore', 'raise'}, default 'ignore') – If ‘raise’, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.

Returns:

DataFrame with the renamed axis labels or None if inplace=True.

Return type:

DataFrame or None

Raises:

KeyError – If any of the labels is not found in the selected axis and “errors=’raise’”.

See also

DataFrame.rename_axis: Set the name of the axis.

Examples

DataFrame.rename supports two calling conventions

(index=index_mapper, columns=columns_mapper, ...)
(mapper, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Rename columns using a mapping:

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6

Rename index using a mapping:

>>> df.rename(index={0: "x", 1: "y", 2: "z"})
   A  B
x  1  4
y  2  5
z  3  6

Cast index labels to a different type:

>>> df.index
RangeIndex(start=0, stop=3, step=1)
>>> df.rename(index=str).index
Index(['0', '1', '2'], dtype='object')

>>> df.rename(columns={"A": "a", "B": "b", "C": "c"}, errors="raise")
Traceback (most recent call last):
KeyError: ['C'] not found in axis

Using axis-style parameters:

>>> df.rename(str.lower, axis='columns')
   a  b
0  1  4
1  2  5
2  3  6

>>> df.rename({1: 2, 2: 4}, axis='index')
   A  B
0  1  4
2  2  5
4  3  6

reorder_levels(order: Sequence[int | str], axis: Axis = 0) → DataFrame

Rearrange index levels using input order. May not drop or duplicate levels.

Parameters:

order (list of int or list of str) – List representing new level order. Reference level by number (position) or by key (label).
axis ({0 or 'index', 1 or 'columns'}, default 0) – Where to reorder levels.

Return type:

Examples

>>> data = {
...     "class": ["Mammals", "Mammals", "Reptiles"],
...     "diet": ["Omnivore", "Carnivore", "Carnivore"],
...     "species": ["Humans", "Dogs", "Snakes"],
... }
>>> df = pd.DataFrame(data, columns=["class", "diet", "species"])
>>> df = df.set_index(["class", "diet"])
>>> df
                                  species
class      diet
Mammals    Omnivore                Humans
           Carnivore                 Dogs
Reptiles   Carnivore               Snakes

Let’s reorder the levels of the index:

>>> df.reorder_levels(["diet", "class"])
                                  species
diet      class
Omnivore  Mammals                  Humans
Carnivore Mammals                    Dogs
          Reptiles                 Snakes

reset_index(level: IndexLabel = None, *, drop: bool = False, inplace: Literal[False] = False, col_level: Hashable = 0, col_fill: Hashable = '', allow_duplicates: bool | lib.NoDefault = <no_default>, names: Hashable | Sequence[Hashable] | None = None) → DataFrame

reset_index(level: IndexLabel = None, *, drop: bool = False, inplace: Literal[True], col_level: Hashable = 0, col_fill: Hashable = '', allow_duplicates: bool | lib.NoDefault = <no_default>, names: Hashable | Sequence[Hashable] | None = None) → None

reset_index(level: IndexLabel = None, *, drop: bool = False, inplace: bool = False, col_level: Hashable = 0, col_fill: Hashable = '', allow_duplicates: bool | lib.NoDefault = <no_default>, names: Hashable | Sequence[Hashable] | None = None) → DataFrame | None

Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

Parameters:

level (int, str, tuple, or list, default None) – Only remove the given levels from the index. Removes all levels by default.
drop (bool, default False) – Do not try to insert index into dataframe columns. This resets the index to the default integer index.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
col_level (int or str, default 0) – If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.
col_fill (object, default '') – If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.
allow_duplicates (bool, optional, default lib.no_default) –
Allow duplicate column labels to be created.

Added in version 1.5.0.
names (int, str or 1-dimensional list, default None) –
Using the given string, rename the DataFrame column which contains the index data. If the DataFrame has a MultiIndex, this has to be a list or tuple with length equal to the number of levels.

Added in version 1.5.0.

Returns:

DataFrame with the new index or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.set_index: Opposite of reset_index.
DataFrame.reindex: Change to new indices or expand indices.
DataFrame.reindex_like: Change to same indices as other DataFrame.

Examples

>>> df = pd.DataFrame([('bird', 389.0),
...                    ('bird', 24.0),
...                    ('mammal', 80.5),
...                    ('mammal', np.nan)],
...                   index=['falcon', 'parrot', 'lion', 'monkey'],
...                   columns=('class', 'max_speed'))
>>> df
         class  max_speed
falcon    bird      389.0
parrot    bird       24.0
lion    mammal       80.5
monkey  mammal        NaN

When we reset the index, the old index is added as a column, and a new sequential index is used:

>>> df.reset_index()
    index   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN

We can use the drop parameter to avoid the old index being added as a column:

>>> df.reset_index(drop=True)
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN

You can also use reset_index with MultiIndex.

>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
...                                    ('bird', 'parrot'),
...                                    ('mammal', 'lion'),
...                                    ('mammal', 'monkey')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
...                    (24.0, 'fly'),
...                    (80.5, 'run'),
...                    (np.nan, 'jump')],
...                   index=index,
...                   columns=columns)
>>> df
               speed species
                 max    type
class  name
bird   falcon  389.0     fly
       parrot   24.0     fly
mammal lion     80.5     run
       monkey    NaN    jump

Using the names parameter, choose a name for the index column:

>>> df.reset_index(names=['classes', 'names'])
  classes   names  speed species
                     max    type
0    bird  falcon  389.0     fly
1    bird  parrot   24.0     fly
2  mammal    lion   80.5     run
3  mammal  monkey    NaN    jump

If the index has multiple levels, we can reset a subset of them:

>>> df.reset_index(level='class')
         class  speed species
                  max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

If we are not dropping the index, by default, it is placed in the top level. We can place it in another level:

>>> df.reset_index(level='class', col_level=1)
                speed species
         class    max    type
name
falcon    bird  389.0     fly
parrot    bird   24.0     fly
lion    mammal   80.5     run
monkey  mammal    NaN    jump

When the index is inserted under another level, we can specify under which one with the parameter col_fill:

>>> df.reset_index(level='class', col_level=1, col_fill='species')
              species  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump

If we specify a nonexistent level for col_fill, it is created:

>>> df.reset_index(level='class', col_level=1, col_fill='genus')
                genus  speed species
                class    max    type
name
falcon           bird  389.0     fly
parrot           bird   24.0     fly
lion           mammal   80.5     run
monkey         mammal    NaN    jump

rfloordiv(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

rmod(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Modulo of dataframe and other, element-wise (binary operator rmod).

Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

rmul(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

round(decimals: int | dict[IndexLabel, int] | Series = 0, *args, **kwargs) → DataFrame

Round a DataFrame to a variable number of decimal places.

Parameters:

decimals (int, dict, Series) – Number of decimal places to round each column to. If an int is given, round each column to the same number of places. Otherwise dict and Series round to variable numbers of places. Column names should be in the keys if decimals is a dict-like, or in the index if decimals is a Series. Any columns not included in decimals will be left as is. Elements of decimals which are not columns of the input will be ignored.
*args – Additional keywords have no effect but might be accepted for compatibility with numpy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with numpy.

Returns:

A DataFrame with the affected columns rounded to the specified number of decimal places.

Return type:

See also

numpy.around: Round a numpy array to the given number of decimals.
Series.round: Round a Series to the given number of decimals.

Examples

>>> df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
...                   columns=['dogs', 'cats'])
>>> df
    dogs  cats
0  0.21  0.32
1  0.01  0.67
2  0.66  0.03
3  0.21  0.18

By providing an integer each column is rounded to the same number of decimal places

>>> df.round(1)
    dogs  cats
0   0.2   0.3
1   0.0   0.7
2   0.7   0.0
3   0.2   0.2

With a dict, the number of places for specific columns can be specified with the column names as key and the number of decimal places as value

>>> df.round({'dogs': 1, 'cats': 0})
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

Using a Series, the number of places for specific columns can be specified with the column names as index and the number of decimal places as value

>>> decimals = pd.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
    dogs  cats
0   0.2   0.0
1   0.0   1.0
2   0.7   0.0
3   0.2   0.0

rpow(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

rsub(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

rtruediv(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

select_dtypes(include=None, exclude=None) → Self

Return a subset of the DataFrame’s columns based on the column dtypes.

Parameters:

include (scalar or list-like) – A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.
exclude (scalar or list-like) – A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.

Returns:

The subset of the frame including the dtypes in include and excluding the dtypes in exclude.

Return type:

Raises:

ValueError –

If both of include and exclude are empty * If include and exclude have overlapping elements * If any kind of string dtype is passed in.

See also

DataFrame.dtypes: Return Series with the data type of each column.

Notes

To select all numeric types, use np.number or 'number'
To select strings you must use the object dtype, but note that this will return all object dtype columns
See the numpy dtype hierarchy
To select datetimes, use np.datetime64, 'datetime' or 'datetime64'
To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'
To select Pandas categorical dtypes, use 'category'
To select Pandas datetimetz dtypes, use 'datetimetz' or 'datetime64[ns, tz]'

Examples

>>> df = pd.DataFrame({'a': [1, 2] * 3,
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
        a      b  c
0       1   True  1.0
1       2  False  2.0
2       1   True  1.0
3       2  False  2.0
4       1   True  1.0
5       2  False  2.0

>>> df.select_dtypes(include='bool')
   b
True
False
True
False
True
False

>>> df.select_dtypes(include=['float64'])
   c
1.0
2.0
1.0
2.0
1.0
2.0

>>> df.select_dtypes(exclude=['int64'])
       b    c
 True  1.0
False  2.0
 True  1.0
False  2.0
 True  1.0
False  2.0

sem(axis: Axis | None = 0, skipna: bool = True, ddof: int = 1, numeric_only: bool = False, **kwargs)

Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

Parameters:

axis ({index (0), columns (1)}) –
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.sem with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.sem().round(6)
0.57735

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.sem()
a   0.5
b   0.5
dtype: float64

Using axis=1

>>> df.sem(axis=1)
tiger   0.5
zebra   0.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.sem(numeric_only=True)
a   0.5
dtype: float64

Return type:

Series or DataFrame (if level specified)

set_axis(labels, *, axis: Axis = 0, copy: bool | None = None) → DataFrame

Assign desired index to given axis.

Indexes for column or row labels can be changed by assigning a list-like or Index.

Parameters:

labels (list-like, Index) – The values for the new index.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to update. The value 0 identifies the rows. For Series this parameter is unused and defaults to 0.
copy (bool, default True) –
Whether to make a copy of the underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

An object of type DataFrame.

Return type:

See also

DataFrame.rename_axis: Alter the name of the index or columns. Examples ——– >>> df = pd.DataFrame({“A”: [1, 2, 3], “B”: [4, 5, 6]}) Change the row labels. >>> df.set_axis([‘a’, ‘b’, ‘c’], axis=’index’) A B a 1 4 b 2 5 c 3 6 Change the column labels. >>> df.set_axis([‘I’, ‘II’], axis=’columns’) I II 0 1 4 1 2 5 2 3 6

set_index(keys, *, drop: bool = True, append: bool = False, inplace: Literal[False] = False, verify_integrity: bool = False) → DataFrame

set_index(keys, *, drop: bool = True, append: bool = False, inplace: Literal[True], verify_integrity: bool = False) → None

Set the DataFrame index using existing columns.

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on it.

Parameters:

keys (label or array-like or list of labels/arrays) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index, np.ndarray, and instances of Iterator.
drop (bool, default True) – Delete columns to be used as the new index.
append (bool, default False) – Whether to append columns to existing index.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
verify_integrity (bool, default False) – Check the new index for duplicates. Otherwise defer the check until necessary. Setting to False will improve the performance of this method.

Returns:

Changed row labels or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.reset_index: Opposite of set_index.
DataFrame.reindex: Change to new indices or expand indices.
DataFrame.reindex_like: Change to same indices as other DataFrame.

Examples

>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]})
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Set the index to become the ‘month’ column:

>>> df.set_index('month')
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

Create a MultiIndex using columns ‘year’ and ‘month’:

>>> df.set_index(['year', 'month'])
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31

Create a MultiIndex using an Index and a column:

>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
         month  sale
   year
1  2012  1      55
2  2014  4      40
3  2013  7      84
4  2014  10     31

Create a MultiIndex using two Series:

>>> s = pd.Series([1, 2, 3, 4])
>>> df.set_index([s, s**2])
      month  year  sale
1 1       1  2012    55
2 4       4  2014    40
3 9       7  2013    84
4 16     10  2014    31

property shape: tuple[int, int]

Return a tuple representing the dimensionality of the DataFrame.

See also

ndarray.shape: Tuple of array dimensions.

Examples

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.shape
(2, 2)

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
...                    'col3': [5, 6]})
>>> df.shape
(2, 3)

shift(periods: int | Sequence[int] = 1, freq: Frequency | None = None, axis: Axis = 0, fill_value: Hashable = <no_default>, suffix: str | None = None) → DataFrame

Shift index by desired number of periods with an optional time freq.

When freq is not passed, shift the index without realigning the data. If freq is passed (in this case, the index must be date or datetime, or it will raise a NotImplementedError), the index will be increased using the periods and the freq. freq can be inferred when specified as “infer” as long as either freq or inferred_freq attribute is set in the index.

Parameters:

periods (int or Sequence) – Number of periods to shift. Can be positive or negative. If an iterable of ints, the data will be shifted once by each int. This is equivalent to shifting by one value at a time and concatenating all resulting frames. The resulting columns will have the shift suffixed to their column names. For multiple periods, axis must not be 1.
freq (DateOffset, tseries.offsets, timedelta, or str, optional) – Offset to use from the tseries module or time rule (e.g. ‘EOM’). If freq is specified then the index values are shifted but the data is not realigned. That is, use freq if you would like to extend the index when shifting and preserve the original data. If freq is specified as “infer” then it will be inferred from the freq or inferred_freq attributes of the index. If neither of those attributes exist, a ValueError is thrown.
axis ({0 or 'index', 1 or 'columns', None}, default None) – Shift direction. For Series this parameter is unused and defaults to 0.
fill_value (object, optional) – The scalar value to use for newly introduced missing values. the default depends on the dtype of self. For numeric data, np.nan is used. For datetime, timedelta, or period data, etc. NaT is used. For extension dtypes, self.dtype.na_value is used.
suffix (str, optional) – If str and periods is an iterable, this is added after the column name and before the shift value for each shifted column name.

Returns:

Copy of input object, shifted.

Return type:

See also

Index.shift: Shift values of Index.
DatetimeIndex.shift: Shift values of DatetimeIndex.
PeriodIndex.shift: Shift values of PeriodIndex.

Examples

>>> df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
...                    "Col2": [13, 23, 18, 33, 48],
...                    "Col3": [17, 27, 22, 37, 52]},
...                   index=pd.date_range("2020-01-01", "2020-01-05"))
>>> df
            Col1  Col2  Col3
2020-01-01    10    13    17
2020-01-02    20    23    27
2020-01-03    15    18    22
2020-01-04    30    33    37
2020-01-05    45    48    52

>>> df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0

>>> df.shift(periods=1, axis="columns")
            Col1  Col2  Col3
2020-01-01   NaN    10    13
2020-01-02   NaN    20    23
2020-01-03   NaN    15    18
2020-01-04   NaN    30    33
2020-01-05   NaN    45    48

>>> df.shift(periods=3, fill_value=0)
            Col1  Col2  Col3
2020-01-01     0     0     0
2020-01-02     0     0     0
2020-01-03     0     0     0
2020-01-04    10    13    17
2020-01-05    20    23    27

>>> df.shift(periods=3, freq="D")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52

>>> df.shift(periods=3, freq="infer")
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52

>>> df['Col1'].shift(periods=[0, 1, 2])
            Col1_0  Col1_1  Col1_2
2020-01-01      10     NaN     NaN
2020-01-02      20    10.0     NaN
2020-01-03      15    20.0    10.0
2020-01-04      30    15.0    20.0
2020-01-05      45    30.0    15.0

skew(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return unbiased skew over requested axis.

Normalized by N-1.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.skew()
0.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
...                   index=['tiger', 'zebra', 'cow'])
>>> df
        a   b   c
tiger   1   2   1
zebra   2   3   3
cow     3   4   5
>>> df.skew()
a   0.0
b   0.0
c   0.0
dtype: float64

Using axis=1

>>> df.skew(axis=1)
tiger   1.732051
zebra  -1.732051
cow     0.000000
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
...                   index=['tiger', 'zebra', 'cow'])
>>> df.skew(numeric_only=True)
a   0.0
dtype: float64

Return type:

Series or scalar

sort_index(*, axis: Axis = 0, level: IndexLabel = None, ascending: bool | Sequence[bool] = True, inplace: Literal[True], kind: SortKind = 'quicksort', na_position: NaPosition = 'last', sort_remaining: bool = True, ignore_index: bool = False, key: IndexKeyFunc = None) → None

sort_index(*, axis: Axis = 0, level: IndexLabel = None, ascending: bool | Sequence[bool] = True, inplace: Literal[False] = False, kind: SortKind = 'quicksort', na_position: NaPosition = 'last', sort_remaining: bool = True, ignore_index: bool = False, key: IndexKeyFunc = None) → DataFrame

sort_index(*, axis: Axis = 0, level: IndexLabel = None, ascending: bool | Sequence[bool] = True, inplace: bool = False, kind: SortKind = 'quicksort', na_position: NaPosition = 'last', sort_remaining: bool = True, ignore_index: bool = False, key: IndexKeyFunc = None) → DataFrame | None

Sort object by labels (along an axis).

Returns a new DataFrame sorted by label if inplace argument is False, otherwise updates the original DataFrame and returns None.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis along which to sort. The value 0 identifies the rows, and 1 identifies the columns.
level (int or level name or list of ints or list of level names) – If not None, sort on values in specified index level(s).
ascending (bool or list-like of bools, default True) – Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
inplace (bool, default False) – Whether to modify the DataFrame rather than creating a new one.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.
na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the end. Not implemented for MultiIndex.
sort_remaining (bool, default True) – If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape. For MultiIndex inputs, the key is applied per level.

Returns:

The original DataFrame sorted by the labels or None if inplace=True.

Return type:

DataFrame or None

See also

Series.sort_index: Sort Series by the index.
DataFrame.sort_values: Sort DataFrame by the value.
Series.sort_values: Sort Series by the value.

Examples

>>> df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150],
...                   columns=['A'])
>>> df.sort_index()
     A
1    4
29   2
100  1
150  5
234  3

By default, it sorts in ascending order, to sort in descending order, use ascending=False

>>> df.sort_index(ascending=False)
     A
3
5
1
 2
  4

A key function can be specified which is applied to the index before sorting. For a MultiIndex this is applied to each level separately.

>>> df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
>>> df.sort_index(key=lambda x: x.str.lower())
   a
A  1
b  2
C  3
d  4

sort_values(by: IndexLabel, *, axis: Axis = 0, ascending=True, inplace: Literal[False] = False, kind: SortKind = 'quicksort', na_position: NaPosition = 'last', ignore_index: bool = False, key: ValueKeyFunc = None) → DataFrame

sort_values(by: IndexLabel, *, axis: Axis = 0, ascending=True, inplace: Literal[True], kind: SortKind = 'quicksort', na_position: str = 'last', ignore_index: bool = False, key: ValueKeyFunc = None) → None

Sort by the values along either axis.

Parameters:

by (str or list of str) –
Name or list of names to sort by.
- if axis is 0 or ‘index’ then by may contain index levels and/or column labels.
- if axis is 1 or ‘columns’ then by may contain column levels and/or index labels.
axis ("{0 or 'index', 1 or 'columns'}", default 0) – Axis to be sorted.
ascending (bool or list of bool, default True) – Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
inplace (bool, default False) – If True, perform operation in-place.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.
na_position ({'first', 'last'}, default 'last') – Puts NaNs at the beginning if first; last puts NaNs at the end.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.

Returns:

DataFrame with sorted values or None if inplace=True.

Return type:

DataFrame or None

See also

DataFrame.sort_index: Sort a DataFrame by the index.
Series.sort_values: Similar method for a Series.

Examples

>>> df = pd.DataFrame({
...     'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
...     'col2': [2, 1, 9, 8, 7, 4],
...     'col3': [0, 1, 9, 4, 2, 3],
...     'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Sort by col1

>>> df.sort_values(by=['col1'])
  col1  col2  col3 col4
  A     2     0    a
  A     1     1    B
  B     9     9    c
  C     4     3    F
  D     7     2    e
NaN     8     4    D

Sort by multiple columns

>>> df.sort_values(by=['col1', 'col2'])
  col1  col2  col3 col4
  A     1     1    B
  A     2     0    a
  B     9     9    c
  C     4     3    F
  D     7     2    e
NaN     8     4    D

Sort Descending

>>> df.sort_values(by='col1', ascending=False)
  col1  col2  col3 col4
  D     7     2    e
  C     4     3    F
  B     9     9    c
  A     2     0    a
  A     1     1    B
NaN     8     4    D

Putting NAs first

>>> df.sort_values(by='col1', ascending=False, na_position='first')
  col1  col2  col3 col4
NaN     8     4    D
  D     7     2    e
  C     4     3    F
  B     9     9    c
  A     2     0    a
  A     1     1    B

Sorting with a key function

>>> df.sort_values(by='col4', key=lambda col: col.str.lower())
   col1  col2  col3 col4
  A     2     0    a
  A     1     1    B
  B     9     9    c
NaN     8     4    D
  D     7     2    e
  C     4     3    F

Natural sort with the key argument, using the natsort <https://github.com/SethMMorton/natsort> package.

>>> df = pd.DataFrame({
...    "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
...    "value": [10, 20, 30, 40, 50]
... })
>>> df
    time  value
0    0hr     10
1  128hr     20
2   72hr     30
3   48hr     40
4   96hr     50
>>> from natsort import index_natsorted
>>> df.sort_values(
...     by="time",
...     key=lambda x: np.argsort(index_natsorted(df["time"]))
... )
    time  value
0    0hr     10
3   48hr     40
2   72hr     30
4   96hr     50
1  128hr     20

sparse: alias of SparseFrameAccessor

stack(level: IndexLabel = -1, dropna: bool | lib.NoDefault = <no_default>, sort: bool | lib.NoDefault = <no_default>, future_stack: bool = False)

Stack the prescribed level(s) from columns to index.

Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

if the columns have a single level, the output is a Series;

if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.

Parameters:

level (int, str, list, default -1) – Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.
dropna (bool, default True) – Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.
sort (bool, default True) – Whether to sort the levels of the resulting MultiIndex.
future_stack (bool, default False) – Whether to use the new implementation that will replace the current implementation in pandas 3.0. When True, dropna and sort have no impact on the result and must remain unspecified. See pandas 2.1.0 Release notes for more details.

Returns:

Stacked dataframe or series.

Return type:

DataFrame or Series

See also

DataFrame.unstack: Unstack prescribed level(s) from index axis onto column axis.
DataFrame.pivot: Reshape dataframe from long format to wide format.
DataFrame.pivot_table: Create a spreadsheet-style pivot table as a DataFrame.

Notes

The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

Reference the user guide for more examples.

Examples

Single level columns

>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
...                                     index=['cat', 'dog'],
...                                     columns=['weight', 'height'])

Stacking a dataframe with a single level column axis returns a Series:

>>> df_single_level_cols
     weight height
cat       0      1
dog       2      3
>>> df_single_level_cols.stack(future_stack=True)
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

Multi level columns: simple case

>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('weight', 'pounds')])
>>> df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol1)

Stacking a dataframe with a multi-level column axis:

>>> df_multi_level_cols1
     weight
         kg    pounds
cat       1        2
dog       2        4
>>> df_multi_level_cols1.stack(future_stack=True)
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

Missing values

>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
...                                        ('height', 'm')])
>>> df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
...                                     index=['cat', 'dog'],
...                                     columns=multicol2)

It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

>>> df_multi_level_cols2
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>> df_multi_level_cols2.stack(future_stack=True)
        weight  height
cat kg     1.0     NaN
    m      NaN     2.0
dog kg     3.0     NaN
    m      NaN     4.0

Prescribing the level(s) to be stacked

The first parameter controls which level or levels are stacked:

>>> df_multi_level_cols2.stack(0, future_stack=True)
             kg    m
cat weight  1.0  NaN
    height  NaN  2.0
dog weight  3.0  NaN
    height  NaN  4.0
>>> df_multi_level_cols2.stack([0, 1], future_stack=True)
cat  weight  kg    1.0
     height  m     2.0
dog  weight  kg    3.0
     height  m     4.0
dtype: float64

std(axis: Axis | None = 0, skipna: bool = True, ddof: int = 1, numeric_only: bool = False, **kwargs)

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:

axis ({index (0), columns (1)}) –
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.std with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Return type:

Series or DataFrame (if level specified)

Notes

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)

Examples

>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01

The standard deviation of the columns can be found as follows:

>>> df.std()
age       18.786076
height     0.237417
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.std(ddof=0)
age       16.269219
height     0.205609
dtype: float64

property style: Styler

Returns a Styler object.

Contains methods for building a styled HTML representation of the DataFrame.

See also

io.formats.style.Styler: Helps style a DataFrame or Series according to the data with HTML and CSS.

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3]})
>>> df.style  

Please see Table Visualization for more examples.

sub(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

subtract(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

sum(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, min_count: int = 0, **kwargs)

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters:

axis ({index (0), columns (1)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.sum with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

Series or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.sum()
14

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

>>> pd.Series([], dtype="float64").sum(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).sum()
0.0

>>> pd.Series([np.nan]).sum(min_count=1)
nan

swaplevel(i: Axis = -2, j: Axis = -1, axis: Axis = 0) → DataFrame

Swap levels i and j in a MultiIndex.

Default is to swap the two innermost levels of the index.

Parameters:

i (int or str) – Levels of the indices to be swapped. Can pass level name as string.
j (int or str) – Levels of the indices to be swapped. Can pass level name as string.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to swap levels on. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.

Returns:

DataFrame with levels swapped in MultiIndex.

Return type:

Examples

>>> df = pd.DataFrame(
...     {"Grade": ["A", "B", "A", "C"]},
...     index=[
...         ["Final exam", "Final exam", "Coursework", "Coursework"],
...         ["History", "Geography", "History", "Geography"],
...         ["January", "February", "March", "April"],
...     ],
... )
>>> df
                                    Grade
Final exam  History     January      A
            Geography   February     B
Coursework  History     March        A
            Geography   April        C

In the following example, we will swap the levels of the indices. Here, we will swap the levels column-wise, but levels can be swapped row-wise in a similar manner. Note that column-wise is the default behaviour. By not supplying any arguments for i and j, we swap the last and second to last indices.

>>> df.swaplevel()
                                    Grade
Final exam  January     History         A
            February    Geography       B
Coursework  March       History         A
            April       Geography       C

By supplying one argument, we can choose which index to swap the last index with. We can for example swap the first index with the last one as follows.

>>> df.swaplevel(0)
                                    Grade
January     History     Final exam      A
February    Geography   Final exam      B
March       History     Coursework      A
April       Geography   Coursework      C

We can also define explicitly which indices we want to swap by supplying values for both i and j. Here, we for example swap the first and second indices.

>>> df.swaplevel(0, 1)
                                    Grade
History     Final exam  January         A
Geography   Final exam  February        B
History     Coursework  March           A
Geography   Coursework  April           C

to_dict(orient: Literal['dict', 'list', 'series', 'split', 'tight', 'index'] = 'dict', *, into: type[MutableMappingT] | MutableMappingT, index: bool = True) → MutableMappingT

to_dict(orient: Literal['records'], *, into: type[MutableMappingT] | MutableMappingT, index: bool = True) → list[MutableMappingT]

to_dict(orient: Literal['dict', 'list', 'series', 'split', 'tight', 'index'] = 'dict', *, into: type[dict] = <class 'dict'>, index: bool = True) → dict

to_dict(orient: Literal['records'], *, into: type[dict] = <class 'dict'>, index: bool = True) → list[dict]

Convert the DataFrame to a dictionary.

The type of the key-value pairs can be customized with the parameters (see below).

Parameters:

orient (str {'dict', 'list', 'series', 'split', 'tight', 'records', 'index'}) –
Determines the type of the values of the dictionary.
- ’dict’ (default) : dict like {column -> {index -> value}}
- ’list’ : dict like {column -> [values]}
- ’series’ : dict like {column -> Series(values)}
- ’split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
- ’tight’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values], ‘index_names’ -> [index.names], ‘column_names’ -> [column.names]}
- ’records’ : list like [{column -> value}, … , {column -> value}]
- ’index’ : dict like {index -> {column -> value}}
Added in version 1.4.0: ‘tight’ as an allowed value for the orient argument
into (class, default dict) – The collections.abc.MutableMapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
index (bool, default True) –
Whether to include the index item (and index_names item if orient is ‘tight’) in the returned dictionary. Can only be False when orient is ‘split’ or ‘tight’.

Added in version 2.0.0.

Returns:

Return a collections.abc.MutableMapping object representing the DataFrame. The resulting transformation depends on the orient parameter.

Return type:

dict, list or collections.abc.MutableMapping

See also

DataFrame.from_dict: Create a DataFrame from a dictionary.
DataFrame.to_json: Convert a DataFrame to JSON format.

Examples

>>> df = pd.DataFrame({'col1': [1, 2],
...                    'col2': [0.5, 0.75]},
...                   index=['row1', 'row2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}

You can specify the return orientation.

>>> df.to_dict('series')
{'col1': row1    1
         row2    2
Name: col1, dtype: int64,
'col2': row1    0.50
        row2    0.75
Name: col2, dtype: float64}

>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]]}

>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]

>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}

>>> df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}

You can also specify the mapping type.

>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
             ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])

If you want a defaultdict, you need to initialize it:

>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
 defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]

to_feather(path: FilePath | WriteBuffer[bytes], **kwargs) → None

Write a DataFrame to the binary Feather format.

Parameters:

path (str, path object, file-like object) – String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If a string or a path, it will be used as Root Directory path when writing a partitioned dataset.
**kwargs – Additional keywords passed to pyarrow.feather.write_feather(). This includes the compression, compression_level, chunksize and version keywords.

Notes

This function writes the dataframe as a feather file. Requires a default index. For saving the DataFrame with your custom index use a method that supports custom indices e.g. to_parquet.

Examples

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
>>> df.to_feather("file.feather")  

to_gbq(destination_table: str, *, project_id: str | None = None, chunksize: int | None = None, reauth: bool = False, if_exists: ToGbqIfexist = 'fail', auth_local_webserver: bool = True, table_schema: list[dict[str, str]] | None = None, location: str | None = None, progress_bar: bool = True, credentials=None) → None

Write a DataFrame to a Google BigQuery table.

Deprecated since version 2.2.0: Please use pandas_gbq.to_gbq instead.

This function requires the pandas-gbq package.

See the How to authenticate with Google BigQuery guide for authentication instructions.

Parameters:

destination_table (str) – Name of table to be written, in the form dataset.tablename.
project_id (str, optional) – Google BigQuery Account project ID. Optional when available from the environment.
chunksize (int, optional) – Number of rows to be inserted in each chunk from the dataframe. Set to None to load the whole dataframe at once.
reauth (bool, default False) – Force Google BigQuery to re-authenticate the user. This is useful if multiple accounts are used.
if_exists (str, default 'fail') –
Behavior when the destination table exists. Value can be one of:

'fail'
If table exists raise pandas_gbq.gbq.TableCreationError.

'replace'
If table exists, drop it, recreate it, and insert data.

'append'
If table exists, insert data. Create if does not exist.
auth_local_webserver (bool, default True) –
Use the local webserver flow instead of the console flow when getting user credentials.

New in version 0.2.0 of pandas-gbq.

Changed in version 1.5.0: Default value is changed to True. Google has deprecated the auth_local_webserver = False “out of band” (copy-paste) flow.
table_schema (list of dicts, optional) –
List of BigQuery table fields to which according DataFrame columns conform to, e.g. [{'name': 'col1', 'type': 'STRING'},...]. If schema is not provided, it will be generated according to dtypes of DataFrame columns. See BigQuery API documentation on available names of a field.

New in version 0.3.1 of pandas-gbq.
location (str, optional) –
Location where the load job should run. See the BigQuery locations documentation for a list of available locations. The location must match that of the target dataset.

New in version 0.5.0 of pandas-gbq.
progress_bar (bool, default True) –
Use the library tqdm to show the progress bar for the upload, chunk by chunk.

New in version 0.5.0 of pandas-gbq.
credentials (google.auth.credentials.Credentials, optional) –
Credentials for accessing Google APIs. Use this parameter to override default credentials, such as to use Compute Engine google.auth.compute_engine.Credentials or Service Account google.oauth2.service_account.Credentials directly.

New in version 0.8.0 of pandas-gbq.

See also

pandas_gbq.to_gbq: This function in the pandas-gbq library.
read_gbq: Read a DataFrame from Google BigQuery.

Examples

Example taken from Google BigQuery documentation

>>> project_id = "my-project"
>>> table_id = 'my_dataset.my_table'
>>> df = pd.DataFrame({
...                   "my_string": ["a", "b", "c"],
...                   "my_int64": [1, 2, 3],
...                   "my_float64": [4.0, 5.0, 6.0],
...                   "my_bool1": [True, False, True],
...                   "my_bool2": [False, True, False],
...                   "my_dates": pd.date_range("now", periods=3),
...                   }
...                   )

>>> df.to_gbq(table_id, project_id=project_id)  

Render a DataFrame as an HTML table.

Parameters:

buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.
columns (array-like, optional, default None) – The subset of columns to write. Writes all columns by default.
col_space (str or int, list or dict of int or str, optional) – The minimum width of each column in CSS length units. An int is assumed to be px units..
header (bool, optional) – Whether to print column labels, default True.
index (bool, optional, default True) – Whether to print index (row) labels.
na_rep (str, optional, default 'NaN') – String representation of NaN to use.
formatters (list, tuple or dict of one-param. functions, optional) – Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.
float_format (one-parameter function, optional, default None) – Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.
sparsify (bool, optional, default True) – Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.
index_names (bool, optional, default True) – Prints the names of the indexes.
justify (str, default None) –
How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are
- left
- right
- center
- justify
- justify-all
- start
- end
- inherit
- match-parent
- initial
- unset.
max_rows (int, optional) – Maximum number of rows to display in the console.
max_cols (int, optional) – Maximum number of columns to display in the console.
show_dimensions (bool, default False) – Display DataFrame dimensions (number of rows by number of columns).
decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.
bold_rows (bool, default True) – Make the row labels bold in the output.
classes (str or list or tuple, default None) – CSS class(es) to apply to the resulting html table.
escape (bool, default True) – Convert the characters <, >, and & to HTML-safe sequences.
notebook ({True, False}, default False) – Whether the generated HTML is for IPython Notebook.
border (int) – A border=border attribute is included in the opening <table> tag. Default pd.options.display.html.border.
table_id (str, optional) – A css id is included in the opening <table> tag if specified.
render_links (bool, default False) – Convert URLs to HTML links.
encoding (str, default "utf-8") – Set character encoding.

Returns:

If buf is None, returns the result as a string. Otherwise returns None.

Return type:

str or None

See also

to_string: Convert DataFrame to a string.

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]})
>>> html_string = '''<table border="1" class="dataframe">
...   <thead>
...     <tr style="text-align: right;">
...       <th></th>
...       <th>col1</th>
...       <th>col2</th>
...     </tr>
...   </thead>
...   <tbody>
...     <tr>
...       <th>0</th>
...       <td>1</td>
...       <td>4</td>
...     </tr>
...     <tr>
...       <th>1</th>
...       <td>2</td>
...       <td>3</td>
...     </tr>
...   </tbody>
... </table>'''
>>> assert html_string == df.to_html()

to_markdown(buf: FilePath | WriteBuffer[str] | None = None, *, mode: str = 'wt', index: bool = True, storage_options: StorageOptions | None = None, **kwargs) → str | None

Print DataFrame in Markdown-friendly format.

Parameters:

buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.
mode (str, optional) – Mode in which file is opened, “wt” by default.
index (bool, optional, default True) – Add index (row) labels.
storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
**kwargs – These parameters will be passed to tabulate.

Returns:

DataFrame in Markdown-friendly format.

Return type:

str

Notes

Requires the tabulate package.

Examples

>>> df = pd.DataFrame(
...     data={"animal_1": ["elk", "pig"], "animal_2": ["dog", "quetzal"]}
... )
>>> print(df.to_markdown())
|    | animal_1   | animal_2   |
|---:|:-----------|:-----------|
|  0 | elk        | dog        |
|  1 | pig        | quetzal    |

Output markdown with a tabulate option.

>>> print(df.to_markdown(tablefmt="grid"))
+----+------------+------------+
|    | animal_1   | animal_2   |
+====+============+============+
|  0 | elk        | dog        |
+----+------------+------------+
|  1 | pig        | quetzal    |
+----+------------+------------+

to_numpy(dtype: npt.DTypeLike | None = None, copy: bool = False, na_value: object = <no_default>) → np.ndarray

Convert the DataFrame to a NumPy array.

By default, the dtype of the returned array will be the common NumPy dtype of all types in the DataFrame. For example, if the dtypes are float16 and float32, the results dtype will be float32. This may require copying data and coercing values, which may be expensive.

Parameters:

dtype (str or numpy.dtype, optional) – The dtype to pass to numpy.asarray().
copy (bool, default False) – Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensure that a copy is made, even if not strictly necessary.
na_value (Any, optional) – The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

Return type:

numpy.ndarray

See also

Series.to_numpy: Similar method for Series.

Examples

>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])

With heterogeneous data, the lowest common type will have to be used.

>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
       [2. , 4.5]])

For a mix of numeric and non-numeric types, the output array will have object dtype.

>>> df['C'] = pd.date_range('2000', periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
       [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)

Write a DataFrame to the ORC format.

Added in version 1.5.0.

Parameters:

path (str, file-like object or None, default None) – If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function). If path is None, a bytes object is returned.
engine ({'pyarrow'}, default 'pyarrow') – ORC library to use.
index (bool, optional) – If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to infer the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
engine_kwargs (dict[str, Any] or None, default None) – Additional keyword arguments passed to pyarrow.orc.write_table().

Return type:

bytes if no path argument is provided else None

Raises:

NotImplementedError – Dtype of one or more columns is category, unsigned integers, interval, period or sparse.
ValueError – engine is not pyarrow.

See also

read_orc: Read a ORC file.
DataFrame.to_parquet: Write a parquet file.
DataFrame.to_csv: Write a csv file.
DataFrame.to_sql: Write to a sql table.
DataFrame.to_hdf: Write to hdf.

Notes

Before using this function you should read the user guide about ORC and install optional dependencies.
This function requires pyarrow library.
For supported dtypes please refer to supported ORC features in Arrow.
Currently timezones in datetime columns are not preserved when a dataframe is converted into ORC files.

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]})
>>> df.to_orc('df.orc')  
>>> pd.read_orc('df.orc')  
   col1  col2
0     1     4
1     2     3

If you want to get a buffer to the orc content you can write it to io.BytesIO

>>> import io
>>> b = io.BytesIO(df.to_orc())  
>>> b.seek(0)  
0
>>> content = b.read()  

to_parquet(path: None = None, engine: Literal['auto', 'pyarrow', 'fastparquet'] = 'auto', compression: str | None = 'snappy', index: bool | None = None, partition_cols: list[str] | None = None, storage_options: StorageOptions = None, **kwargs) → bytes

to_parquet(path: FilePath | WriteBuffer[bytes], engine: Literal['auto', 'pyarrow', 'fastparquet'] = 'auto', compression: str | None = 'snappy', index: bool | None = None, partition_cols: list[str] | None = None, storage_options: StorageOptions = None, **kwargs) → None

Write a DataFrame to the binary parquet format.

This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.

Parameters:

path (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If None, the result is returned as bytes. If a string or path, it will be used as Root Directory path when writing a partitioned dataset.
engine ({'auto', 'pyarrow', 'fastparquet'}, default 'auto') – Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
compression (str or None, default 'snappy') – Name of the compression to use. Use None for no compression. Supported options: ‘snappy’, ‘gzip’, ‘brotli’, ‘lz4’, ‘zstd’.
index (bool, default None) – If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
partition_cols (list, optional, default None) – Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.
storage_options (dict, optional) –
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
**kwargs – Additional arguments passed to the parquet library. See pandas io for more details.

Return type:

bytes if no path argument is provided else None

See also

read_parquet: Read a parquet file.
DataFrame.to_orc: Write an orc file.
DataFrame.to_csv: Write a csv file.
DataFrame.to_sql: Write to a sql table.
DataFrame.to_hdf: Write to hdf.

Notes

This function requires either the fastparquet or pyarrow library.

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')  
>>> pd.read_parquet('df.parquet.gzip')  
   col1  col2
0     1     3
1     2     4

If you want to get a buffer to the parquet content you can use a io.BytesIO object, as long as you don’t use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()

to_period(freq: Frequency | None = None, axis: Axis = 0, copy: bool | None = None) → DataFrame

Convert DataFrame from DatetimeIndex to PeriodIndex.

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).

Parameters:

freq (str, default) – Frequency of the PeriodIndex.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to convert (the index by default).
copy (bool, default True) –
If False then underlying input data is not copied.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

The DataFrame has a PeriodIndex.

Return type:

Examples

>>> idx = pd.to_datetime(
...     [
...         "2001-03-31 00:00:00",
...         "2002-05-31 00:00:00",
...         "2003-08-31 00:00:00",
...     ]
... )

>>> idx
DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],
dtype='datetime64[ns]', freq=None)

>>> idx.to_period("M")
PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')

For the yearly frequency

>>> idx.to_period("Y")
PeriodIndex(['2001', '2002', '2003'], dtype='period[Y-DEC]')

to_records(index: bool = True, column_dtypes=None, index_dtypes=None) → recarray

Convert DataFrame to a NumPy record array.

Index will be included as the first field of the record array if requested.

Parameters:

index (bool, default True) – Include index in resulting record array, stored in ‘index’ field or using the index label, if set.
column_dtypes (str, type, dict, default None) – If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types.
index_dtypes (str, type, dict, default None) –
If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types.

This mapping is applied only if index=True.

Returns:

NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries.

Return type:

numpy.rec.recarray

See also

DataFrame.from_records: Convert structured or record ndarray to DataFrame.
numpy.rec.recarray: An ndarray that allows field access using attributes, analogous to typed columns in a spreadsheet.

Examples

>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
...                   index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

If the DataFrame index has no label then the recarray field name is set to ‘index’. If the index has a label then this is used as the field name:

>>> df.index = df.index.rename("I")
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])

Data types can be specified for the columns:

>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])

As well as for the index:

>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])

>>> index_dtypes = f"<S{df.index.str.len().max()}"
>>> df.to_records(index_dtypes=index_dtypes)
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
          dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])

to_stata(path: FilePath | WriteBuffer[bytes], *, convert_dates: dict[Hashable, str] | None = None, write_index: bool = True, byteorder: ToStataByteorder | None = None, time_stamp: datetime.datetime | None = None, data_label: str | None = None, variable_labels: dict[Hashable, str] | None = None, version: int | None = 114, convert_strl: Sequence[Hashable] | None = None, compression: CompressionOptions = 'infer', storage_options: StorageOptions | None = None, value_labels: dict[Hashable, dict[float, str]] | None = None) → None

Export DataFrame object to Stata dta format.

Writes the DataFrame to a Stata dataset file. “dta” files contain a Stata dataset.

Parameters:

path (str, path object, or buffer) – String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function.
convert_dates (dict) – Dictionary mapping columns containing datetime types to stata internal format to use when writing the dates. Options are ‘tc’, ‘td’, ‘tm’, ‘tw’, ‘th’, ‘tq’, ‘ty’. Column can be either an integer or a name. Datetime columns that do not have a conversion type specified will be converted to ‘tc’. Raises NotImplementedError if a datetime column has timezone information.
write_index (bool) – Write the index to Stata dataset.
byteorder (str) – Can be “>”, “<”, “little”, or “big”. default is sys.byteorder.
time_stamp (datetime) – A datetime to use as file creation date. Default is the current time.
data_label (str, optional) – A label for the data set. Must be 80 characters or smaller.
variable_labels (dict) – Dictionary containing columns as keys and variable labels as values. Each label must be 80 characters or smaller.
version ({114, 117, 118, 119, None}, default 114) –
Version to use in the output dta file. Set to None to let pandas decide between 118 or 119 formats depending on the number of columns in the frame. Version 114 can be read by Stata 10 and later. Version 117 can be read by Stata 13 or later. Version 118 is supported in Stata 14 and later. Version 119 is supported in Stata 15 and later. Version 114 limits string variables to 244 characters or fewer while versions 117 and later allow strings with lengths up to 2,000,000 characters. Versions 118 and 119 support Unicode characters, and version 119 supports more than 32,767 variables.

Version 119 should usually only be used when the number of variables exceeds the capacity of dta format 118. Exporting smaller datasets in format 119 may have unintended consequences, and, as of November 2020, Stata SE cannot read version 119 files.
convert_strl (list, optional) – List of column names to convert to string columns to Stata StrL format. Only available if version is 117. Storing strings in the StrL format can produce smaller dta files if strings have more than 8 characters and values are repeated.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If ‘infer’ and ‘path’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

Added in version 1.5.0: Added support for .tar files.

Changed in version 1.4.0: Zstandard support.
storage_options (dict, optional) –
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
value_labels (dict of dicts) –
Dictionary containing columns as keys and dictionaries of column value to labels as values. Labels for a single variable must be 32,000 characters or smaller.

Added in version 1.4.0.

Raises:

NotImplementedError –
- If datetimes contain timezone information * Column dtype is not representable in Stata
ValueError –
- Columns listed in convert_dates are neither datetime64[ns] or datetime.datetime * Column listed in convert_dates is not in DataFrame * Categorical label contains more than 32,000 characters

See also

read_stata: Import Stata data files.
io.stata.StataWriter: Low-level writer for Stata data files.
io.stata.StataWriter117: Low-level writer for version 117 files.

Examples

>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon',
...                               'parrot'],
...                    'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta')  

Render a DataFrame to a console-friendly tabular output.

Parameters:

buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.
columns (array-like, optional, default None) – The subset of columns to write. Writes all columns by default.
col_space (int, list or dict of int, optional) – The minimum width of each column. If a list of ints is given every integers corresponds with one column. If a dict is given, the key references the column, while the value defines the space to use..
header (bool or list of str, optional) – Write out the column names. If a list of columns is given, it is assumed to be aliases for the column names.
index (bool, optional, default True) – Whether to print index (row) labels.
na_rep (str, optional, default 'NaN') – String representation of NaN to use.
formatters (list, tuple or dict of one-param. functions, optional) – Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.
float_format (one-parameter function, optional, default None) – Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.
sparsify (bool, optional, default True) – Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.
index_names (bool, optional, default True) – Prints the names of the indexes.
justify (str, default None) –
How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are
- left
- right
- center
- justify
- justify-all
- start
- end
- inherit
- match-parent
- initial
- unset.
max_rows (int, optional) – Maximum number of rows to display in the console.
max_cols (int, optional) – Maximum number of columns to display in the console.
show_dimensions (bool, default False) – Display DataFrame dimensions (number of rows by number of columns).
decimal (str, default '.') – Character recognized as decimal separator, e.g. ‘,’ in Europe.
line_width (int, optional) – Width to wrap a line in characters.
min_rows (int, optional) – The number of rows to display in the console in a truncated repr (when number of rows is above max_rows).
max_colwidth (int, optional) – Max width to truncate each column in characters. By default, no limit.
encoding (str, default "utf-8") – Set character encoding.

Returns:

If buf is None, returns the result as a string. Otherwise returns None.

Return type:

str or None

See also

to_html: Convert DataFrame to HTML.

Examples

>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
   col1  col2
0     1     4
1     2     5
2     3     6

to_timestamp(freq: Frequency | None = None, how: ToTimestampHow = 'start', axis: Axis = 0, copy: bool | None = None) → DataFrame

Cast to DatetimeIndex of timestamps, at beginning of period.

Parameters:

freq (str, default frequency of PeriodIndex) – Desired frequency.
how ({'s', 'e', 'start', 'end'}) – Convention for converting period to timestamp; start of period vs. end.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to convert (the index by default).
copy (bool, default True) –
If False then underlying input data is not copied.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

The DataFrame has a DatetimeIndex.

Return type:

Examples

>>> idx = pd.PeriodIndex(['2023', '2024'], freq='Y')
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d, index=idx)
>>> df1
      col1   col2
2023     1      3
2024     2      4

The resulting timestamps will be at the beginning of the year in this case

>>> df1 = df1.to_timestamp()
>>> df1
            col1   col2
2023-01-01     1      3
2024-01-01     2      4
>>> df1.index
DatetimeIndex(['2023-01-01', '2024-01-01'], dtype='datetime64[ns]', freq=None)

Using freq which is the offset that the Timestamps will have

>>> df2 = pd.DataFrame(data=d, index=idx)
>>> df2 = df2.to_timestamp(freq='M')
>>> df2
            col1   col2
2023-01-31     1      3
2024-01-31     2      4
>>> df2.index
DatetimeIndex(['2023-01-31', '2024-01-31'], dtype='datetime64[ns]', freq=None)

Render a DataFrame to an XML document.

Added in version 1.3.0.

Parameters:

path_or_buffer (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string.
index (bool, default True) – Whether to include index in XML document.
root_name (str, default 'data') – The name of root element in XML document.
row_name (str, default 'row') – The name of row element in XML document.
na_rep (str, optional) – Missing data representation.
attr_cols (list-like, optional) – List of columns to write as attributes in row element. Hierarchical columns will be flattened with underscore delimiting the different levels.
elem_cols (list-like, optional) – List of columns to write as children in row element. By default, all columns output as children of row element. Hierarchical columns will be flattened with underscore delimiting the different levels.
namespaces (dict, optional) –
All namespaces to be defined in root element. Keys of dict should be prefix names and values of dict corresponding URIs. Default namespaces should be given empty string key. For example,
```
namespaces = {"": "https://example.com"}
```
prefix (str, optional) – Namespace prefix to be used for every element and/or attribute in document. This should be one of the keys in namespaces dict.
encoding (str, default 'utf-8') – Encoding of the resulting document.
xml_declaration (bool, default True) – Whether to include the XML declaration at start of document.
pretty_print (bool, default True) – Whether output should be pretty printed with indentation and line breaks.
parser ({'lxml','etree'}, default 'lxml') – Parser module to use for building of tree. Only ‘lxml’ and ‘etree’ are supported. With ‘lxml’, the ability to use XSLT stylesheet is supported.
stylesheet (str, path object or file-like object, optional) – A URL, file-like object, or a raw string containing an XSLT script used to transform the raw XML output. Script should use layout of elements and attributes from original output. This argument requires lxml to be installed. Only XSLT 1.0 scripts and not later versions is currently supported.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buffer’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

Added in version 1.5.0: Added support for .tar files.

Changed in version 1.4.0: Zstandard support.
storage_options (dict, optional) –
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

Returns:

If io is None, returns the resulting XML format as a string. Otherwise returns None.

Return type:

None or str

See also

to_json: Convert the pandas object to a JSON string.
to_html: Convert DataFrame to a html.

Examples

>>> df = pd.DataFrame({'shape': ['square', 'circle', 'triangle'],
...                    'degrees': [360, 360, 180],
...                    'sides': [4, np.nan, 3]})

>>> df.to_xml()  
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>

>>> df.to_xml(attr_cols=[
...           'index', 'shape', 'degrees', 'sides'
...           ])  
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row index="0" shape="square" degrees="360" sides="4.0"/>
  <row index="1" shape="circle" degrees="360"/>
  <row index="2" shape="triangle" degrees="180" sides="3.0"/>
</data>

>>> df.to_xml(namespaces={"doc": "https://example.com"},
...           prefix="doc")  
<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
  <doc:row>
    <doc:index>0</doc:index>
    <doc:shape>square</doc:shape>
    <doc:degrees>360</doc:degrees>
    <doc:sides>4.0</doc:sides>
  </doc:row>
  <doc:row>
    <doc:index>1</doc:index>
    <doc:shape>circle</doc:shape>
    <doc:degrees>360</doc:degrees>
    <doc:sides/>
  </doc:row>
  <doc:row>
    <doc:index>2</doc:index>
    <doc:shape>triangle</doc:shape>
    <doc:degrees>180</doc:degrees>
    <doc:sides>3.0</doc:sides>
  </doc:row>
</doc:data>

transform(func: AggFuncType, axis: Axis = 0, *args, **kwargs) → DataFrame

Call func on self producing a DataFrame with the same axis shape as self.

Parameters:

func (function, str, list-like or dict-like) –
Function to use for transforming the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.

Accepted combinations are:
- function
- string function name
- list-like of functions and/or function names, e.g. [np.exp, 'sqrt']
- dict-like of axis labels -> functions, function names or list-like of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.

Returns:

A DataFrame that must have the same length as self.

Return type:

:raises ValueError : If the returned DataFrame has a different length than self.:

See also

DataFrame.agg: Only perform aggregating type operations.
DataFrame.apply: Invoke function on a DataFrame.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

Examples

>>> df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df
   A  B
0  0  1
1  1  2
2  2  3
>>> df.transform(lambda x: x + 1)
   A  B
0  1  2
1  2  3
2  3  4

Even though the resulting DataFrame must have the same length as the input DataFrame, it is possible to provide several input functions:

>>> s = pd.Series(range(3))
>>> s
0    0
1    1
2    2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056

You can call transform on a GroupBy object:

>>> df = pd.DataFrame({
...     "Date": [
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
...     "Data": [5, 8, 6, 1, 50, 100, 60, 120],
... })
>>> df
         Date  Data
0  2015-05-08     5
1  2015-05-07     8
2  2015-05-06     6
3  2015-05-05     1
4  2015-05-08    50
5  2015-05-07   100
6  2015-05-06    60
7  2015-05-05   120
>>> df.groupby('Date')['Data'].transform('sum')
0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64

>>> df = pd.DataFrame({
...     "c": [1, 1, 1, 2, 2, 2, 2],
...     "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4

transpose(*args, copy: bool = False) → DataFrame

Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Parameters:

*args (tuple, optional) – Accepted for compatibility with NumPy.
copy (bool, default False) –
Whether to copy the data after transposing, even for DataFrames with a single dtype.

Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

The transposed DataFrame.

Return type:

See also

numpy.transpose: Permute the dimensions of a given array.

Notes

Transposing a DataFrame with mixed dtypes will result in a homogeneous DataFrame with the object dtype. In such a case, a copy of the data is always made.

Examples

Square DataFrame with homogeneous dtype

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
   col1  col2
0     1     3
1     2     4

>>> df1_transposed = df1.T  # or df1.transpose()
>>> df1_transposed
      0  1
col1  1  2
col2  3  4

When the dtype is homogeneous in the original DataFrame, we get a transposed DataFrame with the same dtype:

>>> df1.dtypes
col1    int64
col2    int64
dtype: object
>>> df1_transposed.dtypes
0    int64
1    int64
dtype: object

Non-square DataFrame with mixed dtypes

>>> d2 = {'name': ['Alice', 'Bob'],
...       'score': [9.5, 8],
...       'employed': [False, True],
...       'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0

>>> df2_transposed = df2.T  # or df2.transpose()
>>> df2_transposed
              0     1
name      Alice   Bob
score       9.5   8.0
employed  False  True
kids          0     0

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:

>>> df2.dtypes
name         object
score       float64
employed       bool
kids          int64
dtype: object
>>> df2_transposed.dtypes
0    object
1    object
dtype: object

truediv(other, axis: Axis = 'columns', level=None, fill_value=None) → DataFrame

Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:

other (scalar, sequence, Series, dict or DataFrame) – Any single or multiple element data structure, or list-like object.
axis ({0 or 'index', 1 or 'columns'}) – Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.
level (int or label) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (float or None, default None) – Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:

Result of the arithmetic operation.

Return type:

See also

DataFrame.add: Add DataFrames.
DataFrame.sub: Subtract DataFrames.
DataFrame.mul: Multiply DataFrames.
DataFrame.div: Divide DataFrames (float division).
DataFrame.truediv: Divide DataFrames (float division).
DataFrame.floordiv: Divide DataFrames (integer division).
DataFrame.mod: Calculate modulo (remainder after division).
DataFrame.pow: Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with operator version which return the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a dictionary by axis.

>>> df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720

>>> df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0

unstack(level: IndexLabel = -1, fill_value=None, sort: bool = True)

Pivot a level of the (necessarily hierarchical) index labels.

Returns a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).

Parameters:

level (int, str, or list of these, default -1 (last level)) – Level(s) of index to unstack, can pass level name.
fill_value (int, str or dict) – Replace NaN with this value if the unstack produces missing values.
sort (bool, default True) – Sort the level(s) in the resulting MultiIndex columns.

Return type:

See also

DataFrame.pivot: Pivot a table based on column values.
DataFrame.stack: Pivot a level of the column labels (inverse operation from unstack).

Notes

Reference the user guide for more examples.

Examples

>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
...                                    ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one  a   1.0
     b   2.0
two  a   3.0
     b   4.0
dtype: float64

>>> s.unstack(level=-1)
     a   b
one  1.0  2.0
two  3.0  4.0

>>> s.unstack(level=0)
   one  two
a  1.0   3.0
b  2.0   4.0

>>> df = s.unstack(level=0)
>>> df.unstack()
one  a  1.0
     b  2.0
two  a  3.0
     b  4.0
dtype: float64

update(other, join: UpdateJoin = 'left', overwrite: bool = True, filter_func=None, errors: IgnoreRaise = 'ignore') → None

Modify in place using non-NA values from another DataFrame.

Aligns on indices. There is no return value.

Parameters:

other (DataFrame, or object coercible into a DataFrame) – Should have at least one matching index/column label with the original DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name to align with the original DataFrame.
join ({'left'}, default 'left') – Only left join is implemented, keeping the index and columns of the original object.
overwrite (bool, default True) –
How to handle non-NA values for overlapping keys:
- True: overwrite original DataFrame’s values with values from other.
- False: only update values that are NA in the original DataFrame.
filter_func (callable(1d-array) -> bool 1d-array, optional) – Can choose to replace values other than NA. Return True for values that should be updated.
errors ({'raise', 'ignore'}, default 'ignore') – If ‘raise’, will raise a ValueError if the DataFrame and other both contain non-NA data in the same place.

Returns:

This method directly changes calling object.

Return type:

None

Raises:

ValueError –
- When errors=’raise’ and there’s overlapping non-NA data. * When errors is not either ‘ignore’ or ‘raise’
NotImplementedError –
- If join != ‘left’

See also

dict.update: Similar method for dictionaries.
DataFrame.merge: For column(s)-on-column(s) operations.

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6

The DataFrame’s length does not increase as a result of the update, only values at matching index/column labels are updated.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'f']}, index=[0, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  y
2  c  f

For Series, its name attribute must be set.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e', 'f'], name='B')
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

If other contains NaNs the corresponding values are not updated in the original dataframe.

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400., 500., 600.]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
   A      B
0  1    4.0
1  2  500.0
2  3    6.0

value_counts(subset: IndexLabel | None = None, normalize: bool = False, sort: bool = True, ascending: bool = False, dropna: bool = True) → Series

Return a Series containing the frequency of each distinct row in the Dataframe.

Parameters:

subset (label or list of labels, optional) – Columns to use when counting unique combinations.
normalize (bool, default False) – Return proportions rather than frequencies.
sort (bool, default True) – Sort by frequencies when True. Sort by DataFrame column values when False.
ascending (bool, default False) – Sort in ascending order.
dropna (bool, default True) –
Don’t include counts of rows that contain NA values.

Added in version 1.3.0.

Return type:

See also

Series.value_counts: Equivalent method on Series.

Notes

The returned Series will have a MultiIndex with one level per input column but an Index (non-multi) for a single label. By default, rows that contain any NA values are omitted from the result. By default, the resulting Series will be in descending order so that the first element is the most frequently-occurring row.

Examples

>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
...                    'num_wings': [2, 0, 0, 0]},
...                   index=['falcon', 'dog', 'cat', 'ant'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0

>>> df.value_counts()
num_legs  num_wings
4         0            2
2         2            1
6         0            1
Name: count, dtype: int64

>>> df.value_counts(sort=False)
num_legs  num_wings
2         2            1
4         0            2
6         0            1
Name: count, dtype: int64

>>> df.value_counts(ascending=True)
num_legs  num_wings
2         2            1
6         0            1
4         0            2
Name: count, dtype: int64

>>> df.value_counts(normalize=True)
num_legs  num_wings
4         0            0.50
2         2            0.25
6         0            0.25
Name: proportion, dtype: float64

With dropna set to False we can also count rows with NA values.

>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise

>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
Name: count, dtype: int64

>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
Name: count, dtype: int64

>>> df.value_counts("first_name")
first_name
John    2
Anne    1
Beth    1
Name: count, dtype: int64

property values: ndarray

Return a Numpy representation of the DataFrame.

Warning

We recommend using DataFrame.to_numpy() instead.

Only the values in the DataFrame will be returned, the axes labels will be removed.

Returns:: The values of the DataFrame.
Return type:: numpy.ndarray

See also

DataFrame.to_numpy: Recommended alternative to this method.
DataFrame.index: Retrieve the index labels.
DataFrame.columns: Retrieving the column names.

Notes

The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type() convention, mixing int64 and uint64 will result in a float64 dtype.

Examples

A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.

>>> df = pd.DataFrame({'age':    [ 3,  29],
...                    'height': [94, 170],
...                    'weight': [31, 115]})
>>> df
   age  height  weight
0    3      94      31
1   29     170     115
>>> df.dtypes
age       int64
height    int64
weight    int64
dtype: object
>>> df.values
array([[  3,  94,  31],
       [ 29, 170, 115]])

A DataFrame with mixed type columns(e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types (e.g., object).

>>> df2 = pd.DataFrame([('parrot',   24.0, 'second'),
...                     ('lion',     80.5, 1),
...                     ('monkey', np.nan, None)],
...                   columns=('name', 'max_speed', 'rank'))
>>> df2.dtypes
name          object
max_speed    float64
rank          object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
       ['lion', 80.5, 1],
       ['monkey', nan, None]], dtype=object)

var(axis: Axis | None = 0, skipna: bool = True, ddof: int = 1, numeric_only: bool = False, **kwargs)

Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:

axis ({index (0), columns (1)}) –
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.var with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Return type:

Series or DataFrame (if level specified)

Examples

>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01

>>> df.var()
age       352.916667
height      0.056367
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.var(ddof=0)
age       264.687500
height      0.042275
dtype: float64

class pyranges1.ext.stats.Series(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool | None = None, fastpath: bool | lib.NoDefault = <no_default>)

One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).

Operations between Series (+, -, /, *, **) align values based on their associated index values– they need not be the same length. The result index will be the sorted union of the two indexes.

Parameters:

data (array-like, Iterable, dict, or scalar value) – Contains data stored in Series. If data is a dict, argument order is maintained.
index (array-like or Index (1d)) – Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If data is dict-like and index is None, then the keys in the data are used as the index. If the index is not None, the resulting Series is reindexed with the index values.
dtype (str, numpy.dtype, or ExtensionDtype, optional) – Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.
name (Hashable, default None) – The name to give to the Series.
copy (bool, default False) – Copy input data. Only affects Series or 1d ndarray input. See examples.

Notes

Please reference the User Guide for more information.

Examples

Constructing Series from a dictionary with an Index specified

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['a', 'b', 'c'])
>>> ser
a   1
b   2
c   3
dtype: int64

The keys of the dictionary match with the Index values, hence the Index values have no effect.

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['x', 'y', 'z'])
>>> ser
x   NaN
y   NaN
z   NaN
dtype: float64

Note that the Index is first build with the keys from the dictionary. After this the Series is reindexed with the given Index values, hence we get all NaN as a result.

Constructing Series from a list with copy=False.

>>> r = [1, 2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0    999
1      2
dtype: int64

Due to input data type the Series has a copy of the original data even though copy=False, so the data is unchanged.

Constructing Series from a 1d ndarray with copy=False.

>>> r = np.array([1, 2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999,   2])
>>> ser
0    999
1      2
dtype: int64

Due to input data type the Series has a view on the original data, so the data is changed as well.

add(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Addition of series and other, element-wise (binary operator add).

Equivalent to series + other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.radd: Reverse of the Addition operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.add(b, fill_value=0)
a    2.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64

agg(func=None, axis: Axis = 0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters:

func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply.

Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.

Returns:

The return can be:

scalar : when Series.agg is called with single function
Series : when DataFrame.agg is called with a single function
DataFrame : when DataFrame.agg is called with several functions

Return type:

scalar, Series or DataFrame

See also

Series.apply: Invoke function on a Series.
Series.transform: Transform function producing a Series with like indexes.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64

>>> s.agg('min')
1

>>> s.agg(['min', 'max'])
min   1
max   4
dtype: int64

aggregate(func=None, axis: Axis = 0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters:

func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply.

Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.

Returns:

The return can be:

scalar : when Series.agg is called with single function
Series : when DataFrame.agg is called with a single function
DataFrame : when DataFrame.agg is called with several functions

Return type:

scalar, Series or DataFrame

See also

Series.apply: Invoke function on a Series.
Series.transform: Transform function producing a Series with like indexes.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64

>>> s.agg('min')
1

>>> s.agg(['min', 'max'])
min   1
max   4
dtype: int64

all(axis: Axis = 0, bool_only: bool = False, skipna: bool = True, **kwargs) → bool

Return whether all elements are True, potentially over an axis.

Returns True unless there at least one element within a series or along a Dataframe axis that is False or equivalent (e.g. zero or empty).

Parameters:

axis ({0 or 'index', 1 or 'columns', None}, default 0) –
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.
- 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
- 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
- None : reduce all axes, return a scalar.
bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.
skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
**kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

If level is specified, then, Series is returned; otherwise, scalar is returned.

Return type:

scalar or Series

See also

Series.all: Return True if all elements are True.
DataFrame.any: Return True if one (or more) elements are True.

Examples

Series

>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([], dtype="float64").all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True

DataFrames

Create a dataframe from a dictionary.

>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
   col1   col2
0  True   True
1  True  False

Default behaviour checks if values in each column all return True.

>>> df.all()
col1     True
col2    False
dtype: bool

Specify axis='columns' to check if values in each row all return True.

>>> df.all(axis='columns')
0     True
1    False
dtype: bool

Or axis=None for whether every value is True.

>>> df.all(axis=None)
False

any(*, axis: Axis = 0, bool_only: bool = False, skipna: bool = True, **kwargs) → bool

Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters:

axis ({0 or 'index', 1 or 'columns', None}, default 0) –
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.
- 0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
- 1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
- None : reduce all axes, return a scalar.
bool_only (bool, default False) – Include only boolean columns. Not implemented for Series.
skipna (bool, default True) – Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
**kwargs (any, default None) – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

If level is specified, then, Series is returned; otherwise, scalar is returned.

Return type:

scalar or Series

See also

numpy.any: Numpy version of this method.
Series.any: Return whether any element is True.
Series.all: Return whether all elements are True.
DataFrame.any: Return whether any element is True over requested axis.
DataFrame.all: Return whether all elements are True over requested axis.

Examples

Series

For Series input, the output is a scalar indicating whether any element is True.

>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([], dtype="float64").any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True

DataFrame

Whether each column contains at least one True element (the default).

>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
   A  B  C
0  1  0  0
1  2  2  0

>>> df.any()
A     True
B     True
C    False
dtype: bool

Aggregating over the columns.

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2

>>> df.any(axis='columns')
0    True
1    True
dtype: bool

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
       A  B
0   True  1
1  False  0

>>> df.any(axis='columns')
0    True
1    False
dtype: bool

Aggregating over the entire DataFrame with axis=None.

>>> df.any(axis=None)
True

any for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)

apply(func: AggFuncType, convert_dtype: bool | lib.NoDefault = <no_default>, args: tuple[Any, ...] = (), *, by_row: Literal[False, 'compat'] = 'compat', **kwargs) → DataFrame | Series

Invoke function on values of Series.

Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

Parameters:

func (function) – Python function or NumPy ufunc to apply.
convert_dtype (bool, default True) –
Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.

Deprecated since version 2.1.0: convert_dtype has been deprecated. Do ser.astype(object).apply() instead if you want convert_dtype=False.
args (tuple) – Positional arguments passed to func after the series value.
by_row (False or "compat", default "compat") –
If "compat" and func is a callable, func will be passed each element of the Series, like Series.map. If func is a list or dict of callables, will first try to translate each func into pandas methods. If that doesn’t work, will try call to apply again with by_row="compat" and if that fails, will call apply again with by_row=False (backward compatible). If False, the func will be passed the whole Series at once.

by_row has no effect when func is a string.

Added in version 2.1.0.
**kwargs – Additional keyword arguments passed to func.

Returns:

If func returns a Series object the result will be a DataFrame.

Return type:

See also

Series.map: For element-wise operations.
Series.agg: Only perform aggregating type operations.
Series.transform: Only perform transforming type operations.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

Examples

Create a series with typical summer temperatures for each city.

>>> s = pd.Series([20, 21, 12],
...               index=['London', 'New York', 'Helsinki'])
>>> s
London      20
New York    21
Helsinki    12
dtype: int64

Square the values by defining a function and passing it as an argument to apply().

>>> def square(x):
...     return x ** 2
>>> s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64

Square the values by passing an anonymous function as an argument to apply().

>>> s.apply(lambda x: x ** 2)
London      400
New York    441
Helsinki    144
dtype: int64

Define a custom function that needs additional positional arguments and pass these additional arguments using the args keyword.

>>> def subtract_custom_value(x, custom_value):
...     return x - custom_value

>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64

Define a custom function that takes keyword arguments and pass these arguments to apply.

>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x += kwargs[month]
...     return x

>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64

Use a function from the Numpy library.

>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64

argsort(axis: Axis = 0, kind: SortKind = 'quicksort', order: None = None, stable: None = None) → Series

Return the integer indices that would sort the Series values.

Override ndarray.argsort. Argsorts the value, omitting NA/null values, and places the result in the same locations as the non-NA values.

Parameters:

axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
kind ({'mergesort', 'quicksort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See numpy.sort() for more information. ‘mergesort’ and ‘stable’ are the only stable algorithms.
order (None) – Has no effect but is accepted for compatibility with numpy.
stable (None) – Has no effect but is accepted for compatibility with numpy.

Returns:

Positions of values within the sort order with -1 indicating nan values.

Return type:

Series[np.intp]

See also

numpy.ndarray.argsort: Returns the indices that would sort this array.

Examples

>>> s = pd.Series([3, 2, 1])
>>> s.argsort()
0    2
1    1
2    0
dtype: int64

property array: ExtensionArray

The ExtensionArray of the data backing this Series or Index.

Returns:

An ExtensionArray of the values stored within. For extension types, this is the actual array. For NumPy native types, this is a thin (no copy) wrapper around numpy.ndarray.

.array differs from .values, which may require converting the data to a different form.

Return type:

ExtensionArray

See also

Index.to_numpy: Similar method that always returns a NumPy array.
Series.to_numpy: Similar method that always returns a NumPy array.

Notes

This table lays out the different array types for each extension dtype within pandas.

dtype	array type
category	Categorical
period	PeriodArray
interval	IntervalArray
IntegerNA	IntegerArray
string	StringArray
boolean	BooleanArray
datetime64[ns, tz]	DatetimeArray

For any 3rd-party extension types, the array type will be an ExtensionArray.

For all remaining dtypes .array will be a arrays.NumpyExtensionArray wrapping the actual ndarray stored within. If you absolutely need a NumPy array (possibly with copying / coercing data), then use Series.to_numpy() instead.

Examples

For regular NumPy types like int, and float, a NumpyExtensionArray is returned.

>>> pd.Series([1, 2, 3]).array
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64

For extension types, like Categorical, the actual ExtensionArray is returned

>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']

autocorr(lag: int = 1) → float

Compute the lag-N autocorrelation.

This method computes the Pearson correlation between the Series and its shifted self.

Parameters:: lag (int, default 1) – Number of lags to apply before performing autocorrelation.
Returns:: The Pearson correlation between self and self.shift(lag).
Return type:: float

See also

Series.corr: Compute the correlation between two Series.
Series.shift: Shift index by desired number of periods.
DataFrame.corr: Compute pairwise correlation of columns.
DataFrame.corrwith: Compute pairwise correlation between rows or columns of two DataFrame objects.

Notes

If the Pearson correlation is not well defined return ‘NaN’.

Examples

>>> s = pd.Series([0.25, 0.5, 0.2, -0.05])
>>> s.autocorr()  
0.10355...
>>> s.autocorr(lag=2)  
-0.99999...

If the Pearson correlation is not well defined, then ‘NaN’ is returned.

>>> s = pd.Series([1, 0, 0, 0])
>>> s.autocorr()
nan

property axes: list[Index]: Return a list of the row axis labels.

between(left, right, inclusive: Literal['both', 'neither', 'left', 'right'] = 'both') → Series

Return boolean Series equivalent to left <= series <= right.

This function returns a boolean vector containing True wherever the corresponding Series element is between the boundary values left and right. NA values are treated as False.

Parameters:

left (scalar or list-like) – Left boundary.
right (scalar or list-like) – Right boundary.
inclusive ({"both", "neither", "left", "right"}) –
Include boundaries. Whether to set each bound as closed or open.

Changed in version 1.3.0.

Returns:

Series representing whether each element is between left and right (inclusive).

Return type:

See also

Series.gt: Greater than of series and other.
Series.lt: Less than of series and other.

Notes

This function is equivalent to (left <= ser) & (ser <= right)

Examples

>>> s = pd.Series([2, 0, 4, 8, np.nan])

Boundary values are included by default:

>>> s.between(1, 4)
   True
  False
   True
  False
  False
dtype: bool

With inclusive set to "neither" boundary values are excluded:

>>> s.between(1, 4, inclusive="neither")
   True
  False
  False
  False
  False
dtype: bool

left and right can be any scalar value:

>>> s = pd.Series(['Alice', 'Bob', 'Carol', 'Eve'])
>>> s.between('Anna', 'Daniel')
0    False
1     True
2     True
3    False
dtype: bool

Replace values where the conditions are True.

Parameters:

caselist (A list of tuples of conditions and expected replacements) –

Takes the form: (condition0, replacement0), (condition1, replacement1), … . condition should be a 1-D boolean array-like object or a callable. If condition is a callable, it is computed on the Series and should return a boolean Series or array. The callable must not change the input Series (though pandas doesn`t check it). replacement should be a 1-D array-like object, a scalar or a callable. If replacement is a callable, it is computed on the Series and should return a scalar or Series. The callable must not change the input Series (though pandas doesn`t check it).

Added in version 2.2.0.

Return type:

See also

Series.mask: Replace values where the condition is True.

Examples

>>> c = pd.Series([6, 7, 8, 9], name='c')
>>> a = pd.Series([0, 0, 1, 2])
>>> b = pd.Series([0, 3, 4, 5])

>>> c.case_when(caselist=[(a.gt(0), a),  # condition, replacement
...                       (b.gt(0), b)])
0    6
1    3
2    1
3    2
Name: c, dtype: int64

cat: alias of CategoricalAccessor

combine(other: Series | Hashable, func: Callable[[Hashable, Hashable], Hashable], fill_value: Hashable | None = None) → Series

Combine the Series with a Series or scalar according to func.

Combine the Series and other using func to perform elementwise selection for combined Series. fill_value is assumed when value is missing at some index from one of the two objects being combined.

Parameters:

other (Series or scalar) – The value(s) to be combined with the Series.
func (function) – Function that takes two scalars as inputs and returns an element.
fill_value (scalar, optional) – The value to assume when an index is missing from one Series or the other. The default specifies to use the appropriate NaN value for the underlying dtype of the Series.

Returns:

The result of combining the Series with the other object.

Return type:

See also

Series.combine_first: Combine Series values, choosing the calling Series’ values first.

Examples

Consider 2 Datasets s1 and s2 containing highest clocked speeds of different birds.

>>> s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0})
>>> s1
falcon    330.0
eagle     160.0
dtype: float64
>>> s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})
>>> s2
falcon    345.0
eagle     200.0
duck       30.0
dtype: float64

Now, to combine the two datasets and view the highest speeds of the birds across the two datasets

>>> s1.combine(s2, max)
duck        NaN
eagle     200.0
falcon    345.0
dtype: float64

In the previous example, the resulting value for duck is missing, because the maximum of a NaN and a float is a NaN. So, in the example, we set fill_value=0, so the maximum value returned will be the value from some dataset.

>>> s1.combine(s2, max, fill_value=0)
duck       30.0
eagle     200.0
falcon    345.0
dtype: float64

combine_first(other) → Series

Update null elements with value in the same location in ‘other’.

Combine two Series objects by filling null values in one Series with non-null values from the other Series. Result index will be the union of the two indexes.

Parameters:: other (Series) – The value(s) to be used for filling null values.
Returns:: The result of combining the provided Series with the other object.
Return type:: Series

See also

Series.combine: Perform element-wise operation on two Series using a given function.

Examples

>>> s1 = pd.Series([1, np.nan])
>>> s2 = pd.Series([3, 4, 5])
>>> s1.combine_first(s2)
0    1.0
1    4.0
2    5.0
dtype: float64

Null values still persist if the location of that null value does not exist in other

>>> s1 = pd.Series({'falcon': np.nan, 'eagle': 160.0})
>>> s2 = pd.Series({'eagle': 200.0, 'duck': 30.0})
>>> s1.combine_first(s2)
duck       30.0
eagle     160.0
falcon      NaN
dtype: float64

compare(other: Series, align_axis: Axis = 1, keep_shape: bool = False, keep_equal: bool = False, result_names: Suffixes = ('self', 'other')) → DataFrame | Series

Compare to another Series and show the differences.

Parameters:

other (Series) – Object to compare with.
align_axis ({0 or 'index', 1 or 'columns'}, default 1) –
Determine which axis to align the comparison on.
- 0, or ‘index’Resulting differences are stacked vertically
  with rows drawn alternately from self and other.
- 1, or ‘columns’Resulting differences are aligned horizontally
  with columns drawn alternately from self and other.
keep_shape (bool, default False) – If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.
keep_equal (bool, default False) – If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.
result_names (tuple, default ('self', 'other')) –
Set the dataframes names in the comparison.

Added in version 1.5.0.

Returns:

If axis is 0 or ‘index’ the result will be a Series. The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.

If axis is 1 or ‘columns’ the result will be a DataFrame. It will have two columns namely ‘self’ and ‘other’.

Return type:

See also

DataFrame.compare: Compare with another DataFrame and show differences.

Notes

Matching NaNs will not appear as a difference.

Examples

>>> s1 = pd.Series(["a", "b", "c", "d", "e"])
>>> s2 = pd.Series(["a", "a", "c", "b", "e"])

Align the differences on columns

>>> s1.compare(s2)
  self other
1    b     a
3    d     b

Stack the differences on indices

>>> s1.compare(s2, align_axis=0)
1  self     b
   other    a
3  self     d
   other    b
dtype: object

Keep all original rows

>>> s1.compare(s2, keep_shape=True)
  self other
NaN   NaN
  b     a
NaN   NaN
  d     b
NaN   NaN

Keep all original rows and also all original values

>>> s1.compare(s2, keep_shape=True, keep_equal=True)
  self other
  a     a
  b     a
  c     c
  d     b
  e     e

corr(other: Series, method: CorrelationMethod = 'pearson', min_periods: int | None = None) → float

Compute correlation with other Series, excluding missing values.

The two Series objects are not required to be the same length and will be aligned internally before the correlation function is applied.

Parameters:

other (Series) – Series with which to compute the correlation.
method ({'pearson', 'kendall', 'spearman'} or callable) –
Method used to compute correlation:
- pearson : Standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- callable: Callable with input two 1d ndarrays and returning a float.
Warning

Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
min_periods (int, optional) – Minimum number of observations needed to have a valid result.

Returns:

Correlation with other.

Return type:

float

See also

DataFrame.corr: Compute pairwise correlation between columns.
DataFrame.corrwith: Compute pairwise correlation with another DataFrame or Series.

Notes

Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.

Automatic data alignment: as with all pandas operations, automatic data alignment is performed for this method. corr() automatically considers values with matching indices.

Examples

>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> s1 = pd.Series([.2, .0, .6, .2])
>>> s2 = pd.Series([.3, .6, .0, .1])
>>> s1.corr(s2, method=histogram_intersection)
0.3

Pandas auto-aligns the values with matching indices

>>> s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
>>> s2 = pd.Series([1, 2, 3], index=[2, 1, 0])
>>> s1.corr(s2)
-1.0

count() → int

Return number of non-NA/null observations in the Series.

Returns:: Number of non-null values in the Series.
Return type:: int

See also

DataFrame.count: Count non-NA cells for each column or row.

Examples

>>> s = pd.Series([0.0, 1.0, np.nan])
>>> s.count()
2

cov(other: Series, min_periods: int | None = None, ddof: int | None = 1) → float

Compute covariance with Series, excluding missing values.

The two Series objects are not required to be the same length and will be aligned internally before the covariance is calculated.

Parameters:

other (Series) – Series with which to compute the covariance.
min_periods (int, optional) – Minimum number of observations needed to have a valid result.
ddof (int, default 1) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns:

Covariance between Series and other normalized by N-1 (unbiased estimator).

Return type:

float

See also

DataFrame.cov: Compute pairwise covariance of columns.

Examples

>>> s1 = pd.Series([0.90010907, 0.13484424, 0.62036035])
>>> s2 = pd.Series([0.12528585, 0.26962463, 0.51111198])
>>> s1.cov(s2)
-0.01685762652715874

cummax(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative maximum of scalar or Series.

Return type:

scalar or Series

See also

core.window.expanding.Expanding.max: Similar functionality but ignores NaN values.
Series.max: Return the maximum over Series axis.
Series.cummax: Return cumulative maximum over Series axis.
Series.cummin: Return cumulative minimum over Series axis.
Series.cumsum: Return cumulative sum over Series axis.
Series.cumprod: Return cumulative product over Series axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
  2.0
  NaN
  5.0
  5.0
  5.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummax(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row, use axis=1

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0

cummin(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative minimum of scalar or Series.

Return type:

scalar or Series

See also

core.window.expanding.Expanding.min: Similar functionality but ignores NaN values.
Series.min: Return the minimum over Series axis.
Series.cummax: Return cumulative maximum over Series axis.
Series.cummin: Return cumulative minimum over Series axis.
Series.cumsum: Return cumulative sum over Series axis.
Series.cumprod: Return cumulative product over Series axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
  2.0
  NaN
  2.0
 -1.0
 -1.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cummin(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row, use axis=1

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

cumprod(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative product of scalar or Series.

Return type:

scalar or Series

See also

core.window.expanding.Expanding.prod: Similar functionality but ignores NaN values.
Series.prod: Return the product over Series axis.
Series.cummax: Return cumulative maximum over Series axis.
Series.cummin: Return cumulative minimum over Series axis.
Series.cumsum: Return cumulative sum over Series axis.
Series.cumprod: Return cumulative product over Series axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
   2.0
   NaN
  10.0
 -10.0
  -0.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumprod(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row, use axis=1

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0

cumsum(axis: Axis | None = None, skipna: bool = True, *args, **kwargs)

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

Parameters:

axis ({0 or 'index', 1 or 'columns'}, default 0) – The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
*args – Additional keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Return cumulative sum of scalar or Series.

Return type:

scalar or Series

See also

core.window.expanding.Expanding.sum: Similar functionality but ignores NaN values.
Series.sum: Return the sum over Series axis.
Series.cummax: Return cumulative maximum over Series axis.
Series.cummin: Return cumulative minimum over Series axis.
Series.cumsum: Return cumulative sum over Series axis.
Series.cumprod: Return cumulative product over Series axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
  2.0
  NaN
  7.0
  6.0
  6.0
dtype: float64

To include NA values in the operation, use skipna=False

>>> s.cumsum(skipna=False)
  2.0
  NaN
  NaN
  NaN
  NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row, use axis=1

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0

diff(periods: int = 1) → Series

First discrete difference of element.

Calculates the difference of a Series element compared with another element in the Series (default is element in previous row).

Parameters:: periods (int, default 1) – Periods to shift for calculating difference, accepts negative values.
Returns:: First differences of the Series.
Return type:: Series

See also

Series.pct_change: Percent change over given number of periods.
Series.shift: Shift index by desired number of periods with an optional time freq.
DataFrame.diff: First discrete difference of object.

Notes

For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in Series, however dtype of the result is always float64.

Examples

Difference with previous row

>>> s = pd.Series([1, 1, 2, 3, 5, 8])
>>> s.diff()
0    NaN
1    0.0
2    1.0
3    1.0
4    2.0
5    3.0
dtype: float64

Difference with 3rd previous row

>>> s.diff(periods=3)
  NaN
  NaN
  NaN
  2.0
  4.0
  6.0
dtype: float64

Difference with following row

>>> s.diff(periods=-1)
  0.0
 -1.0
 -1.0
 -2.0
 -3.0
  NaN
dtype: float64

Overflow in input dtype

>>> s = pd.Series([1, 0], dtype=np.uint8)
>>> s.diff()
0      NaN
1    255.0
dtype: float64

div(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rtruediv: Reverse of the Floating division operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.divide(b, fill_value=0)
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64

divide(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rtruediv: Reverse of the Floating division operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.divide(b, fill_value=0)
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64

divmod(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Integer division and modulo of series and other, element-wise (binary operator divmod).

Equivalent to divmod(series, other), but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

2-Tuple of Series

See also

Series.rdivmod: Reverse of the Integer division and modulo operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.divmod(b, fill_value=0)
(a    1.0
 b    inf
 c    inf
 d    0.0
 e    NaN
 dtype: float64,
 a    0.0
 b    NaN
 c    NaN
 d    0.0
 e    NaN
 dtype: float64)

dot(other: AnyArrayLike) → Series | np.ndarray

Compute the dot product between the Series and the columns of other.

This method computes the dot product between the Series and another one, or the Series and each columns of a DataFrame, or the Series and each columns of an array.

It can also be called using self @ other.

Parameters:: other (Series, DataFrame or array-like) – The other object to compute the dot product with its columns.
Returns:: Return the dot product of the Series and other if other is a Series, the Series of the dot product of Series and each rows of other if other is a DataFrame or a numpy.ndarray between the Series and each columns of the numpy array.
Return type:: scalar, Series or numpy.ndarray

See also

DataFrame.dot: Compute the matrix product with the DataFrame.
Series.mul: Multiplication of series and other, element-wise.

Notes

The Series and other has to share the same index if other is a Series or a DataFrame.

Examples

>>> s = pd.Series([0, 1, 2, 3])
>>> other = pd.Series([-1, 2, -3, 4])
>>> s.dot(other)
8
>>> s @ other
8
>>> df = pd.DataFrame([[0, 1], [-2, 3], [4, -5], [6, 7]])
>>> s.dot(df)
0    24
1    14
dtype: int64
>>> arr = np.array([[0, 1], [-2, 3], [4, -5], [6, 7]])
>>> s.dot(arr)
array([24, 14])

drop(labels: IndexLabel = None, *, axis: Axis = 0, index: IndexLabel = None, columns: IndexLabel = None, level: Level | None = None, inplace: Literal[True], errors: IgnoreRaise = 'raise') → None

drop(labels: IndexLabel = None, *, axis: Axis = 0, index: IndexLabel = None, columns: IndexLabel = None, level: Level | None = None, inplace: Literal[False] = False, errors: IgnoreRaise = 'raise') → Series

drop(labels: IndexLabel = None, *, axis: Axis = 0, index: IndexLabel = None, columns: IndexLabel = None, level: Level | None = None, inplace: bool = False, errors: IgnoreRaise = 'raise') → Series | None

Return Series with specified index labels removed.

Remove elements of a Series based on specifying the index labels. When using a multi-index, labels on different levels can be removed by specifying the level.

Parameters:

labels (single label or list-like) – Index labels to drop.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
index (single label or list-like) – Redundant for application on Series, but ‘index’ can be used instead of ‘labels’.
columns (single label or list-like) – No change is made to the Series; use ‘index’ or ‘labels’ instead.
level (int or level name, optional) – For MultiIndex, level for which the labels will be removed.
inplace (bool, default False) – If True, do operation inplace and return None.
errors ({'ignore', 'raise'}, default 'raise') – If ‘ignore’, suppress error and only existing labels are dropped.

Returns:

Series with specified index labels removed or None if inplace=True.

Return type:

Series or None

Raises:

KeyError – If none of the labels are found in the index.

See also

Series.reindex: Return only specified index labels of Series.
Series.dropna: Return series without null values.
Series.drop_duplicates: Return Series with duplicate values removed.
DataFrame.drop: Drop specified labels from rows or columns.

Examples

>>> s = pd.Series(data=np.arange(3), index=['A', 'B', 'C'])
>>> s
A  0
B  1
C  2
dtype: int64

Drop labels B en C

>>> s.drop(labels=['B', 'C'])
A  0
dtype: int64

Drop 2nd level label in MultiIndex Series

>>> midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
...               index=midx)
>>> s
llama   speed      45.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64

>>> s.drop(labels='weight', level=1)
llama   speed      45.0
        length      1.2
cow     speed      30.0
        length      1.5
falcon  speed     320.0
        length      0.3
dtype: float64

drop_duplicates(*, keep: DropKeep = 'first', inplace: Literal[False] = False, ignore_index: bool = False) → Series

drop_duplicates(*, keep: DropKeep = 'first', inplace: Literal[True], ignore_index: bool = False) → None

drop_duplicates(*, keep: DropKeep = 'first', inplace: bool = False, ignore_index: bool = False) → Series | None

Return Series with duplicate values removed.

Parameters:

keep ({‘first’, ‘last’, False}, default ‘first’) –
Method to handle dropping duplicates:
- ’first’ : Drop duplicates except for the first occurrence.
- ’last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
inplace (bool, default False) – If True, performs operation inplace and returns None.
ignore_index (bool, default False) –
If True, the resulting axis will be labeled 0, 1, …, n - 1.

Added in version 2.0.0.

Returns:

Series with duplicates dropped or None if inplace=True.

Return type:

Series or None

See also

Index.drop_duplicates: Equivalent method on Index.
DataFrame.drop_duplicates: Equivalent method on DataFrame.
Series.duplicated: Related method on Series, indicating duplicate Series values.
Series.unique: Return unique values as an array.

Examples

Generate a Series with duplicated entries.

>>> s = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama', 'hippo'],
...               name='animal')
>>> s
0     llama
1       cow
2     llama
3    beetle
4     llama
5     hippo
Name: animal, dtype: object

With the ‘keep’ parameter, the selection behaviour of duplicated values can be changed. The value ‘first’ keeps the first occurrence for each set of duplicated entries. The default value of keep is ‘first’.

>>> s.drop_duplicates()
0     llama
1       cow
3    beetle
5     hippo
Name: animal, dtype: object

The value ‘last’ for parameter ‘keep’ keeps the last occurrence for each set of duplicated entries.

>>> s.drop_duplicates(keep='last')
1       cow
3    beetle
4     llama
5     hippo
Name: animal, dtype: object

The value False for parameter ‘keep’ discards all sets of duplicated entries.

>>> s.drop_duplicates(keep=False)
1       cow
3    beetle
5     hippo
Name: animal, dtype: object

dropna(*, axis: Axis = 0, inplace: Literal[False] = False, how: AnyAll | None = None, ignore_index: bool = False) → Series

dropna(*, axis: Axis = 0, inplace: Literal[True], how: AnyAll | None = None, ignore_index: bool = False) → None

Return a new Series with missing values removed.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:

axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
inplace (bool, default False) – If True, do operation inplace and return None.
how (str, optional) – Not in use. Kept for compatibility.
ignore_index (bool, default False) –
If True, the resulting axis will be labeled 0, 1, …, n - 1.

Added in version 2.0.0.

Returns:

Series with NA entries dropped from it or None if inplace=True.

Return type:

Series or None

See also

Series.isna: Indicate missing values.
Series.notna: Indicate existing (non-missing) values.
Series.fillna: Replace missing values.
DataFrame.dropna: Drop rows or columns which contain NA values.
Index.dropna: Drop missing indices.

Examples

>>> ser = pd.Series([1., 2., np.nan])
>>> ser
0    1.0
1    2.0
2    NaN
dtype: float64

Drop NA values from a Series.

>>> ser.dropna()
0    1.0
1    2.0
dtype: float64

Empty strings are not considered NA values. None is considered an NA value.

>>> ser = pd.Series([np.nan, 2, pd.NaT, '', None, 'I stay'])
>>> ser
0       NaN
1         2
2       NaT
3
4      None
5    I stay
dtype: object
>>> ser.dropna()
1         2
3
5    I stay
dtype: object

dt: alias of CombinedDatetimelikeProperties

property dtype: DtypeObj

Return the dtype object of the underlying data.

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.dtype
dtype('int64')

property dtypes: DtypeObj

Return the dtype object of the underlying data.

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.dtypes
dtype('int64')

duplicated(keep: DropKeep = 'first') → Series

Indicate duplicate Series values.

Duplicated values are indicated as True values in the resulting Series. Either all duplicates, all except the first or all except the last occurrence of duplicates can be indicated.

Parameters:

keep ({'first', 'last', False}, default 'first') –

Method to handle dropping duplicates:

’first’ : Mark duplicates as True except for the first occurrence.
’last’ : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.

Returns:

Series indicating whether each value has occurred in the preceding values.

Return type:

Series[bool]

See also

Index.duplicated: Equivalent method on pandas.Index.
DataFrame.duplicated: Equivalent method on pandas.DataFrame.
Series.drop_duplicates: Remove duplicate values from Series.

Examples

By default, for each set of duplicated values, the first occurrence is set on False and all others on True:

>>> animals = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama'])
>>> animals.duplicated()
0    False
1    False
2     True
3    False
4     True
dtype: bool

which is equivalent to

>>> animals.duplicated(keep='first')
  False
  False
   True
  False
   True
dtype: bool

By using ‘last’, the last occurrence of each set of duplicated values is set on False and all others on True:

>>> animals.duplicated(keep='last')
   True
  False
   True
  False
  False
dtype: bool

By setting keep on False, all duplicates are True:

>>> animals.duplicated(keep=False)
   True
  False
   True
  False
   True
dtype: bool

eq(other, level: Level | None = None, fill_value: float | None = None, axis: Axis = 0) → Series

Return Equal to of series and other, element-wise (binary operator eq).

Equivalent to series == other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.eq(b, fill_value=0)
a     True
b    False
c    False
d    False
e    False
dtype: bool

explode(ignore_index: bool = False) → Series

Transform each element of a list-like to a row.

Parameters:: ignore_index (bool, default False) – If True, the resulting index will be labeled 0, 1, …, n - 1.
Returns:: Exploded lists to rows; index will be duplicated for these rows.
Return type:: Series

See also

Series.str.split: Split string values on specified separator.
Series.unstack: Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame.
DataFrame.melt: Unpivot a DataFrame from wide format to long format.
DataFrame.explode: Explode a DataFrame from list-like columns to long format.

Notes

This routine will explode list-likes including lists, tuples, sets, Series, and np.ndarray. The result dtype of the subset rows will be object. Scalars will be returned unchanged, and empty list-likes will result in a np.nan for that row. In addition, the ordering of elements in the output will be non-deterministic when exploding sets.

Reference the user guide for more examples.

Examples

>>> s = pd.Series([[1, 2, 3], 'foo', [], [3, 4]])
>>> s
0    [1, 2, 3]
1          foo
2           []
3       [3, 4]
dtype: object

>>> s.explode()
    1
    2
    3
  foo
  NaN
    3
    4
dtype: object

floordiv(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Integer division of series and other, element-wise (binary operator floordiv).

Equivalent to series // other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rfloordiv: Reverse of the Integer division operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.floordiv(b, fill_value=0)
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64

ge(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Greater than or equal to of series and other, element-wise (binary operator ge).

Equivalent to series >= other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

Examples

>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.ge(b, fill_value=0)
a     True
b     True
c    False
d    False
e     True
f    False
dtype: bool

groupby(by=None, axis: Axis = 0, level: IndexLabel | None = None, as_index: bool = True, sort: bool = True, group_keys: bool = True, observed: bool | lib.NoDefault = <no_default>, dropna: bool = True) → SeriesGroupBy

Group Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:

by (mapping, function, label, pd.Grouper or list of such) –
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis ({0 or 'index', 1 or 'columns'}, default 0) –
Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.
level (int, level name, or sequence of such, default None) – If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.
as_index (bool, default True) –
Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).
sort (bool, default True) –
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.
group_keys (bool, default True) –
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.

Changed in version 2.0.0: group_keys now defaults to True.
observed (bool, default False) –
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.
dropna (bool, default True) – If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:

Returns a groupby object that contains information about the groups.

Return type:

pandas.api.typing.SeriesGroupBy

See also

resample: Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.

Examples

>>> ser = pd.Series([390., 350., 30., 20.],
...                 index=['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...                 name="Max Speed")
>>> ser
Falcon    390.0
Falcon    350.0
Parrot     30.0
Parrot     20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(["a", "b", "a", "b"]).mean()
a    210.0
b    185.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Falcon    370.0
Parrot     25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(ser > 100).mean()
Max Speed
False     25.0
True     370.0
Name: Max Speed, dtype: float64

Grouping by Indexes

We can groupby different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> ser = pd.Series([390., 350., 30., 20.], index=index, name="Max Speed")
>>> ser
Animal  Type
Falcon  Captive    390.0
        Wild       350.0
Parrot  Captive     30.0
        Wild        20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Animal
Falcon    370.0
Parrot     25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level="Type").mean()
Type
Captive    210.0
Wild       185.0
Name: Max Speed, dtype: float64

We can also choose to include NA in group keys or not by defining dropna parameter, the default setting is True.

>>> ser = pd.Series([1, 2, 3, 3], index=["a", 'a', 'b', np.nan])
>>> ser.groupby(level=0).sum()
a    3
b    3
dtype: int64

>>> ser.groupby(level=0, dropna=False).sum()
a    3
b    3
NaN  3
dtype: int64

>>> arrays = ['Falcon', 'Falcon', 'Parrot', 'Parrot']
>>> ser = pd.Series([390., 350., 30., 20.], index=arrays, name="Max Speed")
>>> ser.groupby(["a", "b", "a", np.nan]).mean()
a    210.0
b    350.0
Name: Max Speed, dtype: float64

>>> ser.groupby(["a", "b", "a", np.nan], dropna=False).mean()
a    210.0
b    350.0
NaN   20.0
Name: Max Speed, dtype: float64

gt(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Greater than of series and other, element-wise (binary operator gt).

Equivalent to series > other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

Examples

>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.gt(b, fill_value=0)
a     True
b    False
c    False
d    False
e     True
f    False
dtype: bool

property hasnans: bool

Return True if there are any NaNs.

Enables various performance speedups.

Return type:: bool

Examples

>>> s = pd.Series([1, 2, 3, None])
>>> s
0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64
>>> s.hasnans
True

Draw histogram of the input series using matplotlib.

Parameters:

by (object, optional) – If passed, then used to form histograms for separate groups.
ax (matplotlib axis object) – If not passed, uses gca().
grid (bool, default True) – Whether to show axis grid lines.
xlabelsize (int, default None) – If specified changes the x-axis label size.
xrot (float, default None) – Rotation of x axis labels.
ylabelsize (int, default None) – If specified changes the y-axis label size.
yrot (float, default None) – Rotation of y axis labels.
figsize (tuple, default None) – Figure size in inches by default.
bins (int or sequence, default 10) – Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin. In this case, bins is returned unmodified.
backend (str, default None) – Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.
legend (bool, default False) – Whether to show the legend.
**kwargs – To be passed to the actual plotting function.

Returns:

A histogram plot.

Return type:

matplotlib.AxesSubplot

See also

matplotlib.axes.Axes.hist: Plot a histogram using matplotlib.

Examples

For Series:

For Groupby:

idxmax(axis: Axis = 0, skipna: bool = True, *args, **kwargs) → Hashable

Return the row label of the maximum value.

If multiple values equal the maximum, the first row label with that value is returned.

Parameters:

axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
skipna (bool, default True) – Exclude NA/null values. If the entire Series is NA, the result will be NA.
*args – Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Label of the maximum value.

Return type:

Index

Raises:

ValueError – If the Series is empty.

See also

numpy.argmax: Return indices of the maximum values along the given axis.
DataFrame.idxmax: Return index of first occurrence of maximum over requested axis.
Series.idxmin: Return index label of the first occurrence of minimum of values.

Notes

This method is the Series version of ndarray.argmax. This method returns the label of the maximum, while ndarray.argmax returns the position. To get the position, use series.values.argmax().

Examples

>>> s = pd.Series(data=[1, None, 4, 3, 4],
...               index=['A', 'B', 'C', 'D', 'E'])
>>> s
A    1.0
B    NaN
C    4.0
D    3.0
E    4.0
dtype: float64

>>> s.idxmax()
'C'

If skipna is False and there is an NA value in the data, the function returns nan.

>>> s.idxmax(skipna=False)
nan

idxmin(axis: Axis = 0, skipna: bool = True, *args, **kwargs) → Hashable

Return the row label of the minimum value.

If multiple values equal the minimum, the first row label with that value is returned.

Parameters:

axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
skipna (bool, default True) – Exclude NA/null values. If the entire Series is NA, the result will be NA.
*args – Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Label of the minimum value.

Return type:

Index

Raises:

ValueError – If the Series is empty.

See also

numpy.argmin: Return indices of the minimum values along the given axis.
DataFrame.idxmin: Return index of first occurrence of minimum over requested axis.
Series.idxmax: Return index label of the first occurrence of maximum of values.

Notes

This method is the Series version of ndarray.argmin. This method returns the label of the minimum, while ndarray.argmin returns the position. To get the position, use series.values.argmin().

Examples

>>> s = pd.Series(data=[1, None, 4, 1],
...               index=['A', 'B', 'C', 'D'])
>>> s
A    1.0
B    NaN
C    4.0
D    1.0
dtype: float64

>>> s.idxmin()
'A'

If skipna is False and there is an NA value in the data, the function returns nan.

>>> s.idxmin(skipna=False)
nan

index

The index (axis labels) of the Series.

The index of a Series is used to label and identify each element of the underlying data. The index can be thought of as an immutable ordered set (technically a multi-set, as it may contain duplicate labels), and is used to index and align data in pandas.

Returns:: The index labels of the Series.
Return type:: Index

See also

Series.reindex: Conform Series to new index.
Index: The base pandas index type.

Notes

For more information on pandas indexing, see the indexing user guide.

Examples

To create a Series with a custom index and view the index labels:

>>> cities = ['Kolkata', 'Chicago', 'Toronto', 'Lisbon']
>>> populations = [14.85, 2.71, 2.93, 0.51]
>>> city_series = pd.Series(populations, index=cities)
>>> city_series.index
Index(['Kolkata', 'Chicago', 'Toronto', 'Lisbon'], dtype='object')

To change the index labels of an existing Series:

>>> city_series.index = ['KOL', 'CHI', 'TOR', 'LIS']
>>> city_series.index
Index(['KOL', 'CHI', 'TOR', 'LIS'], dtype='object')

Print a concise summary of a Series.

This method prints information about a Series including the index dtype, non-null values and memory usage.

Added in version 1.4.0.

Parameters:

verbose (bool, optional) – Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.
buf (writable buffer, defaults to sys.stdout) – Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.
memory_usage (bool, str, optional) –
Specifies whether total memory usage of the Series elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting.

True always show memory usage. False never shows memory usage. A value of ‘deep’ is equivalent to “True with deep introspection”. Memory usage is shown in human-readable units (base-2 representation). Without deep introspection a memory estimation is made based in column dtype and number of rows assuming values consume the same memory amount for corresponding dtypes. With deep memory introspection, a real memory usage calculation is performed at the cost of computational resources. See the Frequently Asked Questions for more details.
show_counts (bool, optional) – Whether to show the non-null counts. By default, this is shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.

Returns:

This method prints a summary of a Series and returns None.

Return type:

None

See also

Series.describe: Generate descriptive statistics of Series.
Series.memory_usage: Memory usage of Series.

Examples

>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> s = pd.Series(text_values, index=int_values)
>>> s.info()
<class 'pandas.core.series.Series'>
Index: 5 entries, 1 to 5
Series name: None
Non-Null Count  Dtype
--------------  -----
5 non-null      object
dtypes: object(1)
memory usage: 80.0+ bytes

Prints a summary excluding information about its values:

>>> s.info(verbose=False)
<class 'pandas.core.series.Series'>
Index: 5 entries, 1 to 5
dtypes: object(1)
memory usage: 80.0+ bytes

Pipe output of Series.info to buffer instead of sys.stdout, get buffer content and writes to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> s.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
...           encoding="utf-8") as f:  
...     f.write(s)
260

The memory_usage parameter allows deep introspection mode, specially useful for big Series and fine-tune memory optimization:

>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> s = pd.Series(np.random.choice(['a', 'b', 'c'], 10 ** 6))
>>> s.info()
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count    Dtype
--------------    -----
1000000 non-null  object
dtypes: object(1)
memory usage: 7.6+ MB

>>> s.info(memory_usage='deep')
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count    Dtype
--------------    -----
1000000 non-null  object
dtypes: object(1)
memory usage: 55.3 MB

isin(values) → Series

Whether elements in Series are contained in values.

Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

Parameters:

values (set or list-like) – The sequence of values to test. Passing in a single string will raise a TypeError. Instead, turn a single string into a list of one element.

Returns:

Series of booleans indicating if each element is in values.

Return type:

Raises:

TypeError –

If values is a string

See also

DataFrame.isin: Equivalent method on DataFrame.

Examples

>>> s = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama',
...                'hippo'], name='animal')
>>> s.isin(['cow', 'llama'])
0     True
1     True
2     True
3    False
4     True
5    False
Name: animal, dtype: bool

To invert the boolean values, use the ~ operator:

>>> ~s.isin(['cow', 'llama'])
  False
  False
  False
   True
  False
   True
Name: animal, dtype: bool

Passing a single string as s.isin('llama') will raise an error. Use a list of one element instead:

>>> s.isin(['llama'])
   True
  False
   True
  False
   True
  False
Name: animal, dtype: bool

Strings and integers are distinct and are therefore not comparable:

>>> pd.Series([1]).isin(['1'])
0    False
dtype: bool
>>> pd.Series([1.1]).isin(['1.1'])
0    False
dtype: bool

isna() → Series

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:: Mask of bool values for each element in Series that indicates whether an element is an NA value.
Return type:: Series

See also

Series.isnull: Alias of isna.
Series.notna: Boolean inverse of isna.
Series.dropna: Omit axes labels with missing values.
isna: Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.isna()
0    False
1    False
2     True
dtype: bool

isnull() → Series

Series.isnull is an alias for Series.isna.

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns:: Mask of bool values for each element in Series that indicates whether an element is an NA value.
Return type:: Series

See also

Series.isnull: Alias of isna.
Series.notna: Boolean inverse of isna.
Series.dropna: Omit axes labels with missing values.
isna: Top-level isna.

Examples

Show which entries in a DataFrame are NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.isna()
0    False
1    False
2     True
dtype: bool

items() → Iterable[tuple[Hashable, Any]]

Lazily iterate over (index, value) tuples.

This method returns an iterable tuple (index, value). This is convenient if you want to create a lazy iterator.

Returns:: Iterable of tuples containing the (index, value) pairs from a Series.
Return type:: iterable

See also

DataFrame.items: Iterate over (column name, Series) pairs.
DataFrame.iterrows: Iterate over DataFrame rows as (index, Series) pairs.

Examples

>>> s = pd.Series(['A', 'B', 'C'])
>>> for index, value in s.items():
...     print(f"Index : {index}, Value : {value}")
Index : 0, Value : A
Index : 1, Value : B
Index : 2, Value : C

keys() → Index

Return alias for index.

Returns:: Index of the Series.
Return type:: Index

Examples

>>> s = pd.Series([1, 2, 3], index=[0, 1, 2])
>>> s.keys()
Index([0, 1, 2], dtype='int64')

kurt(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64

Return type:

scalar or scalar

kurtosis(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
>>> s
cat    1
dog    2
dog    2
mouse  3
dtype: int64
>>> s.kurt()
1.5

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
...                   index=['cat', 'dog', 'dog', 'mouse'])
>>> df
       a   b
  cat  1   3
  dog  2   4
  dog  2   4
mouse  3   4
>>> df.kurt()
a   1.5
b   4.0
dtype: float64

With axis=None

>>> df.kurt(axis=None).round(6)
-0.988693

Using axis=1

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
...                   index=['cat', 'dog'])
>>> df.kurt(axis=1)
cat   -6.0
dog   -6.0
dtype: float64

Return type:

scalar or scalar

le(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Less than or equal to of series and other, element-wise (binary operator le).

Equivalent to series <= other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

Examples

>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.le(b, fill_value=0)
a    False
b     True
c     True
d    False
e    False
f     True
dtype: bool

list: alias of ListAccessor

lt(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Less than of series and other, element-wise (binary operator lt).

Equivalent to series < other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

Examples

>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
e    1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a    0.0
b    1.0
c    2.0
d    NaN
f    1.0
dtype: float64
>>> a.lt(b, fill_value=0)
a    False
b    False
c     True
d    False
e    False
f     True
dtype: bool

map(arg: Callable | Mapping | Series, na_action: Literal['ignore'] | None = None) → Series

Map values of Series according to an input mapping or function.

Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.

Parameters:

arg (function, collections.abc.Mapping subclass or Series) – Mapping correspondence.
na_action ({None, 'ignore'}, default None) – If ‘ignore’, propagate NaN values, without passing them to the mapping correspondence.

Returns:

Same index as caller.

Return type:

See also

Series.apply: For applying more complex functions on a Series.
Series.replace: Replace values given in to_replace with value.
DataFrame.apply: Apply a function row-/column-wise.
DataFrame.map: Apply a function elementwise on a whole DataFrame.

Notes

When arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to NaN. However, if the dictionary is a dict subclass that defines __missing__ (i.e. provides a method for default values), then this default is used rather than NaN.

Examples

>>> s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
>>> s
0      cat
1      dog
2      NaN
3   rabbit
dtype: object

map accepts a dict or a Series. Values that are not found in the dict are converted to NaN, unless the dict has a default value (e.g. defaultdict):

>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0   kitten
1    puppy
2      NaN
3      NaN
dtype: object

It also accepts a function:

>>> s.map('I am a {}'.format)
0       I am a cat
1       I am a dog
2       I am a nan
3    I am a rabbit
dtype: object

To avoid applying the function to missing values (and keep them as NaN) na_action='ignore' can be used:

>>> s.map('I am a {}'.format, na_action='ignore')
0     I am a cat
1     I am a dog
2            NaN
3  I am a rabbit
dtype: object

max(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

scalar or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.max()
8

mean(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the mean of the values over the requested axis.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.mean()
2.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.mean()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.mean(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.mean(numeric_only=True)
a   1.5
dtype: float64

Return type:

scalar or scalar

median(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the median of the values over the requested axis.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.median()
2.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.median()
a   1.5
b   2.5
dtype: float64

Using axis=1

>>> df.median(axis=1)
tiger   1.5
zebra   2.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.median(numeric_only=True)
a   1.5
dtype: float64

Return type:

scalar or scalar

memory_usage(index: bool = True, deep: bool = False) → int

Return the memory usage of the Series.

The memory usage can optionally include the contribution of the index and of elements of object dtype.

Parameters:

index (bool, default True) – Specifies whether to include the memory usage of the Series index.
deep (bool, default False) – If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned value.

Returns:

Bytes of memory consumed.

Return type:

int

See also

numpy.ndarray.nbytes: Total bytes consumed by the elements of the array.
DataFrame.memory_usage: Bytes consumed by a DataFrame.

Examples

>>> s = pd.Series(range(3))
>>> s.memory_usage()
152

Not including the index gives the size of the rest of the data, which is necessarily smaller:

>>> s.memory_usage(index=False)
24

The memory footprint of object values is ignored by default:

>>> s = pd.Series(["a", "b"])
>>> s.values
array(['a', 'b'], dtype=object)
>>> s.memory_usage()
144
>>> s.memory_usage(deep=True)
244

min(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

scalar or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.min()
0

mod(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Modulo of series and other, element-wise (binary operator mod).

Equivalent to series % other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rmod: Reverse of the Modulo operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.mod(b, fill_value=0)
a    0.0
b    NaN
c    NaN
d    0.0
e    NaN
dtype: float64

mode(dropna: bool = True) → Series

Return the mode(s) of the Series.

The mode is the value that appears most often. There can be multiple modes.

Always returns Series even if only one value is returned.

Parameters:: dropna (bool, default True) – Don’t consider counts of NaN/NaT.
Returns:: Modes of the Series in sorted order.
Return type:: Series

Examples

>>> s = pd.Series([2, 4, 2, 2, 4, None])
>>> s.mode()
0    2.0
dtype: float64

More than one mode:

>>> s = pd.Series([2, 4, 8, 2, 4, None])
>>> s.mode()
0    2.0
1    4.0
dtype: float64

With and without considering null value:

>>> s = pd.Series([2, 4, None, None, 4, None])
>>> s.mode(dropna=False)
0   NaN
dtype: float64
>>> s = pd.Series([2, 4, None, None, 4, None])
>>> s.mode()
0    4.0
dtype: float64

mul(other, level: Level | None = None, fill_value: float | None = None, axis: Axis = 0) → Series

Return Multiplication of series and other, element-wise (binary operator mul).

Equivalent to series * other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rmul: Reverse of the Multiplication operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.multiply(b, fill_value=0)
a    1.0
b    0.0
c    0.0
d    0.0
e    NaN
dtype: float64

multiply(other, level: Level | None = None, fill_value: float | None = None, axis: Axis = 0) → Series

Return Multiplication of series and other, element-wise (binary operator mul).

Equivalent to series * other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rmul: Reverse of the Multiplication operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.multiply(b, fill_value=0)
a    1.0
b    0.0
c    0.0
d    0.0
e    NaN
dtype: float64

property name: Hashable

Return the name of the Series.

The name of a Series becomes its index or column name if it is used to form a DataFrame. It is also used whenever displaying the Series using the interpreter.

Returns:: The name of the Series, also the column name if part of a DataFrame.
Return type:: label (hashable object)

See also

Series.rename: Sets the Series name when given a scalar input.
Index.name: Corresponding Index property.

Examples

The Series name can be set initially when calling the constructor.

>>> s = pd.Series([1, 2, 3], dtype=np.int64, name='Numbers')
>>> s
0    1
1    2
2    3
Name: Numbers, dtype: int64
>>> s.name = "Integers"
>>> s
0    1
1    2
2    3
Name: Integers, dtype: int64

The name of a Series within a DataFrame is its column name.

>>> df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
...                   columns=["Odd Numbers", "Even Numbers"])
>>> df
   Odd Numbers  Even Numbers
0            1             2
1            3             4
2            5             6
>>> df["Even Numbers"].name
'Even Numbers'

ne(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Not equal to of series and other, element-wise (binary operator ne).

Equivalent to series != other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.ne(b, fill_value=0)
a    False
b     True
c     True
d     True
e     True
dtype: bool

nlargest(n: int = 5, keep: Literal['first', 'last', 'all'] = 'first') → Series

Return the largest n elements.

Parameters:

n (int, default 5) – Return this many descending sorted values.
keep ({'first', 'last', 'all'}, default 'first') –
When there are duplicate values that cannot all fit in a Series of n elements:
- first : return the first n occurrences in order of appearance.
- last : return the last n occurrences in reverse order of appearance.
- all : keep all occurrences. This can result in a Series of size larger than n.

Returns:

The n largest values in the Series, sorted in decreasing order.

Return type:

See also

Series.nsmallest: Get the n smallest elements.
Series.sort_values: Sort Series by values.
Series.head: Return the first n rows.

Notes

Faster than .sort_values(ascending=False).head(n) for small n relative to the size of the Series object.

Examples

>>> countries_population = {"Italy": 59000000, "France": 65000000,
...                         "Malta": 434000, "Maldives": 434000,
...                         "Brunei": 434000, "Iceland": 337000,
...                         "Nauru": 11300, "Tuvalu": 11300,
...                         "Anguilla": 11300, "Montserrat": 5200}
>>> s = pd.Series(countries_population)
>>> s
Italy       59000000
France      65000000
Malta         434000
Maldives      434000
Brunei        434000
Iceland       337000
Nauru          11300
Tuvalu         11300
Anguilla       11300
Montserrat      5200
dtype: int64

The n largest elements where n=5 by default.

>>> s.nlargest()
France      65000000
Italy       59000000
Malta         434000
Maldives      434000
Brunei        434000
dtype: int64

The n largest elements where n=3. Default keep value is ‘first’ so Malta will be kept.

>>> s.nlargest(3)
France    65000000
Italy     59000000
Malta       434000
dtype: int64

The n largest elements where n=3 and keeping the last duplicates. Brunei will be kept since it is the last with value 434000 based on the index order.

>>> s.nlargest(3, keep='last')
France      65000000
Italy       59000000
Brunei        434000
dtype: int64

The n largest elements where n=3 with all duplicates kept. Note that the returned Series has five elements due to the three duplicates.

>>> s.nlargest(3, keep='all')
France      65000000
Italy       59000000
Malta         434000
Maldives      434000
Brunei        434000
dtype: int64

notna() → Series

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns:: Mask of bool values for each element in Series that indicates whether an element is not an NA value.
Return type:: Series

See also

Series.notnull: Alias of notna.
Series.isna: Boolean inverse of notna.
Series.dropna: Omit axes labels with missing values.
notna: Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.notna()
0     True
1     True
2    False
dtype: bool

notnull() → Series

Series.notnull is an alias for Series.notna.

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True). NA values, such as None or numpy.NaN, get mapped to False values.

Returns:: Mask of bool values for each element in Series that indicates whether an element is not an NA value.
Return type:: Series

See also

Series.notnull: Alias of notna.
Series.isna: Boolean inverse of notna.
Series.dropna: Omit axes labels with missing values.
notna: Top-level notna.

Examples

Show which entries in a DataFrame are not NA.

>>> df = pd.DataFrame(dict(age=[5, 6, np.nan],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = pd.Series([5, 6, np.nan])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64

>>> ser.notna()
0     True
1     True
2    False
dtype: bool

nsmallest(n: int = 5, keep: Literal['first', 'last', 'all'] = 'first') → Series

Return the smallest n elements.

Parameters:

n (int, default 5) – Return this many ascending sorted values.
keep ({'first', 'last', 'all'}, default 'first') –
When there are duplicate values that cannot all fit in a Series of n elements:
- first : return the first n occurrences in order of appearance.
- last : return the last n occurrences in reverse order of appearance.
- all : keep all occurrences. This can result in a Series of size larger than n.

Returns:

The n smallest values in the Series, sorted in increasing order.

Return type:

See also

Series.nlargest: Get the n largest elements.
Series.sort_values: Sort Series by values.
Series.head: Return the first n rows.

Notes

Faster than .sort_values().head(n) for small n relative to the size of the Series object.

Examples

>>> countries_population = {"Italy": 59000000, "France": 65000000,
...                         "Brunei": 434000, "Malta": 434000,
...                         "Maldives": 434000, "Iceland": 337000,
...                         "Nauru": 11300, "Tuvalu": 11300,
...                         "Anguilla": 11300, "Montserrat": 5200}
>>> s = pd.Series(countries_population)
>>> s
Italy       59000000
France      65000000
Brunei        434000
Malta         434000
Maldives      434000
Iceland       337000
Nauru          11300
Tuvalu         11300
Anguilla       11300
Montserrat      5200
dtype: int64

The n smallest elements where n=5 by default.

>>> s.nsmallest()
Montserrat    5200
Nauru        11300
Tuvalu       11300
Anguilla     11300
Iceland     337000
dtype: int64

The n smallest elements where n=3. Default keep value is ‘first’ so Nauru and Tuvalu will be kept.

>>> s.nsmallest(3)
Montserrat   5200
Nauru       11300
Tuvalu      11300
dtype: int64

The n smallest elements where n=3 and keeping the last duplicates. Anguilla and Tuvalu will be kept since they are the last with value 11300 based on the index order.

>>> s.nsmallest(3, keep='last')
Montserrat   5200
Anguilla    11300
Tuvalu      11300
dtype: int64

The n smallest elements where n=3 with all duplicates kept. Note that the returned Series has four elements due to the three duplicates.

>>> s.nsmallest(3, keep='all')
Montserrat   5200
Nauru       11300
Tuvalu      11300
Anguilla    11300
dtype: int64

plot: alias of PlotAccessor

pop(item: Hashable) → Any

Return item and drops from series. Raise KeyError if not found.

Parameters:: item (label) – Index of the element that needs to be removed.
Return type:: Value that is popped from series.

Examples

>>> ser = pd.Series([1, 2, 3])

>>> ser.pop(0)
1

>>> ser
1    2
2    3
dtype: int64

pow(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Exponential power of series and other, element-wise (binary operator pow).

Equivalent to series ** other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rpow: Reverse of the Exponential power operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.pow(b, fill_value=0)
a    1.0
b    1.0
c    1.0
d    0.0
e    NaN
dtype: float64

prod(axis: Axis | None = None, skipna: bool = True, numeric_only: bool = False, min_count: int = 0, **kwargs)

Return the product of the values over the requested axis.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

scalar or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([], dtype="float64").prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([], dtype="float64").prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0

>>> pd.Series([np.nan]).prod(min_count=1)
nan

product(axis: Axis | None = None, skipna: bool = True, numeric_only: bool = False, min_count: int = 0, **kwargs)

Return the product of the values over the requested axis.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

scalar or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

By default, the product of an empty or all-NA Series is 1

>>> pd.Series([], dtype="float64").prod()
1.0

This can be controlled with the min_count parameter

>>> pd.Series([], dtype="float64").prod(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).prod()
1.0

>>> pd.Series([np.nan]).prod(min_count=1)
nan

quantile(q: float = 0.5, interpolation: QuantileInterpolation = 'linear') → float

quantile(q: Sequence[float] | AnyArrayLike, interpolation: QuantileInterpolation = 'linear') → Series

quantile(q: float | Sequence[float] | AnyArrayLike = 0.5, interpolation: QuantileInterpolation = 'linear') → float | Series

Return value at the given quantile.

Parameters:

q (float or array-like, default 0.5 (50% quantile)) – The quantile(s) to compute, which can lie in range: 0 <= q <= 1.
interpolation ({'linear', 'lower', 'higher', 'midpoint', 'nearest'}) –
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:
- linear: i + (j - i) * (x-i)/(j-i), where (x-i)/(j-i) is the fractional part of the index surrounded by i > j.
- lower: i.
- higher: j.
- nearest: i or j whichever is nearest.
- midpoint: (i + j) / 2.

Returns:

If q is an array, a Series will be returned where the index is q and the values are the quantiles, otherwise a float will be returned.

Return type:

float or Series

See also

core.window.Rolling.quantile: Calculate the rolling quantile.
numpy.percentile: Returns the q-th percentile(s) of the array elements.

Examples

>>> s = pd.Series([1, 2, 3, 4])
>>> s.quantile(.5)
2.5
>>> s.quantile([.25, .5, .75])
0.25    1.75
0.50    2.50
0.75    3.25
dtype: float64

radd(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Addition of series and other, element-wise (binary operator radd).

Equivalent to other + series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.add: Element-wise Addition, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.add(b, fill_value=0)
a    2.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64

ravel(order: str = 'C') → ArrayLike

Return the flattened underlying data as an ndarray or ExtensionArray.

Deprecated since version 2.2.0: Series.ravel is deprecated. The underlying array is already 1D, so ravel is not necessary. Use to_numpy() for conversion to a numpy array instead.

Returns:: Flattened data of the Series.
Return type:: numpy.ndarray or ExtensionArray

See also

numpy.ndarray.ravel: Return a flattened array.

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.ravel()  
array([1, 2, 3])

rdiv(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Floating division of series and other, element-wise (binary operator rtruediv).

Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.truediv: Element-wise Floating division, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.divide(b, fill_value=0)
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64

rdivmod(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Integer division and modulo of series and other, element-wise (binary operator rdivmod).

Equivalent to other divmod series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

2-Tuple of Series

See also

Series.divmod: Element-wise Integer division and modulo, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.divmod(b, fill_value=0)
(a    1.0
 b    inf
 c    inf
 d    0.0
 e    NaN
 dtype: float64,
 a    0.0
 b    NaN
 c    NaN
 d    0.0
 e    NaN
 dtype: float64)

Conform Series to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:

index (array-like, optional) – New labels for the index. Preferably an Index object to avoid duplicating data.
axis (int or str, optional) – Unused.
method ({None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}) –
Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.
- None (default): don’t fill gaps
- pad / ffill: Propagate last valid observation forward to next valid.
- backfill / bfill: Use next valid observation to fill gap.
- nearest: Use nearest valid observations to fill gap.
copy (bool, default True) –
Return a new object, even if the passed indexes are the same.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (scalar, default np.nan) – Value to use for missing values. Defaults to NaN, but can be any “compatible” value.
limit (int, default None) – Maximum number of consecutive elements to forward or backward fill.
tolerance (optional) –
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations most satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Return type:

Series with changed index.

See also

DataFrame.set_index: Set row labels.
DataFrame.reset_index: Remove row labels or move them to new columns.
DataFrame.reindex_like: Change to same indices as other DataFrame.

Examples

DataFrame.reindex supports two calling conventions

(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Create a dataframe with some fictional data.

>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                   index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00

Create a new index and reindex the dataframe. By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari               404.0           0.07
Iceweasel              NaN            NaN
Comodo Dragon          NaN            NaN
IE10                 404.0           0.08
Chrome               200.0           0.02

We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.

>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02

>>> df.reindex(new_index, fill_value='missing')
              http_status response_time
Safari                404          0.07
Iceweasel         missing       missing
Comodo Dragon     missing       missing
IE10                  404          0.08
Chrome                200          0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

Or we can use “axis-style” keyword arguments

>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200         NaN
Chrome             200         NaN
Safari             404         NaN
IE10               404         NaN
Konqueror          301         NaN

To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).

>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...                    index=date_index)
>>> df2
            prices
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0

Suppose we decide to expand the dataframe to cover a wider date range.

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
            prices
2009-12-29     NaN
2009-12-30     NaN
2009-12-31     NaN
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

The index entries that did not have a value in the original data frame (for example, ‘2009-12-29’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to back-propagate the last valid value to fill the NaN values, pass bfill as an argument to the method keyword.

>>> df2.reindex(date_index2, method='bfill')
            prices
2009-12-29   100.0
2009-12-30   100.0
2009-12-31   100.0
2010-01-01   100.0
2010-01-02   101.0
2010-01-03     NaN
2010-01-04   100.0
2010-01-05    89.0
2010-01-06    88.0
2010-01-07     NaN

Please note that the NaN value present in the original dataframe (at index value 2010-01-03) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.

See the user guide for more.

rename(index: Renamer | Hashable | None = None, *, axis: Axis | None = None, copy: bool = None, inplace: Literal[True], level: Level | None = None, errors: IgnoreRaise = 'ignore') → None

rename(index: Renamer | Hashable | None = None, *, axis: Axis | None = None, copy: bool = None, inplace: Literal[False] = False, level: Level | None = None, errors: IgnoreRaise = 'ignore') → Series

Alter Series index labels or name.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

Alternatively, change Series.name with a scalar value.

See the user guide for more.

Parameters:

index (scalar, hashable sequence, dict-like or function optional) – Functions or dict-like are transformations to apply to the index. Scalar or hashable sequence-like will alter the Series.name attribute.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
copy (bool, default True) –
Also copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True
inplace (bool, default False) – Whether to return a new Series. If True the value of copy is ignored.
level (int or level name, default None) – In case of MultiIndex, only rename labels in the specified level.
errors ({'ignore', 'raise'}, default 'ignore') – If ‘raise’, raise KeyError when a dict-like mapper or index contains labels that are not present in the index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.

Returns:

Series with index labels or name altered or None if inplace=True.

Return type:

Series or None

See also

DataFrame.rename: Corresponding DataFrame method.
Series.rename_axis: Set the name of the axis.

Examples

>>> s = pd.Series([1, 2, 3])
>>> s
0    1
1    2
2    3
dtype: int64
>>> s.rename("my_name")  # scalar, changes Series.name
0    1
1    2
2    3
Name: my_name, dtype: int64
>>> s.rename(lambda x: x ** 2)  # function, changes labels
0    1
1    2
4    3
dtype: int64
>>> s.rename({1: 3, 2: 5})  # mapping, changes labels
0    1
3    2
5    3
dtype: int64

rename_axis(mapper: IndexLabel | lib.NoDefault = <no_default>, *, index=<no_default>, axis: Axis = 0, copy: bool = True, inplace: Literal[True]) → None

rename_axis(mapper: IndexLabel | lib.NoDefault = <no_default>, *, index=<no_default>, axis: Axis = 0, copy: bool = True, inplace: Literal[False] = False) → Self

rename_axis(mapper: IndexLabel | lib.NoDefault = <no_default>, *, index=<no_default>, axis: Axis = 0, copy: bool = True, inplace: bool = False) → Self | None

Set the name of the axis for the index or columns.

Parameters:

mapper (scalar, list-like, optional) – Value to set the axis name attribute.
index (scalar, list-like, dict-like or function, optional) –
A scalar, list-like, dict-like or functions transformations to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series. This parameter only apply for DataFrame type objects.

Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.
columns (scalar, list-like, dict-like or function, optional) –
A scalar, list-like, dict-like or functions transformations to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series. This parameter only apply for DataFrame type objects.

Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.
axis ({0 or 'index', 1 or 'columns'}, default 0) – The axis to rename. For Series this parameter is unused and defaults to 0.
copy (bool, default None) –
Also copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True
inplace (bool, default False) – Modifies the object directly, instead of creating a new Series or DataFrame.

Returns:

The same type as the caller or None if inplace=True.

Return type:

Series, DataFrame, or None

See also

Series.rename: Alter Series index labels or name.
DataFrame.rename: Alter DataFrame index labels or name.
Index.rename: Set new names on index.

Notes

DataFrame.rename_axis supports two calling conventions

(index=index_mapper, columns=columns_mapper, ...)
(mapper, axis={'index', 'columns'}, ...)

The first calling convention will only modify the names of the index and/or the names of the Index object that is the columns. In this case, the parameter copy is ignored.

The second calling convention will modify the names of the corresponding index if mapper is a list or a scalar. However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.

We highly recommend using keyword arguments to clarify your intent.

Examples

Series

>>> s = pd.Series(["dog", "cat", "monkey"])
>>> s
0       dog
1       cat
2    monkey
dtype: object
>>> s.rename_axis("animal")
animal
0    dog
1    cat
2    monkey
dtype: object

DataFrame

>>> df = pd.DataFrame({"num_legs": [4, 4, 2],
...                    "num_arms": [0, 0, 2]},
...                   ["dog", "cat", "monkey"])
>>> df
        num_legs  num_arms
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("animal")
>>> df
        num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df
limbs   num_legs  num_arms
animal
dog            4         0
cat            4         0
monkey         2         2

MultiIndex

>>> df.index = pd.MultiIndex.from_product([['mammal'],
...                                        ['dog', 'cat', 'monkey']],
...                                       names=['type', 'name'])
>>> df
limbs          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2

>>> df.rename_axis(index={'type': 'class'})
limbs          num_legs  num_arms
class  name
mammal dog            4         0
       cat            4         0
       monkey         2         2

>>> df.rename_axis(columns=str.upper)
LIMBS          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2

reorder_levels(order: Sequence[Level]) → Series

Rearrange index levels using input order.

May not drop or duplicate levels.

Parameters:: order (list of int representing new level order) – Reference level by number or key.
Return type:: type of caller (new object)

Examples

>>> arrays = [np.array(["dog", "dog", "cat", "cat", "bird", "bird"]),
...           np.array(["white", "black", "white", "black", "white", "black"])]
>>> s = pd.Series([1, 2, 3, 3, 5, 2], index=arrays)
>>> s
dog   white    1
      black    2
cat   white    3
      black    3
bird  white    5
      black    2
dtype: int64
>>> s.reorder_levels([1, 0])
white  dog     1
black  dog     2
white  cat     3
black  cat     3
white  bird    5
black  bird    2
dtype: int64

repeat(repeats: int | Sequence[int], axis: None = None) → Series

Repeat elements of a Series.

Returns a new Series where each element of the current Series is repeated consecutively a given number of times.

Parameters:

repeats (int or array of ints) – The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty Series.
axis (None) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

Newly created Series with repeated elements.

Return type:

See also

Index.repeat: Equivalent function for Index.
numpy.repeat: Similar method for numpy.ndarray.

Examples

>>> s = pd.Series(['a', 'b', 'c'])
>>> s
0    a
1    b
2    c
dtype: object
>>> s.repeat(2)
0    a
0    a
1    b
1    b
2    c
2    c
dtype: object
>>> s.repeat([1, 2, 3])
0    a
1    b
1    b
2    c
2    c
2    c
dtype: object

reset_index(level: IndexLabel = None, *, drop: Literal[False] = False, name: Level = <no_default>, inplace: Literal[False] = False, allow_duplicates: bool = False) → DataFrame

reset_index(level: IndexLabel = None, *, drop: Literal[True], name: Level = <no_default>, inplace: Literal[False] = False, allow_duplicates: bool = False) → Series

reset_index(level: IndexLabel = None, *, drop: bool = False, name: Level = <no_default>, inplace: Literal[True], allow_duplicates: bool = False) → None

Generate a new DataFrame or Series with the index reset.

This is useful when the index needs to be treated as a column, or when the index is meaningless and needs to be reset to the default before another operation.

Parameters:

level (int, str, tuple, or list, default optional) – For a Series with a MultiIndex, only remove the specified levels from the index. Removes all levels by default.
drop (bool, default False) – Just reset the index, without inserting it as a column in the new DataFrame.
name (object, optional) – The name to use for the column containing the original Series values. Uses self.name by default. This argument is ignored when drop is True.
inplace (bool, default False) – Modify the Series in place (do not create a new object).
allow_duplicates (bool, default False) –
Allow duplicate column labels to be created.

Added in version 1.5.0.

Returns:

When drop is False (the default), a DataFrame is returned. The newly created columns will come first in the DataFrame, followed by the original Series values. When drop is True, a Series is returned. In either case, if inplace=True, no value is returned.

Return type:

Series or DataFrame or None

See also

DataFrame.reset_index: Analogous function for DataFrame.

Examples

>>> s = pd.Series([1, 2, 3, 4], name='foo',
...               index=pd.Index(['a', 'b', 'c', 'd'], name='idx'))

Generate a DataFrame with default index.

>>> s.reset_index()
  idx  foo
0   a    1
1   b    2
2   c    3
3   d    4

To specify the name of the new column use name.

>>> s.reset_index(name='values')
  idx  values
0   a       1
1   b       2
2   c       3
3   d       4

To generate a new Series with the default set drop to True.

>>> s.reset_index(drop=True)
0    1
1    2
2    3
3    4
Name: foo, dtype: int64

The level parameter is interesting for Series with a multi-level index.

>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
...           np.array(['one', 'two', 'one', 'two'])]
>>> s2 = pd.Series(
...     range(4), name='foo',
...     index=pd.MultiIndex.from_arrays(arrays,
...                                     names=['a', 'b']))

To remove a specific level from the Index, use level.

>>> s2.reset_index(level='a')
       a  foo
b
one  bar    0
two  bar    1
one  baz    2
two  baz    3

If level is not set, all levels are removed from the Index.

>>> s2.reset_index()
     a    b  foo
0  bar  one    0
1  bar  two    1
2  baz  one    2
3  baz  two    3

rfloordiv(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Integer division of series and other, element-wise (binary operator rfloordiv).

Equivalent to other // series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.floordiv: Element-wise Integer division, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.floordiv(b, fill_value=0)
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64

rmod(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Modulo of series and other, element-wise (binary operator rmod).

Equivalent to other % series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.mod: Element-wise Modulo, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.mod(b, fill_value=0)
a    0.0
b    NaN
c    NaN
d    0.0
e    NaN
dtype: float64

rmul(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Multiplication of series and other, element-wise (binary operator rmul).

Equivalent to other * series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.mul: Element-wise Multiplication, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.multiply(b, fill_value=0)
a    1.0
b    0.0
c    0.0
d    0.0
e    NaN
dtype: float64

round(decimals: int = 0, *args, **kwargs) → Series

Round each value in a Series to the given number of decimals.

Parameters:

decimals (int, default 0) – Number of decimal places to round to. If decimals is negative, it specifies the number of positions to the left of the decimal point.
*args – Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.
**kwargs – Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

Rounded values of the Series.

Return type:

See also

numpy.around: Round values of an np.array.
DataFrame.round: Round values of a DataFrame.

Examples

>>> s = pd.Series([0.1, 1.3, 2.7])
>>> s.round()
0    0.0
1    1.0
2    3.0
dtype: float64

rpow(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Exponential power of series and other, element-wise (binary operator rpow).

Equivalent to other ** series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.pow: Element-wise Exponential power, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.pow(b, fill_value=0)
a    1.0
b    1.0
c    1.0
d    0.0
e    NaN
dtype: float64

rsub(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Subtraction of series and other, element-wise (binary operator rsub).

Equivalent to other - series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.sub: Element-wise Subtraction, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.subtract(b, fill_value=0)
a    0.0
b    1.0
c    1.0
d   -1.0
e    NaN
dtype: float64

rtruediv(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Floating division of series and other, element-wise (binary operator rtruediv).

Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.truediv: Element-wise Floating division, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.divide(b, fill_value=0)
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64

searchsorted(value: NumpyValueArrayLike | ExtensionArray, side: Literal['left', 'right'] = 'left', sorter: NumpySorter | None = None) → npt.NDArray[np.intp] | np.intp

Find indices where elements should be inserted to maintain order.

Find the indices into a sorted Series self such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.

Note

The Series must be monotonically sorted, otherwise wrong locations will likely be returned. Pandas does not check this for you.

Parameters:

value (array-like or scalar) – Values to insert into self.
side ({'left', 'right'}, optional) – If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).
sorter (1-D array-like, optional) – Optional array of integer indices that sort self into ascending order. They are typically the result of np.argsort.

Returns:

A scalar or array of insertion points with the same shape as value.

Return type:

int or array of int

See also

sort_values: Sort by the values along either axis.
numpy.searchsorted: Similar method from NumPy.

Notes

Binary search is used to find the required insertion points.

Examples

>>> ser = pd.Series([1, 2, 3])
>>> ser
0    1
1    2
2    3
dtype: int64

>>> ser.searchsorted(4)
3

>>> ser.searchsorted([0, 4])
array([0, 3])

>>> ser.searchsorted([1, 3], side='left')
array([0, 2])

>>> ser.searchsorted([1, 3], side='right')
array([1, 3])

>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0   2000-03-11
1   2000-03-12
2   2000-03-13
dtype: datetime64[ns]

>>> ser.searchsorted('3/14/2000')
3

>>> ser = pd.Categorical(
...     ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']

>>> ser.searchsorted('bread')
1

>>> ser.searchsorted(['bread'], side='right')
array([3])

If the values are not monotonically sorted, wrong locations may be returned:

>>> ser = pd.Series([2, 1, 3])
>>> ser
0    2
1    1
2    3
dtype: int64

>>> ser.searchsorted(1)  
0  # wrong result, correct would be 1

sem(axis: Axis | None = None, skipna: bool = True, ddof: int = 1, numeric_only: bool = False, **kwargs)

Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

Parameters:

axis ({index (0)}) –
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.sem with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.sem().round(6)
0.57735

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
>>> df
       a   b
tiger  1   2
zebra  2   3
>>> df.sem()
a   0.5
b   0.5
dtype: float64

Using axis=1

>>> df.sem(axis=1)
tiger   0.5
zebra   0.5
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
...                   index=['tiger', 'zebra'])
>>> df.sem(numeric_only=True)
a   0.5
dtype: float64

Return type:

scalar or Series (if level specified)

set_axis(labels, *, axis: Axis = 0, copy: bool | None = None) → Series

Assign desired index to given axis.

Indexes for row labels can be changed by assigning a list-like or Index.

Parameters:

labels (list-like, Index) – The values for the new index.
axis ({0 or 'index'}, default 0) – The axis to update. The value 0 identifies the rows. For Series this parameter is unused and defaults to 0.
copy (bool, default True) –
Whether to make a copy of the underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

An object of type Series.

Return type:

See also

Series.rename_axis: Alter the name of the index. Examples ——– >>> s = pd.Series([1, 2, 3]) >>> s 0 1 1 2 2 3
dtype: int64 >>> s.set_axis([‘a’, ‘b’, ‘c’], axis=0) a 1 b 2 c 3
dtype: int64

skew(axis: Axis | None = 0, skipna: bool = True, numeric_only: bool = False, **kwargs)

Return unbiased skew over requested axis.

Normalized by N-1.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
**kwargs – Additional keyword arguments to be passed to the function.

Returns:

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.skew()
0.0

With a DataFrame

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
...                   index=['tiger', 'zebra', 'cow'])
>>> df
        a   b   c
tiger   1   2   1
zebra   2   3   3
cow     3   4   5
>>> df.skew()
a   0.0
b   0.0
c   0.0
dtype: float64

Using axis=1

>>> df.skew(axis=1)
tiger   1.732051
zebra  -1.732051
cow     0.000000
dtype: float64

In this case, numeric_only should be set to True to avoid getting an error.

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
...                   index=['tiger', 'zebra', 'cow'])
>>> df.skew(numeric_only=True)
a   0.0
dtype: float64

Return type:

scalar or scalar

sort_index(*, axis: Axis = 0, level: IndexLabel = None, ascending: bool | Sequence[bool] = True, inplace: Literal[True], kind: SortKind = 'quicksort', na_position: NaPosition = 'last', sort_remaining: bool = True, ignore_index: bool = False, key: IndexKeyFunc = None) → None

sort_index(*, axis: Axis = 0, level: IndexLabel = None, ascending: bool | Sequence[bool] = True, inplace: Literal[False] = False, kind: SortKind = 'quicksort', na_position: NaPosition = 'last', sort_remaining: bool = True, ignore_index: bool = False, key: IndexKeyFunc = None) → Series

sort_index(*, axis: Axis = 0, level: IndexLabel = None, ascending: bool | Sequence[bool] = True, inplace: bool = False, kind: SortKind = 'quicksort', na_position: NaPosition = 'last', sort_remaining: bool = True, ignore_index: bool = False, key: IndexKeyFunc = None) → Series | None

Sort Series by index labels.

Returns a new Series sorted by label if inplace argument is False, otherwise updates the original series and returns None.

Parameters:

axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
level (int, optional) – If not None, sort on values in specified index level(s).
ascending (bool or list-like of bools, default True) – Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually.
inplace (bool, default False) – If True, perform operation in-place.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more information. ‘mergesort’ and ‘stable’ are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.
na_position ({'first', 'last'}, default 'last') – If ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end. Not implemented for MultiIndex.
sort_remaining (bool, default True) – If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – If not None, apply the key function to the index values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect an Index and return an Index of the same shape.

Returns:

The original Series sorted by the labels or None if inplace=True.

Return type:

Series or None

See also

DataFrame.sort_index: Sort DataFrame by the index.
DataFrame.sort_values: Sort DataFrame by the value.
Series.sort_values: Sort Series by the value.

Examples

>>> s = pd.Series(['a', 'b', 'c', 'd'], index=[3, 2, 1, 4])
>>> s.sort_index()
1    c
2    b
3    a
4    d
dtype: object

Sort Descending

>>> s.sort_index(ascending=False)
4    d
3    a
2    b
1    c
dtype: object

By default NaNs are put at the end, but use na_position to place them at the beginning

>>> s = pd.Series(['a', 'b', 'c', 'd'], index=[3, 2, 1, np.nan])
>>> s.sort_index(na_position='first')
NaN     d
 1.0    c
 2.0    b
 3.0    a
dtype: object

Specify index level to sort

>>> arrays = [np.array(['qux', 'qux', 'foo', 'foo',
...                     'baz', 'baz', 'bar', 'bar']),
...           np.array(['two', 'one', 'two', 'one',
...                     'two', 'one', 'two', 'one'])]
>>> s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=arrays)
>>> s.sort_index(level=1)
bar  one    8
baz  one    6
foo  one    4
qux  one    2
bar  two    7
baz  two    5
foo  two    3
qux  two    1
dtype: int64

Does not sort by remaining levels when sorting by levels

>>> s.sort_index(level=1, sort_remaining=False)
qux  one    2
foo  one    4
baz  one    6
bar  one    8
qux  two    1
foo  two    3
baz  two    5
bar  two    7
dtype: int64

Apply a key function before sorting

>>> s = pd.Series([1, 2, 3, 4], index=['A', 'b', 'C', 'd'])
>>> s.sort_index(key=lambda x : x.str.lower())
A    1
b    2
C    3
d    4
dtype: int64

sort_values(*, axis: Axis = 0, ascending: bool | Sequence[bool] = True, inplace: Literal[False] = False, kind: SortKind = 'quicksort', na_position: NaPosition = 'last', ignore_index: bool = False, key: ValueKeyFunc = None) → Series

sort_values(*, axis: Axis = 0, ascending: bool | Sequence[bool] = True, inplace: Literal[True], kind: SortKind = 'quicksort', na_position: NaPosition = 'last', ignore_index: bool = False, key: ValueKeyFunc = None) → None

sort_values(*, axis: Axis = 0, ascending: bool | Sequence[bool] = True, inplace: bool = False, kind: SortKind = 'quicksort', na_position: NaPosition = 'last', ignore_index: bool = False, key: ValueKeyFunc = None) → Series | None

Sort by the values.

Sort a Series in ascending or descending order by some criterion.

Parameters:

axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
ascending (bool or list of bools, default True) – If True, sort values in ascending order, otherwise descending.
inplace (bool, default False) – If True, perform operation in-place.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort') – Choice of sorting algorithm. See also numpy.sort() for more information. ‘mergesort’ and ‘stable’ are the only stable algorithms.
na_position ({'first' or 'last'}, default 'last') – Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.
ignore_index (bool, default False) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
key (callable, optional) – If not None, apply the key function to the series values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return an array-like.

Returns:

Series ordered by values or None if inplace=True.

Return type:

Series or None

See also

Series.sort_index: Sort by the Series indices.
DataFrame.sort_values: Sort DataFrame by the values along either axis.
DataFrame.sort_index: Sort DataFrame by indices.

Examples

>>> s = pd.Series([np.nan, 1, 3, 10, 5])
>>> s
0     NaN
1     1.0
2     3.0
3     10.0
4     5.0
dtype: float64

Sort values ascending order (default behaviour)

>>> s.sort_values(ascending=True)
   1.0
   3.0
   5.0
  10.0
   NaN
dtype: float64

Sort values descending order

>>> s.sort_values(ascending=False)
  10.0
   5.0
   3.0
   1.0
   NaN
dtype: float64

Sort values putting NAs first

>>> s.sort_values(na_position='first')
   NaN
   1.0
   3.0
   5.0
  10.0
dtype: float64

Sort a series of strings

>>> s = pd.Series(['z', 'b', 'd', 'a', 'c'])
>>> s
0    z
1    b
2    d
3    a
4    c
dtype: object

>>> s.sort_values()
  a
  b
  c
  d
  z
dtype: object

Sort using a key function. Your key function will be given the Series of values and should return an array-like.

>>> s = pd.Series(['a', 'B', 'c', 'D', 'e'])
>>> s.sort_values()
1    B
3    D
0    a
2    c
4    e
dtype: object
>>> s.sort_values(key=lambda x: x.str.lower())
0    a
1    B
2    c
3    D
4    e
dtype: object

NumPy ufuncs work well here. For example, we can sort by the sin of the value

>>> s = pd.Series([-4, -2, 0, 2, 4])
>>> s.sort_values(key=np.sin)
1   -2
4    4
2    0
0   -4
3    2
dtype: int64

More complicated user-defined functions can be used, as long as they expect a Series and return an array-like

>>> s.sort_values(key=lambda x: (np.tan(x.cumsum())))
 -4
  2
  4
 -2
  0
dtype: int64

sparse: alias of SparseAccessor

std(axis: Axis | None = None, skipna: bool = True, ddof: int = 1, numeric_only: bool = False, **kwargs)

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:

axis ({index (0)}) –
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.std with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).
skipna (bool, default True) – Exclude NA/null values. If an entire row/column is NA, the result will be NA.
ddof (int, default 1) – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.

Return type:

scalar or Series (if level specified)

Notes

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1)

Examples

>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
...                    'age': [21, 25, 62, 43],
...                    'height': [1.61, 1.87, 1.49, 2.01]}
...                   ).set_index('person_id')
>>> df
           age  height
person_id
0           21    1.61
1           25    1.87
2           62    1.49
3           43    2.01

The standard deviation of the columns can be found as follows:

>>> df.std()
age       18.786076
height     0.237417
dtype: float64

Alternatively, ddof=0 can be set to normalize by N instead of N-1:

>>> df.std(ddof=0)
age       16.269219
height     0.205609
dtype: float64

str: alias of StringMethods

struct: alias of StructAccessor

sub(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Subtraction of series and other, element-wise (binary operator sub).

Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rsub: Reverse of the Subtraction operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.subtract(b, fill_value=0)
a    0.0
b    1.0
c    1.0
d   -1.0
e    NaN
dtype: float64

subtract(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Subtraction of series and other, element-wise (binary operator sub).

Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rsub: Reverse of the Subtraction operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.subtract(b, fill_value=0)
a    0.0
b    1.0
c    1.0
d   -1.0
e    NaN
dtype: float64

sum(axis: Axis | None = None, skipna: bool = True, numeric_only: bool = False, min_count: int = 0, **kwargs)

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters:

axis ({index (0)}) –
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.sum with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.
skipna (bool, default True) – Exclude NA/null values when computing the result.
numeric_only (bool, default False) – Include only float, int, boolean columns. Not implemented for Series.
min_count (int, default 0) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs – Additional keyword arguments to be passed to the function.

Return type:

scalar or scalar

See also

Series.sum: Return the sum.
Series.min: Return the minimum.
Series.max: Return the maximum.
Series.idxmin: Return the index of the minimum.
Series.idxmax: Return the index of the maximum.
DataFrame.sum: Return the sum over the requested axis.
DataFrame.min: Return the minimum over the requested axis.
DataFrame.max: Return the maximum over the requested axis.
DataFrame.idxmin: Return the index of the minimum over the requested axis.
DataFrame.idxmax: Return the index of the maximum over the requested axis.

Examples

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64

>>> s.sum()
14

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

>>> pd.Series([], dtype="float64").sum(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).sum()
0.0

>>> pd.Series([np.nan]).sum(min_count=1)
nan

swaplevel(i: Level = -2, j: Level = -1, copy: bool | None = None) → Series

Swap levels i and j in a MultiIndex.

Default is to swap the two innermost levels of the index.

Parameters:

i (int or str) – Levels of the indices to be swapped. Can pass level name as string.
j (int or str) – Levels of the indices to be swapped. Can pass level name as string.
copy (bool, default True) –
Whether to copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

Series with levels swapped in MultiIndex.

Return type:

Examples

>>> s = pd.Series(
...     ["A", "B", "A", "C"],
...     index=[
...         ["Final exam", "Final exam", "Coursework", "Coursework"],
...         ["History", "Geography", "History", "Geography"],
...         ["January", "February", "March", "April"],
...     ],
... )
>>> s
Final exam  History     January      A
            Geography   February     B
Coursework  History     March        A
            Geography   April        C
dtype: object

In the following example, we will swap the levels of the indices. Here, we will swap the levels column-wise, but levels can be swapped row-wise in a similar manner. Note that column-wise is the default behaviour. By not supplying any arguments for i and j, we swap the last and second to last indices.

>>> s.swaplevel()
Final exam  January     History         A
            February    Geography       B
Coursework  March       History         A
            April       Geography       C
dtype: object

By supplying one argument, we can choose which index to swap the last index with. We can for example swap the first index with the last one as follows.

>>> s.swaplevel(0)
January     History     Final exam      A
February    Geography   Final exam      B
March       History     Coursework      A
April       Geography   Coursework      C
dtype: object

We can also define explicitly which indices we want to swap by supplying values for both i and j. Here, we for example swap the first and second indices.

>>> s.swaplevel(0, 1)
History     Final exam  January         A
Geography   Final exam  February        B
History     Coursework  March           A
Geography   Coursework  April           C
dtype: object

to_dict(*, into: type[MutableMappingT] | MutableMappingT) → MutableMappingT

to_dict(*, into: type[dict] = <class 'dict'>) → dict

Convert Series to {label -> value} dict or dict-like object.

Parameters:: into (class, default dict) – The collections.abc.MutableMapping subclass to use as the return object. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.
Returns:: Key-value representation of Series.
Return type:: collections.abc.MutableMapping

Examples

>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_dict()
{0: 1, 1: 2, 2: 3, 3: 4}
>>> from collections import OrderedDict, defaultdict
>>> s.to_dict(into=OrderedDict)
OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
>>> dd = defaultdict(list)
>>> s.to_dict(into=dd)
defaultdict(<class 'list'>, {0: 1, 1: 2, 2: 3, 3: 4})

to_frame(name: Hashable = <no_default>) → DataFrame

Convert Series to DataFrame.

Parameters:: name (object, optional) – The passed name should substitute for the series name (if it has one).
Returns:: DataFrame representation of Series.
Return type:: DataFrame

Examples

>>> s = pd.Series(["a", "b", "c"],
...               name="vals")
>>> s.to_frame()
  vals
0    a
1    b
2    c

to_markdown(buf: IO[str] | None = None, mode: str = 'wt', index: bool = True, storage_options: StorageOptions | None = None, **kwargs) → str | None

Print Series in Markdown-friendly format.

Parameters:

buf (str, Path or StringIO-like, optional, default None) – Buffer to write to. If None, the output is returned as a string.
mode (str, optional) – Mode in which file is opened, “wt” by default.
index (bool, optional, default True) – Add index (row) labels.
storage_options (dict, optional) –
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
**kwargs –
These parameters will be passed to tabulate.

Returns:

Series in Markdown-friendly format.

Return type:

str

Notes

Requires the tabulate package.

Examples

>>> s = pd.Series(["elk", "pig", "dog", "quetzal"], name="animal")
>>> print(s.to_markdown())
|    | animal   |
|---:|:---------|
|  0 | elk      |
|  1 | pig      |
|  2 | dog      |
|  3 | quetzal  |

Output markdown with a tabulate option.

>>> print(s.to_markdown(tablefmt="grid"))
+----+----------+
|    | animal   |
+====+==========+
|  0 | elk      |
+----+----------+
|  1 | pig      |
+----+----------+
|  2 | dog      |
+----+----------+
|  3 | quetzal  |
+----+----------+

to_period(freq: str | None = None, copy: bool | None = None) → Series

Convert Series from DatetimeIndex to PeriodIndex.

Parameters:

freq (str, default None) – Frequency associated with the PeriodIndex.
copy (bool, default True) –
Whether or not to return a copy.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:

Series with index converted to PeriodIndex.

Return type:

Examples

>>> idx = pd.DatetimeIndex(['2023', '2024', '2025'])
>>> s = pd.Series([1, 2, 3], index=idx)
>>> s = s.to_period()
>>> s
2023    1
2024    2
2025    3
Freq: Y-DEC, dtype: int64

Viewing the index

>>> s.index
PeriodIndex(['2023', '2024', '2025'], dtype='period[Y-DEC]')

to_string(buf: None = None, na_rep: str = 'NaN', float_format: str | None = None, header: bool = True, index: bool = True, length: bool = False, dtype=False, name=False, max_rows: int | None = None, min_rows: int | None = None) → str

to_string(buf: FilePath | WriteBuffer[str], na_rep: str = 'NaN', float_format: str | None = None, header: bool = True, index: bool = True, length: bool = False, dtype=False, name=False, max_rows: int | None = None, min_rows: int | None = None) → None

Render a string representation of the Series.

Parameters:

buf (StringIO-like, optional) – Buffer to write to.
na_rep (str, optional) – String representation of NaN to use, default ‘NaN’.
float_format (one-parameter function, optional) – Formatter function to apply to columns’ elements if they are floats, default None.
header (bool, default True) – Add the Series header (index name).
index (bool, optional) – Add index (row) labels, default True.
length (bool, default False) – Add the Series length.
dtype (bool, default False) – Add the Series dtype.
name (bool, default False) – Add the Series name if not None.
max_rows (int, optional) – Maximum number of rows to show before truncating. If None, show all.
min_rows (int, optional) – The number of rows to display in a truncated repr (when number of rows is above max_rows).

Returns:

String representation of Series if buf=None, otherwise None.

Return type:

str or None

Examples

>>> ser = pd.Series([1, 2, 3]).to_string()
>>> ser
'0    1\n1    2\n2    3'

to_timestamp(freq: Frequency | None = None, how: Literal['s', 'e', 'start', 'end'] = 'start', copy: bool | None = None) → Series

Cast to DatetimeIndex of Timestamps, at beginning of period.

Parameters:

freq (str, default frequency of PeriodIndex) – Desired frequency.
how ({'s', 'e', 'start', 'end'}) – Convention for converting period to timestamp; start of period vs. end.
copy (bool, default True) –
Whether or not to return a copy.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Return type:

Series with DatetimeIndex

Examples

>>> idx = pd.PeriodIndex(['2023', '2024', '2025'], freq='Y')
>>> s1 = pd.Series([1, 2, 3], index=idx)
>>> s1
2023    1
2024    2
2025    3
Freq: Y-DEC, dtype: int64

The resulting frequency of the Timestamps is YearBegin

>>> s1 = s1.to_timestamp()
>>> s1
2023-01-01    1
2024-01-01    2
2025-01-01    3
Freq: YS-JAN, dtype: int64

Using freq which is the offset that the Timestamps will have

>>> s2 = pd.Series([1, 2, 3], index=idx)
>>> s2 = s2.to_timestamp(freq='M')
>>> s2
2023-01-31    1
2024-01-31    2
2025-01-31    3
Freq: YE-JAN, dtype: int64

transform(func: AggFuncType, axis: Axis = 0, *args, **kwargs) → DataFrame | Series

Call func on self producing a Series with the same axis shape as self.

Parameters:

func (function, str, list-like or dict-like) –
Function to use for transforming the data. If a function, must either work when passed a Series or when passed to Series.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.

Accepted combinations are:
- function
- string function name
- list-like of functions and/or function names, e.g. [np.exp, 'sqrt']
- dict-like of axis labels -> functions, function names or list-like of such.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.

Returns:

A Series that must have the same length as self.

Return type:

:raises ValueError : If the returned Series has a different length than self.:

See also

Series.agg: Only perform aggregating type operations.
Series.apply: Invoke function on a Series.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.

Examples

>>> df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df
   A  B
0  0  1
1  1  2
2  2  3
>>> df.transform(lambda x: x + 1)
   A  B
0  1  2
1  2  3
2  3  4

Even though the resulting Series must have the same length as the input Series, it is possible to provide several input functions:

>>> s = pd.Series(range(3))
>>> s
0    0
1    1
2    2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056

You can call transform on a GroupBy object:

>>> df = pd.DataFrame({
...     "Date": [
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
...         "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
...     "Data": [5, 8, 6, 1, 50, 100, 60, 120],
... })
>>> df
         Date  Data
0  2015-05-08     5
1  2015-05-07     8
2  2015-05-06     6
3  2015-05-05     1
4  2015-05-08    50
5  2015-05-07   100
6  2015-05-06    60
7  2015-05-05   120
>>> df.groupby('Date')['Data'].transform('sum')
0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64

>>> df = pd.DataFrame({
...     "c": [1, 1, 1, 2, 2, 2, 2],
...     "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4

truediv(other, level=None, fill_value=None, axis: Axis = 0) → Series

Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:

other (Series or scalar value)
level (int or name) – Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value (None or float value, default None (NaN)) – Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.
axis ({0 or 'index'}) – Unused. Parameter needed for compatibility with DataFrame.

Returns:

The result of the operation.

Return type:

See also

Series.rtruediv: Reverse of the Floating division operator, see Python documentation for more details.

Examples

>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64
>>> a.divide(b, fill_value=0)
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64

unique() → ArrayLike

Return unique values of Series object.

Uniques are returned in order of appearance. Hash table-based unique, therefore does NOT sort.

Returns:: The unique values returned as a NumPy array. See Notes.
Return type:: ndarray or ExtensionArray

See also

Series.drop_duplicates: Return Series with duplicate values removed.
unique: Top-level unique method for any 1-d array-like object.
Index.unique: Return Index with unique values from an Index object.

Notes

Returns the unique values as a NumPy array. In case of an extension-array backed Series, a new ExtensionArray of that type with just the unique values is returned. This includes

Categorical

Period

Datetime with Timezone

Datetime without Timezone

Timedelta

Interval

Sparse

IntegerNA

See Examples section.

Examples

>>> pd.Series([2, 1, 3, 3], name='A').unique()
array([2, 1, 3])

>>> pd.Series([pd.Timestamp('2016-01-01') for _ in range(3)]).unique()
<DatetimeArray>
['2016-01-01 00:00:00']
Length: 1, dtype: datetime64[ns]

>>> pd.Series([pd.Timestamp('2016-01-01', tz='US/Eastern')
...            for _ in range(3)]).unique()
<DatetimeArray>
['2016-01-01 00:00:00-05:00']
Length: 1, dtype: datetime64[ns, US/Eastern]

An Categorical will return categories in the order of appearance and with the same dtype.

>>> pd.Series(pd.Categorical(list('baabc'))).unique()
['b', 'a', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> pd.Series(pd.Categorical(list('baabc'), categories=list('abc'),
...                          ordered=True)).unique()
['b', 'a', 'c']
Categories (3, object): ['a' < 'b' < 'c']

unstack(level: IndexLabel = -1, fill_value: Hashable | None = None, sort: bool = True) → DataFrame

Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.

Parameters:

level (int, str, or list of these, default last level) – Level(s) to unstack, can pass level name.
fill_value (scalar value, default None) – Value to use when replacing NaN values.
sort (bool, default True) – Sort the level(s) in the resulting MultiIndex columns.

Returns:

Unstacked Series.

Return type: