hipdf.MultiIndex

Applies to Linux

class hipdf.MultiIndex(levels=None, codes=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity=True)#

Bases: Frame, BaseIndex, NotIterable

A multi-level or hierarchical index.

Provides N-Dimensional indexing into Series and DataFrame objects.

Parameters#

levels: sequence of arrays: The unique labels for each level.
codes: sequence of arrays: Integers for each level designating which label at each location.
sortorder: optional int: Not yet supported.
names: optional sequence of objects: Names for each of the index levels.
copy: bool, default False: Copy the levels and codes.
verify_integrity: bool, default True: Check that the levels/codes are consistent and valid. Not yet supported.

Attributes#

names nlevels dtypes levels codes

Methods#

from_arrays from_tuples from_product from_frame set_levels set_codes to_frame to_flat_index sortlevel droplevel swaplevel reorder_levels remove_unused_levels get_level_values get_loc drop

Returns#

MultiIndex

Examples#

>>> import cudf
>>> cudf.MultiIndex(
... levels=[[1, 2], ['blue', 'red']], codes=[[0, 0, 1, 1], [1, 0, 1, 0]])
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue')],
           )
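
The relationship between levels and codes in the example above can be sketched in plain Python, without a GPU. The helper name `decode` is hypothetical and not part of hipDF; it only illustrates, under that assumption, how each code selects a label from its level.

```python
# Pure-Python illustration (not the library's implementation):
# each code is a position into the matching level.
levels = [[1, 2], ["blue", "red"]]
codes = [[0, 0, 1, 1], [1, 0, 1, 0]]


def decode(levels, codes):
    """Rebuild the row tuples that a levels/codes pair describes."""
    columns = [
        [level[c] for c in level_codes]
        for level, level_codes in zip(levels, codes)
    ]
    # Transpose the per-level columns into per-row tuples.
    return list(zip(*columns))


rows = decode(levels, codes)
print(rows)
```

The decoded rows correspond to the tuples shown in the MultiIndex repr above.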

__init__(levels=None, codes=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity=True)#

Methods

`__init__`([levels, codes, sortorder, names, ...])
`all`([axis, skipna])	Return whether all elements are True in DataFrame.
`any`([axis, skipna])	Return whether any element is True in DataFrame.
`append`(other)	Append a collection of MultiIndex objects together
`argsort`([by, axis, kind, order, ascending, ...])	Return the integer indices that would sort the Series values.
`astype`(dtype[, copy])	Create an Index with values cast to dtypes.
`copy`([names, deep, name])	Returns copy of MultiIndex object.
`deserialize`(header, frames)	Generate an object from a serialized representation.
`device_deserialize`(header, frames)	Perform device-side deserialization tasks.
`device_serialize`()	Serialize data and metadata associated with device memory.
`difference`(other[, sort])	Return a new Index with elements from the index that are not in other.
`drop_duplicates`([keep, nulls_are_equal])	Drop duplicate rows in index.
`droplevel`([level])	Removes the specified levels from the MultiIndex.
`dropna`([how])	Drop null rows from Index.
`duplicated`([keep])	Indicate duplicate index values.
`equals`(other)	Test whether two objects contain the same elements.
`factorize`([sort, use_na_sentinel])
`fillna`(value)	Fill null values with the specified value.
`find_label_range`(loc)	Translate a label-based slice to an index-based slice
`from_arrays`(arrays[, sortorder, names])	Convert arrays to MultiIndex.
`from_arrow`(data)	Convert from PyArrow Table to Frame
`from_frame`(df[, sortorder, names])	Make a MultiIndex from a DataFrame.
`from_pandas`(multiindex[, nan_as_null])	Convert from a Pandas MultiIndex
`from_product`(iterables[, sortorder, names])	Make a MultiIndex from the cartesian product of multiple iterables.
`from_tuples`(tuples[, sortorder, names])	Convert list of tuples to MultiIndex.
`get_indexer`(target[, method, limit, tolerance])	Compute indexer and mask for new index given the current index.
`get_level_values`(level)	Return the values at the requested level
`get_loc`(key)	Get integer location, slice or boolean mask for requested label.
`get_slice_bound`(label, side)	Calculate slice bound that corresponds to given label.
`host_deserialize`(header, frames)	Perform device-side deserialization tasks.
`host_serialize`()	Serialize data and metadata associated with host memory.
`intersection`(other[, sort])	Form the intersection of two Index objects.
`is_boolean`()	Check if the Index only consists of booleans.
`is_categorical`()	Check if the Index holds categorical data.
`is_floating`()	Check if the Index is a floating type.
`is_integer`()	Check if the Index only consists of integers.
`is_interval`()	Check if the Index holds Interval objects.
`is_numeric`()	Check if the Index only consists of numeric data.
`is_object`()	Check if the Index is of the object dtype.
`isin`(values[, level])	Return a boolean array where the index values are in values.
`isna`()	Identify missing values.
`isnull`()	Identify missing values.
`join`(other[, how, level, return_indexers, sort])	Compute join_index and indexers to conform data structures to the new index.
`max`([axis, skipna, numeric_only])	Return the maximum of the values in the DataFrame.
`memory_usage`([deep])	Return the memory usage of an object.
`min`([axis, skipna, numeric_only])	Return the minimum of the values in the DataFrame.
`notna`()	Identify non-missing values.
`notnull`()	Identify non-missing values.
`nunique`([dropna])	Returns a per column mapping with counts of unique values for each column.
`rename`(names[, inplace])	Alter MultiIndex level names
`repeat`(repeats[, axis])	Repeat elements of a Index.
`searchsorted`(values[, side, sorter, ...])	Find indices where elements should be inserted to maintain order
`serialize`()	Generate an equivalent serializable representation of an object.
`set_names`(names[, level, inplace])	Set Index or MultiIndex name.
`shift`([periods, freq])	Not yet implemented
`sort_values`([return_indexer, ascending, ...])	Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
`swaplevel`([i, j])	Swap level i with level j.
`take`(indices)	Return a new index containing the rows specified by indices
`to_arrow`()	Convert to arrow Table
`to_cupy`([dtype, copy, na_value])	Convert the Frame to a CuPy array.
`to_dlpack`()	Converts a cuDF object into a DLPack tensor.
`to_flat_index`()	Convert a MultiIndex to an Index of Tuples containing the level values.
`to_frame`([index, name, allow_duplicates])	Create a DataFrame with the levels of the MultiIndex as columns.
`to_list`()
`to_numpy`()	Convert the Frame to a NumPy array.
`to_pandas`(*[, nullable, arrow_type])	Convert to a Pandas Index.
`to_series`([index, name])	Create a Series with both index and values equal to the index keys.
`tolist`()
`union`(other[, sort])	Form the union of two Index objects.
`unique`([level])	Return unique values in the index.
`where`(cond[, other, inplace])	Replace values where the condition is False.

Attributes

`codes`	Returns the codes of the underlying MultiIndex.
`dtype`
`empty`
`has_duplicates`
`hasnans`	Return True if there are any NaNs or nulls.
`is_monotonic_decreasing`	Return if the index is monotonic decreasing (only equal or decreasing) values.
`is_monotonic_increasing`	Return if the index is monotonic increasing (only equal or increasing) values.
`is_unique`	Return if the index has unique values.
`levels`	Returns list of levels in the MultiIndex
`name`	Returns the name of the Index.
`names`	Returns a FrozenList containing the name of the Index.
`ndim`	Dimension of the data.
`nlevels`	Integer number of levels in this MultiIndex.
`shape`	Get a tuple representing the dimensionality of the data.
`size`	Return the number of elements in the underlying data.
`str`	Not yet implemented.
`values`	Return a CuPy representation of the MultiIndex.
`values_host`	Return a numpy representation of the MultiIndex.

property names#: Returns a FrozenList containing the name of the Index.

to_series(index=None, name=None)#

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters#

index: Index, optional: Index of resulting Series. If None, defaults to original index.
name: str, optional: Name of resulting Series. If None, defaults to name of original index.

Returns#

Series: The dtype will be based on the type of the Index values.

astype(dtype, copy: bool = True) → Self#

Create an Index with values cast to dtypes.

The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters#

dtype: numpy.dtype: Use a numpy.dtype to cast entire Index object to.
copy: bool, default True: By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns#

Index: Index with values cast to specified dtype.

Examples#

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Index([1.0, 2.0, 3.0], dtype='float64')

rename(names, inplace: bool = False) → Self | None#

Alter MultiIndex level names

Parameters#

names: list of label: Names to set, length must be the same as number of levels.
inplace: bool, default False: If True, modifies objects directly, otherwise returns a new MultiIndex instance.

Returns#

None or MultiIndex

Examples#

Renaming each level of a MultiIndex to the specified names:

>>> import cudf
>>> midx = cudf.MultiIndex.from_product(
...     [('A', 'B'), (2020, 2021)], names=['c1', 'c2'])
>>> midx.rename(['lv1', 'lv2'])
MultiIndex([('A', 2020),
            ('A', 2021),
            ('B', 2020),
            ('B', 2021)],
        names=['lv1', 'lv2'])
>>> midx.rename(['lv1', 'lv2'], inplace=True)
>>> midx
MultiIndex([('A', 2020),
            ('A', 2021),
            ('B', 2020),
            ('B', 2021)],
        names=['lv1', 'lv2'])

The names argument must be a list with the same length as MultiIndex.levels:

>>> midx.rename(['lv0'])
Traceback (most recent call last):
ValueError: Length of names must match number of levels in MultiIndex.

set_names(names, level=None, inplace: bool = False) → Self | None#

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters#

names: label or list of label: Name(s) to set.
level: int, label or list of int or label, optional: If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.
inplace: bool, default False: Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns#

Index: The same type as the caller or None if inplace is True.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])

property name#: Returns the name of the Index.

copy(names=None, deep=False, name=None) → Self#

Returns copy of MultiIndex object.

Returns a copy of MultiIndex. The levels and codes value can be set to the provided parameters. When they are provided, the returned MultiIndex is always newly constructed.

Parameters#

names: sequence of objects, optional (default None): Names for each of the index levels.
deep: bool (default False): If True, ._data, ._levels, ._codes will be copied. Ignored if levels or codes are specified.
name: object, optional (default None): Kept for compatibility with 1-dimensional Index. Should not be used.

Returns#

Copy of MultiIndex Instance

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'Close': [3400.00, 226.58, 3401.80, 228.91]})
>>> idx1 = cudf.MultiIndex(
... levels=[['2020-08-27', '2020-08-28'], ['AMZN', 'MSFT']],
... codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
... names=['Date', 'Symbol'])
>>> idx2 = idx1.copy(
... names=['col1', 'col2'])

>>> df.index = idx1
>>> df
                     Close
Date       Symbol
2020-08-27 AMZN    3400.00
           MSFT     226.58
2020-08-28 AMZN    3401.80
           MSFT     228.91

>>> df.index = idx2
>>> df
                   Close
col1       col2
2020-08-27 AMZN  3400.00
           MSFT   226.58
2020-08-28 AMZN  3401.80
           MSFT   228.91

property codes: FrozenList#

Returns the codes of the underlying MultiIndex.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a':[1, 2, 3], 'b':[10, 11, 12]})
>>> midx = cudf.MultiIndex.from_frame(df)
>>> midx
MultiIndex([(1, 10),
            (2, 11),
            (3, 12)],
        names=['a', 'b'])
>>> midx.codes
FrozenList([[0, 1, 2], [0, 1, 2]])

get_slice_bound(label, side)#

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters#

label: object
side: {‘left’, ‘right’}

Returns#

int: Index of label.
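
The left and right bounds described above can be sketched with the standard-library `bisect` module on a sorted list of labels. The helper name `slice_bound` is hypothetical, and the sketch assumes a sorted index; it does not show how hipDF handles unsorted data or missing labels.

```python
from bisect import bisect_left, bisect_right


def slice_bound(sorted_labels, label, side):
    """Hypothetical helper mirroring the 'left'/'right' semantics on a sorted list."""
    if side == "left":
        return bisect_left(sorted_labels, label)
    if side == "right":
        return bisect_right(sorted_labels, label)
    raise ValueError("side must be 'left' or 'right'")


labels = ["a", "b", "b", "c"]
left = slice_bound(labels, "b", "left")    # leftmost position of "b"
right = slice_bound(labels, "b", "right")  # one past the rightmost "b"
print(left, right, labels[left:right])
```

Using the two bounds as `labels[left:right]` selects exactly the run of matching labels.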

property nlevels: int#: Integer number of levels in this MultiIndex.

property levels: list[Index]#

Returns list of levels in the MultiIndex

Returns#

List of Index objects

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a':[1, 2, 3], 'b':[10, 11, 12]})
>>> cudf.MultiIndex.from_frame(df)
MultiIndex([(1, 10),
            (2, 11),
            (3, 12)],
        names=['a', 'b'])
>>> midx = cudf.MultiIndex.from_frame(df)
>>> midx
MultiIndex([(1, 10),
            (2, 11),
            (3, 12)],
        names=['a', 'b'])
>>> midx.levels
[Index([1, 2, 3], dtype='int64', name='a'), Index([10, 11, 12], dtype='int64', name='b')]

property ndim: int#: Dimension of the data. For MultiIndex ndim is always 2.

isin(values, level=None) → ndarray#

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters#

values: set, list-like, Index or Multi-Index: Sought values.
level: str or int, optional: Name or position of the index level to use (if the index is a MultiIndex).

Returns#

is_contained: cupy array: CuPy array of boolean values.

Notes#

When level is None, values can only be a MultiIndex, or a set or list-like of tuples. When level is provided, values can be an Index or MultiIndex, or a set or list-like of tuples.

Examples#

>>> import cudf
>>> import pandas as pd
>>> midx = cudf.from_pandas(pd.MultiIndex.from_arrays([[1,2,3],
...                                  ['red', 'blue', 'green']],
...                                  names=('number', 'color')))
>>> midx
MultiIndex([(1,   'red'),
            (2,  'blue'),
            (3, 'green')],
           names=['number', 'color'])

Check whether the strings in the ‘color’ level of the MultiIndex are in a list of colors.

>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])

To check across the levels of a MultiIndex, pass a list of tuples:

>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
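
The two lookup modes (a single level versus whole rows) can be modelled in plain Python. The function `isin_sketch` is a hypothetical illustration that returns Python lists rather than a CuPy array, and it is not how hipDF computes the result.

```python
rows = [(1, "red"), (2, "blue"), (3, "green")]
names = ["number", "color"]


def isin_sketch(rows, names, values, level=None):
    """Hypothetical pure-Python model of the two lookup modes."""
    wanted = set(values)
    if level is None:
        # Compare whole rows against a collection of tuples.
        return [row in wanted for row in rows]
    # Resolve a level name to its position, then compare that level only.
    pos = names.index(level) if isinstance(level, str) else level
    return [row[pos] in wanted for row in rows]


by_level = isin_sketch(rows, names, ["red", "orange", "yellow"], level="color")
by_row = isin_sketch(rows, names, [(1, "red"), (3, "red")])
print(by_level, by_row)
```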

where(cond, other=None, inplace=False)#

Replace values where the condition is False.

Parameters#

cond: bool Series/DataFrame, array-like

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

A DataFrame caller accepts only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

A Series caller accepts only a scalar or a Series-like of the same length.

inplace: bool, default False

Whether to perform the operation in place on the data.

Returns#

Same type as caller

Examples#

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.where(df % 2 == 0, [-1, -1])
   A  B
0 -1 -1
1  4 -1
2 -1  8

>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.where(ser > 2, 10)
0     4
1     3
2    10
3    10
4    10
dtype: int64
>>> ser.where(ser > 2)
0       4
1       3
2    <NA>
3    <NA>
4    <NA>
dtype: int64
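
The element-wise rule behind the Series examples above can be written as a short pure-Python sketch. The name `where_sketch` is hypothetical, and `None` stands in for `<NA>`; this is only a model of the semantics, not the GPU implementation.

```python
def where_sketch(values, cond, other=None):
    """Hypothetical element-wise model: keep where cond is True, else use other."""
    return [v if keep else other for v, keep in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
cond = [v > 2 for v in ser]
replaced = where_sketch(ser, cond, 10)
nulled = where_sketch(ser, cond)  # None stands in for <NA>
print(replaced, nulled)
```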

property size: int#

Return the number of elements in the underlying data.

Returns#

size : Size of the DataFrame / Index / Series / MultiIndex

Examples#

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5

take(indices) → Self#

Return a new index containing the rows specified by indices

Parameters#

indices: array-like: Array of ints indicating which positions to take.
axis: int: The axis over which to select values, always 0.
allow_fill: Unsupported
fill_value: Unsupported

Returns#

out: Index: New object with desired subset of rows.

Examples#

>>> import cudf
>>> idx = cudf.Index(['a', 'b', 'c', 'd', 'e'])
>>> idx.take([2, 0, 4, 3])
Index(['c', 'a', 'e', 'd'], dtype='object')

__getitem__(index)#

to_frame(index: bool = True, name=<no_default>, allow_duplicates: bool = False) → DataFrame#

Create a DataFrame with the levels of the MultiIndex as columns.

Column ordering is determined by the DataFrame constructor with data as a dict.

Parameters#

index: bool, default True: Set the index of the returned DataFrame as the original MultiIndex.
name: list / sequence of str, optional: The passed names should substitute index level names.
allow_duplicates: bool, optional, default False: Allow duplicate column labels to be created. Note that this parameter is non-functional because duplicate column labels aren’t supported in cudf.

Returns#

DataFrame

Examples#

>>> import cudf
>>> mi = cudf.MultiIndex.from_tuples([('a', 'c'), ('b', 'd')])
>>> mi
MultiIndex([('a', 'c'),
            ('b', 'd')],
           )

>>> df = mi.to_frame()
>>> df
     0  1
a c  a  c
b d  b  d

>>> df = mi.to_frame(index=False)
>>> df
   0  1
0  a  c
1  b  d

>>> df = mi.to_frame(name=['x', 'y'])
>>> df
     x  y
a c  a  c
b d  b  d

get_level_values(level) → Index#: Return the values at the requested level

Parameters#

level : int or label

Returns#

An Index containing the values at the requested level.

classmethod from_tuples(tuples, sortorder: int | None = None, names=None) → Self#

Convert list of tuples to MultiIndex.

Parameters#

tuples: list / sequence of tuple-likes: Each tuple is the index of one row/column.
sortorder: int or None: Level of sortedness (must be lexicographically sorted by that level).
names: list / sequence of str, optional: Names for the levels in the index.

Returns#

MultiIndex

Examples#

>>> import cudf
>>> tuples = [(1, 'red'), (1, 'blue'),
...           (2, 'red'), (2, 'blue')]
>>> cudf.MultiIndex.from_tuples(tuples, names=('number', 'color'))
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue')],
           names=['number', 'color'])

to_numpy() → ndarray#

Convert the Frame to a NumPy array.

Parameters#

dtype: str or numpy.dtype, optional: The dtype to pass to numpy.asarray().
copy: bool, default True: Whether to ensure that the returned value is not a view on another array. This parameter must be True since cuDF must copy device memory to host to provide a numpy array.
na_value: Any, default None: The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

Returns#

numpy.ndarray

to_flat_index()#

Convert a MultiIndex to an Index of Tuples containing the level values.

This is not currently implemented

property values_host: ndarray#

Return a numpy representation of the MultiIndex.

Only the values in the MultiIndex will be returned.

Returns#

out: numpy.ndarray: The values of the MultiIndex.

Examples#

>>> import cudf
>>> midx = cudf.MultiIndex(
...         levels=[[1, 3, 4, 5], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx.values_host
array([(1, 1), (1, 5), (3, 2), (4, 2), (5, 1)], dtype=object)
>>> type(midx.values_host)
<class 'numpy.ndarray'>

property values: ndarray#

Return a CuPy representation of the MultiIndex.

Only the values in the MultiIndex will be returned.

Returns#

out: cupy.ndarray: The values of the MultiIndex.

Examples#

>>> import cudf
>>> midx = cudf.MultiIndex(
...         levels=[[1, 3, 4, 5], [1, 2, 5]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...     )
>>> midx.values
array([[1, 1],
       [1, 5],
       [3, 2],
       [4, 2],
       [5, 1]])
>>> type(midx.values)
<class 'cupy...ndarray'>

classmethod from_frame(df: pd.DataFrame | cudf.DataFrame, sortorder: int | None = None, names=None) → Self#

Make a MultiIndex from a DataFrame.

Parameters#

df: DataFrame: DataFrame to be converted to MultiIndex.
sortorder: int, optional: Level of sortedness (must be lexicographically sorted by that level).
names: list-like, optional: If no names are provided, use the column names, or tuple of column names if the columns are a MultiIndex. If a sequence, overwrite names with the given sequence.

Returns#

MultiIndex: The MultiIndex representation of the given DataFrame.

Examples#

>>> import cudf
>>> df = cudf.DataFrame([['HI', 'Temp'], ['HI', 'Precip'],
...                    ['NJ', 'Temp'], ['NJ', 'Precip']],
...                   columns=['a', 'b'])
>>> df
      a       b
0    HI    Temp
1    HI  Precip
2    NJ    Temp
3    NJ  Precip
>>> cudf.MultiIndex.from_frame(df)
MultiIndex([('HI',   'Temp'),
            ('HI', 'Precip'),
            ('NJ',   'Temp'),
            ('NJ', 'Precip')],
           names=['a', 'b'])

Using explicit names, instead of the column names

>>> cudf.MultiIndex.from_frame(df, names=['state', 'observation'])
MultiIndex([('HI',   'Temp'),
            ('HI', 'Precip'),
            ('NJ',   'Temp'),
            ('NJ', 'Precip')],
           names=['state', 'observation'])

classmethod from_product(iterables, sortorder: int | None = None, names=None) → Self#

Make a MultiIndex from the cartesian product of multiple iterables.

Parameters#

iterables: list / sequence of iterables: Each iterable has unique labels for each level of the index.
sortorder: int or None: Level of sortedness (must be lexicographically sorted by that level).
names: list / sequence of str, optional: Names for the levels in the index. If not explicitly provided, names will be inferred from the elements of iterables if an element has a name attribute.

Returns#

MultiIndex

Examples#

>>> import cudf
>>> numbers = [0, 1, 2]
>>> colors = ['green', 'purple']
>>> cudf.MultiIndex.from_product([numbers, colors],
...                            names=['number', 'color'])
MultiIndex([(0,  'green'),
            (0, 'purple'),
            (1,  'green'),
            (1, 'purple'),
            (2,  'green'),
            (2, 'purple')],
           names=['number', 'color'])
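
The row order of a cartesian product can be previewed with the standard-library `itertools.product`. This is only an illustration of the ordering on plain Python lists, not the way hipDF builds the index on the GPU.

```python
from itertools import product

numbers = [0, 1, 2]
colors = ["green", "purple"]

# The last iterable varies fastest, as in the output above.
rows = list(product(numbers, colors))
print(rows)
```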

classmethod from_arrays(arrays, sortorder=None, names=None) → Self#

Convert arrays to MultiIndex.

Parameters#

arrays: list / sequence of array-likes: Each array-like gives one level’s value for each data point. len(arrays) is the number of levels.
sortorder: optional int: Not yet supported.
names: list / sequence of str, optional: Names for the levels in the index.

Returns#

MultiIndex

Examples#

>>> import cudf
>>> arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
>>> cudf.MultiIndex.from_arrays(arrays, names=('number', 'color'))
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue')],
           names=['number', 'color'])

swaplevel(i=-2, j=-1) → Self#

Swap level i with level j. Calling this method does not change the ordering of the values.

Parameters#

i: int or str, default -2: First level of index to be swapped.
j: int or str, default -1: Second level of index to be swapped.

Returns#

MultiIndex: A new MultiIndex.

Examples#

>>> import cudf
>>> mi = cudf.MultiIndex(levels=[['a', 'b'], ['bb', 'aa']],
...                    codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
>>> mi
MultiIndex([('a', 'bb'),
            ('a', 'aa'),
            ('b', 'bb'),
            ('b', 'aa')],
           )
>>> mi.swaplevel(0, 1)
MultiIndex([('bb', 'a'),
            ('aa', 'a'),
            ('bb', 'b'),
            ('aa', 'b')],
           )
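
That swapping levels leaves the row order untouched can be seen in a pure-Python sketch over row tuples. The helper `swaplevel_sketch` is hypothetical and supports only integer positions; level names are left out of this illustration.

```python
rows = [("a", "bb"), ("a", "aa"), ("b", "bb"), ("b", "aa")]


def swaplevel_sketch(rows, i=-2, j=-1):
    """Hypothetical helper: exchange positions i and j inside every row."""
    swapped = []
    for row in rows:
        items = list(row)
        items[i], items[j] = items[j], items[i]
        swapped.append(tuple(items))
    return swapped


result = swaplevel_sketch(rows, 0, 1)
print(result)
```

Each row keeps its position in the list; only the order of values inside the row changes.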

droplevel(level=-1) → Self | cudf.Index#

Removes the specified levels from the MultiIndex.

Parameters#

level: level name or index, list-like: Integer, name or list of such, specifying one or more levels to drop from the MultiIndex.

Returns#

A MultiIndex or Index object, depending on the number of remaining levels.

Examples#

>>> import cudf
>>> idx = cudf.MultiIndex.from_frame(
...     cudf.DataFrame(
...         {
...             "first": ["a", "a", "a", "b", "b", "b"],
...             "second": [1, 1, 2, 2, 3, 3],
...             "third": [0, 1, 2, 0, 1, 2],
...         }
...     )
... )

Dropping level by index:

>>> idx.droplevel(0)
MultiIndex([(1, 0),
            (1, 1),
            (2, 2),
            (2, 0),
            (3, 1),
            (3, 2)],
           names=['second', 'third'])

Dropping level by name:

>>> idx.droplevel("first")
MultiIndex([(1, 0),
            (1, 1),
            (2, 2),
            (2, 0),
            (3, 1),
            (3, 2)],
           names=['second', 'third'])

Dropping multiple levels:

>>> idx.droplevel(["first", "second"])
Index([0, 1, 2, 0, 1, 2], dtype='int64', name='third')
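
Why the return type depends on how many levels remain can be sketched in plain Python. The function `droplevel_sketch` is a hypothetical model over row tuples: it returns the remaining names with tuple rows when several levels are left, and a single name with flat values when only one is left.

```python
names = ["first", "second", "third"]
rows = [("a", 1, 0), ("a", 1, 1), ("a", 2, 2),
        ("b", 2, 0), ("b", 3, 1), ("b", 3, 2)]


def droplevel_sketch(rows, names, level):
    """Hypothetical helper: drop one or more levels given by name or position."""
    levels = level if isinstance(level, (list, tuple)) else [level]
    drop = {names.index(lv) if isinstance(lv, str) else lv % len(names)
            for lv in levels}
    keep = [pos for pos in range(len(names)) if pos not in drop]
    kept_names = [names[pos] for pos in keep]
    if len(keep) == 1:
        # One level left: the result is flat, like an Index.
        return kept_names[0], [row[keep[0]] for row in rows]
    return kept_names, [tuple(row[pos] for pos in keep) for row in rows]


two_left = droplevel_sketch(rows, names, "first")
one_left = droplevel_sketch(rows, names, ["first", "second"])
print(two_left)
print(one_left)
```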

to_pandas(*, nullable: bool = False, arrow_type: bool = False) → MultiIndex#

Convert to a Pandas Index.

Parameters#

nullable: bool, default False: If nullable is True, the resulting index will have a corresponding nullable Pandas dtype. If there is no corresponding nullable Pandas dtype present, the resulting dtype will be a regular pandas dtype. If nullable is False, the resulting index will either convert null values to np.nan or None depending on the dtype.
arrow_type: bool, default False: Return the Index with a pandas.ArrowDtype.

Notes#

nullable and arrow_type cannot both be set to True

Examples#

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.base.Index'>
>>> type(idx)
<class 'cudf.core.index.Index'>
>>> idx.to_pandas(arrow_type=True)
Index([-3, 10, 15, 20], dtype='int64[pyarrow]')

classmethod from_pandas(multiindex: pd.MultiIndex, nan_as_null=<no_default>) → Self#

Convert from a Pandas MultiIndex

Raises#

TypeError for invalid input type.

Examples#

>>> import cudf
>>> import pandas as pd
>>> pmi = pd.MultiIndex(levels=[['a', 'b'], ['c', 'd']],
...                     codes=[[0, 1], [1, 1]])
>>> cudf.from_pandas(pmi)
MultiIndex([('a', 'd'),
            ('b', 'd')],
           )

property is_unique: bool#: Return if the index has unique values.

property dtype: dtype#

property is_monotonic_increasing: bool#: Return if the index is monotonic increasing (only equal or increasing) values.

property is_monotonic_decreasing: bool#: Return if the index is monotonic decreasing (only equal or decreasing) values.

fillna(value) → Self#

Fill null values with the specified value.

Parameters#

value: scalar: Scalar value to use to fill nulls. This value cannot be list-like.

Returns#

filled : MultiIndex

Examples#

>>> import cudf
>>> index = cudf.MultiIndex(
...         levels=[["a", "b", "c", None], ["1", None, "5"]],
...         codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...         names=["x", "y"],
...       )
>>> index
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> index.fillna('hello')
MultiIndex([(    'a',     '1'),
            (    'a',     '5'),
            (    'b', 'hello'),
            (    'c', 'hello'),
            ('hello',     '1')],
           names=['x', 'y'])

unique(level: int | None = None) → Self | cudf.Index#: Return unique values in the index.

Returns#

Index without duplicates

nunique(dropna: bool = True) → int#

Returns a per column mapping with counts of unique values for each column.

Parameters#

dropna: bool, default True: Don’t include NaN in the counts.

Returns#

dict: Name and unique value counts of each column in frame.

memory_usage(deep: bool = False) → int#

Return the memory usage of an object.

Parameters#

deep: bool: The deep parameter is ignored and is only included for pandas compatibility.

Returns#

The total bytes used.

difference(other, sort=None) → Self#

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters#

other: Index or array-like
sort: False or None, default None: Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.

None: Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

False: Do not sort the result.

True: Sort the result (which may raise TypeError).

Returns#

difference : Index

Examples#

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Index([2, 1], dtype='int64')
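
The effect of the sort options can be modelled on plain Python lists. The function `difference_sketch` is a hypothetical illustration of the documented behaviour, not the library's implementation.

```python
def difference_sketch(left, right, sort=None):
    """Hypothetical model of the sort options on plain lists."""
    exclude = set(right)
    # dict.fromkeys keeps first-seen order while removing duplicates.
    result = [v for v in dict.fromkeys(left) if v not in exclude]
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            pass  # incomparable elements: keep the unsorted result
    elif sort:
        result = sorted(result)  # may raise TypeError
    return result


idx1 = [2, 1, 3, 4]
idx2 = [3, 4, 5, 6]
sorted_diff = difference_sketch(idx1, idx2)
unsorted_diff = difference_sketch(idx1, idx2, sort=False)
print(sorted_diff, unsorted_diff)
```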

append(other) → Self#

Append a collection of MultiIndex objects together

Parameters#

other : MultiIndex or list/tuple of MultiIndex objects

Returns#

appended : Index

Examples#

>>> import cudf
>>> idx1 = cudf.MultiIndex(
...     levels=[[1, 2], ['blue', 'red']],
...     codes=[[0, 0, 1, 1], [1, 0, 1, 0]]
... )
>>> idx2 = cudf.MultiIndex(
...     levels=[[3, 4], ['blue', 'red']],
...     codes=[[0, 0, 1, 1], [1, 0, 1, 0]]
... )
>>> idx1
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue')],
           )
>>> idx2
MultiIndex([(3,  'red'),
            (3, 'blue'),
            (4,  'red'),
            (4, 'blue')],
           )
>>> idx1.append(idx2)
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue'),
            (3,  'red'),
            (3, 'blue'),
            (4,  'red'),
            (4, 'blue')],
           )

get_indexer(target, method=None, limit=None, tolerance=None)#

Compute indexer and mask for new index given the current index.

The indexer should be then used as an input to ndarray.take to align the current data to the new index.

Parameters#

target: Index
method: {None, ‘pad’/‘ffill’, ‘backfill’/‘bfill’, ‘nearest’}, optional

default: exact matches only.

pad / ffill: find the PREVIOUS index value if no exact match.

backfill / bfill: use NEXT index value if no exact match.

nearest: use the NEAREST index value if no exact match. Tied distances are broken by preferring the larger index value.

tolerance: int or float, optional: Maximum distance from index value for inexact matches. The value of the index at the matching location must satisfy the equation abs(index[loc] - target) <= tolerance.

Returns#

cupy.ndarray: Integers from 0 to n - 1 indicating that the index at these positions matches the corresponding target values. Missing values in the target are marked by -1.

Examples#

>>> import cudf
>>> index = cudf.Index(['c', 'a', 'b'])
>>> index
Index(['c', 'a', 'b'], dtype='object')
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1,  2, -1], dtype=int32)
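The exact-match rule can be sketched in plain Python. This is an illustration only: it does not call hipDF or cuDF, `get_indexer_exact` is a hypothetical helper, and it assumes a unique index and the default `method=None`.

```python
def get_indexer_exact(index_labels, target_labels):
    """Return, for each target label, its position in index_labels or -1."""
    # Map each label to its position; assumes the index labels are unique.
    positions = {label: i for i, label in enumerate(index_labels)}
    # Labels that are absent from the index are marked with -1.
    return [positions.get(label, -1) for label in target_labels]


result = get_indexer_exact(['c', 'a', 'b'], ['a', 'b', 'x'])
print(result)  # [1, 2, -1]
```

The returned positions are what a take-style gather would use to realign data to the target labels.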

get_loc(key)#

Get integer location, slice or boolean mask for requested label.

Parameters#

key : label

Returns#

int or slice or boolean mask

If result is unique, return integer index
If index is monotonic, loc is returned as a slice object
Otherwise, a boolean mask is returned

Examples#

>>> import cudf
>>> unique_index = cudf.Index(list('abc'))
>>> unique_index.get_loc('b')
1
>>> monotonic_index = cudf.Index(list('abbc'))
>>> monotonic_index.get_loc('b')
slice(1, 3, None)
>>> non_monotonic_index = cudf.Index(list('abcb'))
>>> non_monotonic_index.get_loc('b')
array([False,  True, False,  True])
>>> numeric_unique_index = cudf.Index([1, 2, 3])
>>> numeric_unique_index.get_loc(3)
2

MultiIndex

>>> multi_index = cudf.MultiIndex.from_tuples([('a', 'd'), ('b', 'e'), ('b', 'f')])
>>> multi_index
MultiIndex([('a', 'd'),
            ('b', 'e'),
            ('b', 'f')],
        )
>>> multi_index.get_loc('b')
slice(1, 3, None)
>>> multi_index.get_loc(('b', 'e'))
1
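The three return forms can be mimicked in plain Python. This sketch does not call hipDF or cuDF; `get_loc_sketch` is a hypothetical helper, and it uses "the matches are contiguous" as a simplified stand-in for the library's monotonicity check.

```python
def get_loc_sketch(labels, key):
    """Mimic the three documented return forms: int, slice, or boolean mask."""
    hits = [i for i, label in enumerate(labels) if label == key]
    if not hits:
        raise KeyError(key)
    if len(hits) == 1:
        # A unique match is reported as a plain integer position.
        return hits[0]
    if hits[-1] - hits[0] + 1 == len(hits):
        # Contiguous matches can be described by a single slice.
        return slice(hits[0], hits[-1] + 1, None)
    # Scattered matches need a boolean mask, one flag per label.
    return [label == key for label in labels]


print(get_loc_sketch(list('abc'), 'b'))   # 1
print(get_loc_sketch(list('abbc'), 'b'))  # slice(1, 3, None)
print(get_loc_sketch(list('abcb'), 'b'))  # [False, True, False, True]
```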

union(other, sort=None) → Self#

Form the union of two Index objects.

Parameters#

other : Index or array-like
sort : bool or None, default None

Whether to sort the resulting Index.

None : Sort the result, except when self and other are equal, or when self or other has length 0.

False : do not sort the result.

True : Sort the result (which may raise TypeError).

Returns#

union : Index

Examples#

Union of an Index

>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Index([1, 2, 3, 4, 5, 6], dtype='int64')

MultiIndex case

>>> idx1 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
           )
>>> idx2 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
...    )
... )
>>> idx2
MultiIndex([(3,   'Red'),
            (3, 'Green'),
            (2,   'Red'),
            (2, 'Green')],
           )
>>> idx1.union(idx2)
MultiIndex([(1,  'Blue'),
            (1,   'Red'),
            (2,  'Blue'),
            (2, 'Green'),
            (2,   'Red'),
            (3, 'Green'),
            (3,   'Red')],
           )
>>> idx1.union(idx2, sort=False)
MultiIndex([(1,   'Red'),
            (1,  'Blue'),
            (2,   'Red'),
            (2,  'Blue'),
            (3,   'Red'),
            (3, 'Green'),
            (2, 'Green')],
           )
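The effect of the `sort` argument can be reproduced on plain tuples. This is a pure-Python sketch under stated assumptions, not the library's implementation: `union_sketch` is a hypothetical helper and rows are modelled as Python tuples.

```python
def union_sketch(left, right, sort=None):
    """Union of two row lists, keeping the first occurrence of each row."""
    seen = set()
    merged = []
    # Walk all of left, then right, skipping rows already collected.
    for row in list(left) + list(right):
        if row not in seen:
            seen.add(row)
            merged.append(row)
    if sort is False:
        return merged
    # sort=None skips sorting when the inputs are equal or either is empty.
    if sort is None and (list(left) == list(right) or not left or not right):
        return merged
    return sorted(merged)


rows1 = [(1, 'Red'), (1, 'Blue'), (2, 'Red'), (2, 'Blue')]
rows2 = [(3, 'Red'), (3, 'Green'), (2, 'Red'), (2, 'Green')]
print(union_sketch(rows1, rows2))              # lexicographically sorted
print(union_sketch(rows1, rows2, sort=False))  # order of first appearance
```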

all(axis=0, skipna=True, **kwargs)#

Return whether all elements are True in DataFrame.

Parameters#

axis: {0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

0 or ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 or ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.

skipna: bool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

Returns#

Series

Notes#

Parameters currently not supported are bool_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.all()
a     True
b    False
dtype: bool
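The `skipna` rules can be checked with a small pure-Python model. It does not call hipDF or cuDF; `all_sketch` is a hypothetical helper and `None` stands in for a null entry.

```python
def all_sketch(values, skipna=True):
    """Reduce one column with the documented skipna behaviour."""
    if skipna:
        # Nulls are dropped before the reduction.
        values = [v for v in values if v is not None]
    # With skipna=False, nulls count as True, as described above.
    return all(True if v is None else bool(v) for v in values)


print(all_sketch([3, 2, 3, 4]))    # True
print(all_sketch([7, 0, 10, 10]))  # False
print(all_sketch([None, None]))    # True, same as an empty column
```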

any(axis=0, skipna=True, **kwargs)#

Return whether any element is True in DataFrame.

Parameters#

axis: {0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

0 or ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 or ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.

skipna: bool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

Returns#

Series

Notes#

Parameters currently not supported are bool_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.any()
a    True
b    True
dtype: bool
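A small pure-Python model shows how an all-null column reduces to False when nulls are skipped. It does not call hipDF or cuDF; `any_sketch` is a hypothetical helper and `None` stands in for a null entry.

```python
def any_sketch(values, skipna=True):
    """Reduce one column with the documented skipna behaviour."""
    if skipna:
        # Nulls are dropped before the reduction.
        values = [v for v in values if v is not None]
    # With skipna=False, nulls count as True, as described above.
    return any(True if v is None else bool(v) for v in values)


print(any_sketch([7, 0, 10, 10]))             # True
print(any_sketch([None, None]))               # False, same as an empty column
print(any_sketch([None, 0], skipna=False))    # True
```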

argsort(by=None, axis=0, kind='quicksort', order=None, ascending=True, na_position='last') → ndarray#

Return the integer indices that would sort the Series values.

Parameters#

by: str or list of str, default None: Name or list of names to sort by. If None, sort by all columns.
axis: {0 or “index”}: Has no effect but is accepted for compatibility with numpy.
kind: {‘mergesort’, ‘quicksort’, ‘heapsort’, ‘stable’}, default ‘quicksort’: Choice of sorting algorithm. See numpy.sort() for more information. ‘mergesort’ and ‘stable’ are the only stable algorithms. Only quicksort is supported in cuDF.
order: None: Has no effect but is accepted for compatibility with numpy.
ascending: bool or list of bool, default True: If True, sort values in ascending order, otherwise descending.
na_position: {‘first’ or ‘last’}, default ‘last’: Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.

Returns#

cupy.ndarray: The indices sorted based on input.

Examples#

Series

>>> import cudf
>>> s = cudf.Series([3, 1, 2])
>>> s
0    3
1    1
2    2
dtype: int64
>>> s.argsort()
0    1
1    2
2    0
dtype: int32
>>> s[s.argsort()]
1    1
2    2
0    3
dtype: int64

DataFrame

>>> import cudf
>>> df = cudf.DataFrame({'foo': [3, 1, 2]})
>>> df.argsort()
array([1, 2, 0], dtype=int32)

Index

>>> import cudf
>>> idx = cudf.Index([3, 1, 2])
>>> idx.argsort()
array([1, 2, 0], dtype=int32)
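What the returned positions mean can be shown without a GPU. The sketch below is plain Python, not a hipDF or cuDF call; `argsort_sketch` is a hypothetical helper and ignores `na_position`.

```python
def argsort_sketch(values, ascending=True):
    """Return the positions that would sort values."""
    # Sort the positions 0..n-1 by the value found at each position.
    return sorted(range(len(values)), key=values.__getitem__,
                  reverse=not ascending)


values = [3, 1, 2]
order = argsort_sketch(values)
print(order)                       # [1, 2, 0]
print([values[i] for i in order])  # [1, 2, 3]
```

Indexing the original data with the result yields the sorted sequence, which is the same relationship as `s[s.argsort()]` in the Series example.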

drop_duplicates(keep='first', nulls_are_equal=True)#

Drop duplicate rows in index.

keep: {“first”, “last”, False}, default “first”

‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.

nulls_are_equal: bool, default True

Null elements are considered equal to other null elements.
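The three `keep` settings can be compared on plain tuples. This is a pure-Python sketch, not the library's implementation: `drop_duplicates_sketch` is a hypothetical helper, rows are modelled as tuples, and `nulls_are_equal` is not modelled.

```python
from collections import Counter


def drop_duplicates_sketch(rows, keep='first'):
    """Drop repeated rows according to the keep setting."""
    if keep is False:
        # Remove every row that occurs more than once.
        counts = Counter(rows)
        return [row for row in rows if counts[row] == 1]
    if keep == 'last':
        # Scanning the reversed list keeps the last occurrence instead.
        return drop_duplicates_sketch(rows[::-1], keep='first')[::-1]
    if keep != 'first':
        raise ValueError("keep must be 'first', 'last' or False")
    seen = set()
    kept = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept


rows = [(1, 'a'), (2, 'b'), (1, 'a'), (3, 'c')]
print(drop_duplicates_sketch(rows))                # [(1, 'a'), (2, 'b'), (3, 'c')]
print(drop_duplicates_sketch(rows, keep='last'))   # [(2, 'b'), (1, 'a'), (3, 'c')]
print(drop_duplicates_sketch(rows, keep=False))    # [(2, 'b'), (3, 'c')]
```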

dropna(how='any')#

Drop null rows from Index.

how: {“any”, “all”}, default “any”: Specifies how to decide whether to drop a row. “any” (default) drops rows containing at least one null value. “all” drops only rows containing all null values.
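The difference between the two `how` settings is easiest to see on a small example. The sketch below is plain Python and does not call hipDF or cuDF; `dropna_sketch` is a hypothetical helper, rows are tuples, and `None` stands in for a null.

```python
def dropna_sketch(rows, how='any'):
    """Drop rows with nulls: 'any' null drops the row, or only 'all' nulls."""
    if how not in ('any', 'all'):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == 'any' else all
    return [row for row in rows if not test(v is None for v in row)]


rows = [(1, 'a'), (None, 'b'), (None, None)]
print(dropna_sketch(rows))             # [(1, 'a')]
print(dropna_sketch(rows, how='all'))  # [(1, 'a'), (None, 'b')]
```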

duplicated(keep='first') → cupy.ndarray#

Indicate duplicate index values.

Duplicated values are indicated as True values in the resulting array. Either all duplicates, all except the first, or all except the last occurrence of duplicates can be indicated.

Parameters#

keep: {‘first’, ‘last’, False}, default ‘first’

The value or values in a set of duplicates to mark as missing.

'first' : Mark duplicates as True except for the first occurrence.
'last' : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.

Returns#

cupy.ndarray[bool]

Examples#

By default, for each set of duplicated values, the first occurrence is set to False and all others to True:

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])

which is equivalent to

>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])

By using ‘last’, the last occurrence of each set of duplicated values is set to False and all others to True:

>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])

By setting keep to False, all duplicates are True:

>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
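The marking rules for each `keep` value can be reproduced in plain Python. This sketch does not call hipDF or cuDF, and `duplicated_sketch` is a hypothetical helper that returns a list rather than a cupy array.

```python
from collections import Counter


def duplicated_sketch(values, keep='first'):
    """Flag duplicates; the occurrence named by keep stays False."""
    if keep is False:
        # Every member of a duplicated group is flagged.
        counts = Counter(values)
        return [counts[v] > 1 for v in values]
    if keep == 'last':
        # Scan from the end so the last occurrence is the unflagged one.
        return duplicated_sketch(values[::-1], keep='first')[::-1]
    if keep != 'first':
        raise ValueError("keep must be 'first', 'last' or False")
    seen = set()
    flags = []
    for v in values:
        flags.append(v in seen)
        seen.add(v)
    return flags


labels = ['lama', 'cow', 'lama', 'beetle', 'lama']
print(duplicated_sketch(labels))               # [False, False, True, False, True]
print(duplicated_sketch(labels, keep='last'))  # [True, False, True, False, False]
print(duplicated_sketch(labels, keep=False))   # [True, False, True, False, True]
```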

property empty#

equals(other) → bool#

Test whether two objects contain the same elements.

This function allows two objects to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type.

Parameters#

other: Index, Series, DataFrame: The other object to be compared with.

Returns#

bool: True if all elements are the same in both objects, False otherwise.

Examples#

>>> import cudf

Comparing Series with equals:

>>> s = cudf.Series([1, 2, 3])
>>> other = cudf.Series([1, 2, 3])
>>> s.equals(other)
True
>>> different = cudf.Series([1.5, 2, 3])
>>> s.equals(different)
False

Comparing DataFrames with equals:

>>> df = cudf.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20
>>> exactly_equal = cudf.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True

For two DataFrames to compare equal, the types of column values must be equal, but the types of column labels need not:

>>> different_column_type = cudf.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True

factorize(sort: bool = False, use_na_sentinel: bool = True)#

find_label_range(loc: slice) → slice#

Translate a label-based slice to an index-based slice

Parameters#

loc: slice to search for.

Notes#

As with all label-based searches, the slice is right-closed.

Returns#

New slice translated into integer indices of the index (right-open).
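The translation from a right-closed label slice to a right-open integer slice can be sketched with the standard `bisect` module. This is an illustration under stated assumptions, not the library's code: `find_label_range_sketch` is a hypothetical helper, the labels are assumed to be sorted in ascending order, and the slice step is ignored.

```python
from bisect import bisect_left, bisect_right


def find_label_range_sketch(sorted_labels, loc):
    """Translate a label slice (stop included) into integer positions."""
    # The start position is the first label that is >= loc.start.
    start = 0 if loc.start is None else bisect_left(sorted_labels, loc.start)
    # The stop label is included, so the integer stop lands one past it.
    stop = (len(sorted_labels) if loc.stop is None
            else bisect_right(sorted_labels, loc.stop))
    return slice(start, stop)


labels = [10, 20, 30, 40]
positions = find_label_range_sketch(labels, slice(20, 30))
print(positions)          # slice(1, 3, None)
print(labels[positions])  # [20, 30]
```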

classmethod from_arrow(data: Table) → Self#

Convert from PyArrow Table to Frame

Parameters#

data : PyArrow Table

Raises#

TypeError for invalid input type.

Examples#

>>> import cudf
>>> import pyarrow as pa
>>> data = pa.table({"a":[1, 2, 3], "b":[4, 5, 6]})
>>> cudf.core.frame.Frame.from_arrow(data)
   a  b
0  1  4
1  2  5
2  3  6

property has_duplicates#

property hasnans#

Return True if there are any NaNs or nulls.

Returns#

out: bool: If Series has at least one NaN or null value, return True; otherwise return False.

Examples#

>>> import cudf
>>> import numpy as np
>>> index = cudf.Index([1, 2, np.nan, 3, 4], nan_as_null=False)
>>> index
Index([1.0, 2.0, nan, 3.0, 4.0], dtype='float64')
>>> index.hasnans
True

hasnans returns True for the presence of any NA values:

>>> index = cudf.Index([1, 2, None, 3, 4])
>>> index
Index([1, 2, <NA>, 3, 4], dtype='int64')
>>> index.hasnans
True

intersection(other, sort=False)#

Form the intersection of two Index objects.

This returns a new Index with elements common to the index and other.

Parameters#

other : Index or array-like
sort : False or None, default False

Whether to sort the resulting index.

False : do not sort the result.

None : sort the result, except when self and other are equal or when the values cannot be compared.

True : Sort the result (which may raise TypeError).

Returns#

intersection : Index

Examples#

>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Index([3, 4], dtype='int64')

MultiIndex case

>>> idx1 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 3, 4], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx2 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (3,  'Red'),
            (4, 'Blue')],
        )
>>> idx2
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
        )
>>> idx1.intersection(idx2)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
        )
>>> idx1.intersection(idx2, sort=False)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
        )
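The role of `sort` can be illustrated on plain tuples. This pure-Python sketch does not call hipDF or cuDF; `intersection_sketch` is a hypothetical helper, and with `sort=False` it keeps the order of the left operand, which matches the example above but is an assumption about ordering in general.

```python
def intersection_sketch(left, right, sort=False):
    """Rows present in both inputs, without repeats."""
    right_rows = set(right)
    seen = set()
    common = []
    for row in left:
        if row in right_rows and row not in seen:
            seen.add(row)
            common.append(row)
    if sort is False:
        return common
    if sort is None:
        # Skip sorting for equal inputs or values that cannot be compared.
        if list(left) == list(right):
            return common
        try:
            return sorted(common)
        except TypeError:
            return common
    return sorted(common)


rows1 = [(1, 'Red'), (1, 'Blue'), (3, 'Red'), (4, 'Blue')]
rows2 = [(1, 'Red'), (1, 'Blue'), (2, 'Red'), (2, 'Blue')]
print(intersection_sketch(rows1, rows2))             # [(1, 'Red'), (1, 'Blue')]
print(intersection_sketch(rows1, rows2, sort=True))  # [(1, 'Blue'), (1, 'Red')]
```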

is_boolean()#

Check if the Index only consists of booleans.

Deprecated since version 23.04: Use cudf.api.types.is_bool_dtype instead.

Returns#

bool: Whether or not the Index only consists of booleans.

Examples#

>>> import cudf
>>> idx = cudf.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = cudf.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = cudf.Index([1, 2, 3])
>>> idx.is_boolean()
False

is_categorical()#

Check if the Index holds categorical data.

Deprecated since version 23.04: Use cudf.api.types.is_categorical_dtype instead.

Returns#

bool: True if the Index is categorical.

Examples#

>>> import cudf
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = cudf.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0        Peter
1       Victor
2    Elisabeth
3          Mar
dtype: object
>>> s.index.is_categorical()
False

is_floating()#

Check if the Index is a floating type.

The Index may consist of only floats, NaNs, or a mix of floats, integers, or NaNs.

Deprecated since version 23.04: Use cudf.api.types.is_float_dtype instead.

Returns#

bool: Whether or not the Index consists only of floats, NaNs, or a mix of floats, integers, or NaNs.

Examples#

>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4, np.nan], nan_as_null=False)
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_floating()
False

is_integer()#

Check if the Index only consists of integers.

Deprecated since version 23.04: Use cudf.api.types.is_integer_dtype instead.

Returns#

bool: Whether or not the Index only consists of integers.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False

is_interval()#

Check if the Index holds Interval objects.

Deprecated since version 23.04: Use cudf.api.types.is_interval_dtype instead.

Returns#

bool: Whether or not the Index holds Interval objects.

Examples#

>>> import cudf
>>> import pandas as pd
>>> idx = cudf.from_pandas(
...     pd.Index([pd.Interval(left=0, right=5),
...               pd.Interval(left=5, right=10)])
... )
>>> idx.is_interval()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_interval()
False

is_numeric()#

Check if the Index only consists of numeric data.

Deprecated since version 23.04: Use cudf.api.types.is_any_real_numeric_dtype instead.

Returns#

bool: Whether or not the Index only consists of numeric data.

Examples#

>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = cudf.Index(["Apple", "cold"])
>>> idx.is_numeric()
False

is_object()#

Check if the Index is of the object dtype.

Deprecated since version 23.04: Use cudf.api.types.is_object_dtype instead.

Returns#

bool: Whether or not the Index is of the object dtype.

Examples#

>>> import cudf
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False

isna()#

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index: Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples#

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isna()
array([False, False,  True,  True, False, False])
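The classification rule (nulls and NaN are missing; empty strings and infinities are not) can be modelled in plain Python. This sketch does not call hipDF or cuDF; `isna_sketch` is a hypothetical helper and `None` stands in for a masked null.

```python
import math


def isna_sketch(values):
    """Flag entries that count as missing: None or a float NaN."""
    return [
        v is None or (isinstance(v, float) and math.isnan(v))
        for v in values
    ]


mask = isna_sketch([1, 2, None, float('nan'), 0.32, float('inf'), ''])
print(mask)  # [False, False, True, True, False, False, False]
```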

isnull()#

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index: Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples#

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isna()
array([False, False,  True,  True, False, False])

join(other, how='left', level=None, return_indexers=False, sort=False)#

Compute join_index and indexers to conform data structures to the new index.

Parameters#

other : Index
how : {‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexers : bool, default False
sort : bool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns#

index

Examples#

>>> import cudf
>>> lhs = cudf.DataFrame({
...     "a": [2, 3, 1],
...     "b": [3, 4, 2],
... }).set_index(['a', 'b']).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a": [1, 4, 3]}).set_index('a').index
>>> rhs
Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
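The inner join in this example keeps the rows of the MultiIndex whose shared level also appears in the other index. A pure-Python sketch of that filtering follows. It does not call hipDF or cuDF, `inner_join_on_level` is a hypothetical helper, and keeping the left operand's order is an assumption that happens to match the output above.

```python
def inner_join_on_level(multi_rows, level, other_labels):
    """Keep rows whose value at the given level appears in other_labels."""
    wanted = set(other_labels)
    # Rows are visited in their original order, so that order is preserved.
    return [row for row in multi_rows if row[level] in wanted]


lhs_rows = [(2, 3), (3, 4), (1, 2)]  # levels 'a' and 'b'
rhs_labels = [1, 4, 3]               # level 'a' only
joined = inner_join_on_level(lhs_rows, 0, rhs_labels)
print(joined)  # [(3, 4), (1, 2)]
```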

max(axis=0, skipna=True, numeric_only=False, **kwargs)#

Return the maximum of the values in the DataFrame.

Parameters#

axis: {index (0), columns(1)}: Axis for the function to be applied on.
skipna: bool, default True: Exclude NA/null values when computing the result.
numeric_only: bool, default False: If True, includes only float, int, boolean columns. If False, raises an error in case there are non-numeric columns.

Returns#

Series

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.max()
a     4
b    10
dtype: int64

min(axis=0, skipna=True, numeric_only=False, **kwargs)#

Return the minimum of the values in the DataFrame.

Parameters#

axis: {index (0), columns(1)}: Axis for the function to be applied on.
skipna: bool, default True: Exclude NA/null values when computing the result.
numeric_only: bool, default False: If True, includes only float, int, boolean columns. If False, raises an error in case there are non-numeric columns.

Returns#

Series

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> min_series = df.min()
>>> min_series
a    1
b    7
dtype: int64
>>> min_series.min()
1

notna()#

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index: Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples#

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notna()
array([ True,  True, False, False,  True,  True])

notnull()#

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index: Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples#

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notna()
array([ True,  True, False, False,  True,  True])

searchsorted(values, side: Literal['left', 'right'] = 'left', sorter=None, ascending: bool = True, na_position: Literal['first', 'last'] = 'last') → ScalarLike | cupy.ndarray#

Find indices where elements should be inserted to maintain order

Parameters#

values: Frame (shape must be consistent with self): Values to be hypothetically inserted into self.
side: str {‘left’, ‘right’}, optional, default ‘left’: If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
sorter: 1-D array-like, optional: Optional array of integer indices that sort self into ascending order. They are typically the result of np.argsort. Currently not supported.
ascending: bool, optional, default True: Sorted Frame is in ascending order (otherwise descending).
na_position: str {‘last’, ‘first’}, optional, default ‘last’: Position of null values in sorted order.

Returns#

1-D cupy array of insertion points

Examples#

>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1

>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
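The `side` argument corresponds to the left and right variants of binary search, which the standard `bisect` module provides. The sketch below is plain Python, not a hipDF or cuDF call; `searchsorted_sketch` is a hypothetical helper and assumes one-dimensional data sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def searchsorted_sketch(sorted_values, values, side='left'):
    """Return the insertion point of each value in an ascending list."""
    if side not in ('left', 'right'):
        raise ValueError("side must be 'left' or 'right'")
    # 'left' inserts before equal entries, 'right' inserts after them.
    find = bisect_left if side == 'left' else bisect_right
    return [find(sorted_values, v) for v in values]


data = [1, 2, 3]
print(searchsorted_sketch(data, [0, 4]))                # [0, 3]
print(searchsorted_sketch(data, [1, 3], side='left'))   # [0, 2]
print(searchsorted_sketch(data, [1, 3], side='right'))  # [1, 3]
```

Binary search relies on the sorted precondition, which is why unsorted input can produce a wrong position.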

property shape#: Get a tuple representing the dimensionality of the data.

shift(periods=1, freq=None)#: Not yet implemented

sort_values(return_indexer=False, ascending=True, na_position='last', key=None) → Self | tuple[Self, cupy.ndarray]#

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters#

return_indexer: bool, default False: Should the indices that would sort the index be returned.
ascending: bool, default True: Should the index values be sorted in an ascending order.
na_position: {‘first’ or ‘last’}, default ‘last’: Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.
key: None, optional: This parameter is NON-FUNCTIONAL.

Returns#

sorted_index: Index: Sorted copy of the index.
indexer: cupy.ndarray, optional: The indices that the index itself was sorted by.

Examples#

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
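MultiIndex rows sort lexicographically, level by level, which is also how Python orders tuples. The sketch below uses that to show the relationship between the sorted rows and the optional indexer. It is plain Python, not a hipDF or cuDF call; `sort_values_sketch` is a hypothetical helper and ignores `na_position`.

```python
def sort_values_sketch(rows, ascending=True, return_indexer=False):
    """Sort rows and optionally return the positions that sorted them."""
    indexer = sorted(range(len(rows)), key=rows.__getitem__,
                     reverse=not ascending)
    ordered = [rows[i] for i in indexer]
    return (ordered, indexer) if return_indexer else ordered


rows = [(1, 1), (1, 5), (3, 11), (4, 11), (-10, 1)]
ordered, indexer = sort_values_sketch(rows, return_indexer=True)
print(ordered)  # [(-10, 1), (1, 1), (1, 5), (3, 11), (4, 11)]
print(indexer)  # [4, 0, 1, 2, 3]
```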

property str#: Not yet implemented.

to_arrow() → Table#

Convert to arrow Table

Examples#

>>> import cudf
>>> df = cudf.DataFrame(
...     {"a":[1, 2, 3], "b":[4, 5, 6]}, index=[1, 2, 3])
>>> df.to_arrow()
pyarrow.Table
a: int64
b: int64
index: int64
----
a: [[1,2,3]]
b: [[4,5,6]]
index: [[1,2,3]]

to_cupy(dtype: Dtype | None = None, copy: bool = False, na_value=None) → cupy.ndarray#

Convert the Frame to a CuPy array.

Parameters#

dtype: str or numpy.dtype, optional: The dtype to pass to numpy.asarray().
copy: bool, default False: Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_cupy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.
na_value: Any, default None: The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

Returns#

cupy.ndarray

to_dlpack()#

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters#

cudf_obj : DataFrame, Series, Index, or Column

Returns#

pycapsule_obj: PyCapsule: Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_list()#

tolist()#

repeat(repeats, axis=None) → Self#

Repeat elements of an Index.

Returns a new Index where each element of the current Index is repeated consecutively a given number of times.

Parameters#

repeats: int, or array of ints: The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns#

Index: A newly created object of same type as caller with repeated elements.

Examples#

>>> import cudf
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
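Both accepted forms of `repeats`, a single integer or one count per element, can be sketched in plain Python. This does not call hipDF or cuDF, and `repeat_sketch` is a hypothetical helper.

```python
def repeat_sketch(values, repeats):
    """Repeat each element consecutively; repeats is an int or per-element list."""
    if isinstance(repeats, int):
        # A single count applies to every element.
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of elements")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


print(repeat_sketch([10, 22], 2))              # [10, 10, 22, 22]
print(repeat_sketch([10, 22, 33], [1, 0, 2]))  # [10, 33, 33]
print(repeat_sketch([10, 22], 0))              # []
```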
