hipdf.Index#

Applies to Linux

class hipdf.Index(data=None, dtype=None, copy=False, name=<no_default>, tupleize_cols=True, nan_as_null=True, **kwargs)#

Bases: BaseIndex

The basic object storing row labels for all cuDF objects.

Parameters#

data : array-like (1-dimensional) or DataFrame

If data is a DataFrame, a MultiIndex is returned.

dtype : NumPy dtype (default: object)

If dtype is None, the dtype that best fits the data is used.

copy : bool

Make a copy of the input data.

name : object

Name to be stored in the index.

tupleize_cols : bool (default: True)

When True, attempt to create a MultiIndex if possible. tupleize_cols=False is not yet supported.

nan_as_null : bool, default True

If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
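The effect of nan_as_null can be pictured with a small pure-Python model. This is only a sketch of the documented behavior, not the library's implementation: normalize_nans is a hypothetical helper, and None stands in for a null value.

```python
import math


def normalize_nans(values, nan_as_null=True):
    """Hypothetical model: map float NaN to None (null) when nan_as_null is true."""
    if not nan_as_null:
        # NaN values are left exactly as they are.
        return list(values)
    return [
        None if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


data = [10.0, 20.0, float("nan"), 40.0]
as_null = normalize_nans(data)                  # NaN replaced by a null
kept = normalize_nans(data, nan_as_null=False)  # NaN preserved
print(as_null)
print(kept)
```

The distinction matters downstream: a null is a missing entry, while NaN remains an ordinary floating-point value.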

Returns#

Index

cudf Index

Warnings#

This class should not be subclassed. It is designed as a factory for different subclasses of BaseIndex depending on the provided input. If you absolutely must, and if you’re intimately familiar with the internals of cuDF, subclass BaseIndex instead.

Examples#

>>> import cudf
>>> cudf.Index([1, 2, 3], dtype="uint64", name="a")
UInt64Index([1, 2, 3], dtype='uint64', name='a')
__init__()#

Methods

__init__()

any()

Return whether any element is True in the Index.

append(other)

Append a collection of Index objects together.

argsort(*args, **kwargs)

Return the integer indices that would sort the index.

astype(dtype[, copy])

Create an Index with values cast to the given dtype.

copy([deep])

deserialize(header, frames)

Generate an object from a serialized representation.

device_deserialize(header, frames)

Perform device-side deserialization tasks.

device_serialize()

Serialize data and metadata associated with device memory.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

drop_duplicates([keep, nulls_are_equal])

Drop duplicate rows in index.

dropna([how])

Drop null rows from Index.

duplicated([keep])

Indicate duplicate index values.

equals(other)

Determine if two Index objects contain the same elements.

factorize([sort, na_sentinel, use_na_sentinel])

fillna(value[, downcast])

Fill null values with the specified value.

find_label_range(loc)

Translate a label-based slice to an index-based slice

from_arrow(obj)

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_loc(key[, method, tolerance])

get_slice_bound(label, side[, kind])

Calculate slice bound that corresponds to given label.

host_deserialize(header, frames)

Perform host-side deserialization tasks.

host_serialize()

Serialize data and metadata associated with host memory.

intersection(other[, sort])

Form the intersection of two Index objects.

is_boolean()

Check if the Index only consists of booleans.

is_categorical()

Check if the Index holds categorical data.

is_floating()

Check if the Index is a floating type.

is_integer()

Check if the Index only consists of integers.

is_interval()

Check if the Index holds Interval objects.

is_numeric()

Check if the Index only consists of numeric data.

is_object()

Check if the Index is of the object dtype.

isin(values)

Return a boolean array where the index values are in values.

isna()

Detect missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

max()

The maximum value of the index.

memory_usage([deep])

Return the memory usage of an object.

min()

The minimum value of the index.

notna()

Detect existing (non-missing) values.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeat elements of an Index.

searchsorted(value[, side, ascending, ...])

Find index where elements should be inserted to maintain order

serialize()

Generate an equivalent serializable representation of an object.

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq])

Not yet implemented

sort_values([return_indexer, ascending, ...])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

take(indices[, axis, allow_fill, fill_value])

Return a new index containing the rows specified by indices

to_arrow()

Convert to a suitable Arrow object.

to_cupy()

Convert to a cupy array.

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_list()

to_numpy()

Convert to a numpy array.

to_pandas([nullable])

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

tolist()

union(other[, sort])

Form the union of two Index objects.

unique()

Return unique values in the index.

where(cond[, other, inplace])

Replace values where the condition is False.

Attributes

dtype

empty

has_duplicates

hasnans

Return True if there are any NaNs or nulls.

is_monotonic

Return boolean if values in the object are monotonically increasing.

is_monotonic_decreasing

Return boolean if values in the object are monotonically decreasing.

is_monotonic_increasing

Return boolean if values in the object are monotonically increasing.

is_unique

Return whether the index has unique values.

name

Returns the name of the Index.

names

Returns a tuple containing the name of the Index.

ndim

Number of dimensions of the underlying data, by definition 1.

nlevels

Number of levels.

shape

Get a tuple representing the dimensionality of the data.

size

str

Not yet implemented.

values

classmethod from_arrow(obj)#
property is_monotonic_increasing#

Return boolean if values in the object are monotonically increasing.

Returns#

bool

__getitem__(key)#
any()#

Return whether any element is True in the Index.

append(other)#

Append a collection of Index objects together.

Parameters#

other : Index or list/tuple of indices

Returns#

appended : Index

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append accepts list of Index objects

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(*args, **kwargs)#

Return the integer indices that would sort the index.

Parameters vary by subclass.

astype(dtype, copy: bool = True)#

Create an Index with values cast to the given dtype.

The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters#

dtype : numpy.dtype

A numpy.dtype to cast the entire Index object to.

copy : bool, default True

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns#

Index

Index with values cast to specified dtype.

Examples#

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
copy(deep: bool = True) Self#
difference(other, sort=None)#

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters#

other : Index or array-like

sort : False or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.
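The two sort options can be sketched in plain Python. This is a model of the documented semantics under stated assumptions, not the library's code: difference here is a hypothetical stand-alone function working on lists of hashable values.

```python
def difference(left, right, sort=None):
    """Hypothetical model of the documented `sort` semantics."""
    exclude = set(right)
    seen = set()
    result = []
    for value in left:  # keep first-seen order from `left`
        if value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: fall back to the unsorted result.
            pass
    return result


sorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6])
unsorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False)
mixed_diff = difference([2, "a", 1], [1])  # int and str cannot be ordered
print(sorted_diff, unsorted_diff, mixed_diff)
```

With sort=None the failure to order mixed types is swallowed rather than raised, which is the behavior the parameter description refers to.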

Returns#

difference : Index

Examples#

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
drop_duplicates(keep='first', nulls_are_equal=True)#

Drop duplicate rows in index.

keep : {“first”, “last”, False}, default “first”
  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

nulls_are_equal: bool, default True

Null elements are considered equal to other null elements.
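The keep options are easiest to compare side by side. The sketch below is a pure-Python model, not the library's implementation: drop_duplicates is a hypothetical helper over a list, it does not model nulls_are_equal, and it assumes the surviving elements keep their original relative order.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Hypothetical model of the `keep` options."""
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep == "first":
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Scan the reversed input so the last occurrence wins, then restore order.
        return drop_duplicates(values[::-1], keep="first")[::-1]
    raise ValueError("keep must be 'first', 'last', or False")


labels = ["lama", "cow", "lama", "beetle", "lama"]
first = drop_duplicates(labels)
last = drop_duplicates(labels, keep="last")
none_kept = drop_duplicates(labels, keep=False)
print(first, last, none_kept)
```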

dropna(how='any')#

Drop null rows from Index.

how : {“any”, “all”}, default “any”

Specifies how to decide whether to drop a row. “any” (default) drops rows containing at least one null value. “all” drops only rows containing all null values.

property dtype#
duplicated(keep='first')#

Indicate duplicate index values.

Duplicated values are indicated as True values in the resulting array. Either all duplicates, all except the first, or all except the last occurrence of duplicates can be indicated.

Parameters#

keep : {‘first’, ‘last’, False}, default ‘first’

Determines which occurrence in a set of duplicates, if any, is not marked as a duplicate.

  • 'first' : Mark duplicates as True except for the first occurrence.

  • 'last' : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns#

cupy.ndarray[bool]

See Also#

Series.duplicated : Equivalent method on cudf.Series.

DataFrame.duplicated : Equivalent method on cudf.DataFrame.

Index.drop_duplicates : Remove duplicate values from Index.

Examples#

By default, for each set of duplicated values, the first occurrence is set to False and all others to True:

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])

which is equivalent to

>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])

By using ‘last’, the last occurrence of each set of duplicated values is set to False and all others to True:

>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])

By setting keep to False, all duplicates are True:

>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
property empty#
equals(other)#

Determine if two Index objects contain the same elements.

Returns#

out: bool

True if “other” is an Index and it has the same elements as calling index; False otherwise.

factorize(sort=False, na_sentinel=None, use_na_sentinel=None)#
fillna(value, downcast=None)#

Fill null values with the specified value.

Parameters#

value : scalar

Scalar value to use to fill nulls. This value cannot be list-like.

downcast : dict, default None

This parameter is currently NON-FUNCTIONAL.

Returns#

filled : Index

Examples#

>>> import cudf
>>> index = cudf.Index([1, 2, None, 4])
>>> index
Int64Index([1, 2, <NA>, 4], dtype='int64')
>>> index.fillna(3)
Int64Index([1, 2, 3, 4], dtype='int64')
find_label_range(loc: slice) slice#

Translate a label-based slice to an index-based slice

Parameters#

loc

slice to search for.

Notes#

As with all label-based searches, the slice is right-closed.

Returns#

New slice translated into integer indices of the index (right-open).
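The right-closed to right-open translation can be sketched with the standard bisect module. This is an illustrative model only: find_label_range below is a hypothetical stand-alone function, and it assumes the labels are sorted in ascending order and the slice has no step.

```python
from bisect import bisect_left, bisect_right


def find_label_range(labels, loc):
    """Translate a right-closed label slice into a right-open positional slice.

    Assumes `labels` is sorted ascending and `loc.step` is None.
    """
    start = 0 if loc.start is None else bisect_left(labels, loc.start)
    # bisect_right lands one past the last matching label, so the stop label
    # itself is included once the positional slice is applied.
    stop = len(labels) if loc.stop is None else bisect_right(labels, loc.stop)
    return slice(start, stop)


labels = [10, 20, 20, 30, 40]
positions = find_label_range(labels, slice(20, 30))
selected = labels[positions]
print(positions, selected)
```

Because the stop label is included, a label slice 20:30 selects every 20 and the 30, whereas the positional slice that comes back follows the usual half-open convention.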

classmethod from_pandas(index, nan_as_null=<no_default>)#

Convert from a Pandas Index.

Parameters#

index : Pandas Index object

The Pandas Index object to convert to a cuDF Index.

nan_as_null : bool, default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises#

TypeError for invalid input type.

Examples#

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)#

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters#

level : int or str

It is either the integer position or the name of the level.

Returns#

Index

Calling object, as there is only one level in the Index.

See Also#

cudf.MultiIndex.get_level_values : Get values for a level of a MultiIndex.

Notes#

For Index, level should be 0, since there are no multiple levels.

Examples#

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_loc(key, method=None, tolerance=None)#
get_slice_bound(label, side: str, kind=None) int#

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters#

label : object

side : {‘left’, ‘right’}

kind : {‘ix’, ‘loc’, ‘getitem’}

Returns#

int

Index of label.

property has_duplicates#
property hasnans#

Return True if there are any NaNs or nulls.

Returns#

out : bool

True if the Index has at least one NaN or null value, False otherwise.

Examples#

>>> import cudf
>>> import numpy as np
>>> index = cudf.Index([1, 2, np.nan, 3, 4], nan_as_null=False)
>>> index
Float64Index([1.0, 2.0, nan, 3.0, 4.0], dtype='float64')
>>> index.hasnans
True

hasnans returns True for the presence of any NA values:

>>> index = cudf.Index([1, 2, None, 3, 4])
>>> index
Int64Index([1, 2, <NA>, 3, 4], dtype='int64')
>>> index.hasnans
True
intersection(other, sort=False)#

Form the intersection of two Index objects.

This returns a new Index with elements common to the index and other.

Parameters#

other : Index or array-like

sort : False or None, default False

Whether to sort the resulting index.

  • False : do not sort the result.

  • None : sort the result, except when self and other are equal or when the values cannot be compared.

Returns#

intersection : Index

Examples#

>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

MultiIndex case

>>> idx1 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 3, 4], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx2 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (3,  'Red'),
            (4, 'Blue')],
        )
>>> idx2
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
        )
>>> idx1.intersection(idx2)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
        )
>>> idx1.intersection(idx2, sort=False)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
        )
is_boolean()#

Check if the Index only consists of booleans.

Deprecated since version 23.04: Use cudf.api.types.is_bool_dtype instead.

Returns#

bool

Whether or not the Index only consists of booleans.

See Also#

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = cudf.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = cudf.Index([1, 2, 3])
>>> idx.is_boolean()
False
is_categorical()#

Check if the Index holds categorical data.

Deprecated since version 23.04: Use cudf.api.types.is_categorical_dtype instead.

Returns#

bool

True if the Index is categorical.

See Also#

CategoricalIndex : Index for categorical data.

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = cudf.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0        Peter
1       Victor
2    Elisabeth
3          Mar
dtype: object
>>> s.index.is_categorical()
False
is_floating()#

Check if the Index is a floating type.

The Index may consist of only floats, NaNs, or a mix of floats, integers, or NaNs.

Deprecated since version 23.04: Use cudf.api.types.is_float_dtype instead.

Returns#

bool

Whether or not the Index only consists of floats, NaNs, or a mix of floats, integers, or NaNs.

See Also#

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4, np.nan], nan_as_null=False)
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
is_integer()#

Check if the Index only consists of integers.

Deprecated since version 23.04: Use cudf.api.types.is_integer_dtype instead.

Returns#

bool

Whether or not the Index only consists of integers.

See Also#

is_boolean : Check if the Index only consists of booleans.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
is_interval()#

Check if the Index holds Interval objects.

Deprecated since version 23.04: Use cudf.api.types.is_interval_dtype instead.

Returns#

bool

Whether or not the Index holds Interval objects.

See Also#

IntervalIndex : Index for Interval objects.

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

Examples#

>>> import cudf
>>> import pandas as pd
>>> idx = cudf.from_pandas(
...     pd.Index([pd.Interval(left=0, right=5),
...               pd.Interval(left=5, right=10)])
... )
>>> idx.is_interval()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
property is_monotonic#

Return boolean if values in the object are monotonically increasing.

This property is an alias for is_monotonic_increasing.

Returns#

bool

property is_monotonic_decreasing#

Return boolean if values in the object are monotonically decreasing.

Returns#

bool
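The monotonicity properties can be modeled in a few lines of plain Python. This sketch is not the library's implementation; it assumes, following the pandas convention, that equal neighbouring values still count as monotonic, and the two function names are hypothetical stand-ins for the properties.

```python
def is_monotonic_increasing(values):
    """True when each element is >= its predecessor (ties allowed in this model)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    """True when each element is <= its predecessor (ties allowed in this model)."""
    return all(a >= b for a, b in zip(values, values[1:]))


rising = is_monotonic_increasing([1, 2, 2, 5])
falling = is_monotonic_decreasing([9, 4, 4, 1])
neither = is_monotonic_increasing([1, 3, 2])
print(rising, falling, neither)
```

Under this model an empty or single-element sequence is both increasing and decreasing, since there are no adjacent pairs to violate either condition.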

is_numeric()#

Check if the Index only consists of numeric data.

Deprecated since version 23.04: Use cudf.api.types.is_any_real_numeric_dtype instead.

Returns#

bool

Whether or not the Index only consists of numeric data.

See Also#

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = cudf.Index(["Apple", "cold"])
>>> idx.is_numeric()
False
is_object()#

Check if the Index is of the object dtype.

Deprecated since version 23.04: Use cudf.api.types.is_object_dtype instead.

Returns#

bool

Whether or not the Index is of the object dtype.

See Also#

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
property is_unique#

Return whether the index has unique values.

isin(values)#

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters#

values : set, list-like, Index

Sought values.

Returns#

is_contained : cupy array

CuPy array of boolean values.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
isna()#

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as None, numpy.NAN or cudf.NA, get mapped to True values. Everything else gets mapped to False values.

Returns#

numpy.ndarray[bool]

A boolean array to indicate which entries are NA.

join(other, how='left', level=None, return_indexers=False, sort=False)#

Compute join_index and indexers to conform data structures to the new index.

Parameters#

other : Index

how : {‘left’, ‘right’, ‘inner’, ‘outer’}

return_indexers : bool, default False

sort : bool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns#

index

Examples#

>>> import cudf
>>> lhs = cudf.DataFrame({
...     "a": [2, 3, 1],
...     "b": [3, 4, 2],
... }).set_index(['a', 'b']).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a": [1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
max()#

The maximum value of the index.

memory_usage(deep=False)#

Return the memory usage of an object.

Parameters#

deep : bool

The deep parameter is ignored and is only included for pandas compatibility.

Returns#

The total bytes used.

min()#

The minimum value of the index.

property name#

Returns the name of the Index.

property names#

Returns a tuple containing the name of the Index.

property ndim#

Number of dimensions of the underlying data, by definition 1.

property nlevels#

Number of levels.

notna()#

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. NA values, such as None or numpy.NAN, get mapped to False values.

Returns#

numpy.ndarray[bool]

A boolean array to indicate which entries are not NA.

rename(name, inplace=False)#

Alter Index name.

Defaults to returning new index.

Parameters#

name : label

Name(s) to set.

Returns#

Index

Examples#

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)#

Repeat elements of an Index.

Returns a new Index where each element of the current Index is repeated consecutively a given number of times.

Parameters#

repeats : int or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns#

Index

A newly created object of same type as caller with repeated elements.

Examples#

>>> import cudf
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
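The example above passes a single integer. When repeats is an array of ints, each element gets its own count. The sketch below models that case in plain Python. It is not the library's implementation, and repeat here is a hypothetical helper operating on lists.

```python
def repeat(values, repeats):
    """Hypothetical model: repeat each element consecutively, scalar or per-element."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must be an int or match the length of values")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    out = []
    for value, count in zip(values, repeats):
        out.extend([value] * count)
    return out


per_element = repeat([10, 22, 33], [1, 0, 2])  # 22 is dropped, 33 doubled
scalar = repeat([10, 22], 2)
empty = repeat([10, 22], 0)                    # repeating 0 times yields nothing
print(per_element, scalar, empty)
```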
searchsorted(value, side: str = 'left', ascending: bool = True, na_position: str = 'last')#

Find index where elements should be inserted to maintain order

Parameters#

value :

Value to be hypothetically inserted into Self

side : str {‘left’, ‘right’}, optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, the index of the last such location is returned.

ascending : bool, optional, default True

Index is in ascending order (otherwise descending).

na_position : str {‘last’, ‘first’}, optional, default ‘last’

Position of null values in sorted order

Returns#

Insertion point.

Notes#

As a precondition the index must be sorted in the same order as requested by the ascending flag.
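How side and ascending interact can be sketched with the standard bisect module. This is a model under stated assumptions rather than the library's implementation: searchsorted below is a hypothetical function over a plain list, null handling (na_position) is not modeled, and the input is assumed to already be sorted in the order that ascending describes.

```python
from bisect import bisect_left, bisect_right


def searchsorted(values, value, side="left", ascending=True):
    """Hypothetical model of the insertion point for `value` in sorted `values`."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if ascending:
        find = bisect_left if side == "left" else bisect_right
        return find(values, value)
    # Descending input: search the mirrored ascending sequence, then map the
    # position back. The left/right roles swap under the mirror.
    mirrored = values[::-1]
    find = bisect_right if side == "left" else bisect_left
    return len(values) - find(mirrored, value)


asc = [10, 30, 30, 40]
desc = [40, 30, 30, 10]
asc_left = searchsorted(asc, 30)
asc_right = searchsorted(asc, 30, side="right")
desc_left = searchsorted(desc, 30, ascending=False)
desc_right = searchsorted(desc, 30, side="right", ascending=False)
print(asc_left, asc_right, desc_left, desc_right)
```

If the data is not actually sorted in the declared order, binary search gives no meaningful answer, which is why the precondition in the note matters.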

set_names(names, level=None, inplace=False)#

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters#

names : label or list of label

Name(s) to set.

level : int, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplace : bool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns#

Index

The same type as the caller or None if inplace is True.

See Also#

cudf.Index.rename : Able to set new names without level.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape#

Get a tuple representing the dimensionality of the data.

shift(periods=1, freq=None)#

Not yet implemented

property size#
sort_values(return_indexer=False, ascending=True, na_position='last', key=None)#

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters#

return_indexer : bool, default False

Should the indices that would sort the index be returned.

ascending : bool, default True

Should the index values be sorted in an ascending order.

na_position : {‘first’, ‘last’}, default ‘last’

Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.

key : None, optional

This parameter is NON-FUNCTIONAL.

Returns#

sorted_index : Index

Sorted copy of the index.

indexer : cupy.ndarray, optional

The indices that the index itself was sorted by.

See Also#

cudf.Series.sort_values : Sort values of a Series.

cudf.DataFrame.sort_values : Sort values in a DataFrame.

Examples#

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
property str#

Not yet implemented.

take(indices, axis=0, allow_fill=True, fill_value=None)#

Return a new index containing the rows specified by indices

Parameters#

indices : array-like

Array of ints indicating which positions to take.

axis : int

The axis over which to select values, always 0.

allow_fill : Unsupported

fill_value : Unsupported

Returns#

out : Index

New object with desired subset of rows.

Examples#

>>> import cudf
>>> idx = cudf.Index(['a', 'b', 'c', 'd', 'e'])
>>> idx.take([2, 0, 4, 3])
StringIndex(['c' 'a' 'e' 'd'], dtype='object')
to_arrow()#

Convert to a suitable Arrow object.

to_cupy()#

Convert to a cupy array.

to_dlpack()#

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters#

cudf_obj : DataFrame, Series, Index, or Column

Returns#

pycapsule_obj : PyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=<no_default>)#

Create a DataFrame with a column containing this Index

Parameters#

index : bool, default True

Set the index of the returned DataFrame as the original Index.

name : object, defaults to index.name

The passed name should substitute for the index name (if it has one).

Returns#

DataFrame

DataFrame containing the original Index data.

See Also#

Index.to_series : Convert an Index to a Series.

Series.to_frame : Convert Series to DataFrame.

Examples#

>>> import cudf
>>> idx = cudf.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
       animal
animal
Ant       Ant
Bear     Bear
Cow       Cow

By default, the original Index is reused. To enforce a new Index:

>>> idx.to_frame(index=False)
    animal
0   Ant
1  Bear
2   Cow

To override the name of the resulting column, specify name:

>>> idx.to_frame(index=False, name='zoo')
    zoo
0   Ant
1  Bear
2   Cow
to_list()#
to_numpy()#

Convert to a numpy array.

to_pandas(nullable=False)#

Convert to a Pandas Index.

Parameters#

nullable : bool, default False

If nullable is True, the resulting index will have a corresponding nullable Pandas dtype. If there is no corresponding nullable Pandas dtype present, the resulting dtype will be a regular pandas dtype. If nullable is False, the resulting index will either convert null values to np.nan or None depending on the dtype.

Examples#

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.Int64Index'>
to_series(index=None, name=None)#

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters#

index : Index, optional

Index of resulting Series. If None, defaults to original index.

name : str, optional

Name of resulting Series. If None, defaults to name of original index.

Returns#

Series

The dtype will be based on the type of the Index values.

tolist()#
union(other, sort=None)#

Form the union of two Index objects.

Parameters#

other : Index or array-like

sort : bool or None, default None

Whether to sort the resulting Index.

  • None : Sort the result, except when

    1. self and other are equal.

    2. self or other has length 0.

  • False : do not sort the result.
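The sorting rules above can be written out as a small pure-Python model. It is a sketch of the documented rules only, not the library's implementation: union below is a hypothetical function over lists of hashable values, and it assumes the unsorted result lists the elements of self first, followed by the new elements of other.

```python
def union(left, right, sort=None):
    """Hypothetical model of the documented `sort` rules for a union."""
    seen = set()
    combined = []
    for value in list(left) + list(right):  # first-seen order, no repeats
        if value not in seen:
            seen.add(value)
            combined.append(value)
    if sort is False:
        return combined
    if sort is None and (left == right or not left or not right):
        # Documented exceptions: equal inputs, or either input empty.
        return combined
    return sorted(combined)


default_sorted = union([3, 1], [2])
unsorted = union([3, 1], [2], sort=False)
equal_inputs = union([3, 1], [3, 1])
empty_other = union([3, 1], [])
print(default_sorted, unsorted, equal_inputs, empty_other)
```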

Returns#

union : Index

Examples#

Union of an Index

>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')

MultiIndex case

>>> idx1 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
           )
>>> idx2 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
...    )
... )
>>> idx2
MultiIndex([(3,   'Red'),
            (3, 'Green'),
            (2,   'Red'),
            (2, 'Green')],
           )
>>> idx1.union(idx2)
MultiIndex([(1,  'Blue'),
            (1,   'Red'),
            (2,  'Blue'),
            (2, 'Green'),
            (2,   'Red'),
            (3, 'Green'),
            (3,   'Red')],
           )
>>> idx1.union(idx2, sort=False)
MultiIndex([(1,   'Red'),
            (1,  'Blue'),
            (2,   'Red'),
            (2,  'Blue'),
            (3,   'Red'),
            (3, 'Green'),
            (2, 'Green')],
           )
unique()#

Return unique values in the index.

Returns#

Index without duplicates

property values#
where(cond, other=None, inplace=False)#

Replace values where the condition is False.

The replacement is taken from other.

Parameters#

cond : bool array-like with the same length as self

Condition to select the values on.

other : scalar or array-like, default None

Replacement if the condition is False.

Returns#

cudf.Index

A copy of self with values replaced from other where the condition is False.
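Since this entry has no example, the replacement rule can be modeled in plain Python. This is only a sketch of the documented behavior, not the library's implementation: where below is a hypothetical function over lists, and None stands in for a null when other is left at its default.

```python
def where(values, cond, other=None):
    """Hypothetical model: keep values where cond is True, else take from other."""
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, (list, tuple)):
        if len(other) != len(values):
            raise ValueError("array-like other must have the same length as values")
        replacements = other
    else:
        # A scalar (or None) is broadcast to every position.
        replacements = [other] * len(values)
    return [v if keep else r for v, keep, r in zip(values, cond, replacements)]


values = [1, 2, 3, 4]
mask = [True, False, True, False]
with_scalar = where(values, mask, other=0)
with_default = where(values, mask)
with_array = where(values, mask, other=[10, 20, 30, 40])
print(with_scalar, with_default, with_array)
```

Note that the condition marks the values to keep. Positions where it is False are the ones replaced.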