hipdf.UInt64Index

Applies to Linux

class hipdf.UInt64Index(data=None, dtype=None, copy=False, name=None)#

Bases: NumericIndex

Immutable, ordered and sliceable sequence of labels. The basic object storing row labels for all cuDF objects. UInt64Index is a special case of Index with purely unsigned integer (uint64) labels.

Parameters#

data : array-like (1-dimensional)

dtype : NumPy dtype

Accepted, but not used.

copy : bool

Make a copy of input data.

name : object

Name to be stored in the index.

Attributes#

None

Methods#

None

Returns#

UInt64Index

__init__(data=None, dtype=None, copy=False, name=None)#

Methods

__init__([data, dtype, copy, name])

abs()

Return a Series/DataFrame with absolute numeric value of each element.

all([axis, skipna, level])

Return whether all elements are True in DataFrame.

any()

Return whether any element is True in DataFrame.

append(other)

Append a collection of Index objects together.

argsort([axis, kind, order, ascending, ...])

Return the integer indices that would sort the index.

astype(dtype[, copy])

Create an Index with values cast to dtypes.

copy([name, deep, dtype, names])

Make a copy of this object.

deserialize(header, frames)

Generate an object from a serialized representation.

device_deserialize(header, frames)

Perform device-side deserialization tasks.

device_serialize()

Serialize data and metadata associated with device memory.

difference(other[, sort])

Return a new Index with elements from the index that are not in other.

dot(other[, reflect])

Get dot product of frame and other, (binary operator dot).

drop_duplicates([keep, nulls_are_equal])

Drop duplicate rows in index.

dropna([how])

Drop null rows from Index.

duplicated([keep])

Indicate duplicate index values.

equals(other)

Test whether two objects contain the same elements.

factorize([sort, na_sentinel, use_na_sentinel])

Encode the input values as integer labels.

fillna([value, method, axis, inplace, limit])

Fill null values with value or specified method.

find_label_range(loc)

Translate a label-based slice to an index-based slice

from_arrow(array)

Create from PyArrow Array/ChunkedArray.

from_pandas(index[, nan_as_null])

Convert from a Pandas Index.

get_level_values(level)

Return an Index of values for requested level.

get_loc(key[, method, tolerance])

Get integer location, slice or boolean mask for requested label.

get_slice_bound(label, side[, kind])

Calculate slice bound that corresponds to given label.

head([n])

Return the first n rows.

host_deserialize(header, frames)

Perform device-side deserialization tasks.

host_serialize()

Serialize data and metadata associated with host memory.

intersection(other[, sort])

Form the intersection of two Index objects.

is_boolean()

Check if the Index only consists of booleans.

is_categorical()

Check if the Index holds categorical data.

is_floating()

Check if the Index is a floating type.

is_integer()

Check if the Index only consists of integers.

is_interval()

Check if the Index holds Interval objects.

is_numeric()

Check if the Index only consists of numeric data.

is_object()

Check if the Index is of the object dtype.

isin(values)

Return a boolean array where the index values are in values.

isna()

Identify missing values.

isnull()

Identify missing values.

join(other[, how, level, return_indexers, sort])

Compute join_index and indexers to conform data structures to the new index.

kurt([axis, skipna, level, numeric_only])

Return Fisher's unbiased kurtosis of a sample.

kurtosis([axis, skipna, level, numeric_only])

Return Fisher's unbiased kurtosis of a sample.

mask(cond[, other, inplace])

Replace values where the condition is True.

max([axis, skipna, level, numeric_only])

Return the maximum of the values in the DataFrame.

mean([axis, skipna, level, numeric_only])

Return the mean of the values for the requested axis.

median([axis, skipna, level, numeric_only])

Return the median of the values for the requested axis.

memory_usage([deep])

Return the memory usage of an object.

min([axis, skipna, level, numeric_only])

Return the minimum of the values in the DataFrame.

nans_to_nulls()

Convert nans (if any) to nulls

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

nunique([dropna])

Return count of unique values for the column.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

prod([axis, skipna, dtype, level, ...])

Return product of the values in the DataFrame.

product([axis, skipna, dtype, level, ...])

Return product of the values in the DataFrame.

rename(name[, inplace])

Alter Index name.

repeat(repeats[, axis])

Repeat elements of an Index.

rolling(window[, min_periods, center, axis, ...])

Rolling window calculations.

searchsorted(values[, side, ascending, ...])

Find indices where elements should be inserted to maintain order

serialize()

Generate an equivalent serializable representation of an object.

set_names(names[, level, inplace])

Set Index or MultiIndex name.

shift([periods, freq])

Not yet implemented

skew([axis, skipna, level, numeric_only])

Return unbiased Fisher-Pearson skew of a sample.

sort_values([return_indexer, ascending, ...])

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

std([axis, skipna, level, ddof, numeric_only])

Return sample standard deviation of the DataFrame.

sum([axis, skipna, dtype, level, ...])

Return sum of the values in the DataFrame.

tail([n])

Returns the last n rows as a new DataFrame or Series

take(indices[, axis, allow_fill, fill_value])

Return a new index containing the rows specified by indices

to_arrow()

Convert to a PyArrow Array.

to_cupy([dtype, copy, na_value])

Convert the Frame to a CuPy array.

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_frame([index, name])

Create a DataFrame with a column containing this Index

to_hdf(path_or_buf, key, *args, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

to_json([path_or_buf])

Convert the cuDF object to a JSON string.

to_list()

to_numpy([dtype, copy, na_value])

Convert the Frame to a NumPy array.

to_pandas([nullable])

Convert to a Pandas Index.

to_series([index, name])

Create a Series with both index and values equal to the index keys.

to_string()

Convert to string

tolist()

union(other[, sort])

Form the union of two Index objects.

unique()

Return unique values in the index.

var([axis, skipna, level, ddof, numeric_only])

Return unbiased variance of the DataFrame.

where(cond[, other, inplace])

Replace values where the condition is False.

Attributes

dtype

dtype of the underlying values in GenericIndex.

empty

has_duplicates

hasnans

Return True if there are any NaNs or nulls.

is_monotonic

Return boolean if values in the object are monotonically increasing.

is_monotonic_decreasing

Return boolean if values in the object are monotonically decreasing.

is_monotonic_increasing

Return boolean if values in the object are monotonically increasing.

is_unique

Return if the index has unique values.

name

Get the name of this object.

names

Returns a tuple containing the name of the Index.

ndim

Number of dimensions of the underlying data, by definition 1.

nlevels

Number of levels.

shape

Get a tuple representing the dimensionality of the Index.

size

Return the number of elements in the underlying data.

str

Not yet implemented.

values

Return a CuPy representation of the DataFrame.

values_host

Return a NumPy representation of the data.

__getitem__(index)#
__init__(data=None, dtype=None, copy=False, name=None)#
abs()#

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns#

DataFrame/Series

Absolute value of each element.

Examples#

Absolute numeric values in a Series

>>> import cudf
>>> s = cudf.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
all(axis=0, skipna=True, level=None, **kwargs)#

Return whether all elements are True in DataFrame.

Parameters#

axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

  • 0 or ‘index’ : reduce the index, return a Series whose index is the original column labels.

  • 1 or ‘columns’ : reduce the columns, return a Series whose index is the original index.

  • None : reduce all axes, return a scalar.

skipna : bool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
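
The skipna rules above can be mimicked on a plain Python list, with None standing in for NA. This is only a sketch of the documented semantics, not the library's implementation; the helper name all_values is hypothetical.

```python
def all_values(values, skipna=True):
    """Reduce a list with `all`; None is a stand-in for NA (assumption)."""
    if skipna:
        # Drop NA entries first; an empty remainder reduces to True.
        return all(bool(v) for v in values if v is not None)
    # With skipna=False, NA counts as True because it is not equal to zero.
    return all(True if v is None else bool(v) for v in values)


print(all_values([3, 2, 3, 4]))             # True
print(all_values([7, 0, 10, 10]))           # False
print(all_values([None, None]))             # True: nothing left after skipping NA
print(all_values([0, None], skipna=False))  # False: the 0 is still falsy
```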

Returns#

Series

Notes#

Parameters currently not supported are bool_only, level.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.all()
a     True
b    False
dtype: bool
any()#

Return whether any element is True in DataFrame.

Parameters#

axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0

Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

  • 0 or ‘index’ : reduce the index, return a Series whose index is the original column labels.

  • 1 or ‘columns’ : reduce the columns, return a Series whose index is the original index.

  • None : reduce all axes, return a scalar.

skipna : bool, default True

Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

Returns#

Series

Notes#

Parameters currently not supported are bool_only, level.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.any()
a    True
b    True
dtype: bool
append(other)#

Append a collection of Index objects together.

Parameters#

other : Index or list/tuple of indices

Returns#

appended : Index

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Int64Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Int64Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Int64Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')

append also accepts a list of Index objects:

>>> idx.append([other, other])
Int64Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
argsort(axis=0, kind='quicksort', order=None, ascending=True, na_position='last')#

Return the integer indices that would sort the index.

Parameters#

axis : {0 or “index”}

Has no effect but is accepted for compatibility with numpy.

kind : {‘mergesort’, ‘quicksort’, ‘heapsort’, ‘stable’}, default ‘quicksort’

Choice of sorting algorithm. See numpy.sort() for more information. ‘mergesort’ and ‘stable’ are the only stable algorithms. Only quicksort is supported in cuDF.

order : None

Has no effect but is accepted for compatibility with numpy.

ascending : bool or list of bool, default True

If True, sort values in ascending order, otherwise descending.

na_position : {‘first’ or ‘last’}, default ‘last’

Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.

Returns#

cupy.ndarray: The indices sorted based on input.
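
To make the effect of ascending and na_position concrete, here is a small pure-Python sketch that returns sorting positions for a list, with None standing in for NaN/null. The function is hypothetical and only mirrors the parameter descriptions above; unlike the quicksort mentioned there, Python's list.sort happens to be stable.

```python
def argsort(values, ascending=True, na_position="last"):
    """Return the positions that would sort `values`; None stands in for NA."""
    present = [i for i, v in enumerate(values) if v is not None]
    missing = [i for i, v in enumerate(values) if v is None]
    # Sort positions by the value they point at.
    present.sort(key=lambda i: values[i], reverse=not ascending)
    return missing + present if na_position == "first" else present + missing


print(argsort([30, 10, None, 20]))  # [1, 3, 0, 2]
print(argsort([30, 10, None, 20], ascending=False, na_position="first"))  # [2, 0, 3, 1]
```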

astype(dtype, copy: bool = True)#

Create an Index with values cast to dtypes.

The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.

Parameters#

dtype : numpy.dtype

The numpy.dtype to cast the entire Index to.

copy : bool, default True

By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.

Returns#

Index

Index with values cast to specified dtype.

Examples#

>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Int64Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Float64Index([1.0, 2.0, 3.0], dtype='float64')
copy(name=None, deep=False, dtype=None, names=None)#

Make a copy of this object.

Parameters#

name : object, default None

Name of the index; the original name is used when None.

deep : bool, default False

Make a deep copy of the data. With deep=False the original data is used.

dtype : numpy dtype, default None

Target datatype to cast into; the original dtype is used when None.

Deprecated since version 23.02: The dtype parameter is deprecated and will be removed in a future version of cudf. Use the astype method instead.

names : list-like, default None

Kept for compatibility with MultiIndex. Should not be used.

Deprecated since version 23.04: The parameter names is deprecated and will be removed in a future version of cudf. Use the name parameter instead.

Returns#

New index instance, cast to the new dtype.

difference(other, sort=None)#

Return a new Index with elements from the index that are not in other.

This is the set difference of two Index objects.

Parameters#

other : Index or array-like

sort : False or None, default None

Whether to sort the resulting index. By default, cudf attempts to sort the values and catches any TypeError raised by incomparable elements.

  • None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.

  • False : Do not sort the result.

Returns#

difference : Index
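
The two sort modes can be sketched on plain lists. This is an illustrative stand-in, not the library's code: the helper name is hypothetical, and it assumes duplicates are dropped, as in a set difference.

```python
def difference(left, right, sort=None):
    """Elements of `left` that are not in `right`, in first-seen order."""
    exclude = set(right)
    result = list(dict.fromkeys(v for v in left if v not in exclude))
    if sort is None:
        try:
            result = sorted(result)
        except TypeError:
            # Incomparable elements: keep the unsorted result.
            pass
    return result


print(difference([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
```

The outputs match the documented example for idx1.difference(idx2) with and without sorting.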

Examples#

>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Int64Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Int64Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
dot(other, reflect=False)#

Get dot product of frame and other, (binary operator dot).

Among flexible wrappers (add, sub, mul, div, mod, pow, dot) to arithmetic operators: +, -, *, /, //, %, **, @.

Parameters#

other : Sequence, Series, or DataFrame

Any multiple element data structure, or list-like object.

reflect : bool, default False

If True, swap the order of the operands. See https://docs.python.org/3/reference/datamodel.html#object.__ror__ for more information on when this is necessary.

Returns#

scalar, Series, or DataFrame

The result of the operation.

Examples#

>>> import cudf
>>> df = cudf.DataFrame([[1, 2, 3, 4],
...                      [5, 6, 7, 8]])
>>> df @ df.T
    0    1
0  30   70
1  70  174
>>> s = cudf.Series([1, 1, 1, 1])
>>> df @ s
0    10
1    26
dtype: int64
>>> [1, 2, 3, 4] @ s
10
drop_duplicates(keep='first', nulls_are_equal=True)#

Drop duplicate rows in index.

Parameters#

keep : {“first”, “last”, False}, default “first”

  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.

nulls_are_equal : bool, default True

Null elements are considered equal to other null elements.
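
A rough pure-Python sketch of how keep and nulls_are_equal interact, with None standing in for null. The helper is hypothetical, and the nulls_are_equal=False branch (every null treated as distinct and therefore kept) is an assumption drawn from the parameter description rather than from the library source.

```python
def drop_duplicates(values, keep="first", nulls_are_equal=True):
    """Drop repeated entries from a list; None stands in for null."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1

    def scan(seq):
        seen, out = set(), []
        for v in seq:
            if v is None and not nulls_are_equal:
                out.append(v)          # each null is treated as distinct
            elif keep is False:
                if counts[v] == 1:     # keep only values that never repeat
                    out.append(v)
            elif v not in seen:
                seen.add(v)
                out.append(v)
        return out

    if keep == "last":
        # Scanning backwards keeps the last occurrence of each value.
        return scan(values[::-1])[::-1]
    return scan(values)


data = [1, 2, 1, None, None]
print(drop_duplicates(data))                         # [1, 2, None]
print(drop_duplicates(data, keep="last"))            # [2, 1, None]
print(drop_duplicates(data, keep=False))             # [2]
print(drop_duplicates(data, nulls_are_equal=False))  # [1, 2, None, None]
```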

dropna(how='any')#

Drop null rows from Index.

Parameters#

how : {“any”, “all”}, default “any”

Specifies how to decide whether to drop a row. “any” (default) drops rows containing at least one null value. “all” drops only rows containing all null values.

property dtype#

dtype of the underlying values in GenericIndex.

duplicated(keep='first')#

Indicate duplicate index values.

Duplicated values are indicated as True values in the resulting array. Either all duplicates, all except the first, or all except the last occurrence of duplicates can be indicated.

Parameters#

keep : {‘first’, ‘last’, False}, default ‘first’

The value or values in a set of duplicates to mark as missing.

  • 'first' : Mark duplicates as True except for the first occurrence.

  • 'last' : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns#

cupy.ndarray[bool]
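
The three keep modes can be reproduced on a plain list. The sketch below is not the library's implementation; it returns a Python list of booleans rather than a cupy.ndarray, and the function name simply mirrors the method for readability.

```python
from collections import Counter


def duplicated(values, keep="first"):
    """Return booleans marking duplicates according to `keep`."""
    if keep is False:
        counts = Counter(values)
        return [counts[v] > 1 for v in values]
    # For keep='last', scan from the end so the last occurrence is seen first.
    order = values if keep == "first" else values[::-1]
    seen, flags = set(), []
    for v in order:
        flags.append(v in seen)
        seen.add(v)
    return flags if keep == "first" else flags[::-1]


labels = ["lama", "cow", "lama", "beetle", "lama"]
print(duplicated(labels))               # [False, False, True, False, True]
print(duplicated(labels, keep="last"))  # [True, False, True, False, False]
print(duplicated(labels, keep=False))   # [True, False, True, False, True]
```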

See Also#

Series.duplicated : Equivalent method on cudf.Series.

DataFrame.duplicated : Equivalent method on cudf.DataFrame.

Index.drop_duplicates : Remove duplicate values from Index.

Examples#

By default, for each set of duplicated values, the first occurrence is set to False and all others to True:

>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])

which is equivalent to

>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])

By using ‘last’, the last occurrence of each set of duplicated values is set to False and all others to True:

>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])

By setting keep to False, all duplicates are True:

>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
property empty#
equals(other)#

Test whether two objects contain the same elements.

This function allows two objects to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type.

Parameters#

other : Index, Series, DataFrame

The other object to be compared with.

Returns#

bool

True if all elements are the same in both objects, False otherwise.

Examples#

>>> import cudf

Comparing Series with equals:

>>> s = cudf.Series([1, 2, 3])
>>> other = cudf.Series([1, 2, 3])
>>> s.equals(other)
True
>>> different = cudf.Series([1.5, 2, 3])
>>> s.equals(different)
False

Comparing DataFrames with equals:

>>> df = cudf.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20
>>> exactly_equal = cudf.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True

For two DataFrames to compare equal, the types of column values must be equal, but the types of column labels need not:

>>> different_column_type = cudf.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True
factorize(sort=False, na_sentinel=None, use_na_sentinel=None)#

Encode the input values as integer labels.

Parameters#

sort : bool, default False

Sort uniques and shuffle codes to maintain the relationship.

na_sentinel : number, default -1

Value to indicate missing category.

Deprecated since version 23.04: The na_sentinel argument is deprecated and will be removed in a future version of cudf. Specify use_na_sentinel as either True or False.

use_na_sentinel : bool, default True

If True, the sentinel -1 will be used for NA values. If False, NA values will be encoded as non-negative integers and will not drop the NA from the uniques of the values.

Returns#

(labels, cats) : (cupy.ndarray, cupy.ndarray or Index)

  • labels contains the encoded values.

  • cats contains the categories, ordered so that the N-th item corresponds to code N-1.

Examples#

>>> import cudf
>>> s = cudf.Series(['a', 'a', 'c'])
>>> codes, uniques = s.factorize()
>>> codes
array([0, 0, 1], dtype=int8)
>>> uniques
StringIndex(['a' 'c'], dtype='object')
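
The relationship between codes and uniques can be sketched in pure Python, with None standing in for NA. This helper is hypothetical; in particular, appending NA at the end of the uniques when use_na_sentinel=False is an assumption made for the sketch.

```python
def factorize(values, sort=False, use_na_sentinel=True):
    """Encode `values` as integer codes; None stands in for NA."""
    # Unique non-NA values in order of first appearance.
    uniques = list(dict.fromkeys(v for v in values if v is not None))
    if sort:
        uniques = sorted(uniques)
    if not use_na_sentinel and any(v is None for v in values):
        uniques.append(None)  # NA gets its own non-negative code
    position = {v: i for i, v in enumerate(uniques)}
    # Values missing from `position` (NA with the sentinel enabled) map to -1.
    codes = [position.get(v, -1) for v in values]
    return codes, uniques


print(factorize(["a", "a", "c"]))                              # ([0, 0, 1], ['a', 'c'])
print(factorize(["b", None, "a", "b"], sort=True))             # ([1, -1, 0, 1], ['a', 'b'])
print(factorize(["b", None, "a", "b"], use_na_sentinel=False)) # ([0, 2, 1, 0], ['b', 'a', None])
```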
fillna(value=None, method=None, axis=None, inplace=False, limit=None)#

Fill null values with value or specified method.

Parameters#

value : scalar, Series-like or dict

Value to use to fill nulls. If Series-like, null values are filled with values in corresponding indices. A dict can be used to provide different values to fill nulls in different columns. Cannot be used with method.

method : {‘ffill’, ‘bfill’}, default None

Method to use for filling null values in the dataframe or series. ffill propagates the last non-null values forward to the next non-null value. bfill propagates backward with the next non-null value. Cannot be used with value.

Returns#

result : DataFrame, Series, or Index

Copy with nulls filled.
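
The two fill methods behave like the following pure-Python sketch, where None stands in for null. The helper names are hypothetical and the code only illustrates the propagation direction described above.

```python
def ffill(values):
    """Propagate the last non-null value forward; None stands in for null."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def bfill(values):
    """Propagate the next non-null value backward."""
    return ffill(values[::-1])[::-1]


data = [1, None, None, 2, 3, None, None]
print(ffill(data))  # [1, 1, 1, 2, 3, 3, 3]
print(bfill(data))  # [1, 2, 2, 2, 3, None, None]
```

Trailing nulls survive bfill because there is no later value to pull back, which matches the <NA> rows in the bfill example for this method.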

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, None], 'b': [3, None, 5]})
>>> df
      a     b
0     1     3
1     2  <NA>
2  <NA>     5
>>> df.fillna(4)
   a  b
0  1  3
1  2  4
2  4  5
>>> df.fillna({'a': 3, 'b': 4})
   a  b
0  1  3
1  2  4
2  3  5

fillna on a Series object:

>>> ser = cudf.Series(['a', 'b', None, 'c'])
>>> ser
0       a
1       b
2    <NA>
3       c
dtype: object
>>> ser.fillna('z')
0    a
1    b
2    z
3    c
dtype: object

fillna also supports in-place operation:

>>> ser.fillna('z', inplace=True)
>>> ser
0    a
1    b
2    z
3    c
dtype: object
>>> df.fillna({'a': 3, 'b': 4}, inplace=True)
>>> df
   a  b
0  1  3
1  2  4
2  3  5

fillna with a fill method specified:

>>> ser = cudf.Series([1, None, None, 2, 3, None, None])
>>> ser.fillna(method='ffill')
0    1
1    1
2    1
3    2
4    3
5    3
6    3
dtype: int64
>>> ser.fillna(method='bfill')
0       1
1       2
2       2
3       2
4       3
5    <NA>
6    <NA>
dtype: int64
find_label_range(loc: slice) → slice#

Translate a label-based slice to an index-based slice

Parameters#

loc : slice

Slice to search for.

Notes#

As with all label-based searches, the slice is right-closed.

Returns#

New slice translated into integer indices of the index (right-open).
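
The right-closed to right-open translation can be sketched with the standard bisect module. The helper below is a hypothetical stand-in that assumes the labels are sorted in increasing order and that the slice has no step; it is not the library's implementation.

```python
from bisect import bisect_left, bisect_right


def find_label_range(labels, loc):
    """Map a right-closed label slice onto a right-open positional slice."""
    start = 0 if loc.start is None else bisect_left(labels, loc.start)
    # bisect_right lands one past the last matching label, so the
    # positional slice still includes the stop label itself.
    stop = len(labels) if loc.stop is None else bisect_right(labels, loc.stop)
    return slice(start, stop)


labels = [10, 20, 30, 40]
positions = find_label_range(labels, slice(20, 30))
print(positions)          # slice(1, 3, None)
print(labels[positions])  # [20, 30]
```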

classmethod from_arrow(array)#

Create from PyArrow Array/ChunkedArray.

Parameters#

array : PyArrow Array/ChunkedArray

PyArrow object to convert.

Raises#

TypeError for invalid input type.

Returns#

SingleColumnFrame

Examples#

>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
StringIndex(['a' 'b' None], dtype='object')
>>> cudf.Series.from_arrow(pa.array(["a", "b", None]))
0       a
1       b
2    <NA>
dtype: object
classmethod from_pandas(index, nan_as_null=<no_default>)#

Convert from a Pandas Index.

Parameters#

index : Pandas Index object

A Pandas Index object to convert to a cuDF Index.

nan_as_null : bool, default None

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Raises#

TypeError for invalid input type.

Examples#

>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.Index.from_pandas(pdi)
Float64Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.Index.from_pandas(pdi, nan_as_null=False)
Float64Index([10.0, 20.0, 30.0, nan], dtype='float64')
get_level_values(level)#

Return an Index of values for requested level.

This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.

Parameters#

level : int or str

It is either the integer position or the name of the level.

Returns#

Index

Calling object, as there is only one level in the Index.

See Also#

cudf.MultiIndex.get_level_values : Get values for a level of a MultiIndex.

Notes#

For Index, level should be 0, since there are no multiple levels.

Examples#

>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
StringIndex(['a' 'b' 'c'], dtype='object')
get_loc(key, method=None, tolerance=None)#

Get integer location, slice or boolean mask for requested label.

Parameters#

key : label

method : {None, ‘pad’/‘ffill’, ‘backfill’/‘bfill’, ‘nearest’}, optional

  • default: exact matches only.

  • pad / ffill: find the PREVIOUS index value if no exact match.

  • backfill / bfill: use NEXT index value if no exact match.

  • nearest: use the NEAREST index value if no exact match. Tied distances are broken by preferring the larger index value.

tolerance : int or float, optional

Maximum distance from index value for inexact matches. The value of the index at the matching location must satisfy the equation abs(index[loc] - key) <= tolerance.
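
The inexact-match modes and the tolerance check can be illustrated on a sorted list of unique numeric labels. This is a sketch under stated assumptions, not the library's code: the helper is hypothetical, and raising KeyError when nothing qualifies follows pandas behaviour and is assumed here.

```python
from bisect import bisect_left


def get_loc(labels, key, method=None, tolerance=None):
    """Locate `key` in a sorted list of unique numeric labels."""
    left = bisect_left(labels, key)
    if left < len(labels) and labels[left] == key:
        return left                      # exact match
    prev_pos, next_pos = left - 1, left  # neighbours on either side of key
    if method in ("pad", "ffill"):
        candidates = [prev_pos]
    elif method in ("backfill", "bfill"):
        candidates = [next_pos]
    elif method == "nearest":
        # Listing the larger label first makes min() prefer it on ties.
        candidates = [next_pos, prev_pos]
    else:
        candidates = []
    candidates = [p for p in candidates if 0 <= p < len(labels)]
    if not candidates:
        raise KeyError(key)
    loc = min(candidates, key=lambda p: abs(labels[p] - key))
    # The match must satisfy abs(index[loc] - key) <= tolerance.
    if tolerance is not None and abs(labels[loc] - key) > tolerance:
        raise KeyError(key)
    return loc


labels = [10, 20, 30]
print(get_loc(labels, 24, method="pad"))       # 1
print(get_loc(labels, 24, method="backfill"))  # 2
print(get_loc(labels, 25, method="nearest"))   # 2 (tie goes to the larger label)
```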

Returns#

int or slice or boolean mask
  • If result is unique, return integer index

  • If index is monotonic, loc is returned as a slice object

  • Otherwise, a boolean mask is returned

Examples#

>>> import cudf
>>> unique_index = cudf.Index(list('abc'))
>>> unique_index.get_loc('b')
1
>>> monotonic_index = cudf.Index(list('abbc'))
>>> monotonic_index.get_loc('b')
slice(1, 3, None)
>>> non_monotonic_index = cudf.Index(list('abcb'))
>>> non_monotonic_index.get_loc('b')
array([False,  True, False,  True])
>>> numeric_unique_index = cudf.Index([1, 2, 3])
>>> numeric_unique_index.get_loc(3)
2
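
The three return shapes listed under Returns can be mimicked on a plain list. The helper name locate is hypothetical, and treating "monotonic" as either increasing or decreasing is an assumption of this sketch.

```python
def locate(labels, key):
    """Return an int, a slice, or a boolean mask, depending on the matches."""
    hits = [i for i, v in enumerate(labels) if v == key]
    if not hits:
        raise KeyError(key)
    if len(hits) == 1:
        return hits[0]                  # unique result: integer position
    pairs = list(zip(labels, labels[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    if increasing or decreasing:
        # In a monotonic sequence equal labels are adjacent, so a slice suffices.
        return slice(hits[0], hits[-1] + 1)
    return [v == key for v in labels]   # otherwise: boolean mask


print(locate(list("abc"), "b"))   # 1
print(locate(list("abbc"), "b"))  # slice(1, 3, None)
print(locate(list("abcb"), "b"))  # [False, True, False, True]
```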
get_slice_bound(label, side: str, kind=None) → int#

Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.

Parameters#

label : object

side : {‘left’, ‘right’}

kind : {‘ix’, ‘loc’, ‘getitem’}

Returns#

int

Index of label.
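
For a sorted sequence, the "leftmost" and "one-past-the-rightmost" positions correspond to the two bisect functions in the Python standard library. The sketch below is a hypothetical stand-in on plain lists and ignores the kind parameter.

```python
from bisect import bisect_left, bisect_right


def get_slice_bound(labels, label, side):
    """Return the slice bound for `label` in a sorted list of labels."""
    if side == "left":
        return bisect_left(labels, label)   # leftmost position of label
    if side == "right":
        return bisect_right(labels, label)  # one past the rightmost position
    raise ValueError("side must be 'left' or 'right'")


labels = [1, 2, 2, 3]
lo = get_slice_bound(labels, 2, "left")
hi = get_slice_bound(labels, 2, "right")
print(lo, hi)         # 1 3
print(labels[lo:hi])  # [2, 2]
```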

property has_duplicates#
property hasnans#

Return True if there are any NaNs or nulls.

Returns#

out : bool

If Series has at least one NaN or null value, return True, if not return False.

Examples#

>>> import cudf
>>> import numpy as np
>>> index = cudf.Index([1, 2, np.nan, 3, 4], nan_as_null=False)
>>> index
Float64Index([1.0, 2.0, nan, 3.0, 4.0], dtype='float64')
>>> index.hasnans
True

hasnans returns True for the presence of any NA values:

>>> index = cudf.Index([1, 2, None, 3, 4])
>>> index
Int64Index([1, 2, <NA>, 3, 4], dtype='int64')
>>> index.hasnans
True
head(n=5)#

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it. For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].

Parameters#

n : int, default 5

Number of rows to select.

Returns#

DataFrame or Series

The first n rows of the caller object.
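
Positive and negative n follow ordinary Python slice semantics, which a plain list can demonstrate. The helper below is a hypothetical illustration, not the library method.

```python
def head(rows, n=5):
    """Return the first n items; a negative n drops the last |n| items."""
    return rows[:n]


animals = ["alligator", "bee", "falcon", "lion", "monkey",
           "parrot", "shark", "whale", "zebra"]
print(head(animals, 3))   # ['alligator', 'bee', 'falcon']
print(head(animals, -3))  # ['alligator', 'bee', 'falcon', 'lion', 'monkey', 'parrot']
```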

Examples#

Series

>>> import cudf
>>> ser = cudf.Series(['alligator', 'bee', 'falcon',
... 'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra'])
>>> ser
0    alligator
1          bee
2       falcon
3         lion
4       monkey
5       parrot
6        shark
7        whale
8        zebra
dtype: object

Viewing the first 5 lines

>>> ser.head()
0    alligator
1          bee
2       falcon
3         lion
4       monkey
dtype: object

Viewing the first n lines (three in this case)

>>> ser.head(3)
0    alligator
1          bee
2       falcon
dtype: object

For negative values of n

>>> ser.head(-3)
0    alligator
1          bee
2       falcon
3         lion
4       monkey
5       parrot
dtype: object

DataFrame

>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df.head(2)
   key   val
0    0  10.0
1    1  11.0
intersection(other, sort=False)#

Form the intersection of two Index objects.

This returns a new Index with elements common to the index and other.

Parameters#

other : Index or array-like

sort : False or None, default False

Whether to sort the resulting index.

  • False : do not sort the result.

  • None : sort the result, except when self and other are equal or when the values cannot be compared.

Returns#

intersection : Index

Examples#

>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

MultiIndex case

>>> idx1 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 3, 4], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx2 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (3,  'Red'),
            (4, 'Blue')],
        )
>>> idx2
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
        )
>>> idx1.intersection(idx2)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
        )
>>> idx1.intersection(idx2, sort=False)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
        )
is_boolean()#

Check if the Index only consists of booleans.

Deprecated since version 23.04: Use cudf.api.types.is_bool_dtype instead.

Returns#

bool

Whether or not the Index only consists of booleans.

See Also#

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = cudf.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = cudf.Index([1, 2, 3])
>>> idx.is_boolean()
False
is_categorical()#

Check if the Index holds categorical data.

Deprecated since version 23.04: Use cudf.api.types.is_categorical_dtype instead.

Returns#

bool

True if the Index is categorical.

See Also#

CategoricalIndex : Index for categorical data.

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = cudf.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0        Peter
1       Victor
2    Elisabeth
3          Mar
dtype: object
>>> s.index.is_categorical()
False
is_floating()#

Check if the Index is a floating type.

The Index may consist of only floats, NaNs, or a mix of floats, integers, or NaNs.

Deprecated since version 23.04: Use cudf.api.types.is_float_dtype instead.

Returns#

bool

Whether or not the Index consists only of floats, NaNs, or a mix of floats, integers, or NaNs.

See Also#

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4, np.nan], nan_as_null=False)
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
is_integer()#

Check if the Index only consists of integers.

Deprecated since version 23.04: Use cudf.api.types.is_integer_dtype instead.

Returns#

bool

Whether or not the Index only consists of integers.

See Also#

is_boolean : Check if the Index only consists of booleans.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
is_interval()#

Check if the Index holds Interval objects.

Deprecated since version 23.04: Use cudf.api.types.is_interval_dtype instead.

Returns#

bool

Whether or not the Index holds Interval objects.

See Also#

IntervalIndex : Index for Interval objects.

is_boolean : Check if the Index only consists of booleans.

is_integer : Check if the Index only consists of integers.

is_floating : Check if the Index is a floating type.

is_numeric : Check if the Index only consists of numeric data.

is_object : Check if the Index is of the object dtype.

is_categorical : Check if the Index holds categorical data.

Examples#

>>> import cudf
>>> import pandas as pd
>>> idx = cudf.from_pandas(
...     pd.Index([pd.Interval(left=0, right=5),
...               pd.Interval(left=5, right=10)])
... )
>>> idx.is_interval()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
property is_monotonic#

Return boolean if values in the object are monotonically increasing.

This property is an alias for is_monotonic_increasing.

Returns#

bool

property is_monotonic_decreasing#

Return boolean if values in the object are monotonically decreasing.

Returns#

bool

property is_monotonic_increasing#

Return boolean if values in the object are monotonically increasing.

Returns#

bool
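
As a rough illustration of what the monotonicity properties report, here is a pure-Python sketch. The helper names are hypothetical and this is not the library's implementation; it assumes the non-strict definition used by pandas, where equal neighbouring values still count as monotonic.

```python
def is_monotonic_increasing(values):
    # Non-strict: every element is >= the element before it.
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values):
    # Non-strict: every element is <= the element before it.
    return all(a >= b for a, b in zip(values, values[1:]))


print(is_monotonic_increasing([1, 2, 2, 5]))  # True
print(is_monotonic_decreasing([1, 2, 2, 5]))  # False
```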

is_numeric()#

Check if the Index only consists of numeric data.

Deprecated since version 23.04: Use cudf.api.types.is_any_real_numeric_dtype instead.

Returns#

bool

Whether or not the Index only consists of numeric data.

See Also#

is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = cudf.Index(["Apple", "cold"])
>>> idx.is_numeric()
False
is_object()#

Check if the Index is of the object dtype.

Deprecated since version 23.04: Use cudf.api.types.is_object_dtype instead.

Returns#

bool

Whether or not the Index is of the object dtype.

See Also#

is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.

Examples#

>>> import cudf
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
property is_unique#

Return if the index has unique values.

isin(values)#

Return a boolean array where the index values are in values.

Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.

Parameters#

values : set, list-like, Index

Sought values.

Returns#

is_contained : cupy array

CuPy array of boolean values.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether each index value is in a list of values.

>>> idx.isin([1, 4])
array([ True, False, False])
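
The membership test can be pictured with a small pure-Python sketch. The function name `isin_mask` is hypothetical and a plain list stands in for the CuPy array the method returns; it only illustrates that the mask has one entry per index value.

```python
def isin_mask(index_values, sought):
    # Build a set once so each lookup is O(1) on average.
    sought = set(sought)
    # One boolean per index value, in index order.
    return [value in sought for value in index_values]


print(isin_mask([1, 2, 3], [1, 4]))  # [True, False, False]
```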
isna()#

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples#

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isna()
array([False, False,  True,  True, False, False])
isnull()#

Identify missing values.

Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is an NA value.

Examples#

Show which entries in a DataFrame are NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False

Show which entries in a Series are NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
3    False
4    False
dtype: bool

Show which entries in an Index are NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isna()
array([False, False,  True,  True, False, False])
join(other, how='left', level=None, return_indexers=False, sort=False)#

Compute join_index and indexers to conform data structures to the new index.

Parameters#

other : Index

how : {‘left’, ‘right’, ‘inner’, ‘outer’}

return_indexers : bool, default False

sort : bool, default False

Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).

Returns: index

Examples#

>>> import cudf
>>> lhs = cudf.DataFrame({
...     "a": [2, 3, 1],
...     "b": [3, 4, 2],
... }).set_index(['a', 'b']).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a": [1, 4, 3]}).set_index('a').index
>>> rhs
Int64Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
kurt(axis=<no_default>, skipna=True, level=None, numeric_only=None, **kwargs)#

Return Fisher’s unbiased kurtosis of a sample.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

Returns#

Series or scalar

Notes#

Parameters currently not supported are level and numeric_only.

Examples#

Series

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4])
>>> series.kurtosis()
-1.1999999999999904

DataFrame

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.kurt()
a   -1.2
b   -1.2
dtype: float64
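
To make the "Fisher's definition, unbiased" wording concrete, here is a pure-Python sketch of the standard bias-corrected sample excess kurtosis. The function name is hypothetical, and the formula is an assumption about which estimator is meant, not a description of the library's GPU implementation; it does reproduce the -1.2 shown above for [1, 2, 3, 4].

```python
def fisher_kurtosis(values):
    n = len(values)
    if n < 4:
        raise ValueError("the unbiased estimator needs at least 4 values")
    mean = sum(values) / n
    # Biased central moments of order 2 and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    # Excess kurtosis (a normal distribution gives 0).
    g2 = m4 / m2 ** 2 - 3.0
    # Small-sample bias correction.
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


print(round(fisher_kurtosis([1, 2, 3, 4]), 6))  # -1.2
```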
kurtosis(axis=<no_default>, skipna=True, level=None, numeric_only=None, **kwargs)#

Return Fisher’s unbiased kurtosis of a sample.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

Returns#

Series or scalar

Notes#

Parameters currently not supported are level and numeric_only.

Examples#

Series

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4])
>>> series.kurtosis()
-1.1999999999999904

DataFrame

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.kurt()
a   -1.2
b   -1.2
dtype: float64
mask(cond, other=None, inplace=False)#

Replace values where the condition is True.

Parameters#

cond : bool Series/DataFrame, array-like

Where cond is False, keep the original value. Where True, replace with corresponding value from other. Callables are not supported.

other : scalar, list of scalars, Series/DataFrame

Entries where cond is True are replaced with corresponding value from other. Callables are not supported. Default is None.

DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

Series expects only a scalar or a Series-like object with the same length.

inplace : bool, default False

Whether to perform the operation in place on the data.

Returns#

Same type as caller

Examples#

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.mask(df % 2 == 0, [-1, -1])
   A  B
0  1  3
1 -1  5
2  5 -1
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.mask(ser > 2, 10)
0    10
1    10
2     2
3     1
4     0
dtype: int64
>>> ser.mask(ser > 2)
0    <NA>
1    <NA>
2       2
3       1
4       0
dtype: int64
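
The replacement rule for a one-dimensional input with a scalar `other` can be sketched in plain Python. The helper name `mask_values` is hypothetical, and `None` stands in for <NA>; this is an illustration of the semantics only.

```python
def mask_values(values, cond, other=None):
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    # Where cond is True take `other`, otherwise keep the original value.
    return [other if flag else value for value, flag in zip(values, cond)]


data = [4, 3, 2, 1, 0]
cond = [v > 2 for v in data]
print(mask_values(data, cond, 10))  # [10, 10, 2, 1, 0]
print(mask_values(data, cond))      # [None, None, 2, 1, 0]
```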
max(axis=<no_default>, skipna=True, level=None, numeric_only=None, **kwargs)#

Return the maximum of the values in the DataFrame.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

level: int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_only: bool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.

Returns#

Series

Notes#

Parameters currently not supported are level, numeric_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.max()
a     4
b    10
dtype: int64
mean(axis=<no_default>, skipna=True, level=None, numeric_only=None, **kwargs)#

Return the mean of the values for the requested axis.

Parameters#

axis : {0 or ‘index’, 1 or ‘columns’}

Axis for the function to be applied on.

skipna : bool, default True

Exclude NA/null values when computing the result.

level : int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_only : bool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

**kwargs

Additional keyword arguments to be passed to the function.

Returns#

mean : Series or DataFrame (if level specified)

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.mean()
a    2.5
b    8.5
dtype: float64
median(axis=None, skipna=True, level=None, numeric_only=None, **kwargs)#

Return the median of the values for the requested axis.

Parameters#

skipna : bool, default True

Exclude NA/null values when computing the result.

Returns#

scalar

Notes#

Parameters currently not supported are level and numeric_only.

Examples#

>>> import cudf
>>> ser = cudf.Series([10, 25, 3, 25, 24, 6])
>>> ser
0    10
1    25
2     3
3    25
4    24
5     6
dtype: int64
>>> ser.median()
17.0
memory_usage(deep=False)#

Return the memory usage of an object.

Parameters#

deep : bool

The deep parameter is ignored and is only included for pandas compatibility.

Returns#

The total bytes used.

min(axis=<no_default>, skipna=True, level=None, numeric_only=None, **kwargs)#

Return the minimum of the values in the DataFrame.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

level: int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.

numeric_only: bool, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.

Returns#

Series

Notes#

Parameters currently not supported are level, numeric_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.min()
a    1
b    7
dtype: int64
property name#

Get the name of this object.

property names#

Returns a tuple containing the name of the Index.

nans_to_nulls()#

Convert NaNs (if any) to nulls.

Returns#

DataFrame or Series

Examples#

Series

>>> import cudf, numpy as np
>>> series = cudf.Series([1, 2, np.nan, None, 10], nan_as_null=False)
>>> series
0     1.0
1     2.0
2     NaN
3    <NA>
4    10.0
dtype: float64
>>> series.nans_to_nulls()
0     1.0
1     2.0
2    <NA>
3    <NA>
4    10.0
dtype: float64

DataFrame

>>> df = cudf.DataFrame()
>>> df['a'] = cudf.Series([1, None, np.nan], nan_as_null=False)
>>> df['b'] = cudf.Series([None, 3.14, np.nan], nan_as_null=False)
>>> df
      a     b
0   1.0  <NA>
1  <NA>  3.14
2   NaN   NaN
>>> df.nans_to_nulls()
      a     b
0   1.0  <NA>
1  <NA>  3.14
2  <NA>  <NA>
property ndim#

Number of dimensions of the underlying data, by definition 1.

property nlevels#

Number of levels.

notna()#

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples#

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notna()
array([ True,  True, False, False,  True,  True])
notnull()#

Identify non-missing values.

Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:

  • Values where null mask is set.

  • NaN in float dtype.

  • NaT in datetime64 and timedelta64 types.

Characters such as empty strings '' or inf in case of float are not considered <NA> values.

Returns#

DataFrame/Series/Index

Mask of bool values for each element in the object that indicates whether an element is not an NA value.

Examples#

Show which entries in a DataFrame are not NA.

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.NaN],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                             pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True

Show which entries in a Series are not NA.

>>> ser = cudf.Series([5, 6, np.NaN, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
3     True
4     True
dtype: bool

Show which entries in an Index are not NA.

>>> idx = cudf.Index([1, 2, None, np.NaN, 0.32, np.inf])
>>> idx
Float64Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notna()
array([ True,  True, False, False,  True,  True])
nunique(dropna: bool = True)#

Return count of unique values for the column.

Parameters#

dropna : bool, default True

Don’t include NaN in the counts.

Returns#

int

Number of unique values in the column.
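
The effect of `dropna` can be sketched in plain Python. The helper name `count_unique` is hypothetical, and `None` is used as a stand-in for a missing value; the sketch does not model NaN handling.

```python
def count_unique(values, dropna=True):
    distinct = set(values)
    if dropna:
        # Leave the missing-value marker out of the count.
        distinct.discard(None)
    return len(distinct)


print(count_unique([1, 1, None, 2]))                # 2
print(count_unique([1, 1, None, 2], dropna=False))  # 3
```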

pipe(func, *args, **kwargs)#

Apply func(self, *args, **kwargs).

Parameters#

func : function

Function to apply to the Series/DataFrame/Index. args and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame/Index.

args : iterable, optional

Positional arguments passed into func.

kwargs : mapping, optional

A dictionary of keyword arguments passed into func.

Returns#

object : the return type of func.

Examples#

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing

>>> func(g(h(df), arg1=a), arg2=b, arg3=c)

You can write

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe(func, arg2=b, arg3=c)
... )

If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose func takes its data as arg2:

>>> (df.pipe(h)
...    .pipe(g, arg1=a)
...    .pipe((func, 'arg2'), arg1=a, arg3=c)
...  )
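
The dispatch described above, including the (callable, data_keyword) tuple form, can be sketched as a free function. This is a minimal illustration with hypothetical names (`pipe`, `scale`), not the library's source; a plain list stands in for the Series/DataFrame/Index.

```python
def pipe(obj, func, *args, **kwargs):
    if isinstance(func, tuple):
        # Tuple form: the second item names the keyword that receives obj.
        func, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    # Plain form: obj is passed as the first positional argument.
    return func(obj, *args, **kwargs)


def scale(factor, data):
    # Hypothetical function that takes its data as the second argument.
    return [factor * x for x in data]


result = pipe([1, 2, 3], (scale, "data"), 10)
print(result)  # [10, 20, 30]
```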
prod(axis=<no_default>, skipna=True, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)#

Return product of the values in the DataFrame.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

dtype: data type

Data type to cast the result to.

min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default is 0, which means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns#

Series

Notes#

Parameters currently not supported are level and numeric_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.product()
a      24
b    5040
dtype: int64
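
The `min_count` rule can be sketched in plain Python for a single column. The helper name `reduce_with_min_count` is hypothetical, `None` stands in for NA, and `skipna=True` is assumed.

```python
import math


def reduce_with_min_count(values, op="sum", min_count=0):
    # Drop missing values first (the skipna=True behaviour).
    valid = [v for v in values if v is not None]
    # Too few valid values: the result is NA.
    if len(valid) < min_count:
        return None
    return sum(valid) if op == "sum" else math.prod(valid)


print(reduce_with_min_count([1, 2, 3, 4], op="prod"))           # 24
print(reduce_with_min_count([None, None], op="sum"))            # 0
print(reduce_with_min_count([None, None], op="prod"))           # 1
print(reduce_with_min_count([None, None], op="prod", min_count=1))  # None
```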
product(axis=<no_default>, skipna=True, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)#

Return product of the values in the DataFrame.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

dtype: data type

Data type to cast the result to.

min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default is 0, which means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns#

Series

Notes#

Parameters currently not supported are level and numeric_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.product()
a      24
b    5040
dtype: int64
rename(name, inplace=False)#

Alter Index name.

Defaults to returning new index.

Parameters#

name : label

Name(s) to set.

Returns#

Index

Examples#

>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Int64Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Int64Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
repeat(repeats, axis=None)#

Repeat elements of an Index.

Returns a new Index where each element of the current Index is repeated consecutively a given number of times.

Parameters#

repeats : int, or array of ints

The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.

Returns#

Index

A newly created object of same type as caller with repeated elements.

Examples#

>>> import cudf
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Int64Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Int64Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
            33, 33, 33, 33, 55, 55, 55, 55, 55],
        dtype='int64')
rolling(window, min_periods=None, center=False, axis=0, win_type=None)#

Rolling window calculations.

Parameters#

window : int, offset or a BaseIndexer subclass

Size of the window, i.e., the number of observations used to calculate the statistic. For datetime indexes, an offset can be provided instead of an int. The offset must be convertible to a timedelta. As opposed to a fixed window size, each window will be sized to accommodate observations within the time period specified by the offset. If a BaseIndexer subclass is passed, calculates the window boundaries based on the defined get_window_bounds method.

min_periods : int, optional

The minimum number of observations in the window that are required to be non-null, so that the result is non-null. If not provided or None, min_periods is equal to the window size.

center : bool, optional

If True, the result is set at the center of the window. If False (default), the result is set at the right edge of the window.

Returns#

Rolling object.

Examples#

>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> a = cudf.Series([1, 2, 3, None, 4])

Rolling sum with window size 2.

>>> print(a.rolling(2).sum())
0
1    3
2    5
3
4
dtype: int64

Rolling sum with window size 2 and min_periods 1.

>>> print(a.rolling(2, min_periods=1).sum())
0    1
1    3
2    5
3    3
4    4
dtype: int64

Rolling count with window size 3.

>>> print(a.rolling(3).count())
0    1
1    2
2    3
3    2
4    2
dtype: int64

Rolling count with window size 3, but with the result set at the center of the window.

>>> print(a.rolling(3, center=True).count())
0    2
1    3
2    2
3    2
4    1
dtype: int64

Rolling max with variable window size specified by an offset; only valid for datetime index.

>>> a = cudf.Series(
...     [1, 9, 5, 4, np.nan, 1],
...     index=[
...         pd.Timestamp('20190101 09:00:00'),
...         pd.Timestamp('20190101 09:00:01'),
...         pd.Timestamp('20190101 09:00:02'),
...         pd.Timestamp('20190101 09:00:04'),
...         pd.Timestamp('20190101 09:00:07'),
...         pd.Timestamp('20190101 09:00:08')
...     ]
... )
>>> print(a.rolling('2s').max())
2019-01-01T09:00:00.000    1
2019-01-01T09:00:01.000    9
2019-01-01T09:00:02.000    9
2019-01-01T09:00:04.000    4
2019-01-01T09:00:07.000
2019-01-01T09:00:08.000    1
dtype: int64

Apply a custom function on the window with the apply method:

>>> import numpy as np
>>> import math
>>> b = cudf.Series([16, 25, 36, 49, 64, 81], dtype=np.float64)
>>> def some_func(A):
...     b = 0
...     for a in A:
...         b = b + math.sqrt(a)
...     return b
...
>>> print(b.rolling(3, min_periods=1).apply(some_func))
0     4.0
1     9.0
2    15.0
3    18.0
4    21.0
5    24.0
dtype: float64

This also works for rolling windows set by an offset:

>>> import pandas as pd
>>> c = cudf.Series(
...     [16, 25, 36, 49, 64, 81],
...     index=[
...          pd.Timestamp('20190101 09:00:00'),
...          pd.Timestamp('20190101 09:00:01'),
...          pd.Timestamp('20190101 09:00:02'),
...          pd.Timestamp('20190101 09:00:04'),
...          pd.Timestamp('20190101 09:00:07'),
...          pd.Timestamp('20190101 09:00:08')
...      ],
...     dtype=np.float64
... )
>>> print(c.rolling('2s').apply(some_func))
2019-01-01T09:00:00.000     4.0
2019-01-01T09:00:01.000     9.0
2019-01-01T09:00:02.000    11.0
2019-01-01T09:00:04.000     7.0
2019-01-01T09:00:07.000     8.0
2019-01-01T09:00:08.000    17.0
dtype: float64
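
The fixed-size window and min_periods rules from the Parameters section can be sketched in plain Python. The helper name `rolling_sum` is hypothetical, `None` stands in for null, and the window is right-aligned (center=False); this is an illustration, not the library's GPU implementation.

```python
def rolling_sum(values, window, min_periods=None):
    # If min_periods is not given, it equals the window size.
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        # Window ends at position i and holds at most `window` observations.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if v is not None]
        # Fewer than min_periods non-null observations gives a null result.
        out.append(sum(valid) if len(valid) >= min_periods else None)
    return out


a = [1, 2, 3, None, 4]
print(rolling_sum(a, 2))                 # [None, 3, 5, None, None]
print(rolling_sum(a, 2, min_periods=1))  # [1, 3, 5, 3, 4]
```

Both results match the first two rolling-sum examples in this section.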
searchsorted(values, side='left', ascending=True, na_position='last')#

Find indices where elements should be inserted to maintain order.

Parameters#

values : Frame (Shape must be consistent with self)

Values to be hypothetically inserted into self.

side : str {‘left’, ‘right’}, optional, default ‘left’

If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.

ascending : bool, optional, default True

Sorted Frame is in ascending order (otherwise descending).

na_position : str {‘last’, ‘first’}, optional, default ‘last’

Position of null values in sorted order.

Returns#

1-D cupy array of insertion points

Examples#

>>> import cudf
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)

If the values are not monotonically sorted, wrong locations may be returned:

>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
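
For an ascending one-dimensional input, the meaning of `side` can be sketched with the standard library's `bisect` module. The function name `insertion_points` is hypothetical and a plain list stands in for the returned CuPy array.

```python
from bisect import bisect_left, bisect_right


def insertion_points(sorted_values, values, side="left"):
    # 'left' returns the first suitable position, 'right' the last one.
    find = bisect_left if side == "left" else bisect_right
    return [find(sorted_values, v) for v in values]


s = [1, 2, 3]
print(insertion_points(s, [0, 4]))                # [0, 3]
print(insertion_points(s, [1, 3], side="left"))   # [0, 2]
print(insertion_points(s, [1, 3], side="right"))  # [1, 3]
```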
set_names(names, level=None, inplace=False)#

Set Index or MultiIndex name. Able to set new names partially and by level.

Parameters#

names : label or list of label

Name(s) to set.

level : int, label or list of int or label, optional

If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.

inplace : bool, default False

Modifies the object directly, instead of creating a new Index or MultiIndex.

Returns#

Index

The same type as the caller or None if inplace is True.

See Also#

cudf.Index.rename : Able to set new names without level.

Examples#

>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
property shape#

Get a tuple representing the dimensionality of the Index.

shift(periods=1, freq=None)#

Not yet implemented

property size#

Return the number of elements in the underlying data.

Returns#

size : Size of the DataFrame / Index / Series / MultiIndex

Examples#

Size of an empty dataframe is 0.

>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0

DataFrame with values

>>> df = cudf.DataFrame({'a': [10, 11, 12],
...         'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3

Size of an Index

>>> index = cudf.Index([])
>>> index
Float64Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Int64Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4

Size of a MultiIndex

>>> midx = cudf.MultiIndex(
...                 levels=[["a", "b", "c", None], ["1", None, "5"]],
...                 codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...                 names=["x", "y"],
...             )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
skew(axis=<no_default>, skipna=True, level=None, numeric_only=None, **kwargs)#

Return unbiased Fisher-Pearson skew of a sample.

Parameters#

skipna: bool, default True

Exclude NA/null values when computing the result.

Returns#

Series

Notes#

Parameters currently not supported are axis, level and numeric_only.

Examples#

Series

>>> import cudf
>>> series = cudf.Series([1, 2, 3, 4, 5, 6, 6])
>>> series
0    1
1    2
2    3
3    4
4    5
5    6
6    6
dtype: int64

DataFrame

>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 8, 10, 10]})
>>> df.skew()
a    0.00000
b   -0.37037
dtype: float64
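
A pure-Python sketch of the sample-size-adjusted Fisher-Pearson coefficient follows. The function name is hypothetical and the formula is an assumption about which estimator "unbiased" refers to; it does reproduce the DataFrame values shown above.

```python
import math


def unbiased_skew(values):
    n = len(values)
    if n < 3:
        raise ValueError("the adjusted estimator needs at least 3 values")
    mean = sum(values) / n
    # Biased central moments of order 2 and 3.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0
    # Biased skewness g1, then the small-sample adjustment.
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


print(round(unbiased_skew([3, 2, 3, 4]), 5))    # 0.0
print(round(unbiased_skew([7, 8, 10, 10]), 5))  # -0.37037
```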
sort_values(return_indexer=False, ascending=True, na_position='last', key=None)#

Return a sorted copy of the index, and optionally return the indices that sorted the index itself.

Parameters#

return_indexer : bool, default False

Should the indices that would sort the index be returned.

ascending : bool, default True

Should the index values be sorted in an ascending order.

na_position : {‘first’ or ‘last’}, default ‘last’

Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.

key : None, optional

This parameter is NON-FUNCTIONAL.

Returns#

sorted_index : Index

Sorted copy of the index.

indexer : cupy.ndarray, optional

The indices that the index itself was sorted by.

See Also#

cudf.Series.sort_values : Sort values of a Series.
cudf.DataFrame.sort_values : Sort values in a DataFrame.

Examples#

>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

Sort values in ascending order (default behavior).

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

Sort values in descending order, and also get the indices idx was sorted by.

>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2],
                                                    dtype=int32))

Sorting values in a MultiIndex:

>>> midx = cudf.MultiIndex(
...      levels=[[1, 3, 4, -10], [1, 11, 5]],
...      codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...      names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
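
The relationship between the sorted result and the indexer can be sketched in plain Python, without a GPU. The helper `sort_with_indexer` below is hypothetical and is not part of cuDF; it only models the documented semantics, with `None` standing in for a missing value.

```python
def sort_with_indexer(values, ascending=True, na_position="last"):
    """Return (sorted_values, indexer) for a list that may contain None.

    Illustrative host-side model only, not cuDF's implementation.
    """
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    valid = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    # Order the original positions by the value each one points at.
    valid.sort(key=lambda i: values[i], reverse=not ascending)
    indexer = nulls + valid if na_position == "first" else valid + nulls
    return [values[i] for i in indexer], indexer


labels = [10, 100, 1, 1000]
sorted_desc, indexer = sort_with_indexer(labels, ascending=False)
print(sorted_desc)  # [1000, 100, 10, 1]
print(indexer)      # [3, 1, 0, 2]
```

Each entry of the indexer is the original position of the label that ends up at that place in the sorted result, so `[labels[i] for i in indexer]` reproduces the sorted values.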
std(axis=<no_default>, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)#

Return sample standard deviation of the DataFrame.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddof: int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns#

Series

Notes#

Parameters currently not supported are level and numeric_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.std()
a    1.290994
b    1.290994
dtype: float64
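
The effect of ddof on the divisor can be reproduced with a small pure-Python sketch. The function `std_with_ddof` is hypothetical and not part of cuDF. Returning `None` when no values remain or when N - ddof is not positive is an assumption made for this illustration.

```python
import math


def std_with_ddof(values, ddof=1):
    """Standard deviation with divisor N - ddof; None marks a missing value."""
    valid = [v for v in values if v is not None]  # mirrors skipna=True
    n = len(valid)
    if n == 0 or n - ddof <= 0:
        return None  # assumption: no meaningful result for this input
    mean = sum(valid) / n
    squared_devs = sum((v - mean) ** 2 for v in valid)
    return math.sqrt(squared_devs / (n - ddof))


sample_std = std_with_ddof([1, 2, 3, 4])              # divisor N - 1 = 3
population_std = std_with_ddof([1, 2, 3, 4], ddof=0)  # divisor N = 4
print(round(sample_std, 6))      # 1.290994
print(round(population_std, 6))  # 1.118034
```

With the default ddof=1 the result matches the `1.290994` shown for both columns, because `[1, 2, 3, 4]` and `[7, 8, 9, 10]` have the same spread around their means.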
property str#

Not yet implemented.

sum(axis=<no_default>, skipna=True, dtype=None, level=None, numeric_only=None, min_count=0, **kwargs)#

Return sum of the values in the DataFrame.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values when computing the result.

dtype: data type

Data type to cast the result to.

min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

The default is 0, which means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

Returns#

Series

Notes#

Parameters currently not supported are level, numeric_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.sum()
a    10
b    34
dtype: int64
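
The interaction between skipna and min_count can be modelled in a few lines of plain Python. This is a sketch of the documented rule only; `sum_with_min_count` is a hypothetical helper, and `None` stands in for NA.

```python
def sum_with_min_count(values, min_count=0):
    """Sum the non-missing values, or return None if too few are present."""
    valid = [v for v in values if v is not None]  # mirrors skipna=True
    if len(valid) < min_count:
        return None  # fewer than min_count valid values: result is NA
    return sum(valid)


print(sum_with_min_count([1, 2, 3, 4]))                # 10
print(sum_with_min_count([None, None]))                # 0
print(sum_with_min_count([None, None], min_count=1))   # None
```

With the default `min_count=0`, an all-NA input sums to 0. Passing `min_count=1` is the way to get NA instead.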
tail(n=5)#

Returns the last n rows as a new DataFrame or Series.

Examples#

DataFrame

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df.tail(2)
   key   val
3    3  13.0
4    4  14.0

Series

>>> import cudf
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.tail(2)
3    1
4    0
dtype: int64
take(indices, axis=0, allow_fill=True, fill_value=None)#

Return a new index containing the rows specified by indices.

Parameters#

indicesarray-like

Array of ints indicating which positions to take.

axisint

The axis over which to select values, always 0.

allow_fill : Unsupported

fill_value : Unsupported

Returns#

outIndex

New object with desired subset of rows.

Examples#

>>> idx = cudf.Index(['a', 'b', 'c', 'd', 'e'])
>>> idx.take([2, 0, 4, 3])
StringIndex(['c' 'a' 'e' 'd'], dtype='object')
to_arrow()#

Convert to a PyArrow Array.

Returns#

PyArrow Array

Examples#

>>> import cudf
>>> sr = cudf.Series(["a", "b", None])
>>> sr.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7600>
[
  "a",
  "b",
  null
]
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
to_cupy(dtype: Dtype | None = None, copy: bool = True, na_value=None) cupy.ndarray#

Convert the Frame to a CuPy array.

Parameters#

dtypestr or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copybool, default True

Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_cupy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_valueAny, default None

The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

Returns#

cupy.ndarray

to_dlpack()#

Converts a cuDF object into a DLPack tensor.

DLPack is an open-source memory tensor structure: dmlc/dlpack.

This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.

Parameters#

cudf_obj : DataFrame, Series, Index, or Column

Returns#

pycapsule_objPyCapsule

Output DLPack tensor pointer which is encapsulated in a PyCapsule object.

to_frame(index=True, name=<no_default>)#

Create a DataFrame with a column containing this Index.

Parameters#

indexboolean, default True

Set the index of the returned DataFrame as the original Index.

nameobject, defaults to index.name

The passed name should substitute for the index name (if it has one).

Returns#

DataFrame

DataFrame containing the original Index data.

See Also#

Index.to_series : Convert an Index to a Series.

Series.to_frame : Convert Series to DataFrame.

Examples#

>>> import cudf
>>> idx = cudf.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
       animal
animal
Ant       Ant
Bear     Bear
Cow       Cow

By default, the original Index is reused. To enforce a new Index:

>>> idx.to_frame(index=False)
    animal
0   Ant
1  Bear
2   Cow

To override the name of the resulting column, specify name:

>>> idx.to_frame(index=False, name='zoo')
    zoo
0   Ant
1  Bear
2   Cow
to_hdf(path_or_buf, key, *args, **kwargs)#

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

To add another DataFrame or Series to an existing HDF file, use append mode and a different key.

For more information see the user guide.

Parameters#

path_or_bufstr or pandas.HDFStore

File path or HDFStore object.

keystr

Identifier for the group in the store.

mode{‘a’, ‘w’, ‘r+’}, default ‘a’

Mode to open file:

  • ‘w’: write, a new file is created (an existing file with the same name would be deleted).

  • ‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.

  • ‘r+’: similar to ‘a’, but the file must already exist.

format{‘fixed’, ‘table’}, default ‘fixed’

Possible values:

  • ‘fixed’: Fixed format. Fast writing/reading. Not-appendable, nor searchable.

  • ‘table’: Table format. Write as a PyTables Table structure which may perform worse but allow more flexible operations like searching / selecting subsets of the data.

appendbool, default False

For Table formats, append the input data to the existing data.

data_columnslist of columns or True, optional

List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via Data Columns. Applicable only to format='table'.

complevel{0-9}, optional

Specifies a compression level for data. A value of 0 disables compression.

complib{‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’

Specifies the compression library to be used. As of v0.20.2 these additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available issues a ValueError.

fletcher32bool, default False

If applying compression use the fletcher32 checksum.

dropnabool, default False

If True, rows in which all values are NaN are not written to the store.

errorsstr, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

See Also#

cudf.read_hdf : Read from HDF file.

cudf.DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

cudf.DataFrame.to_feather : Write out feather-format for DataFrames.

to_json(path_or_buf=None, *args, **kwargs)#

Convert the cuDF object to a JSON string. Note nulls and NaNs will be converted to null and datetime objects will be converted to UNIX timestamps.

Parameters#

path_or_bufstring or file handle, optional

File path or object. If not specified, the result is returned as a string.

engine{‘auto’, ‘cudf’, ‘pandas’}, default ‘auto’

Parser engine to use. If ‘auto’ is passed, the pandas engine will be selected.

orientstring

Indication of expected JSON string format.

  • Series
    • default is ‘index’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}

  • DataFrame
    • default is ‘columns’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}

  • The format of the JSON string
    • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

    • ‘records’ : list like [{column -> value}, … , {column -> value}]

    • ‘index’ : dict like {index -> {column -> value}}

    • ‘columns’ : dict like {column -> {index -> value}}

    • ‘values’ : just the values array

    • ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}} describing the data, and the data component is like orient='records'.

date_format{None, ‘epoch’, ‘iso’}

Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.

double_precisionint, default 10

The number of decimal places to use when encoding floating point values.

force_asciibool, default True

Force encoded string to be ASCII.

date_unitstring, default ‘ms’ (milliseconds)

The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.

default_handlercallable, default None

Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serializable object.

linesbool, default False

If orient is ‘records’, write out line-delimited JSON. Raises ValueError for any other orient, since the other formats are not list-like.

compression{‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}

A string representing the compression to use in the output file, only used when the first argument is a filename. By default, the compression is inferred from the filename.

indexbool, default True

Whether to include the index values in the JSON string. Not including the index (index=False) is only supported when orient is ‘split’ or ‘table’.

See Also#

cudf.read_json
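
The orient layouts described above can be visualised with the standard `json` module alone. The sketch below builds each layout by hand for a hypothetical two-row table (the labels `r0`, `r1`, `a`, `b` are made up for illustration); it does not call cuDF, and the ‘table’ orient is omitted because its schema is not spelled out here.

```python
import json

index = ["r0", "r1"]
columns = ["a", "b"]
rows = [[1, 2], [3, 4]]

layouts = {
    "split": {"index": index, "columns": columns, "data": rows},
    "records": [dict(zip(columns, row)) for row in rows],
    "index": {i: dict(zip(columns, row)) for i, row in zip(index, rows)},
    "columns": {
        c: {i: row[j] for i, row in zip(index, rows)}
        for j, c in enumerate(columns)
    },
    "values": rows,
}

for orient, payload in layouts.items():
    print(orient, json.dumps(payload))

# lines=True with orient='records' corresponds to one JSON object per line.
json_lines = "\n".join(json.dumps(record) for record in layouts["records"])
print(json_lines)
```

Only ‘records’ is a flat list of row objects, which is why it is the only orient that can be written one object per line.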

to_list()#
to_numpy(dtype: Dtype | None = None, copy: bool = True, na_value=None) numpy.ndarray#

Convert the Frame to a NumPy array.

Parameters#

dtypestr or numpy.dtype, optional

The dtype to pass to numpy.asarray().

copybool, default True

Whether to ensure that the returned value is not a view on another array. This parameter must be True since cuDF must copy device memory to host to provide a numpy array.

na_valueAny, default None

The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

Returns#

numpy.ndarray

to_pandas(nullable=False)#

Convert to a Pandas Index.

Parameters#

nullablebool, default False

If nullable is True, the resulting index will have a corresponding nullable Pandas dtype. If there is no corresponding nullable Pandas dtype present, the resulting dtype will be a regular pandas dtype. If nullable is False, the resulting index will either convert null values to np.nan or None depending on the dtype.

Examples#

>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Int64Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.numeric.Int64Index'>
>>> type(idx)
<class 'cudf.core.index.Int64Index'>
to_series(index=None, name=None)#

Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.

Parameters#

indexIndex, optional

Index of resulting Series. If None, defaults to original index.

namestr, optional

Name of resulting Series. If None, defaults to name of original index.

Returns#

Series

The dtype will be based on the type of the Index values.

to_string()#

Convert to string

cuDF uses Pandas internals for efficient string formatting. Set formatting options using pandas string formatting options and cuDF objects will print identically to Pandas objects.

cuDF supports null/None as a value in any column type, which is transparently supported during this output process.

Examples#

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2]
>>> df['val'] = [float(i + 10) for i in range(3)]
>>> df.to_string()
'   key   val\n0    0  10.0\n1    1  11.0\n2    2  12.0'
tolist()#
union(other, sort=None)#

Form the union of two Index objects.

Parameters#

other : Index or array-like

sort : bool or None, default None

Whether to sort the resulting Index.

  • None : Sort the result, except when

    1. self and other are equal.

    2. self or other has length 0.

  • False : do not sort the result.

Returns#

union : Index

Examples#

Union of an Index

>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')

MultiIndex case

>>> idx1 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...    )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
           )
>>> idx2 = cudf.MultiIndex.from_pandas(
...    pd.MultiIndex.from_arrays(
...         [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
...    )
... )
>>> idx2
MultiIndex([(3,   'Red'),
            (3, 'Green'),
            (2,   'Red'),
            (2, 'Green')],
           )
>>> idx1.union(idx2)
MultiIndex([(1,  'Blue'),
            (1,   'Red'),
            (2,  'Blue'),
            (2, 'Green'),
            (2,   'Red'),
            (3, 'Green'),
            (3,   'Red')],
           )
>>> idx1.union(idx2, sort=False)
MultiIndex([(1,   'Red'),
            (1,  'Blue'),
            (2,   'Red'),
            (2,  'Blue'),
            (3,   'Red'),
            (3, 'Green'),
            (2, 'Green')],
           )
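
The sort rules listed under Parameters can be modelled on plain Python lists. The function `union_labels` is a hypothetical host-side sketch, not cuDF's implementation; dropping duplicate labels and keeping first occurrences in order is an assumption that happens to reproduce the outputs shown above.

```python
def union_labels(left, right, sort=None):
    """Union of two label lists following the documented sort rules."""
    if sort not in (None, False):
        raise ValueError("sort must be None or False")
    left, right = list(left), list(right)
    # Everything from left first, then labels from right not seen before.
    combined = list(dict.fromkeys(left + right))
    # sort=None sorts unless the inputs are equal or one of them is empty.
    skip_sort = sort is False or left == right or not left or not right
    return combined if skip_sort else sorted(combined)


idx1 = [(1, "Red"), (1, "Blue"), (2, "Red"), (2, "Blue")]
idx2 = [(3, "Red"), (3, "Green"), (2, "Red"), (2, "Green")]

print(union_labels(idx1, idx2))
print(union_labels(idx1, idx2, sort=False))
```

Tuples compare element by element, so sorting the combined list orders the pairs by the first level and then by the second, as in the MultiIndex output.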
unique()#

Return unique values in the index.

Returns#

Index without duplicates

property values#

Return a CuPy representation of the DataFrame.

Only the values in the DataFrame will be returned, the axes labels will be removed.

Returns#

cupy.ndarray

The values of the DataFrame.

property values_host#

Return a NumPy representation of the data.

Only the values in the DataFrame will be returned, the axes labels will be removed.

Returns#

numpy.ndarray

A host representation of the underlying data.

var(axis=<no_default>, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)#

Return unbiased variance of the DataFrame.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters#

axis: {index (0), columns(1)}

Axis for the function to be applied on.

skipna: bool, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddof: int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns#

scalar

Notes#

Parameters currently not supported are level and numeric_only.

Examples#

>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.var()
a    1.666667
b    1.666667
dtype: float64
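
The value `1.666667` can be checked by hand for column `a`. The sketch below walks through the arithmetic with exact fractions from the standard library; it is a worked example of the N - ddof divisor, not cuDF code.

```python
from fractions import Fraction

values = [1, 2, 3, 4]            # column 'a' from the example
n = len(values)
mean = Fraction(sum(values), n)  # 5/2

# Squared deviations from the mean: 9/4, 1/4, 1/4, 9/4
squared_devs = [(v - mean) ** 2 for v in values]
total = sum(squared_devs)        # 5

sample_var = total / (n - 1)     # ddof=1: 5/3
population_var = total / n       # ddof=0: 5/4

print(round(float(sample_var), 6))  # 1.666667
print(float(population_var))        # 1.25
```

Column `b` gives the same result because `[7, 8, 9, 10]` is the same sequence shifted by 6, and a shift does not change deviations from the mean.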
where(cond, other=None, inplace=False)#

Replace values where the condition is False.

Parameters#

condbool Series/DataFrame, array-like

Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.

other: scalar, list of scalars, Series/DataFrame

Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.

For a DataFrame, other must be a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.

For a Series, other must be a scalar or a Series-like object of the same length.

inplacebool, default False

Whether to perform the operation in place on the data.

Returns#

Same type as caller

Examples#

>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.where(df % 2 == 0, [-1, -1])
   A  B
0 -1 -1
1  4 -1
2 -1  8
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.where(ser > 2, 10)
0     4
1     3
2    10
3    10
4    10
dtype: int64
>>> ser.where(ser > 2)
0       4
1       3
2    <NA>
3    <NA>
4    <NA>
dtype: int64
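
The element-wise rule (keep the value where cond is True, otherwise substitute other) can be sketched on plain lists. The helper `where_values` is hypothetical, handles only a scalar `other`, and uses `None` to stand in for `<NA>`.

```python
def where_values(values, cond, other=None):
    """Keep values[i] where cond[i] is True, else use the scalar `other`."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [v if keep else other for v, keep in zip(values, cond)]


data = [4, 3, 2, 1, 0]
mask = [v > 2 for v in data]

print(where_values(data, mask, 10))  # [4, 3, 10, 10, 10]
print(where_values(data, mask))      # [4, 3, None, None, None]
```

The two calls correspond to the two Series examples: with a replacement value of 10, and with the default `other=None`, which yields missing values.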