hipdf.DatetimeIndex#
- class hipdf.DatetimeIndex(data, *args, **kwargs)#
Bases: Index
Immutable, ordered and sliceable sequence of datetime64 data, represented internally as int64.
Parameters#
- data : array-like (1-dimensional), optional
Optional datetime-like data to construct index with.
- copy : bool
Make a copy of input.
- freq : str, optional
Frequency of the DatetimeIndex.
- tz : pytz.timezone or dateutil.tz.tzfile
This is not yet supported.
- ambiguous : ‘infer’, bool-ndarray, ‘NaT’, default ‘raise’
This is not yet supported.
- name : object
Name to be stored in the index.
- dayfirst : bool, default False
If True, parse dates in data with the day first order. This is not yet supported.
- yearfirst : bool, default False
If True, parse dates in data with the year first order. This is not yet supported.
Attributes#
year month day hour minute second microsecond nanosecond date time dayofyear day_of_year weekday quarter freq
Methods#
ceil floor round tz_convert tz_localize
Returns#
DatetimeIndex
Examples#
>>> import cudf
>>> cudf.DatetimeIndex([1, 2, 3, 4], name="a")
DatetimeIndex(['1970-01-01 00:00:00.000000001',
               '1970-01-01 00:00:00.000000002',
               '1970-01-01 00:00:00.000000003',
               '1970-01-01 00:00:00.000000004'],
              dtype='datetime64[ns]', name='a')
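The integers in the example above are read as nanoseconds since the Unix epoch, which is why `1` is displayed as `1970-01-01 00:00:00.000000001`. The standard-library sketch below reproduces that mapping for illustration only. `ns_to_iso` is a hypothetical helper and is not part of the library.

```python
from datetime import datetime, timedelta, timezone

# Reference point for datetime64 values: the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_iso(ns: int) -> str:
    """Format an int64 nanosecond offset from the epoch as a timestamp string.

    Hypothetical helper for illustration; not a library function.
    """
    # Split into whole seconds and the leftover nanoseconds, because the
    # standard datetime type only resolves microseconds.
    seconds, frac_ns = divmod(ns, 1_000_000_000)
    whole = EPOCH + timedelta(seconds=seconds)
    return f"{whole:%Y-%m-%d %H:%M:%S}.{frac_ns:09d}"


first = ns_to_iso(1)
```

Keeping the nanosecond remainder as a separate integer avoids the precision loss that would occur if the value were passed through a float.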
- __init__(data=None, freq=None, tz=None, normalize: bool = False, closed=None, ambiguous: Literal['raise'] = 'raise', dayfirst: bool = False, yearfirst: bool = False, dtype=None, copy: bool = False, name=None)#
Methods
__init__([data, freq, tz, normalize, ...])
all([axis, skipna])Return whether all elements are True in DataFrame.
any()Return whether any element is True in DataFrame.
append(other)Append a collection of Index objects together.
argsort([axis, kind, order, ascending, ...])Return the integer indices that would sort the index.
as_unit(unit[, round_ok])Convert to a dtype with the given unit resolution.
astype(dtype[, copy])Create an Index with values cast to dtypes.
ceil(freq)Perform ceil operation on the data to the specified freq.
copy([name, deep])Make a copy of this object.
day_name([locale])Return the day names.
deserialize(header, frames)Generate an object from a serialized representation.
device_deserialize(header, frames)Perform device-side deserialization tasks.
device_serialize()Serialize data and metadata associated with device memory.
difference(other[, sort])Return a new Index with elements from the index that are not in other.
drop_duplicates([keep, nulls_are_equal])Drop duplicate rows in index.
dropna([how])Drop null rows from Index.
duplicated([keep])Indicate duplicate index values.
equals(other)Test whether two objects contain the same elements.
factorize([sort, use_na_sentinel])Encode the input values as integer labels.
fillna([value, method, axis, inplace, limit])Fill null values with value or specified method.
find_label_range(loc)Translate a label-based slice to an index-based slice
floor(freq)Perform floor operation on the data to the specified freq.
from_arrow(obj)Create from PyArrow Array/ChunkedArray.
from_pandas(index[, nan_as_null])Convert from a Pandas Index.
get_indexer(target[, method, limit, tolerance])Compute indexer and mask for new index given the current index.
get_level_values(level)Return an Index of values for requested level.
get_loc(key)Get integer location, slice or boolean mask for requested label.
get_slice_bound(label, side)Calculate slice bound that corresponds to given label.
host_deserialize(header, frames)Perform device-side deserialization tasks.
host_serialize()Serialize data and metadata associated with host memory.
intersection(other[, sort])Form the intersection of two Index objects.
is_boolean()Check if the Index only consists of booleans.
is_categorical()Check if the Index holds categorical data.
is_floating()Check if the Index is a floating type.
is_integer()Check if the Index only consists of integers.
is_interval()Check if the Index holds Interval objects.
is_numeric()Check if the Index only consists of numeric data.
is_object()Check if the Index is of the object dtype.
isin(values[, level])Return a boolean array where the index values are in values.
isna()Identify missing values.
isnull()Identify missing values.
isocalendar()Returns a DataFrame with the year, week, and day calculated according to the ISO 8601 standard.
join(other[, how, level, return_indexers, sort])Compute join_index and indexers to conform data structures to the new index.
max([axis, skipna, numeric_only])Return the maximum of the values in the DataFrame.
mean(*[, skipna, axis])
memory_usage([deep])Return the memory usage of an object.
min([axis, skipna, numeric_only])Return the minimum of the values in the DataFrame.
month_name([locale])Return the month names.
normalize()Convert times to midnight.
notna()Identify non-missing values.
notnull()Identify non-missing values.
nunique([dropna])Return count of unique values for the column.
rename(name[, inplace])Alter Index name.
repeat(repeats[, axis])Repeat elements of an Index.
round(freq)Perform round operation on the data to the specified freq.
searchsorted(value[, side, ascending, ...])Find indices where elements should be inserted to maintain order
serialize()Generate an equivalent serializable representation of an object.
set_names(names[, level, inplace])Set Index or MultiIndex name.
shift([periods, freq])Not yet implemented
sort_values([return_indexer, ascending, ...])Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
std(*[, skipna, axis, ddof])
strftime(date_format)Convert to Index using specified date_format.
take(indices[, axis, allow_fill, fill_value])Return a new index containing the rows specified by indices
to_arrow()Convert to a PyArrow Array.
to_cupy([dtype, copy, na_value])Convert the Frame to a CuPy array.
to_dlpack()Converts a cuDF object into a DLPack tensor.
to_flat_index()Identity method.
to_frame([index, name])Create a DataFrame with a column containing this Index
to_list()
to_numpy([dtype, copy, na_value])Convert the Frame to a NumPy array.
to_pandas(*[, nullable, arrow_type])Convert to a Pandas Index.
to_period(freq)
to_pydatetime()Return an ndarray of datetime.datetime objects.
to_series([index, name])Create a Series with both index and values equal to the index keys.
tolist()
transpose()Return the transpose, which is by definition self.
tz_convert(tz)Convert tz-aware datetimes from one time zone to another.
tz_localize(tz[, ambiguous, nonexistent])Localize timezone-naive data to timezone-aware data.
union(other[, sort])Form the union of two Index objects.
unique([level])Return unique values in the index.
where(cond[, other, inplace])Replace values where the condition is False.
Attributes
T: Return the transpose, which is by definition self.
date: Returns numpy array of python datetime.date objects.
day: The day of the datetime.
day_of_week: Get the day of week that the date falls on.
day_of_year: The day of the year, from 1-365 in non-leap years and from 1-366 in leap years.
dayofweek: The day of the week with Monday=0, Sunday=6.
dayofyear: The day of the year, from 1-365 in non-leap years and from 1-366 in leap years.
days_in_month: Get the total number of days in the month that the date falls on.
daysinmonth: Get the total number of days in the month that the date falls on.
dtype: dtype of the underlying values in Index.
hasnans: Return True if there are any NaNs or nulls.
hour: The hours of the datetime.
is_leap_year: Boolean indicator if the date belongs to a leap year.
is_monotonic_decreasing: Return boolean if values in the object are monotonically decreasing.
is_monotonic_increasing: Return boolean if values in the object are monotonically increasing.
is_month_end: Booleans indicating if dates are the last day of the month.
is_month_start: Booleans indicating if dates are the first day of the month.
is_normalized: Returns True if all of the dates are at midnight ("no time").
is_quarter_end: Booleans indicating if dates are the last day of the quarter.
is_quarter_start: Booleans indicating if dates are the start day of the quarter.
is_unique: Return boolean if values in the object are unique.
is_year_end: Booleans indicating if dates are the last day of the year.
is_year_start: Booleans indicating if dates are the first day of the year.
microsecond: The microseconds of the datetime.
minute: The minutes of the datetime.
month: The month as January=1, December=12.
name: Get the name of this object.
names: Returns a FrozenList containing the name of the Index.
nanosecond: The nanoseconds of the datetime.
ndim: Number of dimensions of the underlying data, by definition 1.
nlevels: Number of levels.
quarter: Integer indicator for which quarter of the year the date belongs in.
resolution: Returns day, hour, minute, second, millisecond or microsecond.
second: The seconds of the datetime.
shape: Get a tuple representing the dimensionality of the Index.
size: Return the number of elements in the underlying data.
str: Vectorized string functions for Series and Index.
time: Returns numpy array of datetime.time objects.
timetz: Returns numpy array of datetime.time objects with timezones.
tz: Return the timezone.
tzinfo: Alias for tz attribute.
values: Return a CuPy representation of the DataFrame.
values_host: Return a NumPy representation of the data.
weekday: The day of the week with Monday=0, Sunday=6.
year: The year of the datetime.
- __getitem__(index)#
- copy(name=None, deep=False)#
Make a copy of this object.
Parameters#
- name : object, default None
Name of index, use original name when None.
- deep : bool, default False
Make a deep copy of the data. With deep=False the original data is used.
Returns#
New index instance.
- searchsorted(value, side: Literal['left', 'right'] = 'left', ascending: bool = True, na_position: Literal['first', 'last'] = 'last')#
Find indices where elements should be inserted to maintain order
Parameters#
- value : Frame (Shape must be consistent with self)
Values to be hypothetically inserted into self.
- side : str {‘left’, ‘right’}, optional, default ‘left’
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index.
- sorter : 1-D array-like, optional
Optional array of integer indices that sort self into ascending order. They are typically the result of np.argsort. Currently not supported.
- ascending : bool, optional, default True
Sorted Frame is in ascending order (otherwise descending).
- na_position : str {‘last’, ‘first’}, optional, default ‘last’
Position of null values in sorted order.
Returns#
1-D cupy array of insertion points
Examples#
>>> s = cudf.Series([1, 2, 3])
>>> s.searchsorted(4)
3
>>> s.searchsorted([0, 4])
array([0, 3], dtype=int32)
>>> s.searchsorted([1, 3], side='left')
array([0, 2], dtype=int32)
>>> s.searchsorted([1, 3], side='right')
array([1, 3], dtype=int32)
If the values are not monotonically sorted, wrong locations may be returned:
>>> s = cudf.Series([2, 1, 3])
>>> s.searchsorted(1)
0   # wrong result, correct would be 1
>>> df = cudf.DataFrame({'a': [1, 3, 5, 7], 'b': [10, 12, 14, 16]})
>>> df
   a   b
0  1  10
1  3  12
2  5  14
3  7  16
>>> values_df = cudf.DataFrame({'a': [0, 2, 5, 6],
... 'b': [10, 11, 13, 15]})
>>> values_df
   a   b
0  0  10
1  2  11
2  5  13
3  6  15
>>> df.searchsorted(values_df, ascending=False)
array([4, 4, 4, 0], dtype=int32)
- as_unit(unit: str, round_ok: bool = True) Self#
Convert to a dtype with the given unit resolution.
Currently not implemented.
Parameters#
- unit : {‘s’, ‘ms’, ‘us’, ‘ns’}
- round_ok : bool, default True
If False and the conversion requires rounding, raise ValueError.
- strftime(date_format: str) Index#
Convert to Index using specified date_format.
Return an Index of formatted strings specified by date_format, which supports the same string format as the python standard library.
Parameters#
- date_format : str
Date format string (e.g. “%Y-%m-%d”).
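Because the format string follows the Python standard library conventions, the directives can be tried out with `datetime.strftime` before applying them to an index. The sketch below does this on plain `datetime` objects. `format_all` is a hypothetical helper, and the sample timestamps are illustrative.

```python
from datetime import datetime

# Illustrative timestamps; any datetime values would do.
stamps = [datetime(2020, 5, 31, 8, 0), datetime(1999, 12, 31, 18, 40)]


def format_all(values, date_format):
    """Apply one strftime format string to every timestamp (hypothetical helper)."""
    return [v.strftime(date_format) for v in values]


dates_only = format_all(stamps, "%Y-%m-%d")
times_only = format_all(stamps, "%H:%M")
```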
- property asi8: ndarray#
- property tz: tzinfo | None#
Return the timezone.
Returns#
- datetime.tzinfo or None
Returns None when the array is tz-naive.
- to_pydatetime() ndarray#
Return an ndarray of datetime.datetime objects.
Returns#
- numpy.ndarray
An ndarray of datetime.datetime objects.
- to_period(freq) PeriodIndex#
- property time: ndarray#
Returns numpy array of datetime.time objects.
The time part of the Timestamps.
- property timetz: ndarray#
Returns numpy array of datetime.time objects with timezones.
The time part of the Timestamps.
- property date: ndarray#
Returns numpy array of python datetime.date objects.
Namely, the date part of Timestamps without time and timezone information.
- property is_month_start: ndarray#
Booleans indicating if dates are the first day of the month.
- property is_month_end: ndarray#
Booleans indicating if dates are the last day of the month.
- property is_quarter_end: ndarray#
Booleans indicating if dates are the last day of the quarter.
- property is_quarter_start: ndarray#
Booleans indicating if dates are the start day of the quarter.
- property is_year_end: ndarray#
Booleans indicating if dates are the last day of the year.
- property is_year_start: ndarray#
Booleans indicating if dates are the first day of the year.
- property year: Index#
The year of the datetime.
Examples#
>>> import cudf
>>> import pandas as pd
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="Y"))
>>> datetime_index
DatetimeIndex(['2000-12-31', '2001-12-31', '2002-12-31'], dtype='datetime64[ns]')
>>> datetime_index.year
Index([2000, 2001, 2002], dtype='int16')
- property month: Index#
The month as January=1, December=12.
Examples#
>>> import cudf
>>> import pandas as pd
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="M"))
>>> datetime_index
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31'], dtype='datetime64[ns]')
>>> datetime_index.month
Index([1, 2, 3], dtype='int16')
- property day: Index#
The day of the datetime.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="D"))
>>> datetime_index
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03'], dtype='datetime64[ns]')
>>> datetime_index.day
Index([1, 2, 3], dtype='int16')
- property hour: Index#
The hours of the datetime.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="h"))
>>> datetime_index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:00:00',
               '2000-01-01 02:00:00'],
              dtype='datetime64[ns]')
>>> datetime_index.hour
Index([0, 1, 2], dtype='int16')
- property minute: Index#
The minutes of the datetime.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="T"))
>>> datetime_index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:01:00',
               '2000-01-01 00:02:00'],
              dtype='datetime64[ns]')
>>> datetime_index.minute
Index([0, 1, 2], dtype='int16')
- property second: Index#
The seconds of the datetime.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="s"))
>>> datetime_index
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:00:01',
               '2000-01-01 00:00:02'],
              dtype='datetime64[ns]')
>>> datetime_index.second
Index([0, 1, 2], dtype='int16')
- property microsecond: Index#
The microseconds of the datetime.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="us"))
>>> datetime_index
DatetimeIndex([       '2000-01-01 00:00:00', '2000-01-01 00:00:00.000001',
               '2000-01-01 00:00:00.000002'],
              dtype='datetime64[ns]')
>>> datetime_index.microsecond
Index([0, 1, 2], dtype='int32')
- property nanosecond: Index#
The nanoseconds of the datetime.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2000-01-01",
...             periods=3, freq="ns"))
>>> datetime_index
DatetimeIndex([          '2000-01-01 00:00:00',
               '2000-01-01 00:00:00.000000001',
               '2000-01-01 00:00:00.000000002'],
              dtype='datetime64[ns]')
>>> datetime_index.nanosecond
Index([0, 1, 2], dtype='int16')
- property weekday: Index#
The day of the week with Monday=0, Sunday=6.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2016-12-31",
...     "2017-01-08", freq="D"))
>>> datetime_index
DatetimeIndex(['2016-12-31', '2017-01-01', '2017-01-02', '2017-01-03',
               '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07',
               '2017-01-08'],
              dtype='datetime64[ns]')
>>> datetime_index.weekday
Index([5, 6, 0, 1, 2, 3, 4, 5, 6], dtype='int16')
- property dayofweek: Index#
The day of the week with Monday=0, Sunday=6.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2016-12-31",
...     "2017-01-08", freq="D"))
>>> datetime_index
DatetimeIndex(['2016-12-31', '2017-01-01', '2017-01-02', '2017-01-03',
               '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07',
               '2017-01-08'],
              dtype='datetime64[ns]')
>>> datetime_index.dayofweek
Index([5, 6, 0, 1, 2, 3, 4, 5, 6], dtype='int16')
- property dayofyear: Index#
The day of the year, from 1-365 in non-leap years and from 1-366 in leap years.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2016-12-31",
...     "2017-01-08", freq="D"))
>>> datetime_index
DatetimeIndex(['2016-12-31', '2017-01-01', '2017-01-02', '2017-01-03',
               '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07',
               '2017-01-08'],
              dtype='datetime64[ns]')
>>> datetime_index.dayofyear
Index([366, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int16')
- property day_of_year: Index#
The day of the year, from 1-365 in non-leap years and from 1-366 in leap years.
Examples#
>>> import pandas as pd
>>> import cudf
>>> datetime_index = cudf.Index(pd.date_range("2016-12-31",
...     "2017-01-08", freq="D"))
>>> datetime_index
DatetimeIndex(['2016-12-31', '2017-01-01', '2017-01-02', '2017-01-03',
               '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07',
               '2017-01-08'],
              dtype='datetime64[ns]')
>>> datetime_index.day_of_year
Index([366, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int16')
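The weekday and day-of-year conventions shown above (Monday=0, and a day count starting at 1) match those of Python's `datetime.date`, so expected values can be cross-checked on the host. The sketch below rebuilds the same nine dates with the standard library. It only illustrates the conventions and does not use the index class.

```python
from datetime import date, timedelta

# The nine consecutive days from 2016-12-31 through 2017-01-08.
start = date(2016, 12, 31)
days = [start + timedelta(days=i) for i in range(9)]

# Monday=0 ... Sunday=6, as in the weekday and dayofweek properties.
weekdays = [d.weekday() for d in days]

# 1-based day of the year; 2016 is a leap year, so 31 December is day 366.
day_of_year = [d.timetuple().tm_yday for d in days]
```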
- property is_leap_year: ndarray#
Boolean indicator if the date belongs to a leap year.
A leap year has 366 days instead of 365, with 29 February as the intercalary day. Leap years are multiples of four, except for years divisible by 100 but not by 400.
Returns#
ndarray Booleans indicating if dates belong to a leap year.
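The leap-year rule stated above can be written as a single boolean expression. The sketch below is a scalar, host-side illustration, and `is_leap_year` here is a hypothetical free function rather than the property itself. The standard library's `calendar.isleap` is imported as an independent reference.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Multiples of four are leap years, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Cross-check the rule against the standard library over a range of years.
agrees_with_stdlib = all(
    is_leap_year(y) == calendar.isleap(y) for y in range(1600, 2401)
)
```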
- property quarter: Index#
Integer indicator for which quarter of the year the date belongs in.
There are 4 quarters in a year: the first runs from January to March, the second from April to June, the third from July to September, and the fourth from October to December.
Returns#
Index Integer indicating which quarter the date belongs to.
Examples#
>>> import cudf
>>> gIndex = cudf.DatetimeIndex(["2020-05-31 08:00:00",
...     "1999-12-31 18:40:00"])
>>> gIndex.quarter
Index([2, 4], dtype='int8')
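The month-to-quarter mapping described above reduces to integer division. The sketch below shows the arithmetic on plain integers. `quarter_of` is a hypothetical helper for illustration, not a library function.

```python
def quarter_of(month: int) -> int:
    """Map a month number (January=1 ... December=12) to its quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
    return (month - 1) // 3 + 1


quarters = [quarter_of(m) for m in range(1, 13)]
```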
- day_name(locale: str | None = None) Index#
Return the day names. Currently supports English locale only.
Examples#
>>> import cudf
>>> datetime_index = cudf.date_range("2016-12-31", "2017-01-08", freq="D")
>>> datetime_index
DatetimeIndex(['2016-12-31', '2017-01-01', '2017-01-02', '2017-01-03',
               '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-07',
               '2017-01-08'],
              dtype='datetime64[ns]', freq='D')
>>> datetime_index.day_name()
Index(['Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
       'Friday', 'Saturday', 'Sunday'],
      dtype='object')
- month_name(locale: str | None = None) Index#
Return the month names. Currently supports English locale only.
Examples#
>>> import cudf
>>> datetime_index = cudf.date_range("2017-12-30", periods=6, freq='W')
>>> datetime_index
DatetimeIndex(['2017-12-30', '2018-01-06', '2018-01-13', '2018-01-20',
               '2018-01-27', '2018-02-03'],
              dtype='datetime64[ns]', freq='7D')
>>> datetime_index.month_name()
Index(['December', 'January', 'January', 'January', 'January', 'February'], dtype='object')
- isocalendar() DataFrame#
Returns a DataFrame with the year, week, and day calculated according to the ISO 8601 standard.
Returns#
DataFrame with columns year, week and day
Examples#
>>> gIndex = cudf.DatetimeIndex(["2020-05-31 08:00:00",
...     "1999-12-31 18:40:00"])
>>> gIndex.isocalendar()
                     year  week  day
2020-05-31 08:00:00  2020    22    7
1999-12-31 18:40:00  1999    52    5
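The ISO 8601 year, week and day triple is also available on the host through `datetime.isocalendar()`, which is convenient for checking expected values. The sketch below applies it to the same two timestamps as the example above. It illustrates the ISO convention only and says nothing about how the index computes its result.

```python
from datetime import datetime

stamps = [datetime(2020, 5, 31, 8, 0), datetime(1999, 12, 31, 18, 40)]

# Each row is (ISO year, ISO week number, ISO weekday with Monday=1 ... Sunday=7).
rows = [tuple(ts.isocalendar()) for ts in stamps]
```

Note that the ISO weekday numbering starts at 1 for Monday, unlike the `weekday` property, which starts at 0.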
- to_pandas(*, nullable: bool = False, arrow_type: bool = False) DatetimeIndex#
Convert to a Pandas Index.
Parameters#
- nullable : bool, default False
If nullable is True, the resulting index will have a corresponding nullable Pandas dtype. If there is no corresponding nullable Pandas dtype present, the resulting dtype will be a regular pandas dtype. If nullable is False, the resulting index will either convert null values to np.nan or None depending on the dtype.
- arrow_type : bool, default False
Return the Index with a pandas.ArrowDtype.
Notes#
nullable and arrow_type cannot both be set to True.
Examples#
>>> import cudf
>>> idx = cudf.Index([-3, 10, 15, 20])
>>> idx
Index([-3, 10, 15, 20], dtype='int64')
>>> idx.to_pandas()
Index([-3, 10, 15, 20], dtype='int64')
>>> type(idx.to_pandas())
<class 'pandas.core.indexes.base.Index'>
>>> type(idx)
<class 'cudf.core.index.Index'>
>>> idx.to_pandas(arrow_type=True)
Index([-3, 10, 15, 20], dtype='int64[pyarrow]')
- ceil(freq: str) Self#
Perform ceil operation on the data to the specified freq.
Parameters#
- freq : str
One of [“D”, “H”, “T”, “min”, “S”, “L”, “ms”, “U”, “us”, “N”]. Must be a fixed frequency like ‘S’ (second) not ‘ME’ (month end). See frequency aliases for more details on these aliases.
Returns#
- DatetimeIndex
Index of the same type for a DatetimeIndex
Examples#
>>> import cudf
>>> gIndex = cudf.DatetimeIndex([
...     "2020-05-31 08:05:42",
...     "1999-12-31 18:40:30",
... ])
>>> gIndex.ceil("T")
DatetimeIndex(['2020-05-31 08:06:00', '1999-12-31 18:41:00'], dtype='datetime64[ns]')
- floor(freq: str) Self#
Perform floor operation on the data to the specified freq.
Parameters#
- freq : str
One of [“D”, “H”, “T”, “min”, “S”, “L”, “ms”, “U”, “us”, “N”]. Must be a fixed frequency like ‘S’ (second) not ‘ME’ (month end). See frequency aliases for more details on these aliases.
Returns#
- DatetimeIndex
Index of the same type for a DatetimeIndex
Examples#
>>> import cudf
>>> gIndex = cudf.DatetimeIndex([
...     "2020-05-31 08:59:59",
...     "1999-12-31 18:44:59",
... ])
>>> gIndex.floor("T")
DatetimeIndex(['2020-05-31 08:59:00', '1999-12-31 18:44:00'], dtype='datetime64[ns]')
- round(freq: str) Self#
Perform round operation on the data to the specified freq.
Parameters#
- freq : str
One of [“D”, “H”, “T”, “min”, “S”, “L”, “ms”, “U”, “us”, “N”]. Must be a fixed frequency like ‘S’ (second) not ‘ME’ (month end). See frequency aliases for more details on these aliases.
Returns#
- DatetimeIndex
Index containing rounded datetimes.
Examples#
>>> import cudf
>>> dt_idx = cudf.Index([
...     "2001-01-01 00:04:45",
...     "2001-01-01 00:04:58",
...     "2001-01-01 00:05:04",
... ], dtype="datetime64[ns]")
>>> dt_idx
DatetimeIndex(['2001-01-01 00:04:45', '2001-01-01 00:04:58',
               '2001-01-01 00:05:04'],
              dtype='datetime64[ns]')
>>> dt_idx.round('H')
DatetimeIndex(['2001-01-01', '2001-01-01', '2001-01-01'], dtype='datetime64[ns]')
>>> dt_idx.round('T')
DatetimeIndex(['2001-01-01 00:05:00', '2001-01-01 00:05:00',
               '2001-01-01 00:05:00'],
              dtype='datetime64[ns]')
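`ceil`, `floor` and `round` all snap a timestamp onto a grid of fixed-width steps, which is why only fixed frequencies are accepted. The sketch below shows the underlying arithmetic on standard-library `datetime` values. The helper names `floor_to`, `ceil_to` and `round_to` are hypothetical, the grid is assumed to be anchored at the Unix epoch, and exact ties are resolved upward here as an arbitrary choice that may differ from the library's behavior.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: the grid of steps is anchored at the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def floor_to(ts: datetime, step: timedelta) -> datetime:
    """Snap ts down to the nearest multiple of step."""
    whole_steps = (ts - EPOCH) // step  # timedelta // timedelta yields an int
    return EPOCH + whole_steps * step


def ceil_to(ts: datetime, step: timedelta) -> datetime:
    """Snap ts up to the nearest multiple of step."""
    lower = floor_to(ts, step)
    return lower if lower == ts else lower + step


def round_to(ts: datetime, step: timedelta) -> datetime:
    """Snap ts to the closest multiple of step (ties go up in this sketch)."""
    lower = floor_to(ts, step)
    remainder = ts - lower
    return lower if remainder * 2 < step else lower + step


minute = timedelta(minutes=1)
hour = timedelta(hours=1)
ceiled = ceil_to(datetime(2020, 5, 31, 8, 5, 42), minute)
floored = floor_to(datetime(2020, 5, 31, 8, 59, 59), minute)
rounded = round_to(datetime(2001, 1, 1, 0, 4, 45), minute)
```

A calendar frequency such as month end has no constant step width, so it cannot be expressed as a single `timedelta` in this scheme.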
- tz_localize(tz: str | None, ambiguous: Literal['NaT'] = 'NaT', nonexistent: Literal['NaT'] = 'NaT') Self#
Localize timezone-naive data to timezone-aware data.
Parameters#
- tz : str
Timezone to convert timestamps to.
Returns#
DatetimeIndex containing timezone aware timestamps.
Examples#
>>> import cudf
>>> import pandas as pd
>>> tz_naive = cudf.date_range('2018-03-01 09:00', periods=3, freq='D')
>>> tz_aware = tz_naive.tz_localize("America/New_York")
>>> tz_aware
DatetimeIndex(['2018-03-01 09:00:00-05:00', '2018-03-02 09:00:00-05:00',
               '2018-03-03 09:00:00-05:00'],
              dtype='datetime64[ns, America/New_York]', freq='D')
Ambiguous or nonexistent datetimes are converted to NaT.
>>> s = cudf.to_datetime(cudf.Series(['2018-10-28 01:20:00',
...                                   '2018-10-28 02:36:00',
...                                   '2018-10-28 03:46:00']))
>>> s.dt.tz_localize("CET")
0    2018-10-28 01:20:00.000000000
1                              NaT
2    2018-10-28 03:46:00.000000000
dtype: datetime64[ns, CET]
Notes#
‘NaT’ is currently the only supported option for the ambiguous and nonexistent arguments. Any ambiguous or nonexistent timestamps are converted to ‘NaT’.
- tz_convert(tz: str | None) Self#
Convert tz-aware datetimes from one time zone to another.
Parameters#
- tz : str
Time zone for time. Corresponding timestamps would be converted to this time zone of the Datetime Array/Index. A tz of None will convert to UTC and remove the timezone information.
Returns#
DatetimeIndex containing timestamps corresponding to the timezone tz.
Examples#
>>> import cudf
>>> dti = cudf.date_range('2018-03-01 09:00', periods=3, freq='D')
>>> dti = dti.tz_localize("America/New_York")
>>> dti
DatetimeIndex(['2018-03-01 09:00:00-05:00', '2018-03-02 09:00:00-05:00',
               '2018-03-03 09:00:00-05:00'],
              dtype='datetime64[ns, America/New_York]', freq='D')
>>> dti.tz_convert("Europe/London")
DatetimeIndex(['2018-03-01 14:00:00+00:00', '2018-03-02 14:00:00+00:00',
               '2018-03-03 14:00:00+00:00'],
              dtype='datetime64[ns, Europe/London]')
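The difference between `tz_localize` and `tz_convert` is that localizing attaches a zone while keeping the wall-clock reading, whereas converting keeps the instant and changes the wall-clock reading. The sketch below mirrors the example above with the standard library. As an assumption made for simplicity, fixed UTC offsets stand in for the named zones, so daylight-saving transitions are not modelled.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets approximate the zones on 2018-03-01
# (America/New_York at UTC-05:00, Europe/London at UTC+00:00).
new_york_offset = timezone(timedelta(hours=-5))
london_offset = timezone(timedelta(hours=0))

naive = datetime(2018, 3, 1, 9, 0)

# "Localize": same wall-clock time, now tagged with an offset.
localized = naive.replace(tzinfo=new_york_offset)

# "Convert": same instant, expressed in another zone.
converted = localized.astimezone(london_offset)
```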
- repeat(repeats, axis=None) Self#
Repeat elements of an Index.
Returns a new Index where each element of the current Index is repeated consecutively a given number of times.
Parameters#
- repeats : int, or array of ints
The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty object.
Returns#
- Index
A newly created object of same type as caller with repeated elements.
Examples#
>>> index = cudf.Index([10, 22, 33, 55])
>>> index
Index([10, 22, 33, 55], dtype='int64')
>>> index.repeat(5)
Index([10, 10, 10, 10, 10, 22, 22, 22, 22, 22, 33,
       33, 33, 33, 33, 55, 55, 55, 55, 55],
      dtype='int64')
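The `repeats` argument accepts either one count for all elements or one count per element. The list-based sketch below illustrates both forms. `repeat_elements` is a hypothetical helper written for this illustration and is not the library's implementation.

```python
def repeat_elements(values, repeats):
    """Repeat each element consecutively (hypothetical list-based sketch).

    repeats is either a single non-negative int or one count per element.
    """
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must match the number of elements")
    if any(r < 0 for r in repeats):
        raise ValueError("repeats must be non-negative")
    return [v for v, r in zip(values, repeats) for _ in range(r)]


uniform = repeat_elements([10, 22], 2)
per_element = repeat_elements([10, 22, 33], [1, 0, 2])
```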
- property T#
Return the transpose, which is by definition self.
- all(axis=0, skipna=True, **kwargs)#
Return whether all elements are True in DataFrame.
Parameters#
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.
  - 0 or ‘index’ : reduce the index, return a Series whose index is the original column labels.
  - 1 or ‘columns’ : reduce the columns, return a Series whose index is the original index.
  - None : reduce all axes, return a scalar.
- skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
Returns#
Series
Notes#
Parameters currently not supported are bool_only.
Examples#
>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.all()
a     True
b    False
dtype: bool
- any() bool#
Return whether any element is True in DataFrame.
Parameters#
- axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.
  - 0 or ‘index’ : reduce the index, return a Series whose index is the original column labels.
  - 1 or ‘columns’ : reduce the columns, return a Series whose index is the original index.
  - None : reduce all axes, return a scalar.
- skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
Returns#
Series
Notes#
Parameters currently not supported are bool_only.
Examples#
>>> import cudf
>>> df = cudf.DataFrame({'a': [3, 2, 3, 4], 'b': [7, 0, 10, 10]})
>>> df.any()
a    True
b    True
dtype: bool
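With `skipna=True`, the descriptions of `all` and `any` state opposite results for entirely null input: `all` gives True and `any` gives False, as for empty input. The sketch below reproduces that behavior on plain lists, using `None` to stand in for a null. `all_skipna` and `any_skipna` are hypothetical helpers for illustration only.

```python
def all_skipna(values):
    """True when every non-null element is truthy (vacuously True if none remain)."""
    return all(bool(v) for v in values if v is not None)


def any_skipna(values):
    """True when at least one non-null element is truthy (False if none remain)."""
    return any(bool(v) for v in values if v is not None)


only_nulls = [None, None]
all_of_nulls = all_skipna(only_nulls)
any_of_nulls = any_skipna(only_nulls)
```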
- append(other)#
Append a collection of Index objects together.
Parameters#
other : Index or list/tuple of indices
Returns#
appended : Index
Examples#
>>> import cudf
>>> idx = cudf.Index([1, 2, 10, 100])
>>> idx
Index([1, 2, 10, 100], dtype='int64')
>>> other = cudf.Index([200, 400, 50])
>>> other
Index([200, 400, 50], dtype='int64')
>>> idx.append(other)
Index([1, 2, 10, 100, 200, 400, 50], dtype='int64')
append also accepts a list of Index objects:
>>> idx.append([other, other])
Index([1, 2, 10, 100, 200, 400, 50, 200, 400, 50], dtype='int64')
- argsort(axis=0, kind='quicksort', order=None, ascending=True, na_position='last') ndarray#
Return the integer indices that would sort the index.
Parameters#
- axis : {0 or “index”}
Has no effect but is accepted for compatibility with numpy.
- kind : {‘mergesort’, ‘quicksort’, ‘heapsort’, ‘stable’}, default ‘quicksort’
Choice of sorting algorithm. See numpy.sort() for more information. ‘mergesort’ and ‘stable’ are the only stable algorithms. Only quicksort is supported in cuDF.
- order : None
Has no effect but is accepted for compatibility with numpy.
- ascending : bool or list of bool, default True
If True, sort values in ascending order, otherwise descending.
- na_position : {‘first’ or ‘last’}, default ‘last’
Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.
Returns#
cupy.ndarray: The indices sorted based on input.
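An argsort returns positions rather than values, and `na_position` decides where the positions of nulls end up. The sketch below shows this on a plain list, with `None` standing in for a null. This `argsort` function is a hypothetical stand-in and does not reflect the library's sorting algorithm.

```python
def argsort(values, ascending=True, na_position="last"):
    """Return the positions that would sort values (hypothetical list-based sketch)."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    present = [i for i, v in enumerate(values) if v is not None]
    missing = [i for i, v in enumerate(values) if v is None]
    # Sort the positions of non-null entries by the value they point at.
    present.sort(key=lambda i: values[i], reverse=not ascending)
    return missing + present if na_position == "first" else present + missing


data = [30, 10, None, 20]
order = argsort(data)
sorted_view = [data[i] for i in order]
```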
- astype(dtype, copy: bool = True) Index#
Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is impossible, a ValueError exception is raised.
Parameters#
- dtype : numpy.dtype
Use a numpy.dtype to cast entire Index object to.
- copy : bool, default True
By default, astype always returns a newly allocated object. If copy is set to False and internal requirements on dtype are satisfied, the original data is used to create a new Index or the original Index is returned.
Returns#
- Index
Index with values cast to specified dtype.
Examples#
>>> import cudf
>>> index = cudf.Index([1, 2, 3])
>>> index
Index([1, 2, 3], dtype='int64')
>>> index.astype('float64')
Index([1.0, 2.0, 3.0], dtype='float64')
- difference(other, sort=None)#
Return a new Index with elements from the index that are not in other.
This is the set difference of two Index objects.
Parameters#
- other : Index or array-like
- sort : False or None, default None
Whether to sort the resulting index. By default, the values are attempted to be sorted, but any TypeError from incomparable elements is caught by cudf.
  - None : Attempt to sort the result, but catch any TypeErrors from comparing incomparable elements.
  - False : Do not sort the result.
  - True : Sort the result (which may raise TypeError).
Returns#
difference : Index
Examples#
>>> import cudf
>>> idx1 = cudf.Index([2, 1, 3, 4])
>>> idx1
Index([2, 1, 3, 4], dtype='int64')
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx2
Index([3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Index([2, 1], dtype='int64')
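The three `sort` options differ only in what happens after the set difference has been computed. The sketch below expresses that on plain lists. `difference` here is a hypothetical list-based stand-in that follows the parameter description above and is not the library's implementation.

```python
def difference(left, right, sort=None):
    """Elements of left that are not in right (hypothetical list-based sketch)."""
    exclude = set(right)
    seen = set()
    result = []
    for v in left:
        # Keep first occurrences only, in their original order.
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    if sort is False:
        return result
    try:
        return sorted(result)
    except TypeError:
        if sort is True:
            raise  # sort=True surfaces incomparable elements
        return result  # sort=None falls back to the unsorted result


sorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6])
unsorted_diff = difference([2, 1, 3, 4], [3, 4, 5, 6], sort=False)
```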
- drop_duplicates(keep='first', nulls_are_equal=True)#
Drop duplicate rows in index.
- keep : {“first”, “last”, False}, default “first”
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False: Drop all duplicates.
- nulls_are_equal: bool, default True
Null elements are considered equal to other null elements.
- dropna(how='any')#
Drop null rows from Index.
- how : {“any”, “all”}, default “any”
Specifies how to decide whether to drop a row. “any” (default) drops rows containing at least one null value. “all” drops only rows containing all null values.
- property dtype#
dtype of the underlying values in Index.
- duplicated(keep='first') cupy.ndarray#
Indicate duplicate index values.
Duplicated values are indicated as True values in the resulting array. Either all duplicates, all except the first, or all except the last occurrence of duplicates can be indicated.
Parameters#
- keep : {‘first’, ‘last’, False}, default ‘first’
The value or values in a set of duplicates to mark as missing.
  - 'first' : Mark duplicates as True except for the first occurrence.
  - 'last' : Mark duplicates as True except for the last occurrence.
  - False : Mark all duplicates as True.
Returns#
cupy.ndarray[bool]
See Also#
Series.duplicated : Equivalent method on cudf.Series.
DataFrame.duplicated : Equivalent method on cudf.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples#
By default, for each set of duplicated values, the first occurrence is set to False and all others to True:
>>> import cudf
>>> idx = cudf.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False,  True, False,  True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False,  True, False,  True])
By using ‘last’, the last occurrence of each set of duplicated values is set to False and all others to True:
>>> idx.duplicated(keep='last')
array([ True, False,  True, False, False])
By setting keep to False, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False,  True, False,  True])
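As a rough illustration of how the three keep settings differ, the following pure-Python sketch reproduces the masks shown in the examples. duplicated_sketch is a hypothetical helper operating on a plain list; it is not how the library computes the result on the GPU.

```python
def duplicated_sketch(values, keep="first"):
    """Return a list of booleans that marks duplicates according to `keep`."""
    n = len(values)
    if keep is False:
        # Every member of a duplicated group is marked.
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [counts[v] > 1 for v in values]
    if keep == "first":
        order = range(n)                 # first occurrence is seen first
    elif keep == "last":
        order = range(n - 1, -1, -1)     # last occurrence is seen first
    else:
        raise ValueError("keep must be 'first', 'last', or False")

    seen = set()
    mask = [False] * n
    for i in order:
        if values[i] in seen:
            mask[i] = True               # already seen, so this one is a duplicate
        else:
            seen.add(values[i])
    return mask


animals = ["lama", "cow", "lama", "beetle", "lama"]
print(duplicated_sketch(animals))
print(duplicated_sketch(animals, keep="last"))
print(duplicated_sketch(animals, keep=False))
```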
- property empty#
- equals(other) bool#
Test whether two objects contain the same elements.
This function allows two objects to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type.
Parameters#
- other : Index, Series, DataFrame
The other object to be compared with.
Returns#
- bool
True if all elements are the same in both objects, False otherwise.
Examples#
>>> import cudf
Comparing Series with equals:
>>> s = cudf.Series([1, 2, 3])
>>> other = cudf.Series([1, 2, 3])
>>> s.equals(other)
True
>>> different = cudf.Series([1.5, 2, 3])
>>> s.equals(different)
False
Comparing DataFrames with equals:
>>> df = cudf.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20
>>> exactly_equal = cudf.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True
For two DataFrames to compare equal, the types of column values must be equal, but the types of column labels need not:
>>> different_column_type = cudf.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True
- factorize(sort: bool = False, use_na_sentinel: bool = True) tuple[cupy.ndarray, cudf.Index]#
Encode the input values as integer labels.
Parameters#
- sort : bool, default False
Sort uniques and shuffle codes to maintain the relationship.
- use_na_sentinel : bool, default True
If True, the sentinel -1 will be used for NA values. If False, NA values will be encoded as non-negative integers and will not drop the NA from the uniques of the values.
Returns#
- (labels, cats) : (cupy.ndarray, cupy.ndarray or Index)
labels contains the encoded values.
cats contains the categories, ordered so that the N-th item corresponds to code N-1.
Examples#
>>> import cudf
>>> s = cudf.Series(['a', 'a', 'c'])
>>> codes, uniques = s.factorize()
>>> codes
array([0, 0, 1], dtype=int8)
>>> uniques
Index(['a', 'c'], dtype='object')
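The relationship between codes and uniques, and the effect of sort and use_na_sentinel, can be sketched in plain Python. This is only an illustration of the semantics described above: factorize_sketch is a hypothetical helper, None stands in for a null, and placing the null last when sort=True is combined with use_na_sentinel=False is an assumption of the sketch.

```python
def factorize_sketch(values, sort=False, use_na_sentinel=True):
    """Return (codes, uniques) for a list, with None standing in for a null."""
    # Collect uniques in order of first appearance.
    position = {}
    for v in values:
        if v is None and use_na_sentinel:
            continue  # nulls are encoded as -1 and left out of the uniques
        if v not in position:
            position[v] = len(position)
    uniques = list(position)

    if sort:
        # Sort the non-null uniques; keep a null (if any) at the end (assumption).
        non_null = sorted(u for u in uniques if u is not None)
        uniques = non_null + [u for u in uniques if u is None]
        # Re-derive the codes so they still point at the right unique value.
        position = {u: i for i, u in enumerate(uniques)}

    codes = [
        -1 if (v is None and use_na_sentinel) else position[v]
        for v in values
    ]
    return codes, uniques


print(factorize_sketch(["a", "a", "c"]))
print(factorize_sketch(["b", None, "a", "b"]))
print(factorize_sketch(["b", None, "a", "b"], sort=True))
print(factorize_sketch(["b", None, "a", "b"], use_na_sentinel=False))
```

In every case uniques[code] recovers the original value for each non-negative code.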
- fillna(value: None | ScalarLike | cudf.Series = None, method: Literal['ffill', 'bfill', 'pad', 'backfill', None] = None, axis=None, inplace: bool = False, limit=None) Self | None#
Fill null values with value or specified method.
Parameters#
- value : scalar, Series-like or dict
Value to use to fill nulls. If Series-like, null values are filled with values in corresponding indices. A dict can be used to provide different values to fill nulls in different columns. Cannot be used with method.
- method : {‘ffill’, ‘bfill’}, default None
Method to use for filling null values in the dataframe or series. ffill propagates the last non-null values forward to the next non-null value. bfill propagates backward with the next non-null value. Cannot be used with value.
Deprecated since version 24.04: method is deprecated.
Returns#
- result : DataFrame, Series, or Index
Copy with nulls filled.
Examples#
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, None], 'b': [3, None, 5]})
>>> df
      a     b
0     1     3
1     2  <NA>
2  <NA>     5
>>> df.fillna(4)
   a  b
0  1  3
1  2  4
2  4  5
>>> df.fillna({'a': 3, 'b': 4})
   a  b
0  1  3
1  2  4
2  3  5
fillna on a Series object:
>>> ser = cudf.Series(['a', 'b', None, 'c'])
>>> ser
0       a
1       b
2    <NA>
3       c
dtype: object
>>> ser.fillna('z')
0    a
1    b
2    z
3    c
dtype: object
fillna also supports in-place operation:
>>> ser.fillna('z', inplace=True)
>>> ser
0    a
1    b
2    z
3    c
dtype: object
>>> df.fillna({'a': 3, 'b': 4}, inplace=True)
>>> df
   a  b
0  1  3
1  2  4
2  3  5
fillna with a fill method specified:
>>> ser = cudf.Series([1, None, None, 2, 3, None, None])
>>> ser.fillna(method='ffill')
0    1
1    1
2    1
3    2
4    3
5    3
6    3
dtype: int64
>>> ser.fillna(method='bfill')
0       1
1       2
2       2
3       2
4       3
5    <NA>
6    <NA>
dtype: int64
- find_label_range(loc: slice) slice#
Translate a label-based slice to an index-based slice
Parameters#
- loc
slice to search for.
Notes#
As with all label-based searches, the slice is right-closed.
Returns#
New slice translated into integer indices of the index (right-open).
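The translation from a right-closed label slice to a right-open positional slice can be sketched with the standard bisect module. This is a minimal illustration that assumes the labels are sorted in ascending order; find_label_range_sketch is a hypothetical helper, and it ignores any step on the input slice.

```python
from bisect import bisect_left, bisect_right


def find_label_range_sketch(sorted_labels, loc):
    """Map a label slice (both ends inclusive) to a positional slice (stop exclusive)."""
    # The start label maps to the first position whose label is >= start.
    start = 0 if loc.start is None else bisect_left(sorted_labels, loc.start)
    # The stop label is inclusive, so the positional stop is one past the
    # last position whose label is <= stop.
    stop = (
        len(sorted_labels)
        if loc.stop is None
        else bisect_right(sorted_labels, loc.stop)
    )
    return slice(start, stop)


labels = [10, 20, 20, 30, 40]
positions = find_label_range_sketch(labels, slice(20, 30))
print(positions, labels[positions])
```

Here the label 30 is included in the selection even though the returned positional slice excludes its stop position, which is the distinction the Notes and Returns entries draw.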
- classmethod from_arrow(obj) Index | MultiIndex#
Create from PyArrow Array/ChunkedArray.
Parameters#
- array : PyArrow Array/ChunkedArray
PyArrow Object which has to be converted.
Raises#
TypeError for invalid input type.
Returns#
SingleColumnFrame
Examples#
>>> import cudf
>>> import pyarrow as pa
>>> cudf.Index.from_arrow(pa.array(["a", "b", None]))
Index(['a', 'b', <NA>], dtype='object')
- classmethod from_pandas(index: ~pandas.core.indexes.base.Index, nan_as_null=<no_default>)#
Convert from a Pandas Index.
Parameters#
- index : Pandas Index object
A Pandas Index object which has to be converted to cuDF Index.
- nan_as_null : bool, default None
If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.
Raises#
TypeError for invalid input type.
Examples#
>>> import cudf
>>> import pandas as pd
>>> import numpy as np
>>> data = [10, 20, 30, np.nan]
>>> pdi = pd.Index(data)
>>> cudf.Index.from_pandas(pdi)
Index([10.0, 20.0, 30.0, <NA>], dtype='float64')
>>> cudf.Index.from_pandas(pdi, nan_as_null=False)
Index([10.0, 20.0, 30.0, nan], dtype='float64')
- get_indexer(target, method=None, limit=None, tolerance=None)#
Compute indexer and mask for new index given the current index.
The indexer should be then used as an input to ndarray.take to align the current data to the new index.
Parameters#
target : Index
method : {None, ‘pad’/‘ffill’, ‘backfill’/‘bfill’, ‘nearest’}, optional
default: exact matches only.
pad / ffill: find the PREVIOUS index value if no exact match.
backfill / bfill: use NEXT index value if no exact match.
nearest: use the NEAREST index value if no exact match. Tied distances are broken by preferring the larger index value.
- tolerance : int or float, optional
Maximum distance from index value for inexact matches. The value of the index at the matching location must satisfy the equation abs(index[loc] - target) <= tolerance.
Returns#
- cupy.ndarray
Integers from 0 to n - 1 indicating that the index at these positions matches the corresponding target values. Missing values in the target are marked by -1.
Examples#
>>> import cudf
>>> index = cudf.Index(['c', 'a', 'b'])
>>> index
Index(['c', 'a', 'b'], dtype='object')
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1,  2, -1], dtype=int32)
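The exact-match, pad and backfill rules can be sketched in plain Python. This is an illustration of the lookup semantics only, not the library's algorithm: get_indexer_sketch is a hypothetical helper, the index values are assumed unique, and the inexact methods additionally assume an ascending index. The nearest method, limit and tolerance are left out.

```python
from bisect import bisect_left, bisect_right


def get_indexer_sketch(index_values, target, method=None):
    """Return, for each target, its position in index_values or -1."""
    if method is None:
        # Exact matches only.
        position = {v: i for i, v in enumerate(index_values)}
        return [position.get(t, -1) for t in target]

    if any(a > b for a, b in zip(index_values, index_values[1:])):
        raise ValueError("this sketch needs an ascending index for inexact methods")

    result = []
    for t in target:
        if method in ("pad", "ffill"):
            # Position of the last index value <= t; -1 if there is none.
            pos = bisect_right(index_values, t) - 1
        elif method in ("backfill", "bfill"):
            # Position of the first index value >= t; -1 if there is none.
            pos = bisect_left(index_values, t)
            if pos == len(index_values):
                pos = -1
        else:
            raise ValueError("unsupported method in this sketch")
        result.append(pos)
    return result


print(get_indexer_sketch(["c", "a", "b"], ["a", "b", "x"]))
print(get_indexer_sketch([10, 20, 30], [5, 20, 25, 35], method="pad"))
print(get_indexer_sketch([10, 20, 30], [5, 20, 25, 35], method="backfill"))
```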
- get_level_values(level)#
Return an Index of values for requested level.
This is primarily useful to get an individual level of values from a MultiIndex, but is provided on Index as well for compatibility.
Parameters#
- level : int or str
It is either the integer position or the name of the level.
Returns#
- Index
Calling object, as there is only one level in the Index.
See Also#
cudf.MultiIndex.get_level_values : Get values for a level of a MultiIndex.
Notes#
For Index, level should be 0, since there are no multiple levels.
Examples#
>>> import cudf
>>> idx = cudf.Index(["a", "b", "c"])
>>> idx.get_level_values(0)
Index(['a', 'b', 'c'], dtype='object')
- get_loc(key) int | slice | ndarray#
Get integer location, slice or boolean mask for requested label.
Parameters#
key : label
Returns#
- int or slice or boolean mask
If result is unique, return integer index
If index is monotonic, loc is returned as a slice object
Otherwise, a boolean mask is returned
Examples#
>>> import cudf
>>> unique_index = cudf.Index(list('abc'))
>>> unique_index.get_loc('b')
1
>>> monotonic_index = cudf.Index(list('abbc'))
>>> monotonic_index.get_loc('b')
slice(1, 3, None)
>>> non_monotonic_index = cudf.Index(list('abcb'))
>>> non_monotonic_index.get_loc('b')
array([False,  True, False,  True])
>>> numeric_unique_index = cudf.Index([1, 2, 3])
>>> numeric_unique_index.get_loc(3)
2
MultiIndex
>>> multi_index = cudf.MultiIndex.from_tuples([('a', 'd'), ('b', 'e'), ('b', 'f')])
>>> multi_index
MultiIndex([('a', 'd'),
            ('b', 'e'),
            ('b', 'f')],
           )
>>> multi_index.get_loc('b')
slice(1, 3, None)
>>> multi_index.get_loc(('b', 'e'))
1
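The three possible return types can be summarized with a short pure-Python sketch for a flat index. get_loc_sketch is a hypothetical helper working on a plain list; it mirrors the rules listed under Returns and is not the library's implementation.

```python
def get_loc_sketch(labels, key):
    """Locate `key` in `labels`, a plain list standing in for a flat Index."""
    positions = [i for i, label in enumerate(labels) if label == key]
    if not positions:
        raise KeyError(key)
    if len(positions) == 1:
        # A single match is returned as an integer position.
        return positions[0]
    # In a monotonically increasing index, equal labels are adjacent,
    # so several matches can be described by one slice.
    is_monotonic = all(a <= b for a, b in zip(labels, labels[1:]))
    if is_monotonic:
        return slice(positions[0], positions[-1] + 1, None)
    # Otherwise fall back to a boolean mask over the whole index.
    return [label == key for label in labels]


print(get_loc_sketch(list("abc"), "b"))
print(get_loc_sketch(list("abbc"), "b"))
print(get_loc_sketch(list("abcb"), "b"))
```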
- get_slice_bound(label, side: Literal['left', 'right']) int#
Calculate slice bound that corresponds to given label. Returns leftmost (one-past-the-rightmost if side=='right') position of given label.
Parameters#
label : object
side : {‘left’, ‘right’}
Returns#
- int
Index of label.
- property has_duplicates#
- property hasnans: bool#
Return True if there are any NaNs or nulls.
Returns#
- out : bool
If Series has at least one NaN or null value, return True, if not return False.
Examples#
>>> import cudf
>>> import numpy as np
>>> index = cudf.Index([1, 2, np.nan, 3, 4], nan_as_null=False)
>>> index
Index([1.0, 2.0, nan, 3.0, 4.0], dtype='float64')
>>> index.hasnans
True
hasnans returns True for the presence of any NA values:
>>> index = cudf.Index([1, 2, None, 3, 4])
>>> index
Index([1, 2, <NA>, 3, 4], dtype='int64')
>>> index.hasnans
True
- intersection(other, sort=False)#
Form the intersection of two Index objects.
This returns a new Index with elements common to the index and other.
Parameters#
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
False : do not sort the result.
None : sort the result, except when self and other are equal or when the values cannot be compared.
True : Sort the result (which may raise TypeError).
Returns#
intersection : Index
Examples#
>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Index([3, 4], dtype='int64')
MultiIndex case
>>> idx1 = cudf.MultiIndex.from_pandas(
...     pd.MultiIndex.from_arrays(
...         [[1, 1, 3, 4], ["Red", "Blue", "Red", "Blue"]]
...     )
... )
>>> idx2 = cudf.MultiIndex.from_pandas(
...     pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...     )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (3,  'Red'),
            (4, 'Blue')],
           )
>>> idx2
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
           )
>>> idx1.intersection(idx2)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
           )
>>> idx1.intersection(idx2, sort=False)
MultiIndex([(1,  'Red'),
            (1, 'Blue')],
           )
- is_boolean()#
Check if the Index only consists of booleans.
Deprecated since version 23.04: Use cudf.api.types.is_bool_dtype instead.
Returns#
- bool
Whether or not the Index only consists of booleans.
See Also#
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples#
>>> import cudf
>>> idx = cudf.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = cudf.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = cudf.Index([1, 2, 3])
>>> idx.is_boolean()
False
- is_categorical()#
Check if the Index holds categorical data.
Deprecated since version 23.04: Use cudf.api.types.is_categorical_dtype instead.
Returns#
- bool
True if the Index is categorical.
See Also#
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
Examples#
>>> import cudf
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = cudf.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0        Peter
1       Victor
2    Elisabeth
3          Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating()#
Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats, integers, or NaNs.
Deprecated since version 23.04: Use cudf.api.types.is_float_dtype instead.
Returns#
- bool
Whether or not the Index consists only of floats, NaNs, or a mix of floats, integers, or NaNs.
See Also#
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples#
>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4, np.nan], nan_as_null=False)
>>> idx.is_floating()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer()#
Check if the Index only consists of integers.
Deprecated since version 23.04: Use cudf.api.types.is_integer_dtype instead.
Returns#
- bool
Whether or not the Index only consists of integers.
See Also#
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples#
>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval()#
Check if the Index holds Interval objects.
Deprecated since version 23.04: Use cudf.api.types.is_interval_dtype instead.
Returns#
- bool
Whether or not the Index holds Interval objects.
See Also#
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
Examples#
>>> import cudf
>>> import pandas as pd
>>> idx = cudf.from_pandas(
...     pd.Index([pd.Interval(left=0, right=5),
...               pd.Interval(left=5, right=10)])
... )
>>> idx.is_interval()
True
>>> idx = cudf.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- property is_monotonic_decreasing: bool#
Return boolean if values in the object are monotonically decreasing.
Returns#
bool
- property is_monotonic_increasing: bool#
Return boolean if values in the object are monotonically increasing.
Returns#
bool
- is_numeric()#
Check if the Index only consists of numeric data.
Deprecated since version 23.04: Use cudf.api.types.is_any_real_numeric_dtype instead.
Returns#
- bool
Whether or not the Index only consists of numeric data.
See Also#
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples#
>>> import cudf
>>> import numpy as np
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = cudf.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = cudf.Index(["Apple", "cold"])
>>> idx.is_numeric()
False
- is_object()#
Check if the Index is of the object dtype.
Deprecated since version 23.04: Use cudf.api.types.is_object_dtype instead.
Returns#
- bool
Whether or not the Index is of the object dtype.
See Also#
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples#
>>> import cudf
>>> idx = cudf.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = cudf.Index(["Watermelon", "Orange", "Apple",
...                 "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = cudf.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- isin(values, level=None) ndarray#
Return a boolean array where the index values are in values.
Compute boolean array of whether each index value is found in the passed set of values. The length of the returned boolean array matches the length of the index.
Parameters#
- values : set, list-like, Index
Sought values.
- level : str or int, optional
Name or position of the index level to use (if the index is a MultiIndex).
Returns#
- is_contained : cupy array
CuPy array of boolean values.
Examples#
>>> import cudf
>>> idx = cudf.Index([1,2,3])
>>> idx
Index([1, 2, 3], dtype='int64')
Check whether each index value is in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
- isna() ndarray#
Identify missing values.
Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:
Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.
Characters such as empty strings '' or inf in case of float are not considered <NA> values.
Returns#
- DataFrame/Series/Index
Mask of bool values for each element in the object that indicates whether an element is an NA value.
Examples#
Show which entries in a DataFrame are NA.
>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                      'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                               pd.Timestamp('1940-04-25')],
...                      'name': ['Alfred', 'Batman', ''],
...                      'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Show which entries in a Series are NA.
>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
3    False
4    False
dtype: bool
Show which entries in an Index are NA.
>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isna()
array([False, False,  True,  True, False, False])
- isnull() ndarray#
Identify missing values.
Return a boolean same-sized object indicating if the values are <NA>. <NA> values get mapped to True values. Everything else gets mapped to False values. <NA> values include:
Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.
Characters such as empty strings '' or inf in case of float are not considered <NA> values.
Returns#
- DataFrame/Series/Index
Mask of bool values for each element in the object that indicates whether an element is an NA value.
Examples#
Show which entries in a DataFrame are NA.
>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                      'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                               pd.Timestamp('1940-04-25')],
...                      'name': ['Alfred', 'Batman', ''],
...                      'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Show which entries in a Series are NA.
>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
3    False
4    False
dtype: bool
Show which entries in an Index are NA.
>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.isna()
array([False, False,  True,  True, False, False])
- join(other, how='left', level=None, return_indexers=False, sort=False)#
Compute join_index and indexers to conform data structures to the new index.
Parameters#
other : Index
how : {‘left’, ‘right’, ‘inner’, ‘outer’}
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False, the order of the join keys depends on the join type (how keyword).
Returns: index
Examples#
>>> import cudf
>>> lhs = cudf.DataFrame({
...     "a": [2, 3, 1],
...     "b": [3, 4, 2],
... }).set_index(['a', 'b']).index
>>> lhs
MultiIndex([(2, 3),
            (3, 4),
            (1, 2)],
           names=['a', 'b'])
>>> rhs = cudf.DataFrame({"a": [1, 4, 3]}).set_index('a').index
>>> rhs
Index([1, 4, 3], dtype='int64', name='a')
>>> lhs.join(rhs, how='inner')
MultiIndex([(3, 4),
            (1, 2)],
           names=['a', 'b'])
- max(axis=0, skipna=True, numeric_only=False, **kwargs)#
Return the maximum of the values in the DataFrame.
Parameters#
- axis: {index (0), columns(1)}
Axis for the function to be applied on.
- skipna: bool, default True
Exclude NA/null values when computing the result.
- numeric_only: bool, default False
If True, includes only float, int, boolean columns. If False, will raise error in-case there are non-numeric columns.
Returns#
Series
Examples#
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> df.max()
a     4
b    10
dtype: int64
- memory_usage(deep: bool = False) int#
Return the memory usage of an object.
Parameters#
- deep : bool
The deep parameter is ignored and is only included for pandas compatibility.
Returns#
The total bytes used.
- min(axis=0, skipna=True, numeric_only=False, **kwargs)#
Return the minimum of the values in the DataFrame.
Parameters#
- axis: {index (0), columns(1)}
Axis for the function to be applied on.
- skipna: bool, default True
Exclude NA/null values when computing the result.
- numeric_only: bool, default False
If True, includes only float, int, boolean columns. If False, will raise error in-case there are non-numeric columns.
Returns#
Series
Examples#
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 2, 3, 4], 'b': [7, 8, 9, 10]})
>>> min_series = df.min()
>>> min_series
a    1
b    7
dtype: int64
>>> min_series.min()
1
- property name#
Get the name of this object.
- property names#
Returns a FrozenList containing the name of the Index.
- notna() ndarray#
Identify non-missing values.
Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:
Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.
Characters such as empty strings '' or inf in case of float are not considered <NA> values.
Returns#
- DataFrame/Series/Index
Mask of bool values for each element in the object that indicates whether an element is not an NA value.
Examples#
Show which entries in a DataFrame are not NA.
>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                      'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                               pd.Timestamp('1940-04-25')],
...                      'name': ['Alfred', 'Batman', ''],
...                      'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
Show which entries in a Series are not NA.
>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
3     True
4     True
dtype: bool
Show which entries in an Index are not NA.
>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notna()
array([ True,  True, False, False,  True,  True])
- notnull() ndarray#
Identify non-missing values.
Return a boolean same-sized object indicating if the values are not <NA>. Non-missing values get mapped to True. <NA> values get mapped to False values. <NA> values include:
Values where null mask is set.
NaN in float dtype.
NaT in datetime64 and timedelta64 types.
Characters such as empty strings '' or inf in case of float are not considered <NA> values.
Returns#
- DataFrame/Series/Index
Mask of bool values for each element in the object that indicates whether an element is not an NA value.
Examples#
Show which entries in a DataFrame are not NA.
>>> import cudf
>>> import numpy as np
>>> import pandas as pd
>>> df = cudf.DataFrame({'age': [5, 6, np.nan],
...                      'born': [pd.NaT, pd.Timestamp('1939-05-27'),
...                               pd.Timestamp('1940-04-25')],
...                      'name': ['Alfred', 'Batman', ''],
...                      'toy': [None, 'Batmobile', 'Joker']})
>>> df
    age                        born    name        toy
0     5                        <NA>  Alfred       <NA>
1     6  1939-05-27 00:00:00.000000  Batman  Batmobile
2  <NA>  1940-04-25 00:00:00.000000              Joker
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
Show which entries in a Series are not NA.
>>> ser = cudf.Series([5, 6, np.nan, np.inf, -np.inf])
>>> ser
0     5.0
1     6.0
2    <NA>
3     Inf
4    -Inf
dtype: float64
>>> ser.notna()
0     True
1     True
2    False
3     True
4     True
dtype: bool
Show which entries in an Index are not NA.
>>> idx = cudf.Index([1, 2, None, np.nan, 0.32, np.inf])
>>> idx
Index([1.0, 2.0, <NA>, <NA>, 0.32, Inf], dtype='float64')
>>> idx.notna()
array([ True,  True, False, False,  True,  True])
- nunique(dropna: bool = True) int#
Return count of unique values for the column.
Parameters#
- dropna : bool, default True
Don’t include NaN in the counts.
Returns#
- int
Number of unique values in the column.
- rename(name, inplace=False)#
Alter Index name.
Defaults to returning new index.
Parameters#
- name : label
Name(s) to set.
Returns#
Index
Examples#
>>> import cudf
>>> index = cudf.Index([1, 2, 3], name='one')
>>> index
Index([1, 2, 3], dtype='int64', name='one')
>>> index.name
'one'
>>> renamed_index = index.rename('two')
>>> renamed_index
Index([1, 2, 3], dtype='int64', name='two')
>>> renamed_index.name
'two'
- set_names(names, level=None, inplace=False)#
Set Index or MultiIndex name. Able to set new names partially and by level.
Parameters#
- names : label or list of label
Name(s) to set.
- level : int, label or list of int or label, optional
If the index is a MultiIndex, level(s) to set (None for all levels). Otherwise level must be None.
- inplace : bool, default False
Modifies the object directly, instead of creating a new Index or MultiIndex.
Returns#
- Index
The same type as the caller or None if inplace is True.
See Also#
cudf.Index.rename : Able to set new names without level.
Examples#
>>> import cudf
>>> idx = cudf.Index([1, 2, 3, 4])
>>> idx
Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = cudf.MultiIndex.from_product([['python', 'cobra'],
...                                     [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
            ('python', 2019),
            ( 'cobra', 2018),
            ( 'cobra', 2019)],
           )
>>> idx.names
FrozenList([None, None])
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx.names
FrozenList(['kind', 'year'])
>>> idx.set_names('species', level=0, inplace=True)
>>> idx.names
FrozenList(['species', 'year'])
- shift(periods=1, freq=None)#
Not yet implemented
- property size: int#
Return the number of elements in the underlying data.
Returns#
size : Size of the DataFrame / Index / Series / MultiIndex
Examples#
Size of an empty dataframe is 0.
>>> import cudf
>>> df = cudf.DataFrame()
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df.size
0
>>> df = cudf.DataFrame(index=[1, 2, 3])
>>> df
Empty DataFrame
Columns: []
Index: [1, 2, 3]
>>> df.size
0
DataFrame with values
>>> df = cudf.DataFrame({'a': [10, 11, 12],
...                      'b': ['hello', 'rapids', 'ai']})
>>> df
    a       b
0  10   hello
1  11  rapids
2  12      ai
>>> df.size
6
>>> df.index
RangeIndex(start=0, stop=3)
>>> df.index.size
3
Size of an Index
>>> index = cudf.Index([])
>>> index
Index([], dtype='float64')
>>> index.size
0
>>> index = cudf.Index([1, 2, 3, 10])
>>> index
Index([1, 2, 3, 10], dtype='int64')
>>> index.size
4
Size of a MultiIndex
>>> midx = cudf.MultiIndex(
...     levels=[["a", "b", "c", None], ["1", None, "5"]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex([( 'a',  '1'),
            ( 'a',  '5'),
            ( 'b', <NA>),
            ( 'c', <NA>),
            (<NA>,  '1')],
           names=['x', 'y'])
>>> midx.size
5
- sort_values(return_indexer=False, ascending=True, na_position='last', key=None) Self | tuple[Self, cupy.ndarray]#
Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
Parameters#
- return_indexer : bool, default False
Whether to also return the indices that would sort the index.
- ascending : bool, default True
Whether to sort the index values in ascending order.
- na_position : {‘first’, ‘last’}, default ‘last’
Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at the end.
- key : None, optional
This parameter is NON-FUNCTIONAL.
Returns#
- sorted_index : Index
Sorted copy of the index.
- indexer : cupy.ndarray, optional
The indices that the index itself was sorted by.
See Also#
cudf.Series.sort_values : Sort values of a Series.
cudf.DataFrame.sort_values : Sort values in a DataFrame.
Examples#
>>> import cudf
>>> idx = cudf.Index([10, 100, 1, 1000])
>>> idx
Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2], dtype=int32))
Sorting values in a MultiIndex:
>>> midx = cudf.MultiIndex(
...     levels=[[1, 3, 4, -10], [1, 11, 5]],
...     codes=[[0, 0, 1, 2, 3], [0, 2, 1, 1, 0]],
...     names=["x", "y"],
... )
>>> midx
MultiIndex([(  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11),
            (-10,  1)],
           names=['x', 'y'])
>>> midx.sort_values()
MultiIndex([(-10,  1),
            (  1,  1),
            (  1,  5),
            (  3, 11),
            (  4, 11)],
           names=['x', 'y'])
>>> midx.sort_values(ascending=False)
MultiIndex([(  4, 11),
            (  3, 11),
            (  1,  5),
            (  1,  1),
            (-10,  1)],
           names=['x', 'y'])
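How return_indexer, ascending and na_position interact can be sketched in plain Python for a flat index. sort_values_sketch is a hypothetical helper on a list, with None standing in for a null; it illustrates the documented behaviour and is not the library's sorting routine.

```python
def sort_values_sketch(values, return_indexer=False, ascending=True,
                       na_position="last"):
    """Sort a list and optionally return the positions that produce the order."""
    # Sort positions rather than values so the indexer comes for free.
    non_null = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    non_null.sort(key=lambda i: values[i], reverse=not ascending)

    # Nulls are placed as a block at the requested end, whatever the direction.
    order = nulls + non_null if na_position == "first" else non_null + nulls
    result = [values[i] for i in order]
    return (result, order) if return_indexer else result


data = [10, 100, 1, 1000]
print(sort_values_sketch(data))
print(sort_values_sketch(data, ascending=False, return_indexer=True))
print(sort_values_sketch([10, None, 1], na_position="first"))
```

Taking the original values at the returned positions reproduces the sorted result, which is what makes the indexer useful for reordering companion data.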
- property str#
Vectorized string functions for Series and Index.
This mimics the pandas df.str interface. Nulls stay null unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.
- take(indices, axis=0, allow_fill=True, fill_value=None)#
Return a new index containing the rows specified by indices
Parameters#
- indices : array-like
Array of ints indicating which positions to take.
- axis : int
The axis over which to select values, always 0.
allow_fill : Unsupported
fill_value : Unsupported
Returns#
- out : Index
New object with desired subset of rows.
Examples#
>>> import cudf
>>> idx = cudf.Index(['a', 'b', 'c', 'd', 'e'])
>>> idx.take([2, 0, 4, 3])
Index(['c', 'a', 'e', 'd'], dtype='object')
- to_arrow() pa.Array#
Convert to a PyArrow Array.
Returns#
PyArrow Array
Examples#
>>> import cudf
>>> sr = cudf.Series(["a", "b", None])
>>> sr.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7600>
[
  "a",
  "b",
  null
]
>>> ind = cudf.Index(["a", "b", None])
>>> ind.to_arrow()
<pyarrow.lib.StringArray object at 0x7f796b0e7750>
[
  "a",
  "b",
  null
]
- to_cupy(dtype: Dtype | None = None, copy: bool = False, na_value=None) cupy.ndarray#
Convert the Frame to a CuPy array.
Parameters#
- dtype : str or numpy.dtype, optional
The dtype to pass to numpy.asarray().
- copy : bool, default False
Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_cupy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.
- na_value : Any, default None
The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.
Returns#
cupy.ndarray
- to_dlpack()#
Converts a cuDF object into a DLPack tensor.
DLPack is an open-source memory tensor structure: dmlc/dlpack.
This function takes a cuDF object and converts it to a PyCapsule object which contains a pointer to a DLPack tensor. This function deep copies the data into the DLPack tensor from the cuDF object.
Parameters#
- cudf_obj : DataFrame, Series, Index, or Column
Returns#
- pycapsule_obj : PyCapsule
Output DLPack tensor pointer which is encapsulated in a PyCapsule object.
- to_flat_index() Self#
Identity method.
This is implemented for compatibility with subclass implementations when chaining.
Returns#
- pd.Index
Caller.
See Also#
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(index: bool = True, name: ~collections.abc.Hashable = <no_default>) DataFrame#
Create a DataFrame with a column containing this Index.
Parameters#
- index : boolean, default True
Set the index of the returned DataFrame as the original Index.
- name : object, defaults to index.name
The passed name should substitute for the index name (if it has one).
Returns#
- DataFrame
DataFrame containing the original Index data.
See Also#
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples#
>>> import cudf
>>> idx = cudf.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
       animal
animal
Ant       Ant
Bear     Bear
Cow       Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
  animal
0    Ant
1   Bear
2    Cow
To override the name of the resulting column, specify name:
>>> idx.to_frame(index=False, name='zoo')
    zoo
0   Ant
1  Bear
2   Cow
- to_list()#
- to_numpy(dtype: Dtype | None = None, copy: bool = True, na_value=None) numpy.ndarray#
Convert the Frame to a NumPy array.
Parameters#
- dtype : str or numpy.dtype, optional
The dtype to pass to numpy.asarray().
- copy : bool, default True
Whether to ensure that the returned value is not a view on another array. This parameter must be True since cuDF must copy device memory to host to provide a numpy array.
- na_value : Any, default None
The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.
Returns#
numpy.ndarray
- to_series(index=None, name=None)#
Create a Series with both index and values equal to the index keys. Useful with map for returning an indexer based on an index.
Parameters#
- index : Index, optional
Index of resulting Series. If None, defaults to original index.
- name : str, optional
Name of resulting Series. If None, defaults to name of original index.
Returns#
- Series
The dtype will be based on the type of the Index values.
- tolist()#
- transpose()#
Return the transpose, which is by definition self.
- union(other, sort=None)#
Form the union of two Index objects.
Parameters#
- other : Index or array-like
- sort : bool or None, default None
Whether to sort the resulting Index.
  - None : Sort the result, except when
    - self and other are equal.
    - self or other has length 0.
  - False : Do not sort the result.
  - True : Sort the result (which may raise TypeError).
Returns#
- union : Index
Examples#
Union of an Index
>>> import cudf
>>> import pandas as pd
>>> idx1 = cudf.Index([1, 2, 3, 4])
>>> idx2 = cudf.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Index([1, 2, 3, 4, 5, 6], dtype='int64')
MultiIndex case
>>> idx1 = cudf.MultiIndex.from_pandas(
...     pd.MultiIndex.from_arrays(
...         [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
...     )
... )
>>> idx1
MultiIndex([(1,  'Red'),
            (1, 'Blue'),
            (2,  'Red'),
            (2, 'Blue')],
           )
>>> idx2 = cudf.MultiIndex.from_pandas(
...     pd.MultiIndex.from_arrays(
...         [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
...     )
... )
>>> idx2
MultiIndex([(3,   'Red'),
            (3, 'Green'),
            (2,   'Red'),
            (2, 'Green')],
           )
>>> idx1.union(idx2)
MultiIndex([(1,  'Blue'),
            (1,   'Red'),
            (2,  'Blue'),
            (2, 'Green'),
            (2,   'Red'),
            (3, 'Green'),
            (3,   'Red')],
           )
>>> idx1.union(idx2, sort=False)
MultiIndex([(1,   'Red'),
            (1,  'Blue'),
            (2,   'Red'),
            (2,  'Blue'),
            (3,   'Red'),
            (3, 'Green'),
            (2, 'Green')],
           )
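The documented rules for the sort parameter can be modeled on plain Python lists. This is a sketch under stated assumptions, not the library's implementation: the helper name union_model is hypothetical, and dropping duplicates in first-seen order is an assumption chosen so that the unsorted result matches the sort=False example above.

```python
def union_model(left, right, sort=None):
    """Hypothetical model of the documented sort rules, using plain lists."""
    left, right = list(left), list(right)
    # Concatenate and drop duplicates, keeping first-seen order (assumption).
    combined = list(dict.fromkeys(left + right))
    if sort is True:
        # May raise TypeError if the values are not mutually comparable.
        return sorted(combined)
    if sort is False:
        return combined
    # sort=None: sort unless the inputs are equal or either one is empty.
    if left == right or not left or not right:
        return combined
    return sorted(combined)


idx1 = [(1, 'Red'), (1, 'Blue'), (2, 'Red'), (2, 'Blue')]
idx2 = [(3, 'Red'), (3, 'Green'), (2, 'Red'), (2, 'Green')]
print(union_model(idx1, idx2))              # sorted result
print(union_model(idx1, idx2, sort=False))  # first-seen order
```

With sort=True, Python's own sorted() raises TypeError on values that cannot be compared with each other, which mirrors the caveat in the parameter description.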
- unique(level: int | None = None) Self#
Return unique values in the index.
Returns#
Index without duplicates
- property values: ndarray#
Return a CuPy representation of the DataFrame.
Only the values in the DataFrame will be returned, the axes labels will be removed.
Returns#
- cupy.ndarray
The values of the DataFrame.
- property values_host: numpy.ndarray#
Return a NumPy representation of the data.
Only the values in the DataFrame will be returned, the axes labels will be removed.
Returns#
- numpy.ndarray
A host representation of the underlying data.
- where(cond, other=None, inplace=False) Index#
Replace values where the condition is False.
Parameters#
- cond : bool Series/DataFrame, array-like
Where cond is True, keep the original value. Where False, replace with corresponding value from other. Callables are not supported.
- other : scalar, list of scalars, Series/DataFrame
Entries where cond is False are replaced with corresponding value from other. Callables are not supported. Default is None.
DataFrame expects only a scalar, an array-like of scalars, or a DataFrame with the same dimensions as self.
Series expects only a scalar or a Series-like object of the same length.
- inplace : bool, default False
Whether to perform the operation in place on the data.
Returns#
Same type as caller
Examples#
>>> import cudf
>>> df = cudf.DataFrame({"A":[1, 4, 5], "B":[3, 5, 8]})
>>> df.where(df % 2 == 0, [-1, -1])
   A  B
0 -1 -1
1  4 -1
2 -1  8
>>> ser = cudf.Series([4, 3, 2, 1, 0])
>>> ser.where(ser > 2, 10)
0     4
1     3
2    10
3    10
4    10
dtype: int64
>>> ser.where(ser > 2)
0       4
1       3
2    <NA>
3    <NA>
4    <NA>
dtype: int64
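The keep-or-replace rule behind where can be modeled element by element in plain Python. This is a minimal sketch for a scalar other, not the library's implementation. The helper name where_model is hypothetical, and None stands in for the null value shown as <NA> above.

```python
def where_model(values, cond, other=None):
    """Hypothetical model: keep values[i] where cond[i] is True, else use other."""
    return [value if keep else other for value, keep in zip(values, cond)]


ser = [4, 3, 2, 1, 0]
mask = [value > 2 for value in ser]  # [True, True, False, False, False]
print(where_model(ser, mask, 10))  # [4, 3, 10, 10, 10]
print(where_model(ser, mask))      # [4, 3, None, None, None]
```

Positions where the condition holds pass through unchanged, and only the remaining positions take the replacement.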