hipdf.core.groupby.groupby.GroupBy.apply
GroupBy.apply(function, *args, engine='auto')
Apply a Python transformation function to each grouped chunk.
Parameters
- function: callable
The Python transformation function that is applied to each grouped chunk.
- args: tuple
Optional positional arguments to pass to the function.
- engine: 'auto', 'cudf', or 'jit', default 'auto'
Selects the GroupBy.apply implementation.
Use 'jit' to select the Numba JIT pipeline. Only certain operations are allowed within the function when using this option: the reductions min, max, sum, mean, var, std, idxmax, and idxmin, and any arithmetic formula involving their results. Binary operations on the columns themselves are not yet supported, so syntax like df['x'] * 2 is not yet allowed. For more information, see the cuDF guide to user defined functions.
Use 'cudf' to select the iterative groupby apply algorithm, which aims to provide maximum flexibility at the expense of performance.
The default value 'auto' attempts to use the Numba JIT pipeline where possible and falls back to the iterative algorithm if necessary.
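The engine selection described above can be pictured with a small pure-Python model. This is only a sketch of the documented behaviour, not the library's internals: the names apply_with_engine, jit_apply, iterative_apply, UnsupportedByJit, and the jit_ok flag are all hypothetical, and the real pipeline decides what it can compile in its own way.

```python
class UnsupportedByJit(Exception):
    """Hypothetical error: the function cannot take the fast path."""


def jit_apply(groups, function):
    # Stand-in for the JIT pipeline. The `jit_ok` attribute is an
    # illustrative assumption used to mark "supported" functions.
    if not getattr(function, "jit_ok", False):
        raise UnsupportedByJit(function.__name__)
    return [function(group) for group in groups]


def iterative_apply(groups, function):
    # Stand-in for the flexible path: call the function on each group in turn.
    return [function(group) for group in groups]


def apply_with_engine(groups, function, engine="auto"):
    """Return (results, engine_used) for the chosen engine."""
    if engine == "jit":
        return jit_apply(groups, function), "jit"
    if engine == "cudf":
        return iterative_apply(groups, function), "cudf"
    if engine == "auto":
        try:
            return jit_apply(groups, function), "jit"
        except UnsupportedByJit:
            # Fall back to the iterative path when the fast path refuses.
            return iterative_apply(groups, function), "cudf"
    raise ValueError(f"unknown engine: {engine!r}")


def value_range(group):
    # A reduction-only formula, the kind the JIT path is described as accepting.
    return max(group) - min(group)


value_range.jit_ok = True


def doubled(group):
    # An element-wise operation, modelled here as unsupported by the JIT path.
    return [x * 2 for x in group]


groups = [[1, 2], [3, 4], [5, 6]]
print(apply_with_engine(groups, value_range))  # takes the "jit" path
print(apply_with_engine(groups, doubled))      # falls back to "cudf"
```

In this model, passing engine="jit" together with doubled raises instead of falling back, which mirrors the difference between requesting the JIT pipeline explicitly and letting 'auto' choose.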
Examples
from cudf import DataFrame

df = DataFrame()
df['key'] = [0, 0, 1, 1, 2, 2, 2]
df['val'] = [0, 1, 2, 3, 4, 5, 6]
groups = df.groupby(['key'])

# Define a function to apply to each group
def mult(df):
    df['out'] = df['key'] * df['val']
    return df

result = groups.apply(mult)
print(result)
Output:
   key  val  out
0    0    0    0
1    0    1    0
2    1    2    2
3    1    3    3
4    2    4    8
5    2    5   10
6    2    6   12
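To make the semantics of the example concrete, here is a minimal pure-Python model of a split-apply-combine loop. It is a sketch under stated assumptions, not the library's algorithm: rows are plain dicts, groupby_apply is a hypothetical helper, and the factor argument on mult is added here only to show how extra positional arguments would be forwarded to the function.

```python
from collections import defaultdict


def groupby_apply(rows, key, function, *args):
    """Pure-Python model of an iterative groupby-apply (hypothetical helper)."""
    # Split: collect rows into chunks by key value, in first-seen key order.
    chunks = defaultdict(list)
    for row in rows:
        chunks[row[key]].append(dict(row))  # copy so the input is not mutated

    # Apply: call the function once per chunk, forwarding extra positional args.
    # Combine: concatenate the per-chunk results into one list.
    result = []
    for chunk in chunks.values():
        result.extend(function(chunk, *args))
    return result


def mult(chunk, factor=1):
    # Mirrors the example: out = key * val, optionally scaled by `factor`.
    for row in chunk:
        row["out"] = row["key"] * row["val"] * factor
    return chunk


rows = [{"key": k, "val": v} for k, v in zip([0, 0, 1, 1, 2, 2, 2], range(7))]

out = groupby_apply(rows, "key", mult)
scaled = groupby_apply(rows, "key", mult, 10)  # 10 is forwarded as `factor`

print([row["out"] for row in out])     # [0, 0, 2, 3, 8, 10, 12]
print([row["out"] for row in scaled])  # [0, 0, 20, 30, 80, 100, 120]
```

The first printed list matches the out column shown above. The function is called once per group rather than once per row, which is why mult receives a whole chunk.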
engine='jit' may be used to accelerate certain functions, initially those that contain reductions and arithmetic operations between results of those reductions:
>>> import cudf
>>> df = cudf.DataFrame({'a': [1, 1, 2, 2, 3, 3], 'b': [1, 2, 3, 4, 5, 6]})
>>> df.groupby('a').apply(
...     lambda group: group['b'].max() - group['b'].min(),
...     engine='jit'
... )
a
1    1
2    1
3    1
dtype: int64