hipdf.io.parquet.read_parquet_metadata
Applies to Linux
hipdf.io.parquet.read_parquet_metadata(filepath_or_buffer) -> tuple[int, int, list[Hashable], int, list[dict[str, int]]]
Read a Parquet file’s metadata and schema without loading its column data.
Parameters
- filepath_or_buffer : string or path object
  Path of the file to be read.
Returns
A tuple containing, in order:
- the total number of rows,
- the number of row groups,
- a list of column names,
- the number of columns,
- a list of row-group metadata dictionaries, one per row group.
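Because the function returns a plain five-element tuple, callers typically unpack it positionally. The sketch below, which is not part of hipDF, wraps that tuple in a hypothetical `ParquetMetadata` named tuple and runs two consistency checks it assumes hold for a single file: `num_columns` equals `len(names)`, and there is one metadata dictionary per row group. Treat both checks as assumptions about the returned values rather than documented guarantees.

```python
from collections.abc import Hashable
from typing import NamedTuple


class ParquetMetadata(NamedTuple):
    """Hypothetical named view of the tuple returned by read_parquet_metadata."""

    num_rows: int                             # total number of rows in the file
    num_row_groups: int                       # number of row groups
    names: list[Hashable]                     # column names
    num_columns: int                          # number of columns
    row_group_metadata: list[dict[str, int]]  # one metadata dict per row group


def wrap_metadata(result: tuple) -> ParquetMetadata:
    """Unpack the 5-tuple positionally and run basic (assumed) consistency checks."""
    meta = ParquetMetadata(*result)
    # Assumption: the column count matches the number of column names.
    if meta.num_columns != len(meta.names):
        raise ValueError("num_columns does not match len(names)")
    # Assumption: there is exactly one metadata entry per row group.
    if meta.num_row_groups != len(meta.row_group_metadata):
        raise ValueError("num_row_groups does not match len(row_group_metadata)")
    return meta
```

A call such as `meta = wrap_metadata(cudf.io.read_parquet_metadata(filename))` then allows `meta.num_row_groups` in place of a positional index.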
Examples
>>> import cudf
>>> num_rows, num_row_groups, names, num_columns, row_group_metadata = cudf.io.read_parquet_metadata(filename)
>>> df = [cudf.read_parquet(filename, row_groups=[i]) for i in range(num_row_groups)]
>>> df = cudf.concat(df, ignore_index=True)
>>> df
   num1                 datetime  text
0   123  2018-11-13T12:00:00.000  5451
1   456  2018-11-14T12:35:01.000  5784
2   789  2018-11-15T18:02:59.000  6117
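The example reads one row group per call. For large files, a caller might instead group consecutive row groups into batches bounded by a row budget. This is a minimal sketch that assumes the per-row-group row counts have already been pulled out of `row_group_metadata`. The dictionary key holding that count (for example `"num_rows"`) is an assumption here, and `plan_row_group_batches` is a hypothetical helper, not a hipDF API.

```python
def plan_row_group_batches(rows_per_group: list[int], max_rows: int) -> list[list[int]]:
    """Greedily pack consecutive row-group indices into batches of at most max_rows rows.

    A single row group larger than max_rows is placed in a batch by itself.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    batches: list[list[int]] = []
    current: list[int] = []
    current_rows = 0
    for index, n in enumerate(rows_per_group):
        # Close the current batch if adding this row group would exceed the budget.
        if current and current_rows + n > max_rows:
            batches.append(current)
            current, current_rows = [], 0
        current.append(index)
        current_rows += n
    # Flush the last, possibly partial, batch.
    if current:
        batches.append(current)
    return batches
```

Each batch could then be passed as `row_groups=batch` to `cudf.read_parquet`, and the pieces combined with `cudf.concat`. Packing keeps row groups in file order, so row order is preserved after concatenation.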
See Also
cudf.read_parquet