hipdf.io.parquet.read_parquet_metadata

Applies to Linux

hipdf.io.parquet.read_parquet_metadata(path)#

Read a Parquet file’s metadata and schema

Parameters#

path : string or path object

Path of the Parquet file to read.

Returns#

Total number of rows
Number of row groups
List of column names

Examples#

>>> import cudf
>>> num_rows, num_row_groups, names = cudf.io.read_parquet_metadata(filename)
>>> # Read each row group separately, then concatenate the pieces.
>>> dfs = [cudf.read_parquet(filename, row_groups=[i]) for i in range(num_row_groups)]
>>> df = cudf.concat(dfs)
>>> df
  num1                datetime text
0  123 2018-11-13T12:00:00.000 5451
1  456 2018-11-14T12:35:01.000 5784
2  789 2018-11-15T18:02:59.000 6117
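
The row-group count returned by this function can also be used to plan reads in larger chunks than one row group at a time. The following is a minimal sketch, not part of the library: `row_group_batches` is a hypothetical helper name, and the batch size is an arbitrary illustrative value. It uses only plain Python, so it runs without a GPU.

```python
def row_group_batches(num_row_groups, batch_size):
    """Split row-group indices 0..num_row_groups-1 into consecutive batches.

    Hypothetical helper for illustration; not provided by the library.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(range(start, min(start + batch_size, num_row_groups)))
        for start in range(0, num_row_groups, batch_size)
    ]


# Example: a file reported as having 5 row groups, read 2 at a time.
batches = row_group_batches(5, 2)
print(batches)  # [[0, 1], [2, 3], [4]]
```

Each batch could then be passed to the reader, for example `cudf.read_parquet(filename, row_groups=batch)`, assuming the installed version accepts a list of row-group indices for that parameter.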

See Also#

cudf.read_parquet