the log of a delta table stores metadata about the transactions and about statistics (data type, min, max, nr. columns etc). However, I can only see the data types when looking into the json file of this log. Does anyone know how to obtain the min, max and nr. columns of this delta table without computing anything (since the delta table should have this information when reading the file)?
With the release of data skipping in Delta Lake 1.2.0, column-level statistics like min/max are now available. Statistics are saved in the Delta Lake transaction log every time an add action is performed corresponding to adding a new parquet file. Using the delta-rs Python package (v0.8.0 or newer) we can retrieve and read these add action records from the transaction log to view the file-level statistics. First, install the package with pip or conda:
# using pip:
$ pip install deltalake
# using conda:
$ conda install -c conda-forge delta-spark
We'll create a dummy Delta Table using pandas and delta-rs. Then we'll read the add action records using the get_add_actions method:
from deltalake import DeltaTable, write_deltalake
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
write_deltalake("tmp", df, partition_by=["x"])
dt = DeltaTable("tmp")
dt.get_add_actions().to_pandas()
You'll get the following output (exported as tsv for inclusion):
    path    size_bytes  modification_time   data_change partition_values    num_records null_count  min max
0   x=1/0-84bac732-65a4-4217-8782-14933373ad4b-0.parquet    1867    2023-03-28 23:28:25.634 True    {'x': 1}    1   {'y': 0}    {'y': 4}    {'y': 4}
1   x=2/0-84bac732-65a4-4217-8782-14933373ad4b-0.parquet    1867    2023-03-28 23:28:25.634 True    {'x': 2}    1   {'y': 0}    {'y': 5}    {'y': 5}
2   x=3/0-84bac732-65a4-4217-8782-14933373ad4b-0.parquet    1867    2023-03-28 23:28:25.634 True    {'x': 3}    1   {'y': 0}    {'y': 6}    {'y': 6}
note: each statistic will have its own column in the DataFrame, e.g. num_records, null_count,  min, and max.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With