Python Polars: How to get the row count of a LazyFrame?

Question

I have a CSV file that is 70 GB in size. I want to load it lazily and count the number of rows. What's the best way to do so?

As far as I can tell from the documentation, there is no equivalent of shape in lazy mode. I found this answer, which provides a solution that is not based on Polars, but I wonder whether it is possible to do this in Polars as well.

Dean MacGregor · Accepted Answer

For polars 0.20.5+

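To get the row count with Polars, first scan the file into a LazyFrame. Nothing is read at this point:

import polars as pl
lzdf = pl.scan_csv("mybigfile.csv")

Then count the rows and collect the result, which comes back as a DataFrame with one row and one column: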

lzdf.select(pl.len()).collect()

If you just want a Python scalar rather than a table, call .item() on the collected result:

lzdf.select(pl.len()).collect().item()

For older versions

The steps are the same, but use pl.count() in place of pl.len(). First scan the file into a LazyFrame:

import polars as pl
lzdf = pl.scan_csv("mybigfile.csv")

Then count the rows and collect the result:

lzdf.select(pl.count()).collect()

If you just want a Python scalar rather than a table, call .item() on the collected result:

lzdf.select(pl.count()).collect().item()

Tags: python, dataframe, python-polars

roei shlezinger
