I have a CSV file that is 70 GB in size. I want to load it as a DataFrame and count the number of rows in lazy mode. What is the best way to do so?
As far as I can tell from the documentation, there is no function like shape in lazy mode. I found this answer, which provides a solution that is not based on Polars, but I wonder whether it is possible to do this in Polars as well.
To get the row count using Polars, first load the file into a LazyFrame:
import polars as pl
lzdf = pl.scan_csv("mybigfile.csv")
Then count the rows and collect the result:
lzdf.select(pl.len()).collect()
If you want a Python scalar rather than a one-cell table as the result, call .item() on it:
lzdf.select(pl.len()).collect().item()
On older Polars releases that predate pl.len(), the same count is obtained with pl.count():
lzdf.select(pl.count()).collect()
As before, append .item() to get a Python scalar:
lzdf.select(pl.count()).collect().item()