
Polars DataFrame.sort() - large memory requirements

When I call DataFrame.sort() from the Python Polars library, RAM usage jumps to more than double its original value. This is of course a problem when dealing with large datasets (tens to hundreds of GB). Is there a workaround that uses less RAM, even at the cost of performance?

Thank you for any hints.

Galedon asked Jun 13 '26 18:06



1 Answer

According to the Polars documentation, you can use the streaming API to avoid running out of RAM. As shown by @Dean MacGregor, you can combine streaming with polars.LazyFrame.sort:

df.lazy().sort("col").collect(streaming=True)  # df: your DataFrame, "col": the column to sort by

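A LazyFrame is the lazily evaluated counterpart of a DataFrame. Lazy evaluation delays the evaluation of an expression until the result is actually needed: operations on a LazyFrame are only recorded as a query plan, and nothing runs until collect() is called.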

Gabriel Santello answered Jun 16 '26 06:06



