I want to use read_parquet, but read the data backwards, starting from the end instead of the beginning (assuming a sorted index). I don't want to read the entire Parquet file into memory, because that defeats the whole point of using it. Is there a nice way to do this?
Assuming the dataframe has a sorted index, the order can be reversed in two steps: reverse the order of the partitions, then reverse the index within each partition:
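from dask.datasets import timeseries

# Example Dask dataframe with a sorted datetime index
ddf = timeseries()

ddf_inverted = (
    ddf
    # Step 1: take the partitions in reverse order (last partition first)
    .partitions[::-1]
    # Step 2: sort the index of each partition in descending order
    .map_partitions(lambda df: df.sort_index(ascending=False))
)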
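To see why these two steps give a fully reversed order, here is a minimal sketch in plain Python, with no Dask involved. It assumes the partitions are plain lists whose values are already sorted across partitions; `iter_reversed` and the sample data are hypothetical names made up for this illustration.

```python
from typing import Iterator, List, Sequence


def iter_reversed(partitions: Sequence[List[int]]) -> Iterator[int]:
    """Yield values in descending order from globally sorted partitions."""
    # Step 1: visit the partitions from last to first.
    for part in reversed(partitions):
        # Step 2: walk each partition from its last value to its first.
        yield from reversed(part)


# Hypothetical data: three partitions, sorted within and across partitions.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

result = list(iter_reversed(partitions))
print(result)
```

Because the generator only touches one partition at a time, the first values come from the last partition alone, which is the same reason the partition-wise approach above avoids loading everything at once.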