I'm trying to convert a list of Python dicts into a Pandas DataFrame.
Since every dict has different keys, the resulting DataFrame takes up too much memory. Because most of the values end up as NaN, a SparseDataFrame should be helpful in this case.
import pandas
df = pandas.DataFrame(keyword_data).to_sparse(fill_value=.0)
This works, but it uses a lot of memory because a dense DataFrame is created as an intermediate step, and it sometimes raises MemoryError.
Is it possible to create a SparseDataFrame with this data without that step? The Pandas documentation doesn't help much in this case... Doing this:
pandas.SparseDataFrame(keyword_data, default_fill_value=.0)
Raises:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The data looks something like:
[{'a': 0.672366,
'b': 0.667276,
# ...
},
{'c': 0.507752,
'd': 0.532593,
'e': 0.507793
# ...
},
# ...
]
The keys are always strings, with different keys per dict, the values are floats.
Is there a way to create a SparseDataFrame directly from this data, without going through a regular DataFrame?
As of pandas v1.0.0, SparseDataFrame and SparseSeries were removed.
There is no need for them anymore. Quoting the documentation:
There’s no performance or memory penalty to using a Series or DataFrame with sparse values, rather than a SparseSeries or SparseDataFrame.
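As a concrete illustration, here is a minimal sketch of how the list of dicts could be turned into a DataFrame backed by sparse values without ever materialising a dense array, by going through a SciPy COO matrix and pandas.DataFrame.sparse.from_spmatrix. This assumes pandas >= 1.0 and SciPy are available; the variable names (data, col_index, mat) are just for the example.

```python
import pandas as pd
from scipy import sparse

data = [
    {'a': 0.672366, 'b': 0.667276},
    {'c': 0.507752, 'd': 0.532593, 'e': 0.507793},
]

# Collect the union of keys to fix a column order.
columns = sorted({k for d in data for k in d})
col_index = {c: i for i, c in enumerate(columns)}

# Fill COO coordinate lists straight from the dicts,
# so only the non-missing values are ever stored.
rows, cols, vals = [], [], []
for i, d in enumerate(data):
    for key, value in d.items():
        rows.append(i)
        cols.append(col_index[key])
        vals.append(value)

mat = sparse.coo_matrix((vals, (rows, cols)),
                        shape=(len(data), len(columns)))

# Missing entries become the sparse fill value 0.0, matching
# the fill_value=.0 used with to_sparse above.
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=columns)
```

The resulting columns have a Sparse[float64, 0] dtype, so the memory footprint stays proportional to the number of stored values rather than rows x columns.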