
Pandas SparseDataFrame from list of dicts

I'm trying to convert a list of Python dicts into a Pandas DataFrame. Because each dict has different keys, the resulting frame is mostly NaN and takes up too much memory. A SparseDataFrame should help in this case.

import pandas

df = pandas.DataFrame(keyword_data).to_sparse(fill_value=.0)

This works, but it uses a lot of memory because a dense DataFrame is created as an intermediate step, and it sometimes raises MemoryError.

Is it possible to create a SparseDataFrame with this data without that step? The Pandas documentation doesn't help much in this case... Doing this:

pandas.SparseDataFrame(keyword_data, default_fill_value=.0)

Raises:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''


The data looks something like:

[{'a': 0.672366,
  'b': 0.667276,
  # ...
 },
 {'c': 0.507752,
  'd': 0.532593,
  'e': 0.507793
  # ...
 },
 # ...
]

The keys are always strings and differ from dict to dict; the values are floats.

Is there a way to create a SparseDataFrame directly from this data, without going through a regular DataFrame?

yprez asked Oct 29 '14 09:10



1 Answer

As of pandas v1.0.0, SparseDataFrame and SparseSeries were removed.

There is no need for them anymore, because a regular Series or DataFrame can hold sparse values. Quoting the documentation:

There’s no performance or memory penalty to using a Series or DataFrame with sparse values, rather than a SparseSeries or SparseDataFrame.
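
To avoid the dense intermediate step from the question, one option is to collect the values as (row, column, value) triplets, build a SciPy sparse matrix, and hand it to pandas. This is only a minimal sketch: it assumes SciPy is installed and that missing entries should read as 0.0 (as with fill_value=.0 in the question) rather than NaN. The helper name sparse_frame_from_dicts is made up for illustration; DataFrame.sparse.from_spmatrix is available in pandas 0.25 and later.

```python
import pandas as pd
from scipy.sparse import coo_matrix


def sparse_frame_from_dicts(records):
    """Build a DataFrame with sparse columns from a list of {str: float} dicts.

    Hypothetical helper, not part of pandas. Missing entries read as 0.0.
    """
    col_index = {}  # key -> column position, in first-seen order
    rows, cols, vals = [], [], []
    for i, record in enumerate(records):
        for key, value in record.items():
            j = col_index.setdefault(key, len(col_index))
            rows.append(i)
            cols.append(j)
            vals.append(value)

    # Only the values that are actually present are stored.
    matrix = coo_matrix(
        (vals, (rows, cols)), shape=(len(records), len(col_index))
    )
    return pd.DataFrame.sparse.from_spmatrix(matrix, columns=list(col_index))


keyword_data = [
    {"a": 0.672366, "b": 0.667276},
    {"c": 0.507752, "d": 0.532593, "e": 0.507793},
]
df = sparse_frame_from_dicts(keyword_data)
print(df.dtypes)          # every column has a Sparse dtype
print(df.sparse.density)  # fraction of entries that are actually stored
```

The result is an ordinary DataFrame whose columns are sparse, so the usual DataFrame methods apply. If NaN is needed as the fill value instead of 0.0, this sketch would have to be adapted.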

Qusai Alothman answered Oct 19 '22 23:10
