I have an API that returns a single row of data as a Python dictionary. Most of the keys have a single value, but some of the keys have values that are lists (or even lists-of-lists or lists-of-dictionaries).
When I pass the dictionary to pd.DataFrame to convert it to a pandas DataFrame, it raises an "Arrays must be the same length" error. This is because it cannot process the keys which have multiple values (i.e. the keys whose values are lists).
How do I get pandas to treat the lists as 'single values'?
As a hypothetical example:
data = { 'building': 'White House', 'DC?': True,
'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
I want to turn it into a DataFrame like this:
ix  building       DC?   occupants
0   'White House'  True  ['Barack', 'Michelle', 'Sasha', 'Malia']
The DataFrame is the most commonly used pandas object. A DataFrame can be created from lists and dictionaries in several ways, discussed one by one below. One of them transforms a dictionary of lists into a DataFrame.
pd.DataFrame.from_dict() constructs a DataFrame from a dict of array-likes or a dict of dicts. A list of dictionaries can instead be passed directly to pd.DataFrame() or to pd.DataFrame.from_records().
When we create a DataFrame from a list of dictionaries, the keys become the columns and each dictionary becomes one row. If a key is missing from one of the dictionaries, NaN is inserted in that position of the resulting DataFrame.
When a DataFrame is initialised with a dictionary of lists, each item (key/value pair) in the dictionary is converted to one column: the key becomes the column name and the list in the value field becomes the column data.
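The behaviour described above can be checked with a small sketch. The two rows here are hypothetical, and the second one deliberately lacks the 'occupants' key:

```python
import pandas as pd

# Hypothetical rows; the second dictionary has no 'occupants' key.
rows = [
    {'building': 'White House', 'occupants': 4},
    {'building': 'Capitol'},
]

# Each dictionary becomes one row; the keys become the columns.
df = pd.DataFrame(rows)

print(df.columns.tolist())               # ['building', 'occupants']
print(df['occupants'].isna().tolist())   # [False, True]
```

The missing value shows up as NaN, which is why the 'occupants' column ends up with a float dtype here.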
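As a minimal sketch of the dictionary-of-lists case, assuming every list has the same length (the variable name cols is only illustrative):

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's data.
cols = {'A': [20, 21], 'B': [10, 11]}
df = pd.DataFrame(cols)

print(df.shape)          # (2, 2)
print(df['A'].tolist())  # [20, 21]
```

This is also why a list inside a single API row is spread over several rows: pandas reads it as column data rather than as one cell.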
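Passing the dictionary directly makes pandas treat the list as column data and repeat the scalar values on every row. It works if you wrap the dictionary in a list (of rows), so that it is read as a single row: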
In [11]: pd.DataFrame(data)
Out[11]:
    DC?     building occupants
0  True  White House    Barack
1  True  White House  Michelle
2  True  White House     Sasha
3  True  White House     Malia
In [12]: pd.DataFrame([data])
Out[12]:
    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]
This turns out to be trivial in the end: wrap the dictionary in a list.
import pandas as pd
data = { 'building': 'White House', 'DC?': True, 'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
df = pd.DataFrame([data])
print(df)
Which results in:
    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]
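The length error from the question most likely appears when a row has several list-valued keys of different lengths. A rough sketch, assuming a hypothetical 'rooms' key alongside 'occupants':

```python
import pandas as pd

# Hypothetical API row: two list-valued keys with different lengths.
row = {
    'building': 'White House',
    'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'],
    'rooms': ['Oval Office', 'East Room'],
}

try:
    # The lists are read as columns of unequal length, so this fails.
    pd.DataFrame(row)
except ValueError as exc:
    print('ValueError:', exc)

# Wrapping the dictionary in a list makes it one row; the lists stay intact.
df = pd.DataFrame([row])
print(df.shape)                # (1, 3)
print(df.loc[0, 'occupants'])  # ['Barack', 'Michelle', 'Sasha', 'Malia']
```

The exact wording of the error message differs between pandas versions, so the sketch only relies on it being a ValueError.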
Here is a way to build a DataFrame from a dictionary of lists where the keys become a sorted index and the column names are supplied explicitly. This is handy for DataFrames built from scraped HTML tables.
import pandas as pd
d = { 'B':[10,11], 'A':[20,21] }
# list(...) makes the dict views explicit sequences on Python 3
df = pd.DataFrame(list(d.values()), columns=['C1','C2'], index=list(d.keys())).sort_index()
print(df)
   C1  C2
A  20  21
B  10  11
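An alternative that may read more directly is pd.DataFrame.from_dict with orient='index', which treats each key as a row label. This is a sketch of the same result, not the original answer's method:

```python
import pandas as pd

d = {'B': [10, 11], 'A': [20, 21]}

# orient='index' turns each key into a row label; columns names the list positions.
df = pd.DataFrame.from_dict(d, orient='index', columns=['C1', 'C2']).sort_index()

print(df.index.tolist())     # ['A', 'B']
print(df.loc['A'].tolist())  # [20, 21]
```

The columns argument of from_dict is only accepted together with orient='index'.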