I have a DataFrame that includes a column where every cell is made up of a list of dicts, and each list of dicts is of varying length (including 0).
An example:
df = pd.DataFrame({'ID' : [13423,294847,322844,429847], 'RANKS': [[{u'name': u'A', u'price': u'$1.00', u'rank': u'1'},
{u'name': u'B', u'price': u'$4.00', u'rank': u'2'},
{u'name': u'C', u'price': u'$3.99', u'rank': u'3'},
{u'name': u'D', u'price': u'$2.00', u'rank': u'4'},
{u'name': u'E', u'price': u'$2.50', u'rank': u'5'}],
[{u'name': u'AA', u'price': u'$1.99', u'rank': u'1'},
{u'name': u'BB', u'price': u'$6.99', u'rank': u'2'}],
[{u'name': u'Z', u'price': u'$0.99', u'rank': u'1'},
{u'name': u'Y', u'price': u'$10.00', u'rank': u'2'},
{u'name': u'X', u'price': u'$1.99', u'rank': u'3'}],[]], 'count' : [5,2,3,0]})
Note that 'count' is the number of dicts in 'RANKS.' The goal I had in mind was to creating a series of additional dataframes/tables (one for each 'rank') and link these to the main table in a HDFStore. Something like:
Rank_2
ID Price Name
13423 $4.00 B
294847 $6.99 BB
322844 $10.99 Y
429847 NaN NaN
Rank_3
ID Price Name
13423 $3.99 C
294847 NaN NaN
322844 $1.99 X
429847 NaN NaN
This way I could easy query on ID and rank if needed, but the main table wouldn't get cluttered with the unwinding of this hierarchical data.
The problem, however, is I cannot figure out how to create the DataFrames from this column. I've tried a number of things, the first (to be nested in a for-loop if it worked, but of course it did not):
Rank_1 = pd.DataFrame(df.loc[df['count'] > 0]['RANKS'].map(lambda x: pd.DataFrame(x[0])))
And, the second, since price is the most important piece to me:
for i in range(0,5):
df['rank_%s' % str(i+1)] = df[df['count'] > i]['RANKS'].map(lambda x: x[i]['price'].strip('$'))
Then convert to float. This works but is a pretty big compromise. Is there an efficient way (that won't get hung up on NaNs) to accomplish my goal of separate DataFrames for each rank?
My gut reaction is that you probably should not break apart your DataFrame into many smaller DataFrames. Handling lots of small DataFrames requires a Python loop which is usually a step along the path to slowness. Instead I think you might be better served with one DataFrame, which flattens the list of dicts so that each inner dict is given its own row in the DataFrame. The keys of the inner dict will become new columns. I suspect this single flat DataFrame format would be able to do anything the multiple DataFrame alternative could do but faster, and it would make saving to HDFStore simple.
Suppose you have a DataFrame with a list of dicts in the RANKS
column:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID' : [13423,294847,322844,429847], 'RANKS': [[{u'name': u'A', u'price': u'$1.00', u'rank': u'1'},
{u'name': u'B', u'price': u'$4.00', u'rank': u'2'},
{u'name': u'C', u'price': u'$3.99', u'rank': u'3'},
{u'name': u'D', u'price': u'$2.00', u'rank': u'4'},
{u'name': u'E', u'price': u'$2.50', u'rank': u'5'}],
[{u'name': u'AA', u'price': u'$1.99', u'rank': u'1'},
{u'name': u'BB', u'price': u'$6.99', u'rank': u'2'}],
[{u'name': u'Z', u'price': u'$0.99', u'rank': u'1'},
{u'name': u'Y', u'price': u'$10.00', u'rank': u'2'},
{u'name': u'X', u'price': u'$1.99', u'rank': u'3'}],[]], 'count' : [5,2,3,0]})
then you can build a flat DataFrame with one dict per row like this:
result = []
for idx, row in df.iterrows():
for dct in row['RANKS']:
dct['ID'] = row['ID']
dct['count'] = row['count']
result.append(dct)
del df
result = pd.DataFrame(result)
result['rank'] = result['rank'].astype(np.int32)
result['price'] = result['price'].str.replace('$', '')
result['price'] = result['price'].astype('float')
print(result)
which yields
ID count name price rank
0 13423 5 A 1.00 1
1 13423 5 B 4.00 2
2 13423 5 C 3.99 3
3 13423 5 D 2.00 4
4 13423 5 E 2.50 5
5 294847 2 AA 1.99 1
6 294847 2 BB 6.99 2
7 322844 3 Z 0.99 1
8 322844 3 Y 10.00 2
9 322844 3 X 1.99 3
Note that building result
directly from the original data source (thus avoiding df
altogether) would be a cleaner, less memory-demanding solution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With