Create a dataframe from a dict where values are variable-length lists

I have a dict where each value is a list, for example:

my_dict = {1: [964725688, 6928857],
           ...

           22: [1667906, 35207807, 685530997, 35207807],
           ...
           }

In this example the longest list has 4 items, but lists could be longer than that.

I would like to convert it to a dataframe like:

1  964725688
1  6928857
...
22 1667906
22 35207807
22 685530997
22 35207807

spitfiredd asked May 11 '17 18:05




2 Answers

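import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# One [key, element] row per list item; dict.items() replaces Python 2's iteritems()
df = pd.DataFrame([[k, ele] for k, v in my_dict.items() for ele in v])

print(df)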

    0          1
0   1  964725688
1   1    6928857
2  22    1667906
3  22   35207807
4  22  685530997
5  22   35207807
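
If named columns are wanted, a minimal sketch along the same lines could pass `columns=` to the constructor. The names `key` and `value` are illustrative assumptions; the question does not name the columns.

```python
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

# One (key, element) tuple per list item, in dict insertion order.
rows = [(k, ele) for k, v in my_dict.items() for ele in v]

# "key" and "value" are hypothetical column names chosen for this sketch.
df = pd.DataFrame(rows, columns=["key", "value"])
print(df)
```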

galaxyan answered Nov 10 '22 07:11


First Idea
pandas

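import numpy as np
import pandas as pd

s = pd.Series(my_dict)  # object Series holding one list per key
# Flatten the lists and repeat each key once per element of its list
pd.Series(
    np.concatenate(s.values),
    s.index.repeat(s.str.len())
)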

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64

Faster!
numpy

values = list(my_dict.values())
lens = [len(value) for value in values]
keys = list(my_dict.keys())
pd.Series(np.concatenate(values), np.repeat(keys, lens))

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64
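
Whether the numpy version is actually faster depends on the data size and the library versions, so it is worth measuring locally. The following is only a rough sketch using `timeit`; the helper names `via_pandas` and `via_numpy` and the repetition count are assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

def via_pandas(d):
    # Hypothetical wrapper around the pandas-based approach.
    s = pd.Series(d)
    return pd.Series(np.concatenate(s.values), s.index.repeat(s.str.len()))

def via_numpy(d):
    # Hypothetical wrapper around the numpy-based approach.
    values = list(d.values())
    lens = [len(value) for value in values]
    return pd.Series(np.concatenate(values), np.repeat(list(d.keys()), lens))

# Time 200 calls of each; absolute numbers vary by machine and versions.
t_pandas = timeit.timeit(lambda: via_pandas(my_dict), number=200)
t_numpy = timeit.timeit(lambda: via_numpy(my_dict), number=200)
print(f"pandas: {t_pandas:.4f}s  numpy: {t_numpy:.4f}s")
```

With a dict this small the difference may be dominated by overhead, so a larger input gives a more meaningful comparison.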

Interesting
pd.concat

pd.concat({k: pd.Series(v) for k, v in my_dict.items()}).reset_index(1, drop=True)

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64
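
The results above are Series indexed by key. To get a two-column DataFrame, one possible sketch uses `Series.explode`, which exists in pandas 0.25 and later. The column names `key` and `value` are assumptions, not taken from the question.

```python
import pandas as pd

my_dict = {1: [964725688, 6928857],
           22: [1667906, 35207807, 685530997, 35207807]}

# explode() emits one row per list element and repeats the index label.
# The result has object dtype, so cast back to integers.
s = pd.Series(my_dict).explode().astype("int64")

# Move the repeated keys from the index into a column (hypothetical names).
df = s.rename_axis("key").reset_index(name="value")
print(df)
```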

piRSquared answered Nov 10 '22 05:11