Creating dataframe from a dictionary where entries have different lengths

People also ask

Can DataFrame columns have different length?

Not directly: every column of a DataFrame has the same length. Lists of different sizes can still be combined, with the shorter columns padded with NaN.

How do you turn a dictionary into a data frame?

You can convert a dictionary to a pandas DataFrame with df = pd.DataFrame.from_dict(my_dict).

Can DataFrame be created from dictionary?

Yes. DataFrame.from_dict() accepts a dictionary and an orientation. The default orientation is 'columns', which means the dictionary keys become the column names of the DataFrame.

What method creates a Pandas DataFrame from a dictionary?

pandas.DataFrame.from_dict() creates a pandas DataFrame from a dict object. This method takes the parameters data, orient, dtype, and columns, and returns a DataFrame.
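As a quick check of why the question comes up at all: passing unequal-length lists to from_dict with the default orient='columns' raises a ValueError. A minimal sketch, where my_dict is made-up example data:

```python
import pandas as pd

# Hypothetical example data: two columns of different lengths.
my_dict = {"A": [1, 2], "B": [1, 2, 3, 4]}

try:
    pd.DataFrame.from_dict(my_dict)  # default orient='columns'
    failed = False
except ValueError as exc:
    # pandas refuses to build columns of different lengths
    failed = True
    print(f"ValueError: {exc}")
```

The answers below all work around this by padding the shorter entries with NaN in one way or another.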

In Python 3.x:

import pandas as pd
import numpy as np

d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
    
pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ]))

Out[7]: 
    A  B
0   1  1
1   2  2
2 NaN  3
3 NaN  4

In Python 2.x:

replace d.items() with d.iteritems().

Here's a simple way to do that:

In[20]: my_dict = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
In[21]: df = pd.DataFrame.from_dict(my_dict, orient='index')
In[22]: df
Out[22]: 
   0  1   2   3
A  1  2 NaN NaN
B  1  2   3   4
In[23]: df.transpose()
Out[23]: 
    A  B
0   1  1
1   2  2
2 NaN  3
3 NaN  4
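To see why the transpose is needed: with orient='index' each key becomes a row, and short rows are padded with NaN. Transposing turns the keys back into columns. A small sketch with plain lists (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical example data with entries of different lengths.
my_dict = {"A": [1, 2], "B": [1, 2, 3, 4]}

# Keys become row labels; the short row 'A' is padded with NaN.
wide = pd.DataFrame.from_dict(my_dict, orient="index")

# Flip rows and columns so the keys are column names again.
df = wide.transpose()

print(df)
```

Note that the padded result may come back as floating-point columns, so a dtype conversion might be needed afterwards.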

A tidier syntax that does essentially the same thing as the other answers is a dict comprehension:

>>> mydict = {'one': [1,2,3], 2: [4,5,6,7], 3: 8}

>>> dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in mydict.items() })

>>> dict_df

   one  2    3
0  1.0  4  8.0
1  2.0  5  NaN
2  3.0  6  NaN
3  NaN  7  NaN

A similar syntax exists for lists, too:

>>> mylist = [ [1,2,3], [4,5], 6 ]

>>> list_df = pd.DataFrame([ pd.Series(value) for value in mylist ])

>>> list_df

     0    1    2
0  1.0  2.0  3.0
1  4.0  5.0  NaN
2  6.0  NaN  NaN

Another syntax for lists is:

>>> mylist = [ [1,2,3], [4,5], 6 ]

>>> list_df = pd.DataFrame({ i:pd.Series(value) for i, value in enumerate(mylist) })

>>> list_df

   0    1    2
0  1  4.0  6.0
1  2  5.0  NaN
2  3  NaN  NaN

You may additionally have to transpose the result and/or change the column data types (float, integer, etc).
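For the data type part, one option (my suggestion, not something from the answers above) is pandas' nullable integer dtype "Int64", which can hold missing values without forcing the column to float. A sketch reusing the dictionary from the example above:

```python
import pandas as pd

# Same illustrative data as in the dict example above.
mydict = {"one": [1, 2, 3], 2: [4, 5, 6, 7], 3: 8}
dict_df = pd.DataFrame({key: pd.Series(value) for key, value in mydict.items()})

# Padded columns are float64 because NaN is a float.
# "Int64" (capital I) is the nullable integer dtype; missing cells become <NA>.
int_df = dict_df.astype("Int64")

print(int_df.dtypes)
```

This only works when every non-missing value is a whole number. For real-valued data, leaving the columns as float is usually the simpler choice.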

This does not directly answer the OP's question, but I found it to be an excellent solution for my case of unequal-length arrays, so I'd like to share it.

From the pandas documentation:

In [31]: d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
   ....:      'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
   ....: 

In [32]: df = pd.DataFrame(d)

In [33]: df
Out[33]: 
   one  two
a    1    1
b    2    2
c    3    3
d  NaN    4
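One thing worth knowing about this approach is that the Series are matched by index label, not by position. A small sketch with labels I made up so that the two indexes only partly overlap:

```python
import pandas as pd

# Hypothetical labels chosen so the two indexes only partly overlap.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"]),
}

# The row index is the union of the labels; gaps are filled with NaN.
df = pd.DataFrame(d)

print(df)
```

Here 'two' has no value for 'a' and 'one' has none for 'd', so both columns end up with one NaN even though they have the same length.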

Use `pandas.DataFrame` and `pandas.concat`

The following code uses a list comprehension to build one DataFrame per array from a dict of uneven arrays, then joins them column-wise with pandas.concat.
- This is a way to create a DataFrame from arrays that are not equal in length.
- For equal-length 1-D arrays, use df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})

import pandas as pd
import numpy as np


# create the uneven arrays
mu, sigma = 200, 25
np.random.seed(365)
x1 = mu + sigma * np.random.randn(10, 1)
x2 = mu + sigma * np.random.randn(15, 1)
x3 = mu + sigma * np.random.randn(20, 1)

data = {'x1': x1, 'x2': x2, 'x3': x3}

# create the dataframe
df = pd.concat([pd.DataFrame(v, columns=[k]) for k, v in data.items()], axis=1)
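If the inputs are 1-D rather than shaped (n, 1), a variant of the same idea is to concatenate named Series instead of single-column DataFrames. A minimal sketch, assuming made-up 1-D arrays:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-D inputs of different lengths.
data = {"x1": np.arange(3), "x2": np.arange(5)}

# Each Series carries its column name; concat aligns them on the default
# integer index and pads the shorter one with NaN.
df = pd.concat([pd.Series(v, name=k) for k, v in data.items()], axis=1)

print(df)
```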

Use `pandas.DataFrame` and `itertools.zip_longest`

For iterables of uneven length, zip_longest pads the shorter ones with the fillvalue (None by default).
The zip iterator is wrapped in list() because the DataFrame constructor won't unpack it.

from itertools import zip_longest

# flatten each (n, 1) array to 1-D so the cells hold scalars, then zip the values together
zl = list(zip_longest(*(v.ravel() for v in data.values())))

# create dataframe
df = pd.DataFrame(zl, columns=list(data.keys()))
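The padding behaviour is easier to see on a small input. A sketch with made-up lists, passing fillvalue=np.nan explicitly instead of relying on the default None:

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

# Hypothetical example data with entries of different lengths.
data = {"A": [1, 2], "B": [1, 2, 3, 4]}

# Each tuple is one row; the shorter list is padded with NaN.
rows = list(zip_longest(*data.values(), fillvalue=np.nan))

df = pd.DataFrame(rows, columns=list(data))

print(df)
```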

plot

df.plot(marker='o', figsize=[10, 5])

dataframe

           x1         x2         x3
0   232.06900  235.92577  173.19476
1   176.94349  209.26802  186.09590
2   194.18474  168.36006  194.36712
3   196.55705  238.79899  218.33316
4   249.25695  167.91326  191.62559
5   215.25377  214.85430  230.95119
6   232.68784  240.30358  196.72593
7   212.43409  201.15896  187.96484
8   188.97014  187.59007  164.78436
9   196.82937  252.67682  196.47132
10        NaN  223.32571  208.43823
11        NaN  209.50658  209.83761
12        NaN  215.27461  249.06087
13        NaN  210.52486  158.65781
14        NaN  193.53504  199.10456
15        NaN        NaN  186.19700
16        NaN        NaN  223.02479
17        NaN        NaN  185.68525
18        NaN        NaN  213.41414
19        NaN        NaN  271.75376


Tags: python, pandas
