
Include empty series when creating a pandas dataframe with .concat

UPDATE: This is no longer an issue since at least pandas version 0.18.1. Concatenating empty series doesn't drop them anymore so this question is out of date.
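A quick way to check the updated behaviour on your own install is sketched below. It assumes a reasonably recent pandas. The explicit `dtype=object` is an addition that is not in the original question; it is there only because some versions warn about the default dtype of an empty Series.

```python
import pandas as pd

# Empty series; the explicit dtype is an assumption added for this sketch
empty = pd.Series(dtype=object)
letters = pd.Series(['a', 'b', 'c'])

# Concatenate column-wise, as in the question
df = pd.concat([empty, letters], axis=1)

# If the empty series is kept, there is one column per input series
print(df.shape)
```

If the shape has two columns, the empty series was kept and the workaround in the answer is not needed.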

I want to create a pandas DataFrame from a list of Series using pd.concat. The problem is that when one of the Series is empty, it doesn't get included in the resulting DataFrame. The DataFrame then has the wrong dimensions when I try to rename its columns with a MultiIndex. UPDATE: Here's an example:

import pandas as pd

sers1 = pd.Series()
sers2 = pd.Series(['a', 'b', 'c'])
df1 = pd.concat([sers1, sers2], axis=1)

This produces the following dataframe:

>>> df1
0    a
1    b
2    c
dtype: object

But I want it to produce something like this:

>>> df2
    0  1
0 NaN  a
1 NaN  b
2 NaN  c

It does this if I put a single NaN value anywhere in sers1, but it seems like this should be possible automatically even if some of my series are totally empty.

Alex asked May 28 '15 23:05



1 Answer

Passing an argument for levels will do the trick. Here's an example. First, the wrong way:

import pandas as pd
ser1 = pd.Series()
ser2 = pd.Series([1, 2, 3])
list_of_series = [ser1, ser2, ser1]
df = pd.concat(list_of_series, axis=1)

Which produces this:

>>> df
   0
0  1
1  2
2  3

But if we add some labels to the levels argument, it will include all the empty series too:

import pandas as pd
ser1 = pd.Series()
ser2 = pd.Series([1, 2, 3])
list_of_series = [ser1, ser2, ser1]
labels = range(len(list_of_series))
df = pd.concat(list_of_series, levels=labels, axis=1)

Which produces the desired dataframe:

>>> df
    0  1   2
0 NaN  1 NaN
1 NaN  2 NaN
2 NaN  3 NaN
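
As an alternative that does not rely on how concat treats empty series, the DataFrame can be built from a dict keyed by position. This is only a sketch and not part of the original answer: the explicit `dtype=float` and the two-level column labels are hypothetical, chosen just to show that the column count matches the number of series, so a MultiIndex rename fits.

```python
import pandas as pd

ser1 = pd.Series(dtype=float)  # empty; explicit dtype is an assumption
ser2 = pd.Series([1, 2, 3])
list_of_series = [ser1, ser2, ser1]

# One column per series, keyed by its position in the list.
# Series are aligned on the union of their indexes, so an empty
# series becomes an all-NaN column instead of being dropped.
df = pd.DataFrame(dict(enumerate(list_of_series)))

# Hypothetical two-level labels, one tuple per input series
df.columns = pd.MultiIndex.from_tuples([('x', 'a'), ('x', 'b'), ('y', 'a')])
```

Because every input series gets its own dict key, the number of columns always equals `len(list_of_series)`.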

Alex answered Oct 03 '22 08:10