Setting columns for an empty pandas dataframe

Tags: python, pandas

This is something that I'm confused about...

import pandas as pd

# this works fine
df1 = pd.DataFrame(columns=['A','B'])

# but let's say I have this
df2 = pd.DataFrame([])

# this doesn't work!
df2.columns = ['A','B']
# ValueError: Length mismatch: Expected axis has 0 elements, new values have 2 elements

Why doesn't this work? What can I do instead? Is something like this the only way?

if len(df2.index) == 0:
    df2 = pd.DataFrame(columns=['A','B'])
else:
    df2.columns = ['A','B']

There must be a more elegant way.

Thank you for your help!

Update 4/19/2015

Someone asked why I would do this at all:

df2 = pd.DataFrame([])

The reason is that actually I'm doing something like this:

df2 = pd.DataFrame(data)

... where data could be an empty list of lists, but in most cases it is not. So yes, I could do:

if len(data) > 0:
    df2 = pd.DataFrame(data, columns=['A','B'])
else:
    df2 = pd.DataFrame(columns=['A','B'])

... but this doesn't seem very DRY (and certainly not concise).

Let me know if you have any questions. Thanks!

JPN asked Apr 19 '15 13:04


2 Answers

This looks like a bug in pandas. All of these work:

pd.DataFrame(columns=['A', 'B'])
pd.DataFrame({}, columns=['A', 'B'])
pd.DataFrame(None, columns=['A', 'B'])

but not this:

pd.DataFrame([], columns=['A', 'B'])

Until it's fixed, I suggest something like this:

if len(data) == 0: data = None
df2 = pd.DataFrame(data, columns=['A','B'])

or:

df2 = pd.DataFrame(data if len(data) > 0 else None, columns=['A', 'B'])
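
The same workaround can be folded into a small function so the check lives in one place. This is only a sketch: make_frame is a hypothetical name, not part of pandas, and it covers only the empty-list case (an input such as [[]] would still raise).

```python
import pandas as pd

def make_frame(data, columns):
    """Build a DataFrame, treating an empty list as 'no data'.

    Hypothetical helper; the name and signature are illustrative only.
    """
    # Passing None instead of [] sidesteps the failure on affected versions.
    return pd.DataFrame(data if len(data) > 0 else None, columns=columns)

df_empty = make_frame([], ['A', 'B'])
df_full = make_frame([[1, 2], [3, 4]], ['A', 'B'])

print(list(df_empty.columns), len(df_empty))
print(df_full.shape)
```

Callers then write make_frame(data, ['A', 'B']) without repeating the emptiness test.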
Evan Wright answered Oct 10 '22 01:10


Update: as of Pandas version 0.16.1, passing data = [] works:

In [85]: df = pd.DataFrame([], columns=['a', 'b', 'c'])

In [86]: df
Out[86]: 
Empty DataFrame
Columns: [a, b, c]
Index: []

so the best solution is to update your version of Pandas.


If data is an empty list of lists, then

data = [[]]

But then len(data) would equal 1, so len(data) > 0 is not the right condition to check to see if data is an empty list of lists.
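
To make the difference concrete, here is a small sketch in plain Python. The helper has_cells is a hypothetical name introduced for illustration; it assumes data is a list of lists and reports whether any row actually contains a value.

```python
no_rows = []          # no rows at all
one_empty_row = [[]]  # one row that holds zero fields

# len() counts rows, not cells, so the two cases differ.
print(len(no_rows))
print(len(one_empty_row))

def has_cells(data):
    """Return True if at least one row holds at least one value.

    Hypothetical helper, assuming data is a list of lists.
    """
    return any(len(row) > 0 for row in data)

print(has_cells(no_rows), has_cells(one_empty_row), has_cells([[1, 2]]))
```

A check like this distinguishes "nothing to load" from "rows present", which len(data) > 0 alone does not.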

There are a number of values for data which could make

pd.DataFrame(data, columns=['A','B'])

raise an exception. Depending on the pandas version, an AssertionError or ValueError is raised if data equals [] (no data), [[]] (no columns), [[0]] (one column) or [[0,1,2]] (too many columns). Rather than checking for each of these cases, I think it is safer and easier to use try..except here:

columns = ['A', 'B']
try:
    df2 = pd.DataFrame(data, columns=columns)
except (AssertionError, ValueError):
    df2 = pd.DataFrame(columns=columns)

It would be nice if there were a DRY-er way to write this, but given that it's the caller's responsibility to check for this, I don't see a better one.
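
If the pattern is needed in several places, one option is to wrap the try..except in a function so each call site stays a single line. The sketch below assumes a hypothetical helper named frame_or_empty (not a pandas function). Note that it falls back to an empty frame for any input pandas rejects with these two exceptions, including rows of the wrong width, which may or may not be what you want.

```python
import pandas as pd

def frame_or_empty(data, columns):
    """Return DataFrame(data, columns=columns), or an empty frame
    with the same columns if pandas rejects the data.

    Hypothetical helper; the name is illustrative only.
    """
    try:
        return pd.DataFrame(data, columns=columns)
    except (AssertionError, ValueError):
        # The data did not fit the requested columns: keep only the header.
        return pd.DataFrame(columns=columns)

columns = ['A', 'B']
ok = frame_or_empty([[1, 2], [3, 4]], columns)
empty = frame_or_empty([], columns)
too_wide = frame_or_empty([[0, 1, 2]], columns)

print(ok.shape)
print(list(empty.columns), len(empty))
print(list(too_wide.columns), len(too_wide))
```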

unutbu answered Oct 10 '22 02:10