Python - pandas - Append Series into Blank DataFrame

Say I have two pandas Series in python:

import pandas as pd
h = pd.Series(['g',4,2,1,1])
g = pd.Series([1,6,5,4,"abc"])

I can create a DataFrame with just h and then append g to it:

df = pd.DataFrame([h])
df1 = df.append(g, ignore_index=True)

I get:

>>> df1
   0  1  2  3    4
0  g  4  2  1    1
1  1  6  5  4  abc

But now suppose that I have an empty DataFrame and I try to append h to it:

df2 = pd.DataFrame([])
df3 = df2.append(h, ignore_index=True)

This does not work. I think the problem is in the second-to-last line of code. I need to somehow define the blank DataFrame to have the proper number of columns.

For context: I am scraping text from the internet with requests and BeautifulSoup, processing it, and trying to write it to a DataFrame one row at a time.

Asked by bill999 on May 31 '14 at 21:05


1 Answer

If you call the DataFrame constructor with no arguments, instead of passing it an empty list, the append works:

In [16]:

df = pd.DataFrame()
h = pd.Series(['g',4,2,1,1])
df = df.append(h,ignore_index=True)
df
Out[16]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]

The difference between the two constructor calls appears to be the dtype of the resulting index: with no arguments it is object, while with an empty list it is int64:

In [21]:

df = pd.DataFrame()
print(df.index.dtype)
df = pd.DataFrame([])
print(df.index.dtype)
object
int64

It is unclear to me why this difference should affect the behaviour (I'm guessing here).

UPDATE

After revisiting this, it looks to me like a bug in pandas version 0.12.0, because your original code works fine on my setup:

In [13]:

import pandas as pd
df = pd.DataFrame([])
h = pd.Series(['g',4,2,1,1])
df.append(h,ignore_index=True)

Out[13]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]

I am running pandas 0.13.1 and numpy 1.8.1 (64-bit) on python 3.3.5.0. I think the problem is in pandas, but I would upgrade both pandas and numpy to be safe. I don't think this is a 32-bit versus 64-bit python issue.
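For readers on newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the snippets above only run on older versions. For the row-at-a-time scraping case described in the question, one possible alternative is to collect the rows in a plain Python list and build the DataFrame once at the end. This is a minimal sketch, not a drop-in replacement; the helper name rows_to_frame and the loop standing in for the scraper are assumptions made for illustration.

```python
import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from an iterable of row-like Series in one step.

    Hypothetical helper: collecting first and constructing once avoids
    growing a DataFrame row by row.
    """
    return pd.DataFrame(list(rows)).reset_index(drop=True)


h = pd.Series(['g', 4, 2, 1, 1])
g = pd.Series([1, 6, 5, 4, "abc"])

rows = []                 # plain list, no column count needed up front
for s in (h, g):          # stands in for the scraping loop (assumption)
    rows.append(s)

df = rows_to_frame(rows)  # each Series becomes one row
print(df)
```

Because the list starts empty and needs no schema, the question of how to define a blank DataFrame with the right number of columns does not arise with this approach.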

Answered by EdChum on Sep 22 '22 at 03:09