Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas: combine two columns in a DataFrame

Tags:

I have a pandas DataFrame that has multiple columns in it:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51 Data columns: foo                   11516  non-null values bar                   228381  non-null values Time_UTC              239897  non-null values dtstamp               239897  non-null values dtypes: float64(4), object(1) 

where foo and bar are columns which contain the same data yet are named differently. Is there are a way to move the rows which make up foo into bar, ideally whilst maintaining the name of bar?

In the end the DataFrame should appear as:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51 Data columns: bar                   239897  non-null values Time_UTC              239897  non-null values dtstamp               239897  non-null values dtypes: float64(4), object(1) 

That is the NaN values that made up bar were replaced by the values from foo.

like image 609
BFTM Avatar asked Jun 10 '12 21:06

BFTM


1 Answers

you can use directly fillna and assigning the result to the column 'bar'

df['bar'].fillna(df['foo'], inplace=True) del df['foo'] 

general example:

import pandas as pd #creating the table with two missing values df1 = pd.DataFrame({'a':[1,2],'b':[3,4]}, index = [1,2]) df2 = pd.DataFrame({'b':[5,6]}, index = [3,4]) dftot = pd.concat((df1, df2)) print dftot #creating the dataframe to fill the missing values filldf = pd.DataFrame({'a':[7,7,7,7]})  #filling  print dftot.fillna(filldf) 
like image 160
user1883737 Avatar answered Sep 25 '22 23:09

user1883737