Add new rows to a pandas dataframe

Tags: python, pandas, python-2.7

I have two dataframes, df1 and df2, that were computed from the same source but with different methods, so most of the values are the same, with some differences. Now I want to update df1 based on the values in df2.

For example:

import pandas as pd
df1 = pd.DataFrame({'name':['john','deb','john','deb'], 'col1':[490,500,425,678], 'col2':[456,625,578,789],'col3':['TN','OK','OK','NY']})
 name  col1  col2 col3
 john   490   456   TN
  deb   500   625   OK
 john   425   578   OK
  deb   678   789   NY

df2 = pd.DataFrame({'name':['deb','john','deb','john','deb'], 'col1':[400,490,500,425,678], 'col2':[225,456,625,578,789],'col3':['TN','TN','OK','OK','NY']})
 name  col1  col2 col3
  deb   400   225   TN
 john   490   456   TN
  deb   500   625   OK
 john   425   578   OK
  deb   678   789   NY

So, in this case, .append should append only the first row of df2 to df1. In other words, a row from df2 should be added only if it is not already present in df1 (judged by name and col3); otherwise df1 should be left unchanged.

This almost seems like something that concat should do.

asked Mar 25 '14 23:03

msakya

1 Answer

There are two ways of achieving your result:

Concat both dataframes, then drop duplicates
Using an outer join/merge, then drop duplicates

I will show you both.

Concat then Drop

Of the two, this should be the more CPU-friendly option.

df3 = pd.concat([df1,df2])
df3.drop_duplicates(subset=['name', 'col3'], inplace=True, keep='last')

This method is possibly more memory-intensive than an outer join, because at one point you are holding df1, df2 and the concatenation of both (df3, before duplicates are dropped) in memory.
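As a rough end-to-end sketch of the concat approach, here it is on the sample frames from the question. The names stacked and df3 are only illustrative, and ignore_index=True is an addition that is not in the snippet above; it just gives the stacked frame a fresh index.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    'name': ['john', 'deb', 'john', 'deb'],
    'col1': [490, 500, 425, 678],
    'col2': [456, 625, 578, 789],
    'col3': ['TN', 'OK', 'OK', 'NY'],
})
df2 = pd.DataFrame({
    'name': ['deb', 'john', 'deb', 'john', 'deb'],
    'col1': [400, 490, 500, 425, 678],
    'col2': [225, 456, 625, 578, 789],
    'col3': ['TN', 'TN', 'OK', 'OK', 'NY'],
})

# Stack df1 on top of df2; every row of both frames is present here.
stacked = pd.concat([df1, df2], ignore_index=True)

# Keep one row per (name, col3) key. Because df2 was stacked last,
# keep='last' retains the df2 copy of any key found in both frames.
df3 = stacked.drop_duplicates(subset=['name', 'col3'], keep='last')

print(len(stacked), len(df3))
print(df3)
```

If df1 should win whenever a key exists in both frames, keep='first' is probably the better fit for the same stacking order; on this sample data both choices give the same five keys.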

Outer join then Drop

This should be more memory friendly.

df3 = df1.merge(df2, on=list(df1), how='outer')
df3.drop_duplicates(subset=['name', 'col3'], inplace=True, keep='last')

An outer join on all columns keeps every distinct row from both dataframes, but rows that are identical in df1 and df2 appear only once, so df3 is smaller before dropping duplicates than it is when we use concat.
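To make the size difference concrete, the sketch below runs the outer merge on the question's sample frames and compares the intermediate row count with a plain concat. The names stacked, merged and df3 are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    'name': ['john', 'deb', 'john', 'deb'],
    'col1': [490, 500, 425, 678],
    'col2': [456, 625, 578, 789],
    'col3': ['TN', 'OK', 'OK', 'NY'],
})
df2 = pd.DataFrame({
    'name': ['deb', 'john', 'deb', 'john', 'deb'],
    'col1': [400, 490, 500, 425, 678],
    'col2': [225, 456, 625, 578, 789],
    'col3': ['TN', 'TN', 'OK', 'OK', 'NY'],
})

# For comparison only: concat keeps every row of both frames.
stacked = pd.concat([df1, df2])

# Outer join on all columns: rows identical in both frames are matched
# and appear once; rows found in only one frame are kept as well.
merged = df1.merge(df2, on=list(df1.columns), how='outer')

# Reduce to one row per (name, col3) key.
df3 = merged.drop_duplicates(subset=['name', 'col3'], keep='last')

print(len(stacked), len(merged), len(df3))
```

One caveat, as far as I understand the merge semantics: the row order of an outer merge follows the join keys rather than "df1 first, then df2", so when two rows share name and col3 but differ in col1 or col2, keep='last' does not necessarily select the df2 version. On this sample data there is no such conflict.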

Note for pandas 0.15 and older:

In those versions the keyword keep='last' was spelled take_last=True.

answered Oct 23 '22 20:10

firelynx
