
Updating value in iterrow for pandas

I am doing some geocoding work in which I used Selenium to screen-scrape the x-y coordinates for the address of each location. I imported an xls file into a pandas DataFrame and want to use an explicit loop to update the rows that do not have x-y coordinates, like below:

    for index, row in rche_df.iterrows():
        if isinstance(row.wgs1984_latitude, float):
            row = row.copy()
            target = row.address_chi
            dict_temp = geocoding(target)
            row.wgs1984_latitude = dict_temp['lat']
            row.wgs1984_longitude = dict_temp['long']

I have read Why doesn't this function "take" after I iterrows over a pandas DataFrame? and am fully aware that iterrows only gives us a copy rather than a view for editing, but what if I really need to update the values row by row? Is a lambda feasible?

lokheart asked Aug 25 '14 03:08



People also ask

How do you update a DataFrame in a for loop?

DataFrame.iterrows() yields each row as an (index, Series) tuple, and that Series is a copy of the row's contents, so updating it has no effect on the original DataFrame. To update the DataFrame, iterate over the rows with iterrows() and write each value back through at[] using the row's index.
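As a minimal sketch of the at[] write-back described above: the frame `df`, its column names, and the fill value are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data; None marks a row that still needs a value.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, None, 3.0]})

for index, row in df.iterrows():
    if pd.isna(row["score"]):
        # Write through the DataFrame itself, addressed by the row's index
        # label, rather than assigning to the row copy.
        df.at[index, "score"] = 0.0
```

at[] addresses a single cell by row label and column label, which is why it suits this one-value-at-a-time pattern.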

How do you update data in a DataFrame in Python?

The pandas DataFrame update() method updates a DataFrame in place with the non-NA values from another similar object (such as another DataFrame), aligned on index and columns. Note: this method does NOT return a new DataFrame. The updating is done to the original DataFrame.
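A small sketch of that in-place behavior, using made-up frames (`df`, `patch`) and values that are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical frame with one missing latitude at index label 1.
df = pd.DataFrame({"lat": [22.3, None, 22.5]}, index=[0, 1, 2])

# A second frame carrying the replacement value for index label 1.
patch = pd.DataFrame({"lat": [22.4]}, index=[1])

# update() aligns on index and columns and modifies df in place.
result = df.update(patch)
```

Here `result` is None, and the value at index label 1 of `df` now comes from `patch`.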


1 Answer

The rows you get back from iterrows are copies that are no longer connected to the original DataFrame, so edits to them don't change your DataFrame. Thankfully, because each item you get back from iterrows contains the current index, you can use that to access and edit the relevant row of the DataFrame:

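    for index, row in rche_df.iterrows():
        # Same check as in the question; presumably it selects rows whose
        # latitude is still missing (a float NaN).
        if isinstance(row.wgs1984_latitude, float):
            target = row.address_chi
            dict_temp = geocoding(target)
            # Write to the DataFrame itself via the index, not to the row copy.
            rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
            rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']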

In my experience, this approach seems slower than using an approach like apply or map, but as always, it's up to you to decide how to make the performance/ease of coding tradeoff.
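For comparison, here is a rough sketch of what the apply/map alternative mentioned above could look like. The `geocoding` stub and the sample rows are hypothetical stand-ins, and the sketch assumes missing coordinates are stored as NaN, which may differ from the real data.

```python
import pandas as pd


def geocoding(address):
    # Hypothetical stand-in for the asker's Selenium-based lookup.
    return {"lat": 22.28, "long": 114.16}


# Made-up sample rows; the second one has no coordinates yet.
rche_df = pd.DataFrame({
    "address_chi": ["A", "B"],
    "wgs1984_latitude": [22.30, None],
    "wgs1984_longitude": [114.17, None],
})

# Boolean mask of rows that still need geocoding (assumes NaN means missing).
missing = rche_df["wgs1984_latitude"].isna()

# Geocode only those addresses; the result is a Series of dicts.
coords = rche_df.loc[missing, "address_chi"].apply(geocoding)

# Assign back through .loc; pandas aligns the values on the index.
rche_df.loc[missing, "wgs1984_latitude"] = coords.map(lambda d: d["lat"])
rche_df.loc[missing, "wgs1984_longitude"] = coords.map(lambda d: d["long"])
```

Each address is still geocoded one at a time, so if the lookup itself dominates the runtime, the difference from the iterrows loop may be small.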

Marius answered Oct 16 '22 06:10
