Pandas merge and update with conditions and without renaming the column

Tags:

pandas

Pandas 1.0.5

I have a transaction file that I would like to enhance with latitudes and longitudes.

If a transaction has a zipcode, I would like to use that zipcode to look up its latitude and longitude and add them to the file.

If a transaction has a city/state but no zipcode, I would like to use the city/state to look up its latitude and longitude and fill them into the same columns. The city/state lookup should apply only when there is no zipcode.

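The first problem with the code below is that the second merge adds "_x" and "_y" suffixes to the latitude and longitude column names instead of filling the existing columns. The second problem is that the city lookup also runs for rows that already have a zipcode (such as the CLAYTON row), so it would overwrite the zipcode lookup.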
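The suffixes come from `pd.merge` itself: when both frames share a non-key column name, the overlapping columns are renamed using the `suffixes` parameter, which defaults to `("_x", "_y")`. A minimal sketch with hypothetical toy frames (not the question's data) shows the default behaviour and how an empty left suffix keeps the left column name unchanged:

```python
import pandas as pd

# Hypothetical toy frames: both have a non-key column called "latitude".
left = pd.DataFrame({"city": ["CONCORD"], "latitude": [None]})
right = pd.DataFrame({"city": ["CONCORD"], "latitude": ["37.9780"]})

# Default suffixes=("_x", "_y"): both copies are kept and both are renamed.
default = pd.merge(left, right, how="left", on="city")
print(list(default.columns))  # ['city', 'latitude_x', 'latitude_y']

# An empty left suffix keeps the left name as-is; only the right copy is tagged.
named = pd.merge(left, right, how="left", on="city", suffixes=("", "_city"))
print(list(named.columns))  # ['city', 'latitude', 'latitude_city']
```

The merge never fills an existing column in place, so some follow-up step (a mask plus assignment, or merging only a subset of rows) is still needed to combine the two lookups.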

import pandas as pd
import numpy as np

#The transaction file
data = [
        ['MCDONALDS RESTAURANT STORE 100', '94521', '', ''],
        ['MCDONALDS RESTAURANT STORE 200', '94521', 'CLAYTON', 'CA'],  #zipcode is present so do not lookup with city
        ['BURGER KING RESTAURANT STORE 100', '', 'CONCORD', 'CA'],
        ['BURGER KING RESTAURANT STORE 200', '', 'CONCORD', 'CA'],
        ['TACO BELL RESTAURANT STORE 100', '', '', ''],
        ]
t = pd.DataFrame(data, columns = ['merchant', 'zipcode', 'city', 'state'])

#Step 1. Use zipcodes to lookup latitudes
data = [
        ['94521', '37.9780', '-121.0311'],
        ['94522', '40.1234', '-200.1234'],
        ]
z = pd.DataFrame(data, columns = ['zipcode', 'latitude', 'longitude'])

t = pd.merge(t, z[['zipcode', 'latitude', 'longitude']], how='left', on='zipcode') #works perfectly

#Step 2. Use city/states to lookup latitudes, if there was no zipcode
data = [
        ['CA', 'CONCORD', '37.9780', '-121.0311'],
        ['CA', 'CLAYTON', '40.1234', '-200.1234'],
        ]
c = pd.DataFrame(data, columns = ['state', 'city', 'latitude', 'longitude'])

t = pd.merge(t, c[['state', 'city', 'latitude', 'longitude']], how='left', on=['state', 'city']) #this line is the problem
davidjhp asked Oct 15 '25 01:10



1 Answer

Not very elegant, but you can do the second merge only on the remaining rows (where lat/lon is NA) and then concat both parts:

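m = t.latitude.notna()  # rows already located via zipcode
# merge city/state only into the other rows (no suffixes needed), then stack both parts
t = pd.concat([t.loc[m],
               pd.merge(t.loc[~m, ['merchant', 'zipcode', 'city', 'state']],
                        c[['state', 'city', 'latitude', 'longitude']], how='left', on=['state', 'city'])])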

Result:

                           merchant zipcode     city state latitude  longitude
0    MCDONALDS RESTAURANT STORE 100   94521                  37.978  -121.0311
1    MCDONALDS RESTAURANT STORE 200   94521  CLAYTON    CA   37.978  -121.0311
0  BURGER KING RESTAURANT STORE 100          CONCORD    CA   37.978  -121.0311
1  BURGER KING RESTAURANT STORE 200          CONCORD    CA   37.978  -121.0311
2    TACO BELL RESTAURANT STORE 100                             NaN        NaN
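An alternative that avoids the split and concat, sketched under the assumption that the lookup tables `z` and `c` have at most one row per key (as in the sample data): merge the city/state table into every row with `suffixes=("", "_city")` so the existing `latitude`/`longitude` names are kept, copy the city values only into rows whose zipcode is empty, then drop the helper columns. Note that this keys on whether a zipcode was supplied, as the question asks, while the answer above keys on whether the zip lookup found coordinates; the two differ only for a zipcode that is missing from `z`.

```python
import pandas as pd

# Same sample data as in the question.
t = pd.DataFrame(
    [
        ["MCDONALDS RESTAURANT STORE 100", "94521", "", ""],
        ["MCDONALDS RESTAURANT STORE 200", "94521", "CLAYTON", "CA"],
        ["BURGER KING RESTAURANT STORE 100", "", "CONCORD", "CA"],
        ["BURGER KING RESTAURANT STORE 200", "", "CONCORD", "CA"],
        ["TACO BELL RESTAURANT STORE 100", "", "", ""],
    ],
    columns=["merchant", "zipcode", "city", "state"],
)
z = pd.DataFrame(
    [["94521", "37.9780", "-121.0311"], ["94522", "40.1234", "-200.1234"]],
    columns=["zipcode", "latitude", "longitude"],
)
c = pd.DataFrame(
    [["CA", "CONCORD", "37.9780", "-121.0311"], ["CA", "CLAYTON", "40.1234", "-200.1234"]],
    columns=["state", "city", "latitude", "longitude"],
)

# Step 1: zipcode lookup, as in the question.
t = pd.merge(t, z, how="left", on="zipcode")

# Step 2: city/state lookup; the empty left suffix keeps the step-1 names,
# and the new columns arrive as latitude_city / longitude_city.
t = pd.merge(t, c, how="left", on=["state", "city"], suffixes=("", "_city"))

# Use the city/state values only where no zipcode was supplied.
no_zip = t["zipcode"] == ""
for col in ["latitude", "longitude"]:
    t[col] = t[col].where(~no_zip, t[col + "_city"])

# Drop the helper columns so only the original names remain.
t = t.drop(columns=["latitude_city", "longitude_city"])
print(t)
```

Because both steps are left merges against lookup tables with unique keys, the row order is preserved and the result has a fresh 0 to 4 index, so the repeated index labels seen in the concat result do not appear.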
Stef answered Oct 20 '25 16:10
