Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas join on 2 columns

I'm having some trouble getting these two dfs to join the way I would like. The first df has a hierarchical index that I created using df1 = df3.groupby(["STATE_PROV_CODE", "COUNTY"]).size() to get the count for each county.

STATE_PROV_CODE  COUNTY            COUNT
AL               Autauga County      1
                 Baldwin County      1
                 Barbour County      1
                 Bibb County         1
                 Blount County       1

    STATE_PROV_CODE COUNTY  ANSI Cl FIPS
0   AL  Autauga County  H1  01001
1   AL  Baldwin County  H1  01003
2   AL  Barbour County  H1  01005
3   AL  Bibb County     H1  01007
4   AL  Blount County   H1  01009

In SQL I would like to do the following:

SELECT STATE_PROV_CODE, COUNTY, FIPS, COUNT,
FROM df1, df2
ON STATE_PROV_CODE, COUNTY
WHERE df1.STATE_PROV_CODE = df2.STATE_PROV_CODE
AND df1.COUNTY = df2.COUNTY

I would like the result to be as follows:

STATE_PROV_CODE  COUNTY            COUNT    FIPS
AL               Autauga County      1     01001
                 Baldwin County      1     01003
                 Barbour County      1     01005
                 Bibb County         1     01007
                 Blount County       1     01009
like image 429
jlmitch Avatar asked Jul 29 '14 15:07

jlmitch


1 Answers

I believe the way you have set up the groupby results and the second dataframe, this merge call will work:

df = pd.merge(df1, df2, left_index=True, right_on=['STATE_PROV_CODE', 'COUNTY'])

it will unravel the MultiIndex; however, if you want it back, all you have to do is

df = df.set_index(['STATE_PROV_CODE', 'COUNTY'])
like image 110
Ajean Avatar answered Nov 03 '22 23:11

Ajean