
Python Pandas Match Vlookup columns based on header values

I have the following dataframe df:

Customer_ID   2015   2016   2017   Year_joined_mailing
ABC           5      6      10     2015
BCD           6      7      3      2016
DEF           10     4      5      2017
GHI           8      7      10     2016

For each customer, I would like to look up the value in the column for the year they joined the mailing list and save it in a new column.

Output would be:

Customer_ID   2015   2016   2017   Year_joined_mailing   Purchases_1st_year
ABC           5      6      10     2015                  5
BCD           6      7      3      2016                  7
DEF           10     4      5      2017                  5
GHI           8      7      10     2016                  7

I have found some solutions for MATCH/VLOOKUP-style lookups in Python, but none that use the values of one column to pick the headers of other columns.

jeangelj asked Jul 19 '17 17:07


3 Answers

I would do it like this, assuming that the column headers and the Year_joined_mailing values have the same data type and that every Year_joined_mailing value is a valid column name. If the data types differ, convert one side with str() or int() where appropriate.

df['Purchases_1st_year'] = [df[df['Year_joined_mailing'][i]][i] for i in df.index]

Here we iterate over the index of the dataframe, read the 'Year_joined_mailing' value for each row, use it to select the matching column, and then pick that row's entry from the column. The results are collected in a list and assigned to the new column 'Purchases_1st_year'.
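
For reference, a minimal runnable sketch of the same idea. It assumes the year columns are labelled with integers and that Customer_ID is an ordinary column; the sample frame is rebuilt from the question, and .loc is swapped in for the chained indexing (my substitution, not part of the original answer):

```python
import pandas as pd

# Sample data copied from the question; the year column labels are ints here.
df = pd.DataFrame({
    "Customer_ID": ["ABC", "BCD", "DEF", "GHI"],
    2015: [5, 6, 10, 8],
    2016: [6, 7, 4, 7],
    2017: [10, 3, 5, 10],
    "Year_joined_mailing": [2015, 2016, 2017, 2016],
})

# For each row label i, read the join year, then take that column's value
# in the same row.
df["Purchases_1st_year"] = [
    df.loc[i, df.loc[i, "Year_joined_mailing"]] for i in df.index
]

print(df)
```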

If a 'Year_joined_mailing' value will not always be a valid column name, then try:

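from numpy import nan

new_col = []
for i in df.index:
    try:
        # Column named after this row's join year, then this row's entry
        new_col.append(df[df['Year_joined_mailing'][i]][i])
    except KeyError:
        # A missing column raises KeyError, not IndexError
        new_col.append(nan)  # or whatever null value you want here
df['Purchases_1st_year'] = new_col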

This longer snippet does the same thing, but it will not break if a 'Year_joined_mailing' value is not in df.columns; those rows get a null value instead.

Jeremy Barnes answered Nov 11 '22 15:11


You can use apply on each row:

df.apply(lambda x: x[x['Year_joined_mailing']],axis=1)
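
To keep the result, assign it to a new column. A small sketch, assuming integer year labels as in the question; Series.get is my addition here, so that a join year with no matching column yields NaN instead of raising, and the 2018 value is made up to show that case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": ["ABC", "BCD", "DEF", "GHI"],
    2015: [5, 6, 10, 8],
    2016: [6, 7, 4, 7],
    2017: [10, 3, 5, 10],
    # 2018 is a hypothetical value with no matching column
    "Year_joined_mailing": [2015, 2016, 2017, 2018],
})

# Each row is passed as a Series whose index holds the column labels,
# so the join year can be used directly as a key into the row.
df["Purchases_1st_year"] = df.apply(
    lambda row: row.get(row["Year_joined_mailing"], np.nan), axis=1
)

print(df)
```

Note that apply with axis=1 runs a Python function per row, so it is convenient rather than fast on large frames.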

galaxyan answered Nov 11 '22 15:11


Deprecation Notice: DataFrame.lookup was deprecated in pandas v1.2.0

Use pd.DataFrame.lookup
Keep in mind that I'm assuming Customer_ID is the index.

df.lookup(df.index, df.Year_joined_mailing)

array([5, 7, 5, 7])

df.assign(
    Purchases_1st_year=df.lookup(df.index, df.Year_joined_mailing)
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7

However, be careful when the column names are strings and the values in the Year_joined_mailing column are integers (or the other way round): the labels will not match.

Nuclear option: convert both sides to strings so that the comparison is between values of the same type.

df.assign(
    Purchases_1st_year=df.rename(columns=str).lookup(
        df.index, df.Year_joined_mailing.astype(str)
    )
)

             2015  2016  2017  Year_joined_mailing  Purchases_1st_year
Customer_ID                                                           
ABC             5     6    10                 2015                   5
BCD             6     7     3                 2016                   7
DEF            10     4     5                 2017                   5
GHI             8     7    10                 2016                   7
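
Because lookup is deprecated and has been removed from newer pandas releases, here is a sketch of one possible replacement built on Index.get_indexer and NumPy integer indexing. It is not the original answer's code. It assumes Customer_ID is the index, that the column labels are unique, and it compares labels as strings in the spirit of the "nuclear option" above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# Compare labels as strings so that "2015" and 2015 are treated alike.
col_labels = df.columns.astype(str)
col_pos = col_labels.get_indexer(df["Year_joined_mailing"].astype(str))

# get_indexer returns -1 for labels that are not found.
if (col_pos == -1).any():
    raise KeyError("some Year_joined_mailing values have no matching column")

# Pick one cell per row: row k, column col_pos[k].
values = df.to_numpy()[np.arange(len(df)), col_pos]
df = df.assign(Purchases_1st_year=values)

print(df)
```

The row/column pairing is done positionally on the underlying array, which is what makes this a vectorised stand-in for lookup.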

piRSquared answered Nov 11 '22 15:11