Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas lookup from one of multiple columns, based on value

I have the following DataFrame:

Date    best    a    b    c    d
1990    a       5    4    7    2
1991    c       10   1    2    0
1992    d       2    1    4    12
1993    a       5    8    11   6

I would like to make a dataframe as follows:

Date    best    value   
1990    a       5
1991    c       2
1992    d       12
1993    a       5

So I am looking to find a value based on another row value by using column names. For instance, the value for 1990 in the second df should lookup "a" from the first df and the second row should lookup "c" (=2) from the first df.

Any ideas?

like image 205
AtotheSiv Avatar asked Sep 19 '14 13:09

AtotheSiv


3 Answers

There is a built in lookup function that can handle this type of situation (looks up by row/column). I don't know how optimized it is, but may be faster than the apply solution.

In [9]: df['value'] = df.lookup(df.index, df['best'])

In [10]: df
Out[10]: 
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5
like image 165
chrisb Avatar answered Oct 03 '22 00:10

chrisb


You create a lookup function and call apply on your dataframe row-wise, this isn't very efficient for large dfs though

In [245]:

def lookup(x):
    return x[x.best]
df['value'] = df.apply(lambda row: lookup(row), axis=1)
df
Out[245]:
   Date best   a  b   c   d  value
0  1990    a   5  4   7   2      5
1  1991    c  10  1   2   0      2
2  1992    d   2  1   4  12     12
3  1993    a   5  8  11   6      5
like image 29
EdChum Avatar answered Oct 03 '22 01:10

EdChum


You can do this using np.where like below. I think it will be more efficient

import numpy as np
import pandas as pd

df = pd.DataFrame([['1990', 'a', 5, 4, 7, 2], ['1991', 'c', 10, 1, 2, 0], ['1992', 'd', 2, 1, 4, 12], ['1993', 'a', 5, 8, 11, 6]], columns=('Date', 'best', 'a', 'b', 'c', 'd'))
arr = df.best.values

cols = df.columns[2:]
for col in cols:
    arr2 = df[col].values
    arr = np.where(arr==col, arr2, arr)

df.drop(columns=cols, inplace=True)
df["values"] = arr
df

Result

Date    best    values
0   1990    a   5
1   1991    c   2
2   1992    d   12
3   1993    a   5
like image 29
Dev Khadka Avatar answered Oct 03 '22 00:10

Dev Khadka