Create a pandas column based on a lookup value from another dataframe

I have a pandas lookup dataframe that holds one value per hour of the day, with the hour as its index. It looks like this:

In [1] print (df_lookup) 
Out[1] 0     1.109248
       1     1.102435
       2     1.085014
       3     1.073487
       4     1.079385
       5     1.088759
       6     1.044708
       7     0.902482
       8     0.852348
       9     0.995912
       10    1.031643
       11    1.023458
       12    1.006961
       ...
       23    0.889541

I want to use the values from this lookup dataframe as multipliers to fill a column of another dataframe, which has a datetime index. That dataframe looks like this:

In [2] print (df)
Out[2] 
Date_Label           ID  data-1  data-2    data-3
2015-08-09 00:00:00  1   2513.0    2502     NaN  
2015-08-09 00:00:00  1   2113.0    2102     NaN  
2015-08-09 01:00:00  2   2006.0    1988     NaN  
2015-08-09 02:00:00  3   2016.0    2003     NaN 
...
2018-07-19 23:00:00  33  3216.0    333      NaN  

I want to calculate the data-3 column from the data-2 column, weighting each data-2 value by the df_lookup entry for that row's hour. Looping over the index as follows gives the desired values, but it is too slow:

for idx in df.index:
   df.loc[idx,'data-3'] = df.loc[idx, 'data-2']*df_lookup.at[idx.hour]

Is there a faster way someone could suggest?

Vakratund asked Jan 04 '19 22:01




2 Answers

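Use .loc to look up the multiplier for each row's hour, then multiply. The .values call strips the lookup's hour index, so the product is taken row by row instead of being aligned on index labels: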

df['data-2']*df_lookup.loc[df.index.hour].values
Out[275]: 
Date_Label
2015-08-09 00:00:00    2775.338496
2015-08-09 00:00:00    2331.639296
2015-08-09 01:00:00    2191.640780
2015-08-09 02:00:00    2173.283042
Name: data-2, dtype: float64
# Assign the result to create the new column:
df['data-3'] = df['data-2']*df_lookup.loc[df.index.hour].values
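For anyone who wants to try this without the full dataset, here is a minimal self-contained sketch of the same lookup. The sample values are copied from the first rows of the question. Treating df_lookup as a Series that only covers hours 0 to 2 is an assumption made for illustration, and .to_numpy() is used here in place of .values.

```python
import pandas as pd

# Assumed sample data: a Series indexed by hour, truncated to hours 0-2.
df_lookup = pd.Series({0: 1.109248, 1: 1.102435, 2: 1.085014})

df = pd.DataFrame(
    {"ID": [1, 1, 2, 3], "data-2": [2502, 2102, 1988, 2003]},
    index=pd.to_datetime(
        [
            "2015-08-09 00:00:00",
            "2015-08-09 00:00:00",
            "2015-08-09 01:00:00",
            "2015-08-09 02:00:00",
        ]
    ),
)

# Pick one multiplier per row by its hour. Duplicate hours are fine:
# .loc returns one entry for every label requested.
multipliers = df_lookup.loc[df.index.hour].to_numpy()

# Multiply positionally. Converting to a NumPy array drops the hour index,
# so pandas does not try to align hours against df's datetime index.
df["data-3"] = df["data-2"] * multipliers

print(df)
```

The whole column is computed in a single vectorized operation, which is what removes the per-row cost of the original loop.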
BENY answered Sep 23 '22 21:09


I'd probably try doing a join.

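# Fix column name (assumes df_lookup is a single-column DataFrame;
# for a Series, use df_lookup = df_lookup.rename('multiplier') instead)
df_lookup.columns = ['multiplier']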

# Get hour index
df['hour'] = df.index.hour

# Join
df = df.join(df_lookup, how='left', on=['hour'])
df['data-3'] = df['data-2'] * df['multiplier']
df = df.drop(['multiplier', 'hour'], axis=1)
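The join steps above can be sketched end to end on a small example. This is only an illustration under stated assumptions: the sample rows come from the question, df_lookup is assumed to be a Series (so it is converted with to_frame first), and the names lookup_frame, joined and result are made up for this sketch.

```python
import pandas as pd

# Assumed sample data: a Series indexed by hour, truncated to hours 0-2.
df_lookup = pd.Series({0: 1.109248, 1: 1.102435, 2: 1.085014})

df = pd.DataFrame(
    {"data-2": [2502, 2102, 1988, 2003]},
    index=pd.to_datetime(
        [
            "2015-08-09 00:00:00",
            "2015-08-09 00:00:00",
            "2015-08-09 01:00:00",
            "2015-08-09 02:00:00",
        ]
    ),
)

# Give the lookup a column name so the joined column is easy to refer to.
lookup_frame = df_lookup.to_frame("multiplier")

# Add a temporary hour column and left-join the lookup on it.
# Joining with on= keeps df's datetime index on the result.
joined = df.assign(hour=df.index.hour).join(lookup_frame, on="hour", how="left")

joined["data-3"] = joined["data-2"] * joined["multiplier"]

# Drop the helper columns once the product is stored.
result = joined.drop(columns=["multiplier", "hour"])

print(result)
```

With a left join, any hour missing from the lookup would produce NaN in data-3 rather than raising an error, which may or may not be what you want.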
CJR answered Sep 22 '22 21:09