
Add Multiple Columns to Pandas Dataframe from Function

Tags: python, pandas

I have a pandas DataFrame mydf with two columns, mydate and mytime, both of datetime dtype. I want to add three more columns: hour, weekday, and weeknum.

    def getH(t):  # gives the hour
        return t.hour

    def getW(d):  # gives the week number
        return d.isocalendar()[1]

    def getD(d):  # gives the weekday
        return d.weekday()  # 0 for Monday, 6 for Sunday

    mydf["hour"] = mydf.apply(lambda row: getH(row["mytime"]), axis=1)
    mydf["weekday"] = mydf.apply(lambda row: getD(row["mydate"]), axis=1)
    mydf["weeknum"] = mydf.apply(lambda row: getW(row["mydate"]), axis=1)

The snippet works, but it's not computationally efficient as it loops through the data frame at least three times. I would just like to know if there's a faster and/or more optimal way to do this. For example, using zip or merge? If, for example, I just create one function that returns three elements, how should I implement this? To illustrate, the function would be:

    def getHWd(d, t):
        return t.hour, d.isocalendar()[1], d.weekday()
asked May 04 '15 09:05 by EFL


2 Answers

Here's one approach that does it with a single apply.

Say df looks like this:

    In [64]: df
    Out[64]:
           mydate     mytime
    0  2011-01-01 2011-11-14
    1  2011-01-02 2011-11-15
    2  2011-01-03 2011-11-16
    3  2011-01-04 2011-11-17
    4  2011-01-05 2011-11-18
    5  2011-01-06 2011-11-19
    6  2011-01-07 2011-11-20
    7  2011-01-08 2011-11-21
    8  2011-01-09 2011-11-22
    9  2011-01-10 2011-11-23
    10 2011-01-11 2011-11-24
    11 2011-01-12 2011-11-25

We'll pull the lambda function out onto its own line for readability and define it as

    In [65]: lambdafunc = lambda x: pd.Series([x['mytime'].hour,
                                               x['mydate'].isocalendar()[1],
                                               x['mydate'].weekday()])

Then apply it and store the result in df[['hour', 'weeknum', 'weekday']]. The column names must follow the order of the values in the Series (hour, ISO week number, weekday):

    In [66]: df[['hour', 'weeknum', 'weekday']] = df.apply(lambdafunc, axis=1)

The output looks like this:

    In [67]: df
    Out[67]:
           mydate     mytime  hour  weeknum  weekday
    0  2011-01-01 2011-11-14     0       52        5
    1  2011-01-02 2011-11-15     0       52        6
    2  2011-01-03 2011-11-16     0        1        0
    3  2011-01-04 2011-11-17     0        1        1
    4  2011-01-05 2011-11-18     0        1        2
    5  2011-01-06 2011-11-19     0        1        3
    6  2011-01-07 2011-11-20     0        1        4
    7  2011-01-08 2011-11-21     0        1        5
    8  2011-01-09 2011-11-22     0        1        6
    9  2011-01-10 2011-11-23     0        2        0
    10 2011-01-11 2011-11-24     0        2        1
    11 2011-01-12 2011-11-25     0        2        2
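Since the question also asks about zip and a single function returning three values, here is a minimal sketch of that route. It assumes both columns hold datetime64 values, and the three-row frame is made-up sample data, not the asker's. The idea is to call getHWd once per row over the zipped columns, then use zip(*...) to transpose the row tuples into three column tuples. This may well be faster than a row-wise apply that builds a Series per row, but I have not timed it here.

```python
import pandas as pd

def getHWd(d, t):
    # Returns (hour, ISO week number, weekday) for one row.
    return t.hour, d.isocalendar()[1], d.weekday()

# Hypothetical sample data; both columns are datetime64.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2011-01-01", "2011-01-03", "2011-01-10"]),
    "mytime": pd.to_datetime(["2011-11-14 08:30", "2011-11-15 13:00", "2011-11-16 23:59"]),
})

# One pass over the rows; zip(*...) turns the per-row tuples
# into one tuple per output column.
hours, weeknums, weekdays = zip(
    *(getHWd(d, t) for d, t in zip(df["mydate"], df["mytime"]))
)

# Names are assigned in the same order as the values getHWd returns.
df["hour"] = list(hours)
df["weeknum"] = list(weeknums)
df["weekday"] = list(weekdays)
```

Note that the unpacking assumes the frame has at least one row; an empty frame would need a separate guard.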
answered Sep 18 '22 12:09 by Zero


To complement John Galt's answer:

Depending on the task that is performed by lambdafunc, you may experience some speedup by storing the result of apply in a new DataFrame and then joining with the original:

    lambdafunc = lambda x: pd.Series([x['mytime'].hour,
                                      x['mydate'].isocalendar()[1],
                                      x['mydate'].weekday()])

    newcols = df.apply(lambdafunc, axis=1)
    # Names follow the order of the values returned by lambdafunc.
    newcols.columns = ['hour', 'weeknum', 'weekday']
    newdf = df.join(newcols)

Even if you do not see a speed improvement, I would recommend using the join. You will be able to avoid the (always annoying) SettingWithCopyWarning that may pop up when assigning directly on the columns:

    SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead
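In the same spirit of building a new frame rather than assigning in place, here is a sketch that skips apply altogether and uses the vectorized .dt accessor. It assumes both columns are datetime64 dtype and a pandas version that provides Series.dt.isocalendar() (1.1 or later, as far as I know). The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; both columns are datetime64.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2011-01-01", "2011-01-03", "2011-01-10"]),
    "mytime": pd.to_datetime(["2011-11-14 08:30", "2011-11-15 13:00", "2011-11-16 23:59"]),
})

# assign() returns a new DataFrame and leaves df untouched.
newdf = df.assign(
    hour=df["mytime"].dt.hour,                    # hour of day, 0-23
    weeknum=df["mydate"].dt.isocalendar().week,   # ISO week number
    weekday=df["mydate"].dt.weekday,              # 0 for Monday, 6 for Sunday
)
```

The weeknum column comes back as an unsigned integer extension dtype, so cast it with astype(int) if a plain integer column is needed downstream.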
answered Sep 20 '22 12:09 by Pedro M Duarte