Here is my code:
import pandas as pd

dfnew = pd.DataFrame({'year': [2015, 2016],
                      'month': [10, 12],
                      'day': [25, 31]})
print(dfnew)

def calc(yy, n):
    if yy == 2016:
        return yy * 2 * n
    else:
        return yy

dfnew['nv'] = map(calc, dfnew['year'], 2)
print(dfnew['nv'])
How can I get this code to run without errors? I want the function to be applied only to the 'year' column of the DataFrame, for all rows, and the output stored in a new column named 'nv' in the same DataFrame.
The pandas apply() method lets you pass a function and apply it to every value of a Series. To apply a function to every row of a pandas DataFrame, use df.apply() with axis=1.
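As a rough sketch of the point above: the call map(calc, dfnew['year'], 2) from the question fails because map expects every argument after the function to be an iterable, and the integer 2 is not one. Series.apply, by contrast, forwards extra positional arguments through its args parameter. The DataFrame and calc are copied from the question; the map_failed flag is a hypothetical name added only to make the failure visible.

```python
import pandas as pd

dfnew = pd.DataFrame({'year': [2015, 2016],
                      'month': [10, 12],
                      'day': [25, 31]})

def calc(yy, n):
    # Scale only the year 2016; leave every other year unchanged.
    if yy == 2016:
        return yy * 2 * n
    return yy

# map() needs iterables after the function, so a bare int raises TypeError.
try:
    map(calc, dfnew['year'], 2)
    map_failed = False
except TypeError:
    map_failed = True

# Series.apply calls calc(value, 2) for each value in the 'year' column.
dfnew['nv'] = dfnew['year'].apply(calc, args=(2,))
print(dfnew['nv'].tolist())
```

Passing args=(2,) does the same job as wrapping calc in a lambda; which one reads better is largely a matter of taste.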
In PySpark, a Series-to-Series pandas UDF vectorizes scalar operations. You can use such UDFs with APIs such as select and withColumn. The Python function should take a pandas Series as input and return a pandas Series of the same length, and you should specify both in the Python type hints.
You need apply for a custom function:
dfnew['nv'] = dfnew['year'].apply(lambda x: calc(x, 2))
print(dfnew)
   day  month  year    nv
0   25     10  2015  2015
1   31     12  2016  8064
It is better to use mask to change values by condition:
dfnew['nv'] = dfnew['year'].mask(dfnew['year'] == 2016, dfnew['year'] * 2 * 2)
print(dfnew)
   day  month  year    nv
0   25     10  2015  2015
1   31     12  2016  8064
Detail:
print(dfnew['year'] == 2016)
0    False
1     True
Name: year, dtype: bool
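The boolean Series shown in the detail can also drive a plain .loc assignment. The following is a minimal sketch of that alternative, assuming the same DataFrame as in the question; the name is_2016 is introduced here for illustration only.

```python
import pandas as pd

dfnew = pd.DataFrame({'year': [2015, 2016],
                      'month': [10, 12],
                      'day': [25, 31]})

# Boolean Series: True only for the rows where year is 2016.
is_2016 = dfnew['year'] == 2016

# Start from a copy of 'year', then overwrite only the rows selected by the mask.
dfnew['nv'] = dfnew['year']
dfnew.loc[is_2016, 'nv'] = dfnew.loc[is_2016, 'year'] * 2 * 2
print(dfnew['nv'].tolist())
```

This should give the same 'nv' column as the mask call; mask is just the more compact way to write it.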
Many thanks for your prompt reply. Your answer to my question has been very helpful.
In addition, I also needed to pass multiple columns to the function, and this is how I did it:
def yearCalc(year, month, n):
    if year == 2016:
        print("year:{} month:{}".format(year, month))
        return year * month * n
    else:
        return year

df['nv'] = df[['year', 'month']].apply(lambda x: yearCalc(x['year'], x['month'], 2), axis=1)
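For comparison, the same multi-column calculation can probably be written without a row-wise apply by reusing the mask approach from the answer. This is only a sketch under the assumption that df has the same columns as dfnew in the question; the names n, row_wise and vectorized are hypothetical, and the debugging print from yearCalc is left out.

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [10, 12],
                   'day': [25, 31]})
n = 2

# Row-wise version, equivalent to yearCalc without the print call.
row_wise = df.apply(
    lambda x: x['year'] * x['month'] * n if x['year'] == 2016 else x['year'],
    axis=1)

# Vectorized version: replace 'year' with year * month * n where year is 2016.
vectorized = df['year'].mask(df['year'] == 2016, df['year'] * df['month'] * n)

df['nv'] = vectorized
print(df['nv'].tolist())
```

On a large DataFrame the vectorized form is usually faster, because it avoids calling a Python function once per row.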
Many thanks.