I want to apply a custom function to create a derived column, pop2050, computed from two columns already present in my DataFrame.
```python
import math
import sqlite3

import pandas as pd

conn = sqlite3.connect('factbook.db')
query = "select * from facts where area_land = 0;"
facts = pd.read_sql_query(query, conn)
print(list(facts.columns.values))

def final_pop(initial_pop, growth_rate):
    # Continuous growth over 35 years: initial_pop * e^(growth_rate * 35)
    final = initial_pop * math.e ** (growth_rate * 35)
    return final

# Raises KeyError: a tuple inside single brackets is treated as one column label
facts['pop2050'] = facts['population', 'population_growth'].apply(final_pop, axis=1)
```
When I run the above code, I get an error. Am I not using the 'apply' function correctly?
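A minimal sketch reproducing the two problems in that last line, assuming a small hypothetical DataFrame in place of the factbook table: selecting two columns needs a list (double brackets), and even then apply(axis=1) calls the function with a single row Series, so a two-parameter function is missing its second argument.

```python
import math

import pandas as pd

# Hypothetical stand-in for the 'facts' table read from factbook.db (made-up values)
facts = pd.DataFrame({"population": [1000, 500], "population_growth": [0.0, 0.02]})

def final_pop(initial_pop, growth_rate):
    return initial_pop * math.e ** (growth_rate * 35)

errors = {}

# Problem 1: facts['a', 'b'] looks up ONE column whose label is the tuple ('a', 'b')
try:
    facts["population", "population_growth"]
except KeyError as exc:
    errors["tuple_key"] = exc

# Problem 2: with a proper list of columns, apply(axis=1) still calls
# final_pop(row) with one Series argument, so growth_rate is missing
try:
    facts[["population", "population_growth"]].apply(final_pop, axis=1)
except TypeError as exc:
    errors["one_argument"] = exc

for name, exc in errors.items():
    print(name, type(exc).__name__, exc)
```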
apply() calls a function on every row or every column of a pandas DataFrame. To apply the function row by row, pass axis=1; the function is then called once per row.
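A minimal sketch of row-wise apply(), using a hypothetical DataFrame with made-up columns a and b: with axis=1 the lambda runs once per row and returns a scalar, which becomes one value of the new column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1: the lambda receives each row as a Series and returns one number
df["total"] = df.apply(lambda row: row["a"] + row["b"], axis=1)
print(df["total"].tolist())  # [11, 22, 33]
```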
The pandas DataFrame class provides a member function, apply(), that applies a function along an axis of the DataFrame, i.e. along each row or each column. Its most important argument is func, the function to apply to each column or row. This function receives a Series and can return either a scalar or a Series.
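When func returns a Series rather than a scalar, apply() expands the results into columns named after that Series' index. A sketch with hypothetical columns x and y and a made-up helper, min_max:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

def min_max(row):
    # Returning a Series: its index labels ("lo", "hi") become the result's columns
    return pd.Series({"lo": min(row["x"], row["y"]), "hi": max(row["x"], row["y"])})

result = df.apply(min_max, axis=1)
print(result)
```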
The apply() function is used to apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
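To see which index the passed Series carries on each axis, this sketch (hypothetical data with row labels A and B) joins the index labels of every Series that apply() hands to the lambda:

```python
import pandas as pd

# Hypothetical sample data with explicit row labels
df = pd.DataFrame(
    {"population": [100, 200], "population_growth": [0.01, 0.02]},
    index=["A", "B"],
)

# axis=0: one call per column; each Series is indexed by the DataFrame's index
axis0 = df.apply(lambda s: ",".join(s.index), axis=0)

# axis=1: one call per row; each Series is indexed by the DataFrame's columns
axis1 = df.apply(lambda s: ",".join(s.index), axis=1)

print(axis0.to_dict())
print(axis1.to_dict())
```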
You were almost there:
```python
facts['pop2050'] = facts.apply(lambda row: final_pop(row['population'], row['population_growth']), axis=1)
```
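A self-contained sketch of the fix end to end. It assumes an in-memory SQLite table with made-up rows in place of factbook.db, keeping only the columns the question uses plus a hypothetical name column.

```python
import math
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for factbook.db (values are made up)
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table facts (name text, population integer, "
    "population_growth real, area_land integer)"
)
conn.executemany(
    "insert into facts values (?, ?, ?, ?)",
    [("X", 1000, 0.0, 0), ("Y", 500, 0.02, 0), ("Z", 300, 0.01, 50)],
)
conn.commit()

facts = pd.read_sql_query("select * from facts where area_land = 0;", conn)

def final_pop(initial_pop, growth_rate):
    # Continuous growth over 35 years
    return initial_pop * math.e ** (growth_rate * 35)

# The lambda unpacks each row into the two arguments final_pop expects
facts["pop2050"] = facts.apply(
    lambda row: final_pop(row["population"], row["population_growth"]), axis=1
)
print(facts[["name", "pop2050"]])
```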
Using a lambda lets final_pop keep its two specific parameters, rather than having to accept a whole row and pick the columns out itself.
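Because final_pop uses only arithmetic operators, it can also be called on whole columns at once, which avoids one Python call per row. This is a different technique from apply() (element-wise Series arithmetic), shown on hypothetical data; it would not work unchanged if the function used math.exp, which needs a single float.

```python
import math

import pandas as pd

def final_pop(initial_pop, growth_rate):
    # Continuous growth over 35 years: P * e^(r * 35)
    return initial_pop * math.e ** (growth_rate * 35)

# Hypothetical sample data
facts = pd.DataFrame({"population": [1000, 500], "population_growth": [0.0, 0.02]})

# Row by row with apply() and a lambda
by_row = facts.apply(
    lambda row: final_pop(row["population"], row["population_growth"]), axis=1
)

# Whole columns at once: *, ** operate element-wise on Series
vectorized = final_pop(facts["population"], facts["population_growth"])

print((by_row - vectorized).abs().max())
```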