I'm trying to assign a classification based on the size of a person in a dataframe like this one:
Size
1 80000
2 8000000
3 8000000000
...
I want it to look like this:
Size Classification
1 80000 <1m
2 8000000 1-10m
3 8000000000 >1bi
...
I understand that the ideal process would be to apply a lambda function like this:
df['Classification']=df['Size'].apply(lambda x: "<1m" if x<1000000 else "1-10m" if 1000000<x<10000000 else ...)
I checked a few posts about multiple ifs in a lambda function (here is an example link), but for some reason that syntax doesn't work for me with multiple conditions, although it did work with a single if condition.
So I tried this "very elegant" solution:
df['Classification']=df['Size'].apply(lambda x: "<1m" if x<1000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "1-10m" if 1000000 < x < 10000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "10-50m" if 10000000 < x < 50000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "50-100m" if 50000000 < x < 100000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "100-500m" if 100000000 < x < 500000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "500m-1bi" if 500000000 < x < 1000000000 else pass)
df['Classification']=df['Size'].apply(lambda x: ">1bi" if 1000000000 < x else pass)
It turns out that "pass" can't be used inside a lambda function either:
df['Classification']=df['Size'].apply(lambda x: "<1m" if x<1000000 else pass)
SyntaxError: invalid syntax
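The error comes from pass being a statement: a conditional expression needs a value in both branches. Even with else None, each of the assignments above replaces the whole column, so only the last rule would survive. A minimal sketch that keeps the one-rule-per-line idea writes each label only into the matching rows with .loc. The boundary convention (lower bound inclusive, upper bound exclusive) is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"Size": [80000, 8000000, 8000000000]})

# (label, lower bound inclusive, upper bound exclusive).
# The boundary convention is an assumption; adjust it to your needs.
rules = [
    ("<1m", 0, 1e6),
    ("1-10m", 1e6, 1e7),
    ("10-50m", 1e7, 5e7),
    ("50-100m", 5e7, 1e8),
    ("100-500m", 1e8, 5e8),
    ("500m-1bi", 5e8, 1e9),
    (">1bi", 1e9, float("inf")),
]

# Start with an empty column, then fill it rule by rule.
df["Classification"] = None
for label, low, high in rules:
    # Only rows inside [low, high) are written; all other rows keep their value.
    mask = (df["Size"] >= low) & (df["Size"] < high)
    df.loc[mask, "Classification"] = label

print(df)
```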
Any suggestions on the correct syntax for multiple if conditions inside a lambda function in an apply method in Pandas? Either multi-line or single-line solutions work for me.
Adding multiple if statements: a lambda body must be a single expression, so if/elif statements cannot be placed inside it, and writing one there raises a SyntaxError. Several conditions can still go on one line by chaining conditional expressions (a if c1 else b if c2 else c), as long as every branch, including the final else, produces a value.
Lambda functions do not allow multiple statements; however, we can create two lambda functions and have the first one call the second.
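A minimal sketch of that idea, where each lambda contains exactly one if/else and the outer one delegates to the inner one. The names under_10m and classify are hypothetical, and the catch-all "N/A" label mirrors the def-based answer further down; the remaining buckets are left out to keep it short.

```python
import pandas as pd

df = pd.DataFrame({"Size": [80000, 8000000, 800000000]})

# Hypothetical inner lambda: only meant to be called for values below 1e7.
under_10m = lambda x: "<1m" if x < 1e6 else "1-10m"

# Outer lambda: sends values below 1e7 to under_10m; everything else
# gets the catch-all "N/A" label.
classify = lambda x: under_10m(x) if x < 1e7 else "N/A"

df["Classification"] = df["Size"].apply(classify)
print(df)
```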
To put a condition in a lambda function, use a conditional expression rather than an if statement: lambda input: true_return if condition else false_return returns true_return if condition is True and false_return otherwise. The condition can be any expression involving input.
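Chaining that syntax covers all of the buckets from the question in a single lambda. This is a sketch using the question's labels; note that only the upper bound is checked in each branch, because smaller values were already caught by an earlier branch, so a value exactly on a boundary (e.g. 1e6) lands in the higher bucket. That boundary choice is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"Size": [80000, 8000000, 8000000000]})

# Each else branch holds another conditional expression, so the whole
# chain is still one expression. The surrounding parentheses of apply(...)
# allow it to span several lines.
df["Classification"] = df["Size"].apply(
    lambda x: "<1m" if x < 1e6
    else "1-10m" if x < 1e7
    else "10-50m" if x < 5e7
    else "50-100m" if x < 1e8
    else "100-500m" if x < 5e8
    else "500m-1bi" if x < 1e9
    else ">1bi"
)
print(df)
```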
You can use the pd.cut function:
bins = [0, 1000000, 10000000, 50000000, ...]
labels = ['<1m','1-10m','10-50m', ...]
df['Classification'] = pd.cut(df['Size'], bins=bins, labels=labels)
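A fuller sketch of the pd.cut approach with every bucket from the question. Assumptions: np.inf is used as the last edge so the top bin is open-ended, and right=False makes each bin [low, high), so exactly 1e6 becomes "1-10m" (with the default right=True the bins are (low, high] instead, and a size of 0 would not fall into any bin). There must be one more edge than labels, and the result is a categorical column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Size": [80000, 8000000, 8000000000]})

# 8 edges -> 7 bins; np.inf makes the last bin open-ended.
bins = [0, 1e6, 1e7, 5e7, 1e8, 5e8, 1e9, np.inf]
labels = ["<1m", "1-10m", "10-50m", "50-100m", "100-500m", "500m-1bi", ">1bi"]

# right=False: each bin includes its left edge and excludes its right edge.
df["Classification"] = pd.cut(df["Size"], bins=bins, labels=labels, right=False)
print(df)
```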
Here is a small example that you can build upon.
Basically, lambda x: ... is just a short one-line function. What apply really needs is a function, which you can easily write yourself with def:
import pandas as pd
# Recreate the dataframe
data = dict(Size=[80000,8000000,800000000])
df = pd.DataFrame(data)
# Create a function that returns desired values
# You only need to check upper bound as the next elif-statement will catch the value
def func(x):
    if x < 1e6:
        return "<1m"
    elif x < 1e7:
        return "1-10m"
    elif x < 5e7:
        return "10-50m"
    # Add more elif statements here....
    else:
        return 'N/A'
df['Classification'] = df['Size'].apply(func)
print(df)
Returns:
Size Classification
0 80000 <1m
1 8000000 1-10m
2 800000000 N/A
The apply/lambda approach actually does the job here. I just wonder what the problem was, as your syntax looks OK and it works:
import pandas as pd
df1 = [80000, 8000000, 8000000000, 800000000000]
df = pd.DataFrame(df1)
df.columns = ['size']
df['Classification'] = df['size'].apply(lambda x: '<1m' if x < 1000000 else '1-10m' if 1000000 < x < 10000000 else '1bi')
df
Output: the Classification column contains <1m, 1-10m, 1bi and 1bi for the four rows.
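One caveat with that lambda: the strict comparison 1000000 < x < 10000000 excludes x == 1000000, so a value of exactly one million falls through to '1bi'. This sketch (the column names strict and upper_only are made up for illustration) compares it with a version that only checks upper bounds, as the def-based answer does.

```python
import pandas as pd

df = pd.DataFrame({"size": [80000, 1000000, 8000000, 8000000000]})

# Lambda from the answer: strict comparisons on both sides leave 1000000 unmatched.
strict = lambda x: '<1m' if x < 1000000 else '1-10m' if 1000000 < x < 10000000 else '1bi'

# Checking only the upper bound: any value reaching a later branch is already
# known to be >= the previous bound, so no value is skipped.
upper_only = lambda x: '<1m' if x < 1000000 else '1-10m' if x < 10000000 else '1bi'

df['strict'] = df['size'].apply(strict)
df['upper_only'] = df['size'].apply(upper_only)
print(df)
```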