I have a pandas DataFrame df that I would like to subset based on the result of passing each value of the name column through a function is_valid().
import pandas as pd
data = [['foo', 10], ['baar', 15], ['baz', 14]]
df = pd.DataFrame(data, columns=['name', 'age'])
df
   name  age
0   foo   10
1  baar   15
2   baz   14
The function checks that the input is a string and returns True if its length is 3, otherwise False:
def is_valid(x):
    assert isinstance(x, str)
    return len(x) == 3
My goal is to subset df to the rows where this function returns True, which should produce:
  name  age
0  foo   10
2  baz   14
The following raises an error (an AssertionError, because is_valid receives the whole Series rather than a single string). What is the correct way to apply a function to each value of a column (Series) and subset the DataFrame to the rows where the result meets a condition (in this case, True)?
df[is_valid(df['name'])]
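A minimal sketch of why that call fails and what the function actually needs: df['name'] is a whole Series, so the isinstance check inside is_valid fails, while calling it once per value yields a list of booleans that df[...] accepts as a row mask. The names failed and mask are illustrative only.

```python
import pandas as pd

data = [['foo', 10], ['baar', 15], ['baz', 14]]
df = pd.DataFrame(data, columns=['name', 'age'])

def is_valid(x):
    # Only accepts a single string; a whole Series fails the assert
    assert isinstance(x, str)
    return len(x) == 3

# Passing the whole column hands is_valid a Series, not a str
try:
    is_valid(df['name'])
    failed = False
except AssertionError:
    failed = True
print(failed)  # True

# Calling the function once per value gives one boolean per row,
# and a list of booleans of the right length works as a row mask
mask = [is_valid(name) for name in df['name']]
print(mask)  # [True, False, True]
print(df[mask])
```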
Try a vectorized length check:
df[df['name'].str.len()==3]
Or keep your function and pass it to apply:
df[df['name'].apply(is_valid)]
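As a rough comparison of the two approaches above: .str.len() == 3 builds the mask with vectorized string methods, while .apply(is_valid) calls the Python function once per value. On this data both masks should agree; the vectorized form is typically faster on large frames because it avoids a per-row Python call. The variable names below are illustrative.

```python
import pandas as pd

data = [['foo', 10], ['baar', 15], ['baz', 14]]
df = pd.DataFrame(data, columns=['name', 'age'])

def is_valid(x):
    assert isinstance(x, str)
    return len(x) == 3

# Vectorized: .str.len() gives an integer Series; == 3 turns it into booleans
vec_mask = df['name'].str.len() == 3

# Element-wise: apply calls is_valid on each value and collects the results
apply_mask = df['name'].apply(is_valid)

# Both are boolean masks aligned on df's index
print(vec_mask.tolist() == apply_mask.tolist())  # True

result = df[vec_mask]
print(result)
```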
Use Series.str.len with Series.eq to build a boolean mask:
df = df[df['name'].str.len().eq(3)]
Or use Series.apply to pass your custom function:
df = df[df['name'].apply(is_valid)]
print(df)
  name  age
0  foo   10
2  baz   14
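A small sketch of what the .eq(3) mask looks like and what boolean indexing does with the index: .eq(3) is the method form of == 3, and the filtered frame keeps the original labels 0 and 2, as in the output above. The reset_index(drop=True) step is an optional extra, not part of the answer, for when a fresh 0..n-1 index is wanted.

```python
import pandas as pd

data = [['foo', 10], ['baar', 15], ['baz', 14]]
df = pd.DataFrame(data, columns=['name', 'age'])

# .eq(3) compares each length to 3, same as == 3
mask = df['name'].str.len().eq(3)
print(mask.tolist())  # [True, False, True]

# Boolean indexing keeps the original index labels of the kept rows
subset = df[mask]
print(subset.index.tolist())  # [0, 2]

# Optional: renumber rows from 0 and discard the old labels
subset = subset.reset_index(drop=True)
print(subset.index.tolist())  # [0, 1]
```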