
Subset pandas dataframe using function applied to a column/series

I have a pandas DataFrame df that I would like to subset based on the result of passing each value of the name column through a function, is_valid():

import pandas as pd

data = [['foo', 10], ['baar', 15], ['baz', 14]]
df = pd.DataFrame(data, columns = ['name', 'age'])
df

    name    age
0   foo     10
1   baar    15
2   baz     14

The function checks if the length of the input string is 3 and returns either True or False:

def is_valid(x):
    assert isinstance(x, str)
    return len(x) == 3

My goal is to subset df where this function returns True, which would return an output of

    name    age
0   foo     10
2   baz     14

The following syntax raises an error. What is the correct syntax for applying a function to each value of a column (Series) and subsetting the DataFrame where the result meets a condition (in this case, True)?

df[is_valid(df['name'])]
iskandarblue asked Mar 11 '26 17:03


2 Answers

Try:

df[df['name'].str.len()==3]

Or use your code with apply:

df[df['name'].apply(is_valid)]
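
For context, the attempt in the question most likely fails because is_valid receives the whole Series rather than a single string, so its isinstance assertion trips. The sketch below reuses the question's data to show that, and then shows how apply calls the function once per element to produce a boolean mask. The names failed, mask and subset are illustrative only, and is_valid is written with the shorter return form:

```python
import pandas as pd

df = pd.DataFrame([['foo', 10], ['baar', 15], ['baz', 14]],
                  columns=['name', 'age'])

def is_valid(x):
    assert isinstance(x, str)
    return len(x) == 3

# Calling the function on the column hands it the Series object itself,
# not each string, so the assertion inside is_valid fails.
try:
    df[is_valid(df['name'])]
    failed = False
except AssertionError:
    failed = True

# apply calls is_valid once per element and returns a boolean Series,
# which can then be used as a row mask.
mask = df['name'].apply(is_valid)
subset = df[mask]
print(subset)
```

Indexing a DataFrame with a boolean Series keeps the rows where the mask is True, which is why both forms in this answer work.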
Quang Hoang answered Mar 13 '26 05:03


Use Series.str.len with Series.eq to build a boolean mask:

df = df[df['name'].str.len().eq(3)]

Or use Series.apply to pass a custom function:

df = df[df['name'].apply(is_valid)]
print(df)
  name  age
0  foo   10
2  baz   14
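
As a rough check, the two approaches should select the same rows for this data. The sketch below compares the two masks and also shows that the filtered frame keeps its original index labels (0 and 2), with reset_index(drop=True) as one way to renumber them if that is wanted. The variable names are illustrative, and a lambda stands in for is_valid:

```python
import pandas as pd

df = pd.DataFrame([['foo', 10], ['baar', 15], ['baz', 14]],
                  columns=['name', 'age'])

# Vectorized mask: string lengths compared against 3.
mask_vectorized = df['name'].str.len().eq(3)

# Element-wise mask: a plain Python function applied to each value.
mask_apply = df['name'].apply(lambda s: len(s) == 3)

# Both masks mark the same rows for this data.
same = mask_vectorized.tolist() == mask_apply.tolist()

# Filtering keeps the original index labels of the surviving rows.
filtered = df[mask_vectorized]

# Optional: renumber the rows from 0 after filtering.
renumbered = filtered.reset_index(drop=True)
print(renumbered)
```

The str.len version avoids a Python-level function call per row, so it is usually the one to prefer when the condition can be expressed with built-in string methods; apply remains useful when the check only exists as a custom function.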
jezrael answered Mar 13 '26 05:03