filter dataframe rows based on length of column values

Tags: pandas

I have a pandas dataframe as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [np.nan, 1], ['test string1', 5]], columns=['A', 'B'])

df
              A  B
0             1  2
1           NaN  1
2  test string1  5

I am using pandas 0.20. What is the most efficient way to remove every row in which any column value has a length greater than 10?

len('test string1')  # 12

So for the example above, I expect the following output:

df
              A  B
0             1  2
1           NaN  1

D.prd asked Jul 13 '17 19:07


2 Answers

To filter on column A only:

In [865]: df[~(df.A.str.len() > 10)]
Out[865]:
     A  B
0    1  2
1  NaN  1

To filter on all columns:

In [866]: df[~df.applymap(lambda x: len(str(x)) > 10).any(axis=1)]
Out[866]:
     A  B
0    1  2
1  NaN  1
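
For anyone who wants to run this outside an IPython session, here is a self-contained sketch of both filters. `DataFrame.applymap` is deprecated as of pandas 2.1 in favour of `DataFrame.map`, so the all-columns version below uses `Series.map` per column instead. That is a substitute for the `applymap` call above, not the same code. The names `too_long_a`, `lengths`, `filtered_a`, and `filtered_all` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [np.nan, 1], ["test string1", 5]],
    columns=["A", "B"],
)

# Column A only: .str.len() yields NaN for non-string cells (1 and NaN),
# and NaN > 10 is False, so only real strings longer than 10 are flagged.
too_long_a = df["A"].str.len() > 10
filtered_a = df[~too_long_a]

# All columns: measure len(str(x)) for every cell, column by column.
# Caveat: str(np.nan) is "nan", so a missing value counts as length 3.
lengths = df.apply(lambda col: col.map(lambda x: len(str(x))))
filtered_all = df[~(lengths > 10).any(axis=1)]

print(filtered_a)
print(filtered_all)
```

Both filters keep rows 0 and 1 for this data. They would differ if column B, or any other column, held a long value.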

Zero answered Sep 22 '22 09:09


I had to cast to a string for Diego's answer to work:

df = df[df['A'].apply(lambda x: len(str(x)) <= 10)]
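
The cast is needed because `len()` raises `TypeError` on the integer and float (NaN) cells in column A. Below is a minimal sketch of a reusable version. `drop_long_rows` and its parameters `max_len` and `columns` are hypothetical names made up for this example, not part of pandas.

```python
import numpy as np
import pandas as pd


def drop_long_rows(frame, max_len=10, columns=None):
    """Return the rows whose stringified values are all at most max_len long.

    Hypothetical helper for illustration; not a pandas API.
    """
    cols = list(frame.columns) if columns is None else list(columns)
    keep = pd.Series(True, index=frame.index)
    for col in cols:
        # len(str(x)) works for any cell type; str(np.nan) is "nan" (length 3).
        keep = keep & frame[col].map(lambda x: len(str(x)) <= max_len)
    return frame[keep]


df = pd.DataFrame(
    [[1, 2], [np.nan, 1], ["test string1", 5]],
    columns=["A", "B"],
)

print(drop_long_rows(df))                 # checks every column
print(drop_long_rows(df, columns=["A"]))  # checks column A only
```

One thing to watch: with a threshold below 3, rows containing NaN are dropped as well, because NaN is measured as the three-character string `'nan'`.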

Elizabeth answered Sep 25 '22 09:09