Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python: remove all rows in pandas dataframe that contain a string

I've got a pandas dataframe called data and I want to remove all rows that contain a string in any column. For example, below we see the 'gdp' column has a string at index 3, and 'cap' at index 1.

data =

    y  gdp  cap
0   1    2    5
1   2    3    ab
2   8    7    2
3   3    bc   7
4   6    7    7
5   4    8    3
...

I've been trying to use something like this script because I will not know what is contained in exp_list ahead of time. Unfortunately, "data.var_name" throws out this error: 'DataFrame' object has no attribute 'var_name'. I also don't know what the strings will be ahead of time so is there anyway to generalize that as well?

exp_list = ['gdp', 'cap']

for var_name in exp_list:
    data = data[data.var_name != 'ab']
like image 676
natsuki_2002 Avatar asked Nov 08 '13 13:11

natsuki_2002


People also ask

How do I remove unwanted rows from a DataFrame in Python?

To drop a row or column in a dataframe, you need to use the drop() method available in the dataframe. You can read more about the drop() method in the docs here. Rows are labelled using the index number starting with 0, by default. Columns are labelled using names.

How do you delete certain rows in Python?

Python pandas drop rows by index To remove the rows by index all we have to do is pass the index number or list of index numbers in case of multiple drops. to drop rows by index simply use this code: df. drop(index) .

How do you get rid of rows in pandas?

To delete a row from a DataFrame, use the drop() method and set the index label as the parameter.


1 Answers

You can apply a function that tests row-wise your DataFrame for the presence of strings, e.g., say that df is your DataFrame

 rows_with_strings  = df.apply(
       lambda row : 
          any([ isinstance(e, basestring) for e in row ])
       , axis=1) 

This will produce a mask for your DataFrame indicating which rows contain at least one string. You can hence select the rows without strings through the opposite mask

 df_with_no_strings = df[~rows_with_strings]

.

Example:

 a = [[1,2],['a',2], [3,4], [7,'d']]
 df = pd.DataFrame(a,columns = ['a','b'])


 df 
   a  b
0  1  2
1  a  2
2  3  4
3  7  d

select  = df.apply(lambda r : any([isinstance(e, basestring) for e in r  ]),axis=1) 

df[~select]                                                                                                                                

    a  b
 0  1  2
 2  3  4
like image 143
Acorbe Avatar answered Sep 27 '22 15:09

Acorbe