
Remove rows where column value type is string Pandas

I have a pandas dataframe. One of my columns should only be floats. When I try to convert that column to floats, I'm alerted that there are strings in there. I'd like to delete all rows where values in this column are strings...

asked Nov 06 '14 03:11 by porteclefs




3 Answers

Use convert_objects with the parameter convert_numeric=True. This will coerce any non-numeric values to NaN:

In [24]:

df = pd.DataFrame({'a': [0.1,0.5,'jasdh', 9.0]})
df
Out[24]:
       a
0    0.1
1    0.5
2  jasdh
3      9
In [27]:

df.convert_objects(convert_numeric=True)
Out[27]:
     a
0  0.1
1  0.5
2  NaN
3  9.0
You can then drop them:

In [29]:

df.convert_objects(convert_numeric=True).dropna()
Out[29]:
     a
0  0.1
1  0.5
3  9.0

UPDATE

Since version 0.17.0 this method is deprecated (it was later removed in pandas 1.0), and you need to use to_numeric instead. Unfortunately, to_numeric operates on a Series rather than a whole DataFrame, so the equivalent code is now:

df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna()
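Applying to_numeric to every column coerces all of them, and a plain dropna() drops a row if any column is NaN. When only one column needs cleaning, the same idea can be limited to that column. A minimal sketch, assuming a second column `b` (hypothetical, added only for illustration):

```python
import pandas as pd

# Column 'b' is a hypothetical extra column, not part of the original data.
df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0],
                   'b': ['w', None, 'y', 'z']})

# Coerce only column 'a'; entries that cannot be parsed become NaN.
df['a'] = pd.to_numeric(df['a'], errors='coerce')

# Drop rows where 'a' is NaN, ignoring missing values in other columns.
cleaned = df.dropna(subset=['a'])

print(cleaned)
print(cleaned['a'].dtype)
```

Note that this also drops rows where `a` was already NaN before the conversion.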
answered Oct 25 '22 12:10 by EdChum


One of my columns should only be floats. I'd like to delete all rows where values in this column are strings

You can convert your series to numeric via pd.to_numeric and then use pd.Series.notnull to build a boolean mask. The filtered column still has object dtype, so conversion to float is required as a separate step.

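import pandas as pd

# Data from @EdChum
df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})

# Keep rows whose value parses as a number; copy to avoid SettingWithCopyWarning
res = df[pd.to_numeric(df['a'], errors='coerce').notnull()].copy()
res['a'] = res['a'].astype(float)

print(res)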

     a
0  0.1
1  0.5
3  9.0
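One caveat: pd.to_numeric also parses strings that look like numbers, so a mask built from it keeps a value such as '3.5'. If the goal is literally to remove every value whose type is str, a type check is an alternative. A rough sketch, assuming an extra numeric-looking string in the data (hypothetical, not in the original example):

```python
import pandas as pd

# '3.5' is a hypothetical extra value: a string that happens to look numeric.
df = pd.DataFrame({'a': [0.1, '3.5', 'jasdh', 9.0]})

# to_numeric parses '3.5', so a coerce-based mask keeps that row.
parsed_mask = pd.to_numeric(df['a'], errors='coerce').notnull()
by_parse = df[parsed_mask]

# A type-based mask drops every str value, numeric-looking or not.
type_mask = ~df['a'].map(lambda v: isinstance(v, str))
by_type = df[type_mask].copy()
by_type['a'] = by_type['a'].astype(float)

print(by_parse)
print(by_type)
```

Which mask is appropriate depends on whether numeric-looking strings should count as valid data.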
answered Oct 25 '22 13:10 by jpp


Assume your DataFrame is df and you want to ensure that all data in one of its columns (at position n) is numeric with a specific pandas dtype, e.g. float. Note that this replaces non-numeric values with 0 rather than removing their rows:

df[df.columns[n]] = pd.to_numeric(df[df.columns[n]], errors='coerce').fillna(0).astype(float)
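Filling with 0 and dropping rows give different results, so it may help to see them side by side. A small sketch, assuming n = 0 and the sample data from the other answers (both are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})
n = 0  # assumed position of the column to clean

col = df.columns[n]
numeric = pd.to_numeric(df[col], errors='coerce')

# Option 1: keep every row, replacing non-numeric values with 0.
filled = df.assign(**{col: numeric.fillna(0).astype(float)})

# Option 2: drop the rows whose value could not be parsed.
dropped = df.assign(**{col: numeric}).dropna(subset=[col])

print(len(filled), len(dropped))
```

Only the second option deletes rows, which is what the question asks for; the first keeps the row count unchanged.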
answered Oct 25 '22 13:10 by geomars