
Ignoring non-numerical string values in pandas dataframe

Tags: python, pandas

I have a DataFrame in which a column might hold three kinds of values: integers (12331), integers stored as strings ('345'), or other strings ('text').

Is there a way to drop all rows containing the non-numeric strings from the dataframe, and convert the integer strings into integers? Or is there at least some way to ignore the rows that cause type errors when I sum the column?

The dataframe comes from reading a pretty big CSV file (25 GB), so I'd like a solution that works when reading in chunks.

devil0150 asked Apr 18 '16 04:04



2 Answers

Pandas has some tools for converting this kind of column, but they may not suit your needs exactly. pd.to_numeric converts a mixed column like yours, and with errors='coerce' it turns non-numeric strings into NaN. This means you'll get a float column rather than an integer one, because NaN is a float value and a plain integer column cannot hold it. That usually doesn't matter much, but it's good to be aware of.

import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

pd.to_numeric(df['mixed_types'], errors='coerce')
Out[7]: 
0    12331.0
1      345.0
2        NaN
Name: mixed_types, dtype: float64

If you then want to drop the rows that were converted to NaN:

# Replace the column with the converted values
df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')

# Drop NA values, listing the converted columns explicitly
#   so NA values in other columns aren't dropped
df.dropna(subset = ['mixed_types'])
Out[11]: 
   mixed_types
0      12331.0
1        345.0
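
For the chunked reading the question asks about, the same conversion can probably be applied to each chunk in turn. The following is only a minimal sketch under some assumptions: the helper name sum_numeric_column and the in-memory sample CSV are made up for illustration, and it assumes every numeric value in the column is a whole number small enough to be represented exactly as a float.

```python
import io

import pandas as pd


def sum_numeric_column(csv_source, column, chunksize=100_000):
    """Sum `column` chunk by chunk, skipping values that are not numeric.

    Hypothetical helper for illustration; not part of pandas.
    """
    total = 0
    # Read the column as strings so every chunk is parsed the same way,
    # regardless of which kinds of values it happens to contain.
    reader = pd.read_csv(
        csv_source, usecols=[column], dtype={column: str}, chunksize=chunksize
    )
    for chunk in reader:
        # Non-numeric strings become NaN and are then dropped.
        values = pd.to_numeric(chunk[column], errors="coerce").dropna()
        # Assumption: the remaining values are whole numbers, so the cast
        # back to int64 is lossless (a value like 3.7 would be truncated).
        total += int(values.astype("int64").sum())
    return total


# Small in-memory stand-in for the real 25 GB file.
sample = io.StringIO("mixed_types\n12331\n345\ntext\n")
total = sum_numeric_column(sample, "mixed_types", chunksize=2)
print(total)
```

Only one chunk is held in memory at a time, so the chunksize can be tuned to whatever fits comfortably on your machine.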
Marius answered Sep 29 '22 02:09


You can use df._get_numeric_data() directly.
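
One caveat worth checking before relying on this: as far as I know, _get_numeric_data() is a private method that keeps or drops whole columns based on their dtype, so it would not filter individual values inside an object column like the one in the question. The sketch below uses select_dtypes, which I believe is the closest public counterpart, to show that behaviour; the extra column n is made up for illustration.

```python
import pandas as pd

# 'mixed_types' has object dtype because it mixes ints and strings;
# 'n' is a hypothetical, purely numeric column added for contrast.
df = pd.DataFrame({"mixed_types": [12331, "345", "text"], "n": [1, 2, 3]})

# Selection happens per column, by dtype, not per value.
numeric_only = df.select_dtypes(include="number")
print(list(numeric_only.columns))
```

The mixed column is dropped entirely here, so pd.to_numeric with errors='coerce' is still needed to keep its numeric values.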

PhilChang answered Sep 29 '22 02:09