Convert entire pandas DataFrame to integers in pandas (0.17.0)




My question is very similar to this one, but I need to convert my entire DataFrame instead of just a Series. The to_numeric function only works on one Series at a time, so it is not a drop-in replacement for the deprecated convert_objects method. Is there a way to get results similar to convert_objects(convert_numeric=True) in the new pandas release?

Thank you, Mike Müller, for your example. df.apply(pd.to_numeric) works very well if all the values can be converted to numbers. What if my DataFrame contains strings that cannot be converted to integers? Example:

    df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    df.dtypes
    Out[59]:
    Words    object
    ints     object
    dtype: object

Then I could run the deprecated function and get:

    df = df.convert_objects(convert_numeric=True)
    df.dtypes
    Out[60]:
    Words    object
    ints      int64
    dtype: object

Running the apply command gives me errors, even with try and except handling.


All columns convertible

You can apply the function to all columns:



    >>> df = pd.DataFrame({'a': ['1', '2'],
    ...                    'b': ['45.8', '73.9'],
    ...                    'c': [10.5, 3.7]})
    >>> df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 3 columns):
    a    2 non-null object
    b    2 non-null object
    c    2 non-null float64
    dtypes: float64(1), object(2)
    memory usage: 64.0+ bytes

    >>> df.apply(pd.to_numeric).info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 3 columns):
    a    2 non-null int64
    b    2 non-null float64
    c    2 non-null float64
    dtypes: float64(2), int64(1)
    memory usage: 64.0 bytes
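With the default axis=0, DataFrame.apply calls the function once per column and combines the returned Series into a new DataFrame. As a rough sketch of what that means here, the same conversion can be spelled out as a dictionary comprehension over the columns (the names via_apply and via_loop are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2'],
                   'b': ['45.8', '73.9'],
                   'c': [10.5, 3.7]})

# apply() hands each column (a Series) to pd.to_numeric in turn
via_apply = df.apply(pd.to_numeric)

# The same conversion written out column by column
via_loop = pd.DataFrame({col: pd.to_numeric(df[col]) for col in df.columns})

print(via_apply.dtypes)
```

Column 'a' comes out as an integer dtype and 'b' as float64, matching the info() output above.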

Not all columns convertible

pd.to_numeric has the keyword argument errors:

    Signature: pd.to_numeric(arg, errors='raise')
    Docstring:
    Convert argument to a numeric type.

    Parameters
    ----------
    arg : list, tuple or array of objects, or Series
    errors : {'ignore', 'raise', 'coerce'}, default 'raise'
        - If 'raise', then invalid parsing will raise an exception
        - If 'coerce', then invalid parsing will be set as NaN
        - If 'ignore', then invalid parsing will return the input

Setting it to ignore will return the column unchanged if it cannot be converted into a numeric type.
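To see how the errors modes differ, here is a small sketch on a single Series. It sticks to 'raise' and 'coerce', because newer pandas releases (2.2 and later) deprecate errors='ignore':

```python
import pandas as pd

s = pd.Series(['3', '5', 'Kobe'])

# errors='coerce': unparseable values become NaN, so the result is float64
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.tolist())  # [3.0, 5.0, nan]

# errors='raise' (the default): an unparseable value raises ValueError
try:
    pd.to_numeric(s)
except ValueError as exc:
    print('raised:', exc)
```

Note that 'coerce' silently turns 'Kobe' into NaN, which is usually not what you want for a text column; 'ignore' (or an explicit try/except) keeps the column intact instead.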

As pointed out by Anton Protopopov, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric:

    >>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
    >>> df.apply(pd.to_numeric, errors='ignore').info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 2 columns):
    Words    2 non-null object
    ints     2 non-null int64
    dtypes: int64(1), object(1)
    memory usage: 48.0+ bytes
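pandas 2.2 deprecated errors='ignore' in pd.to_numeric and recommends catching the exception explicitly instead. A minimal sketch of an equivalent that avoids the deprecated option, assuming it is acceptable to leave a column untouched whenever any of its values fails to parse (the helper name to_numeric_or_keep is hypothetical):

```python
import pandas as pd


def to_numeric_or_keep(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a column to numbers, or return it unchanged."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        # Same effect as errors='ignore': unparseable columns stay as they are
        return col


df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
converted = df.apply(to_numeric_or_keep)
print(converted.dtypes)
```

Because the helper is an ordinary one-argument function, it works with apply() directly and behaves the same across pandas versions.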

My previously suggested way, using partial from the functools module, is more verbose:

    >>> from functools import partial
    >>> df = pd.DataFrame({'ints': ['3', '5'],
    ...                    'Words': ['Kobe', 'Bryant']})
    >>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 2 entries, 0 to 1
    Data columns (total 2 columns):
    Words    2 non-null object
    ints     2 non-null int64
    dtypes: int64(1), object(1)
    memory usage: 48.0+ bytes

The accepted answer's pd.to_numeric() approach produces float whenever a column needs it. Read closely, though, the question asks for converting every numeric column to integer. To get there, the accepted answer would still need a loop over all columns that converts the numbers to int at the end.
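A minimal sketch of that final step, under the assumption that a float column should only become int64 when every value in it is a whole number (otherwise the cast would silently truncate data). The helper name ints_where_possible is hypothetical:

```python
import pandas as pd


def ints_where_possible(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse numeric columns, then cast whole-number floats to int64."""
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            continue  # non-numeric column, leave it as is
        s = out[col]
        # NaN % 1 is NaN, so columns with missing values are skipped as well
        if s.dtype.kind == 'f' and (s % 1 == 0).all():
            out[col] = s.astype('int64')
    return out


df = pd.DataFrame({'ints': ['3', '5'], 'floats': ['2.0', '4.0'],
                   'fracs': ['45.8', '73.9'], 'Words': ['Kobe', 'Bryant']})
print(ints_where_possible(df).dtypes)
```

Here 'floats' ends up as int64 while 'fracs' stays float64, so no information is lost; the input DataFrame is left unmodified because the helper works on a copy.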

For completeness, this is also possible without pd.to_numeric(), although it is not recommended:

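    import pandas as pd

    df = pd.DataFrame({'a': ['1', '2'],
                       'b': ['45.8', '73.9'],
                       'c': [10.5, 3.7]})

    for col in df.columns:
        try:
            # Parse as float first (so '45.8' works), then truncate to int
            df[col] = df[col].astype(float).astype(int)
        except (ValueError, TypeError):
            # Skip columns that cannot be parsed as numbers
            pass

    print(df.dtypes)

Output:

    a    int32
    b    int32
    c    int32
    dtype: object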
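One caveat with the astype route is that converting float to int truncates toward zero instead of rounding, and plain int maps to NumPy's platform default integer (int32 on Windows with NumPy 1.x, which is likely where the int32 above comes from). A small sketch that makes the truncation visible and pins the width with an explicit 'int64':

```python
import pandas as pd

s = pd.Series(['45.8', '73.9', '-2.7'])

# astype(float) parses the strings; astype('int64') then drops the fraction
truncated = s.astype(float).astype('int64')
print(truncated.tolist())  # [45, 73, -2]

# Rounding first keeps the nearest whole number instead
rounded = s.astype(float).round().astype('int64')
print(rounded.tolist())  # [46, 74, -3]
```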

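EDITED: Note that this not-recommended solution is unnecessarily complicated; pd.to_numeric() accepts the keyword argument downcast='integer' (available since pandas 0.19.0), thanks to the commenter for pointing this out. Be aware that it only yields an integer dtype when every value in a column is a whole number; a column like 'b' above stays float64. This is still missing from the accepted answer, though.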
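A short sketch of what downcast='integer' does per column. It assumes pandas 0.19.0 or later and skips the non-numeric 'Words' column with an explicit try/except rather than errors='ignore', since the latter is deprecated in recent pandas:

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2'],
                   'b': ['45.8', '73.9'],
                   'Words': ['Kobe', 'Bryant']})

converted = {}
for col in df.columns:
    try:
        # Downcasts to the smallest signed integer dtype if all values are whole
        converted[col] = pd.to_numeric(df[col], downcast='integer')
    except (ValueError, TypeError):
        converted[col] = df[col]  # leave non-numeric columns unchanged

result = pd.DataFrame(converted)
print(result.dtypes)
```

Column 'a' becomes int8 (the smallest integer type that holds 1 and 2), while 'b' remains float64 because its values have fractional parts.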

