convert entire pandas dataframe to integers in pandas (0.17.0)

My question is very similar to this one, but I need to convert my entire dataframe instead of just a series. The to_numeric function only works on one series at a time and is not a good replacement for the deprecated convert_objects command. Is there a way to get similar results to the convert_objects(convert_numeric=True) command in the new pandas release?

Thank you Mike Müller for your example. df.apply(pd.to_numeric) works very well if the values can all be converted to integers. What if in my dataframe I had strings that could not be converted into integers? Example:

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
df.dtypes
Out[59]:
Words    object
ints     object
dtype: object

Then I could run the deprecated function and get:

df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[60]:
Words    object
ints      int64
dtype: object

Running the apply command gives me errors, even with try and except handling.

306

asked Jan 17 '16 22:01

Bobe Kryant

2 Answers

All columns convertible

You can apply the function to all columns:

df.apply(pd.to_numeric)

Example:

>>> df = pd.DataFrame({'a': ['1', '2'],
                       'b': ['45.8', '73.9'],
                       'c': [10.5, 3.7]})

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null object
b    2 non-null object
c    2 non-null float64
dtypes: float64(1), object(2)
memory usage: 64.0+ bytes

>>> df.apply(pd.to_numeric).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null int64
b    2 non-null float64
c    2 non-null float64
dtypes: float64(2), int64(1)
memory usage: 64.0 bytes

Not all columns convertible

pd.to_numeric has the keyword argument errors:

Signature: pd.to_numeric(arg, errors='raise')
Docstring:
Convert argument to a numeric type.

Parameters
----------
arg : list, tuple or array of objects, or Series
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If 'raise', then invalid parsing will raise an exception
    - If 'coerce', then invalid parsing will be set as NaN
    - If 'ignore', then invalid parsing will return the input

Setting it to ignore will return the column unchanged if it cannot be converted into a numeric type.
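To make the other two modes concrete, here is a small sketch on a single Series that mixes numbers and a word. The Series s and the variable names are made up for illustration; only pd.to_numeric and its errors argument come from the docstring above.

```python
import pandas as pd

# Hypothetical sample data: two numeric strings and one word.
s = pd.Series(['3', '5', 'Kobe'])

# errors='coerce' replaces every unparseable entry with NaN,
# which forces the result to a float dtype.
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.tolist())  # [3.0, 5.0, nan]

# errors='raise' (the default) stops at the first unparseable entry.
try:
    pd.to_numeric(s)
    raised = False
except ValueError as exc:
    raised = True
    print('raised:', exc)
```

Note that 'coerce' works per value while 'ignore' works per column: with 'coerce' a column of words becomes a column of NaN rather than staying untouched.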

As pointed out by Anton Protopopov, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric:

>>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
>>> df.apply(pd.to_numeric, errors='ignore').info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes

My previously suggested way, using partial from the module functools, is more verbose:

>>> from functools import partial
>>> df = pd.DataFrame({'ints': ['3', '5'],
                       'Words': ['Kobe', 'Bryant']})
>>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes
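A note for readers on recent pandas: as far as I know, errors='ignore' has been deprecated since pandas 2.2, with the advice to catch the exception yourself. A minimal sketch of that approach follows; the helper name to_numeric_where_possible is my own and not part of pandas.

```python
import pandas as pd

def to_numeric_where_possible(df):
    """Return a copy of df in which every fully convertible column is numeric.

    Hypothetical helper: columns that pd.to_numeric cannot parse are left as-is.
    """
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Column contains non-numeric values: keep the original data.
            pass
    return out

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
converted = to_numeric_where_possible(df)
print(converted.dtypes)
```

This gives the same per-column behaviour as errors='ignore' on the example frame: ints becomes an integer column and Words keeps its strings.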

105

answered Sep 24 '22 10:09

Mike Müller

The accepted answer with pd.to_numeric() returns a float column whenever the values require it, for example when they have a fractional part. Read closely, the question asks for every numeric column to end up as an integer. The accepted answer therefore still needs a final loop over all columns that casts the numbers to int.

Just for completeness, this is even possible without pd.to_numeric(); of course, this is not recommended:

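import pandas as pd

df = pd.DataFrame({'a': ['1', '2'],
                   'b': ['45.8', '73.9'],
                   'c': [10.5, 3.7]})

# Cast each column to float first, then truncate to int.
# Columns that cannot be parsed as numbers are left unchanged.
for i in df.columns:
    try:
        df[[i]] = df[[i]].astype(float).astype(int)
    except (ValueError, TypeError):
        pass

print(df.dtypes)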

Out:

a    int32
b    int32
c    int32
dtype: object

EDITED: Mind that this solution is not recommended and is unnecessarily complicated. As a comment pointed out (thank you), pd.to_numeric() accepts the keyword argument downcast='integer', which downcasts the result to the smallest integer dtype when the values allow it. This is still missing from the accepted answer, though.
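For comparison, here is a sketch of what downcast='integer' does on the same example frame, next to an explicit integer cast. The variable names downcast and forced are mine. As far as I can tell, downcast only shrinks a column to an integer dtype when every value is a whole number, so columns with fractional values stay float and need an explicit astype if truncation is really wanted.

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2'],
                   'b': ['45.8', '73.9'],
                   'c': [10.5, 3.7]})

# downcast='integer' is forwarded by apply() to pd.to_numeric.
# Column a holds whole numbers and becomes an integer column;
# b and c have fractional values and remain float.
downcast = df.apply(pd.to_numeric, downcast='integer')
print(downcast.dtypes)

# Forcing integers everywhere means truncating the fractional part.
forced = df.apply(pd.to_numeric).astype('int64')
print(forced)
```

The second variant silently turns 45.8 into 45, so it only makes sense when losing the fractional part is acceptable.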

answered Sep 21 '22 10:09

questionto42standswithUkraine

Tags: python, pandas