I am reading JSON files into DataFrames. The DataFrame might have some string (object) columns, some numeric columns (int64 and/or float64), and some datetime columns. When the data is read in, the dtype is often incorrect (i.e. datetime, int, and float values are often stored as object). I want to report on this possibility (i.e. a column is stored in the DataFrame as object (string), but it is actually a datetime).
The problem I have is that pd.to_numeric and pd.to_datetime will both evaluate and try to convert the column, and many times the result ends up depending on which of the two I call last. (I was going to use convert_objects(), which works, but that is deprecated, so I wanted a better option.)
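For example (a minimal sketch; the sample values are invented), some strings parse cleanly under both converters, which is exactly why the call order matters:

import pandas as pd

# These strings are valid input for both parsers, so the "inferred"
# type depends entirely on which conversion runs last.
s = pd.Series(["20130101", "20130102", "20130103"], dtype="object")

print(pd.to_numeric(s).dtype)   # int64 -- parsed as plain integers
print(pd.to_datetime(s).dtype)  # datetime64[ns] -- parsed as %Y%m%d dates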
The code I am using to evaluate a DataFrame column is below (I realize a lot of it is redundant, but I have written it this way for readability):
try:
    inferred_type = pd.to_datetime(df[Field_Name]).dtype
    if inferred_type == "datetime64[ns]":
        inferred_type = "DateTime"
except Exception:
    pass
try:
    # If this succeeds, it silently overwrites the datetime result above.
    inferred_type = pd.to_numeric(df[Field_Name]).dtype
    if inferred_type == int:
        inferred_type = "Integer"
    if inferred_type == float:
        inferred_type = "Float"
except Exception:
    pass
Pandas will correctly infer data types in many cases, and you can move on with your analysis without any further thought on the topic. Despite how well that works, at some point you will likely need to explicitly convert data from one type to another.
Note that pandas uses different names for data types than Python does, for example object for textual data. A DataFrame column can only have one data type, which can be checked with the Series.dtype attribute. To get the data types of all columns at once, use DataFrame.dtypes, which returns a Series containing the dtype of each column.
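As a quick illustration (the column names and values here are made up), checking dtypes and converting explicitly looks like this:

import pandas as pd

# Hypothetical frame standing in for JSON data that was read as strings.
df = pd.DataFrame({
    "amount": ["1.5", "2.0"],                  # numeric stored as object
    "created": ["2021-01-01", "2021-01-02"],   # datetime stored as object
})

print(df.dtypes)                               # both columns report 'object'
df["amount"] = pd.to_numeric(df["amount"])     # -> float64
df["created"] = pd.to_datetime(df["created"])  # -> datetime64[ns]
print(df.dtypes)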
I came across the same problem of having to figure out column types for incoming data where the type is not known beforehand (from a database read in my case). I couldn't find a good answer here on SO, or by reviewing the Pandas source code. I solved it using this function:
def _get_col_dtype(col):
    """
    Infer the datatype of a pandas column; process only if the column's
    dtype is object.

    input: col: a pandas Series representing a df column.
    """
    if col.dtype == "object":
        # try datetime first, then numeric, then timedelta
        try:
            col_new = pd.to_datetime(col.dropna().unique())
            return col_new.dtype
        except Exception:
            try:
                col_new = pd.to_numeric(col.dropna().unique())
                return col_new.dtype
            except Exception:
                try:
                    col_new = pd.to_timedelta(col.dropna().unique())
                    return col_new.dtype
                except Exception:
                    return "object"
    else:
        return col.dtype
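A sketch of how you might use it to produce the kind of report the question asks for (assuming df is the frame read from JSON):

# Flag columns whose stored dtype differs from the inferred one.
for col in df.columns:
    inferred = _get_col_dtype(df[col])
    if str(inferred) != str(df[col].dtype):
        print(f"{col}: stored as {df[col].dtype}, inferred as {inferred}")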