
Python - How to get data types for all columns in CSV file?

I am trying to get all data types from a CSV file for each column.
There is no documentation about data types in a file and manually checking will take a long time (it has 150 columns).

I started with this approach:

import pandas as pd
df = pd.read_csv('/tmp/file.csv')

>>> df.dtypes
a   int64
b   int64
c   object
d   float64

Is the above approach good enough, or is there a better way to figure out the data types?
Also, the file has 150 columns. When I type df.dtypes, I can see only 15 or so columns. How can I see them all?

Joe asked Mar 05 '23 10:03


1 Answer

Depending on the size of your file, you might be able to save some time by only reading in the first few rows, using the nrows argument of pd.read_csv:

df = pd.read_csv('/tmp/file.csv', nrows=25)

This only works if the types can be inferred correctly from the first n rows, so be careful: a column that looks numeric in the sample may contain text or missing values further down the file.
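To make that caveat concrete, here is a minimal sketch that compares the dtypes inferred from a sample with those inferred from the whole file. The CSV text and the variable names (csv_text, sample_types, full_types, mismatched) are made up for illustration; with a real file you would pass its path to pd.read_csv instead of the in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical CSV: column "b" looks like integers until the last row.
csv_text = "a,b\n1,10\n2,20\n3,thirty\n"

# Dtypes inferred from the first 2 rows only, then from the full data
sample_types = pd.read_csv(io.StringIO(csv_text), nrows=2).dtypes
full_types = pd.read_csv(io.StringIO(csv_text)).dtypes

# Columns whose inferred dtype changes once every row is read
mismatched = [
    col for col in full_types.index
    if str(sample_types[col]) != str(full_types[col])
]
print(mismatched)
```

If a check like this reports no mismatches on your file, the sampled dtypes are probably safe to rely on. Otherwise, the listed columns are the ones worth inspecting by hand.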

Once you have the data (or a subset of it) loaded into a DataFrame, you can view the types in a number of different ways. A few have been posted already, so I'll share another using a simple loop and items (the older iteritems was removed in pandas 2.0):

for name, dtype in df.dtypes.items():
    print(name, dtype)

a int64
b float64
c object
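On the second part of the question (seeing all 150 columns rather than a truncated listing), the loop above already prints every column. As a sketch of two alternatives, assuming a stand-in DataFrame with made-up column names col0 to col149 in place of the real file:

```python
import pandas as pd

# Hypothetical wide frame standing in for the 150-column file
df = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(150)})

# to_string() renders every entry, regardless of the display.max_rows limit
all_types = df.dtypes.to_string()
print(all_types)

# Alternatively, lift the row limit only inside this block
with pd.option_context("display.max_rows", None):
    print(df.dtypes)
```

The option_context form leaves the global display settings untouched once the block exits, which is handy in an interactive session.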
sjw answered Mar 08 '23 00:03