I am trying to get all data types from a CSV file for each column.
There is no documentation about data types in a file and manually checking will take a long time (it has 150 columns).
Started using this approach:
import pandas as pd

df = pd.read_csv('/tmp/file.csv')
df.dtypes
a      int64
b      int64
c     object
d    float64
Is the above approach good enough, or is there a better way to figure out the data types?
Also, the file has 150 columns. When I type df.dtypes
I can see only 15 or so columns. How can I see them all?
Depending on the size of your file, you might be able to save some time by reading in only the first few rows, using the nrows argument of pd.read_csv:
df = pd.read_csv('/tmp/file.csv', nrows=25)
This is only useful if you know for sure that the types can be correctly inferred from the first n rows, though, so be careful with this.
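One way to gain some confidence in that assumption is to compare the dtypes inferred from a small sample against those inferred from the full file (or a much larger sample). A minimal sketch, using inline sample data as a stand-in for '/tmp/file.csv':

```python
import io
import pandas as pd

# Inline CSV standing in for the real file on disk.
csv_data = "a,b,c\n1,1.5,x\n2,2.5,y\n3,3.5,z\n4,4.5,w\n"

# Read a small sample and the full data, then compare inferred dtypes.
small = pd.read_csv(io.StringIO(csv_data), nrows=2)
full = pd.read_csv(io.StringIO(csv_data))

# If the sample is representative, the two dtype Series match.
print(small.dtypes.equals(full.dtypes))
```

If this prints False, a column in the later rows (e.g. a numeric column with stray text values) was inferred differently, and you should not rely on the nrows shortcut for that file.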
Once you have the data (or a subset of it) loaded into a DataFrame, you can view the types in a number of different ways, a few of which have been posted already, but I'll share another using a simple loop and items() (note that iteritems() was removed in pandas 2.0):
for name, dtype in df.dtypes.items():
    print(name, dtype)
a int64
b float64
c object
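As for only seeing 15 or so of the 150 columns: pandas truncates the display of long Series (which is what df.dtypes returns) by default. A sketch of one way around that, using a stand-in DataFrame with 150 columns:

```python
import pandas as pd

# Stand-in for the real 150-column DataFrame from the question.
df = pd.DataFrame({f'col{i}': [1] for i in range(150)})

# Temporarily lift the display limit so the full dtypes Series prints.
with pd.option_context('display.max_rows', None):
    print(df.dtypes)
```

df.dtypes.to_string() also produces the untruncated listing, and the loop above sidesteps the display limit entirely since it prints each pair itself.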