I want to bring some data into a pandas DataFrame and I want to assign dtypes for each column on import. I want to be able to do this for larger datasets with many different columns, but, as an example:
myarray = np.random.randint(0, 5, size=(2, 2))
mydf = pd.DataFrame(myarray, columns=['a', 'b'], dtype=[float, int])
mydf.dtypes
results in:
TypeError: data type not understood
I tried a few other methods such as:
mydf = pd.DataFrame(myarray, columns=['a', 'b'], dtype={'a': int})

which raises:

TypeError: object of type 'type' has no len()
If I put dtype=(float, int), it applies float to both columns.
In the end I would like to just be able to pass it a list of datatypes the same way I can pass it a list of column names.
The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
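As a minimal sketch of that (the Series values here are made up for illustration):

>>> import pandas as pd
>>> pd.to_numeric(pd.Series(['1', '2', '3.5']))
0    1.0
1    2.0
2    3.5
dtype: float64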
In order to convert data types in pandas, there are three basic options: use astype() to force an appropriate dtype, write a custom function to convert the data, or use pandas functions such as to_numeric() or to_datetime().
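For example, a rough sketch of the astype() and to_datetime() routes (column names and values are illustrative only; the exact integer width can vary by platform):

>>> df = pd.DataFrame({'a': ['1', '2'], 'b': ['2019-01-01', '2019-06-15']})
>>> df['a'] = df['a'].astype(int)        # force an integer dtype
>>> df['b'] = pd.to_datetime(df['b'])    # parse strings into datetime64[ns]
>>> df.dtypes
a             int64
b    datetime64[ns]
dtype: object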
To check the data types in a pandas DataFrame we can use the dtypes attribute. It returns a Series with the data type of each column: the column names of the DataFrame form the index of the resulting Series and the corresponding data types are its values.
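A tiny illustration of that attribute, using made-up columns:

>>> pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]}).dtypes
a      int64
b    float64
dtype: object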
I just ran into this, and the pandas issue is still open, so I'm posting my workaround. Assuming df is my DataFrame and dtype is a dict mapping column names to types:
for k, v in dtype.items():
    df[k] = df[k].astype(v)
(note: use dtype.iteritems() in Python 2)
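A minimal usage sketch of that workaround, with a hypothetical dtype mapping (the column names here are illustrative, not from the question):

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 5, size=(2, 2)), columns=['a', 'b'])
>>> dtype = {'a': float, 'b': int}    # map column name -> target dtype
>>> for k, v in dtype.items():
...     df[k] = df[k].astype(v)
>>> df.dtypes
a    float64
b      int64
dtype: object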
For reference:

dtypes: https://docs.scipy.org/doc/numpy-1.12.0/reference/arrays.dtypes.html
category: http://pandas.pydata.org/pandas-docs/stable/categorical.html

As of pandas version 0.24.2 (the current stable release) it is not possible to pass an explicit list of datatypes to the DataFrame constructor, as the docs state:
dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer.
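(A quick sketch of what the constructor does accept, reusing myarray from the question: a single dtype applied to every column works,

>>> pd.DataFrame(myarray, columns=['a', 'b'], dtype=float).dtypes
a    float64
b    float64
dtype: object

whereas a list or dict of dtypes is rejected, as the question shows.)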
However, the DataFrame class does have a classmethod, from_records, that lets you convert a NumPy structured array to a DataFrame, so you can do:
>>> myarray = np.random.randint(0, 5, size=(2, 2))
>>> # build a structured array with one dtype per named field
>>> record = np.array(list(map(tuple, myarray)), dtype=[('a', np.float64), ('b', np.int64)])
>>> mydf = pd.DataFrame.from_records(record)
>>> mydf.dtypes
a    float64
b      int64
dtype: object
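On newer pandas (0.21 and later, if I remember the version correctly) a shorter route is to build the frame and then chain astype() with a dict of per-column dtypes, which amounts to the same thing as the loop workaround above:

>>> mydf = pd.DataFrame(myarray, columns=['a', 'b']).astype({'a': float, 'b': int})
>>> mydf.dtypes
a    float64
b      int64
dtype: object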