
How to set dtypes by column in pandas DataFrame

I want to bring some data into a pandas DataFrame and I want to assign dtypes for each column on import. I want to be able to do this for larger datasets with many different columns, but, as an example:

myarray = np.random.randint(0,5,size=(2,2))
mydf = pd.DataFrame(myarray,columns=['a','b'], dtype=[float,int])
mydf.dtypes

results in:

TypeError: data type not understood

I tried a few other methods such as:

mydf = pd.DataFrame(myarray,columns=['a','b'], dtype={'a': int}) 

TypeError: object of type 'type' has no len()

If I put dtype=(float,int) it applies a float format to both columns.

In the end I would like to just be able to pass it a list of datatypes the same way I can pass it a list of column names.

Chris asked Sep 01 '14 17:09




2 Answers

I just ran into this, and the pandas issue is still open, so I'm posting my workaround. Assuming df is my DataFrame and dtype is a dict mapping column names to types:

for k, v in dtype.items():
    df[k] = df[k].astype(v)

(Note: use dtype.iteritems() in Python 2.)
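The same per-column cast can usually be done in one call, since DataFrame.astype also accepts a dict mapping column names to dtypes. A minimal sketch, assuming a reasonably recent pandas; the column names and target types below are only illustrative:

```python
import numpy as np
import pandas as pd

myarray = np.random.randint(0, 5, size=(2, 2))
df = pd.DataFrame(myarray, columns=["a", "b"])

# Illustrative mapping of column name -> target dtype
dtype = {"a": "float64", "b": "int64"}

# astype returns a new DataFrame; columns not named in the dict keep their dtype
df = df.astype(dtype)
print(df.dtypes)
```

Unlike the loop, this does not modify the original frame in place, so the result has to be assigned back.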

For reference:

  • The list of allowed data types (NumPy dtypes): https://docs.scipy.org/doc/numpy-1.12.0/reference/arrays.dtypes.html
  • Pandas also supports some other types. E.g., category: http://pandas.pydata.org/pandas-docs/stable/categorical.html
  • The relevant GitHub issue: https://github.com/pandas-dev/pandas/issues/9287
mattexx answered Sep 30 '22 22:09


As of pandas version 0.24.2 (the current stable release at the time of writing), it is not possible to pass an explicit list of datatypes to the DataFrame constructor, as the docs state:

dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer

However, the DataFrame class does have a class method, from_records, that converts a NumPy structured array to a DataFrame, so you can do:

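>>> import numpy as np
>>> import pandas as pd
>>> myarray = np.random.randint(0, 5, size=(2, 2))
>>> # list(...) is needed on Python 3, where map returns an iterator;
>>> # np.float and np.int were removed from NumPy, so use explicit widths
>>> record = np.array(list(map(tuple, myarray)), dtype=[('a', np.float64), ('b', np.int64)])
>>> mydf = pd.DataFrame.from_records(record)
>>> mydf.dtypes
a    float64
b      int64
dtype: object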
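If you would rather pass the types as a list alongside the column names, as the question asks, a small wrapper is one option. This is only a sketch: frame_with_dtypes is a hypothetical helper name, not part of pandas, and it builds the frame first and casts afterwards rather than setting dtypes during construction.

```python
import numpy as np
import pandas as pd


def frame_with_dtypes(data, columns, dtypes):
    """Hypothetical helper: build a DataFrame, then cast each column."""
    if len(columns) != len(dtypes):
        raise ValueError("columns and dtypes must have the same length")
    df = pd.DataFrame(data, columns=columns)
    # Pair each column name with its dtype and cast in one astype call
    return df.astype(dict(zip(columns, dtypes)))


myarray = np.random.randint(0, 5, size=(2, 2))
mydf = frame_with_dtypes(myarray, ["a", "b"], ["float64", "int64"])
print(mydf.dtypes)
```

Because the cast happens after construction, the data is copied once more than with the structured-array route, which may matter for very large inputs.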
user545424 answered Sep 30 '22 23:09