python pandas create dataframe and force multiple column types

Tags:

python

pandas

I was able to create a DataFrame and force a single data type with:

import pandas as pd
test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=int)

But I want to specify a type for each column. How can I do this? I tried the following, which doesn't work: the resulting dtypes are object, and column b is not cast to integers.

test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=[('a', int),('b', int)])

Jeff helped with the above case. But I found another problem: when I create an empty DataFrame, I also want to be able to specify the column types. For a single type across all columns, I could do:

test = pd.DataFrame(columns=['a','b'], dtype=int)

What if I want to specify a different type for each of 'a' and 'b'?

user3461238 asked Mar 25 '14 18:03


2 Answers

You can pass in a dictionary of numpy arrays with specified dtypes; this works for creating both filled and empty DataFrames. (This answer is a slight adaptation of my answer here.)

Here's an empty DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a' : np.array([], dtype=int),
                        'b' : np.array([], dtype=float)
                       }
                 )

Here's a filled DataFrame:

df = pd.DataFrame(data={'a' : np.array([1,2,3], dtype=int),
                        'b' : np.array([4,5,6], dtype=float)
                       }
                 )

You can use basically any type for dtype, such as object, str, datetime.datetime or CrazyClassYouDefined. That said, if pandas doesn't specifically support a type (such as str), it will fall back to treating that column as object. Don't worry though, everything should still work.
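As a rough sketch of how the dictionary-of-arrays idea could be wrapped up for the empty case: the helper name `empty_frame` and its `schema` argument are made up for illustration and are not part of pandas or numpy. The dtypes are given as explicit strings ('int64', 'float64') so the result does not depend on the platform's default integer size.

```python
import numpy as np
import pandas as pd

# Hypothetical helper (not a pandas API): build an empty DataFrame from a
# mapping of column name -> dtype by creating one empty numpy array per column.
def empty_frame(schema):
    return pd.DataFrame({name: np.array([], dtype=dtype)
                         for name, dtype in schema.items()})

df = empty_frame({'a': 'int64', 'b': 'float64'})
print(df.dtypes)   # per-column dtypes
print(len(df))     # number of rows
```

Rows appended later may still change a column's dtype (for example if missing values are introduced), so this only fixes the types of the frame as created.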

Eric G. answered Oct 04 '22 11:10


You can pass in a Series for each column; the Series constructor takes a dtype parameter:

In [15]: pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}).dtypes
Out[15]: 
a      int64
b    float64
dtype: object

In [16]: pd.DataFrame({'a':pd.Series([1,2,3],dtype='int32'), 'b':pd.Series([1.1,2.1,3.1],dtype='float32')}).dtypes
Out[16]: 
a      int32
b    float32
dtype: object
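The same Series approach should also cover the empty-DataFrame case from the question. A minimal sketch, assuming that a Series created with only a dtype (and no data) is acceptable as an empty column:

```python
import pandas as pd

# One empty Series per column, each carrying its own dtype.
df = pd.DataFrame({'a': pd.Series(dtype='int32'),
                   'b': pd.Series(dtype='float32')})
print(df.dtypes)   # per-column dtypes of the empty frame
print(len(df))     # number of rows
```

Passing the dtype explicitly also avoids relying on whatever default dtype pandas would pick for an empty Series.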
Jeff answered Oct 04 '22 11:10