I was able to create dataframe and force one data type by
import pandas as pd
test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=int)
But I want to specify a type for each column. How can I do this? I tried the following, which doesn't work: the resulting dtypes are object, and column b is not cast to integer.
test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=[('a', int),('b', int)])
Jeff helped with the above case. But I found another problem when I try to create an empty dataframe where I want to be able to specify column types. For a single type across columns, I could do
test = pd.DataFrame(columns=['a','b'], dtype=int)
What if I want to specify type for each of 'a' and 'b'?
You can pass in a dictionary of numpy arrays with specified dtypes; this works for creating both filled and empty DataFrames. (This answer is a slight adaptation of my answer here.)
Here's an empty DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': np.array([], dtype=int),
                        'b': np.array([], dtype=float)})
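As a quick sanity check (a sketch assuming numpy and pandas are imported as np and pd), the per-column dtypes survive even with zero rows:

```python
import numpy as np
import pandas as pd

# Empty arrays still carry their dtype into the DataFrame
df = pd.DataFrame(data={'a': np.array([], dtype=int),
                        'b': np.array([], dtype=float)})
print(df.dtypes)  # column a has an integer dtype, b is float64
print(len(df))    # the frame has 0 rows
```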
Here's a filled one:
df = pd.DataFrame(data={'a': np.array([1, 2, 3], dtype=int),
                        'b': np.array([4, 5, 6], dtype=float)})
And you can use basically any type for dtype, such as object, str, datetime.datetime, or CrazyClassYouDefined. That said, if pandas doesn't specifically support a type (such as str), pandas will fall back to treating that column as object. Don't worry though, everything should still work.
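To illustrate the fallback behavior (a sketch; CrazyClassYouDefined here is just the placeholder class named above), a datetime column gets a native pandas dtype while an unsupported class lands in an object column:

```python
import datetime
import numpy as np
import pandas as pd

class CrazyClassYouDefined:
    """Stand-in for any user-defined class."""
    pass

df = pd.DataFrame(data={
    # Python datetimes convert to pandas' native datetime64[ns] dtype
    'when': np.array([datetime.datetime(2020, 1, 1)], dtype='datetime64[ns]'),
    # Arbitrary objects fall back to the generic object dtype
    'what': np.array([CrazyClassYouDefined()], dtype=object),
})
print(df.dtypes)  # when: datetime64[ns], what: object
```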
You can pass in a Series, which has a dtype parameter:
In [15]: pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}).dtypes
Out[15]:
a int64
b float64
dtype: object
In [16]: pd.DataFrame({'a':pd.Series([1,2,3],dtype='int32'), 'b':pd.Series([1.1,2.1,3.1],dtype='float32')}).dtypes
Out[16]:
a int32
b float32
dtype: object
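The Series approach also covers the empty-DataFrame part of the question: pass a data-less Series per column with the dtype you want (a sketch, not from the original answer):

```python
import pandas as pd

# An empty, typed Series per column gives a zero-row frame
# with per-column dtypes
test = pd.DataFrame({'a': pd.Series(dtype='int32'),
                     'b': pd.Series(dtype='float32')})
print(test.dtypes)  # a: int32, b: float32
print(len(test))    # 0 rows
```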