I'd like to declare different types for the columns of a pandas DataFrame at instantiation:
frame = pandas.DataFrame({..some data..}, dtype=[str, int, int])
This works if dtype is a single type (e.g. dtype=float), but not with multiple types as above. Is there a way to do this?
The common solution seems to be to cast later:
frame['some column'] = frame['some column'].astype(float)
but this adds an extra casting step after creation, which is what I was hoping to avoid.
Note that pandas uses different names for data types than Python does (for example, object for textual data), and a column in a DataFrame can only have one data type.
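A minimal sketch of that naming difference (the column names here are made up for illustration):

```python
import pandas as pd

# A column of Python strings is reported with the pandas dtype "object",
# not "str"; the integer column gets a NumPy integer dtype.
df = pd.DataFrame({'name': ['a', 'b'], 'count': [1, 2]})
print(df.dtypes['name'])   # object
print(df.dtypes['count'])  # an integer dtype such as int64
```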
You can also change column types after the fact by passing pandas.to_numeric, pandas.to_datetime, or pandas.to_timedelta to DataFrame.apply(), converting one or more columns to numeric, datetime, or timedelta respectively.
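For instance (a minimal sketch; the column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '2'], 'b': ['2021-01-01', '2021-06-15']})

# apply() passes each selected column to the converter function
df[['a']] = df[['a']].apply(pd.to_numeric)
df['b'] = pd.to_datetime(df['b'])

print(df.dtypes)  # 'a' becomes numeric, 'b' becomes datetime64[ns]
```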
You can also build a DataFrame from multiple Series objects, each with its own dtype, by adding each Series as a column. The concat() method merges several Series into a DataFrame: pass it a list of the Series to combine and axis=1 to lay them out as columns instead of rows.
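A short sketch of that approach (Series names become the column names):

```python
import pandas as pd

# Each Series carries its own dtype into the resulting DataFrame
s1 = pd.Series([1, 2, 3], dtype=int, name='id')
s2 = pd.Series(['a', 'b', 'c'], name='label')

df = pd.concat([s1, s2], axis=1)
print(df.dtypes)  # id is an integer dtype, label is object
```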
(A Series is a one-dimensional labeled array capable of holding data of any type: integers, strings, floats, Python objects, etc.)
You can also create a NumPy array with specific dtypes and then convert it to DataFrame.
import numpy as np
import pandas as pd

# Structured array: per-field dtypes ('i4' int32, 'f4' float32, 'a10' 10-byte string)
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'a10')])
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
pd.DataFrame(data)
See "From structured or record array" in the pandas documentation.
As an alternative, you can specify the dtype for each column by creating the Series objects first.
In [2]: df = pd.DataFrame({'x': pd.Series(['1.0', '2.0', '3.0'], dtype=float), 'y': pd.Series(['1', '2', '3'], dtype=int)})
In [3]: df
Out[3]:
x y
0 1 1
1 2 2
2 3 3
[3 rows x 2 columns]
In [4]: df.dtypes
Out[4]:
x float64
y int64
dtype: object