initialize pandas DataFrame with defined dtypes

The pd.DataFrame docstring describes dtype as a single (scalar) argument for the whole DataFrame:

dtype : dtype, default None
    Data type to force, otherwise infer
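A minimal sketch of what the scalar form does: one dtype is applied to every column, so it cannot express per-column types. The column names come from the question; float64 is chosen only for illustration.

```python
import pandas as pd

# A single dtype passed to the constructor is applied to every column.
df = pd.DataFrame([], columns=["chr", "centre", "seq_binary"], dtype="float64")
print(df.dtypes)  # every column reports float64
```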

It does indeed seem to be intended as a scalar, as both of the following lead to an error:

import numpy as np
import pandas as pd

# Each of these fails: dtype accepts a single type, not a list of per-column types
dfbinseq = pd.DataFrame([],
                        columns=["chr", "centre", "seq_binary"],
                        dtype=["O", np.int64, "O"])

dfbinseq = pd.DataFrame([],
                        columns=["chr", "centre", "seq_binary"],
                        dtype=[object, np.int64, object])
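A small sketch confirming the failure, assuming a recent pandas and NumPy (where pd.np no longer exists, so numpy is imported directly). pandas passes the list to NumPy's dtype parser, which treats a list as a structured-dtype description and raises TypeError; the exact message may differ between versions.

```python
import numpy as np
import pandas as pd

columns = ["chr", "centre", "seq_binary"]

try:
    # A list of dtypes is not a valid dtype, so the constructor rejects it.
    pd.DataFrame([], columns=columns, dtype=["O", np.int64, "O"])
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```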

The only workaround I found for creating an empty DataFrame with the right column types (I need to put it in an HDF5 store for further appends) was to change the column's dtype afterwards:

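# after creating dfbinseq without a dtype argument, cast the one non-object column
dfbinseq["centre"] = dfbinseq["centre"].astype(np.int64)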

Is there a way to set the dtypes of all columns at once?

Dima Lituiev asked Jul 07 '16 00:07




1 Answer

You can set the dtype on each column's Series and build the DataFrame from a dict of them:

import pandas as pd

df = pd.DataFrame({'A':pd.Series([], dtype='str'),
                   'B':pd.Series([], dtype='int'),
                   'C':pd.Series([], dtype='float')})

print (df)
Empty DataFrame
Columns: [A, B, C]
Index: []

print (df.dtypes)
A     object
B      int32
C    float64
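dtype: object

The int32 for column B is platform-dependent: dtype='int' maps to NumPy's default integer, which is 32-bit on Windows with NumPy 1.x and 64-bit on Linux, macOS, and NumPy 2. Use dtype='int64' to get the same result everywhere.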
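The same pattern generalizes to any number of columns. A minimal sketch, not part of the original answer: empty_frame and schema are hypothetical names, and the column names and dtypes are taken from the question.

```python
import numpy as np
import pandas as pd


def empty_frame(schema):
    """Build an empty DataFrame whose columns have the given dtypes.

    schema maps column label -> dtype (hypothetical helper for illustration).
    """
    # One empty, typed Series per column; dict order sets the column order.
    return pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in schema.items()})


schema = {"chr": object, "centre": np.int64, "seq_binary": object}
df = empty_frame(schema)
print(df.dtypes)
```

Using explicit widths such as np.int64 in the mapping avoids the platform-dependent result of a bare 'int'.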

With data:

df = pd.DataFrame({'A':pd.Series([1,2,3], dtype='str'),
                   'B':pd.Series([4,5,6], dtype='int'),
                   'C':pd.Series([7,8,9], dtype='float')})

print (df)
   A  B    C
0  1  4  7.0
1  2  5  8.0
2  3  6  9.0

print (df.dtypes)
A     object
B      int32
C    float64
dtype: object
jezrael answered Sep 29 '22 12:09
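An alternative not covered by the answer: DataFrame.astype also accepts a dict mapping column labels to dtypes, so all column types can be set in one call right after construction. A sketch using the question's columns; the sample values for the populated frame are made up for illustration.

```python
import numpy as np
import pandas as pd

columns = ["chr", "centre", "seq_binary"]
dtypes = {"chr": object, "centre": np.int64, "seq_binary": object}

# Empty frame: create it with the column labels, then cast every column in one call.
empty = pd.DataFrame(columns=columns).astype(dtypes)

# Populated frame: the same mapping works after construction.
data = pd.DataFrame(
    {"chr": ["chr1", "chr2"], "centre": [100, 200], "seq_binary": ["0110", "1001"]}
).astype(dtypes)

print(empty.dtypes)
print(data.dtypes)
```

Because the cast happens after construction, a column containing missing values cannot be cast to int64 this way; the Series-based construction avoids that step for empty frames.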