Python Pandas, create empty DataFrame specifying column dtypes

There is one thing that I find myself having to do quite often, and it surprises me how difficult it is to achieve in Pandas. Suppose I need to create an empty DataFrame with a specified index type and name, and specified column types and names. (I might want to fill it later, in a loop for example.) The easiest way I have found is to create an empty pandas.Series object for each column, specifying its dtype, put the Series into a dictionary keyed by column name, and pass the dictionary to the DataFrame constructor. Something like the following.

import pandas

def create_empty_dataframe():
    # Empty index with the desired name and dtype.
    index = pandas.Index([], name="id", dtype=int)
    column_names = ["name", "score", "height", "weight"]
    # One empty Series per column, each carrying that column's dtype.
    series = [pandas.Series(dtype=str), pandas.Series(dtype=int), pandas.Series(dtype=float), pandas.Series(dtype=float)]
    columns = dict(zip(column_names, series))
    # columns=column_names is required because the dictionary will in general put the columns in arbitrary order.
    return pandas.DataFrame(columns, index=index, columns=column_names)

First question. Is the above really the simplest way of doing this? There are so many things that are convoluted about this. What I really want to do, and what I'm pretty sure a lot of people really want to do, is something like the following.

df = pandas.DataFrame(columns=["id", "name", "score", "height", "weight"], dtypes=[int, str, int, float, float], index_column="id") 

Second question. Is this sort of syntax at all possible in Pandas? If not, are the devs considering supporting something like this at all? It feels to me that it really ought to be as simple as this (the above syntax).
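
The call wished for above is not part of the pandas API. A minimal sketch of a wrapper with a similar shape is given here; `empty_frame` and its parameters are hypothetical names chosen for illustration, and explicit dtype strings are assumed so that the result does not depend on the platform's default integer width.

```python
import pandas as pd


def empty_frame(columns, dtypes, index_column=None):
    """Hypothetical helper (not part of pandas): build an empty, typed DataFrame."""
    if len(columns) != len(dtypes):
        raise ValueError("columns and dtypes must have the same length")
    # One empty Series per column; each Series carries its own dtype.
    data = {name: pd.Series(dtype=dtype) for name, dtype in zip(columns, dtypes)}
    df = pd.DataFrame(data)
    if index_column is not None:
        # Move the chosen column into the index; its name and dtype carry over.
        df = df.set_index(index_column)
    return df


df = empty_frame(
    columns=["id", "name", "score", "height", "weight"],
    dtypes=["int64", "object", "int64", "float64", "float64"],
    index_column="id",
)
print(df.dtypes)
print(df.index)
```

This is the same dictionary-of-Series idea as in the function above, only with the names and dtypes passed as parallel lists.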

Ray asked Jul 22 '16 10:07



1 Answer

Unfortunately, the DataFrame constructor accepts only a single dtype descriptor for the whole frame. However, you can cheat a little by using read_csv on an empty buffer:

In [143]:
import pandas as pd
import io
cols=["id", "name", "score", "height", "weight"]
df = pd.read_csv(io.StringIO(""), names=cols, dtype=dict(zip(cols,[int, str, int, float, float])), index_col=['id']) 
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 4 columns):
name      0 non-null object
score     0 non-null int32
height    0 non-null float64
weight    0 non-null float64
dtypes: float64(2), int32(1), object(1)
memory usage: 0.0+ bytes

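So you can see that the dtypes are as desired and that the index is set as desired (score shows as int32 here presumably because int maps to the platform's default integer width; an explicit dtype such as "int64" in the dict would pin it):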

In [145]:

df.index
Out[145]:
Int64Index([], dtype='int64', name='id')
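
As an alternative that avoids the CSV parser, the same empty frame can probably be built by creating the columns first and casting them with astype. This is a minimal sketch, not the approach used above; the `schema` dict is a name introduced here for illustration, and it uses explicit dtype strings rather than the Python types from the question.

```python
import pandas as pd

# Column name -> dtype, in the desired column order.
# Explicit dtype strings avoid depending on the platform's default int width.
schema = {
    "id": "int64",
    "name": "object",
    "score": "int64",
    "height": "float64",
    "weight": "float64",
}

# Start from an empty all-object frame, cast each column, then promote "id" to the index.
df = pd.DataFrame(columns=list(schema)).astype(schema).set_index("id")

print(df.dtypes)
print(df.index)
```

Because astype takes a {column: dtype} mapping, the per-column types that the constructor does not accept are applied in one extra step.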
EdChum answered Sep 18 '22 06:09