Python Pandas, create empty DataFrame specifying column dtypes

There is one thing that I find myself having to do quite often, and it surprises me how difficult it is to achieve in Pandas. Suppose I need to create an empty DataFrame with a specified index type and name, and specified column types and names. (I might want to fill it later, in a loop for example.) The easiest way I have found is to create an empty pandas.Series object for each column with its dtype specified, put the Series into a dictionary keyed by column name, and pass the dictionary to the DataFrame constructor. Something like the following.

import pandas

def create_empty_dataframe():
    # Empty integer index named "id".
    index = pandas.Index([], name="id", dtype=int)
    column_names = ["name", "score", "height", "weight"]
    # One empty Series per column, each carrying the desired dtype.
    series = [pandas.Series(dtype=str), pandas.Series(dtype=int), pandas.Series(dtype=float), pandas.Series(dtype=float)]
    columns = dict(zip(column_names, series))
    # columns=column_names pins the column order, because the dictionary will in general not preserve it (older Python versions).
    return pandas.DataFrame(columns, index=index, columns=column_names)

First question. Is the above really the simplest way of doing this? There are so many things that are convoluted about this. What I really want to do, and what I'm pretty sure a lot of people really want to do, is something like the following.

df = pandas.DataFrame(columns=["id", "name", "score", "height", "weight"], dtypes=[int, str, int, float, float], index_column="id")

Second question. Is this sort of syntax at all possible in Pandas? If not, are the devs considering supporting something like this at all? It feels to me that it really ought to be as simple as this (the above syntax).

asked Jul 22 '16 10:07

Ray

1 Answer

Unfortunately, the DataFrame constructor accepts only a single dtype for the whole frame. However, you can cheat a little by using read_csv on an empty buffer:

In [143]:
import pandas as pd
import io
cols = ["id", "name", "score", "height", "weight"]
dtypes = dict(zip(cols, [int, str, int, float, float]))
df = pd.read_csv(io.StringIO(""), names=cols, dtype=dtypes, index_col=["id"])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 4 columns):
name      0 non-null object
score     0 non-null int32
height    0 non-null float64
weight    0 non-null float64
dtypes: float64(2), int32(1), object(1)
memory usage: 0.0+ bytes

So you can see that the dtypes are as desired and that the index is set as desired:

In [145]:

df.index
Out[145]:
Int64Index([], dtype='int64', name='id')
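
If going through the CSV parser feels too indirect, the dictionary-of-empty-Series approach from the question can be wrapped in a small helper that approximates the call signature the question asks for. This is only a minimal sketch: `empty_frame` and its parameters `columns`, `dtypes` and `index_column` are hypothetical names chosen for illustration and are not part of pandas. Note that `str` columns are stored with a generic string-capable dtype (typically `object`), and the exact integer width can vary by platform.

```python
import pandas as pd


def empty_frame(columns, dtypes, index_column=None):
    """Hypothetical helper (not a pandas API): empty DataFrame with per-column dtypes."""
    # One empty Series per column, each created with the requested dtype.
    data = {name: pd.Series(dtype=dtype) for name, dtype in zip(columns, dtypes)}
    # Passing columns= keeps the column order explicit.
    df = pd.DataFrame(data, columns=list(columns))
    if index_column is not None:
        # Move the chosen column into the index; its name becomes the index name.
        df = df.set_index(index_column)
    return df


df = empty_frame(
    ["id", "name", "score", "height", "weight"],
    [int, str, int, float, float],
    index_column="id",
)
print(df.dtypes)
print(df.index.name)
```

Rows appended later should still be checked for dtype drift, since some operations on an empty frame can upcast columns.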

answered Sep 18 '22 06:09

EdChum

Tags: python, pandas, dataframe
