Is there a way to copy only the structure (not the data) of a Pandas DataFrame?

I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the original DataFrame was created as follows:

import pandas as pd
df1 = pd.DataFrame([[11,12],[21,22]], columns=['c1','c2'], index=['i1','i2'])

So far I have copied the structure by explicitly passing the columns and the index:

df2 = pd.DataFrame(columns=df1.columns, index=df1.index)    

I don't want to copy the data, otherwise I could just write df2 = df1.copy(). In other words, once df2 is created it must contain only NaN elements:

In [1]: df1
Out[1]: 
    c1  c2
i1  11  12
i2  21  22

In [2]: df2
Out[2]: 
     c1   c2
i1  NaN  NaN
i2  NaN  NaN
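A minimal check of the constructor approach, assuming a reasonably recent pandas version: the new frame has the same labels as df1 and every cell is NaN. One side effect worth knowing is that, with no data to infer a type from, pandas gives the new columns dtype object rather than df1's int64.

```python
import pandas as pd

# The original frame from the question.
df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Copy only the labels: same columns and index, no data.
df2 = pd.DataFrame(columns=df1.columns, index=df1.index)

# The structure matches the original...
assert df2.columns.equals(df1.columns)
assert df2.index.equals(df1.index)
# ...and every cell is missing.
assert df2.isna().all().all()

# With no data to infer from, the columns are created with dtype object.
print(df2.dtypes)
```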

Is there a more idiomatic way of doing it?

bmello asked Dec 14 '14 at 08:12

3 Answers

That's a job for reindex_like. Start with the original:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

Construct an empty DataFrame and reindex it like df1:

pd.DataFrame().reindex_like(df1)
Out: 
    c1  c2
i1 NaN NaN
i2 NaN NaN   
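A runnable sketch of the reindex_like approach, assuming a reasonably recent pandas version. An empty DataFrame has none of df1's labels, so reindexing it like df1 adds all of them and fills every cell with NaN. Because int64 cannot hold NaN, df1's integer dtypes are not carried over.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# An empty frame has no labels, so every label taken from df1 is new
# and its cells are filled with NaN.
df2 = pd.DataFrame().reindex_like(df1)

print(df2)
# df1's int64 dtypes are not preserved, since the new cells hold NaN.
print(df2.dtypes)
```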
ayhan answered Oct 08 '22 at 17:10

As of pandas 0.18, the DataFrame constructor has no option for creating a DataFrame like another DataFrame with NaN in place of the values.

The code you use, df2 = pd.DataFrame(columns=df1.columns, index=df1.index), is the most logical way. The only improvement I can suggest is to spell out even more explicitly what you are doing by adding data=None, so that other coders immediately see that you are intentionally leaving the data out of the new DataFrame.

TL;DR: my suggestion is:

Explicit is better than implicit

df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

Very much like yours, but more spelled out.
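Since data=None is the constructor's default, the explicit form builds exactly the same frame as the implicit one; the extra argument only documents intent. A small sketch confirming this, assuming pandas 0.20 or later (for pd.testing):

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# data=None is the default, so both calls build the same frame;
# the explicit version just makes the intent visible to readers.
implicit = pd.DataFrame(columns=df1.columns, index=df1.index)
explicit = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

# Raises AssertionError if the two frames differ in labels, dtypes or values.
pd.testing.assert_frame_equal(implicit, explicit)
print(explicit)
```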

firelynx answered Oct 08 '22 at 17:10

This doesn't exactly answer this question, but it answers a similar one, for people arriving here via a search engine.

My case was creating a copy of the DataFrame without data and without any index entries (rows). You can achieve this as follows, and it preserves the dtypes of the columns:

empty_copy = df.drop(df.index)
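A minimal sketch of the drop approach. The frame df below is a hypothetical example with mixed column types, chosen only to show that the dtypes survive: dropping every index label removes all rows while keeping the columns and their dtypes.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes (int, float, text).
df = pd.DataFrame({'a': [1, 2], 'b': [1.5, 2.5], 'c': ['x', 'y']})

# Dropping every index label removes all rows but keeps the columns and dtypes.
empty_copy = df.drop(df.index)

print(empty_copy.shape)  # (0, 3)
print(empty_copy.dtypes)
```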
Martijn Lentink answered Oct 08 '22 at 15:10