Creating a zero-filled pandas data frame

You can try this:

d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
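As a minimal sketch of the call above: `data` and `feature_list` are not defined in the answer, so the values below are hypothetical stand-ins chosen only to make the snippet runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the `data` and `feature_list` used above.
data = [[1.5, 2.0], [3.0, 4.5], [5.0, 6.5]]
feature_list = ["height", "weight"]

# A scalar first argument is broadcast to every cell of the frame.
d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)

# Passing 0.0 instead of 0 gives float columns rather than integer ones.
d_float = pd.DataFrame(0.0, index=np.arange(len(data)), columns=feature_list)

print(d.shape)
print(d.dtypes.tolist(), d_float.dtypes.tolist())
```

Whether to pass 0 or 0.0 depends on the dtype you want the columns to have later on.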

In my opinion, it's best to do this with NumPy:

import numpy as np
import pandas as pd
# N_rows and N_cols are the desired number of rows and columns
d = pd.DataFrame(np.zeros((N_rows, N_cols)))

If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:

df_zeros = df * 0

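If the existing data frame contains NaNs or non-numeric values, multiplying by zero will not give zeros in those cells. You can instead apply a function to each cell that just returns 0 (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):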

df_zeros = df.applymap(lambda x: 0)
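To illustrate why multiplying by zero is not safe for every frame, here is a small sketch with a made-up frame containing a NaN and an infinity. It also shows one alternative that does not apply a function per cell: building the zero frame from the existing index and columns. The frame contents are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN and one infinite value.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [np.inf, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# NaN * 0 and inf * 0 are both NaN, so these cells do not become zero.
by_mult = df * 0
n_missing = int(by_mult.isna().sum().sum())
print(n_missing)

# Reusing the labels avoids depending on the cell contents at all.
df_zeros = pd.DataFrame(0, index=df.index, columns=df.columns)
print(df_zeros)
```

This variant ignores the original dtypes, so every column comes out as an integer column of zeros.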

Similar to @Shravan, but without the use of numpy:

height = 10
width = 20
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

Then you can do whatever you want with it:

post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)

If you already have a DataFrame, this was the fastest approach in my timings:

In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 µs per loop

Compare to:

In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 µs per loop

In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 µs per loop
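The template-based construction timed above can be wrapped in a small helper. This is only a sketch: `zeros_like_frame` is a hypothetical name, and it assumes the template holds numeric data, since np.zeros_like is applied to the underlying array.

```python
import numpy as np
import pandas as pd


def zeros_like_frame(template):
    """Return a zero-filled frame with the template's index and columns.

    Hypothetical helper; assumes `template` contains only numeric columns.
    """
    return pd.DataFrame(
        np.zeros_like(template.to_numpy()),
        index=template.index,
        columns=template.columns,
    )


columns = ["col{}".format(i) for i in range(10)]
orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)

d = zeros_like_frame(orig_df)
print(d.shape)
```

The helper builds a new array, so the original frame is left unchanged. Timings such as the ones above vary with pandas and NumPy versions, so it is worth re-running %timeit on your own data.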

Assuming you have a template DataFrame that you would like to copy with every value set to zero:

If you have no NaNs in your data set, multiplying by zero can be significantly faster:

In [19]: columns = ["col{}".format(i) for i in range(3000)]

In [20]: indices = range(2000)

In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)

In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop

In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop

The improvement depends on the DataFrame size, but I have never found it to be slower.

And just for the heck of it:

In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop

In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop

But:

In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop

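Edit:

Assuming your frame uses float64, the following was the fastest by a wide margin in my timings. Note that > binds more loosely than +, so the expression as written compares against 1.7976931348623157e+308 + 0.0 and returns a boolean frame of False values. To get a float frame with a chosen fill value, parenthesize the comparison and replace 0.0 with the desired number, as in (orig_df > 1.7976931348623157e+308) + 0.0.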

In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop

Alternatively, you can define nan outside the expression for a more general solution that does not depend on the particular float type, since any comparison with NaN is False:

In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop
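The effect of operator precedence in the comparison trick can be checked with plain pandas operators. This sketch deliberately avoids pd.eval and uses a small made-up frame. The assumption is that pd.eval follows the same Python precedence rules, which is worth verifying in your own environment.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for orig_df.
orig_df = pd.DataFrame(42.0, index=range(4), columns=["a", "b", "c"])

# Largest finite float64, the constant used in the expression above.
big = np.finfo(np.float64).max

# Without parentheses the addition happens first, so this is just
# `orig_df > big`: a boolean frame that is False everywhere.
as_written = orig_df > big + 0.0

# With parentheses the boolean frame is converted by the addition,
# giving a float frame filled with `fill`.
fill = 0.0
parenthesized = (orig_df > big) + fill

print(as_written.dtypes.tolist())
print(parenthesized.dtypes.tolist())
```

Note that a frame containing positive infinity would compare as greater than `big`, so those cells would not follow the same pattern.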
