
How to set all the values of an existing Pandas DataFrame to zero?

I currently have an existing Pandas DataFrame with a date index, and columns each with a specific name.

As for the data cells, they are filled with various float values.

I would like to copy my DataFrame, but replace all these values with zero.

The objective is to reuse the structure of the DataFrame (dimensions, index, column names), but clear all the current values by replacing them with zeroes.

The way I'm currently achieving this is as follows:

df[df > 0] = 0 

However, this would not replace any negative value in the DataFrame.
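For illustration, here is a toy DataFrame (the values are made up) showing how a negative cell slips through:

```python
import pandas as pd

# Toy DataFrame with invented values, standing in for the real one
df = pd.DataFrame({"a": [1.5, -2.0], "b": [0.5, 3.0]})

df[df > 0] = 0  # the boolean mask selects only the positive cells

print(df)
#      a    b
# 0  0.0  0.0
# 1 -2.0  0.0
```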

Isn't there a more general approach to filling an entire existing DataFrame with a single common value?

Thank you in advance for your help.

Asked Mar 06 '17 by manocormen



1 Answer

The absolute fastest way, which also preserves dtypes, is the following:

for col in df.columns:
    df[col].values[:] = 0

This writes directly to the underlying NumPy array of each column. It is hard to beat, since it allocates no additional storage and bypasses pandas' dtype handling. You can also use np.issubdtype to zero out only the numeric columns. This is probably what you want if you have a mixed-dtype DataFrame; it's unnecessary if your DataFrame is already entirely numeric.

for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0

For small DataFrames, the subtype check is somewhat costly. However, the cost of zeroing a non-numeric column is substantial, so if you're not sure whether your DataFrame is entirely numeric, you should probably include the issubdtype check.
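As a quick sanity check of the dtype-preservation claim, here is a minimal sketch on a made-up mixed-dtype frame (the column names and values are invented):

```python
import numpy as np
import pandas as pd

# Made-up mixed-dtype DataFrame for illustration
df = pd.DataFrame({
    "i": pd.Series([1, 2, 3], dtype=int),
    "f": pd.Series([1.0, 2.0, 3.0], dtype=float),
    "s": pd.Series(["a", "b", "c"], dtype=object),
})
before = df.dtypes.copy()

# Zero only the numeric columns, in place
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0

assert (df.dtypes == before).all()          # dtypes are unchanged
assert (df["i"] == 0).all() and (df["f"] == 0).all()
assert list(df["s"]) == ["a", "b", "c"]     # non-numeric column untouched
```

(Note: writing through .values relies on the column's array being a view on the frame's data, which holds for plain single-dtype NumPy-backed columns in non-copy-on-write pandas.)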


Timing comparisons

Setup

import pandas as pd
import numpy as np

def make_df(n, only_numeric):
    series = [
        pd.Series(range(n), name="int", dtype=int),
        pd.Series(range(n), name="float", dtype=float),
    ]
    if only_numeric:
        series.extend(
            [
                pd.Series(range(n, 2 * n), name="int2", dtype=int),
                pd.Series(range(n, 2 * n), name="float2", dtype=float),
            ]
        )
    else:
        series.extend(
            [
                pd.date_range(start="1970-1-1", freq="T", periods=n, name="dt")
                .to_series()
                .reset_index(drop=True),
                pd.Series(
                    [chr((i % 26) + 65) for i in range(n)],
                    name="string",
                    dtype="object",
                ),
            ]
        )

    return pd.concat(series, axis=1)

>>> make_df(5, True)
   int  float  int2  float2
0    0    0.0     5     5.0
1    1    1.0     6     6.0
2    2    2.0     7     7.0
3    3    3.0     8     8.0
4    4    4.0     9     9.0

>>> make_df(5, False)
   int  float                  dt string
0    0    0.0 1970-01-01 00:00:00      A
1    1    1.0 1970-01-01 00:01:00      B
2    2    2.0 1970-01-01 00:02:00      C
3    3    3.0 1970-01-01 00:03:00      D
4    4    4.0 1970-01-01 00:04:00      E

Small DataFrame

n = 10_000

# Numeric df, no issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    df[col].values[:] = 0
36.1 µs ± 510 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Numeric df, yes issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
53 µs ± 645 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Non-numeric df, no issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    df[col].values[:] = 0
113 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Non-numeric df, yes issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
39.4 µs ± 1.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Large DataFrame

n = 10_000_000

# Numeric df, no issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    df[col].values[:] = 0
38.7 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Numeric df, yes issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
39.1 ms ± 556 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Non-numeric df, no issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    df[col].values[:] = 0
99.5 ms ± 748 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Non-numeric df, yes issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
17.8 ms ± 228 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I'd previously suggested the answer below, but I now consider it harmful: it's significantly slower than the approach above and harder to reason about. Its only advantage is being nicer to write.

The cleanest way is to use a bare slice to assign to the entire DataFrame:

df[:] = 0 

Unfortunately the dtype situation is a bit fuzzy, because every column in the resulting DataFrame will have the same dtype. If every column of df was originally float, the new dtypes will still be float. But if a single column was int or object, it seems that the new dtypes will all be int.
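To see what your own pandas version does, here is a minimal sketch on a made-up mixed frame (dtype promotion under whole-frame assignment has varied across pandas versions, so inspect rather than assume):

```python
import pandas as pd

# Made-up frame with mixed float/int columns
df = pd.DataFrame({"f": [1.5, 2.5], "i": [1, 2]})

df[:] = 0  # assign zero across the whole frame

print(df.dtypes)  # inspect how the dtypes came out on your version
```

Whatever the resulting dtypes, every cell is zero afterwards.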

Answered Sep 18 '22 by BallpointBen