An easy fix is: d = pd.DataFrame(0.0, index=np.arange(len(data)), columns=feature_list)
You can create an empty DataFrame by importing pandas and calling pd.DataFrame() with no arguments, which returns a frame with no rows and no columns.
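A minimal sketch of the empty-frame case described above. The column names "a" and "b" are placeholders chosen for illustration, not something from the original answer.

```python
import pandas as pd

# An empty frame: no rows and no columns.
empty_df = pd.DataFrame()

# An empty frame that already has (hypothetical) column names but no rows.
empty_with_cols = pd.DataFrame(columns=["a", "b"])

print(empty_df.empty)         # True
print(empty_df.shape)         # (0, 0)
print(empty_with_cols.shape)  # (0, 2)
```

An empty frame is a different thing from a zero-filled one: it holds no cells at all, so it only helps if you plan to add rows or columns later.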
You can try this:
d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
In my opinion, it's best to do this with NumPy:
import numpy as np
import pandas as pd
d = pd.DataFrame(np.zeros((N_rows, N_cols)))
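For comparison, here is a small sketch that runs the scalar-fill and NumPy variants side by side. The sizes and the contents of feature_list are made-up values for illustration. The main practical difference is the resulting dtype: a scalar 0 gives integer columns, while 0.0 and np.zeros give float64.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 3, 2          # hypothetical sizes
feature_list = ["f0", "f1"]    # hypothetical column names

# Scalar fill with an integer: columns get an integer dtype.
from_int = pd.DataFrame(0, index=np.arange(n_rows), columns=feature_list)

# Scalar fill with a float: columns are float64.
from_float = pd.DataFrame(0.0, index=np.arange(n_rows), columns=feature_list)

# NumPy array of zeros: float64 by default.
from_numpy = pd.DataFrame(np.zeros((n_rows, n_cols)), columns=feature_list)

print(from_int.dtypes)
print(from_float.dtypes)
print(from_numpy.dtypes)
```

If you later store fractional values in the frame, starting from 0.0 or np.zeros avoids an integer-to-float conversion.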
If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:
df_zeros = df * 0
If the existing data frame contains NaNs or non-numeric values, you can instead apply a function to each cell that just returns 0 (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df_zeros = df.applymap(lambda x: 0)
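A short sketch of why the NaN caveat matters. The sample frame is invented for illustration, and the hasattr check is only there so the snippet runs on pandas versions both before and after applymap was renamed to map.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# NaN * 0 is still NaN, so multiplying does not give a clean zero frame.
times_zero = df * 0
print(times_zero)

# Cell-wise replacement ignores the cell contents entirely.
cell_map = df.map if hasattr(df, "map") else df.applymap
mapped = cell_map(lambda x: 0)

# Building a new frame from the existing labels also avoids the problem.
rebuilt = pd.DataFrame(0.0, index=df.index, columns=df.columns)
print(rebuilt)
```

Rebuilding from df.index and df.columns never looks at the old values, so it is not affected by NaNs or non-numeric cells.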
Similar to @Shravan, but without the use of numpy:
height = 10
width = 20
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))
Then you can do whatever you want with it:
post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)
If you already have a dataframe, this is the fastest way:
In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 µs per loop
Compare to:
In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 µs per loop
In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 µs per loop
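To reproduce this kind of comparison outside IPython, something like the following sketch with the standard timeit module could be used. The function names are mine, and the numbers will vary with machine and pandas version, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

columns = ["col{}".format(i) for i in range(10)]
orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)


def zeros_like_frame():
    # Zero array shaped like the existing frame, reusing its labels.
    values = np.zeros_like(orig_df.to_numpy())
    return pd.DataFrame(values, index=orig_df.index, columns=orig_df.columns)


def zeros_from_scalar():
    # Scalar fill; with an integer 0 the columns get an integer dtype.
    return pd.DataFrame(0, index=np.arange(10), columns=columns)


def zeros_from_array():
    # Freshly allocated float64 zeros with a default index.
    return pd.DataFrame(np.zeros((10, 10)), columns=columns)


n_calls = 1000
for fn in (zeros_like_frame, zeros_from_scalar, zeros_from_array):
    seconds = timeit.timeit(fn, number=n_calls)
    print("{}: {:.1f} us per call".format(fn.__name__, seconds / n_calls * 1e6))
```

All three functions return a 10 x 10 frame of zeros, so it is worth re-measuring on your own setup before picking one for speed alone.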
Assuming you have a template DataFrame that you would like to copy with zeros filled in:
If you have no NaNs in your data set, multiplying by zero can be significantly faster:
In [19]: columns = ["col{}".format(i) for i in range(3000)]
In [20]: indices = range(2000)
In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)
In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop
In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop
The improvement depends on the DataFrame size, but I have never found it to be slower.
And just for the heck of it:
In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop
In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop
But:
In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop
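Edit:
Assuming you have a frame using float64, this was the fastest by a huge margin in my timings. It is meant to generate any value by replacing 0.0 with the desired fill number. Note, however, that + binds more tightly than > in Python syntax, so as written the expression is a comparison that evaluates to a boolean frame; check the resulting dtype and values before relying on it.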
In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop
Depending on taste, you can define nan externally to get a general solution that does not depend on the particular float type:
In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop
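The idea behind the comparison trick can be checked in plain pandas, without pd.eval. This is only a sketch under my own assumptions: the small frame and fill_value are invented for illustration, and it says nothing about speed. No finite float64 is greater than the largest float64, and no value compares greater than NaN, so either comparison gives an all-False boolean frame that can then be cast to float zeros.

```python
import numpy as np
import pandas as pd

# Small stand-in for the large frame used in the timings.
orig_df = pd.DataFrame(42.0, index=range(4), columns=["a", "b", "c"])

float_max = np.finfo(np.float64).max  # 1.7976931348623157e+308

# Both comparisons are False everywhere, giving boolean frames.
mask_max = orig_df > float_max
mask_nan = orig_df > np.nan

# Cast the boolean frame to float to get actual zeros.
zeros = mask_max.astype(np.float64)

# Hypothetical fill value, added after the cast.
fill_value = 7.5
filled = zeros + fill_value

print(mask_max.dtypes)
print(zeros.dtypes)
print(filled)
```

Unlike multiplying by zero, the cast result does not depend on the values in orig_df, so NaNs in the source frame do not leak through.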