I know this may be an old debate, but between pandas drop and Python's del statement, which performs better on a large dataset?
I am learning machine learning with Python 3 and am not sure which one to use. My data is in a pandas DataFrame, but del is built into Python itself.
Pandas DataFrame drop() Method: The drop() method removes the specified row or column. With axis='columns', drop() removes the specified column. With axis='index', drop() removes the specified row.
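As a minimal sketch of the two axis settings described above (the frame, its labels, and its values are made up for illustration):

```python
import pandas as pd

# Small illustrative frame; labels and values are hypothetical.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# axis='columns' (equivalently axis=1) removes the column labeled "b".
without_b = df.drop("b", axis="columns")

# axis='index' (equivalently axis=0, the default) removes the row labeled 0.
without_row0 = df.drop(0, axis="index")

print(list(without_b.columns))   # ['a', 'c']
print(list(without_row0.index))  # [1, 2]
```

Neither call changes df here, because drop returns a new DataFrame unless inplace=True is passed.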
Remove First N Rows of Pandas DataFrame Using tail(): Use df.tail(df.shape[0] - n) to remove the first n rows of a pandas DataFrame. DataFrame.tail() is generally used to show the last n rows of a DataFrame, but you can also pass a negative value to skip rows from the beginning.
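The tail() approach above can be sketched as follows. The frame and the value of n are hypothetical; the iloc slice is included only as a familiar point of comparison.

```python
import pandas as pd

df = pd.DataFrame({"x": range(5)})
n = 2  # hypothetical number of leading rows to skip

# Keep the last (number of rows - n) rows.
via_shape = df.tail(df.shape[0] - n)

# A negative argument keeps everything except the first n rows.
via_negative = df.tail(-n)

# Positional slicing gives the same rows.
via_iloc = df.iloc[n:]

print(list(via_negative.index))  # [2, 3, 4]
```

One caveat: for n = 0 the two tail() forms differ, since tail(-0) is tail(0), which returns an empty frame, while tail(df.shape[0]) returns every row.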
Summarizing a few points about functionality:
drop operates on both columns and rows; del operates on columns only.
drop can operate on multiple items at a time; del operates on only one at a time.
drop can operate in-place or return a copy; del is an in-place operation only.
The documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html has more details on drop's features.
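A small sketch of those three differences, using a made-up four-column frame:

```python
import pandas as pd

# Hypothetical frame for illustration only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# del removes exactly one column per statement, always in place.
del df["a"]

# drop can remove several columns at once and, by default, returns a new frame.
smaller = df.drop(columns=["b", "c"])

# With inplace=True, drop modifies df itself and returns None.
result = df.drop(columns=["d"], inplace=True)

# drop also works on rows, which del cannot do.
rows_removed = smaller.drop(index=[0])

print(list(df.columns))          # ['b', 'c']
print(list(smaller.columns))     # ['d']
print(result)                    # None
print(list(rows_removed.index))  # [1]
```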
Using randomly generated data of about 1.6 GB, it appears that df.drop is faster than del, especially over multiple columns:
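import time
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(20000,10000))
t_1 = time.time()
# Note: with the default axis=0, drop removes the rows labeled 2, 4 and 1000, not columns.
df.drop(labels=[2,4,1000], inplace=True)
t_2 = time.time()
print(t_2 - t_1)
0.9118959903717041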
Compared to:
df = pd.DataFrame(np.random.rand(20000,10000))
t_3 = time.time()
del df[2]
del df[4]
del df[1000]
t_4 = time.time()
print(t_4 - t_3)
4.052732944488525
@Inder's comparison is not quite the same since it doesn't use inplace=True.
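A like-for-like timing could be sketched as below. This is only a sketch under stated assumptions: the helper names (make_frame, drop_columns, del_columns, best_time) are hypothetical, the frame is much smaller than the 20000 x 10000 one above so it runs quickly, and drop is called with columns= so that both approaches delete the same three columns in place. Timings depend on the machine and the pandas version, so no numbers are shown.

```python
import time

import numpy as np
import pandas as pd

COLS = [2, 4, 1000]  # same column labels as in the answer above


def make_frame(rows=2000, cols=2000):
    # Smaller than the original benchmark frame so the sketch runs quickly.
    return pd.DataFrame(np.random.rand(rows, cols))


def drop_columns(df):
    # columns= makes drop act on columns, matching what del does.
    df.drop(columns=COLS, inplace=True)


def del_columns(df):
    # del handles one column per statement.
    for col in COLS:
        del df[col]


def best_time(func, repeats=3):
    # Build a fresh frame for each run; construction is excluded from the timing.
    best = float("inf")
    for _ in range(repeats):
        df = make_frame()
        start = time.perf_counter()
        func(df)
        best = min(best, time.perf_counter() - start)
    return best


print("drop:", best_time(drop_columns))
print("del: ", best_time(del_columns))
```

Taking the best of several runs reduces noise from other processes; scaling rows and cols back up gives a closer match to the original data size.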