Related to "Why should I make a copy of a data frame in pandas".
I noticed that in the popular backtesting library,
    def __init__(self, data: pd.DataFrame):
        data = data.copy(False)
on line 631. What's the purpose of such a copy?
A shallow copy allows you to change the structure of the dataframe (its index and columns) without changing the original.
In backtesting, the developer converts the index to datetime format (line 640) and adds a new column 'Volume' filled with np.nan values if it's not already in the dataframe. Those changes won't be reflected in the original dataframe.
Example
>>> a = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])
>>> b = a.copy(False)
>>> a
   i  s
0  1  a
1  2  b
>>> b
   i  s
0  1  a
1  2  b
>>> b.index = pd.to_datetime(b.index)
>>> b['volume'] = 0
>>> b
                               i  s  volume
1970-01-01 00:00:00.000000000  1  a       0
1970-01-01 00:00:00.000000001  2  b       0
>>> a
   i  s
0  1  a
1  2  b
Of course, if you don't create a shallow copy and work on the passed-in object directly, those changes to the dataframe structure will show up in the original one.
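To see the difference side by side, here is a minimal sketch. The `prepare` function and its `copy` flag are hypothetical and only imitate the kind of setup described above; this is not the library's actual code.

```python
import numpy as np
import pandas as pd


def prepare(data: pd.DataFrame, copy: bool) -> pd.DataFrame:
    """Hypothetical helper imitating the setup steps described above."""
    if copy:
        # New DataFrame object; the underlying values are not duplicated.
        data = data.copy(deep=False)
    # Structural changes: replace the index and add a missing column.
    data.index = pd.to_datetime(data.index)
    if 'Volume' not in data:
        data['Volume'] = np.nan
    return data


original = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])

# With the shallow copy, the caller's dataframe keeps its structure.
prepare(original, copy=True)
cols_after_copy = list(original.columns)
index_type_after_copy = type(original.index).__name__

# Without the copy, the same steps modify the caller's dataframe.
prepare(original, copy=False)
cols_without_copy = list(original.columns)
```

After the first call `original` still has only the columns `i` and `s` and its default integer index. After the second call it has gained the `Volume` column and a datetime index, which is the side effect the shallow copy avoids.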