
Why should I make a *shallow* copy of a DataFrame?

Related to: "why should I make a copy of a data frame in pandas"

I noticed that the popular backtesting library does the following (line 631 of its source):

def __init__(self, data: pd.DataFrame):
    data = data.copy(False)

What's the purpose of such a copy?

ihadanny asked Nov 06 '22 14:11


1 Answer

A shallow copy allows you to:

  1. access the frame's data without copying it (which saves memory and time)
  2. modify the frame's structure (its index and columns) without those changes showing up in the original DataFrame
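Both points can be checked directly. The following is a minimal sketch, assuming a small made-up frame with the hypothetical columns `Open` and `Close`. It uses `np.shares_memory` to check whether the shallow copy reuses the original's buffer, then adds a column to the copy only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
a = pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 2.5]})
b = a.copy(deep=False)  # shallow copy: a new DataFrame object over the same data

# Point 1: check whether the column values are shared rather than duplicated.
shares_buffer = np.shares_memory(a["Open"].to_numpy(), b["Open"].to_numpy())

# Point 2: a structural change on the copy should not touch the original.
b["Volume"] = np.nan
columns_of_a = list(a.columns)
columns_of_b = list(b.columns)

print(shares_buffer)
print(columns_of_a)
print(columns_of_b)
```

Only `b` should end up with the extra `Volume` column, while the existing column values are still backed by the same memory.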

In backtesting, the developer tries to convert the index to datetime format (line 640) and adds a new 'Volume' column filled with np.nan if the DataFrame doesn't already have one. Because both changes are made on the shallow copy, they are not reflected in the original DataFrame that the caller passed in.
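That pattern can be sketched roughly as follows. This is not the library's actual code: `prepare_data` and the sample columns are hypothetical names chosen for illustration, and the real implementation may differ in its details.

```python
import numpy as np
import pandas as pd


def prepare_data(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mimics the pattern described above."""
    # Rebind the local name to a shallow copy; the caller's object is untouched.
    data = data.copy(deep=False)

    # Structural change 1: replace the index with a DatetimeIndex.
    if not isinstance(data.index, pd.DatetimeIndex):
        data.index = pd.to_datetime(data.index)

    # Structural change 2: add a NaN-filled 'Volume' column if it is missing.
    if "Volume" not in data.columns:
        data["Volume"] = np.nan

    return data


# Hypothetical input with date strings as the index and no 'Volume' column.
original = pd.DataFrame(
    {"Open": [1.0, 2.0], "Close": [1.5, 2.5]},
    index=["2022-11-01", "2022-11-02"],
)
prepared = prepare_data(original)

print(type(prepared.index).__name__, list(prepared.columns))
print(type(original.index).__name__, list(original.columns))
```

The caller keeps its original index and columns, while the function works with the normalized frame, and no price data had to be duplicated to get there.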

Example

>>> a = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])
>>> b = a.copy(False)
>>> a
   i  s
0  1  a
1  2  b
>>> b
   i  s
0  1  a
1  2  b
>>> b.index = pd.to_datetime(b.index)
>>> b['volume'] = 0
>>> b
                               i  s  volume
1970-01-01 00:00:00.000000000  1  a       0
1970-01-01 00:00:00.000000001  2  b       0
>>> a
   i  s
0  1  a
1  2  b

Of course, if you don't create a shallow copy and instead modify the object you were given, those changes to the DataFrame's structure will be reflected in the original.
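For contrast, here is a minimal sketch of the same steps without the copy, reusing the toy frame from the example above. With plain assignment, `b` is just another name for the same object, so the structural changes land on `a` as well.

```python
import pandas as pd

a = pd.DataFrame([[1, "a"], [2, "b"]], columns=["i", "s"])
b = a  # no copy: both names refer to the same DataFrame object

b.index = pd.to_datetime(b.index)
b["volume"] = 0

same_object = b is a
print(same_object)
print(list(a.columns))         # 'volume' now appears on a too
print(type(a.index).__name__)  # a's index was replaced as well
```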

Viacheslav Z answered Nov 14 '22 20:11