Does pandas groupby pass by reference or value?

Question

Let's say I have a pandas DataFrame data and I'd like to split it by a certain column, col, as follows:

def split_by_column(data, column):
    chunk_list = [(k, g) for k, g in data.groupby(column)]
    return dict(chunk_list)


collection = split_by_column(data, 'col')

This way I can easily access and apply functions to this collection later.

If, for instance, I have an object that holds both data and collection as instance variables, do I have two separate copies of the data in memory, or does the dictionary contain references to the corresponding chunks in data?

Lepakk · Accepted Answer

I tried this:

import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [6, 9, 8, 9]})
print('data initial:', data)

def split_by_column(data, column):
    # Build a dict mapping each group key to its sub-DataFrame
    chunk_list = [(k, g) for k, g in data.groupby(column)]
    return dict(chunk_list)

collection = split_by_column(data, 'b')
print('collection initial:', collection)

Output is:

data initial:    a  b
0  1  6
1  2  9
2  3  8
3  4  9
collection initial: {6:    a  b
0  1  6, 8:    a  b
2  3  8, 9:    a  b
1  2  9
3  4  9}

If I change data now by

data.at[3,'a']=5

and print data and collection again, the output is this:

data new:    a  b
0  1  6
1  2  9
2  3  8
3  5  9
collection new: {6:    a  b
0  1  6, 8:    a  b
2  3  8, 9:    a  b
1  2  9
3  4  9}

I am also just starting to explore pandas, so I cannot tell you what the underlying mechanisms are. But since the value 5 appears only in the DataFrame and not in the dict, I conclude that you have two separate copies of your data.
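The same experiment can be condensed into a short script that checks the result programmatically instead of by reading printed output. This is only a sketch on the toy frame from above; the names group_9, group_sees_change and shares are made up for illustration. The np.shares_memory call asks NumPy whether the two column buffers overlap. Its answer may depend on the pandas version and settings, so no expected value is given for it.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"a": [1, 2, 3, 4], "b": [6, 9, 8, 9]})

# Same split as in the question, written as a dict comprehension.
collection = {key: group for key, group in data.groupby("b")}

# Change the original frame after the split.
data.at[3, "a"] = 5

# Look at row 3 inside the group for b == 9.
group_9 = collection[9]
group_sees_change = bool(group_9.at[3, "a"] == 5)

# Ask NumPy whether the underlying buffers overlap (version dependent,
# so treat the printed value as informational only).
shares = np.shares_memory(data["a"].to_numpy(), group_9["a"].to_numpy())

print("group sees change:", group_sees_change)
print("buffers overlap:", shares)
```

If group_sees_change is False, as in the printed output above, the groups behave as independent data, so keeping both data and collection on one object should be expected to roughly double the memory used.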

I hope this helps. Best, lepakk

Tags:

python

pandas

pandas-groupby

signalfel
