Pandas - group by column and transform the data to numpy array

Tags: python, pandas, pivot, grouping

Given the following data frame, group A has 4 samples, B has 3 samples, and C has 1 sample:

  group   data_1   data_2
0     A        1        4
1     A        2        5
2     A        3        6
3     A        4        7
4     B        1        4
5     B        2        5
6     B        3        6
7     C        1        4

I would like to transform the data into a NumPy array in which each row is a group containing all of its samples, zero-padded for groups that have fewer samples than the largest group.

Resulting in an array like so:

[
   [[1,4],[2,5],[3,6],[4,7]], # this is A group 4 samples
   [[1,4],[2,5],[3,6],[0,0]], # this is B group 3 samples
   [[1,4],[0,0],[0,0],[0,0]], # this is C group 1 sample
]


asked Oct 03 '18 07:10

Shlomi Schwartz

1 Answer

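First, the missing rows have to be added so that every group has the same number of samples. The first solution does this with unstack and stack, using a counter Series created by cumcount.

The second solution uses reindex with a MultiIndex instead.

In both cases the last step is a lambda function applied per group with groupby, which converts each group to a NumPy array with values and then to nested lists: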

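# per-group counter: 0, 1, 2, ... within each group
g = df.groupby('group').cumcount()
L = (df.set_index(['group', g])
       .unstack(fill_value=0)   # one column per counter value, gaps filled with 0
       .stack()                 # back to one row per (group, counter) pair
       .groupby(level=0)
       .apply(lambda x: x.values.tolist())
       .tolist())
print(L)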

[[[1, 4], [2, 5], [3, 6], [4, 7]], 
 [[1, 4], [2, 5], [3, 6], [0, 0]], 
 [[1, 4], [0, 0], [0, 0], [0, 0]]]
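The question asks for a NumPy array, while the code above produces nested lists. Because every group is padded to the same length, the list can be passed straight to np.array. The sketch below is self-contained. The way df is constructed is an assumption, since the question only shows the printed frame.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed table in the question (assumed construction)
df = pd.DataFrame({
    'group': list('AAAABBBC'),
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

# Same steps as the unstack/stack solution
g = df.groupby('group').cumcount()
L = (df.set_index(['group', g])
       .unstack(fill_value=0)
       .stack()
       .groupby(level=0)
       .apply(lambda x: x.values.tolist())
       .tolist())

# Equal-length inner lists convert cleanly to a 3-D array
arr = np.array(L)
print(arr.shape)  # (groups, max samples per group, data columns)
```

The first axis follows the sorted group labels, because groupby sorts its keys by default.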

Another solution:

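import pandas as pd

g = df.groupby('group').cumcount()
# every (group, counter) combination, including the ones missing from df
mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])
L = (df.set_index(['group', g])
       .reindex(mux, fill_value=0)
       .groupby(level=0)[['data_1', 'data_2']]  # select the columns with a list
       .apply(lambda x: x.values.tolist())
       .tolist())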
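If the nested lists are only an intermediate step, a zero-filled array can also be allocated up front and filled by position. This is a different technique from the two solutions in the answer, shown here as a minimal sketch. The names cols, codes, pos and out are illustrative, and the construction of df is assumed from the printed frame in the question.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed table in the question (assumed construction)
df = pd.DataFrame({
    'group': list('AAAABBBC'),
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

cols = ['data_1', 'data_2']
values = df[cols].to_numpy()

# Integer code per row for its group, in order of first appearance
codes, groups = pd.factorize(df['group'])
# Position of each row inside its group: 0, 1, 2, ...
pos = df.groupby('group').cumcount().to_numpy()

# Zero-filled target: (number of groups, largest group size, number of columns)
out = np.zeros((len(groups), pos.max() + 1, len(cols)), dtype=values.dtype)
# Each row lands at [its group, its position]; untouched cells stay 0
out[codes, pos] = values

print(out.shape)
```

Here the first axis follows the order in which groups first appear in df rather than sorted order. For this data the two orders coincide.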

answered Oct 16 '22 09:10

jezrael
