pythonic way to aggregate arrays (numpy or not)

Tags: python, arrays, aggregate, numpy

I would like to write a clean function that aggregates data in an array. (Mine is a numpy record array, but that should not change anything.)

Suppose you have an array of data that you want to aggregate, grouped by one of its fields: for example, an array with dtype=[('name', np.str_, 8), ('job', np.str_, 8), ('income', np.uint32)], and you want the mean income per job.

I wrote the function below; for this example it would be called as aggregate(data, 'job', 'income', mean).

def aggregate(data, key, value, func):
    # Collect the values belonging to each distinct key.
    data_per_key = {}
    for k, v in zip(data[key], data[value]):
        if k not in data_per_key:
            data_per_key[k] = []
        data_per_key[k].append(v)
    # Apply func to each group and return (key, result) pairs.
    return [(k, func(vs)) for k, vs in data_per_key.items()]

The problem is that I do not find it very elegant, and I would like to write it in one line. Do you have any ideas?

Thanks for your answers,
Louis

PS: I would like to keep func in the call so that you can also ask for the median, the minimum, and so on.

asked Dec 01 '09 22:12

Louis

1 Answer

Perhaps the function you are seeking is matplotlib.mlab.rec_groupby:

import numpy as np
import matplotlib.mlab

data = np.array(
    [('Aaron', 'Digger', 1),
     ('Bill', 'Planter', 2),
     ('Carl', 'Waterer', 3),
     ('Darlene', 'Planter', 3),
     ('Earl', 'Digger', 7)],
    dtype=[('name', np.str_, 8), ('job', np.str_, 8), ('income', np.uint32)])

# Group by 'job'; apply np.mean to 'income' and store it as 'avg_income'.
result = matplotlib.mlab.rec_groupby(data, ('job',), (('income', np.mean, 'avg_income'),))

yields

('Digger', 4.0)
('Planter', 2.5)
('Waterer', 3.0)

matplotlib.mlab.rec_groupby returns a recarray:

print(result.dtype)
# [('job', '|S7'), ('avg_income', '<f8')]
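rec_groupby was deprecated and later removed from matplotlib.mlab (as far as I can tell it is absent from matplotlib 3.1 and later), so here is a rough numpy-only sketch of the same grouping. It is not how rec_groupby is implemented; it simply relies on np.unique with return_inverse=True, and it returns a plain list of tuples rather than a recarray. The aggregate name and signature are taken from the question.

```python
import numpy as np

def aggregate(data, key, value, func):
    """Group data[value] by data[key] and apply func to each group."""
    # np.unique returns the sorted distinct keys and, for every row,
    # the index of that row's key within the sorted keys.
    keys, inverse = np.unique(data[key], return_inverse=True)
    # Select each group's values with a boolean mask and reduce them.
    return [(k, func(data[value][inverse == i])) for i, k in enumerate(keys)]

data = np.array(
    [('Aaron', 'Digger', 1),
     ('Bill', 'Planter', 2),
     ('Carl', 'Waterer', 3),
     ('Darlene', 'Planter', 3),
     ('Earl', 'Digger', 7)],
    dtype=[('name', np.str_, 8), ('job', np.str_, 8), ('income', np.uint32)])

result = aggregate(data, 'job', 'income', np.mean)
for job, avg_income in result:
    print(job, avg_income)
```

Because func stays a parameter, np.median or np.min can be passed in the same way. The boolean mask is rebuilt once per key, which is fine for a handful of groups but may be slow when there are many distinct keys.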

You may also be interested in checking out pandas, which has even more versatile facilities for handling group-by operations.
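As a minimal sketch of what the pandas route might look like, assuming pandas is installed: the record array is converted to a DataFrame and grouped by the key column. The aggregate wrapper and its argument names come from the question, not from pandas itself.

```python
import numpy as np
import pandas as pd

def aggregate(data, key, value, func):
    """Return a Series of func applied to `value`, indexed by `key`."""
    # A structured (record) array converts to a DataFrame with one
    # column per field.
    df = pd.DataFrame.from_records(data)
    # func may be a callable or the name of a pandas aggregation
    # such as 'mean', 'median' or 'min'.
    return df.groupby(key)[value].agg(func)

data = np.array(
    [('Aaron', 'Digger', 1),
     ('Bill', 'Planter', 2),
     ('Carl', 'Waterer', 3),
     ('Darlene', 'Planter', 3),
     ('Earl', 'Digger', 7)],
    dtype=[('name', np.str_, 8), ('job', np.str_, 8), ('income', np.uint32)])

avg_income = aggregate(data, 'job', 'income', 'mean')
print(avg_income)
```

The body of the function is effectively the one-liner the question asks for, and swapping 'mean' for 'median' or 'min' keeps the aggregation selectable from the call.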

answered Sep 21 '22 07:09

unutbu
