Grouping a series in Python

Title edit: capitalization fixed and 'for python' added.

Is there a better or more standard way to do what I'm describing? I want input like this:

[1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]

to be transformed to this:

[0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 2, 0]

or, even better, something like this (describing similar output differently, but now not limited to integers):

labels: [1, 2, 3, 1, 2]

positions (where 1 identifies the first occupiable position, as in my matplotlib plot): [2, 7, 12.5, 17, 21]

The input is categorical data that classifies each plot. In the picture below, grouped plots share a categorical feature that I'd like to label only once per group. I'll be using two axes for two different variables, but I think that's beside the point for now.

Note: This image does not reflect either set of sample data; it's just to get across the idea of grouping categories together. Group a should be labeled at x=5, since there's a blank space between the first two and second two vertical data groups, and 0 is the line on the right side.

[Image: tick marks placed at the center of each category of data]

Here's what I've got:

data = [1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 3, 2, 2, 1, 1, 1, 1]
last = None
runs = []
labels = []
run = 1
for x in data:
    if x in (last, 0):
        run += 1
    else:
        runs.append(run)
        run = 1
        labels.append(x)
    last = x
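runs.append(run)  # close the final run
runs.pop(0)       # drop the placeholder run recorded before the first element
# (no final labels.append(x): every run's label was already added when the run started)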
tick_positions = [0]
last_run = 1
for run in runs:
    tick_positions.append(run/2.0+last_run/2.0+tick_positions[-1])
    last_run = run
tick_positions.pop(0)
print(tick_positions)
Thomas asked Feb 08 '11 19:02

1 Answer

To get the labels, you can use itertools.groupby:

>>> import itertools
>>> numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
>>> list(k for k, g in itertools.groupby(numbers))
[1, 0, 2, 0, 2, 0, 3, 0, 1, 2]

And to remove the zeros, filter them out with a generator expression before grouping:

>>> list(k for k, g in itertools.groupby(x for x in numbers if x != 0))
[1, 2, 3, 1, 2]

If you want to get the positions too, then you'll have to iterate through the list yourself as you are already doing. groupby doesn't keep track of that for you.
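
One possible way to get the positions while still using groupby is to pair each value with its position via enumerate before grouping. The following is only a sketch, not the asker's or answerer's method: the function name group_ticks and its gap and first_position parameters are made up for illustration, and it assumes that 0 marks a blank slot and that a group's tick belongs at the midpoint of its first and last occupied slot.

```python
from itertools import groupby


def group_ticks(values, gap=0, first_position=1):
    """Return (labels, positions): one centered tick per run of equal labels.

    Hypothetical helper. `gap` marks blank slots; they are skipped, so runs
    of the same label separated only by gaps merge into one group.
    """
    # Pair every non-gap value with its position, counting from first_position.
    occupied = [
        (pos, value)
        for pos, value in enumerate(values, start=first_position)
        if value != gap
    ]
    labels, positions = [], []
    # Group consecutive (position, value) pairs that share the same value.
    for label, group in groupby(occupied, key=lambda pair: pair[1]):
        slots = [pos for pos, _ in group]
        labels.append(label)
        # Tick position: midpoint of the first and last occupied slot.
        positions.append((slots[0] + slots[-1]) / 2.0)
    return labels, positions


numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
labels, positions = group_ticks(numbers)
print(labels)     # [1, 2, 3, 1, 2]
print(positions)  # [2.0, 7.0, 12.5, 17.0, 21.0]
```

With the sample input from the question this reproduces the requested labels and positions. The two lists could then be handed to something like matplotlib's plt.xticks(positions, labels), though how they map onto the actual plot depends on how the bars are laid out.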

Mark Byers answered Oct 07 '22 21:10