Title edit: capitalization fixed and 'for python' added.
Is there a better or more standard way to do what I'm describing? I want input like this:
[1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
to be transformed to this:
[0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 2, 0]
or, even better, something like this (describing similar output differently, but now not limited to integers):
labels: [1, 2, 3, 1, 2]
positions (where 1 identifies the first occupiable position, as per my matplotlib plot): [2, 7, 12.5, 17, 21]
The input data is categorical data that classifies a plot - in the picture below, grouped plots share a categorical feature which I'd like to label only once for the group. I'll be using 2 axes for two different variables, but I think that's beside the point for now.
Note: This image does not reflect either set of sample data - it's just to get across the idea of grouping together categories. Group a should be labeled at x=5, since there's a blank space between the first two and second two vertical data groups, and 0 is the line on the right side.
Here's what I've got:
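data = [1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 3, 2, 2, 1, 1, 1, 1]
last = None
runs = []
labels = []
run = 1
for x in data:
    if x in (last, 0):
        # same value as before (or a blank 0): extend the current run
        run += 1
    else:
        # new value: close the previous run and start a new one
        runs.append(run)
        run = 1
        labels.append(x)  # one label per run
    last = x
runs.append(run)  # close the final run
runs.pop(0)       # drop the placeholder run recorded before the first value
tick_positions = [0]
last_run = 1
for run in runs:
    # each tick sits half the previous run plus half this run past the last tick
    tick_positions.append(run/2.0+last_run/2.0+tick_positions[-1])
    last_run = run
tick_positions.pop(0)
print(tick_positions)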
To get the labels you can use itertools.groupby:
>>> import itertools
>>> numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
>>> list(k for k, g in itertools.groupby(numbers))
[1, 0, 2, 0, 2, 0, 3, 0, 1, 2]
And to remove the zeros you can use a comprehension:
>>> list(k for k, g in itertools.groupby(x for x in numbers if x != 0))
[1, 2, 3, 1, 2]
If you want to get the positions too, then you'll have to iterate through the list yourself as you are already doing. groupby doesn't keep track of that for you.
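One possible way to do that iteration while still leaning on groupby is to pair each value with its position before grouping. The sketch below is only an illustration: the function name group_labels_and_positions and its blank and first_position parameters are made up for this example, and it assumes that a group's tick belongs at the midpoint between its first and last non-zero member, with positions counted from 1 as in the question.

```python
from itertools import groupby


def group_labels_and_positions(values, blank=0, first_position=1):
    """Return (labels, positions) for runs of equal values, skipping blanks.

    Hypothetical helper: a run's position is assumed to be the midpoint
    between its first and last non-blank member.
    """
    # Pair every value with its position, then drop the blank separators so
    # that runs split only by blanks (e.g. 2, 2, 0, 2, 2) merge into one group.
    indexed = [
        (pos, value)
        for pos, value in enumerate(values, start=first_position)
        if value != blank
    ]

    labels = []
    positions = []
    # Group consecutive (position, value) pairs that share the same value.
    for label, group in groupby(indexed, key=lambda pair: pair[1]):
        members = [pos for pos, _ in group]
        labels.append(label)
        positions.append((members[0] + members[-1]) / 2.0)
    return labels, positions


numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
labels, positions = group_labels_and_positions(numbers)
print(labels)     # [1, 2, 3, 1, 2]
print(positions)  # [2.0, 7.0, 12.5, 17.0, 21.0]
```

For the sample input this reproduces the labels and positions listed in the question. If blanks should instead break a group apart, drop the filter and skip the groups whose label equals blank.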