Grouping a series in Python

Tags:

python

matplotlib

Title edit: capitalization fixed and 'for python' added.

Is there a better or more standard way to do what I'm describing? I want input like this:

[1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]

to be transformed to this:

[0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 2, 0]

or, even better, something like this (describing similar output differently, but now not limited to integers):

labels: [1, 2, 3, 1, 2]

positions (where 1 identifies the first occupiable position, as in my matplotlib plot): [2, 7, 12.5, 17, 21]
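The two target forms carry the same information. As a sketch, assuming a half-integer position is rounded up (which is what the sample does, putting 12.5 in slot 13), the first form can be rebuilt from the labels and positions. The helper name to_sparse is hypothetical:

```python
import math


def to_sparse(labels, positions, length):
    """Place each label in a list of zeros at its 1-based position.

    Assumption: a half-integer position is rounded up, as in the sample
    where 12.5 lands in slot 13.
    """
    result = [0] * length
    for label, pos in zip(labels, positions):
        # Convert the 1-based (possibly .5) position to a 0-based index.
        result[math.ceil(pos) - 1] = label
    return result


sparse = to_sparse([1, 2, 3, 1, 2], [2, 7, 12.5, 17, 21], 22)
print(sparse)
# [0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 2, 0]
```

This is why the labels/positions form is the more general one: the sparse list can always be derived from it, but it cannot represent a position like 12.5 exactly.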

The input data is categorical data that classifies a plot: in the picture below, grouped plots share a categorical feature that I'd like to label only once per group. I'll be using two axes for two different variables, but I think that's beside the point for now.

Note: this image does not reflect either set of sample data; it only illustrates the idea of grouping categories together. Group a should be labeled at x=5, since there's a blank space between the first two and the second two vertical data groups, and 0 is the line on the right side.

[Image: tick marks placed at the center of each category of data]

Here's what I've got:

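data = [1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 3, 2, 2, 1, 1, 1, 1]

# Pass 1: split data into runs of equal labels; a 0 is counted into the current run.
last = None
runs = []     # length of each run
labels = []   # label of each run
run = 1
for x in data:
    if x in (last, 0):
        run += 1
    else:
        runs.append(run)   # close the previous run
        run = 1
        labels.append(x)   # a new run starts with label x
    last = x
runs.append(run)  # close the final run
runs.pop(0)       # drop the placeholder appended when the first run started

# Pass 2: 1-based center of each run. Each center is the previous center plus
# half of the previous run plus half of the current run; the seed acts as a
# virtual run of length 1 centered at 0.
tick_positions = [0]
last_run = 1
for run in runs:
    tick_positions.append(run / 2.0 + last_run / 2.0 + tick_positions[-1])
    last_run = run
tick_positions.pop(0)  # drop the seed
print(tick_positions)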
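The second loop builds each center from the previous one. The same arithmetic can be written without the recurrence: a run of length r that starts at 1-based position s is centered at s + (r - 1) / 2, and s is a running sum of the earlier lengths. A sketch using itertools.accumulate (its initial argument needs Python 3.8 or later), with the run lengths hard-coded to what the loop produces for the data above:

```python
from itertools import accumulate

# Run lengths the loop above produces for its sample data.
runs = [3, 5, 1, 1, 1, 2, 4]

# 1-based start of each run: running sum of earlier lengths, seeded with 1.
# accumulate yields one extra value (one past the end), so drop it.
starts = list(accumulate(runs, initial=1))[:-1]

# Center of a run of length r starting at position s.
centers = [s + (r - 1) / 2 for s, r in zip(starts, runs)]
print(centers)  # [2.0, 6.0, 9.0, 10.0, 11.0, 12.5, 15.5]
```

These match what the recurrence prints for the same runs.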

asked Feb 08 '11 19:02

Thomas

1 Answer

To get the labels, you can use itertools.groupby:

>>> import itertools
>>> numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
>>> list(k for k, g in itertools.groupby(numbers))
[1, 0, 2, 0, 2, 0, 3, 0, 1, 2]
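groupby yields (key, group) pairs, where group is an iterator over one run of equal values. Counting each group gives a run-length encoding of the input, the same kind of run lengths the question's loop builds by hand (that loop folds zeros into runs; this one keeps them separate):

```python
import itertools

numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]

# Consume each group before groupby advances; once it moves on,
# the previous group's iterator is exhausted.
runs = [(key, len(list(group))) for key, group in itertools.groupby(numbers)]
print(runs)
# [(1, 3), (0, 1), (2, 2), (0, 1), (2, 2), (0, 2), (3, 2), (0, 1), (1, 5), (2, 3)]
```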

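To remove the zeros, filter them out with a generator expression before grouping. This also merges runs of the same label that were separated only by zeros: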

>>> list(k for k, g in itertools.groupby(x for x in numbers if x != 0))
[1, 2, 3, 1, 2]

If you want the positions too, you'll have to track the indices yourself, as you are already doing; groupby doesn't keep track of them for you.
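One way to do that tracking is to pair each value with its 1-based index using enumerate, drop the zeros, and group on the value. This is a sketch, not part of the answer above. It assumes a group's position is the midpoint of its first and last non-zero index, which reproduces the question's sample; the function name label_positions and the blank parameter are hypothetical:

```python
import itertools
from operator import itemgetter


def label_positions(values, blank=0):
    """Return (labels, positions) for runs of equal non-blank values.

    Positions are 1-based midpoints of each group's first and last
    non-blank index. Blanks are dropped before grouping, so runs of the
    same label separated only by blanks merge into one group.
    """
    # (1-based index, value) pairs, skipping blanks.
    indexed = [(i, v) for i, v in enumerate(values, start=1) if v != blank]
    labels, positions = [], []
    for label, group in itertools.groupby(indexed, key=itemgetter(1)):
        indices = [i for i, _ in group]
        labels.append(label)
        positions.append((indices[0] + indices[-1]) / 2)
    return labels, positions


numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
labels, positions = label_positions(numbers)
print(labels)     # [1, 2, 3, 1, 2]
print(positions)  # [2.0, 7.0, 12.5, 17.0, 21.0]
```

Because groupby only compares labels with ==, they need not be integers; strings work the same way. The two lists can then be passed to matplotlib's Axes.set_xticks and Axes.set_xticklabels.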

answered Oct 07 '22 21:10

Mark Byers
