
How to find the run-length encoding of an array in Python

I have an array ar = [2,2,2,1,1,2,2,3,3,3,3]. For this array, I want to find the values and lengths of the runs of consecutive equal numbers, like this:

 values: 2, 1, 2, 3
lengths: 3, 2, 2, 4

In R, this is obtained with the rle() function. Is there an existing function in Python that produces the required output?

Haroon Rashid asked Apr 15 '17 10:04


People also ask

How do you calculate run length encoding?

In run-length encoding of a black-and-white image, each row is replaced with numbers that say how many consecutive pixels are the same colour, always starting with the number of white pixels. For example, a row of one white, two black, four white, one black, four white, two black, and one white pixel is encoded as 1, 2, 4, 1, 4, 2, 1.

What is run length encoding in Python?

Run-length encoding is a lossless data compression algorithm. It compresses data by collapsing runs, that is, sequences of consecutive repeated values. Each run is stored as its length together with the value that repeats.

How do you write run length encoding in Python?

To implement run-length encoding for a string, scan the string from start to end, replace each run of identical consecutive characters with a single copy of that character, and record how many times it occurred. In Python, the resulting (character, count) pairs can be collected in a list.
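
The scan described above can be sketched as follows. This is a minimal illustration, not a standard library function: the name rle_encode and the choice of a list of (character, count) tuples as output are assumptions made for the example.

```python
def rle_encode(text):
    """Encode a string as a list of (character, count) pairs (hypothetical helper)."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            # Same character as the current run: extend the run by one.
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            # Different character: start a new run of length 1.
            pairs.append((ch, 1))
    return pairs


print(rle_encode("aaabccdddd"))
# [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
```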

How do I unpack RLE?

RLE decompression walks through the encoded message, which is a sequence of (character, number of repetitions) pairs, and rebuilds the original text by writing each character the corresponding number of times.
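
A minimal sketch of that decoding step, assuming the encoded message is a list of (character, count) tuples; the function name rle_decode is hypothetical.

```python
def rle_decode(pairs):
    """Rebuild a string from (character, count) pairs (hypothetical helper)."""
    # Repeat each character `count` times and join the pieces in order.
    return "".join(ch * count for ch, count in pairs)


print(rle_decode([("a", 3), ("b", 1), ("c", 2), ("d", 4)]))
# aaabccdddd
```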


2 Answers

You can do this with itertools.groupby, which groups consecutive equal items:

from itertools import groupby
ar = [2,2,2,1,1,2,2,3,3,3,3]
print([(k, sum(1 for i in g)) for k,g in groupby(ar)])
# [(2, 3), (1, 2), (2, 2), (3, 4)]
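
If you would rather have two separate sequences, as R's rle() returns, the same groupby idea can be wrapped in a small helper. This is only a sketch; the function name rle and the choice of returning two lists are assumptions, not an existing Python API.

```python
from itertools import groupby


def rle(seq):
    """Return (values, lengths) for the runs of equal consecutive items in seq."""
    values, lengths = [], []
    for key, group in groupby(seq):
        values.append(key)
        # groupby yields an iterator per run, so count its items to get the length.
        lengths.append(sum(1 for _ in group))
    return values, lengths


ar = [2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]
values, lengths = rle(ar)
print(values)   # [2, 1, 2, 3]
print(lengths)  # [3, 2, 2, 4]
```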

Rahul K P answered Nov 12 '22 02:11


Here is a pure NumPy answer. It returns the run values, the start index of each run, and the run lengths:

import numpy as np


def find_runs(x):
    """Find runs of consecutive items in an array."""

    # ensure array
    x = np.asanyarray(x)
    if x.ndim != 1:
        raise ValueError('only 1D array supported')
    n = x.shape[0]

    # handle empty array
    if n == 0:
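        # return integer-typed empty arrays for starts and lengths,
        # matching the dtypes produced in the non-empty case
        return x, np.array([], dtype=int), np.array([], dtype=int)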

    else:
        # find run starts
        loc_run_start = np.empty(n, dtype=bool)
        loc_run_start[0] = True
        np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
        run_starts = np.nonzero(loc_run_start)[0]

        # find run values
        run_values = x[loc_run_start]

        # find run lengths
        run_lengths = np.diff(np.append(run_starts, n))

        return run_values, run_starts, run_lengths

Credit goes to https://github.com/alimanfoo
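
As a quick check of what this approach produces on the array from the question, here is a self-contained sketch. The helper runs is a condensed, hypothetical stand-in for find_runs (same return order, 1D input assumed, no dimension check), written only so the example runs on its own. The last lines show that np.repeat inverts the encoding.

```python
import numpy as np


def runs(x):
    """Condensed stand-in for find_runs: returns (values, starts, lengths)."""
    x = np.asarray(x)
    if x.size == 0:
        empty = np.array([], dtype=int)
        return x, empty, empty
    # A run starts at index 0 and wherever an item differs from its predecessor.
    starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Each run ends where the next one starts; the last run ends at len(x).
    lengths = np.diff(np.r_[starts, x.size])
    return x[starts], starts, lengths


ar = [2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]
values, starts, lengths = runs(ar)
print(values)   # [2 1 2 3]
print(starts)   # [0 3 5 7]
print(lengths)  # [3 2 2 4]

# Decoding: repeat each value by its run length to recover the original array.
restored = np.repeat(values, lengths)
print(np.array_equal(restored, ar))  # True
```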

The Unfun Cat answered Nov 12 '22 01:11