I have an array ar = [2,2,2,1,1,2,2,3,3,3,3].
For this array, I want to find the lengths of runs of consecutive identical numbers, like:
values: 2, 1, 2, 3
lengths: 3, 2, 2, 4
In R, this is obtained with the rle() function. Is there an existing function in Python that provides the required output?
In run length encoding, we replace each row with numbers that say how many consecutive pixels are the same colour, always starting with the number of white pixels. For example, the first row in the image above contains one white, two black, four white, one black, four white, two black, and one white pixel.
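As a rough sketch of the row encoding described above, assuming a row is stored as a list where 0 means white and 1 means black (this representation and the name `encode_row` are illustrative choices, not part of the original description):

```python
from itertools import groupby

def encode_row(pixels, white=0):
    """Return the run lengths for one row, always starting with the white count."""
    # Collapse the row into (pixel value, run length) pairs.
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(pixels)]
    lengths = [length for _, length in runs]
    # If the row starts with a non-white pixel, the leading white run has length 0.
    if runs and runs[0][0] != white:
        lengths.insert(0, 0)
    return lengths

# One white, two black, four white, one black, four white, two black, one white.
row = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
print(encode_row(row))  # [1, 2, 4, 1, 4, 2, 1]
```

A row that begins with a black pixel gets a leading 0, so a decoder can always assume the first number counts white pixels.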
August 12, 2021. Run-length encoding (RLE) is a lossless data compression algorithm. It compresses data by collapsing runs, that is, sequences of consecutive repeated values. It does so by storing the length of each run followed by the repeated value.
Implementing run-length encoding. To implement run-length encoding, we first store the input string. Then we scan the entire string, replace each run of consecutive identical characters with a single character, and count its occurrences. We will implement run-length encoding in Python using a list.
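The scan described above could look roughly like this. It is a minimal sketch, and the function name `rle_encode` and the choice of (character, count) pairs as output are assumptions made for illustration:

```python
def rle_encode(text):
    """Encode a string as a list of (character, count) pairs."""
    encoded = []  # each entry is a mutable [character, count] pair
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            # Same character as the current run: extend the run.
            encoded[-1][1] += 1
        else:
            # A different character starts a new run.
            encoded.append([ch, 1])
    return [(ch, count) for ch, count in encoded]

print(rle_encode("aaabccdddd"))  # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
```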
RLE decompression consists of reading through the message, which is formed of pairs (character, number of repetitions), and reconstructing the text by writing each character the corresponding number of times.
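Decoding can be sketched in a few lines, assuming the encoded message is a list of (character, count) pairs; the name `rle_decode` is hypothetical:

```python
def rle_decode(pairs):
    """Rebuild the original string from (character, count) pairs."""
    # Repeat each character by its count and join the pieces in order.
    return "".join(ch * count for ch, count in pairs)

print(rle_decode([("a", 3), ("b", 1), ("c", 2), ("d", 4)]))  # aaabccdddd
```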
You can do this with itertools.groupby:
from itertools import groupby
ar = [2,2,2,1,1,2,2,3,3,3,3]
print([(k, sum(1 for i in g)) for k,g in groupby(ar)])
# [(2, 3), (1, 2), (2, 2), (3, 4)]
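If you want the values and lengths as two separate lists, closer to what R's rle() returns, one possible sketch is the following (the helper name `rle` is just an illustrative choice):

```python
from itertools import groupby

def rle(seq):
    """Return (values, lengths) for runs of consecutive equal items."""
    runs = [(k, len(list(g))) for k, g in groupby(seq)]
    values = [k for k, _ in runs]
    lengths = [n for _, n in runs]
    return values, lengths

ar = [2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]
values, lengths = rle(ar)
print(values)   # [2, 1, 2, 3]
print(lengths)  # [3, 2, 2, 4]
```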
Here is an answer for pure numpy:
import numpy as np

def find_runs(x):
    """Find runs of consecutive items in an array."""
    # ensure array
    x = np.asanyarray(x)
    if x.ndim != 1:
        raise ValueError('only 1D array supported')
    n = x.shape[0]
    # handle empty array
    if n == 0:
        return np.array([]), np.array([]), np.array([])
    else:
        # find run starts
        loc_run_start = np.empty(n, dtype=bool)
        loc_run_start[0] = True
        np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
        run_starts = np.nonzero(loc_run_start)[0]
        # find run values
        run_values = x[loc_run_start]
        # find run lengths
        run_lengths = np.diff(np.append(run_starts, n))
        return run_values, run_starts, run_lengths
Credit goes to https://github.com/alimanfoo