
Length (count) of sequences with start and end condition Python

I have some acceleration data for which I am trying to count the length of sequences given a set of conditions. In this case I want to count the length of each sequence that starts when the acceleration rises above 2.78 and ends when it drops back below 0.

An example would be

[-1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11, -0.21]

The expected result here is a count of 7 (2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11).

I have done this previously to identify the length of sequences strictly >2.78 using the following code. I need to build on this to provide lengths using 0 as the endpoint.

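import pandas as pd

def get_Accel_lengths(array):
    # Encode each sample as '1' (at or above the 2.78 threshold) or '0' (below it)
    s = ''.join(['0' if i < 2.78 else '1' for i in array])
    # Splitting on '0' leaves one string of '1's per run above the threshold
    parts = s.split('0')
    return [len(p) for p in parts if len(p) > 0]

# resultsQ4 is my existing DataFrame of acceleration data
Q4Accel = get_Accel_lengths(resultsQ4['AccelInt'])
Q4Accel = pd.DataFrame(Q4Accel)
Q4Accel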

Using the above example, the result for this code would be 2 (2.88, 2.86)

Asked by Jake, Jul 10 '20 at 04:07




2 Answers

Using itertools.dropwhile and takewhile:

from itertools import dropwhile, takewhile

l = [-1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11, -0.21]
list(takewhile(lambda x: x > 0, dropwhile(lambda x: x < 2.78, l)))

Output:

[2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11]

Or just to get len:

sum(1 for _ in takewhile(lambda x: x > 0, dropwhile(lambda x: x < 2.78, l)))
# 7
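
The expression above stops after the first qualifying run. One possible way to collect every run with the same two functions is to keep pulling from a single shared iterator until it is exhausted. This is a minimal sketch rather than part of the original answer: `all_run_lengths`, `start`, and `stop` are hypothetical names, and it assumes `start > stop` so that the first value to get past `dropwhile` also passes `takewhile`.

```python
from itertools import dropwhile, takewhile

def all_run_lengths(values, start=2.78, stop=0):
    """Return the length of every run that begins at a value >= start
    and continues while values stay > stop (hypothetical helper)."""
    it = iter(values)  # one shared iterator, so each pass resumes where the last one ended
    lengths = []
    while True:
        run = list(takewhile(lambda x: x > stop,
                             dropwhile(lambda x: x < start, it)))
        if not run:  # no further start value was found, so the input is exhausted
            break
        lengths.append(len(run))
    return lengths

l = [-1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11, -0.21]
print(all_run_lengths(l + l))
```

Note that these predicates treat a value of exactly 2.78 as a start and a value of exactly 0 as an end; adjust the comparisons if the boundaries should behave differently.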
Answered by Chris, Nov 27 '22 at 01:11


Will this work if this occurs multiple times in the dataset? I want to identify each one.

Let's switch from takewhile and dropwhile to groupby, using a key function that toggles a global boolean flag, so that multiple sequences can be identified. To simulate two sequences, I'll simply concatenate your data onto itself:

from itertools import groupby

def keyfunc(datum):
    global in_sequence

    if datum < 0:
        in_sequence = False
    elif datum > 2.78:
        in_sequence = True

    return in_sequence

data = [
    -1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86,
    2.53, 1.98, 1.21, 0.89, 0.11, -0.21,
    -1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86,
    2.53, 1.98, 1.21, 0.89, 0.11, -0.21,
]

sequences = []
in_sequence = False

for valid, sequence in groupby(data, keyfunc):
    if valid:
        sequences.append(list(sequence))

print(*sequences, sep='\n')
print(*map(len, sequences), sep='\n')

OUTPUT

> python3 test.py
[2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11]
[2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11]
7
7
> 
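
To see why the stateful key function works, it can help to look at every group that groupby produces before the False ones are discarded. The sketch below is illustrative only: `make_keyfunc` is a hypothetical wrapper that keeps the flag in a closure instead of a global, and the data is the single example list from the question.

```python
from itertools import groupby

def make_keyfunc(start=2.78, stop=0):
    """Build a key function with its own private flag (hypothetical helper)."""
    in_sequence = False

    def keyfunc(datum):
        nonlocal in_sequence
        if datum < stop:        # dropping below the stop value ends a sequence
            in_sequence = False
        elif datum > start:     # rising above the start value begins one
            in_sequence = True
        return in_sequence      # values in between keep the current state

    return keyfunc

data = [-1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86, 2.53, 1.98, 1.21, 0.89, 0.11, -0.21]

# groupby starts a new group each time the key changes, so consecutive
# values that share the same flag end up together.
groups = [(flag, list(group)) for flag, group in groupby(data, make_keyfunc())]

for flag, group in groups:
    print(flag, group)
```

Only the group whose flag is True is a sequence of interest; the groups before and after it hold the values outside any sequence.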

Is it possible to tighten it up, though, so that it only returns the lengths? I want to convert them into a DataFrame and export to CSV.

Perhaps something like this:

from itertools import groupby

data = [
    -1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86,
    2.53, 1.98, 1.21, 0.89, 0.11, -0.21,
    -1.1, -1, 0, 1.2, 1.8, 2, 2.88, 2.86,
    2.53, 1.98, 1.21, 0.89, 0.11, -0.21,
]

def sequence_lengths(data):
    in_sequence = False

    def keyfunc(datum):
        nonlocal in_sequence

        if datum < 0:
            in_sequence = False
        elif datum > 2.78:
            in_sequence = True

        return in_sequence

    lengths = []

    for valid, sequence in groupby(data, keyfunc):
        if valid:
            lengths.append(len(list(sequence)))

    return lengths

print(sequence_lengths(data))
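
For the CSV export mentioned in the question, one option that needs only the standard library is the csv module. This is a minimal sketch, assuming the lengths are already in a list: `write_lengths_csv` and the `length` column header are hypothetical names, and the `[7, 7]` literal stands in for the value returned by `sequence_lengths(data)`.

```python
import csv
import io

def write_lengths_csv(lengths, out, header="length"):
    """Write a header row, then one length per row, to an open text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header])
    writer.writerows([n] for n in lengths)

lengths = [7, 7]  # stand-in for sequence_lengths(data)

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_lengths_csv(lengths, buffer)
print(buffer.getvalue())
```

To write an actual file, pass a file opened with `open("lengths.csv", "w", newline="")` instead of the buffer (the file name is only an example). If pandas is already in use, `pd.DataFrame({"length": lengths}).to_csv("lengths.csv", index=False)` should produce an equivalent table.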
Answered by cdlane, Nov 27 '22 at 01:11