Count appearances of a value until it changes to another value

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

I want to count how many times each value appears consecutively before it changes to another value, not the overall frequency of each value.

I tried:

df['values'].value_counts()

but it gives me

10    6
9     3
23    2
12    1

The desired output is

10:2
23:2
9:3
10:4
12:1

How can I do this?

Sascha asked Nov 29 '18 15:11



4 Answers

You can track where df['values'] changes, then group by both that change marker and df['values'] (to keep the values as the index) and compute the size of each group:

changes = df['values'].diff().ne(0).cumsum()
df.groupby([changes,'values']).size().reset_index(level=0, drop=True)

 values
10    2
23    2
9     3
10    4
12    1
dtype: int64
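
To see why grouping on changes works, it can help to print the intermediate steps. This is a minimal sketch on the question's data; the names step and is_new_run are only illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])

step = df['values'].diff()      # difference to the previous row; NaN for the first row
is_new_run = step.ne(0)         # True where the value changed (NaN != 0, so row 0 is True)
changes = is_new_run.cumsum()   # running count of changes, used as a label for each run

print(changes.tolist())
# [1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5]
```

Each run of equal values gets its own label, so the second block of 10s (label 4) is counted separately from the first (label 1). Note that diff() subtracts neighbouring rows, so this variant assumes a numeric column.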
yatu answered Nov 21 '22 00:11



Use:

out = df.groupby(df['values'].ne(df['values'].shift()).cumsum())['values'].value_counts()

Or:

out = df.groupby([df['values'].ne(df['values'].shift()).cumsum(), 'values']).size()

print(out)
values  values
1       10        2
2       23        2
3       9         3
4       10        4
5       12        1
Name: values, dtype: int64

Finally, remove the first index level:

out = out.reset_index(level=0, drop=True)
print(out)
values
10    2
23    2
9     3
10    4
12    1
dtype: int64

Explanation:

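Compare the original column with its shifted copy using ne (not equal), then apply cumsum to get a helper Series that labels each run:

# df is the original DataFrame from the question
a = df['values'].shift()
b = df['values'].ne(a)
c = b.cumsum()
print (pd.concat([df['values'], a, b, c], 
                 keys=('orig','shifted', 'not_equal', 'cumsum'), axis=1))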
    orig  shifted  not_equal  cumsum
0     10      NaN       True       1
1     10     10.0      False       1
2     23     10.0       True       2
3     23     23.0      False       2
4      9     23.0       True       3
5      9      9.0      False       3
6      9      9.0      False       3
7     10      9.0       True       4
8     10     10.0      False       4
9     10     10.0      False       4
10    10     10.0      False       4
11    12     10.0       True       5
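
The shift / ne / cumsum steps above can be wrapped into a small helper. This is only a sketch: the name run_lengths is hypothetical, and returning a list of (value, count) tuples is just one possible choice. Because it compares with ne instead of subtracting, it should also work on non-numeric columns such as strings.

```python
import pandas as pd

def run_lengths(s):
    """Return (value, count) pairs for each run of consecutive equal values in s."""
    # A new run starts wherever the value differs from the previous row.
    run_id = s.ne(s.shift()).cumsum()
    grouped = s.groupby(run_id)
    # first() gives the value of each run, size() gives its length.
    return list(zip(grouped.first().tolist(), grouped.size().tolist()))

df = pd.DataFrame([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12], columns=['values'])
print(run_lengths(df['values']))
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]

print(run_lengths(pd.Series(list('aabccc'))))
# [('a', 2), ('b', 1), ('c', 3)]
```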
jezrael answered Nov 20 '22 23:11



itertools.groupby

from itertools import groupby

pd.Series(*zip(*[[len([*v]), k] for k, v in groupby(df['values'])]))

10    2
23    2
9     3
10    4
12    1
dtype: int64
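
The one-liner above is compact but dense. Unpacked into a loop, the same idea might look like the sketch below; the variable names are illustrative, and it uses a plain list in place of df['values'] to stay self-contained.

```python
from itertools import groupby

import pandas as pd

values = [10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12]

keys, counts = [], []
# itertools.groupby yields one (key, iterator) pair per run of consecutive equal items.
for key, run in groupby(values):
    keys.append(key)
    counts.append(sum(1 for _ in run))  # length of the run

# Counts become the data, the run values become the index.
result = pd.Series(counts, index=keys)
print(result)
```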

As a generator

def f(x):
  count = 1
  for this, that in zip(x, x[1:]):
    if this == that:
      count += 1
    else:
      yield count, this
      count = 1
  yield count, [*x][-1]

pd.Series(*zip(*f(df['values'])))

10    2
23    2
9     3
10    4
12    1
dtype: int64
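
The generator above slices its input (x[1:]) and rebuilds it as a list to find the last element, so it needs a sliceable, non-empty sequence. A single-pass variant that accepts any iterable could look like this sketch; the name run_lengths_iter and the (value, count) ordering are my own choices, not part of the answer.

```python
def run_lengths_iter(iterable):
    """Yield (value, count) for each run of consecutive equal items."""
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    count = 1
    for item in it:
        if item == current:
            count += 1
        else:
            # The run ended: report it and start counting the new value.
            yield current, count
            current, count = item, 1
    yield current, count  # report the final run

print(list(run_lengths_iter([10, 10, 23, 23, 9, 9, 9, 10, 10, 10, 10, 12])))
# [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
```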
piRSquared answered Nov 20 '22 22:11



Using crosstab

df['key']=df['values'].diff().ne(0).cumsum()
pd.crosstab(df['key'],df['values'])
Out[353]: 
values  9   10  12  23
key                   
1        0   2   0   0
2        0   0   0   2
3        3   0   0   0
4        0   4   0   0
5        0   0   1   0

Stack the result above and keep only the non-zero counts:

pd.crosstab(df['key'],df['values']).stack().loc[lambda x:x.ne(0)]
Out[355]: 
key  values
1    10        2
2    23        2
3    9         3
4    10        4
5    12        1
dtype: int64

Based on Python's itertools.groupby:

from itertools import groupby

[ (k,len(list(g))) for k,g in groupby(df['values'].tolist())]
Out[366]: [(10, 2), (23, 2), (9, 3), (10, 4), (12, 1)]
BENY answered Nov 21 '22 00:11
