I have the DataFrame below, whose columns contain only 0s and 1s, and I want to count the number of state changes (0->1 or 1->0) in every column. In the DataFrame below, column 'a' has 6 state changes, column 'b' has 3, and column 'c' has 2. I don't know how to code this in pandas.
number a b c
1 0 0 0
2 1 0 1
3 0 1 1
4 1 1 1
5 0 0 0
6 1 0 0
7 0 1 0
I have no idea how to do this in pandas, because until recently I used only R, and now I have to use Python and pandas, so I'm in a bit of a difficult situation. Can anybody help? Thanks in advance!
Use rolling and compare each value, then count all True values by sum:
df = df[['a','b','c']].rolling(2).apply(lambda x: x[0] != x[-1], raw=True).sum().astype(int)
a 6
b 3
c 2
dtype: int64
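For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same rolling comparison. It assumes the 'number' column is the index rather than a data column, and it wraps the comparison in float() so that each window returns a plain number; both choices are mine, not part of the original answer.

```python
import pandas as pd

# Sample data from the question; treating 'number' as the index is an assumption.
df = pd.DataFrame(
    {
        "a": [0, 1, 0, 1, 0, 1, 0],
        "b": [0, 0, 1, 1, 0, 0, 1],
        "c": [0, 1, 1, 1, 0, 0, 0],
    },
    index=pd.Index(range(1, 8), name="number"),
)

# Each window holds two consecutive rows of one column.
# raw=True hands the window over as a NumPy array, so x[0] and x[-1]
# are the earlier and later value; the result is 1.0 when they differ.
changes = (
    df.rolling(2)
    .apply(lambda x: float(x[0] != x[-1]), raw=True)
    .sum()          # the first row has no predecessor (NaN) and is skipped
    .astype(int)
)
print(changes)
```

The result is kept in a separate name (changes) here so that df still holds the original data afterwards.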
xor (^)
Use the NumPy array df.values and compare the shifted elements with ^.
This is meant to be a fast solution.
XOR is true only when exactly one of the two items being operated on is true, as shown in this truth table:
A B XOR
T T F
T F T
F T T
F F F
And replicated in 0/1 form:
a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
pd.DataFrame(dict(A=a, B=b, XOR=a ^ b))
A B XOR
0 1 1 0
1 1 0 1
2 0 1 1
3 0 0 0
v = df.values
pd.Series((v[1:] ^ v[:-1]).sum(0), df.columns)
a 6
b 3
c 2
dtype: int64
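To make the "shifted elements" idea more concrete, the following sketch spells out the two slices on the question's data. The names later, earlier, and flips are mine, and the data is rebuilt with 'number' as the index (an assumption). Note that ^ only works like this for integer or boolean columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question; treating 'number' as the index is an assumption.
df = pd.DataFrame(
    {
        "a": [0, 1, 0, 1, 0, 1, 0],
        "b": [0, 0, 1, 1, 0, 0, 1],
        "c": [0, 1, 1, 1, 0, 0, 0],
    },
    index=pd.Index(range(1, 8), name="number"),
)

v = df.to_numpy()
later = v[1:]      # rows 2..7
earlier = v[:-1]   # rows 1..6, aligned with the row that follows each one

# XOR is 1 exactly where a value differs from the value in the previous row.
flips = later ^ earlier
counts = pd.Series(flips.sum(axis=0), index=df.columns)
print(counts)

# Row labels at which column 'b' changed state.
b_change_rows = df.index[1:][flips[:, 1].astype(bool)]
print(list(b_change_rows))
```

Keeping the intermediate flips array around is handy when you need to know where the changes happen, not just how many there are.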
from timeit import timeit

import numpy as np
import pandas as pd

def pir_xor(df):
    v = df.values
    return pd.Series((v[1:] ^ v[:-1]).sum(0), df.columns)

def pir_diff1(df):
    v = df.values
    return pd.Series(np.abs(np.diff(v, axis=0)).sum(0), df.columns)

def pir_diff2(df):
    v = df.values
    # diff on a boolean array marks positions where consecutive values differ
    return pd.Series(np.diff(v.astype(bool), axis=0).sum(0), df.columns)

def cold(df):
    # the shifted last row is NaN and always compares unequal, hence the - 1
    return df.ne(df.shift(-1)).sum(0) - 1

def jez(df):
    # raw=True passes each window as a NumPy array, so x[0] and x[-1] are positional
    return df.rolling(2).apply(lambda x: x[0] != x[-1], raw=True).sum().astype(int)

def naga(df):
    return df.diff().abs().sum().astype(int)

np.random.seed([3, 1415])
idx = [10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000, 300000]
col = 'pir_xor pir_diff1 pir_diff2 cold jez naga'.split()
res = pd.DataFrame(np.nan, idx, col)

for i in idx:
    df = pd.DataFrame(np.random.choice([0, 1], size=(i, 3)), columns=[*'abc'])
    for j in col:
        stmt = f"{j}(df)"
        setp = f"from __main__ import {j}, df"
        res.at[i, j] = timeit(stmt, setp, number=100)

# timings relative to the fastest function at each size
res.div(res.min(1), 0)
pir_xor pir_diff1 pir_diff2 cold jez naga
10 1.06203 1.119769 1.000000 21.217555 16.768532 6.601518
30 1.00000 1.075406 1.115743 23.229013 18.844025 7.212369
100 1.00000 1.134082 1.174973 22.673289 21.478068 7.519898
300 1.00000 1.119153 1.166782 21.725495 26.293712 7.215490
1000 1.00000 1.106267 1.167786 18.394462 37.925160 6.284253
3000 1.00000 1.118554 1.342192 16.053097 64.953310 5.594610
10000 1.00000 1.163557 1.511631 12.008129 106.466636 4.503359
30000 1.00000 1.249835 1.431120 7.826387 118.380227 3.621455
100000 1.00000 1.275272 1.528840 6.690012 131.912349 3.150155
300000 1.00000 1.279373 1.528238 6.301007 140.667427 3.190868
res.plot(loglog=True, figsize=(15, 8))
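Timings only mean something if the candidates compute the same thing, so a quick consistency check may be worth running first. This is a rough sketch on random 0/1 data; the seed, the size, and the variable names are arbitrary choices of mine, and it covers the vectorized approaches only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen for reproducibility
df = pd.DataFrame(rng.integers(0, 2, size=(1000, 3)), columns=list("abc"))
v = df.to_numpy()

# The same state-change count, computed five different ways.
by_xor = (v[1:] ^ v[:-1]).sum(axis=0)
by_abs_diff = np.abs(np.diff(v, axis=0)).sum(axis=0)
by_bool_diff = np.diff(v.astype(bool), axis=0).sum(axis=0)
by_shift = (df.ne(df.shift(-1)).sum() - 1).to_numpy()
by_pandas_diff = df.diff().abs().sum().astype(int).to_numpy()

results = [by_xor, by_abs_diff, by_bool_diff, by_shift, by_pandas_diff]
all_agree = all(np.array_equal(by_xor, r) for r in results)
print("all approaches agree:", all_agree)
```

If the data could contain missing values or anything other than 0 and 1, the approaches may no longer agree, so this check is only valid under the question's assumptions.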