Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to detect sign change of values of a column of a pandas dataframe using numpy or pandas?

I want to detect sign change of my data using either pandas or numpy. I want to count the number(s) of id which changes sign of y between two immediate TIMESTEP values (eg. for 2800 and 2900 TIMESTEPs, id 313 has changed sign (y becomes negative). I have tried the below code by counting negatives and then using drop duplicate but that again not efficient and correct.

df_negatives0 = df0.query('y < 0')
df_nonduplicate0=df_negatives0.drop_duplicates(subset=["id"])

My dataset:

TIMESTEP    id  mass    y
0       42  0.755047    0.489375
0       245 0.723805    0.479446
0       344 0.675664    0.463363
...
...
2800    313 0.795699    0.00492984
2800    425 0.68311     0.282356
2900    42  0.755047    0.424421
2900    245 0.723805    0.0378489
2900    344 0.675664    0.127917
2900    313 0.795699    -0.0149792
2900    425 0.68311     0.273884
...
...
upto
10000000

My desired data:

TIMESTEP    id_count    mass
2900        1           0.795699
...
...
500000      2           0.85245 + 0.54852 (i want to sum the masses if id count is more than one)
...
...
upto
10000000
like image 419
Sameer S Avatar asked Dec 18 '25 07:12

Sameer S


1 Answers

There is a dedicated function in numpy np.sign (thanks to @Asclepius for flagging an error in previous version of this answer) and for sign change from one row to the next, it's possible to use the .diff method:

from numpy import sign
from pandas import DataFrame

df = DataFrame([-2, 0, -1, 3, -2], columns=["x"])

# this will return the sign of the float x
df["sign"] = sign(df["x"])

# this will return the difference between the sign of two consecutive rows
print(df["sign"].diff())
# 0    NaN
# 1    1.0
# 2   -1.0
# 3    2.0
# 4   -2.0
like image 98
SultanOrazbayev Avatar answered Dec 20 '25 21:12

SultanOrazbayev