My dataset consists of daily (actually business days) timeseries for different companies from different industries and I work with PostgreSQL. I have an indicator variable in my dataset taking values 1, -1 and most of the times 0. For better readability of the question I refer to the case where the indicator variable is unequal to zero for a given company as indicator event.
If there is more than one indicator event for a given industry on a given day, the indicator variables of the respecting companies shall be updated to 0.
We can think of the following example dataset:
day company indicator industry
2012-01-12 A 1 financial
2012-01-12 B 1 consumer
2012-01-12 C 0 consumer
2012-01-13 A 0 financial
2012-01-13 B 1 consumer
2012-01-13 C 0 consumer
2012-01-16 A 1 financial
2012-01-16 B -1 consumer
2012-01-16 C 1 consumer
So the indicator values that shall be updated to zero are on 2012-01-16 the entries for companies B and C because they both come from the same industry and experienced the indicator event on the same day.
My idea was to work with the exists operator:
update mytable t1 set indicator = 0
where exists (
select 1
from mytable t2
where t2.day = t1.day
and t2.industry = t1.industry
and t2.indicator <> 0
and t1.indicator <> 0)
But somehow that updated all indicator values to 0, and I cannot figure out why.
Do you have any ideas how to fix this, or how to solve my problem with another approach?
You might want to add a condition to not join a row to itself (which will always be true), e.g.
update mytable t1 set indicator = 0
where exists (
select 1
from mytable t2
where t2.day = t1.day
and t1.company <> t2.company
and t2.industry = t1.industry
and t2.indicator <> 0
and t1.indicator <> 0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With