Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using if qualifier with egen in Stata

Tags:

stata

I'm using Stata, and I'm trying to compute the average price of firms' rivals in a market. I have data that looks like:

Market    Firm   Price
----------------------
1         1      100
1         2      150
1         3      125
2         1      50
2         2      100
2         3      75
3         1      100
3         2      200
3         3      200

And I'm trying to compute the average price of each firm's rivals, so I want to generate a new field that is the average values of the other firms in a market. It would look like:

Market    Firm   Price    AvRivalPrice
------------------------------------
1         1      100      137.2
1         2      150      112.5
1         3      125      125
2         1      50       87.5
2         2      100      62.5
2         3      75       75
3         1      100      200
3         2      200      150
3         3      200      150

To do the average by group, I could use the egen command:

egen AvPrice = mean(price), by(Market)

But that wouldn't exclude the firm's own price in the average, and to the best of my knowledge, using the if qualifier would only change the observations it operated on, not the groups it averaged over. Is there a simple way to do this, or do I need to create loops and generate each average manually?

like image 255
prototoast Avatar asked Mar 06 '12 04:03

prototoast


3 Answers

This is an old thread still of interest, so materials and techniques overlooked first time round still apply.

The more general technique is to work with totals. At its simplest, total of others = total of all - this value. In a egen framework that is going to look like

egen total = total(price), by(market) 
egen n = total(!missing(price)), by(market) 
gen avprice = (total - cond(missing(price), 0, price)) / cond(missing(price), n, n - 1) 

The total() function of egen ignores missing values in its argument. If there are missing values, we don't want to include them in the count, but we can use !missing() which yields 1 if not missing and 0 if missing. egen's count() is another way to do this.

Code given earlier gives the wrong answer if missings are present as they are included in the count _N.

Even if a value is missing, the average of the other values still makes sense.

If no value is missing, the last line above simplifies to

gen avprice = (total - price) / (n - 1) 

So far, this possibly looks like no more than a small variant on previous code, but it does extend easily to using weights. Presumably we want a weighted average of others' prices given some weight. We can exploit the fact that total() works on expressions, which can be more complicated than just variable names. Indeed the code above did that already, but it is often overlooked.

egen wttotal = total(weight * price), by(market) 
egen sumwt = total(weight), by(market) 
gen avprice = (wttotal - price * weight) / (sumwt - weight) 

As before, if price or weight is ever missing, you need more complicated code, or just to ensure that you exclude such observations from the calculations.

See also the Stata FAQ

How do I create variables summarizing for each individual properties of the other members of a group?

http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/

for a wider-ranging discussion.

(If the numbers get big, work with doubles.)

EDIT 2 March 2018 That was a newer post in an old thread, which in turn needs updating. rangestat (SSC) can be used here and gives one-line solutions. Not surprisingly, the option excludeself was explicitly added for these kinds of problem. But while the solution for means is easy using an identity

mean for others = (total - value for self) / (count - 1)

many other summary measures don't yield to a similar, simple trick and in that sense rangestat includes much more general coding.

clear
input Market    Firm   Price
1         1      100
1         2      150
1         3      125
2         1      50
2         2      100
2         3      75
3         1      100
3         2      200
3         3      200
end 

rangestat (mean) Price, interval(Firm . .) by(Market) excludeself 

list, sepby(Market) 

     +----------------------------------+
     | Market   Firm   Price   Price_~n |
     |----------------------------------|
  1. |      1      1     100      137.5 |
  2. |      1      2     150      112.5 |
  3. |      1      3     125        125 |
     |----------------------------------|
  4. |      2      1      50       87.5 |
  5. |      2      2     100       62.5 |
  6. |      2      3      75         75 |
     |----------------------------------|
  7. |      3      1     100        200 |
  8. |      3      2     200        150 |
  9. |      3      3     200        150 |
     +----------------------------------+
like image 118
Nick Cox Avatar answered Oct 28 '22 09:10

Nick Cox


This is a way that avoids explicit loops, though it takes several lines of code:

by Market: egen Total = total(Price)
replace Total = Total - Price
by Market: gen AvRivalPrice = Total / (_N-1)
drop Total
like image 45
onestop Avatar answered Oct 28 '22 08:10

onestop


Here's a shorter solution with fewer lines that kind of combines your original thought and @onestop's solution:

      egen AvPrice = mean(price), by(Market)
      bysort Market: replace AvPrice = (AvPrice*_N - price)/(_N-1)

This is all good for a census of firms. If you have a sample of the firms, and you need to apply the weights, I am not sure what a good solution would be. We can brainstorm it if needed.

like image 38
StasK Avatar answered Oct 28 '22 08:10

StasK