What's the difference between gen and egen in Stata 12?

Tags:

stata

Is there a reason why there are two different commands to generate a new variable?

Is there a simple way to remember when to use gen and when to use egen?

max asked Oct 20 '12 23:10


2 Answers

Both commands create a new variable, but they work with different sets of functions. You will typically use gen for simple transformations of other variables in your dataset, such as

gen newvar = oldvar1^2 * oldvar2

In my workflow, egen usually appears when I need functions that work across all observations, as in

egen max_var = max(var)

or more complex instructions

egen newvar = rowmax(oldvar1 oldvar2)

which calculates, for each observation, the maximum of oldvar1 and oldvar2. I don't think there is a clear logic for separating the two commands.
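To make the contrast concrete, here is a minimal sketch using the auto dataset that ships with Stata. The dataset choice and the new variable names (price_per_lb, max_price, biggest) are my own assumptions for illustration, not part of the question.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* gen: the expression is evaluated separately for each observation
gen price_per_lb = price / weight

* egen max(): one value computed over all observations, repeated on every row
egen max_price = max(price)

* egen rowmax(): maximum across the listed variables, within each observation
egen biggest = rowmax(headroom trunk)

list make price price_per_lb max_price headroom trunk biggest in 1/5
```

Note that gen has its own max() function, which also works within an observation (gen biggest2 = max(headroom, trunk)), so the two commands overlap in places.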

griverorz answered Oct 19 '22 01:10


gen

generate may be abbreviated to gen or even g, and can be used with the following mathematical operators and functions:

  • + addition
  • - subtraction
  • * multiplication
  • / division
  • ^ power

A large number of functions is available. Here are some examples:

  • abs(x) absolute value of x
  • exp(x) exponential function of x (e^x, the antilog of a natural logarithm)
  • int(x) or trunc(x) truncation to integer value
  • ln(x), log(x) natural logarithm of x
  • round(x) rounds x to the nearest integer
  • round(x,y) x rounded in units of y (e.g., round(x,.1) rounds to one decimal place)
  • sqrt(x) square root of x
  • runiform() returns uniformly distributed numbers between 0 and nearly 1
  • rnormal() returns numbers that follow a standard normal distribution
  • rnormal(x,y) returns numbers that follow a normal distribution with a mean of x and a s.d. of y
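As a rough sketch of how several of these functions combine with gen, the following builds a tiny dataset from scratch. The variable names and the seed value are arbitrary assumptions chosen only for illustration.

```stata
* Start with an empty dataset of 5 observations
clear
set obs 5

* Arbitrary seed so that the random draws are reproducible
set seed 12345

* x takes the values -2, -1, 0, 1, 2
gen x = _n - 3

gen abs_x  = abs(x)
gen exp_x  = exp(x)
gen half   = x / 2

* int() truncates toward zero, so both -0.5 and 0.5 become 0
gen trunc_half = int(half)

* Round to one decimal place
gen third_1dp = round(x / 3, .1)

* sqrt() of a negative number would be missing, so take abs_x
gen root = sqrt(abs_x)

* Random draws: uniform, and normal with mean 10 and s.d. 2
gen u = runiform()
gen z = rnormal(10, 2)

list
```

Each expression is evaluated one observation at a time, which is the common thread among the functions available to gen.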

egen

A number of more complex possibilities have been implemented in the egen command like in the following examples:

  • egen nkids = anycount(pers1 pers2 pers3 pers4 pers5), value(1)
  • egen v323r = rank(v323)
  • egen myindex = rowmean(var15 var17 var18 var20 var23)
  • egen nmiss = rowmiss(x1-x10 var15-var23)
  • egen rsum = rowtotal(x1-x10 var15-var23)
  • egen incomst = std(income)
  • bysort v3: egen mincome = mean(income)
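The variables in the list above come from a dataset that is not shown, so here is a hedged, runnable variant of three of those commands on Stata's bundled auto dataset. The dataset and the new variable names (mean_price, z_price, n_missing) are assumptions for illustration.

```stata
sysuse auto, clear

* Mean price within each value of foreign, attached to every car in that group
bysort foreign: egen mean_price = mean(price)

* Standardized price: mean 0 and s.d. 1 over all observations
egen z_price = std(price)

* Number of missing values per observation across the listed variables
egen n_missing = rowmiss(rep78 price)

tabulate foreign, summarize(mean_price)
```

The by prefix is what turns mean() from an overall statistic into a group statistic. This kind of calculation has no direct counterpart among the functions used with gen.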

Detailed usage explanations can be found at this link.

GorkemHalulu answered Oct 19 '22 01:10