 

Stata tabstat change order/sort?

I am using tabstat in Stata, with estpost and esttab to get its output into LaTeX. I use tabstat to display statistics by group. For example,

tabstat assets, by(industry) missing statistics(count mean sd p25 p50 p75) 

My question is whether there is a way for tabstat (or another Stata command) to display the output ordered by the value of the mean, so that categories with higher means appear at the top. By default, tabstat lists the groups in alphabetical order of industry.

asked Dec 20 '22 11:12 by rajvijay


2 Answers

tabstat does not offer such a hook, but there is an approach to problems like this that is general and quite easy to understand.

You don't provide a reproducible example, so we need one:

. sysuse auto, clear
(1978 Automobile Data)

. gen Make = word(make, 1)

. tab Make if foreign

       Make |      Freq.     Percent        Cum.
------------+-----------------------------------
       Audi |          2        9.09        9.09
        BMW |          1        4.55       13.64
     Datsun |          4       18.18       31.82
       Fiat |          1        4.55       36.36
      Honda |          2        9.09       45.45
      Mazda |          1        4.55       50.00
    Peugeot |          1        4.55       54.55
    Renault |          1        4.55       59.09
     Subaru |          1        4.55       63.64
     Toyota |          3       13.64       77.27
         VW |          4       18.18       95.45
      Volvo |          1        4.55      100.00
------------+-----------------------------------
      Total |         22      100.00

Make here is like your variable industry: it is a string variable, so in tables Stata will tend to show it in alphabetical (alphanumeric) order.

The work-around has several easy steps, some optional.

Calculate a variable on which you want to sort. egen is often useful here.

 . egen mean_mpg = mean(mpg), by(Make)

Map those values to a variable with distinct integer values. As two groups could have the same mean (or other summary statistic), make sure you break ties on the original string variable.

 . egen group = group(mean_mpg Make)

This variable is created to have value 1 for the group with the lowest mean (or other summary statistic), 2 for the next lowest, and so forth. If the opposite order is desired, as in this question, flip the grouping variable around.

 . replace group = -group
 (74 real changes made)

There is a problem with this new variable: the values of the original string variable, here Make, are nowhere to be seen. labmask (to be installed from the Stata Journal website after search labmask) is a helper here. We use the values of the original string variable as the value labels of the new variable. (The idea is that the value labels become the "mask" that the integer variable wears.)

 . labmask group, values(Make)

Optionally, tidy up the variable label of the new integer variable.

 . label var group "Make"

Now we can tabulate using the categories of the new variable.

 . tabstat mpg if foreign, s(mean) by(group) format(%2.1f)

 Summary for variables: mpg
 by categories of: group (Make)

   group |      mean
 --------+----------
  Subaru |      35.0
   Mazda |      30.0
      VW |      28.5
   Honda |      26.5
 Renault |      26.0
  Datsun |      25.8
     BMW |      25.0
  Toyota |      22.3
    Fiat |      21.0
    Audi |      20.0
   Volvo |      17.0
 Peugeot |      14.0
 --------+----------
   Total |      24.8
 -------------------
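Applied to the variables in the question, the same steps might look like the sketch below. It assumes that industry is a string variable and that the helper names mean_assets and order are free to use; both names are made up here for illustration. Note that egen's group() function skips missing values unless its missing option is given, so this sketch does not reproduce the missing option used in the original tabstat call.

```stata
* sketch only: assets and industry are the question's variables;
* mean_assets and order are hypothetical helper names
egen mean_assets = mean(assets), by(industry)

* distinct integers in order of the mean, ties broken on industry
egen order = group(mean_assets industry)

* flip so that the highest mean comes first
replace order = -order

* attach the industry names as value labels (requires labmask)
labmask order, values(industry)
label var order "Industry"

tabstat assets, by(order) statistics(count mean sd p25 p50 p75)
```

If industry is instead a numeric variable with value labels, labmask has a decode option that uses those labels rather than the underlying numbers.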

Note: other strategies are sometimes better or as good here.

  • If you collapse your data to a new dataset, you can then sort it as you please.

  • graph bar and graph dot are good at displaying summary statistics over groups, and the sort order can be tuned directly.
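As a rough sketch of those two alternatives with the same auto data: the new variable names n, mean_mpg and sd_mpg are chosen here for illustration, and preserve and restore are used because collapse replaces the data in memory.

```stata
sysuse auto, clear
gen Make = word(make, 1)

* alternative 1: collapse to one observation per group, then sort
preserve
collapse (count) n=mpg (mean) mean_mpg=mpg (sd) sd_mpg=mpg if foreign, by(Make)
gsort -mean_mpg Make
list, noobs
restore

* alternative 2: let the graph command sort the bars by the mean
graph hbar (mean) mpg if foreign, over(Make, sort(1) descending)
```

In the graph command, sort(1) sorts on the first (here the only) plotted statistic, and descending puts the largest mean first.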

UPDATE (3 and 5 October 2021): A new helper command, myaxis, from SSC and the Stata Journal (see the accompanying paper) condenses the example here with tabstat:

* set up data example 
sysuse auto, clear
gen Make = word(make, 1)

* sort order variable and tabulation 
myaxis Make2 = Make, sort(mean mpg) descending 
tabstat mpg if foreign, s(mean) by(Make2) format(%2.1f)
answered Dec 31 '22 13:12 by Nick Cox


I would look at the egenmore package on SSC. You can install it by typing ssc install egenmore in Stata. In particular, I would look at the entry for axis() in the egenmore help file. That contains an example that does exactly what you want.
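A sketch along the lines of that help file entry, using the auto data, might look as follows. The label() and reverse options are used here as I understand them from the egenmore help; check help egenmore for the exact syntax in your installed version. The names mean_mpg and Make_axis are made up for illustration.

```stata
* one-time installation from SSC
ssc install egenmore

sysuse auto, clear
gen Make = word(make, 1)

* statistic to sort on
egen mean_mpg = mean(mpg), by(Make)

* integer axis variable ordered by the mean, labelled with Make;
* reverse is assumed to put the highest mean first
egen Make_axis = axis(mean_mpg Make), label(Make) reverse

tabstat mpg if foreign, s(mean) by(Make_axis) format(%2.1f)
```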

like image 24
Maarten Buis Avatar answered Dec 31 '22 14:12

Maarten Buis