I am using tabstat in Stata, and using estpost and esttab to get its output to LaTeX. I have tabstat display statistics by group. For example:
tabstat assets, by(industry) missing statistics(count mean sd p25 p50 p75)
The question I have is whether there is a way for tabstat (or other Stata commands) to display the output ordered by the value of the mean, so that categories with higher means appear on top. By default, Stata displays the groups in alphabetical order of industry when I use tabstat.
tabstat does not offer such a hook, but there is a general approach to problems like this that is quite easy to understand.
You don't provide a reproducible example, so we need one:
. sysuse auto, clear
(1978 Automobile Data)
. gen Make = word(make, 1)
. tab Make if foreign
       Make |      Freq.     Percent        Cum.
------------+-----------------------------------
       Audi |          2        9.09        9.09
        BMW |          1        4.55       13.64
     Datsun |          4       18.18       31.82
       Fiat |          1        4.55       36.36
      Honda |          2        9.09       45.45
      Mazda |          1        4.55       50.00
    Peugeot |          1        4.55       54.55
    Renault |          1        4.55       59.09
     Subaru |          1        4.55       63.64
     Toyota |          3       13.64       77.27
         VW |          4       18.18       95.45
      Volvo |          1        4.55      100.00
------------+-----------------------------------
      Total |         22      100.00
Make here is like your variable industry: it is a string variable, so in tables Stata will tend to show it in alphabetical (alphanumeric) order.
The work-around has several easy steps, some optional.
Calculate a variable on which you want to sort. egen is often useful here.
. egen mean_mpg = mean(mpg), by(Make)
Map those values to a variable with distinct integer values. As two groups could have the same mean (or other summary statistic), make sure you break ties on the original string variable.
. egen group = group(mean_mpg Make)
This variable is created to have value 1 for the group with the lowest mean (or other summary statistic), 2 for the next lowest, and so forth. If the opposite order is desired, as in this question, flip the grouping variable around.
. replace group = -group
(74 real changes made)
There is a problem with this new variable: the values of the original string variable, here Make, are nowhere to be seen. labmask (to be installed from the Stata Journal website after search labmask) is a helper here. We use the values of the original string variable as the value labels of the new variable. (The idea is that the value labels become the "mask" that the integer variable wears.)
. labmask group, values(Make)
Optionally, tidy up the variable label of the new integer variable.
. label var group "Make"
Now we can tabulate using the categories of the new variable.
. tabstat mpg if foreign, s(mean) by(group) format(%2.1f)
Summary for variables: mpg
     by categories of: group (Make)

  group |      mean
--------+----------
 Subaru |      35.0
  Mazda |      30.0
     VW |      28.5
  Honda |      26.5
Renault |      26.0
 Datsun |      25.8
    BMW |      25.0
 Toyota |      22.3
   Fiat |      21.0
   Audi |      20.0
  Volvo |      17.0
Peugeot |      14.0
--------+----------
  Total |      24.8
-------------------
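The steps above can be collected into a single do-file. This is only a sketch on the auto data: it assumes labmask is already installed, and it borrows the statistics list from the question to show that the ordering carries over to a fuller table. With your data, assets and industry would stand in for mpg and Make.

```stata
* Sketch: order tabstat rows by group mean, highest first.
* Assumes labmask is already installed.
sysuse auto, clear
gen Make = word(make, 1)

* mean per group, then distinct integers with ties broken on Make
egen mean_mpg = mean(mpg), by(Make)
egen group = group(mean_mpg Make)

* negate so that the highest mean sorts first
replace group = -group

* show the original strings as value labels
labmask group, values(Make)
label var group "Make"

* statistics list as in the question
tabstat mpg if foreign, by(group) statistics(count mean sd p25 p50 p75)
```

Note that this sketch does nothing special about missing values of the string variable; if you need the missing option of tabstat, that case needs separate thought.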
Note: other strategies are sometimes better or as good here.
If you collapse your data to a new dataset, you can then sort it as you please.
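A minimal sketch of the collapse route, again on the auto data. The new variable names mean_mpg and n are just illustrative choices. Bear in mind that collapse replaces the data in memory, so you may want to wrap it in preserve and restore.

```stata
* Sketch of the collapse route; mean_mpg and n are illustrative names.
sysuse auto, clear
gen Make = word(make, 1)
keep if foreign

* one observation per make, holding the mean and the count
collapse (mean) mean_mpg = mpg (count) n = mpg, by(Make)

* descending on the mean, ties broken alphabetically
gsort -mean_mpg Make
format mean_mpg %4.1f
list Make n mean_mpg, noobs
```

The reduced dataset is then also easy to export by whatever route you prefer.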
graph bar and graph dot are good at displaying summary statistics over groups, and the sort order can be tuned directly.
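For instance, a sketch with graph dot on the same example data, using the sort() and descending suboptions of over():

```stata
* Sketch: dot chart of mean mpg by make, highest mean first.
sysuse auto, clear
gen Make = word(make, 1)

* sort(1) orders the categories by the first plotted statistic, here mean mpg
graph dot (mean) mpg if foreign, over(Make, sort(1) descending)
```

The same over() suboptions should work with graph bar and graph hbar.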
UPDATE 3 and 5 October 2021: A new helper command myaxis from SSC and the Stata Journal (see the paper there) condenses the tabstat example above:
* set up data example
sysuse auto, clear
gen Make = word(make, 1)
* sort order variable and tabulation
myaxis Make2 = Make, sort(mean mpg) descending
tabstat mpg if foreign, s(mean) by(Make2) format(%2.1f)
I would look at the egenmore package on SSC. You can get that package by typing ssc install egenmore in Stata. In particular, I would look at the entry for axis() in the help file of egenmore. That contains an example that does exactly what you want.
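As a rough sketch of how that might look on the auto data (this is my reading of the axis() syntax, not a copy of the help file example, so check help egenmore before relying on it). The names neg_mean and Make3 are illustrative. Taking the mean of -mpg makes the ascending order of axis() put the highest mean first.

```stata
* Sketch only: assumes egenmore is installed (ssc install egenmore).
sysuse auto, clear
gen Make = word(make, 1)

* mean of negated mpg, so ascending order means highest mean first
egen neg_mean = mean(-mpg), by(Make)

* integer axis variable ordered on neg_mean, ties broken on Make,
* with value labels taken from Make
egen Make3 = axis(neg_mean Make), label(Make)

tabstat mpg if foreign, s(mean) by(Make3) format(%2.1f)
```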