I'm trying to get a 2 way table in R similar to this one from Stata. I was trying to use CrossTable
from gmodels
package, but the table is not the same. Do you known how can this be done in R?
I hope at least to get the frequencies from
when cursmoke1 == "Yes" & cursmoke2 == "No" and reversed
In R I'm only getting totals from yes, no and NA.
Here is the output:
Stata
. tabulate cursmoke1 cursmoke2, cell column miss row
+-------------------+
| Key |
|-------------------|
| frequency |
| row percentage |
| column percentage |
| cell percentage |
+-------------------+
Current |
smoker, | Current smoker, exam 2
exam 1 | No Yes . | Total
-----------+---------------------------------+----------
No | 1,898 131 224 | 2,253
| 84.24 5.81 9.94 | 100.00
| 86.16 7.59 44.44 | 50.81
| 42.81 2.95 5.05 | 50.81
-----------+---------------------------------+----------
Yes | 305 1,596 280 | 2,181
| 13.98 73.18 12.84 | 100.00
| 13.84 92.41 55.56 | 49.19
| 6.88 35.99 6.31 | 49.19
-----------+---------------------------------+----------
Total | 2,203 1,727 504 | 4,434
| 49.68 38.95 11.37 | 100.00
| 100.00 100.00 100.00 | 100.00
| 49.68 38.95 11.37 | 100.00
R
> CrossTable(cursmoke2, cursmoke1, missing.include = T, format="SAS")
Cell Contents
|-------------------------|
| N |
| Chi-square contribution |
| N / Row Total |
| N / Col Total |
| N / Table Total |
|-------------------------|
Total Observations in Table: 4434
| cursmoke1
cursmoke2 | No | Yes | NA | Row Total |
-------------|-----------|-----------|-----------|-----------|
No | 2203 | 0 | 0 | 2203 |
| 1122.544 | 858.047 | 250.409 | |
| 1.000 | 0.000 | 0.000 | 0.497 |
| 1.000 | 0.000 | 0.000 | |
| 0.497 | 0.000 | 0.000 | |
-------------|-----------|-----------|-----------|-----------|
Yes | 0 | 1727 | 0 | 1727 |
| 858.047 | 1652.650 | 196.303 | |
| 0.000 | 1.000 | 0.000 | 0.389 |
| 0.000 | 1.000 | 0.000 | |
| 0.000 | 0.389 | 0.000 | |
-------------|-----------|-----------|-----------|-----------|
NA | 0 | 0 | 504 | 504 |
| 250.409 | 196.303 | 3483.288 | |
| 0.000 | 0.000 | 1.000 | 0.114 |
| 0.000 | 0.000 | 1.000 | |
| 0.000 | 0.000 | 0.114 | |
-------------|-----------|-----------|-----------|-----------|
Column Total | 2203 | 1727 | 504 | 4434 |
| 0.497 | 0.389 | 0.114 | |
-------------|-----------|-----------|-----------|-----------|
Maybe I'm missing something here. The default settings for CrossTable
seem to provide essentially what you are looking for.
Here is CrossTable
with minimal arguments. (I've loaded the dataset as "temp".) Note that the results are the same as what you posted from the Stata output (you just need to multiply by 100 if you want the result as a percentage).
library(gmodels)
with(temp, CrossTable(cursmoke1, cursmoke2, missing.include=TRUE))
Cell Contents
|-------------------------|
| N |
| Chi-square contribution |
| N / Row Total |
| N / Col Total |
| N / Table Total |
|-------------------------|
Total Observations in Table: 4434
| cursmoke2
cursmoke1 | No | Yes | NA | Row Total |
-------------|-----------|-----------|-----------|-----------|
No | 1898 | 131 | 224 | 2253 |
| 541.582 | 635.078 | 4.022 | |
| 0.842 | 0.058 | 0.099 | 0.508 |
| 0.862 | 0.076 | 0.444 | |
| 0.428 | 0.030 | 0.051 | |
-------------|-----------|-----------|-----------|-----------|
Yes | 305 | 1596 | 280 | 2181 |
| 559.461 | 656.043 | 4.154 | |
| 0.140 | 0.732 | 0.128 | 0.492 |
| 0.138 | 0.924 | 0.556 | |
| 0.069 | 0.360 | 0.063 | |
-------------|-----------|-----------|-----------|-----------|
Column Total | 2203 | 1727 | 504 | 4434 |
| 0.497 | 0.389 | 0.114 | |
-------------|-----------|-----------|-----------|-----------|
Alternatively, you can use format="SPSS"
if you want the numbers displayed as percentages.
with(temp, CrossTable(cursmoke1, cursmoke2, missing.include=TRUE, format="SPSS"))
Cell Contents
|-------------------------|
| Count |
| Chi-square contribution |
| Row Percent |
| Column Percent |
| Total Percent |
|-------------------------|
Total Observations in Table: 4434
| cursmoke2
cursmoke1 | No | Yes | NA | Row Total |
-------------|-----------|-----------|-----------|-----------|
No | 1898 | 131 | 224 | 2253 |
| 541.582 | 635.078 | 4.022 | |
| 84.243% | 5.814% | 9.942% | 50.812% |
| 86.155% | 7.585% | 44.444% | |
| 42.806% | 2.954% | 5.052% | |
-------------|-----------|-----------|-----------|-----------|
Yes | 305 | 1596 | 280 | 2181 |
| 559.461 | 656.043 | 4.154 | |
| 13.984% | 73.177% | 12.838% | 49.188% |
| 13.845% | 92.415% | 55.556% | |
| 6.879% | 35.995% | 6.315% | |
-------------|-----------|-----------|-----------|-----------|
Column Total | 2203 | 1727 | 504 | 4434 |
| 49.684% | 38.949% | 11.367% | |
-------------|-----------|-----------|-----------|-----------|
prop.table()
Just FYI (to save you the tedious work you did in making your own data.frame
as you did), you may also be interested in the prop.table()
function.
Again, using the data you linked to and assuming it is named "temp", the following gives you the underlying data from which you can construct your data.frame
. You may also be interested in looking into the functions margin.table()
or addmargins()
:
## Your basic table
CurSmoke <- with(temp, table(cursmoke1, cursmoke2, useNA = "ifany"))
CurSmoke
# cursmoke2
# cursmoke1 No Yes <NA>
# No 1898 131 224
# Yes 305 1596 280
## Row proportions
prop.table(CurSmoke, 1) # * 100 # If you so desire
# cursmoke2
# cursmoke1 No Yes <NA>
# No 0.84243231 0.05814470 0.09942299
# Yes 0.13984411 0.73177442 0.12838148
## Column proportions
prop.table(CurSmoke, 2) # * 100 # If you so desire
# cursmoke2
# cursmoke1 No Yes <NA>
# No 0.86155243 0.07585408 0.44444444
# Yes 0.13844757 0.92414592 0.55555556
## Cell proportions
prop.table(CurSmoke) # * 100 # If you so desire
# cursmoke2
# cursmoke1 No Yes <NA>
# No 0.42805593 0.02954443 0.05051872
# Yes 0.06878665 0.35994587 0.06314840
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With