R Language: How do I print / see summary statistics for sample subset?

These are some newbie questions about statistical programming in R that I haven't been able to find answers to online. My data frame is named "eitc" in the code below.

1) Once I've loaded in a data frame, I would like to look at summary statistics. I've used the functions:

library(foreign) # provides read.dta
eitc <- read.dta(file="/Users/Documents/eitc.dta")
summary(eitc) # min, quartiles, mean, max for each variable
sapply(eitc,mean,na.rm=TRUE) # sample mean of each variable

How do I find summary statistics on my data frame when certain conditions are met? For example, I would like to see the summary statistics on all variables when the variable "children" is greater than or equal to 1. The equivalent Stata code is:

summarize if children >= 1

2) Similarly, how do I find specific statistics when certain conditions are met? For example, I want to find the mean of the variable "work" when "post93" is equal to 0 and "anykids" is equal to 1. The equivalent Stata code is:

mean work if post93==0 & anykids==1

3) Ideally, when I run the summary statistics above, I would like to find out how many observations were included in the calculation / fit the criteria.

4) When I read in my data frame, it would also be nice to see how many observations are included in the data set (and perhaps how many rows have missing values or "NA" in them).

5) Also, I have been creating dummy variables using the following code. Is this the correct way to do it or is there a more efficient route?

post93.dummy <- as.numeric(eitc$year>1993)
eitc=cbind(eitc,post93.dummy)

asked Jan 29 '11 08:01

baha-kev

3 Answers

A lot of your requirements are answered by subset, e.g.

summary(subset(eitc, post93 == 0 & anykids == 1, select=work))
nrow(subset(eitc, post93 == 0 & anykids == 1, select=work)) # for number of obs.

The ?subset documentation has good examples.
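
One detail worth knowing when choosing between subset() and bracket indexing is that they treat missing values in the condition differently. Here is a minimal sketch on a made-up data frame; the column names mirror the question, but the name toy and all values are invented for illustration:

```r
# Toy data; values are invented for illustration only
toy <- data.frame(
  work    = c(1, 0, 1, 1, 0),
  post93  = c(0, 0, 1, 0, NA),
  anykids = c(1, 1, 1, 0, 1)
)

# subset() drops rows where the condition evaluates to NA
kept_subset <- subset(toy, post93 == 0 & anykids == 1)

# Bracket indexing keeps such rows, filled with NA
kept_bracket <- toy[toy$post93 == 0 & toy$anykids == 1, ]

n_subset  <- nrow(kept_subset)      # observations matching the condition
n_bracket <- nrow(kept_bracket)     # includes one all-NA row
mean_work <- mean(kept_subset$work) # mean of work within the subset
```

If you prefer bracket indexing, wrapping the condition in which() should give the same rows as subset().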

The cbind method of attaching dummy variables is unnecessary. Just do:

eitc$post93.dummy <- as.numeric(eitc$year>1993)
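
To check that the dummy came out as intended, you can tabulate it. A small sketch, assuming a made-up data frame (the name toy and the years are invented for illustration):

```r
# Toy data; the years are invented for illustration only
toy <- data.frame(year = c(1991, 1993, 1994, 1996, NA))

# The comparison yields TRUE/FALSE (or NA); as.numeric() maps these to 1/0/NA
toy$post93.dummy <- as.numeric(toy$year > 1993)

# Tabulate the dummy, counting missing values as well
counts <- table(toy$post93.dummy, useNA = "ifany")
print(counts)
```

Note that a missing year stays missing in the dummy rather than silently becoming 0.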

answered Sep 28 '22 08:09

Michael Dunn

I'll use the mtcars data available in the datasets package. See ?mtcars.

Ad 1. You can see the summary of mtcars when gear is greater than 3:

summary(mtcars[mtcars$gear > 3, ])
## or by using Tukey's five number summary
sapply(mtcars[mtcars$gear > 3, ], fivenum)

Ad 2. Use with:

with(mtcars, mean(hp[gear > 3 & mpg > 20]))

Ad 3. Ibid (but use length):

with(mtcars, length(hp[gear > 3 & mpg > 20]))
## or
sapply(mtcars[mtcars$gear > 3, ], length) ## which is trivial when there are no NA's
sapply(mtcars[mtcars$gear > 3, ], function(x) sum(!is.na(x))) ## non-missing count per column (length() has no na.rm argument)
nrow(mtcars[mtcars$gear > 3, ])

Ad 4. See previous, but to find out

how many rows have missing values or "NA" in them

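do something like this (dtf stands for your data frame):

sum(!complete.cases(dtf)) ## number of rows with at least one NA
apply(dtf, 1, function(x) sum(is.na(x))) ## number of NA's in each row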
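
As a rough, self-contained illustration of counting observations and missing values, here is a sketch on a made-up data frame (the column names and values are invented for illustration):

```r
# Toy data; values are invented for illustration only
dtf <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, NA, NA, FALSE)
)

n_obs      <- nrow(dtf)                 # total number of observations
n_rows_na  <- sum(!complete.cases(dtf)) # rows with at least one NA
na_per_row <- rowSums(is.na(dtf))       # number of NA's in each row
na_per_col <- colSums(is.na(dtf))       # number of NA's in each column
```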

Ad 5. This is not a dummy variable, this is some kind of subset of original data, columnwise concatenated. What are you trying to achieve anyway?

Please be concise. One question per question, please!

answered Sep 28 '22 08:09

aL3xa

I would recommend you look at the plyr package for generating summaries. Here's some quick code (not run):

library(plyr) # provides ddply and .()

# Generate a new factor based on the numeric value of children with 5 levels
eitc$childfac <- cut(eitc$children, 5)

# Generate mean and sd of the variables foo and bar based on that factor
# (foo and bar are placeholder column names; base R's sd() computes the standard deviation)
ddply(eitc, .(childfac), function(df) {
  data.frame(meanfoo=mean(df$foo), sdfoo=sd(df$foo),
    meanbar=mean(df$bar), sdbar=sd(df$bar))
})
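
If you would rather not install a package, a similar grouped summary can be sketched in base R. This is not the plyr approach above but a swapped-in alternative using split() and sapply(), and it uses the built-in mtcars data (mpg grouped by cyl) as a stand-in for eitc:

```r
# Split mpg into one vector per number of cylinders
by_cyl <- split(mtcars$mpg, mtcars$cyl)

# Summarise each group: mean, standard deviation and number of observations
res <- sapply(by_cyl, function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
print(res)
```

The result is a matrix with one column per group and one row per statistic, so the group sizes are reported alongside the means.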

You might also want to look at the Hmisc and psych packages for more descriptive statistics routines. (Check out Quick-R for more info.)

answered Sep 28 '22 07:09

PaulHurleyuk

Tags: r, statistics, stata