
Workflow for statistical analysis and report writing



I generally break my projects into 4 pieces:

  1. load.R
  2. clean.R
  3. func.R
  4. do.R

load.R: Takes care of loading in all the data required. Typically this is a short file, reading in data from files, URLs and/or ODBC. Depending on the project, at this point I'll either write out the workspace using save() or just keep things in memory for the next step.
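As an illustration only (the file names and data sources below are made up), load.R often amounts to little more than:

# load.R - read the raw inputs; everything downstream starts from these objects
raw.survey <- read.csv("data/survey.csv", stringsAsFactors = FALSE)
raw.prices <- read.csv("http://example.com/prices.csv")
# optionally cache the workspace so later steps can start from here
save(raw.survey, raw.prices, file = "cache/raw.RData")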

clean.R: This is where all the ugly stuff lives - taking care of missing values, merging data frames, handling outliers.
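A sketch of the kind of thing that ends up in clean.R (the column names are hypothetical, continuing the made-up load.R above):

# clean.R - assumes the objects created by load.R are in the workspace
raw.survey$income[raw.survey$income < 0] <- NA          # recode impossible values
survey <- merge(raw.survey, raw.prices, by = "region")  # combine the sources
survey <- survey[!is.na(survey$income), ]               # drop incomplete rows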

func.R: Contains all of the functions needed to perform the actual analysis. source()'ing this file should have no side effects other than loading up the function definitions. This means that you can modify this file and reload it without having to go back and repeat steps 1 & 2, which can take a long time to run for large data sets.
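For example (again hypothetical), func.R would contain nothing but definitions:

# func.R - definitions only; source()'ing this must not touch the data
fit.model <- function(dat) lm(income ~ age + region, data = dat)

residual.plot <- function(fit)
  plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals")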

do.R: Calls the functions defined in func.R to perform the analysis and produce charts and tables.
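So do.R just strings the pieces together, along the lines of:

# do.R - run the analysis on the cleaned data and write out the results
source("func.R")
fit <- fit.model(survey)
print(summary(fit))
pdf("output/residuals.pdf")
residual.plot(fit)
dev.off()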

The main motivation for this setup is working with large data, where you don't want to have to reload the data each time you make a change to a subsequent step. Also, keeping my code compartmentalized like this means I can come back to a long-forgotten project, quickly read load.R to work out what data I need to update, and then look at do.R to work out what analysis was performed.


If you'd like to see some examples, I have a few small (and not so small) data cleaning and analysis projects available online. In most, you'll find a script to download the data, one to clean it up, and a few to do exploration and analysis:

  • Baby names from the Social Security Administration
  • 30+ years of fuel economy data from the EPA
  • A big collection of data about the housing crisis
  • Movie ratings from the IMDB
  • House sale data in the Bay Area

Recently I have started numbering the scripts, so it's completely obvious in which order they should be run. (If I'm feeling really fancy, I'll sometimes make it so that the exploration script calls the cleaning script, which in turn calls the download script, each doing the minimal work necessary - usually by checking for the presence of output files with file.exists. Most of the time, though, this seems like overkill.)
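The guard itself is just a line or two at the top of each script, something like this (script names invented):

# at the top of 1-clean.r: only redo the download if its output is missing
if (!file.exists("data/raw.csv")) source("0-download.r")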

I use git for all my projects (a source code management system), so it's easy to collaborate with others, see what has changed, and roll back to previous versions.

If I do a formal report, I usually keep the R and the LaTeX separate, but I always make sure that I can source my R code to produce all the code and output that I need for the report. For the sorts of reports that I do, I find this easier and cleaner than embedding the R code in the LaTeX itself.
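A minimal sketch of that separation (file names invented, reusing the objects from the sketches above): one sourceable script writes every figure and table to disk, and the LaTeX document only \includegraphics{} or \input{}s the results.

# report-output.R - source()'ing this regenerates everything the .tex file includes
library(xtable)
pdf("figures/income-by-age.pdf", width = 6, height = 4)
plot(survey$age, survey$income)
dev.off()
print(xtable(fit), file = "tables/model-coefs.tex")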


I agree with the other responders: Sweave is excellent for report writing with R. And rebuilding the report with updated results is as simple as re-calling the Sweave function. It's completely self-contained, including all the analysis, data, etc. And you can version control the whole file.

I use the StatET plugin for Eclipse for developing the reports, and Sweave is integrated (Eclipse recognizes the LaTeX formatting, etc.). On Windows, it's easy to use MiKTeX.

I would also add that you can create beautiful reports with Beamer. Creating a normal report is just as simple. I've included an example below that pulls data from Yahoo! and creates a chart and a table (using quantmod). You can build this report like so:

Sweave(file = "test.Rnw")
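Sweave itself only writes the intermediate test.tex; running that file through pdflatex, or through texi2dvi as in the brew example at the end of this thread, produces the final PDF:

tools::texi2dvi("test.tex", pdf = TRUE)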

Here's the Beamer document itself:

% 
\documentclass[compress]{beamer}
\usepackage{Sweave}
\usetheme{PaloAlto} 
\begin{document}

\title{test report}
\author{john doe}
\date{September 3, 2009} 

\maketitle

\begin{frame}[fragile]\frametitle{Page 1: chart}

<<echo=FALSE,fig=TRUE,height=4, width=7>>=
library(quantmod)
getSymbols("PFE", from="2009-06-01")
chartSeries(PFE)
@

\end{frame}


\begin{frame}[fragile]\frametitle{Page 2: table}

<<echo=FALSE,results=tex>>=
library(xtable)
xtable(PFE[1:10,1:4], caption = "PFE")
@

\end{frame}

\end{document}

I just wanted to add, in case anyone missed it, that there's a great post on the learnr blog about creating repetitive reports with Jeffrey Horner's brew package. Matt and Kevin both mentioned brew above. I haven't actually used it much myself.

The post follows a nice workflow, so it's well worth a read:

  1. Prepare the data.
  2. Prepare the report template.
  3. Produce the report.

Actually producing the report once the first two steps are complete is very simple:

library(tools)
library(brew)
brew("population.brew", "population.tex")   # fill in the template
texi2dvi("population.tex", pdf = TRUE)      # compile the result to a PDF