Edit -- This question was originally titled << Long to wide data reshaping in R >>
I'm just learning R and trying to find ways to apply it to help out others in my life. As a test case, I'm working on reshaping some data, and I'm having trouble following the examples I've found online. What I'm starting with looks like this:
ID Obs 1 Obs 2 Obs 3 1 43 48 37 1 27 29 22 1 36 32 40 2 33 38 36 2 29 32 27 2 32 31 35 2 25 28 24 3 45 47 42 3 38 40 36
And what I want to end up with will look like this:
ID Obs 1 mean Obs 1 std dev Obs 2 mean Obs 2 std dev 1 x x x x 2 x x x x 3 x x x x
And so forth. What I'm unsure of is whether I need additional information in my long-form data, or what. I imagine that the math part (finding the mean and standard deviations) will be the easy part, but I haven't been able to find a way that seems to work to reshape the data correctly to start in on that process.
Thanks very much for any help.
Step 1: Find the mean. Step 2: For each data point, find the square of its distance to the mean. Step 3: Sum the values from Step 2. Step 4: Divide by the number of data points.
This is an aggregation problem, not a reshaping problem as the question originally suggested -- we wish to aggregate each column into a mean and standard deviation by ID. There are many packages that handle such problems. In the base of R it can be done using aggregate
like this (assuming DF
is the input data frame):
ag <- aggregate(. ~ ID, DF, function(x) c(mean = mean(x), sd = sd(x)))
Note 1: A commenter pointed out that ag
is a data frame for which some columns are matrices. Although initially that may seem strange, in fact it simplifies access. ag
has the same number of columns as the input DF
. Its first column ag[[1]]
is ID
and the ith column of the remainder ag[[i+1]]
(or equivalanetly ag[-1][[i]]
) is the matrix of statistics for the ith input observation column. If one wishes to access the jth statistic of the ith observation it is therefore ag[[i+1]][, j]
which can also be written as ag[-1][[i]][, j]
.
On the other hand, suppose there are k
statistic columns for each observation in the input (where k=2 in the question). Then if we flatten the output then to access the jth statistic of the ith observation column we must use the more complex ag[[k*(i-1)+j+1]]
or equivalently ag[-1][[k*(i-1)+j]]
.
For example, compare the simplicity of the first expression vs. the second:
ag[-1][[2]] ## mean sd ## [1,] 36.333 10.2144 ## [2,] 32.250 4.1932 ## [3,] 43.500 4.9497 ag_flat <- do.call("data.frame", ag) # flatten ag_flat[-1][, 2 * (2-1) + 1:2] ## Obs_2.mean Obs_2.sd ## 1 36.333 10.2144 ## 2 32.250 4.1932 ## 3 43.500 4.9497
Note 2: The input in reproducible form is:
Lines <- "ID Obs_1 Obs_2 Obs_3 1 43 48 37 1 27 29 22 1 36 32 40 2 33 38 36 2 29 32 27 2 32 31 35 2 25 28 24 3 45 47 42 3 38 40 36" DF <- read.table(text = Lines, header = TRUE)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With