Combining all data in a data frame per column and groups in R

Tags: loops, for-loop, r

I have this dataset, which consists of 3 columns and 5 observations:

sex <- c("M", "M", "F", "F", "F")
var1 <- c(1, 2, 3, 4, 5)
var2 <- c(6, 7, 8, 9, 10)

data <- data.frame(sex, var1, var2)
print(data)

  sex var1 var2
1   M    1    6
2   M    2    7
3   F    3    8
4   F    4    9
5   F    5   10

I would like to divide each male (M) value by each female (F) value in every column.

In this very simple example, I would like to get for var1 a vector of 1/3, 1/4, 1/5, 2/3, 2/4 and 2/5.

For var2, the vector would be 6/8, 6/9, 6/10, 7/8, 7/9 and 7/10.

In the end, I would have 2 vectors, one per variable.

How can I automate this, considering that I have many more columns and rows?

antecessor asked Aug 11 '19 15:08




2 Answers

One option is to get the indices of the rows where 'sex' is "M", loop over them, divide each male row by the 'var' columns of the rows where 'sex' is "F", and rbind the results:

out <- do.call(rbind, lapply(which(data$sex == "M"), function(i) {
     d1 <- data[data$sex == "F", -1]
     data[i, -1][rep(1, nrow(d1)),]/d1 }))
row.names(out) <- NULL
out
#       var1      var2
#1 0.3333333 0.7500000
#2 0.2500000 0.6666667
#3 0.2000000 0.6000000
#4 0.6666667 0.8750000
#5 0.5000000 0.7777778
#6 0.4000000 0.7000000
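
The key step in that loop is data[i, -1][rep(1, nrow(d1)), ], which repeats a single male row once per female row so that both data frames have the same shape before the element-wise division. A small sketch of just that step for the first male row; the names m_row, m_rep and first_ratios are illustrative and not part of the answer above:

```r
# Rebuild the example data from the question
data <- data.frame(
  sex  = c("M", "M", "F", "F", "F"),
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(6, 7, 8, 9, 10)
)

d1    <- data[data$sex == "F", -1]   # the 3 female rows, value columns only
m_row <- data[1, -1]                 # first male row: var1 = 1, var2 = 6
m_rep <- m_row[rep(1, nrow(d1)), ]   # that same row repeated once per female row

dim(m_rep)                           # same shape as d1: 3 rows, 2 columns

# Element-wise division of two equally shaped data frames
first_ratios <- m_rep / d1
first_ratios
```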

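Another option is outer(), which builds every male/female pair for a column at once. Note that the rows come out in a different order than above, because the male index varies fastest: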

i1 <- which(data$sex == "M")
i2 <- setdiff(seq_len(nrow(data)), i1)
sapply(2:ncol(data), function(u) 
        outer(i1, i2, FUN  = function(i, j) data[i, u]/data[j, u]))
#      [,1]      [,2]
#[1,] 0.3333333 0.7500000
#[2,] 0.6666667 0.8750000
#[3,] 0.2500000 0.6666667
#[4,] 0.5000000 0.7777778
#[5,] 0.2000000 0.6000000
#[6,] 0.4000000 0.7000000
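
Since the question asks for one vector per variable, the same outer() idea can also be applied column by column to produce a named list. This is a minimal sketch, assuming that every column other than sex is numeric; the names is_m, is_f and ratios are introduced only for illustration. Transposing before flattening gives the order listed in the question (all ratios for the first male, then the second):

```r
# Rebuild the example data from the question
data <- data.frame(
  sex  = c("M", "M", "F", "F", "F"),
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(6, 7, 8, 9, 10)
)

is_m <- data$sex == "M"   # logical index of the male rows
is_f <- data$sex == "F"   # logical index of the female rows

# For each value column, build the male-by-female matrix of ratios,
# then flatten it row by row so each male's ratios stay together
ratios <- lapply(data[-1], function(col) {
  as.vector(t(outer(col[is_m], col[is_f], "/")))
})

ratios$var1   # 1/3, 1/4, 1/5, 2/3, 2/4, 2/5
ratios$var2   # 6/8, 6/9, 6/10, 7/8, 7/9, 7/10
```

Because lapply() runs over data[-1], the list gets one element per value column automatically, however many columns there are.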
akrun answered Sep 25 '22 08:09


One option would be to use the base R merge function, in cross join mode:

cross <- merge(data[data$sex == "M", ], data[data$sex == "F", ], by = NULL)
df <- data.frame(var1=cross$var1.x/cross$var1.y, var2=cross$var2.x/cross$var2.y)
df

       var1      var2
1 0.3333333 0.7500000
2 0.6666667 0.8750000
3 0.2500000 0.6666667
4 0.5000000 0.7777778
5 0.2000000 0.6000000
6 0.4000000 0.7000000

I didn't bother to sort the data frame above, or bring in any of the original variables, but it would not be too difficult to do that.
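
For reference, here is one way the sorting and the link back to the original rows might look, extended to any number of value columns. This is a sketch under assumptions, not part of the answer above: the id column and the names vars, males, females, ratios and result are hypothetical helpers, and every column other than sex is assumed to be numeric.

```r
# Rebuild the example data from the question
data <- data.frame(
  sex  = c("M", "M", "F", "F", "F"),
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(6, 7, 8, 9, 10)
)

vars    <- setdiff(names(data), "sex")   # value columns to divide
data$id <- seq_len(nrow(data))           # illustrative id: original row number

males   <- data[data$sex == "M", c("id", vars)]
females <- data[data$sex == "F", c("id", vars)]

# Cross join: shared column names get the default suffixes .x (males) and .y (females)
cross <- merge(males, females, by = NULL)

# Sort so that all pairs for the first male come first
cross <- cross[order(cross$id.x, cross$id.y), ]

# Divide each male column by the matching female column
ratios <- lapply(setNames(vars, vars), function(v) {
  cross[[paste0(v, ".x")]] / cross[[paste0(v, ".y")]]
})

result <- data.frame(male_row = cross$id.x, female_row = cross$id.y, ratios)
result
```

The male_row and female_row columns record which original rows each ratio came from.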

Tim Biegeleisen answered Sep 25 '22 08:09