Combining all data in a data frame per column and groups in R

Tags: loops, for-loop, r

I have this dataset, which is composed of 3 columns and 5 observations:

sex <- c("M", "M", "F", "F", "F")
var1 <- c(1, 2, 3, 4, 5)
var2 <- c(6, 7, 8, 9, 10)

data <- data.frame(sex, var1, var2)
print(data)

  sex var1 var2
1   M    1    6
2   M    2    7
3   F    3    8
4   F    4    9
5   F    5   10

I would like to divide each male (M) value by each female (F) value, column by column.

In this example, which is very simple, I would like to get for var1 a vector of 1/3, 1/4, 1/5, 2/3, 2/4 and 2/5.

For var2, the vector would be 6/8, 6/9, 6/10, 7/8, 7/9 and 7/10.

In the end, I would have 2 vectors, one per variable.

How can I automate this, considering that I have many more columns and rows?


asked Aug 11 '19 15:08

antecessor

2 Answers

One option is to get the indices of the rows where 'sex' is "M", loop over them, subset the 'var' columns of the rows where 'sex' is "F", divide the "M" values by the "F" values, and rbind the results:

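# For every "M" row, repeat it once per "F" row and divide column-wise
out <- do.call(rbind, lapply(which(data$sex == "M"), function(i) {
     d1 <- data[data$sex == "F", -1]           # numeric columns of the "F" rows
     data[i, -1][rep(1, nrow(d1)), ] / d1 }))  # replicated "M" row over each "F" row
row.names(out) <- NULL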
out
#       var1      var2
#1 0.3333333 0.7500000
#2 0.2500000 0.6666667
#3 0.2000000 0.6000000
#4 0.6666667 0.8750000
#5 0.5000000 0.7777778
#6 0.4000000 0.7000000

Another option is outer, applied to the row indices of the two groups, once per column:

i1 <- which(data$sex == "M")
i2 <- setdiff(seq_len(nrow(data)), i1)
sapply(2:ncol(data), function(u) 
        outer(i1, i2, FUN  = function(i, j) data[i, u]/data[j, u]))
#          [,1]      [,2]
#[1,] 0.3333333 0.7500000
#[2,] 0.6666667 0.8750000
#[3,] 0.2500000 0.6666667
#[4,] 0.5000000 0.7777778
#[5,] 0.2000000 0.6000000
#[6,] 0.4000000 0.7000000
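
If one vector per variable is wanted, in the order listed in the question, outer can also be applied to the values directly rather than to row indices. The following is a minimal sketch and not part of the original answer; the helper name ratio_by_group and its arguments are made up for illustration:

```r
data <- data.frame(sex  = c("M", "M", "F", "F", "F"),
                   var1 = c(1, 2, 3, 4, 5),
                   var2 = c(6, 7, 8, 9, 10))

# Hypothetical helper (name and arguments are assumptions, not an existing API).
# x: numeric column, g: grouping vector, num/den: numerator and denominator groups.
ratio_by_group <- function(x, g, num = "M", den = "F") {
  # Denominators go in the rows, numerators in the columns, so that flattening
  # column by column lists every "F" for the first "M", then for the second "M".
  as.vector(outer(x[g == den], x[g == num], function(f, m) m / f))
}

# One vector per variable, returned as a named list
res <- lapply(data[-1], ratio_by_group, g = data$sex)
res$var1
res$var2
```

Because the helper only needs a value column and a grouping vector, it should carry over to data frames with more columns and rows, as long as every column after the first is numeric.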


answered Sep 25 '22 08:09

akrun

One option would be to use the base R merge function, in cross join mode:

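# by = NULL makes merge() return the Cartesian product (cross join) of the two subsets
cross <- merge(data[data$sex == "M", ], data[data$sex == "F", ], by = NULL)
df <- data.frame(var1 = cross$var1.x / cross$var1.y, var2 = cross$var2.x / cross$var2.y)
df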

       var1      var2
1 0.3333333 0.7500000
2 0.6666667 0.8750000
3 0.2500000 0.6666667
4 0.5000000 0.7777778
5 0.2000000 0.6000000
6 0.4000000 0.7000000

I didn't bother to sort the data frame above, or bring in any of the original variables, but it would not be too difficult to do that.
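
For what it's worth, here is a rough sketch of how the sorting and the link back to the original rows might look. It assumes a row identifier column, here called id, which is not in the original data, and it builds the list of value columns from the column names so that more variables are picked up automatically:

```r
data <- data.frame(sex  = c("M", "M", "F", "F", "F"),
                   var1 = c(1, 2, 3, 4, 5),
                   var2 = c(6, 7, 8, 9, 10))

# Assumed helper column: remember which original row each value came from
data$id <- seq_len(nrow(data))
value_cols <- setdiff(names(data), c("sex", "id"))

# Cross join of the "M" rows (suffix .x) with the "F" rows (suffix .y)
cross <- merge(data[data$sex == "M", ], data[data$sex == "F", ], by = NULL)

# One ratio column per variable, M value over F value
ratios <- lapply(setNames(value_cols, value_cols), function(v) {
  cross[[paste0(v, ".x")]] / cross[[paste0(v, ".y")]]
})

out <- data.frame(male_row = cross$id.x, female_row = cross$id.y, ratios)

# Sort so that all pairings of the first male come first
out <- out[order(out$male_row, out$female_row), ]
row.names(out) <- NULL
out
```

The male_row and female_row columns are only there to show which pair of original rows produced each ratio; they can be dropped once the order has been checked.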

answered Sep 25 '22 08:09

Tim Biegeleisen
