I have this dataset, which consists of 3 columns and 5 observations:
sex <- c("M", "M", "F", "F", "F")
var1 <- c(1, 2, 3, 4, 5)
var2 <- c(6, 7, 8, 9, 10)
data <- data.frame(sex, var1, var2)
print(data)
  sex var1 var2
1   M    1    6
2   M    2    7
3   F    3    8
4   F    4    9
5   F    5   10
I would like to divide each male (M) by each female (F) in every column.
In this example, which is very simple, I would like to get for var1 a vector of 1/3, 1/4, 1/5, 2/3, 2/4 and 2/5.
For var2, the vector would be 6/8, 6/9, 6/10, 7/8, 7/9 and 7/10.
In the end, I would have 2 vectors, one for each variable.
How can I automate this, considering I have many more columns and rows?
An option would be to get the indices of the elements in 'sex' that are "M", loop over them, subset the 'var' columns where sex is "F", divide the vars of each "M" row by those rows, and rbind the results:
out <- do.call(rbind, lapply(which(data$sex == "M"), function(i) {
  d1 <- data[data$sex == "F", -1]
  data[i, -1][rep(1, nrow(d1)), ] / d1
}))
row.names(out) <- NULL
out
#       var1      var2
#1 0.3333333 0.7500000
#2 0.2500000 0.6666667
#3 0.2000000 0.6000000
#4 0.6666667 0.8750000
#5 0.5000000 0.7777778
#6 0.4000000 0.7000000
Another option is outer, applied to the row indices of the two groups:
i1 <- which(data$sex == "M")
i2 <- setdiff(seq_len(nrow(data)), i1)
sapply(2:ncol(data), function(u)
  outer(i1, i2, FUN = function(i, j) data[i, u] / data[j, u]))
#          [,1]      [,2]
#[1,] 0.3333333 0.7500000
#[2,] 0.6666667 0.8750000
#[3,] 0.2500000 0.6666667
#[4,] 0.5000000 0.7777778
#[5,] 0.2000000 0.6000000
#[6,] 0.4000000 0.7000000
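If one vector per variable is wanted, in the order listed in the question (all ratios for the first male, then the second), a minimal sketch could call outer directly on the column values instead of on row indices. The names is_male, is_female and ratios are only illustrative, and the sketch assumes every column other than sex is numeric:

```r
data <- data.frame(
  sex  = c("M", "M", "F", "F", "F"),
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(6, 7, 8, 9, 10)
)

is_male   <- data$sex == "M"
is_female <- data$sex == "F"

# For each non-sex column, outer() builds a males x females matrix of ratios.
# Transposing before as.vector() flattens it row by row, so the ratios for
# the first male come first, then those for the second male, and so on.
ratios <- lapply(data[-1], function(col) {
  as.vector(t(outer(col[is_male], col[is_female], FUN = "/")))
})

ratios$var1
ratios$var2
```

The result is a named list, so adding more columns to data should simply add more elements to ratios.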
One option would be to use the base R merge function, in cross join mode:
cross <- merge(data[data$sex == "M", ], data[data$sex == "F", ], by = NULL)
df <- data.frame(var1=cross$var1.x/cross$var1.y, var2=cross$var2.x/cross$var2.y)
df
       var1      var2
1 0.3333333 0.7500000
2 0.6666667 0.8750000
3 0.2500000 0.6666667
4 0.5000000 0.7777778
5 0.2000000 0.6000000
6 0.4000000 0.7000000
I didn't bother to sort the data frame above, or bring in any of the original variables, but it would not be too difficult to do that.
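To avoid typing each column by hand when there are many variables, the cross join could be generalized along these lines. This is only a sketch: males, females, vars and out are illustrative names, and it assumes all columns other than sex are numeric and that merge appends its default ".x" and ".y" suffixes to the shared column names:

```r
data <- data.frame(
  sex  = c("M", "M", "F", "F", "F"),
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(6, 7, 8, 9, 10)
)

# Keep only the value columns; drop = FALSE keeps a data frame even if
# there is a single value column.
males   <- data[data$sex == "M", -1, drop = FALSE]
females <- data[data$sex == "F", -1, drop = FALSE]

# by = NULL produces the Cartesian product of the two data frames.
cross <- merge(males, females, by = NULL)

# Divide all male columns by the matching female columns in one step.
vars <- names(males)
out <- cross[paste0(vars, ".x")] / cross[paste0(vars, ".y")]
names(out) <- vars
out
```

The rows come out in the same order as in the cross join above, with the male index varying fastest.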