Probably a terrible title, but I have a table of qualifiers stored as "1", "2", and "3". What I'm trying to do is look in each row (approximately 300,000 rows, but variable) and determine where a single "3" occurs (if it occurs more than once, I am not interested in it) while the rest of the columns in that row have a "1", and return that to a list. (The number of columns and the column names change based on the input files.)
Instinctively I want to attempt this with nested for loops that index the row count and then the column count, plus some function that looks for one "3" and no "2"s, which likely means the preferred way would be some apply function, correct?
Another thought was to total the number of columns, add 2, and then sum the row, with a condition that no 2s can be in the row. But that seemed pretty complicated.
df1
                        seq loc Ball Cat Square Water
1 AAAAAACCAGTCCCAGTTCGGATTG   t    3   1      1     1
2 AAAAAACCAGTCTCAGTTCGGATTG   b    1   1      3     3
3 AAAAAACCAGTCTCAGTTCGGATTG   t    1   3      2     1
4 AAAAAACCGGTCACAGTTCAGATTG   b    1   1      1     2
5 AAAAAACCGGTCACAGTTCAGATTG   t    1   1      3     1
Expected Output:
                        seq loc  Group
1 AAAAAACCAGTCCCAGTTCGGATTG   t   Ball
2 AAAAAACCGGTCACAGTTCAGATTG   t Square
dput of df1:
structure(list(seq = structure(c(1L, 2L, 2L, 3L, 3L), .Label =
c("AAAAAACCAGTCCCAGTTCGGATTG",
"AAAAAACCAGTCTCAGTTCGGATTG", "AAAAAACCGGTCACAGTTCAGATTG"), class =
"factor"),
loc = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("b",
"t"), class = "factor"), Ball = c("3", "1", "1", "1", "1"
), Cat = c("1", "1", "3", "1", "1"), Square = c("1", "3",
"2", "1", "3"), Water = c("1", "3", "1", "2", "1")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
Here's a base R solution that needs neither tidyverse nor any row-wise *apply calls. First, let's convert the qualifier columns (columns 3 to 6 in this example) to integers:
cols <- 3:6
df1[cols] <- lapply(df1[cols], as.integer)
Then
df <- df1[rowSums(df1[cols]) == (3 + length(cols) - 1) & rowSums(df1[cols] == 3) == 1, ]
df$Group <- names(df)[cols][which(t(df[cols]) == 3, arr.ind = TRUE)[, 1]]
df
# A tibble: 2 x 7
# seq loc Ball Cat Square Water Group
# <fct> <fct> <int> <int> <int> <int> <chr>
# 1 AAAAAACCAGTCCCAGTTCGGATTG t 3 1 1 1 Ball
# 2 AAAAAACCGGTCACAGTTCAGATTG t 1 1 3 1 Square
In the first line I select the right rows with two conditions: there has to be exactly one element equal to 3 in the cols columns (rowSums(df1[cols] == 3) == 1), and the total sum of the row has to be 3 + length(cols) - 1, which forces every other value in the row to be 1. Then in the second line I check which column contains the 3 in each remaining row and pick the corresponding names of df as values for Group.
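Since the question says the number and names of the columns change between input files, here is a minimal sketch of the same row-filtering idea wrapped in a function that does not hard-code 3:6. The helper name find_single_three and its id_cols argument are my own assumptions, not part of the answer above; it also counts the "1"s directly instead of using the row sum, so the columns can stay as character.

```r
# Hypothetical helper (not from the original answer): every column that is
# not listed in id_cols is treated as a qualifier column.
find_single_three <- function(data, id_cols = c("seq", "loc")) {
  data <- as.data.frame(data)
  qual_cols <- setdiff(names(data), id_cols)

  # Character matrix of qualifiers, one row per input row.
  m <- as.matrix(data[qual_cols])

  # Keep rows with exactly one "3" and a "1" in every other qualifier column.
  keep <- rowSums(m == "3") == 1 &
    rowSums(m == "1") == length(qual_cols) - 1
  sel <- m[keep, , drop = FALSE]

  # Locate the "3" in each kept row; sort by row so the order matches `sel`.
  hits <- which(sel == "3", arr.ind = TRUE)
  hits <- hits[order(hits[, "row"]), , drop = FALSE]

  out <- data[keep, id_cols, drop = FALSE]
  out$Group <- qual_cols[hits[, "col"]]
  rownames(out) <- NULL
  out
}

# Example data rebuilt from the question (plain character columns).
df1 <- data.frame(
  seq = c("AAAAAACCAGTCCCAGTTCGGATTG", "AAAAAACCAGTCTCAGTTCGGATTG",
          "AAAAAACCAGTCTCAGTTCGGATTG", "AAAAAACCGGTCACAGTTCAGATTG",
          "AAAAAACCGGTCACAGTTCAGATTG"),
  loc    = c("t", "b", "t", "b", "t"),
  Ball   = c("3", "1", "1", "1", "1"),
  Cat    = c("1", "1", "3", "1", "1"),
  Square = c("1", "3", "2", "1", "3"),
  Water  = c("1", "3", "1", "2", "1"),
  stringsAsFactors = FALSE
)

result <- find_single_three(df1)
print(result)
```

On the example data this keeps rows 1 and 5 and labels them Ball and Square. Everything is still vectorised over the whole matrix, so it should scale to a few hundred thousand rows in the same way as the rowSums version, though I have not timed it.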