Let's say I have a data frame df like this:
txt A1 A2 B1 B2
1 ala 6 9 12 23
2 ata 1 3 3 11
....
I would like to use dplyr to filter the rows based on the sum of a range of variables.
I tried:
filter(df, sum(A2:B1)>10)
...but it does not work.
Could anyone suggest a solution in dplyr? And yes, I know it can be done differently by simple subsetting.
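A likely reason the attempt fails: inside filter(), A2:B1 is not a column range but base R's `:` sequence operator applied to the values of the A2 and B1 columns, and sum() collapses everything to a single number for the whole data frame, whereas a row filter needs one value per row. The sketch below rebuilds only the two rows shown above (the rows elided by "...." are not reproduced) and contrasts sum() with rowSums():

```r
# Only the two rows copied from the question; the elided rows are not included
df <- data.frame(
  txt = c("ala", "ata"),
  A1  = c(6, 1),
  A2  = c(9, 3),
  B1  = c(12, 3),
  B2  = c(23, 11),
  stringsAsFactors = FALSE
)

# sum() returns one number for all rows combined: 9 + 3 + 12 + 3 = 27
total <- sum(df$A2, df$B1)

# rowSums() returns one total per row (21 and 6), which is what filtering needs
per_row <- rowSums(df[c("A2", "B1")])

# Keep only rows whose A2 + B1 total exceeds 10
df[per_row > 10, ]
```

Only the "ala" row passes (9 + 12 = 21); "ata" sums to 3 + 3 = 6.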
To get multiple columns of a matrix, put the column numbers as a vector, preceded by a comma, in square brackets after the matrix variable name. This expression returns the requested columns as a matrix.
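As a quick illustration of that bracket syntax, using a small hypothetical matrix m (the names here are for illustration only):

```r
# Hypothetical 3 x 4 matrix, filled column by column with 1..12
m <- matrix(1:12, nrow = 3)

# Empty row index (all rows), then a vector of column numbers
sub <- m[, c(1, 3)]
dim(sub)  # 3 rows, 2 columns

# A single column is simplified to a plain vector unless drop = FALSE is given
one_col <- m[, 2, drop = FALSE]
```

The drop = FALSE case matters in code that must always receive a matrix, even when the column vector happens to have length one.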
%>% is called the forward pipe operator in R. It forwards a value, or the result of an expression, into the next function call or expression, which lets you chain commands instead of nesting them. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).
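A minimal sketch of the pipe, assuming magrittr is installed. The pipeline is equivalent to the nested call sum(sqrt(x)), and the dot placeholder shows how to send the value to an argument other than the first:

```r
library(magrittr)

x <- c(1, 4, 9)

# Each step's result becomes the first argument of the next call
piped  <- x %>% sqrt() %>% sum()
nested <- sum(sqrt(x))  # the same computation written inside-out

# When "." appears as an argument, the value goes there instead of first:
# this is seq(1, 10, by = 2)
odds <- 2 %>% seq(1, 10, by = .)
```

Since R 4.1.0, base R also ships a native pipe, |>, which covers the simple first-argument case without loading any package.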
To select a column in R you can use brackets, e.g. YourDataFrame['Column'] selects the column named “Column”. We can also use dplyr's select() function to get columns by name or by index. For instance, select(YourDataFrame, c('A', 'B')) selects the columns named “A” and “B” from the data frame.
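The snippet below contrasts the base-R bracket forms with dplyr's select(), using a small hypothetical data frame YourDataFrame with columns A, B and C, and assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical data frame with three columns
YourDataFrame <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# Base R: single brackets with a name (or a vector of names) return a data frame
one_base <- YourDataFrame["A"]
two_base <- YourDataFrame[c("A", "B")]

# dplyr: select() accepts bare names, a character vector, or column positions
two_dplyr  <- select(YourDataFrame, A, B)
two_string <- select(YourDataFrame, c("A", "B"))
by_index   <- select(YourDataFrame, 1, 3)
```

Both routes keep the result as a data frame; select() additionally supports ranges such as A:B, which is what the row-sum answers in this thread rely on.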
I think the most dplyr-esque way would be:
library(dplyr)
df %>%
  filter(rowSums(select(., A2:B1)) > 10)
Which gives:
# txt A1 A2 B1 B2
#1 ala 6 9 12 23
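With dplyr 1.1.0 or later, pick() can select the column range directly inside filter() without the . placeholder. A minimal sketch, assuming that dplyr version and rebuilding only the two example rows shown in the question:

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which introduced pick()

# Only the two rows shown in the question; the elided rows are not reproduced
df <- data.frame(
  txt = c("ala", "ata"),
  A1  = c(6, 1),
  A2  = c(9, 3),
  B1  = c(12, 3),
  B2  = c(23, 11),
  stringsAsFactors = FALSE
)

# pick(A2:B1) returns the columns from A2 through B1 as a data frame,
# so rowSums() yields one total per row to compare with the threshold
out <- df %>%
  filter(rowSums(pick(A2:B1)) > 10)
```

As above, only the "ala" row is kept (9 + 12 = 21).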
We can get the column indexes first and then use rowSums:
v1 <- which(names(df) == 'A2') #find first column
#[1] 3
v2 <- which(names(df) == 'B1') #find last column
#[1] 4
df[rowSums(df[v1:v2])>10,]
# txt A1 A2 B1 B2
#1 ala 6 9 12 23
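The index lookup can be wrapped in a small reusable function. filter_by_range_sum() below is a hypothetical helper, not part of base R or dplyr; it uses match() to find both boundary columns at once and stops with an error if either name is missing:

```r
# Hypothetical helper: keep rows whose sum over the columns from `from`
# through `to` (inclusive, in data-frame order) exceeds `threshold`
filter_by_range_sum <- function(data, from, to, threshold) {
  idx <- match(c(from, to), names(data))  # positions of both boundary columns
  if (anyNA(idx)) {
    stop("column not found: ", paste(c(from, to)[is.na(idx)], collapse = ", "))
  }
  cols <- data[seq(idx[1], idx[2])]       # every column in the range
  data[rowSums(cols) > threshold, , drop = FALSE]
}

# Only the two rows shown in the question
df <- data.frame(
  txt = c("ala", "ata"),
  A1  = c(6, 1),
  A2  = c(9, 3),
  B1  = c(12, 3),
  B2  = c(23, 11),
  stringsAsFactors = FALSE
)

res <- filter_by_range_sum(df, "A2", "B1", 10)
```

Like the df[...] version above, this keeps the original row name ("1") of the surviving row.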