I have a data frame and want to keep only the first value (the leading number) in each of several columns.
I would prefer a tidyverse solution using the pipe operator %>%, ideally one where I can specify the range E1:N3, because the real dataset has about 50 columns.
This is the dataframe:
df <- data.frame(age = c(20, 25, 30), E1 = c("1 Alpha", "2 Bravo", "1 Alpha"), E2 = c("2 Bravo", "2 Bravo", "2 Bravo"), E3 = c("1 Alpha", "2 Bravo", "2 Bravo"), N1 = c("1 Alpha", "1 Alpha", "1 Alpha"), N2 = c("2 Bravo", "1 Alpha", "2 Bravo"), N3 = c("2 Bravo", "2 Bravo", "1 Alpha"))
df
#>   age      E1      E2      E3      N1      N2      N3
#> 1  20 1 Alpha 2 Bravo 1 Alpha 1 Alpha 2 Bravo 2 Bravo
#> 2  25 2 Bravo 2 Bravo 2 Bravo 1 Alpha 1 Alpha 2 Bravo
#> 3  30 1 Alpha 2 Bravo 2 Bravo 1 Alpha 2 Bravo 1 Alpha
This is what I want:
df_expected <- data.frame(age = c(20, 25, 30), E1 = c("1", "2", "1"), E2 = c("2", "2", "2"), E3 = c("1", "2", "2"), N1 = c("1", "1", "1"), N2 = c("2", "1", "2"), N3 = c("2", "2", "1"))
df_expected
#>   age E1 E2 E3 N1 N2 N3
#> 1  20  1  2  1  1  2  2
#> 2  25  2  2  2  1  1  2
#> 3  30  1  2  2  1  2  1
You can take the first character of each column with substr():
library(dplyr)
df %>%
  mutate_at(vars(E1:N3), ~ substr(., 1, 1))
#>   age E1 E2 E3 N1 N2 N3
#> 1  20  1  2  1  1  2  2
#> 2  25  2  2  2  1  1  2
#> 3  30  1  2  2  1  2  1
If the values always start with a single digit and you want numeric columns, wrap the result in as.numeric():
df %>%
  mutate_at(vars(E1:N3), ~ as.numeric(substr(., 1, 1)))
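In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A minimal sketch of the same two steps written with across(), assuming dplyr >= 1.0.0 is installed and that the target columns sit contiguously from E1 to N3 as in the question (stringsAsFactors = FALSE is added here only so the example behaves the same on older R versions):

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  age = c(20, 25, 30),
  E1 = c("1 Alpha", "2 Bravo", "1 Alpha"),
  E2 = c("2 Bravo", "2 Bravo", "2 Bravo"),
  E3 = c("1 Alpha", "2 Bravo", "2 Bravo"),
  N1 = c("1 Alpha", "1 Alpha", "1 Alpha"),
  N2 = c("2 Bravo", "1 Alpha", "2 Bravo"),
  N3 = c("2 Bravo", "2 Bravo", "1 Alpha"),
  stringsAsFactors = FALSE  # assumption: keep the columns as character
)

# Keep only the first character of every column from E1 through N3
df_chr <- df %>%
  mutate(across(E1:N3, ~ substr(.x, 1, 1)))

# Same selection, but convert the extracted character to numeric
df_num <- df %>%
  mutate(across(E1:N3, ~ as.numeric(substr(.x, 1, 1))))
```

The age column is left untouched in both results because it falls outside the E1:N3 range.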
Here is an option that extracts the first numeric part with parse_number() on columns whose names start with 'E' or 'N' followed by one or more digits (\\d+). Note that parse_number() returns numeric values rather than character strings.
library(dplyr)
library(stringr)
df %>%
  mutate_at(vars(matches("^(E|N)\\d+$")), ~ readr::parse_number(as.character(.)))
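To see which columns such a pattern would pick up before applying it, the regex can be tried against a vector of column names. A small base R sketch; the names "E", "Notes", and "N10" are made up for illustration and are not in the question's data. Since matches() ignores case by default, ignore.case = TRUE is set here to mirror that:

```r
# Hypothetical column names; only age, E1, E2 and N3 come from the question
cols <- c("age", "E1", "E2", "N3", "E", "Notes", "N10")

# Same pattern as in matches(): an E or N followed only by digits
selected <- cols[grepl("^(E|N)\\d+$", cols, ignore.case = TRUE)]
selected
# "E1" "E2" "N3" "N10" are kept; "age", "E" and "Notes" are not
```

The anchors ^ and $ are what keep a name like "Notes" out of the selection, which matters when the real dataset has many other columns.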
Or use str_remove() to remove the substring that starts at the first whitespace (\\s+) and includes all following characters (.*):
df %>%
  mutate_at(vars(-age), ~ str_remove(., "\\s+.*"))
#  age E1 E2 E3 N1 N2 N3
#1  20  1  2  1  1  2  2
#2  25  2  2  2  1  1  2
#3  30  1  2  2  1  2  1
Or using base R
df[-1] <- lapply(df[-1], sub, pattern = "\\s.*", replacement = "")
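The approaches above give the same result on the example data, but they would differ if a code ever had more than one digit: substr(., 1, 1) keeps a single character, while the regex versions keep everything before the first whitespace. A quick base R illustration, where the value "10 Kilo" is invented for the example:

```r
# "10 Kilo" is a made-up value, not part of the question's data
x <- c("1 Alpha", "2 Bravo", "10 Kilo")

# First character only: the two-digit code is cut short
first_char <- substr(x, 1, 1)
first_char    # c("1", "2", "1")

# Everything before the first whitespace: the full code is kept
before_space <- sub("\\s.*", "", x)
before_space  # c("1", "2", "10")
```

If the real 50 columns might contain multi-digit codes, the sub() or str_remove() variants are probably the safer choice.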