
How do you keep the first value in multiple columns?

Tags: r, dplyr

I have a data frame and want to keep only the first value (the leading number) in each of several columns.

I would prefer a tidyverse solution using the pipe operator %>%, ideally one where I can specify the column range E1:N3, because the real dataset has about 50 columns.

This is the dataframe:

df <- data.frame(age = c(20, 25, 30), E1 = c("1 Alpha", "2 Bravo", "1 Alpha"), E2 = c("2 Bravo", "2 Bravo", "2 Bravo"), E3 = c("1 Alpha", "2 Bravo", "2 Bravo"), N1 = c("1 Alpha", "1 Alpha", "1 Alpha"), N2 = c("2 Bravo", "1 Alpha", "2 Bravo"), N3 = c("2 Bravo", "2 Bravo", "1 Alpha"))
df
#>   age      E1      E2      E3      N1      N2      N3
#> 1  20 1 Alpha 2 Bravo 1 Alpha 1 Alpha 2 Bravo 2 Bravo
#> 2  25 2 Bravo 2 Bravo 2 Bravo 1 Alpha 1 Alpha 2 Bravo
#> 3  30 1 Alpha 2 Bravo 2 Bravo 1 Alpha 2 Bravo 1 Alpha

This is what I want:

df_expected <- data.frame(age = c(20, 25, 30), E1 = c("1", "2", "1"), E2 = c("2", "2", "2"), E3 = c("1", "2", "2"), N1 = c("1", "1", "1"), N2 = c("2", "1", "2"), N3 = c("2", "2", "1"))
df_expected
#>   age E1 E2 E3 N1 N2 N3
#> 1  20  1  2  1  1  2  2
#> 2  25  2  2  2  1  1  2
#> 3  30  1  2  2  1  2  1
asked Nov 27 '19 22:11 by OTA



2 Answers

You can do:

library(dplyr)

df %>%
 mutate_at(vars(E1:N3), ~ substr(., 1, 1))

  age E1 E2 E3 N1 N2 N3
1  20  1  2  1  1  2  2
2  25  2  2  2  1  1  2
3  30  1  2  2  1  2  1

If the first character is always a digit and you want numeric columns, you can do:

df %>%
 mutate_at(vars(E1:N3), ~ as.numeric(substr(., 1, 1)))
answered Oct 13 '22 21:10 by tmfmnk
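
In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A minimal sketch of the same substr() approach in that style, assuming dplyr >= 1.0.0 is installed and using the data frame from the question (the name df_first is introduced here only for illustration):

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  age = c(20, 25, 30),
  E1 = c("1 Alpha", "2 Bravo", "1 Alpha"),
  E2 = c("2 Bravo", "2 Bravo", "2 Bravo"),
  E3 = c("1 Alpha", "2 Bravo", "2 Bravo"),
  N1 = c("1 Alpha", "1 Alpha", "1 Alpha"),
  N2 = c("2 Bravo", "1 Alpha", "2 Bravo"),
  N3 = c("2 Bravo", "2 Bravo", "1 Alpha")
)

# Keep only the first character of every column from E1 through N3;
# age is outside the range and is left untouched
df_first <- df %>%
  mutate(across(E1:N3, ~ substr(.x, 1, 1)))

df_first
```

The E1:N3 range selects columns by position, so it should carry over to the roughly 50 columns of the real dataset as long as they sit next to each other.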


Here is an option that extracts the first numeric part with parse_number, applied to columns whose names start with 'E' or 'N' followed by one or more digits (\\d+):

library(dplyr)
library(stringr)
df %>%
   mutate_at(vars(matches("^(E|N)\\d+$")), ~readr::parse_number(as.character(.)))

Or use str_remove to remove the substring that starts at the first run of whitespace and includes everything after it (.*):

df %>%
   mutate_at(vars(-age), ~ str_remove(., "\\s+.*"))
#   age E1 E2 E3 N1 N2 N3
#1  20  1  2  1  1  2  2
#2  25  2  2  2  1  1  2
#3  30  1  2  2  1  2  1

Or using base R

df[-1] <- lapply(df[-1], sub, pattern = "\\s.*", replacement = "")
answered Oct 13 '22 21:10 by akrun
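
The approaches above differ once a leading code has more than one digit: substr(., 1, 1) keeps only the first character, while the sub() and str_remove() patterns keep everything before the first whitespace. A small base R sketch of that difference, using a made-up vector x that is not part of the question's data:

```r
# Hypothetical values, one with a two-digit leading code
x <- c("1 Alpha", "12 Lima")

# First character only
first_char <- substr(x, 1, 1)

# Everything before the first whitespace character
first_token <- sub("\\s.*", "", x)

first_char   # "1" "1"
first_token  # "1" "12"
```

For the single-digit codes in the question both give the same result; the regex-based versions are likely the safer choice if codes of 10 or more can occur.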