 

Matching row values (text) with column names and returning the value

I am trying to find the column whose name matches the text in another column called "region", and return the corresponding value. My data "df" looks something like this:

region  A    B    C    D    E    F
H       796  792  844  812  796  776
J       568  564  508  268  320  396
A       820  804  748  528  560  600
X       292  272  260  324  224  200
M       872  812  792  760  668  656
N       100  992  972  880  872  864
C       940  948  952  916  864  880
L       960  956  952  920  900  920
E       980  968  956  940  944  932
F       236  364  460  524  552  616
P       796  792  844  812  796  776
Q       568  564  508  268  320  396

And I want to get something that looks like this:

region  A   B   C   D   E   F
H       NA  NA  NA  NA  NA  NA
J       NA  NA  NA  NA  NA  NA
A       820 NA  NA  NA  NA  NA
X       NA  NA  NA  NA  NA  NA
M       NA  NA  NA  NA  NA  NA
N       NA  NA  NA  NA  NA  NA
C       NA  NA  952 NA  NA  NA
L       NA  NA  NA  NA  NA  NA
E       NA  NA  NA  NA  944 NA
F       NA  NA  NA  NA  NA  616
P       NA  NA  NA  NA  NA  NA
Q       NA  NA  NA  NA  NA  NA

To do this, I tried this piece of code from another question (Loop that matches row to column names and computes an average of the 3 preceding columns), but it only returns the position, and I would like to get the value as shown in the example above.

apply (df, MARGIN = 1, FUN = function(x, i){ position <- (which(x[['region']] == colnames(df))) })

How can I modify the code to get the real value? Thanks

asked Feb 16 '20 20:02 by marlaska



2 Answers

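Another option, also using base functions: build a two-column (row, column) index with match(), blank out the value columns, then restore only the indexed cells.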

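# (row, column) pairs where a column name equals a value in 'region'
idx <- na.omit(cbind(match(names(df1), df1$region),
                     seq_along(df1)))
# matrix indexing returns character here because 'region' is character
vals <- as.integer(df1[idx])
df1[-1] <- NA
df1[idx] <- vals
df1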
#   region   A  B   C  D   E   F
#1       H  NA NA  NA NA  NA  NA
#2       J  NA NA  NA NA  NA  NA
#3       A 820 NA  NA NA  NA  NA
#4       X  NA NA  NA NA  NA  NA
#5       M  NA NA  NA NA  NA  NA
#6       N  NA NA  NA NA  NA  NA
#7       C  NA NA 952 NA  NA  NA
#8       L  NA NA  NA NA  NA  NA
#9       E  NA NA  NA NA 944  NA
#10      F  NA NA  NA NA  NA 616
#11      P  NA NA  NA NA  NA  NA
#12      Q  NA NA  NA NA  NA  NA
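
The index-matrix idea above can be tried on a smaller case. The following is only a sketch on a made-up data frame (toy and all of its values are hypothetical, not the asker's data), and it is a slight variation rather than the code above: it reads the matched cells from the numeric columns only, so the as.integer() conversion is not needed.

```r
# Hypothetical toy data: 'region' names an existing column in rows 1 and 3 only
toy <- data.frame(region = c("B", "Z", "A"),
                  A = c(1L, 2L, 3L),
                  B = c(4L, 5L, 6L),
                  stringsAsFactors = FALSE)

# For each column name, the row whose 'region' equals it (NA if there is none)
rows <- match(names(toy), toy$region)
idx  <- cbind(row = rows, col = seq_along(toy))
idx  <- idx[!is.na(rows), , drop = FALSE]   # keep only the matched (row, col) pairs

# Read the matched cells from the numeric part, shifting the column index by one
# because the 'region' column has been dropped
num  <- as.matrix(toy[-1])
vals <- num[cbind(idx[, "row"], idx[, "col"] - 1L)]

# Blank the value columns, then restore the matched cells
out <- toy
out[-1] <- NA_integer_
out[idx] <- vals
out
```

Row "Z" ends up all NA because no column carries that name, which is the behaviour the question asks for.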

data

Thanks to @akrun

df1 <- structure(list(region = c("H", "J", "A", "X", "M", "N", "C", 
"L", "E", "F", "P", "Q"), A = c(796L, 568L, 820L, 292L, 872L, 
100L, 940L, 960L, 980L, 236L, 796L, 568L), B = c(792L, 564L, 
804L, 272L, 812L, 992L, 948L, 956L, 968L, 364L, 792L, 564L), 
    C = c(844L, 508L, 748L, 260L, 792L, 972L, 952L, 952L, 956L, 
    460L, 844L, 508L), D = c(812L, 268L, 528L, 324L, 760L, 880L, 
    916L, 920L, 940L, 524L, 812L, 268L), E = c(796L, 320L, 560L, 
    224L, 668L, 872L, 864L, 900L, 944L, 552L, 796L, 320L), F = c(776L, 
    396L, 600L, 200L, 656L, 864L, 880L, 920L, 932L, 616L, 776L, 
    396L)), class = "data.frame", row.names = c(NA, -12L))
answered Oct 04 '22 17:10 by markus



Here is one option with tidyverse: reshape into 'long' format with pivot_longer, replace the elements of 'value' with NA wherever 'region' is not equal to the 'name' column, and then reshape back to 'wide' format with pivot_wider.

library(dplyr)
library(tidyr)
df1 %>% 
  pivot_longer(cols = -region) %>% 
  mutate(value = replace(value, name != region, NA)) %>%
  pivot_wider(names_from = name, values_from = value)
#   region   A  B   C  D   E   F
#1       H  NA NA  NA NA  NA  NA
#2       J  NA NA  NA NA  NA  NA
#3       A 820 NA  NA NA  NA  NA
#4       X  NA NA  NA NA  NA  NA
#5       M  NA NA  NA NA  NA  NA
#6       N  NA NA  NA NA  NA  NA
#7       C  NA NA 952 NA  NA  NA
#8       L  NA NA  NA NA  NA  NA
#9       E  NA NA  NA NA 944  NA
#10      F  NA NA  NA NA  NA 616
#11      P  NA NA  NA NA  NA  NA
#12      Q  NA NA  NA NA  NA  NA

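Another option is imap from purrr, which passes each column (.x) together with its name (.y), so the name can be compared with 'region' directly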

library(purrr)
imap_dfc(df1[-1], ~ replace(.x, .y != df1[['region']], NA)) %>%
   bind_cols(df1['region'], .)
#    region   A  B   C  D   E   F
#1       H  NA NA  NA NA  NA  NA
#2       J  NA NA  NA NA  NA  NA
#3       A 820 NA  NA NA  NA  NA
#4       X  NA NA  NA NA  NA  NA
#5       M  NA NA  NA NA  NA  NA
#6       N  NA NA  NA NA  NA  NA
#7       C  NA NA 952 NA  NA  NA
#8       L  NA NA  NA NA  NA  NA
#9       E  NA NA  NA NA 944  NA
#10      F  NA NA  NA NA  NA 616
#11      P  NA NA  NA NA  NA  NA
#12      Q  NA NA  NA NA  NA  NA
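
For readers without purrr, the same per-column idea can be sketched in base R with Map, which walks over the columns and their names in parallel. This is an illustration on a made-up data frame (toy, column and column_name are hypothetical names), not part of the original answer.

```r
# Hypothetical toy data: 'region' names an existing column in rows 1 and 3 only
toy <- data.frame(region = c("B", "Z", "A"),
                  A = c(1L, 2L, 3L),
                  B = c(4L, 5L, 6L),
                  stringsAsFactors = FALSE)

out <- toy
# For each value column, set to NA every row whose 'region' differs from the column name
out[-1] <- Map(function(column, column_name) {
                 replace(column, column_name != toy$region, NA)
               },
               toy[-1], names(toy)[-1])
out
```

Because replace() only overwrites the selected positions, the columns keep their integer type.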

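Or, using base R, we replicate the column names to the shape of the dataset and compare them with the 'region' column. Raising NA to the result of that comparison gives NA where the names differ and 1 where they match, so multiplying by the original values keeps only the matching cells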

df1[-1] <- NA^(df1$region != names(df1)[-1][col(df1[-1])]) * df1[-1]
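
The one-liner relies on NA^TRUE being NA and NA^FALSE being NA^0, which R evaluates to 1. A small sketch of that mask, followed by a more explicit alternative that I would assume gives the same result, using outer() to build the comparison as a matrix. The data frame toy and the names mask and keep are hypothetical, chosen only for illustration.

```r
# The mask trick in isolation: TRUE becomes NA, FALSE becomes 1
mask <- NA^c(TRUE, FALSE)
c(820, 804) * mask          # first value is dropped, second is kept

# Hypothetical toy data: 'region' names an existing column in rows 1 and 3 only
toy <- data.frame(region = c("B", "Z", "A"),
                  A = c(1L, 2L, 3L),
                  B = c(4L, 5L, 6L),
                  stringsAsFactors = FALSE)

# Logical matrix: TRUE where the row's region equals the column's name
keep <- outer(toy$region, names(toy)[-1], "==")

out <- toy
out[-1][!keep] <- NA        # blank every cell that does not match
out
```

The outer() version is longer, but it keeps the columns as integers, whereas multiplying by the NA^ mask converts them to doubles.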

data

df1 <- structure(list(region = c("H", "J", "A", "X", "M", "N", "C", 
"L", "E", "F", "P", "Q"), A = c(796L, 568L, 820L, 292L, 872L, 
100L, 940L, 960L, 980L, 236L, 796L, 568L), B = c(792L, 564L, 
804L, 272L, 812L, 992L, 948L, 956L, 968L, 364L, 792L, 564L), 
    C = c(844L, 508L, 748L, 260L, 792L, 972L, 952L, 952L, 956L, 
    460L, 844L, 508L), D = c(812L, 268L, 528L, 324L, 760L, 880L, 
    916L, 920L, 940L, 524L, 812L, 268L), E = c(796L, 320L, 560L, 
    224L, 668L, 872L, 864L, 900L, 944L, 552L, 796L, 320L), F = c(776L, 
    396L, 600L, 200L, 656L, 864L, 880L, 920L, 932L, 616L, 776L, 
    396L)), class = "data.frame", row.names = c(NA, -12L))
answered Oct 04 '22 15:10 by akrun
