To create a group variable for long-format data, I want to map several values to one new value.
I already have a solution, but I feel there could be a better implementation.
set.seed(1337)
df <- data.frame(coli = sample(rep(1:6,2)), newi = 0 )
replaceList <- list(oneAndTwo=1:2, threeAndFour=3:4, fiveAndSix=5:6)
> df
coli newi
1 1 0
2 6 0
3 1 0
4 5 0
5 3 0
6 2 0
7 6 0
8 2 0
9 4 0
10 4 0
11 3 0
12 5 0
> replaceList
$oneAndTwo
[1] 1 2
$threeAndFour
[1] 3 4
$fiveAndSix
[1] 5 6
Desired output:
coli newi
1 1 oneAndTwo
2 6 fiveAndSix
3 1 oneAndTwo
4 5 fiveAndSix
5 3 threeAndFour
6 2 oneAndTwo
7 6 fiveAndSix
8 2 oneAndTwo
9 4 threeAndFour
10 4 threeAndFour
11 3 threeAndFour
12 5 fiveAndSix
My current solution:
mapply(function(fnd, rplc) {
  # rows whose coli value belongs to this group
  IND <- df$coli %in% fnd
  # write the group name into the global df
  df$newi[IND] <<- rplc
}, fnd = replaceList, rplc = names(replaceList))
If there is a better practice, including for how to set up replaceList, I'm happy to learn.
How would you approach such a problem?
We can stack the list into a key/value dataset ('df2'), then match the 'coli' column of 'df' against the 'values' column of 'df2' to get the corresponding index into 'ind', and assign the result to 'newi':
df2 <- stack(replaceList)
df$newi <- df2$ind[match(df$coli, df2$values)]
df
# coli newi
#1 4 threeAndFour
#2 3 threeAndFour
#3 6 fiveAndSix
#4 1 oneAndTwo
#5 2 oneAndTwo
#6 1 oneAndTwo
#7 5 fiveAndSix
#8 2 oneAndTwo
#9 4 threeAndFour
#10 6 fiveAndSix
#11 3 threeAndFour
#12 5 fiveAndSix
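To see why the match works, it can help to look at what stack produces. The following is a minimal sketch; it uses a small fixed 'coli' vector (an assumption for illustration, rather than the random sample above) so the result is reproducible.

```r
# Lookup list from the question
replaceList <- list(oneAndTwo = 1:2, threeAndFour = 3:4, fiveAndSix = 5:6)

# stack() turns the named list into a two-column data frame:
# 'values' holds the list elements, 'ind' holds the list names (as a factor)
df2 <- stack(replaceList)
df2
#   values          ind
# 1      1    oneAndTwo
# 2      2    oneAndTwo
# 3      3 threeAndFour
# 4      4 threeAndFour
# 5      5   fiveAndSix
# 6      6   fiveAndSix

# Fixed example data (illustrative values, not the sampled data above)
df <- data.frame(coli = c(1, 6, 1, 5, 3, 2))

# match() returns, for each coli, the row of df2 holding that value
df$newi <- df2$ind[match(df$coli, df2$values)]
df
```

Values of 'coli' that do not appear in any list element would get NA, since match returns NA for them.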
Make a named vector instead of your replaceList list, then match by name:
set.seed(1337);df <- data.frame(coli = sample(rep(1:6,2)), newi = 0 )
# make a named vector
myLookup <- setNames(c("oneAndTwo","oneAndTwo","threeAndFour","threeAndFour","fiveAndSix","fiveAndSix"),
1:6)
# then match by name (convert to character so the lookup uses names, not positions)
df$newi <- myLookup[as.character(df$coli)]
# check
head(df)
# coli newi
# 1 1 oneAndTwo
# 2 6 fiveAndSix
# 3 1 oneAndTwo
# 4 5 fiveAndSix
# 5 3 threeAndFour
# 6 2 oneAndTwo
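If you would rather keep replaceList as the single source of truth, the named vector could also be derived from it instead of being typed out by hand. This is only a sketch of one way to do it; the fixed 'coli' values are an assumption for illustration.

```r
# Lookup list from the question
replaceList <- list(oneAndTwo = 1:2, threeAndFour = 3:4, fiveAndSix = 5:6)

# Repeat each group name once per member, and name the result by the members
myLookup <- setNames(
  rep(names(replaceList), lengths(replaceList)),
  unlist(replaceList, use.names = FALSE)
)

# Fixed example data (illustrative values)
df <- data.frame(coli = c(1, 6, 1, 5, 3, 2))

# Index by name: as.character() makes sure names are used, not positions
df$newi <- unname(myLookup[as.character(df$coli)])
df
```

Indexing with as.character matters when the values are not simply 1..n: an integer index would pick elements by position rather than by name.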
Another (preferred) option is to use cut, which returns a factor column:
# using cut, no need for lookup
df$newiFactor <- cut(df$coli, c(0, 2, 4, 6))
# check
head(df[order(df$coli), ])
# coli newi newiFactor
# 1 1 oneAndTwo (0,2]
# 3 1 oneAndTwo (0,2]
# 6 2 oneAndTwo (0,2]
# 8 2 oneAndTwo (0,2]
# 5 3 threeAndFour (2,4]
# 11 3 threeAndFour (2,4]
Note: we could use the labels option of cut to get your desired naming ("oneAndTwo", etc.). Again, in this case, I prefer numeric-looking names: "(0,2]", etc.
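For completeness, here is a minimal sketch of the labels variant mentioned in the note. The breaks are the same as above; the fixed 'coli' values are an assumption for illustration.

```r
# Fixed example data (illustrative values)
df <- data.frame(coli = c(1, 6, 1, 5, 3, 2))

# Same breaks as before: (0,2], (2,4], (4,6]; labels name each interval in order
df$newi <- cut(
  df$coli,
  breaks = c(0, 2, 4, 6),
  labels = c("oneAndTwo", "threeAndFour", "fiveAndSix")
)
df
```

The result is a factor whose levels follow the order of the breaks. Note that cut only fits when each group is a contiguous range of values; for arbitrary groupings, a lookup is the more general approach.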