I want to number the letters in a large dataset. Some letters occur multiple times and carry a number ("A1", "A2"), others occur multiple times without a number, and some occur only once. The example data below probably shows it best.
The numbers in df$nr are the desired result: within each word, every consecutive run of the same letter (ignoring the digit) gets the next number. How can I get df$nr from df$word and df$letter?
library(tibble)
df <- tibble(
  word = c(rep("Amamam", 17), rep("Bobob", 14)),
  letter = c("A1", "A1", "A1", "A1", "A2", "A2", "m", "m", "m", "a", "a", "m", "m", "a", "a", "m", "m",
             "B1", "B1", "B2", "B2", "B3", "B3", "o", "b", "b", "b", "o", "o", "o", "b"),
  nr = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
         1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5)
)
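We can group by 'word', remove the numeric part from the 'letter' column, and convert the result to a run-length id (rleid from data.table). Once the digits are stripped, "A1" and "A2" both become "A" and fall into the same run, and grouping by 'word' restarts the numbering at 1 for each word.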
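library(dplyr)
library(stringr)
library(data.table)
df1 <- df %>%
  group_by(word) %>%                                 # restart numbering per word
  mutate(nr1 = rleid(str_remove(letter, "\\d+")))    # strip digits, then number consecutive runs
# compare with the desired result
all.equal(df1$nr, df1$nr1)
#[1] TRUE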
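If you would rather not pull in data.table for a single function, the same idea can be sketched in base R. This is only an illustration on a shortened version of the data. The helper name run_id is made up for this example and is not part of any package. It rebuilds the run id from rle(), and ave() applies it separately within each word.

```r
# Hypothetical helper (not from any package): consecutive-run id for a vector,
# a base R stand-in for data.table::rleid().
run_id <- function(x) {
  runs <- rle(as.character(x))
  rep(seq_along(runs$lengths), times = runs$lengths)
}

# Shortened example data, same shape as df$word and df$letter.
word   <- c(rep("Amamam", 4), rep("Bobob", 3))
letter <- c("A1", "A2", "m", "a", "B1", "o", "o")

# Strip the digits so "A1" and "A2" fall into the same run.
stripped <- gsub("[0-9]+", "", letter)

# Restart the numbering within each word: ave() splits the row indices by word
# and applies run_id() to the letters of each group.
nr <- ave(seq_along(stripped), word, FUN = function(i) run_id(stripped[i]))
nr
#[1] 1 1 2 3 1 2 2
```

Recent dplyr versions (1.1.0 and later) also provide consecutive_id(), which can stand in for rleid() inside the mutate() call if you want to stay within the tidyverse.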