 

numbering characters in a string

Tags: string, r

I want to number the letters within each word in a large dataset. Some letters occur multiple times and carry a numeric suffix ("A1", "A2"); others also occur multiple times but have no suffix, and some occur only once. The example data below shows the pattern more clearly.

The numbers in df$nr are the desired result. How can I get df$nr from df$word and df$letter?

library(tibble)

df <- tibble(word = c(rep("Amamam", 17), rep("Bobob", 14)),
             letter = c("A1", "A1", "A1", "A1", "A2", "A2", "m", "m", "m", "a", "a", "m", "m", "a", "a", "m", "m",
                        "B1", "B1", "B2", "B2", "B3", "B3", "o", "b", "b", "b", "o", "o", "o", "b"),
             nr = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
                    1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5))
Rumpl asked Sep 12 '19 16:09



1 Answer

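We can group by 'word', strip the numeric part from the 'letter' column with str_remove, and pass the result to rleid (the run-length id function from data.table), which assigns one id to each run of consecutive identical values.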

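library(dplyr)
library(stringr)
library(data.table)  # provides rleid()

df1 <- df %>%
        group_by(word) %>%
        # strip the digits so "A1" and "A2" share one run, then number the runs within each word
        mutate(nr1 = rleid(str_remove(letter, "\\d+")))

# compare the computed ids with the desired column
all.equal(df1$nr, df1$nr1)
#[1] TRUE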
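
The run-length-id step can also be sketched in base R, which avoids loading data.table just for rleid. This is a minimal sketch under stated assumptions rather than the approach used above: run_id is a hypothetical helper name, and sub("[0-9]+", "", ...) is assumed to be an adequate stand-in for str_remove(letter, "\\d+") on data shaped like the example.

```r
# Example data copied from the question (word and letter columns only).
word <- c(rep("Amamam", 17), rep("Bobob", 14))
letter <- c("A1", "A1", "A1", "A1", "A2", "A2", "m", "m", "m", "a", "a", "m", "m", "a", "a", "m", "m",
            "B1", "B1", "B2", "B2", "B3", "B3", "o", "b", "b", "b", "o", "o", "o", "b")

# Hypothetical helper: numbers consecutive runs of equal values 1, 2, 3, ...
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), times = r$lengths)
}

# Drop the numeric suffix so "A1" and "A2" fall into the same run.
base_letter <- sub("[0-9]+", "", letter)

# Restart the run counter within each word by applying run_id per group.
nr_base <- ave(seq_along(base_letter), word,
               FUN = function(i) run_id(base_letter[i]))
```

With the question's df in scope, all.equal(as.numeric(nr_base), df$nr) is one way to check the result against the desired column.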
akrun answered Oct 25 '22 15:10
