extract character preceding first dot in a string

Question

I would like to extract the character preceding the first dot in a column of strings. The code below does this, but it seems overly complex and I had to resort to a for-loop. Is there an easier way? I am particularly interested in a regex solution.

Note that finding the last number in each string will not work with my real data, although that approach would work with this example.

Thank you for any advice.

my.data <- read.table(text = '
     my.string  state
     .........    A
     1........    B
     112......    C
     11111....    D
     1111113..    E
     111111111    F
     111111111    G
', header = TRUE, stringsAsFactors = FALSE)

desired.result <- c(NA,1,2,1,3,NA,NA)

Identify the position of the first dot:

my.data$first.dot <- apply(my.data, 1, function(x) {
                                # the backslash must be doubled inside an R string literal
                                as.numeric(gregexpr("\\.", x['my.string'])[[1]])[1]
                          })

Split strings:

split.strings <- t(apply(my.data, 1, function(x) { (strsplit(x['my.string'], '')[[1]]) } ))

my.data$revised.first.dot <- ifelse(my.data$first.dot < 2, NA, my.data$first.dot-1)

Extract the character preceding the first dot:

for(i in 1:nrow(my.data)) {
     my.data$character.before.dot[i] <- split.strings[i,my.data$revised.first.dot[i]]
}

my.data

#   my.string state first.dot revised.first.dot character.before.dot
# 1 .........     A         1                NA                 <NA>
# 2 1........     B         2                 1                    1
# 3 112......     C         4                 3                    2
# 4 11111....     D         6                 5                    1
# 5 1111113..     E         8                 7                    3
# 6 111111111     F        -1                NA                 <NA>
# 7 111111111     G        -1                NA                 <NA>
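For comparison, the three steps above (locate the first dot, step back one position, extract that character) can be sketched without `apply` or a loop, since `regexpr` and `substr` are already vectorized. This is only a sketch on the example strings; the vector names `first.dot`, `pos` and `character.before.dot` are chosen here for illustration.

```r
my.string <- c(".........", "1........", "112......", "11111....",
               "1111113..", "111111111", "111111111")

# Position of the first literal dot in each string (-1 when there is none).
# fixed = TRUE treats "." as a plain character, so no escaping is needed.
first.dot <- as.integer(regexpr(".", my.string, fixed = TRUE))

# Position of the preceding character; NA when the dot is absent or comes first.
pos <- ifelse(first.dot > 1, first.dot - 1, NA)

# substr() returns NA wherever the position is NA.
character.before.dot <- substr(my.string, pos, pos)

as.numeric(character.before.dot)
```

This keeps the logic of the original code but still needs the explicit NA handling that a single regex avoids.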

Here is a related post:

find location of character in string

Avinash Raj · Accepted Answer

Use the regex below, and don't forget to enable the perl=TRUE parameter.

^[^.]*?\K[^.](?=\.)

In R, the regex would be like,

^[^.]*?\\K[^.](?=\\.)

DEMO

> library(stringr)
> as.numeric(str_extract(my.data$my.string, perl("^[^.]*?\\K[^.](?=\\.)")))
[1] NA  1  2  1  3 NA NA

Pattern Explanation:

^ Asserts that we are at the start of the string.
[^.]*? Non-greedy match of zero or more non-dot characters, up to the first dot.
\K Discards the previously matched characters from the reported match.
[^.] The character we are going to match must not be a dot.
(?=\.) And this character must be followed by a dot. So the pattern matches the character that sits just before the first dot.
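The same pattern can also be tried in base R, which may be useful if the `perl()` wrapper is not available in the installed stringr version (an assumption about the reader's setup, not something stated in the answer). The sketch below uses `regexpr` with `perl = TRUE` and fills in NA by hand, because `regmatches` silently drops strings that do not match. The name `char.before.dot` is illustrative.

```r
my.string <- c(".........", "1........", "112......", "11111....",
               "1111113..", "111111111", "111111111")

# Same pattern as above, with backslashes doubled for the R string literal.
pattern <- "^[^.]*?\\K[^.](?=\\.)"
m <- regexpr(pattern, my.string, perl = TRUE)

# regmatches() returns only the successful matches, so start from an
# all-NA vector and fill in the positions where a match was found.
char.before.dot <- rep(NA_character_, length(my.string))
char.before.dot[m > 0] <- regmatches(my.string, m)

as.numeric(char.before.dot)
```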

Tim Pietzcker · Answer

The simplest regex would be ^([^.])+(?=\.):

^      # Start of string
(      # Start of group 1
 [^.]  # Match any character except .
)+     # Repeat as many times as needed, overwriting the previous match
(?=\.) # Assert the next character is a .

Test it live on regex101.com.

The contents of group 1 will be your desired character. I'm not much of an R guy, but according to RegexBuddy, the following should work:

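# Search the string column rather than the whole data frame;
# backslashes are doubled inside an R string literal
matches <- regexpr("^([^.])+(?=\\.)", my.data$my.string, perl=TRUE)
result <- attr(matches, "capture.start")[,1]
attr(result, "match.length") <- attr(matches, "capture.length")[,1]
# Note: regmatches() omits the rows that have no match
regmatches(my.data$my.string, result)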
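If one NA per non-matching row is wanted (as in the desired result from the question), a variant of the capture-group idea is sketched below. It reads the start and length of group 1 from the attributes that `regexpr(..., perl = TRUE)` attaches and extracts the character with `substring`. The names `start`, `len` and `group1` are illustrative, and this is a sketch on the example strings rather than a tested general solution.

```r
my.string <- c(".........", "1........", "112......", "11111....",
               "1111113..", "111111111", "111111111")

m <- regexpr("^([^.])+(?=\\.)", my.string, perl = TRUE)

# With perl = TRUE, capture positions are stored as matrix attributes,
# one column per capture group; a repeated group keeps its last iteration.
start <- attr(m, "capture.start")[, 1]
len   <- attr(m, "capture.length")[, 1]

# Keep NA for strings where the pattern did not match at all.
group1 <- ifelse(m > 0, substring(my.string, start, start + len - 1), NA_character_)

as.numeric(group1)
```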

Tags: string, regex, r

Mark Miller
