Is there a way to isolate the parts of a string that are in alphabetical order? In other words, given a string like <code>hjubcdepyvb</code>, could you pull out just the portion that is in alphabetical order, <code>bcde</code>? I have thought about using the <code>is.unsorted()</code> function, but I'm not sure how to apply it to only a portion of a string.

Isolate alphabetical strings within a larger string

3 Answers

Here's one way by converting to ASCII and back:

input <- "hjubcdepyvb"
spl_asc <- as.integer(charToRaw(input))       # Convert to ASCII
d1 <- diff(spl_asc) == 1                      # Find sequences
filt <- spl_asc[c(FALSE, d1) | c(d1, FALSE)]  # Only keep sequences (incl start and end)
rawToChar(as.raw(filt))                       # Convert back to character

#[1] "bcde"

Note that this concatenates all the parts that are in alphabetical order. For example, if the input is "abcxasdicfgaqwe", the output is "abcfg".

If you want a separate string for each sequence, you could do the following:

input <- "abcxasdicfgaqwe"
spl_asc <- as.integer(charToRaw(input))
d1 <- diff(spl_asc) == 1
r <- rle(c(FALSE, d1) | c(d1, FALSE))                   # Find boundaries
cm <- cumsum(c(1, r$lengths))                           # Map these to string positions
substring(input, cm[-length(cm)], cm[-1] - 1)[r$values] # Extract matching strings

#[1] "abc" "fg"
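One edge case in the `rle()` approach: two sequences that sit directly next to each other are merged, so "abcab" comes back as a single string "abcab". The sketch below is one possible way around that, splitting at every position where the step between neighbouring characters is not exactly 1. The helper name `consecutive_runs` is made up for illustration, and the code assumes plain ASCII input, as the `charToRaw()` code above does.

```r
# Sketch only: `consecutive_runs` is a hypothetical helper, not part of the answer.
# Assumes ASCII input, so each character maps to exactly one byte.
consecutive_runs <- function(input) {
  codes <- as.integer(charToRaw(input))       # ASCII code of each character
  if (length(codes) < 2) return(character(0)) # nothing to compare
  brk <- diff(codes) != 1                     # TRUE where a sequence is broken
  starts <- which(c(TRUE, brk))               # first position of each run
  ends <- c(starts[-1] - 1L, length(codes))   # last position of each run
  keep <- (ends - starts) >= 1                # drop single-character runs
  substring(input, starts, ends)[keep]
}

consecutive_runs("abcxasdicfgaqwe")
#[1] "abc" "fg"

consecutive_runs("abcab")
#[1] "abc" "ab"
```

Because every break starts a new run, neighbouring sequences stay separate, and single characters are filtered out at the end.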

Finally, I had to come up with a way to use regex:

input <- c("abcxasdicfgaqwe", "xufasiuxaboqdasdij", "abcikmcapnoploDEFgnm",
           "acfhgik")
(rg <- paste0("(", paste0(c(letters[-26], LETTERS[-26]),
                           "(?=", c(letters[-1], LETTERS[-1]), ")", collapse = "|"), ")+."))

#[1] "(a(?=b)|b(?=c)|c(?=d)|d(?=e)|e(?=f)|f(?=g)|g(?=h)|h(?=i)|i(?=j)|j(?=k)|
#k(?=l)|l(?=m)|m(?=n)|n(?=o)|o(?=p)|p(?=q)|q(?=r)|r(?=s)|s(?=t)|t(?=u)|u(?=v)|
#v(?=w)|w(?=x)|x(?=y)|y(?=z)|A(?=B)|B(?=C)|C(?=D)|D(?=E)|E(?=F)|F(?=G)|G(?=H)|
#H(?=I)|I(?=J)|J(?=K)|K(?=L)|L(?=M)|M(?=N)|N(?=O)|O(?=P)|P(?=Q)|Q(?=R)|R(?=S)|
#S(?=T)|T(?=U)|U(?=V)|V(?=W)|W(?=X)|X(?=Y)|Y(?=Z))+."

regmatches(input, gregexpr(rg, input, perl = TRUE))
#[[1]]
#[1] "abc" "fg" 
#
#[[2]]
#[1] "ab" "ij"
#
#[[3]]
#[1] "abc" "nop" "DEF"
#
#[[4]]
#character(0)

This regular expression will identify consecutive upper or lower case letters (but not mixed case). As demonstrated, it works for character vectors and produces a list of vectors with all the matches identified. If no match is found, the output is character(0).
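If only a single portion per input is wanted, as in the original question, one option is to keep the longest match. This is a minimal sketch built on the same pattern; the variable names `matches` and `longest` are illustrative, ties go to the first match, and inputs with no match yield `NA`.

```r
# Rebuild the pattern from the answer so this block runs on its own.
rg <- paste0("(", paste0(c(letters[-26], LETTERS[-26]),
                         "(?=", c(letters[-1], LETTERS[-1]), ")", collapse = "|"), ")+.")

input <- c("hjubcdepyvb", "abcikmcapnoploDEFgnm", "acfhgik")
matches <- regmatches(input, gregexpr(rg, input, perl = TRUE))

# Keep the longest match per input (first one on ties); NA when nothing matched.
longest <- vapply(matches, function(m) {
  if (length(m) == 0) NA_character_ else m[which.max(nchar(m))]
}, character(1))

longest
#[1] "bcde" "abc"  NA
```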

answered Nov 01 '22 19:11

Nick Kennedy

Using factor integer conversion:

input <- "hjubcdepyvb"
d1 <- diff(as.integer(factor(unlist(strsplit(input, "")), levels = letters))) == 1
filt <- c(FALSE, d1) | c(d1, FALSE)
paste(unlist(strsplit(input, ""))[filt], collapse = "")
# [1] "bcde"
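To see what the factor conversion does: with `levels = letters`, each character is mapped to its position in the alphabet, which is the same result `match()` gives. The sketch below (variable names are illustrative) also shows that anything outside `letters`, such as an upper-case letter, becomes `NA`, so this variant presumably expects lower-case input only.

```r
chars <- unlist(strsplit("hjubcdepyvb", ""))

codes_factor <- as.integer(factor(chars, levels = letters)) # factor codes
codes_match  <- match(chars, letters)                       # alphabet positions

codes_factor
# [1]  8 10 21  2  3  4  5 16 25 22  2

identical(codes_factor, codes_match)
# [1] TRUE

# Characters that are not in `letters` have no level and become NA.
as.integer(factor("B", levels = letters))
# [1] NA
```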

answered Nov 01 '22 20:11

zx8754

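myf = function(x){
    x = unlist(strsplit(x, ""))    # split into single characters
    ind = charmatch(x, letters)    # position of each character in the alphabet
    d = c(0, diff(ind))            # step from the previous character
    d[d !=1] = 0                   # 1 where a character directly follows its predecessor
    # also flag the first character of each run (a 0 immediately before a 1)
    d = d + c(sapply(1:(length(d)-1), function(i) {
        ifelse(d[i] == 0 & d[i+1] == 1, 1, 0)
    }
    ), 0)
    # group the flagged positions by run and paste each group back together
    d = split(seq_along(d)[d!=0], with(rle(d), rep(seq_along(values), lengths))[d!=0])
    return(sapply(d, function(a) paste(x[a], collapse = "")))
}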

myf(x = "hjubcdepyvblltpqrs")
#     2      4 
#"bcde" "pqrs"

answered Nov 01 '22 18:11

d.b
