
How to split a string into regular intervals in R?

Tags:

split

r

I have a long string that I want to split into regular intervals of, say, 10 words each:

x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."

Using strsplit I can split the string into individual words:

x1 <- unlist(strsplit(x, " "))
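strsplit() returns a list with one element per input string, which is why the result is wrapped in unlist(). A minimal sketch on a short hypothetical string (txt stands in for x) that also computes how many chunks of at most 10 words to expect:

```r
# Hypothetical short input standing in for the long string x
txt <- "one two three four five six seven eight nine ten eleven twelve"

parts <- strsplit(txt, " ")        # a list with one element per input string
words <- unlist(parts)             # flatten to a plain character vector
n_words <- length(words)           # 12 words
n_chunks <- ceiling(n_words / 10)  # 2 chunks: 10 words, then the remaining 2
```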

Using paste I can paste together 10 words each:

paste(x1[1:10], collapse = " ")
paste(x1[11:20], collapse = " ")
...
paste(x1[101:110], collapse = " ")
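The manual calls also explain the NA NA in the desired output: indexing a vector past its end returns NA, and paste() turns each NA into the string "NA". A small illustration on a hypothetical three-word vector w:

```r
w <- c("a", "b", "c")                    # hypothetical short word vector

w[2:5]                                   # "b" "c" NA NA: positions 4 and 5 do not exist
chunk <- paste(w[2:5], collapse = " ")   # "b c NA NA"

# Capping the end index at length(w) avoids the NA padding
capped <- paste(w[2:min(5, length(w))], collapse = " ")  # "b c"
```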

But that's tedious, so I've tried lapply and seq:

lapply(x1, function(x) paste(x[seq(1,100,10)], collapse = " "))

but the result is not what I want. What I want is something like this:

[1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
[2] "mead-hall, or palace, in which he hopes to feast his"
[3] "liegemen and to give them presents. The joy of king"
[4] "and retainers is, however, of short duration. Grendel, the monster,"
[5] "is seized with hateful jealousy. He cannot brook the sounds"
...
[11] "twelve years he persecutes Hrothgar and his vassals. NA NA"

I'm open to any solution but would be particularly grateful for a base R one.
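The lapply attempt fails because lapply() iterates over the elements of x1, that is, over single words rather than chunks. A short sketch reproducing the behavior on a hypothetical three-word vector (the function argument is renamed w only to avoid confusion with the string x):

```r
x1 <- c("Hrothgar,", "king", "of")   # hypothetical: only the first three words

# Inside the function, w is a single word (a length-1 vector), so
# w[seq(1, 100, 10)] asks for positions 1, 11, ..., 91: the word plus nine NAs
res <- lapply(x1, function(w) paste(w[seq(1, 100, 10)], collapse = " "))

res[[1]]     # "Hrothgar, NA NA NA NA NA NA NA NA NA"
length(res)  # 3: one list element per word, not per chunk
```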

asked Sep 23 '20 07:09 by Chris Ruehlemann



5 Answers

Another option with only base R: use a regex to capture (\\1) groups of 10 words (runs of word characters, possibly containing hyphens, ending at a word boundary \b, each followed by any spaces, commas or periods) and append a distinctive marker string ("XXX" here) after each group, so the result can then be split on that marker. Putting a space before the marker in the strsplit pattern avoids a trailing space at the end of each chunk:

unlist(strsplit(gsub("(((\\w|-)+\\b[ ,.]*){10})", "\\1XXX", x), " XXX"))

# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"          
# [2] "mead-hall, or palace, in which he hopes to feast his"               
# [3] "liegemen and to give them presents. The joy of king"                
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"        
# [6] "of joyance that reach him down in his fen-dwelling near"            
# [7] "the hall. Oft and anon he goes to the joyous"                       
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"             
#[10] "enough and bold enough to cope with the monster. For"               
#[11] "twelve years he persecutes Hrothgar and his vassals."     
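A sketch generalizing this regex approach to any chunk size n; chunk_words_regex is a hypothetical name. Two assumptions are kept from the original: the only punctuation between words is commas and periods, and the marker never occurs in the text. One tweak: the space before the marker is made optional (" ?XXX"), because when the word count is an exact multiple of n the last group ends without a trailing space and the marker would otherwise stay glued to the final word.

```r
# Hypothetical helper: the gsub()/strsplit() marker trick with n as a parameter
chunk_words_regex <- function(s, n = 10, marker = "XXX") {
  # n words (word characters or hyphens), each followed by spaces/commas/periods
  pattern <- sprintf("(((\\w|-)+\\b[ ,.]*){%d})", n)
  marked <- gsub(pattern, paste0("\\1", marker), s, perl = TRUE)
  # " ?" also splits off a marker that directly follows the final word
  unlist(strsplit(marked, paste0(" ?", marker)))
}

chunk_words_regex("a b c d e f g", n = 3)  # "a b c" "d e f" "g"
chunk_words_regex("a b c d e f", n = 3)    # "a b c" "d e f"
```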
answered Oct 18 '22 01:10 by Cath


You could create a sequence of start indices and paste together the words from x1, capping each end index with min() so the last chunk contains no NAs:

sapply(seq(1, length(x1), 10), function(i) 
       paste0(x1[i:min(i + 9, length(x1))], collapse = " "))

# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"          
# [2] "mead-hall, or palace, in which he hopes to feast his"               
# [3] "liegemen and to give them presents. The joy of king"                
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"        
# [6] "of joyance that reach him down in his fen-dwelling near"            
# [7] "the hall. Oft and anon he goes to the joyous"                       
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"             
#[10] "enough and bold enough to cope with the monster. For"               
#[11] "twelve years he persecutes Hrothgar and his vassals."        
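The same idea can be wrapped into a function with the chunk size as a parameter. This is a sketch; the name chunk_words is hypothetical. It uses vapply() instead of sapply() so the result is always a character vector, and it guards against empty input, where seq(1, 0, by = n) would throw an error.

```r
# Hypothetical helper: paste words together in chunks of n
chunk_words <- function(words, n = 10) {
  if (length(words) == 0) return(character(0))  # seq() would fail on empty input
  starts <- seq(1, length(words), by = n)        # first index of every chunk
  vapply(starts, function(i) {
    last <- min(i + n - 1, length(words))        # cap the end so no NAs appear
    paste(words[i:last], collapse = " ")
  }, character(1))                               # always returns a character vector
}

chunk_words(c("a", "b", "c", "d", "e"), n = 2)   # "a b" "c d" "e"
```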
answered Oct 18 '22 00:10 by Ronak Shah


You can use gregexpr with regmatches and match between 1 and 10 words at a time with the quantifier {1,10}; trimws then removes the trailing whitespace of each match:

trimws(regmatches(x, gregexpr("([^[:space:]]+\\s*){1,10}", x))[[1]])
# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"          
# [2] "mead-hall, or palace, in which he hopes to feast his"               
# [3] "liegemen and to give them presents. The joy of king"                
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"        
# [6] "of joyance that reach him down in his fen-dwelling near"            
# [7] "the hall. Oft and anon he goes to the joyous"                       
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"             
#[10] "enough and bold enough to cope with the monster. For"               
#[11] "twelve years he persecutes Hrothgar and his vassals."               
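Because the quantifier is greedy, each match takes 10 words whenever that many remain, and the final match takes whatever is left. A sketch with the chunk size built into the pattern via sprintf(); chunk_words_gregexpr is a hypothetical name:

```r
# Hypothetical helper: the gregexpr()/regmatches() approach with n as a parameter
chunk_words_gregexpr <- function(s, n = 10) {
  # 1 to n runs of non-space characters, each followed by optional whitespace
  pattern <- sprintf("([^[:space:]]+\\s*){1,%d}", n)
  trimws(regmatches(s, gregexpr(pattern, s))[[1]])
}

chunk_words_gregexpr("a b c d e f g", n = 3)  # "a b c" "d e f" "g"
```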
answered Oct 18 '22 00:10 by GKi


Hope this might help: split the words into groups of 10 with split() and paste each group together:

sapply(
  unname(split(
    y <- unlist(strsplit(x, " ")),
    ceiling(seq_along(y) / 10)
  )),
  paste,
  collapse = " "
)

which gives

 [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
 [2] "mead-hall, or palace, in which he hopes to feast his"
 [3] "liegemen and to give them presents. The joy of king"
 [4] "and retainers is, however, of short duration. Grendel, the monster,"
 [5] "is seized with hateful jealousy. He cannot brook the sounds"
 [6] "of joyance that reach him down in his fen-dwelling near"
 [7] "the hall. Oft and anon he goes to the joyous"
 [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
 [9] "carried off and devoured, while no one is found strong"
[10] "enough and bold enough to cope with the monster. For"
[11] "twelve years he persecutes Hrothgar and his vassals."
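The key step is ceiling(seq_along(y) / 10), which maps word positions 1 to 10 to group 1, positions 11 to 20 to group 2, and so on; split() then collects the words by that group id. A small sketch with chunks of 3 on a hypothetical word vector makes the grouping visible:

```r
words <- c("a", "b", "c", "d", "e", "f", "g")  # hypothetical word vector

groups <- ceiling(seq_along(words) / 3)  # chunk id per word: 1 1 1 2 2 2 3
pieces <- split(words, groups)           # list of words grouped by chunk id
chunks <- unname(vapply(pieces, paste, character(1), collapse = " "))
chunks                                   # "a b c" "d e f" "g"
```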
answered Oct 18 '22 00:10 by ThomasIsCoding


Using stringr's word():

library(stringr)
N <- length(strsplit(x, " ")[[1]])  # number of words in x
start <- seq.int(1, N, 10)          # first word of each chunk
end <- pmin(start + 9, N)           # last word of each chunk, capped at N
word(x, start, end)

# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"          
# [2] "mead-hall, or palace, in which he hopes to feast his"               
# [3] "liegemen and to give them presents. The joy of king"                
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"        
# [6] "of joyance that reach him down in his fen-dwelling near"            
# [7] "the hall. Oft and anon he goes to the joyous"                       
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"             
# [10] "enough and bold enough to cope with the monster. For"               
# [11] "twelve years he persecutes Hrothgar and his vassals." 
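The start and end vectors mark the first and last word of each chunk, with the last end capped at the word count N. If stringr is not available, a base R stand-in for word(x, start, end) can paste each index range with mapply(). This is a sketch on a hypothetical word vector, not the stringr implementation:

```r
words <- c("w1", "w2", "w3", "w4", "w5", "w6", "w7")  # hypothetical words
N <- length(words)
start <- seq.int(1, N, 3)   # 1 4 7
end <- pmin(start + 2, N)   # 3 6 7: the last chunk is capped at N

# Base R stand-in for stringr::word(x, start, end): paste each index range
chunks <- mapply(function(s, e) paste(words[s:e], collapse = " "), start, end)
chunks                      # "w1 w2 w3" "w4 w5 w6" "w7"
```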
answered Oct 18 '22 00:10 by dww