I have a long string that I want to split into regular intervals of, say, 10 words each:
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."
Using strsplit, I can split the string into individual words:
x1 <- unlist(strsplit(x, " "))
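A caveat on this step: if the text might contain runs of spaces or line breaks, splitting on a single space produces empty strings that would later be counted as words. A minimal sketch, assuming the input has no leading whitespace (wrap it in trimws() otherwise), that splits on the regular expression "\\s+" instead (the sample string is made up):

```r
s <- "Oft and  anon he goes"  # note the double space after "and"

# Splitting on a single space leaves an empty string between "and" and "anon"
strsplit(s, " ")[[1]]
# -> "Oft" "and" "" "anon" "he" "goes"

# Splitting on one or more whitespace characters avoids the empty piece
strsplit(s, "\\s+")[[1]]
# -> "Oft" "and" "anon" "he" "goes"
```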
Using paste, I can paste together 10 words at a time:
paste(x1[1:10], collapse = " ")
paste(x1[11:20], collapse = " ")
...
paste(x1[101:110], collapse = " ")
But that's tedious, so I've tried lapply and seq:
lapply(x1, function(x) paste(x[seq(1,100,10)], collapse = " "))
but the result is not what I want. What I want is something like this:
[1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
[2] "mead-hall, or palace, in which he hopes to feast his"
[3] "liegemen and to give them presents. The joy of king"
[4] "and retainers is, however, of short duration. Grendel, the monster,"
[5] "is seized with hateful jealousy. He cannot brook the sounds"
...
[11] "twelve years he persecutes Hrothgar and his vassals. NA NA"
I'm open to any solution, but would be particularly grateful for a base R one.
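The lapply() attempt iterates over the individual words, so inside the function x is a single word, and seq(1, 100, 10) selects every tenth position rather than ten consecutive ones. If the NA padding shown in the desired output is really wanted, one base R sketch pads the word vector to a multiple of 10 and reshapes it into a matrix with one chunk per column. The helper name chunk_with_na() is hypothetical, and it assumes at least one word:

```r
# Hypothetical helper: split a vector of words into chunks of n,
# padding the last chunk with NA so every chunk has exactly n entries.
chunk_with_na <- function(words, n = 10) {
  # Growing a vector with length<- fills the new slots with NA
  length(words) <- ceiling(length(words) / n) * n
  # matrix() fills column by column, so each column holds n consecutive words
  m <- matrix(words, nrow = n)
  # Collapse each column into one string; paste() turns NA into the text "NA"
  apply(m, 2, paste, collapse = " ")
}

chunk_with_na(c("twelve", "years", "he", "persecutes"), n = 3)
# -> "twelve years he" "persecutes NA NA"
```

With x1 from the question, chunk_with_na(x1) should return 11 strings, the last being "twelve years he persecutes Hrothgar and his vassals. NA NA".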
To split a string in R, use the built-in strsplit() function, which splits each element of a character vector into substrings at matches of the split argument. It returns a list with one element per input string, each holding the pieces that string was split into.
Alternatively, the str_split() function from the stringr package can split a string by a delimiter. It works almost the same way as strsplit(): both treat the pattern as a regular expression by default, and a pattern can be matched literally with fixed = TRUE in strsplit() or by wrapping it in fixed() in str_split().
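A quick base R illustration that strsplit() treats split as a regular expression unless fixed = TRUE is given (the example string is made up):

```r
s <- "a.b.c"

# "." is a regex that matches any character, so every character is a separator
strsplit(s, ".")[[1]]
# -> "" "" "" "" ""

# Match a literal period instead, either with fixed = TRUE or an escaped pattern
strsplit(s, ".", fixed = TRUE)[[1]]
# -> "a" "b" "c"
strsplit(s, "\\.")[[1]]
# -> "a" "b" "c"
```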
Note that splitting into single characters can be done via split = character(0) or split = ""; the two are equivalent.
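For example, both forms below split a short made-up string into its characters:

```r
s <- "mead"
strsplit(s, "")[[1]]            # -> "m" "e" "a" "d"
strsplit(s, character(0))[[1]]  # -> "m" "e" "a" "d"

# Both calls return the same list
identical(strsplit(s, ""), strsplit(s, character(0)))  # -> TRUE
```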
The strsplit() function is the most widely used way to split strings in R. Whenever you work with text, you need to be able to concatenate words (string them together) and split them apart; in R, you use the paste() function to concatenate and the strsplit() function to split.
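A minimal sketch of that pairing: splitting a string on single spaces and collapsing the pieces again with paste() recovers the original, assuming the words are separated by exactly one space (the sample string is made up):

```r
s <- "Oft and anon he goes"
words <- strsplit(s, " ")[[1]]            # character vector of 5 words
rejoined <- paste(words, collapse = " ")  # glue the words back together
identical(rejoined, s)                    # -> TRUE
```

A trailing separator would not survive the round trip, because strsplit() drops the empty piece after a match at the end of the string.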
Sometimes you may prefer a fixed number of pieces after splitting a character string. In that case you can use the str_split_fixed() function from stringr and specify n: it returns a character matrix with n columns, where the last column holds the unsplit remainder and shorter results are padded with empty strings.
The following code shows how to use the strsplit() function to split a string based on spaces:
#split string based on spaces
split_up <- strsplit("Hey there people", split = " ")
#view results
split_up
[[1]]
[1] "Hey" "there" "people"
#view class of split_up
class(split_up)
[1] "list"
Another option with only base R, using a regex to capture (\\1) groups of 10 words (runs of alphanumeric characters, which may contain hyphens, ending at a word boundary \b, plus any following spaces, commas, or periods) and append a "remarkable" string ("XXX" here) after each group, so the text can be split on this string afterwards (putting a space before this string in the strsplit pattern avoids a trailing space at the end of each piece):
unlist(strsplit(gsub("(((\\w|-)+\\b[ ,.]*){10})", "\\1XXX", x), " XXX"))
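The same idea can be wrapped in a hypothetical helper with the chunk size as a parameter. Two assumptions carry over from the pattern above: the text must not contain the literal string XXX, and words may only contain word characters and hyphens, separated by spaces, commas, or periods. The split pattern " ?XXX" (optional space) is a small change from the original: when the word count is an exact multiple of n and the text does not end in a space, the last marker follows the final word directly and " XXX" would leave it in place.

```r
# Hypothetical helper generalizing the gsub()/strsplit() approach to any n.
# Assumes no literal "XXX" in the text and only [ ,.] between words.
chunk_words_regex <- function(x, n = 10) {
  # Group 1 captures n consecutive words plus their trailing separators
  pattern <- sprintf("(((\\w|-)+\\b[ ,.]*){%d})", n)
  marked <- gsub(pattern, "\\1XXX", x)
  # " ?XXX" also removes a marker glued directly to the last word
  unlist(strsplit(marked, " ?XXX"))
}

chunk_words_regex("one two three four five", n = 2)
# -> "one two" "three four" "five"
chunk_words_regex("one two three four", n = 2)
# -> "one two" "three four"
```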
# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
# [2] "mead-hall, or palace, in which he hopes to feast his"
# [3] "liegemen and to give them presents. The joy of king"
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"
# [6] "of joyance that reach him down in his fen-dwelling near"
# [7] "the hall. Oft and anon he goes to the joyous"
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"
#[10] "enough and bold enough to cope with the monster. For"
#[11] "twelve years he persecutes Hrothgar and his vassals."
You could create a sequence of start positions and paste together the words from x1:
sapply(seq(1, length(x1), 10), function(i)
paste0(x1[i:min(i + 9, length(x1))], collapse = " "))
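A variant of the same approach as a hypothetical reusable function: the chunk size n is a parameter, vapply() guarantees a character result, and an empty input is handled explicitly because seq(1, 0, by = n) would raise an error.

```r
# Hypothetical helper: paste consecutive words together, n at a time.
chunk_by_index <- function(words, n = 10) {
  if (length(words) == 0) return(character(0))
  starts <- seq(1, length(words), by = n)  # first word of each chunk
  vapply(starts, function(i) {
    # min() keeps the last, possibly shorter, chunk inside the vector
    paste(words[i:min(i + n - 1, length(words))], collapse = " ")
  }, character(1))
}

chunk_by_index(c("Oft", "and", "anon", "he", "goes"), n = 2)
# -> "Oft and" "anon he" "goes"
```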
# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
# [2] "mead-hall, or palace, in which he hopes to feast his"
# [3] "liegemen and to give them presents. The joy of king"
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"
# [6] "of joyance that reach him down in his fen-dwelling near"
# [7] "the hall. Oft and anon he goes to the joyous"
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"
#[10] "enough and bold enough to cope with the monster. For"
#[11] "twelve years he persecutes Hrothgar and his vassals."
You can use gregexpr with regmatches and quantify the words with {1,10}:
trimws(regmatches(x, gregexpr("([^[:space:]]+\\s*){1,10}", x))[[1]])
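In that pattern, [^[:space:]]+ matches one word (any run of non-space characters, so punctuation stays attached), \\s* takes the whitespace after it, and {1,10} allows up to ten such words per match, so the final match simply holds whatever is left. A sketch wrapping it in a hypothetical function with the chunk size as a parameter; like the [[1]] above, it assumes a single input string:

```r
# Hypothetical helper around the gregexpr()/regmatches() answer.
chunk_by_regmatches <- function(x, n = 10) {
  pattern <- sprintf("([^[:space:]]+\\s*){1,%d}", n)
  # Each match is up to n words with their trailing whitespace;
  # trimws() strips the whitespace left at the end of each chunk
  trimws(regmatches(x, gregexpr(pattern, x))[[1]])
}

chunk_by_regmatches("Oft and anon he goes", n = 2)
# -> "Oft and" "anon he" "goes"
```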
# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
# [2] "mead-hall, or palace, in which he hopes to feast his"
# [3] "liegemen and to give them presents. The joy of king"
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"
# [6] "of joyance that reach him down in his fen-dwelling near"
# [7] "the hall. Oft and anon he goes to the joyous"
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"
#[10] "enough and bold enough to cope with the monster. For"
#[11] "twelve years he persecutes Hrothgar and his vassals."
Hope this helps:
sapply(
unname(split(
y <- unlist(strsplit(x, " ")),
ceiling(seq_along(y) / 10)
)),
paste,
collapse = " "
)
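The grouping vector is the key step: ceiling(seq_along(y) / 10) is 1 for the first ten words, 2 for the next ten, and so on, and split() collects the words by that group number. The same steps without the assignment inside the function call, as a hypothetical helper with the chunk size as a parameter:

```r
# Hypothetical helper: the split()/ceiling() answer with n as a parameter.
chunk_by_split <- function(words, n = 10) {
  # Group ids: 1 for words 1..n, 2 for words n+1..2n, and so on
  groups <- ceiling(seq_along(words) / n)
  chunks <- split(words, groups)
  # Collapse each group to one string and drop the group-number names
  unname(vapply(chunks, paste, character(1), collapse = " "))
}

ceiling(seq_len(7) / 3)
# -> 1 1 1 2 2 2 3
chunk_by_split(c("Oft", "and", "anon", "he", "goes"), n = 2)
# -> "Oft and" "anon he" "goes"
```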
which gives
[1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
[2] "mead-hall, or palace, in which he hopes to feast his"
[3] "liegemen and to give them presents. The joy of king"
[4] "and retainers is, however, of short duration. Grendel, the monster,"
[5] "is seized with hateful jealousy. He cannot brook the sounds"
[6] "of joyance that reach him down in his fen-dwelling near"
[7] "the hall. Oft and anon he goes to the joyous"
[8] "building, bent on direful mischief. Thane after thane is ruthlessly"
[9] "carried off and devoured, while no one is found strong"
[10] "enough and bold enough to cope with the monster. For"
[11] "twelve years he persecutes Hrothgar and his vassals."
Using stringr:
library(stringr)
N = length(strsplit(x, ' ')[[1]])
start = seq.int(1, N, 10)
end = start+9
end[length(end)] = N
word(x, start, end)
# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
# [2] "mead-hall, or palace, in which he hopes to feast his"
# [3] "liegemen and to give them presents. The joy of king"
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"
# [6] "of joyance that reach him down in his fen-dwelling near"
# [7] "the hall. Oft and anon he goes to the joyous"
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"
# [10] "enough and bold enough to cope with the monster. For"
# [11] "twelve years he persecutes Hrothgar and his vassals."