Splitting a Long String into smaller strings

Tags: string, split, r

I have a dataframe that includes a column of numbers like this:

360010001001002
360010001001004
360010001001005
360010001001006

I'd like to break each number into chunks of 2 digits, 3 digits, 5 digits, 1 digit, and 4 digits:

36 001 00010 0 1002
36 001 00010 0 1004
36 001 00010 0 1005
36 001 00010 0 1006

That seems like it should be straightforward, but after reading the strsplit documentation I can't work out how to split by lengths rather than by a delimiter.

Amanda asked May 07 '13 22:05


3 Answers

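You can use substring, assuming every number has the same fixed length (15 digits here):

xx <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
# Cut each number at fixed start/end positions (widths 2, 3, 5, 1, 4).
# Note: as.numeric() drops leading zeros, so "001" becomes 1.
out <- do.call(rbind, lapply(xx, function(x) as.numeric(substring(x, 
                     c(1,3,6,11,12), c(2,5,10,11,15)))))
# One row per input number, one column per chunk.
out <- as.data.frame(out)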
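
If the leading zeros matter (as in the desired output in the question), a variant of the same idea keeps the chunks as character strings instead of converting them with as.numeric. This is only a sketch: the `ids`, `starts`, `stops` and `chunks` names are made up for illustration, and it assumes the IDs are already stored as text.

```r
# Hypothetical input: IDs stored as text so no digits are lost.
ids <- c("360010001001002", "360010001001004")

# Start and end positions for widths 2, 3, 5, 1, 4.
starts <- c(1, 3, 6, 11, 12)
stops  <- c(2, 5, 10, 11, 15)

# vapply() returns one column per ID; t() flips it to one row per ID.
# The chunks stay character, so "001" is not turned into 1.
chunks <- t(vapply(ids, substring, character(5),
                   first = starts, last = stops, USE.NAMES = FALSE))
```

Each row of `chunks` then holds the five pieces of one ID, which can be converted column by column later if numbers are needed.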
Arun answered Oct 17 '22 09:10


Assuming this data:

x <- c("360010001001002", "360010001001004", "360010001001005", "360010001001006")

try this:

read.fwf(textConnection(x), widths = c(2, 3, 5, 1, 4))

If x is numeric then replace x with as.character(x) in this statement.
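
By default read.fwf guesses column types, so a field such as "001" is read as the integer 1. A possible way around that, sketched below, is to pass colClasses = "character", which read.fwf forwards to read.table. The `con` and `parts` names are just placeholders for this example.

```r
x <- c("360010001001002", "360010001001004")

# Keep an explicit handle on the connection so it can be closed afterwards.
con <- textConnection(x)

# colClasses = "character" is passed through to read.table(), so every
# field is kept as text and leading zeros survive.
parts <- read.fwf(con, widths = c(2, 3, 5, 1, 4), colClasses = "character")
close(con)
```

The result is a data frame with the default column names V1 to V5.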

G. Grothendieck answered Oct 17 '22 09:10


A functional version:

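split.fixed.len <- function(x, lengths) {
   # Cumulative widths give the end position of each chunk.
   cum.len <- c(0, cumsum(lengths))
   start   <- head(cum.len, -1) + 1   # first character of each chunk
   stop    <- tail(cum.len, -1)       # last character of each chunk
   # substring() is vectorised over x, so each call yields one output column.
   mapply(substring, list(x), start, stop)
}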

a <- c(360010001001002,
       360010001001004,
       360010001001005,
       360010001001006)

split.fixed.len(a, c(2, 3, 5, 1, 4))
#      [,1] [,2]  [,3]    [,4] [,5]  
# [1,] "36" "001" "00010" "0"  "1002"
# [2,] "36" "001" "00010" "0"  "1004"
# [3,] "36" "001" "00010" "0"  "1005"
# [4,] "36" "001" "00010" "0"  "1006"
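
Since the question mentions a data frame, here is a rough sketch of how the pieces could be attached back as new columns. The data frame `df`, its `id` column and the `part1` to `part5` names are all hypothetical, and the helper is repeated (under the name `split_fixed_len`) only so the example runs on its own. It also converts the numbers with sprintf("%.0f", ...) rather than relying on implicit conversion, which should avoid scientific notation for whole numbers of this size.

```r
# Same logic as the function above, repeated so this sketch is self-contained.
split_fixed_len <- function(x, lengths) {
  cum_len <- c(0, cumsum(lengths))
  start <- head(cum_len, -1) + 1
  stop  <- tail(cum_len, -1)
  mapply(substring, list(x), start, stop)
}

# Hypothetical data frame with the IDs stored as numbers.
df <- data.frame(id = c(360010001001002, 360010001001004))

# Write each number out in full as text before cutting it up.
id_chr <- sprintf("%.0f", df$id)

# Split, convert the character matrix to a data frame, and name the columns.
parts <- as.data.frame(split_fixed_len(id_chr, c(2, 3, 5, 1, 4)),
                       stringsAsFactors = FALSE)
names(parts) <- paste0("part", 1:5)

# Attach the new columns next to the original one.
df <- cbind(df, parts)
```

One caveat: with a single input value mapply simplifies its result to a plain vector rather than a one-row matrix, so that case would need separate handling.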
flodel answered Oct 17 '22 11:10