I have a dataframe that includes a column of numbers like this:
360010001001002
360010001001004
360010001001005
360010001001006
I'd like to break each number into chunks of 2, 3, 5, 1 and 4 digits:
36 001 00010 0 1002
36 001 00010 0 1004
36 001 00010 0 1005
36 001 00010 0 1006
That seems like it should be straightforward, but I've been reading the strsplit documentation and I can't work out how to split by lengths rather than by a delimiter.
You can use substring (assuming every string/number has the same fixed length):
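xx <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006)
# substring() converts each number to text and returns all five pieces at once;
# no as.numeric() here, so "001" and "00010" keep their leading zeros
out <- do.call(rbind, lapply(xx, function(x) substring(x,
  c(1, 3, 6, 11, 12), c(2, 5, 10, 11, 15))))
out <- as.data.frame(out)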
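If the column comes from real data, the number-to-text step deserves care. This is a sketch, assuming every ID should be exactly 15 digits: an ID that lost a leading zero when it was stored as a number comes back one digit short, and a number ending in several zeros may be converted to scientific notation before substring() sees it. sprintf("%015.0f", ...) sidesteps both by writing the full integer in fixed notation, zero-padded to 15 characters. The IDs below are made-up examples.

```r
# Made-up IDs: the second lost its leading zero when stored as a number,
# the third ends in a run of zeros
ids <- c(360010001001002, 60010001001004, 360010001000000)

# "%015.0f" prints the whole integer in fixed notation, left-padded with
# zeros to 15 characters (assumes every ID is meant to have 15 digits)
ids_chr <- sprintf("%015.0f", ids)

first <- c(1, 3, 6, 11, 12)
last  <- c(2, 5, 10, 11, 15)

# lapply() calls substring(id, first, last) for each ID; rbind() stacks
# the five pieces of each ID into one row
out <- do.call(rbind, lapply(ids_chr, substring, first, last))
out <- as.data.frame(out, stringsAsFactors = FALSE)
print(out)
```

Zero-padding is only correct if the true width really is fixed; if some IDs are legitimately shorter, padding them would invent digits.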
Assuming this data:
x <- c("360010001001002", "360010001001004", "360010001001005", "360010001001006")
Try this:
read.fwf(textConnection(x), widths = c(2, 3, 5, 1, 4))
If x is numeric, replace x with as.character(x) in this statement.
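By default read.fwf() runs each column through the usual type conversion, so "001" comes back as the number 1. A sketch that keeps the digits as text, assuming that is what's wanted: colClasses = "character" is not a read.fwf() argument itself but is passed on to read.table(), and the column names here are made-up labels.

```r
x <- c("360010001001002", "360010001001004", "360010001001005", "360010001001006")

con <- textConnection(x)
# colClasses is forwarded to read.table(), so every field stays character
# and "001" / "00010" keep their leading zeros; the names are made-up labels
parts <- read.fwf(con,
                  widths = c(2, 3, 5, 1, 4),
                  col.names = c("part1", "part2", "part3", "part4", "part5"),
                  colClasses = "character")
# read.fwf() leaves an already-open connection open, so close it here
close(con)
str(parts)
```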
A functional version:
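split.fixed.len <- function(x, lengths) {
  # cumulative end positions, e.g. c(2, 3, 5, 1, 4) -> c(0, 2, 5, 10, 11, 15)
  cum.len <- c(0, cumsum(lengths))
  start <- head(cum.len, -1) + 1  # 1 3 6 11 12
  stop <- tail(cum.len, -1)       # 2 5 10 11 15
  # one substring() call per chunk, each applied to all of x;
  # mapply() binds the results as the columns of a matrix
  mapply(substring, list(x), start, stop)
}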
a <- c(360010001001002,
360010001001004,
360010001001005,
360010001001006)
split.fixed.len(a, c(2, 3, 5, 1, 4))
# [,1] [,2] [,3] [,4] [,5]
# [1,] "36" "001" "00010" "0" "1002"
# [2,] "36" "001" "00010" "0" "1004"
# [3,] "36" "001" "00010" "0" "1005"
# [4,] "36" "001" "00010" "0" "1006"
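A regular-expression alternative is also possible. It is not one of the answers above, and the helper name split_by_widths is hypothetical. The idea is to build a pattern with one capture group per width, so c(2, 3, 5, 1, 4) becomes ^(.{2})(.{3})(.{5})(.{1})(.{4})$, and let regexec() extract the groups. Because the pattern is anchored, a value whose length doesn't match the total width gets a row of NA instead of quietly mis-split pieces. Numeric input goes through as.character(), so long IDs are safer passed in as text.

```r
# Hypothetical helper: split each string into fixed-width pieces with a regex
split_by_widths <- function(x, widths) {
  x <- as.character(x)
  # one capture group per width, anchored so the total length must match
  pattern <- paste0("^", paste0("(.{", widths, "})", collapse = ""), "$")
  m <- regmatches(x, regexec(pattern, x))
  # each match is c(whole match, group 1, ..., group k): drop the whole match,
  # and use a row of NA for strings that did not match at all
  rows <- lapply(m, function(g) {
    if (length(g) == 0) rep(NA_character_, length(widths)) else g[-1]
  })
  matrix(as.character(unlist(rows)), ncol = length(widths), byrow = TRUE)
}

ids <- c("360010001001002", "360010001001004", "3600100010")
split_by_widths(ids, c(2, 3, 5, 1, 4))
```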