I have a data frame with a numerical ID variable that identifies the Primary, Secondary and Ultimate Sampling Units from a multistage sampling scheme. I want to split the original ID variable into three new variables, identifying the different sampling units separately:
Example:
>df[1:2,]
ID Var var1 var2 var3 var4 var5
501901 9 SP.1 1 W 12.10
501901 9 SP.1 2 W 17.68
What I want:
>df[1:2,]
ID1 ID2 ID3 var1 var2 var3 var4 var5
5 01 901 9 SP.1 1 W 12.10
5 01 901 9 SP.1 2 W 17.68
I know there are some functions available in R to split character strings, but I could not find similar facilities for numbers.
Thank you,
Juan
To split a number into digits in R, we can use the strsplit() function: convert the number to text with as.character(), split it, and convert the pieces back to numbers with as.numeric().
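A minimal sketch of that approach, assuming a single six-digit ID like the one in the question (the names `id` and `digits` are only illustrative). Note that it yields individual digits rather than the 1-2-3 grouping asked for:

```r
id <- 501901

# Convert to text, split between every character, then convert back to numbers.
digits <- as.numeric(strsplit(as.character(id), "")[[1]])
digits
```

The individual digits would still need to be recombined into the three sampling-unit codes, which is why the substring-based answers are usually more direct for this problem.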
Digits can also be split off arithmetically. For example, to extract the digit 1 from the number 12, integer-divide the number by 10 and take the result modulo 10. Applying the same method with other powers of 10 extracts each digit of a number.
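The same arithmetic idea can be sketched in R with integer division (`%/%`) and modulo (`%%`). This assumes every ID has exactly six digits laid out as 1 + 2 + 3; the names `id1`, `id2`, `id3` and `id2_padded` are illustrative. Because the results stay numeric, leading zeros are lost, so `sprintf()` is used where the padded form is needed:

```r
id <- c(501901, 501902)

id1 <- id %/% 100000          # first digit
id2 <- (id %/% 1000) %% 100   # digits 2-3 (numeric, so "01" becomes 1)
id3 <- id %% 1000             # last three digits

# Restore the leading zero as text if the two-character code is required.
id2_padded <- sprintf("%02d", as.integer(id2))
```

This keeps everything numeric until the last step, which may be convenient if the parts are used in calculations, but it depends on the fixed six-digit layout.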
Why don't you try converting your ID to a string with as.character(), then use strsplit(), and then convert back to numbers with as.numeric()?
You could, for example, use substring():
df <- data.frame(ID = c(501901, 501902))
splitted <- t(sapply(df$ID, function(x) substring(x, first=c(1,2,4), last=c(1,3,6))))
cbind(df, splitted)
# ID 1 2 3
#1 501901 5 01 901
#2 501902 5 01 902
Yet another alternative is to re-read the first column using read.fwf() and specify the widths:
cbind(read.fwf(file = textConnection(as.character(df[, 1])),
widths = c(1, 2, 3), colClasses = "character",
col.names = c("ID1", "ID2", "ID3")),
df[-1])
# ID1 ID2 ID3 var1 var2 var3 var4 var5
# 1 5 01 901 9 SP.1 1 W 12.10
# 2 5 01 901 9 SP.1 2 W 17.68
One advantage here is being able to set the resulting column names in a convenient manner, and ensure that the columns are characters, thus retaining any leading zeroes that might be present.
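One caveat, offered tentatively: when the ID is stored as a double, as.character() can render some values in scientific notation (round numbers such as 100000 are a typical case), and an ID shorter than six digits would shift the split positions. A possible safeguard, assuming IDs are meant to be six digits wide, is to format them explicitly before splitting. The sample values and the name `id_chr` are illustrative, not from the original data:

```r
id <- c(501901, 100000, 1901)

# Fixed notation, zero-padded to six characters.
id_chr <- sprintf("%06.0f", id)

ID1 <- substr(id_chr, 1, 1)
ID2 <- substr(id_chr, 2, 3)
ID3 <- substr(id_chr, 4, 6)
```

Whether padding on the left is the right repair depends on how the IDs were constructed, so this is only a sketch under that assumption.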
This should work:
# paste() makes the numeric-to-text conversion explicit; gsub() would also coerce its input with as.character().
df <- cbind(do.call(rbind, strsplit(gsub('(.)(..)(...)', '\\1 \\2 \\3', paste(df[,1])), ' ')), df[,-1])
Or with substr()
df <- cbind(ID1=substr(df[, 1],1,1), ID2=substr(df[, 1],2,3), ID3=substr(df[, 1],4,6), df[, -1])
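For completeness, here is a self-contained sketch that rebuilds the two example rows from the question and applies the substr() idea. It assumes the ID column is simply named `ID` and that the remaining columns are `var1` to `var5` as in the desired output; `id` and `out` are illustrative names:

```r
# Rebuild the two example rows from the question.
df <- data.frame(ID = c(501901, 501901),
                 var1 = c(9, 9),
                 var2 = c("SP.1", "SP.1"),
                 var3 = c(1, 2),
                 var4 = c("W", "W"),
                 var5 = c(12.10, 17.68),
                 stringsAsFactors = FALSE)

# Work on the text form so that leading zeros such as "01" are kept.
id <- as.character(df$ID)

out <- data.frame(ID1 = substr(id, 1, 1),
                  ID2 = substr(id, 2, 3),
                  ID3 = substr(id, 4, 6),
                  df[-1],
                  stringsAsFactors = FALSE)
out
```

The new ID columns are character vectors here; wrapping them in as.numeric() would give numbers but drop the leading zero in ID2.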