How to split a number into digits in R

Tags: split, dataframe, r

I have a data frame with a numerical ID variable that identifies the Primary, Secondary and Ultimate Sampling Units from a multistage sampling scheme. I want to split the original ID variable into three new variables, each identifying one of the sampling units separately.

Example:

>df[1:2,]
ID Var   var1   var2   var3   var4    var5
501901      9   SP.1      1      W   12.10
501901      9   SP.1      2      W   17.68

What I want:

>df[1:2,]
ID1   ID2   ID3   var1   var2   var3   var4    var5
5     01    901      9   SP.1      1      W   12.10
5     01    901      9   SP.1      2      W   17.68

I know there are some functions available in R to split character strings, but I could not find the same facilities for numbers.

Thank you,

Juan


asked Mar 19 '13 11:03

jrs-x

3 Answers

You could, for example, use substring:

df <- data.frame(ID = c(501901, 501902))

# substring() coerces the numeric ID to character, then cuts it at positions 1, 2-3 and 4-6
splitted <- t(sapply(df$ID, function(x) substring(x, first=c(1,2,4), last=c(1,3,6))))
cbind(df, splitted)
#      ID 1  2   3
#1 501901 5 01 901
#2 501902 5 01 902
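One thing to watch with this approach: the ID is converted to text implicitly, and a double such as 500000 may be rendered in scientific notation, which would shift the cut positions. A minimal sketch of one way to guard against that, assuming all IDs are whole numbers with exactly six digits (the value 500000 and the names id_chr and parts are illustrative, not from the question):

```r
# Illustrative data: the second ID is a "round" number stored as a double
df <- data.frame(ID = c(501901, 500000))

# Format explicitly as six fixed digits instead of relying on implicit coercion
id_chr <- sprintf("%06.0f", df$ID)

# substr() is vectorised, so no sapply() is needed
parts <- data.frame(
  ID1 = substr(id_chr, 1, 1),
  ID2 = substr(id_chr, 2, 3),
  ID3 = substr(id_chr, 4, 6),
  stringsAsFactors = FALSE
)

result <- cbind(parts, df)
```

If the ID column is stored as integer rather than double, the explicit formatting step should not be necessary, but it does no harm.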


answered Oct 20 '22 23:10

EDi

Yet another alternative is to re-read the first column using read.fwf and specify the widths:

cbind(read.fwf(file = textConnection(as.character(df[, 1])), 
               widths = c(1, 2, 3), colClasses = "character", 
               col.names = c("ID1", "ID2", "ID3")), 
      df[-1])
#   ID1 ID2 ID3 var1 var2 var3 var4  var5
# 1   5  01 901    9 SP.1    1    W 12.10
# 2   5  01 901    9 SP.1    2    W 17.68

One advantage here is being able to set the resulting column names in a convenient manner, and ensure that the columns are characters, thus retaining any leading zeroes that might be present.
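For comparison, the split can also be done on the numbers themselves with integer division (%/%) and modulo (%%). This is only a sketch, assuming a fixed 1-2-3 digit layout as in the question; the object names are illustrative. It also shows why character columns are convenient: the numeric pieces lose their leading zeroes and have to be re-padded.

```r
id <- c(501901, 501902)   # illustrative IDs with a fixed 1-2-3 digit layout

id1 <- id %/% 100000          # first digit
id2 <- (id %/% 1000) %% 100   # digits 2-3
id3 <- id %% 1000             # digits 4-6

# The pieces are numeric, so the leading zero in "01" is gone at this point.
# sprintf() with a fixed width restores it when a character code is wanted.
parts_num <- data.frame(
  ID1 = id1,
  ID2 = sprintf("%02d", as.integer(id2)),
  ID3 = sprintf("%03d", as.integer(id3)),
  stringsAsFactors = FALSE
)
```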

answered Oct 20 '22 21:10

A5C1D2H2I1M1N2O1R2T1

This should work:

# paste() converts the numeric ID to text; gsub() would also coerce it via as.character().
df <- cbind(do.call(rbind, strsplit(gsub('(.)(..)(...)', '\\1 \\2 \\3', paste(df[,1])), ' ')),
            df[,-1])

Or with substr()

df <- cbind(ID1 = substr(df[, 1], 1, 1),
            ID2 = substr(df[, 1], 2, 3),
            ID3 = substr(df[, 1], 4, 6),
            df[, -1])
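If the same split is needed for other widths, the substr() idea can be wrapped in a small function. The helper below, split_id, is hypothetical (it is not a base R function), and it assumes that col is a column position and that the IDs convert cleanly to fixed-width text, for example because they are stored as integers:

```r
# Hypothetical helper: split a fixed-width ID column into character columns.
# `col` is the position of the ID column; `widths` gives the digits per piece.
split_id <- function(df, col = 1, widths = c(1, 2, 3),
                     new_names = paste0("ID", seq_along(widths))) {
  id_chr <- as.character(df[[col]])
  ends   <- cumsum(widths)        # last character of each piece
  starts <- ends - widths + 1     # first character of each piece
  parts  <- Map(function(s, e) substr(id_chr, s, e), starts, ends)
  names(parts) <- new_names
  # Put the new ID columns first, followed by the remaining columns
  cbind(as.data.frame(parts, stringsAsFactors = FALSE), df[-col])
}

# Usage with illustrative data (integer IDs)
df <- data.frame(ID = c(501901L, 501902L), var1 = c(9, 9))
out <- split_id(df)
```

Using df[-col] rather than df[, -col] keeps the remaining columns as a data frame even when only one is left.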

answered Oct 20 '22 21:10

Rcoster
