
Splitting Columns by Number of Characters [duplicate]

Tags: r

I have a column of dates in a data table entered as 6-digit numbers, such as 201401, 201402, 201403, 201412, etc., where the first four digits are the year and the last two digits are the month.

I'm trying to split that column into two columns, one called "year" and one called "month". I've been messing around with strsplit() but can't figure out how to get it to split by number of characters instead of by a string pattern, i.e. between the 4th and 5th digits.

Versipellis asked Jul 07 '16 18:07



2 Answers

Without using any external package, we can do this with substr:

transform(df1, Year = substr(dates, 1, 4), Month = substr(dates, 5, 6))
#    dates Year Month
#1  201401 2014    01
#2  201402 2014    02
#3  201403 2014    03
#4  201412 2014    12

transform keeps the original dates column; we can either keep it or drop it afterwards.
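
A minimal sketch of keeping versus dropping the original column. The df1 definition here is an assumption, reconstructed from the sample values quoted in the question (the original data was not posted), and the names kept, dropped and as_numbers are only illustrative.

```r
# Assumed sample data, rebuilt from the values quoted in the question
df1 <- data.frame(dates = c(201401, 201402, 201403, 201412))

# Keep the original column alongside the new ones (substr() returns character)
kept <- transform(df1, Year = substr(dates, 1, 4), Month = substr(dates, 5, 6))

# Drop the original column once the pieces have been extracted
dropped <- kept[c("Year", "Month")]

# Convert while splitting if integers are wanted instead of strings
as_numbers <- transform(df1,
                        Year  = as.integer(substr(dates, 1, 4)),
                        Month = as.integer(substr(dates, 5, 6)))
```

Note that converting to integer turns the month "01" into 1, so keep the character version if the leading zero matters.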


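Or with sub, inserting a comma after the fourth character and reading the result back as CSV (colClasses = "character" keeps the leading zero of the month, which read.csv would otherwise drop):

cbind(df1, read.csv(text = sub('(.{4})(.{2})', "\\1,\\2", df1$dates),
                    header = FALSE, col.names = c("Year", "Month"),
                    colClasses = "character"))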

Or using a package, for example tidyr:

library(tidyr)
extract(df1, dates, into = c("Year", "Month"), "(.{4})(.{2})", remove=FALSE)

Or with data.table, splitting at the position after the fourth character with a lookbehind:

library(data.table)
setDT(df1)[, tstrsplit(dates, "(?<=.{4})", perl = TRUE)]
akrun answered Oct 29 '22 13:10



tidyr::separate can take an integer for its sep parameter, which will split at a particular location:

library(tidyr)

df <- data.frame(date = c(201401, 201402, 201403, 201412))

df %>% separate(date, into = c('year', 'month'), sep = 4)
#>   year month
#> 1 2014    01
#> 2 2014    02
#> 3 2014    03
#> 4 2014    12

Note the new columns are character; add convert = TRUE to coerce back to numbers.
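
As a sketch of that conversion, reusing the same df as above (the name res is only illustrative):

```r
library(tidyr)

df <- data.frame(date = c(201401, 201402, 201403, 201412))

# convert = TRUE runs type conversion on the new columns,
# so year and month come back as integers instead of character
res <- separate(df, date, into = c("year", "month"), sep = 4, convert = TRUE)
str(res)
```

The trade-off is that the month loses its leading zero ("01" becomes 1), so leave convert at its default if the zero-padded string is needed.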

alistaire answered Oct 29 '22 13:10
