I have a dataset that has dates and interest rates in the same column. I need to split these two values into two separate columns. However, when I use the following code:
Split <- str_split(df$Dates, "[ ]", n = 2)
Dates <- unlist(Split)[1]
Rates <- unlist(Split)[2]
It returns only a single value for each, i.e., "1971-04-01" for Dates and "7.43" for Rates. I need Dates to hold the first part of every split string and Rates to hold the second part of every split string.
Below is a portion of the dataset (518 rows in total).
1971-04-01 7.31
1971-05-01 7.43
1971-06-01 7.53
1971-07-01 7.60
1971-08-01 7.70
1971-09-01 7.69
1971-10-01 7.63
1971-11-01 7.55
1971-12-01 7.48
1972-01-01 7.44
Thanks
You can use reshape2::colsplit
library(reshape2)
colsplit(df$Dates, ' ', names = c('Dates','Rates'))
# Dates Rates
# 1 1971-04-01 7.31
# 2 1971-05-01 7.43
# 3 1971-06-01 7.53
# 4 1971-07-01 7.60
# 5 1971-08-01 7.70
# 6 1971-09-01 7.69
# 7 1971-10-01 7.63
# 8 1971-11-01 7.55
# 9 1971-12-01 7.48
# 10 1972-01-01 7.44
You could do:
Split <- strsplit(as.character(df$Dates), " ", fixed = TRUE)
Dates <- sapply(Split, "[", 1)
Rates <- sapply(Split, "[", 2)
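The reason the original attempt gave a single value is that unlist() flattens the whole list into one long vector, so [1] and [2] just pick its first two elements. Below is a minimal sketch of the sapply approach, assuming a small made-up data frame (df_demo is a hypothetical stand-in for the real 518-row data) and converting the pieces to Date and numeric types, which the question does not ask for but is usually what you want afterwards.

```r
# Hypothetical three-row sample standing in for the real data
df_demo <- data.frame(
  Dates = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row
parts <- strsplit(df_demo$Dates, " ", fixed = TRUE)

# unlist() flattens the list, so [1] is only the very first piece
first_only <- unlist(parts)[1]

# sapply() applies `[` to every list element, giving one value per row
dates <- as.Date(sapply(parts, `[`, 1))
rates <- as.numeric(sapply(parts, `[`, 2))

# Assemble the two columns into a new data frame
result <- data.frame(Dates = dates, Rates = rates)
```

Here result has one row per input row, with Dates stored as Date and Rates as numeric.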
df <- data.frame(
Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53",
"1971-07-01 7.60", "1971-08-01 7.70", "1971-09-01 7.69",
"1971-10-01 7.63", "1971-11-01 7.55", "1971-12-01 7.48",
"1972-01-01 7.44"))
do.call(rbind, strsplit(as.character(df$Date), split = '\\s+', fixed = FALSE))
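Note that do.call(rbind, ...) on the strsplit() output yields a character matrix rather than a data frame. A rough sketch of turning that matrix into typed columns follows; df_demo, the column names, and the type conversions are assumptions for illustration, not part of the answer above.

```r
# Hypothetical two-row sample with several spaces between the fields
df_demo <- data.frame(
  Date = c("1971-04-01   7.31", "1971-05-01   7.43"),
  stringsAsFactors = FALSE
)

# Split on runs of whitespace and stack the pieces into a character matrix
m <- do.call(rbind, strsplit(df_demo$Date, split = "\\s+"))

# Convert each matrix column to a suitable type
result <- data.frame(
  Dates = as.Date(m[, 1]),
  Rates = as.numeric(m[, 2])
)
```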
Perhaps I'm biased, but I would suggest my cSplit function for this problem.
First, I'm assuming we are starting with the following (single column) data.frame
(where there are multiple spaces between the "date" value and the "rate" value).
df <- data.frame(
Date = c("1971-04-01 7.31", "1971-05-01 7.43", "1971-06-01 7.53",
"1971-07-01 7.60", "1971-08-01 7.70", "1971-09-01 7.69",
"1971-10-01 7.63", "1971-11-01 7.55", "1971-12-01 7.48",
"1972-01-01 7.44"))
Next, get the cSplit function from my GitHub Gist, and use it. You can split on a regular expression (here, multiple spaces).
cSplit(df, "Date", "\\s+", fixed = FALSE)
# Date_1 Date_2
# 1: 1971-04-01 7.31
# 2: 1971-05-01 7.43
# 3: 1971-06-01 7.53
# 4: 1971-07-01 7.60
# 5: 1971-08-01 7.70
# 6: 1971-09-01 7.69
# 7: 1971-10-01 7.63
# 8: 1971-11-01 7.55
# 9: 1971-12-01 7.48
# 10: 1972-01-01 7.44
Since the function converts a data.frame to a data.table, you have access to setnames, which would let you rename your columns in place.
setnames(cSplit(df, "Date", "\\s+", fixed = FALSE), c("Dates", "Rates"))[]
# Dates Rates
# 1: 1971-04-01 7.31
# 2: 1971-05-01 7.43
# 3: 1971-06-01 7.53
# 4: 1971-07-01 7.60
# 5: 1971-08-01 7.70
# 6: 1971-09-01 7.69
# 7: 1971-10-01 7.63
# 8: 1971-11-01 7.55
# 9: 1971-12-01 7.48
# 10: 1972-01-01 7.44