EDIT: When creating a simple sample data.frame, I used the same dates in the two Date columns. That is not the case in my real data, which makes this problem harder.
Instead of this dataframe:
ID Date Balance Date2 Balance2
1 01-01-2014 10000 01-02-2014 5000
2 01-01-2014 50000 01-02-2014 30000
3 01-01-2014 30000 01-02-2014 15000
4 01-01-2014 5000 01-02-2014 3500
I have this dataframe instead:
ID Date Balance Date2 Balance2
1 01-01-2014 10000 01-02-2017 5000
2 01-01-2015 50000 01-02-2016 30000
3 01-08-2014 30000 01-02-2015 15000
4 01-02-2016 5000 01-02-2018 3500
Which I would like to reshape to the following:
ID Date Balance
1 01-01-2014 10000
1 01-02-2017 5000
2 01-01-2015 50000
2 01-02-2016 30000
3 ... ... And so on...
I have the following at the moment.
Dates = a character vector containing the names of all the Date columns (Date, Date2, Date3...)
Balances = a character vector containing the names of all the Balance columns (Balance, Balance2...)
df <- reshape(df,
              varying = Balances,
              v.names = "Balance",
              timevar = "Date",
              times = Dates,
              direction = "long")
Your excellent proposed methods no longer give me the desired result now that I have changed my sample data.frame / data.table.
The main problem is that the Date columns contain different dates, and there is no way I can change this. Date, Date2, Date3 and so on are always in chronological order, though.
I need a way for R to take the Date column and the Balance column and place them in a new data.frame, then take Date2 and Balance2 and rbind them to the first data.frame, then Date3 and Balance3, and so on, until all of my 700-ish variables are processed.
I'm thinking of writing a loop. Any thoughts? See below for sample data.
Thanks in advance,
Robert
df <- data.frame(ID = 1:4,
                 Date = c("01-01-2014", "01-01-2015", "01-08-2014", "01-02-2016"),
                 Balance = c(10000, 50000, 30000, 5000),
                 Date2 = c("01-02-2017", "01-02-2016", "01-02-2015", "01-02-2018"),
                 Balance2 = c(5000, 30000, 15000, 3500))
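For reference, the base reshape() attempt above can be made to work by pairing the wide columns explicitly. The following is only a sketch under the assumption that every DateN column has a matching BalanceN column; the helper column name "time" and the object name long are made up for illustration.

```r
# Sample data from the question (dates kept as character strings)
df <- data.frame(ID = 1:4,
                 Date = c("01-01-2014", "01-01-2015", "01-08-2014", "01-02-2016"),
                 Balance = c(10000, 50000, 30000, 5000),
                 Date2 = c("01-02-2017", "01-02-2016", "01-02-2015", "01-02-2018"),
                 Balance2 = c(5000, 30000, 15000, 3500),
                 stringsAsFactors = FALSE)

# Each list element names the wide columns that feed one long column
long <- reshape(df,
                direction = "long",
                idvar = "ID",
                varying = list(c("Date", "Date2"), c("Balance", "Balance2")),
                v.names = c("Date", "Balance"),
                timevar = "time",   # hypothetical helper column: 1 = first pair, 2 = second pair
                times = 1:2)

# Sort by ID and pair number, then drop the helper column
long <- long[order(long$ID, long$time), c("ID", "Date", "Balance")]
rownames(long) <- NULL
```

With many column pairs, the two vectors inside varying could be built with grep() on names(df) instead of being typed out.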
If your columns are named as you've provided in your example, you can try merged.stack from my "splitstackshape" package. Note that the values in your "ID" column must be unique for this to work correctly (as they are in your sample data).
Usage is straightforward: Specify the "stubs" of the variables (here, "Date" and "Balance"). Setting sep = "var.stubs" just strips out the rest of the column name. The [, .time_1 := NULL] is just to drop the time column that was created in the reshaping process.
library(splitstackshape)
merged.stack(mydf, var.stubs = c("Date", "Balance"),
sep = "var.stubs")[, .time_1 := NULL][]
# ID Date Balance
# 1: 1 01-01-2014 10000
# 2: 1 01-02-2014 5000
# 3: 2 01-01-2014 50000
# 4: 2 01-02-2014 30000
# 5: 3 01-01-2014 30000
# 6: 3 01-02-2014 15000
# 7: 4 01-01-2014 5000
# 8: 4 01-02-2014 3500
Soon (version 1.9.8 of "data.table") melt will be able to handle conversion to a semi-long form like the one you're trying to get here. That will be faster than merged.stack presently is, but merged.stack should already be able to handle your present scenario.
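As a rough sketch of what that melt approach could look like, assuming an installed "data.table" version in which melt accepts several measure patterns at once (the object names dt and long are illustrative only):

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dt <- data.table(ID = 1:4,
                 Date = c("01-01-2014", "01-01-2015", "01-08-2014", "01-02-2016"),
                 Balance = c(10000, 50000, 30000, 5000),
                 Date2 = c("01-02-2017", "01-02-2016", "01-02-2015", "01-02-2018"),
                 Balance2 = c(5000, 30000, 15000, 3500))

# One pattern per output column; matching columns are stacked in the
# order in which they appear in dt
long <- melt(dt,
             id.vars = "ID",
             measure.vars = patterns("^Date", "^Balance"),
             value.name = c("Date", "Balance"))

# "variable" records which pair a row came from; use it to sort, then drop it
setorder(long, ID, variable)
long[, variable := NULL]
```

Because the columns are matched by position within each pattern, this relies on the Date and Balance columns appearing in the same relative order in the data.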
If you care about order, then probably the fastest method will come from the data.table answers. But if you don't, you could just bind the rows of the first three columns (ID, Date, Balance) with those of the first and last two columns (ID, Date2, Balance2) using rbind. That will be very fast and simple, but the result will not have the order you desire. You can reorder afterwards with the order function on ID.
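A minimal base R sketch of that rbind idea, written so that it also covers more than two Date/Balance pairs. It assumes the Date and Balance columns appear in matching order in the data; the names date_cols, balance_cols, pieces and stacked are made up for illustration.

```r
# Sample data from the question (dates kept as character strings)
df <- data.frame(ID = 1:4,
                 Date = c("01-01-2014", "01-01-2015", "01-08-2014", "01-02-2016"),
                 Balance = c(10000, 50000, 30000, 5000),
                 Date2 = c("01-02-2017", "01-02-2016", "01-02-2015", "01-02-2018"),
                 Balance2 = c(5000, 30000, 15000, 3500),
                 stringsAsFactors = FALSE)

# Find the paired columns by name prefix
date_cols    <- grep("^Date", names(df), value = TRUE)
balance_cols <- grep("^Balance", names(df), value = TRUE)

# One three-column data.frame per pair, all with identical column names
pieces <- Map(function(d, b) {
  setNames(df[, c("ID", d, b)], c("ID", "Date", "Balance"))
}, date_cols, balance_cols)

# Stack the pieces, then reorder by ID; ties keep their stacking order,
# so the first pair stays ahead of the second within each ID
stacked <- do.call(rbind, pieces)
stacked <- stacked[order(stacked$ID), ]
rownames(stacked) <- NULL
```

This plays the role of the loop described in the question: each pass picks one Date/Balance pair, and rbind stacks the passes.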
Alternatively you could generate two matrices, transpose, and then bind it all together as vectors. This will be pretty fast because you're just making a few copies and selections and the reordering is done through just identifying the data in a different way rather than relying on a sorting algorithm.
dateMat <- as.matrix(df[, c(2, 4)])
balMat <- as.matrix(df[, c(3, 5)])
dates <- as.vector( t(dateMat) )
balances <- as.vector( t(balMat) )
dfl <- data.frame(ID = rep(df$ID, each = 2), Date = dates, Balance = balances)
You can test the two versions out for speed on your large data.frame.