I have a question about extracting multiple values from a data.frame in R and putting them into a new data.frame.
I have a data.frame (df) that looks like this:
PRICE EVENT
1.50 0
1.70 0
1.65 0
1.20 1
0.90 0
1.70 0
1.55 0
. .
. .
1.10 0
1.20 0
1.14 1
0.90 0
My actual data.frame has these two columns and over 300,000 rows. The EVENT column only takes the values 0 or 1 (the value 1 is a proxy indicating that a certain event occurs).
First step of my research: analyze the price when the event occurs. This step is an easy one. I did it with
vector <- df[df$EVENT == 1, "PRICE"]
Now vector contains all the prices for the event days (here: 1.20 and 1.14).
The second step of my research is where it gets interesting:
Now I want not only the prices for the event day, but also the prices for x days before and after the event day, and I want to put them into a matrix.
For example: I want the prices from two days before the event through one day after the event (including the event day).
Then the new data.frame I am trying to create would look like
     Event 1  ...  Event n
-2      1.70  ...     1.10
-1      1.65  ...     1.20
 0      1.20  ...     1.14
+1      0.90  ...     0.90
Please keep in mind that the 4-day span [-2:1] is only an example. In my actual research I have to cover a 91-day span [-30:60].
Thanks for the help :)
Using bracket notation on an R data frame, we can select rows by column value, by index, by name, by condition, etc. The base R function subset() gives the same results. The dplyr package also provides filter() for selecting rows from a data frame.
To get multiple rows of a matrix, specify the row numbers as a vector, followed by a comma, in square brackets after the matrix variable name. This expression returns the selected rows as a matrix.
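As a small illustration of these selection idioms, here is a sketch on a toy data frame with the question's PRICE and EVENT columns. The toy values and the matrix m are made up for the example.

```r
# Toy data with the same columns as in the question (values are illustrative)
df <- data.frame(PRICE = c(1.50, 1.70, 1.65, 1.20, 0.90),
                 EVENT = c(0L, 0L, 0L, 1L, 0L))

# Rows where EVENT is 1, using bracket notation
by_bracket <- df[df$EVENT == 1, ]

# The same rows using base R subset()
by_subset <- subset(df, EVENT == 1)

# Several rows of a matrix, selected by a vector of row numbers
m <- matrix(1:12, nrow = 4)
m_rows <- m[c(1, 3), ]
```

Note the trailing comma in both bracket expressions: leaving the column position empty keeps all columns.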
We can create a matrix that contains the relevant row numbers, and then use it as an index to arrive at your expected output:
event_rows <- which(df$EVENT==1)
# one column of row numbers per event: 2 rows before through 1 row after
mask <- sapply(event_rows, function(x) (x-2):(x+1))
apply(mask, 2, function(x) df$PRICE[x])
#     [,1] [,2]
#[1,] 1.70 1.10
#[2,] 1.65 1.20
#[3,] 1.20 1.14
#[4,] 0.90 0.90
Data
df <- structure(list(PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55,
1.1, 1.2, 1.14, 0.9), EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 0L)), .Names = c("PRICE", "EVENT"), class = "data.frame", row.names = c(NA,
-11L))
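For the wider [-30:60] span mentioned in the question, windows near the start or end of the data will run past the available rows. The following is a sketch of one way to generalize the approach above; event_window is a hypothetical helper name of my own, and padding with NA is an assumption about what is wanted there.

```r
# Data from the answer above
df <- data.frame(
  PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55, 1.1, 1.2, 1.14, 0.9),
  EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)
)

# Hypothetical helper: one column per event, one row per relative day
event_window <- function(price, event, before, after) {
  event_rows <- which(event == 1)
  rel_days <- seq(-before, after)
  # idx[i, j] is the row number of relative day i for event j
  idx <- outer(rel_days, event_rows, "+")
  # Row numbers outside the data become NA, so those cells are NA in the result
  idx[idx < 1 | idx > length(price)] <- NA
  matrix(price[as.vector(idx)],
         nrow = length(rel_days), ncol = length(event_rows),
         dimnames = list(as.character(rel_days),
                         sprintf("Event %d", seq_along(event_rows))))
}

# The 4-day example from the question
win <- event_window(df$PRICE, df$EVENT, before = 2, after = 1)

# The 91-day span; most cells are NA here because the toy data has only 11 rows
win_long <- event_window(df$PRICE, df$EVENT, before = 30, after = 60)
```

Replacing out-of-range row numbers with NA before indexing matters: an index of 0 is silently dropped by R, and mixing negative and positive indices is an error, so either would break the fixed 91-row layout.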
For the sake of completeness, here's a base R solution:
# example data
set.seed(123)
df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.05))
# create a vector of unique row positions covering 2 rows before and 1 row after each event
offset <- unique(as.vector(sapply(which(df$event == 1), function(x) (x-2):(x+1))))
# subset data, keeping only positions that exist in df
df[offset[offset > 0 & offset <= nrow(df)],]
price event
1 -0.56047565 0
2 -0.23017749 1
3 1.55870831 0
20 -0.47279141 0
21 -1.06782371 0
22 -0.21797491 1
23 -1.02600445 0
46 -1.12310858 0
47 -0.40288484 0
48 -0.46665535 1
49 0.77996512 1
50 -0.08336907 0
62 -0.50232345 0
63 -0.33320738 0
64 -1.01857538 1
65 -1.07179123 0
75 -0.68800862 0
76 1.02557137 0
77 -0.28477301 1
78 -1.22071771 0
95 1.36065245 0
96 -0.60025959 0
97 2.18733299 1
98 1.53261063 0
Edit: I didn't see the expected output at first, see @mtoto's answer for that.
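If the subset should also record which event each row belongs to and how far it is from the event day, one option is a long-format table. This is only a sketch: the column names rel_day, event_id and row are my own labels, not something from the question.

```r
# Same example data as above
set.seed(123)
df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.05))

event_rows <- which(df$event == 1)
rel_days <- -2:1

# Every combination of relative day and event
long <- expand.grid(rel_day = rel_days, event_id = seq_along(event_rows))
long$row <- event_rows[long$event_id] + long$rel_day

# Drop positions that fall outside the data
long <- long[long$row >= 1 & long$row <= nrow(df), ]
long$price <- df$price[long$row]
```

Unlike the unique() version, a row of df can appear more than once here when the windows of two nearby events overlap, which keeps each event's window complete.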