R - Extract multiple rows from column 1 if certain value appears in column 2

Tags: dataframe, r, rows

I have a question about extracting multiple values from a data.frame in R and putting them into a new data.frame.

I have a data.frame (df) that looks like this:

PRICE     EVENT
1.50        0
1.70        0
1.65        0
1.20        1
0.90        0
1.70        0
1.55        0 
  .         .
  .         .
1.10        0
1.20        0
1.14        1
0.90        0

My actual data.frame has these two columns and over 300,000 rows. The EVENT column only contains the values 0 or 1 (the value 1 is a proxy indicating that a certain event occurred).

First step of my research: analyze the price when the event occurs. This step is easy. I did it with

vector<-df[df$EVENT==1, "PRICE"]

Now vector contains all the prices for the event days (here: 1.20 and 1.14).

The second step of my research is where it gets interesting:

Now I want not only the prices for the event day, but also the prices for x days before and after the event day, and I want to put them into a matrix.

For example: I want the prices for the two days before the event, the event day itself, and one day after the event.

Then the new data.frame I am trying to create would look like

    Event 1               Event n
-2   1.70        ...        1.10
-1   1.65        ...        1.20
 0   1.20        ...        1.14
+1   0.90        ...        0.90

Please keep in mind that the 4-day span [-2:1] is only an example. In my actual research I have to cover a 91-day span [-30:60].

Thanks for the help :)

asked Jan 25 '18 08:01

Bit

2 Answers

We can create a matrix that contains the relevant row numbers, and then use that as a mask to arrive at your expected output:

event_rows <- which(df$EVENT==1)
mask <- sapply(event_rows, function(x) (x-2):(x+1))
apply(mask, 2, function(x) df$PRICE[x])
#     [,1] [,2]
#[1,] 1.70 1.10
#[2,] 1.65 1.20
#[3,] 1.20 1.14
#[4,] 0.90 0.90

Data

df <- structure(list(PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55, 
1.1, 1.2, 1.14, 0.9), EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L)), .Names = c("PRICE", "EVENT"), class = "data.frame", row.names = c(NA, 
-11L))
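
For the real 91-day span, the same index-matrix idea can be wrapped in a small function. The following is only a sketch under some assumptions: event_window and its before/after arguments are names made up for illustration, and positions that fall outside the data are filled with NA. That last part matters because an event in the first rows would otherwise produce zero or negative indices, which R does not treat as missing values.

```r
# Sample data from the question (events at rows 4 and 10)
df <- data.frame(
  PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55, 1.1, 1.2, 1.14, 0.9),
  EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)
)

# Hypothetical helper: rows = relative days, columns = events
event_window <- function(price, event, before, after) {
  event_rows <- which(event == 1)
  rel_days   <- (-before):after
  # index matrix: relative day + event position
  idx <- outer(rel_days, event_rows, `+`)
  # positions outside the data become NA instead of invalid indices
  idx[idx < 1 | idx > length(price)] <- NA
  out <- price[as.vector(idx)]
  dim(out) <- dim(idx)
  dimnames(out) <- list(as.character(rel_days),
                        sprintf("Event %d", seq_along(event_rows)))
  out
}

res <- event_window(df$PRICE, df$EVENT, before = 2, after = 1)
res
```

For the span in the question, the call would be event_window(df$PRICE, df$EVENT, before = 30, after = 60), which should give a 91-row matrix with one column per event.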

answered Nov 07 '22 19:11

mtoto

For the sake of completeness, here's a base R solution:

# example data
set.seed(123)
df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.05))

# unique row positions of each event plus the 2 rows before and 1 row after it
offset <- unique(as.vector(sapply(which(df$event == 1), function(x) (x-2):(x+1))))

# subset data, dropping positions that fall outside the data.frame
df[offset[offset > 0 & offset <= nrow(df)], ]


         price event
1  -0.56047565     0
2  -0.23017749     1
3   1.55870831     0
20 -0.47279141     0
21 -1.06782371     0
22 -0.21797491     1
23 -1.02600445     0
46 -1.12310858     0
47 -0.40288484     0
48 -0.46665535     1
49  0.77996512     1
50 -0.08336907     0
62 -0.50232345     0
63 -0.33320738     0
64 -1.01857538     1
65 -1.07179123     0
75 -0.68800862     0
76  1.02557137     0
77 -0.28477301     1
78 -1.22071771     0
95  1.36065245     0
96 -0.60025959     0
97  2.18733299     1
98  1.53261063     0

Edit: I didn't see the expected output at first; see @mtoto's answer for that.
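
If a row subset like this is still useful, one possible variation is a long table that records which event each row belongs to and its distance from that event. With unique(), overlapping windows (such as rows 46 to 50 above) are merged and that information is lost. This is only a sketch: before, after, long, event_id and rel_day are names chosen here for illustration.

```r
# same example data as above
set.seed(123)
df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.05))

before <- 2
after  <- 1
event_rows <- which(df$event == 1)
rel_days   <- (-before):after

# one row per (event, relative day) pair
long <- data.frame(
  event_id = rep(seq_along(event_rows), each = length(rel_days)),
  rel_day  = rep(rel_days, times = length(event_rows))
)
# absolute row position in df for each pair
long$row <- event_rows[long$event_id] + long$rel_day

# drop positions outside the data, then look up the prices
long <- long[long$row >= 1 & long$row <= nrow(df), ]
long$price <- df$price[long$row]

head(long)
```

Rows that belong to two overlapping windows appear once per event here, so each event keeps its full window.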

answered Nov 07 '22 21:11

LAP
