Calculate the difference between pairs of consecutive rows in a data frame - R

Tags:

r

I have a data.frame in which each gene name is repeated and contains values for 2 conditions:

df <- data.frame(gene=c("A","A","B","B","C","C"),
                 condition=c("control","treatment","control","treatment","control","treatment"),
                 count=c(10, 2, 5, 8, 5, 1),
                 sd=c(1, 0.2, 0.1, 2, 0.8, 0.1))

  gene condition count  sd
1    A   control    10 1.0
2    A treatment     2 0.2
3    B   control     5 0.1
4    B treatment     8 2.0
5    C   control     5 0.8
6    C treatment     1 0.1

I want to determine whether "count" increases or decreases after treatment, and then label and/or subset the genes accordingly. That is (pseudocode):

for each unique(gene) do
   if df[geneRow2,3] - df[geneRow1,3] > 0 then gene is "up"
       else gene is "down"

This is what it should look like in the end (the last column is optional):

up-regulated
 gene condition count  sd  regulation
 B    control     5    0.1    up
 B    treatment   8    2.0    up

down-regulated
 gene condition count  sd  regulation
 A    control     10   1.0    down
 A    treatment   2    0.2    down
 C    control     5    0.8    down
 C    treatment   1    0.1    down

I have been racking my brain over this, including playing with ddply, and I've failed to find a solution. Please help a hapless biologist.

Cheers.


asked Sep 21 '12 23:09

fridaymeetssunday

1 Answer

The plyr solution would look something like:

library(plyr)
reg.fun <- function(x) {
  # treatment minus control: a positive difference means the count went up
  reg.diff <- x$count[x$condition=='treatment'] - x$count[x$condition=='control']
  x$regulation <- ifelse(reg.diff > 0, 'up', 'down')

  x
}

ddply(df, .(gene), reg.fun)


  gene condition count  sd regulation
1    A   control    10 1.0       down
2    A treatment     2 0.2       down
3    B   control     5 0.1         up
4    B treatment     8 2.0         up
5    C   control     5 0.8       down
6    C treatment     1 0.1       down
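If you would rather not depend on plyr, the same per-gene label can be computed in base R with `ave()`. This is a minimal sketch that assumes every gene has exactly one `control` row and one `treatment` row; a change of exactly zero is labelled "down", just as with the `ifelse()` above. The names `signed` and `change` are illustrative.

```r
# The example data from the question
df <- data.frame(gene = c("A", "A", "B", "B", "C", "C"),
                 condition = c("control", "treatment", "control",
                               "treatment", "control", "treatment"),
                 count = c(10, 2, 5, 8, 5, 1),
                 sd = c(1, 0.2, 0.1, 2, 0.8, 0.1))

# Give treatment counts a + sign and control counts a - sign ...
signed <- ifelse(df$condition == "treatment", df$count, -df$count)
# ... then sum within each gene; ave() repeats the per-gene total
# (treatment - control) on every row belonging to that gene
change <- ave(signed, df$gene, FUN = sum)

df$regulation <- ifelse(change > 0, "up", "down")
df
```

Because `ave()` returns a vector of the same length as its input, the label lands on both rows of each gene without any reshaping.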

You could also think about doing this with a different package and/or with data in a different shape:

df.w <- reshape(df, direction='wide', idvar='gene', timevar='condition')

library(data.table)
DT <- data.table(df.w, key='gene')

# one row per gene after reshaping, so no grouping is needed
DT[, regulation:=ifelse(count.treatment-count.control > 0, 'up', 'down')]
DT

   gene count.control sd.control count.treatment sd.treatment regulation
1:    A            10        1.0               2          0.2       down
2:    B             5        0.1               8          2.0         up
3:    C             5        0.8               1          0.1       down
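The question also asks to subset the genes by direction. Here is a minimal base-R sketch: label the wide table, attach that label to the original long table with `merge()`, and then `split()` the long table into one data frame per label. The names `df.long` and `by.reg` are illustrative and are not part of the answer above.

```r
# The example data from the question
df <- data.frame(gene = c("A", "A", "B", "B", "C", "C"),
                 condition = c("control", "treatment", "control",
                               "treatment", "control", "treatment"),
                 count = c(10, 2, 5, 8, 5, 1),
                 sd = c(1, 0.2, 0.1, 2, 0.8, 0.1))

# One row per gene: count.control, sd.control, count.treatment, sd.treatment
df.w <- reshape(df, direction = "wide", idvar = "gene", timevar = "condition")
df.w$regulation <- ifelse(df.w$count.treatment - df.w$count.control > 0,
                          "up", "down")

# Copy each gene's label onto both of its rows in the long table
df.long <- merge(df, df.w[, c("gene", "regulation")], by = "gene")

# A named list with one data frame per label
by.reg <- split(df.long, df.long$regulation)
by.reg$up    # the "up-regulated" table
by.reg$down  # the "down-regulated" table
```

`split()` returns a named list, so `by.reg$up` and `by.reg$down` correspond to the two tables shown in the question.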


answered Sep 27 '22 19:09

Justin
