Subset a column in a data frame based on another data frame/list

Tags: r, apply, subset

I have the following table1, which is a data frame composed of 6 columns and 8083 rows. The head of table1 is shown below:

|gene ID        |   prom_65|   prom_66|  amast_69|  amast_70|   p_value|
|:--------------|---------:|---------:|---------:|---------:|---------:|
|LdBPK_321470.1 |   24.7361|   25.2550|   31.2974|   45.4209| 0.2997430|
|LdBPK_251900.1 |  107.3580|  112.9870|   77.4182|   86.3211| 0.0367792|
|LdBPK_331430.1 |   72.0639|   86.1486|   68.5747|   77.8383| 0.2469355|
|LdBPK_100640.1 |   43.8766|   53.4004|   34.0255|   38.4038| 0.1299948|
|LdBPK_330360.1 | 2382.8700| 1871.9300| 2013.4200| 2482.0600| 0.8466225|
|LdBPK_090870.1 |   49.6488|   53.7134|   59.1175|   66.0931| 0.0843242|

I have another data frame, called accessions40, which holds a list of 510 gene IDs. It is a subset of the first column of table1, i.e., all 510 of its values are contained in the first column of table1 (8083 rows). The head of accessions40 is shown below:

|V1             |
|:--------------|
|LdBPK_330360.1 |
|LdBPK_283000.1 |
|LdBPK_360210.1 |
|LdBPK_261550.1 |
|LdBPK_367320.1 |
|LdBPK_361420.1 |

What I want to do is the following: produce a new table2 that contains, in the first column (gene ID), only the values present in accessions40, together with the corresponding values of the other five columns from table1. In other words, I want to subset the rows of table1 by matching its first column against the values of accessions40.
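A minimal reproducible sketch of the inputs, typed in from the heads shown above (six rows each, not the full data). It assumes the ID column is named gene_ID, as both answers do; if the file were read with read.csv() and its default check.names = TRUE, a header of "gene ID" would become gene.ID instead.

```r
# Heads of the two data frames, typed in from the question.
# Assumption: the ID column is named gene_ID (as in the answers).
table1 <- data.frame(
  gene_ID  = c("LdBPK_321470.1", "LdBPK_251900.1", "LdBPK_331430.1",
               "LdBPK_100640.1", "LdBPK_330360.1", "LdBPK_090870.1"),
  prom_65  = c(24.7361, 107.3580, 72.0639, 43.8766, 2382.8700, 49.6488),
  prom_66  = c(25.2550, 112.9870, 86.1486, 53.4004, 1871.9300, 53.7134),
  amast_69 = c(31.2974, 77.4182, 68.5747, 34.0255, 2013.4200, 59.1175),
  amast_70 = c(45.4209, 86.3211, 77.8383, 38.4038, 2482.0600, 66.0931),
  p_value  = c(0.2997430, 0.0367792, 0.2469355, 0.1299948, 0.8466225, 0.0843242),
  stringsAsFactors = FALSE
)
accessions40 <- data.frame(
  V1 = c("LdBPK_330360.1", "LdBPK_283000.1", "LdBPK_360210.1",
         "LdBPK_261550.1", "LdBPK_367320.1", "LdBPK_361420.1"),
  stringsAsFactors = FALSE
)

# Check the "accessions40 is a subset of table1" assumption
all(accessions40$V1 %in% table1$gene_ID)
# Accessions that are NOT found in table1's ID column
setdiff(accessions40$V1, table1$gene_ID)
```

On the full data the first check should be TRUE and setdiff() should return character(0); with only these heads, LdBPK_330360.1 is the one shared ID, so the check is FALSE.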


asked Aug 09 '16 12:08

BCArg

2 Answers

We can use %in% to get a logical vector (one TRUE/FALSE per row) and subset the rows of table1 based on that.

subset(table1, gene_ID %in% accessions40$V1)
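To make the "logical vector" step concrete, here is a small sketch on hypothetical toy data (the IDs g1, g2, g3 are made up, not from the question):

```r
# Hypothetical toy data: three genes, two of which are wanted
table1 <- data.frame(gene_ID = c("g1", "g2", "g3"),
                     p_value = c(0.30, 0.04, 0.25),
                     stringsAsFactors = FALSE)
accessions40 <- data.frame(V1 = c("g3", "g1"), stringsAsFactors = FALSE)

# %in% gives one TRUE/FALSE per row of table1
keep <- table1$gene_ID %in% accessions40$V1
keep  # TRUE FALSE TRUE

# subset() keeps the TRUE rows, in table1's original order
table2 <- subset(table1, gene_ID %in% accessions40$V1)
table2$gene_ID  # "g1" "g3"
```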

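A faster option, especially for large tables, would be data.table: setDT() converts table1 to a data.table by reference, and %chin% is a faster version of %in% for character vectors.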

library(data.table)
setDT(table1)[gene_ID %chin% accessions40$V1]

Or use filter from dplyr

library(dplyr)
table1 %>%
  filter(gene_ID %in% accessions40$V1)
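As a further base-R aside (a swapped-in technique, not one of the approaches above), the same result can be framed as an inner join with merge(). This sketch uses hypothetical toy IDs:

```r
# Hypothetical toy data
table1 <- data.frame(gene_ID = c("g1", "g2", "g3"),
                     p_value = c(0.30, 0.04, 0.25),
                     stringsAsFactors = FALSE)
accessions40 <- data.frame(V1 = c("g3", "g1"), stringsAsFactors = FALSE)

# Inner join on the ID columns: only genes listed in accessions40 survive
table2 <- merge(table1, accessions40, by.x = "gene_ID", by.y = "V1")
table2$gene_ID  # "g1" "g3" (merge() sorts by the key by default)
```

Unlike %in%, a join repeats a table1 row once per matching entry, so duplicate IDs in accessions40 would duplicate rows in table2.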

answered Oct 25 '22 21:10

akrun

There are many ways to do this. To keep the rows whose gene_ID in table1 is present in the V1 column of accessions40, use base R indexing:

table1[table1$gene_ID %in% accessions40$V1, ]

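Alternatively, you can use match(). It returns the rows in the order of accessions40$V1 rather than the order of table1, and if table1 contained duplicate IDs, only the first occurrence of each would be returned.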

table1[match(accessions40$V1, table1$gene_ID), ]
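A small sketch of how the two answers differ in row order, and what match() does with an ID that is not in table1. The toy IDs are hypothetical:

```r
# Hypothetical toy data
table1 <- data.frame(gene_ID = c("g1", "g2", "g3"),
                     p_value = c(0.30, 0.04, 0.25),
                     stringsAsFactors = FALSE)
accessions40 <- data.frame(V1 = c("g3", "g1"), stringsAsFactors = FALSE)

# %in%: rows come back in table1's order
by_in <- table1[table1$gene_ID %in% accessions40$V1, ]
by_in$gene_ID     # "g1" "g3"

# match(): position of each accession in table1, so rows follow accessions40
idx <- match(accessions40$V1, table1$gene_ID)
idx               # 3 1
by_match <- table1[idx, ]
by_match$gene_ID  # "g3" "g1"

# An ID absent from table1 gives NA, which would produce an all-NA row;
# dropping the NAs first avoids that
idx_missing <- match(c("g3", "g9"), table1$gene_ID)
idx_missing       # 3 NA
safe <- table1[idx_missing[!is.na(idx_missing)], ]
```

Since every ID in accessions40 is in table1 in the question's data, the NA case should not arise there.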

answered Oct 25 '22 21:10

Ronak Shah
