
Reading a series matrix properly in R

Tags: r, text-files

I downloaded GSE60341_series_matrix.txt.gz (found here) and read it into R as a table with:

    x <- read.table("GSE60341_series_matrix.txt", fill = TRUE)

All the information ends up in rows. In other words, I get a table with 42977 rows and 3 columns, whereas the number of samples should be 1951. Ideally, I should get a table with 1951 rows (one per sample) and some k columns describing each sample.

Opening the text file shows:

    sapiens"    "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"  "Homo sapiens"
    !Sample_title   "20120811_NC18_NC18_01" "20120811_NC18_NC18_02" "20120811_NC18_NC18_03" "20120811_NC18_NC18_04" "20120811_NC18_NC18_05"
    !Sample_characteristics_ch1 "stimulation: Unstim"   "stimulation: Activated"    "stimulation: IFNb" "stimulation: Unstim"   "stimulation: Activated"    "stimulation: IFNb" "stimulation: Unstim"   "stimulation: Activated"    "stimulation: IFNb" "stimulation: Unstim"   "stimulation: Activated"    "stimulation: IFNb" "stimulation: Unstim"   "stimulation: Activated"    "stimulation: IFNb" "stimulation: Unstim"   "stimulation: Activated"    "stimulation: IFNb" "stimulation: Unstim"   "stimulation: Activated"

"lane: 9"   "lane: 11"  "lane: 12"  "lane: 1"   "lane: 2"   "lane: 3"   "lane: 4"   "lane: 5"   "lane: 6"   "lane: 7"   "lane: 8"   "lane: 9"   "lane: 10"  "lane: 11"  "lane: 12"  "lane: 1"   "lane: 2"   "lane: 3"

The information in each category (lane, stimulation, Sample_title) is laid out as a row, but I want the categories to be columns. Can I get a table where rows represent samples and columns represent, say, [Sample_title, stimulation]?

IssamLaradji asked Jan 27 '15 20:01



1 Answer

read.table is meant for reading generic ASCII tables, but this file is in a special format used by the NCBI Gene Expression Omnibus (GEO).
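To illustrate the layout, here is a rough base-R sketch on a tiny made-up excerpt (the sample names and values are invented, not taken from GSE60341). It assumes the usual series-matrix convention that per-sample metadata lines start with `!Sample_` and are tab-delimited, with one quoted value per sample, and it simply transposes those lines so that each row is a sample:

```r
# Toy excerpt in the series-matrix layout (all values invented for illustration).
lines <- c(
  "!Series_title\t\"Example series\"",
  "!Sample_title\t\"S_01\"\t\"S_02\"\t\"S_03\"",
  "!Sample_characteristics_ch1\t\"stimulation: Unstim\"\t\"stimulation: Activated\"\t\"stimulation: IFNb\"",
  "!Sample_characteristics_ch1\t\"lane: 1\"\t\"lane: 2\"\t\"lane: 3\"",
  "!series_matrix_table_begin",
  "\"ID_REF\"\t\"S_01\"\t\"S_02\"\t\"S_03\"",
  "\"probe_1\"\t1.5\t2.5\t3.5",
  "!series_matrix_table_end"
)

# Keep only the per-sample metadata lines, which start with "!Sample_".
sample_lines <- grep("^!Sample_", lines, value = TRUE)

# Split each line on tabs and drop the double quotes around the values.
fields <- strsplit(sample_lines, "\t", fixed = TRUE)
fields <- lapply(fields, function(f) gsub('"', "", f, fixed = TRUE))

# The first field is the attribute name; the rest are one value per sample.
attr_names <- sub("^!Sample_", "", vapply(fields, `[`, character(1), 1))
values <- lapply(fields, function(f) f[-1])

# Bind the attributes as columns so that each row is one sample.
samples <- as.data.frame(do.call(cbind, values), stringsAsFactors = FALSE)
names(samples) <- make.unique(attr_names)  # repeated attribute names get a suffix

print(samples)
```

This is only meant to show why a generic reader puts the categories in rows; for a real file the package-based route is less error-prone.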

Here is what you need to do:

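  1. Install the GEOquery package for reading GEO files by pasting this code into R:

    # Current Bioconductor releases install through BiocManager; older R versions
    # used source("http://bioconductor.org/biocLite.R") and biocLite("GEOquery").
    install.packages("BiocManager")
    BiocManager::install("GEOquery")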
    
  2. Load the package into memory with this line:

    library("GEOquery")
    
  3. Edit the following line, placing the complete path from your working directory to the file within the quotation marks, to read the data into memory as an object gse:

    gse <- getGEO(filename = "~/Downloads/GSE60341_series_matrix.txt.gz")
    
  4. Now, if you run View(gse) you will see a nicely formatted table with 1950 rows in gse.

    Check out the GEOquery Documentation for further info.
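To get from there to the [Sample_title, stimulation] table asked about, something along these lines might work. This is a sketch on a hand-built stand-in data frame, not on the real object: it assumes the sample table (which, if `gse` is an ExpressionSet, would presumably come from `Biobase::pData(gse)`) has columns named `title`, `characteristics_ch1` and `characteristics_ch1.1`, so check the actual column names in your data first. The lane values and the `strip_key` helper are made up for illustration.

```r
# Hand-built stand-in for the sample table (lane values are hypothetical).
# With a real file this is assumed to come from something like pData(gse).
meta <- data.frame(
  title = c("20120811_NC18_NC18_01", "20120811_NC18_NC18_02", "20120811_NC18_NC18_03"),
  characteristics_ch1 = c("stimulation: Unstim", "stimulation: Activated", "stimulation: IFNb"),
  characteristics_ch1.1 = c("lane: 1", "lane: 2", "lane: 3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip the leading "key: " prefix from a characteristics column.
strip_key <- function(x) sub("^[^:]+:\\s*", "", x)

# One row per sample, one column per attribute of interest.
samples <- data.frame(
  Sample_title = meta$title,
  stimulation  = strip_key(meta$characteristics_ch1),
  lane         = as.integer(strip_key(meta$characteristics_ch1.1)),
  stringsAsFactors = FALSE
)

print(samples)
```

Stripping the prefix keeps the columns easy to filter on, e.g. `samples[samples$stimulation == "Unstim", ]`.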

Michael Liquori answered Sep 29 '22 19:09