I downloaded GSE60341_series_matrix.txt.gz (found here) and read it into R as a table with
x <- read.table("GSE60341_series_matrix.txt", fill = TRUE)
All the information ends up in rows: I get a table with 42977 rows and 3 columns, whereas the number of samples should be 1951. Ideally, I would get a table with 1951 rows, one per sample, and some k columns of attributes.
Opening the text file shows:
sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens" "Homo sapiens"
!Sample_title "20120811_NC18_NC18_01" "20120811_NC18_NC18_02" "20120811_NC18_NC18_03" "20120811_NC18_NC18_04" "20120811_NC18_NC18_05"
!Sample_characteristics_ch1 "stimulation: Unstim" "stimulation: Activated" "stimulation: IFNb" "stimulation: Unstim" "stimulation: Activated" "stimulation: IFNb" "stimulation: Unstim" "stimulation: Activated" "stimulation: IFNb" "stimulation: Unstim" "stimulation: Activated" "stimulation: IFNb" "stimulation: Unstim" "stimulation: Activated" "stimulation: IFNb" "stimulation: Unstim" "stimulation: Activated" "stimulation: IFNb" "stimulation: Unstim" "stimulation: Activated"
"lane: 9" "lane: 11" "lane: 12" "lane: 1" "lane: 2" "lane: 3" "lane: 4" "lane: 5" "lane: 6" "lane: 7" "lane: 8" "lane: 9" "lane: 10" "lane: 11" "lane: 12" "lane: 1" "lane: 2" "lane: 3"
The information in the categories (lane, stimulation, Sample_title) is concatenated as rows, but I want it in columns. Can I have a table where rows represent samples and columns represent, say, [Sample_title, stimulation]?
Series_matrix files are summary text files that include a tab-delimited value-matrix table generated from the 'VALUE' column of each Sample record, headed by Sample and Series metadata. These files include SOFT attribute labels. Data generated from multiple Platforms are contained in separate files.
read.table is meant for reading generic ASCII tables, whereas this file is in a special format used by the NCBI Gene Expression Omnibus (GEO).
Here is what you need to do:
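Install the GEOquery package for reading GEO files by pasting this code into R:
# biocLite() has been retired; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")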
Load the package into memory with this line:
library("GEOquery")
Edit the following line, placing the path to the file (absolute, or relative to your working directory) within the quotation marks, to read the data into memory as an object gse:
gse <- getGEO(filename = "~/Downloads/GSE60341_series_matrix.txt.gz")
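Now, if you run View(gse) you will see a nicely formatted table with 1950 rows in gse. If gse comes back as an ExpressionSet, which is what GEOquery typically returns for a series matrix file, View(pData(gse)) shows the sample annotations with one row per sample.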
Check out the GEOquery Documentation for further info.
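If you only need the sample annotations and would rather not install anything, the header lines can also be transposed by hand. The following is a minimal base-R sketch, not what GEOquery does internally. It assumes that every line starting with !Sample_ is tab-delimited and carries one quoted value per sample. The helper name sample_table and the example lines are made up for illustration.

```r
# Tiny stand-in for a series matrix file; the values are invented for illustration.
example_lines <- c(
  "!Series_title\t\"Example series\"",
  "!Sample_title\t\"S1\"\t\"S2\"\t\"S3\"",
  "!Sample_characteristics_ch1\t\"stimulation: Unstim\"\t\"stimulation: Activated\"\t\"stimulation: IFNb\"",
  "!Sample_characteristics_ch1\t\"lane: 9\"\t\"lane: 11\"\t\"lane: 12\"",
  "!series_matrix_table_begin",
  "\"ID_REF\"\t\"S1\"\t\"S2\"\t\"S3\"",
  "\"geneA\"\t1\t2\t3",
  "!series_matrix_table_end"
)

# Hypothetical helper: turn the "!Sample_" header lines into a table
# with one row per sample and one column per header line.
sample_table <- function(lines) {
  meta <- grep("^!Sample_", lines, value = TRUE)
  # Split each line on tabs and strip the surrounding double quotes.
  fields <- lapply(strsplit(meta, "\t", fixed = TRUE),
                   function(f) gsub('^"|"$', "", f))
  # The first field is the attribute name, the rest are per-sample values.
  keys <- sub("^!Sample_", "", vapply(fields, `[`, character(1), 1))
  cols <- lapply(fields, `[`, -1)
  # Repeated keys (e.g. several characteristics_ch1 lines) get numeric suffixes.
  names(cols) <- make.unique(keys)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

samples <- sample_table(example_lines)
# "key: value" entries can be split further, e.g. to isolate the stimulation.
samples$stimulation <- sub("^stimulation: ", "", samples$characteristics_ch1)
samples[, c("title", "stimulation")]
```

For the real file, something like sample_table(readLines("GSE60341_series_matrix.txt")) should give one row per sample, provided the assumptions above hold for that file. GEOquery remains the more robust route, since it also parses the expression values and handles edge cases this sketch ignores.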