I want to download gene expression data generated by microarray experiments. I do not know much about this subject, but as I understand it, rows usually correspond to genes and columns correspond to samples. Ideally, I expect a matrix of gene expression values.
I've been searching on the internet, and although there seem to be many places to download such data, when I actually download it, I do not get a gene expression matrix. Could someone please let me know where, or how, to download gene expression data in the format I describe above?
Any help is appreciated.
If you look at e.g. this entry in the Gene Expression Omnibus, one of the file formats is "TXT" and contains a matrix like you are asking for, after some metadata.
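To illustrate what "a matrix after some metadata" means, here is a minimal Python sketch that splits such a text file into its metadata and its matrix. It assumes the tab-separated series matrix layout, in which metadata lines start with "!" and the table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. The sample text, the GSM accessions and the values are made up for illustration, and `parse_series_matrix` is a hypothetical helper name, not part of any library.

```python
import csv
import io

# Toy stand-in for a downloaded file; accessions and values are invented.
SAMPLE = "\n".join([
    '!Series_title\t"Toy example"',
    '!Sample_title\t"control"\t"treated"',
    '!series_matrix_table_begin',
    '"ID_REF"\t"GSM0001"\t"GSM0002"',
    '"1007_s_at"\t7.5\t8.25',
    '"9890_at"\t3.0\t',
    '!series_matrix_table_end',
])


def parse_series_matrix(text):
    """Return (metadata, sample_ids, matrix) from series-matrix-style text.

    metadata maps each "!" key to its list of values, sample_ids is the
    list of column names, and matrix maps each probe id to a list of
    floats (None where a value is missing).
    """
    metadata = {}
    sample_ids = None
    matrix = {}
    in_table = False
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0] == "":
            continue  # skip blank lines
        first = row[0]
        if first == "!series_matrix_table_begin":
            in_table = True
        elif first == "!series_matrix_table_end":
            in_table = False
        elif in_table and sample_ids is None:
            sample_ids = row[1:]  # header row: ID_REF, then one column per sample
        elif in_table:
            matrix[first] = [float(v) if v else None for v in row[1:]]
        elif first.startswith("!"):
            metadata.setdefault(first[1:], []).extend(row[1:])
    return metadata, sample_ids or [], matrix


metadata, sample_ids, matrix = parse_series_matrix(SAMPLE)
print(sample_ids)
print(matrix["1007_s_at"])
```

Note that the row names here are probe ids rather than gene names, so the result is a probes-by-samples matrix.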
In principle microarray data can be expressed (please pardon the pun) as a matrix with samples as columns and rows as genes. In practice it is a good bit more complicated to derive such a representation for the raw data of an experiment. If you just get a pre-processed dataset you have little guarantee that the raw data was processed in a way that makes it comparable to other experiments or that the underlying raw data was of sufficiently high quality.
You are also going to need high-quality metadata to derive any meaning from the data matrix. What were the biological conditions and sources from which the samples were derived? What genes do the probes on the particular array correspond to? (Note that 9890_at is a "probeset id", a unique identifier of a molecular probe of a particular sequence design, which then needs to be mapped to a gene; different probes for the same gene won't give exactly the same response.)
The public microarray databases therefore provide a lot of additional information alongside the processed data matrix. Besides GEO, which has already been mentioned, I would recommend ArrayExpress, which in my opinion has the better search interface.
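As a rough illustration of the probe-to-gene step, the following Python sketch collapses a probes-by-samples matrix into a genes-by-samples matrix by averaging the probes that map to the same gene. Averaging is only one possible convention and is an assumption here, not something the repositories prescribe. The probe ids, gene labels and values are invented, and `collapse_to_genes` is a hypothetical helper; a real mapping would come from the annotation of the specific array.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_genes(probe_matrix, probe_to_gene):
    """Average the rows of all probes that map to the same gene.

    probe_matrix maps probe id -> list of values (one per sample).
    probe_to_gene maps probe id -> gene label; unmapped probes are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_matrix.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # no annotation for this probe
        grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Invented example: two probes for GENE_A, one for GENE_B, one unmapped.
probe_matrix = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 8.0],
    "probe_3": [1.0, 1.0],
    "probe_4": [9.0, 9.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

gene_matrix = collapse_to_genes(probe_matrix, probe_to_gene)
print(gene_matrix)
```

Because different probes for the same gene respond differently, any such summary loses information, which is one reason to keep the probe-level data and the annotation around.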
For many, the tool of choice for working with microarray data is the Bioconductor suite of software for the statistical programming language R.
Bioconductor provides APIs to download raw data with accompanying metadata from both repositories; see the GEO bioc package and the ArrayExpress bioc package.
Both packages, in common with most Bioconductor software, come with excellent "vignettes" that introduce the software: the GEO bioc vignette and the ArrayExpress bioc vignette.
Those vignettes should also give you examples of taking the raw data and deriving "Esets" (expression sets) from it. At that point you can access the gene expression matrix in the Bioconductor Eset object, and you have an object and APIs to interrogate the necessary metadata.
Note that there are different types of microarray. I would recommend starting with data from Affymetrix arrays, as they probably have the most straightforward analysis APIs.