Skip metadata when importing a dataset in R

My question is how to skip the metadata at the beginning of a file when importing data into R. My data is in .txt format; the first lines are metadata describing the data, and these need to be filtered out. Below is a minimal example of the file, which is tab-delimited:

Type=GenePix Export                         
DateTime=2010/03/04 16:04:16                        
PixelSize=10                        
Wavelengths=635                     
ImageFiles=Not Saved                        
NormalizationMethod=None                        
NormalizationFactors=1                      
JpegImage=                      
StdDev=Type 1                       
FeatureType=Circular                        
Barcode=                        
BackgroundSubtraction=LocalFeature                      
ImageOrigin=150, 10                     
JpegOrigin=150, 2760                        
Creator=GenePix Pro 7.2.29.002                      
var1    var2    var3    var4    var5    var6    var7
1   1   1   molecule1   1F3 400 4020
1   2   1   molecule2   1B5 221 4020
1   3   1   molecule3   1H5 122 2110
1   4   1   molecule4   1D1 402 2110
1   5   1   molecule5   1F1 600 4020

If I know the line on which the actual data starts, I can use the basic command shown below:

mydata <- read.table("mydata.txt", header = TRUE, skip = 15)

This returns:

mydata
  var1 var2 var3      var4 var5 var6 var7
1    1    1    1 molecule1  1F3  400 4020
2    1    2    1 molecule2  1B5  221 4020
3    1    3    1 molecule3  1H5  122 2110
4    1    4    1 molecule4  1D1  402 2110
5    1    5    1 molecule5  1F1  600 4020

The problem is that I need to write a script that can read various datasets, and the line on which the actual data starts varies from one dataset to another. I could imagine using something like the sqldf package, but I am not very familiar with SQL.

Any assistance would be greatly appreciated.
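
One possible approach, sketched here in base R under some assumptions: read the file once with readLines(), locate the header row with grep(), and pass the number of lines above it to read.table() as skip. The function name read_after_metadata and its header_pattern argument are hypothetical, introduced only for illustration. The sketch assumes the header row can be recognised by a pattern (here, a line starting with "var1") and that the data rows are tab-delimited as described above. The demo file is a shortened stand-in for the real export.

```r
# Hypothetical helper: find the header row by pattern, then skip every line above it.
read_after_metadata <- function(path, header_pattern = "^var1") {
  lines <- readLines(path, warn = FALSE)
  # Index of the first line that matches the header pattern (NA if none).
  header_row <- grep(header_pattern, lines)[1]
  if (is.na(header_row)) {
    stop("No line matching '", header_pattern, "' found in ", path)
  }
  # Skip the metadata lines; the header row becomes the column names.
  read.table(path, header = TRUE, sep = "\t", skip = header_row - 1,
             stringsAsFactors = FALSE)
}

# Small demo file with three metadata lines instead of fifteen.
demo_file <- tempfile(fileext = ".txt")
writeLines(c(
  "Type=GenePix Export",
  "PixelSize=10",
  "Creator=GenePix Pro 7.2.29.002",
  "var1\tvar2\tvar3\tvar4\tvar5\tvar6\tvar7",
  "1\t1\t1\tmolecule1\t1F3\t400\t4020",
  "1\t2\t1\tmolecule2\t1B5\t221\t4020"
), demo_file)

mydata <- read_after_metadata(demo_file)
print(mydata)
```

If the column names differ between files, a different pattern would be needed. For instance, since every metadata line in the example contains "=", the header could instead be taken as the first line without one, assuming that also holds for the other files.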