Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

reading strand (+, -) column with fread, data.table package

Tags:

r

data.table

I'm trying to use fread to read a genome alignment into a data.table in R. Here's a snapshot of the alignment file:

USI-EAS28:1:100:1786:674#0/1    +   1_maternal  68326824      CTCAATTATACTGAAAGAAACACAATATATCATA    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
USI-EAS28:1:100:1786:940#0/1    +   16_maternal 11407541    CTATTAGTGACCTGCTGTGGGACCTTGGGATGGT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
USI-EAS28:1:100:1786:705#0/1    +   1_maternal  63849584    CTGAGGGTTTGTGTCAGGAAGGGGTGTGGAATTG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:T>C
USI-EAS28:1:100:1786:1168#0/1   -   5_maternal  31381649    GCATCATTCATGAAACAATTTTCAAGAGAGGAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1787:582#0/1   +   10_maternal 54587781    CTACAATAATAATAGGGGACTAAAACACCCCACT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1787:62#0/1    +   10_maternal 70390747     CTATTTGCTACTGAATTGTTAATTTTAAAACAGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1788:573#0/1   -   7_maternal  92583837     CACTGTCAACATTAGACAGACCAATGAGACAAAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1788:854#0/1   +   7_maternal  129611206    GTTTGTTTTTTTTTTTGAGATGGAGTCTCATTTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   32:C>T
 USI-EAS28:1:100:1788:185#0/1   -   13_maternal 23694307    CAAACAAACTCAAAATGGACTATCGACTGAAAAA  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   
 USI-EAS28:1:100:1788:1339#0/1  -   13_maternal 33699510    TTAACTCTAGTTTTTAGGGATTGCAAATTAGACG  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII  0   0:A>G

The second column reports the strand the read aligns to (+ is forward, - is reverse). Unfortunately fread is trying to read this column into an integer, assigning the value always to 0. This column should be read as a character, or even a boolean, for that matter. Trying to play with arguments sep and sep2 doesn't help.

like image 614
Alvaro Gonzalez Avatar asked Mar 13 '13 14:03

Alvaro Gonzalez


1 Answers

Thanks for reporting. Now fixed in v1.8.9 commit 849. + and - are now read as character, test added.

Btw, we are also intending to add colClasses so that you can override the column type that fread detects. The outstanding to do list relating to fread is at the top of the source file here :
https://r-forge.r-project.org/scm/viewvc.php/pkg/src/fread.c?view=markup&root=datatable

like image 146
Matt Dowle Avatar answered Nov 13 '22 12:11

Matt Dowle