I'm struggling to read my tables in Variant Call Format (VCF) with R. Each file has some comment lines starting with ##, followed by a header line starting with #.
## contig=<ID=OTU1431,length=253>
## contig=<ID=OTU915,length=253>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT /home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam
Eubacterium_ruminantium_AB008552 56 . C T 228 . DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59 GT:PL 1/1:255,255,0
How can I read such a table without losing the header? Using read.table() with comment.char = "##" returns an error: "invalid 'comment.char' argument".
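The error comes from comment.char accepting only a single character (or "" to switch comment handling off). With the default comment.char = "#", read.table() skips every ## line but also treats the #CHROM header as a comment, which is why the header goes missing. One base-R workaround, sketched below under the assumption that the file is tab-separated and that all ## lines come before the header (as in the VCF layout shown above), is to drop the ## lines yourself and read the rest with comment handling disabled. The temporary file only reproduces the excerpt from the question.

```r
# Example VCF written to a temporary file (tab-separated, as in a real VCF);
# the content mirrors the excerpt in the question
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##contig=<ID=OTU1431,length=253>",
  "##contig=<ID=OTU915,length=253>",
  paste("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
        "/home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam", sep = "\t"),
  paste("Eubacterium_ruminantium_AB008552", "56", ".", "C", "T", "228", ".",
        "DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59",
        "GT:PL", "1/1:255,255,0", sep = "\t")
), vcf_path)

lines <- readLines(vcf_path)
# Drop the "##" meta-information lines but keep the "#CHROM" header line
body <- lines[!startsWith(lines, "##")]
# Strip the single leading "#" so the first column is named CHROM
body[1] <- sub("^#", "", body[1])

# comment.char = "" disables comment handling; quote = "" because fields are unquoted;
# every column is read as character to avoid type guessing on allele columns
vcf <- read.table(text = body, header = TRUE, sep = "\t",
                  comment.char = "", quote = "", check.names = FALSE,
                  colClasses = "character")
vcf$POS <- as.integer(vcf$POS)
# QUAL may be "." (missing) in other files; that becomes NA here
vcf$QUAL <- suppressWarnings(as.numeric(vcf$QUAL))
str(vcf)
```

Reading everything as character first is a deliberate choice: with automatic type guessing, a REF or ALT column that happens to contain only T or F would be converted to logical. The INFO and sample columns stay as single strings; splitting them into fields is a separate step.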
Remember that read.csv() and read.csv2() are wrappers around read.table() that differ mainly in their defaults: both set header = TRUE, fill = TRUE and comment.char = "", and they use sep = "," (read.csv) or sep = ";" with dec = "," (read.csv2).
The read.table() function reads data from a text file in table format and returns it as a data frame.
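For VCF files, the same idea makes read.delim() (the tab-separated sibling of read.csv(), not mentioned above) a convenient option: it already uses header = TRUE, sep = "\t" and comment.char = "", so once the ## lines are skipped the #CHROM line is read as the header. A minimal sketch on a hypothetical four-column excerpt, assuming all ## lines sit at the top of the file:

```r
# Hypothetical VCF excerpt: one meta line, the header and one record
vcf_text <- c(
  "##contig=<ID=OTU1431,length=253>",
  "#CHROM\tPOS\tREF\tALT",
  "OTU1431\t56\tC\tT"
)
# Number of "##" meta lines at the top
n_meta <- sum(startsWith(vcf_text, "##"))

# read.delim() defaults: header = TRUE, sep = "\t", comment.char = "";
# colClasses keeps the single-base ALT "T" from being guessed as logical
vcf <- read.delim(text = vcf_text, skip = n_meta, check.names = FALSE,
                  colClasses = c("character", "integer", "character", "character"))
names(vcf)
# "#CHROM" "POS" "REF" "ALT"
```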
If you want to read VCF files, you can also just try readVcf() from the VariantAnnotation package in Bioconductor:
https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html
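A minimal sketch of the Bioconductor route, assuming VariantAnnotation is installed (for example via BiocManager::install("VariantAnnotation")). readVcf() parses the ## meta lines itself, so the header problem does not arise, and INFO and FORMAT fields are interpreted according to their ##INFO and ##FORMAT declarations. The example file is hypothetical: it adds ##fileformat, ##INFO and ##FORMAT lines that the excerpt in the question does not show, and the genome label "unknown" is an arbitrary placeholder.

```r
# Hypothetical minimal VCF with the header lines VariantAnnotation uses
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=OTU1431,length=253>",
  '##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">',
  '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
  paste("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
        "sample1", sep = "\t"),
  paste("OTU1431", "56", ".", "C", "T", "228", ".", "DP=212", "GT", "1/1",
        sep = "\t")
), vcf_path)

if (requireNamespace("VariantAnnotation", quietly = TRUE)) {
  suppressPackageStartupMessages(library(VariantAnnotation))
  vcf <- readVcf(vcf_path, genome = "unknown")  # genome label is a placeholder
  print(rowRanges(vcf))  # CHROM/POS as genomic ranges, plus REF, ALT, QUAL, FILTER
  print(info(vcf)$DP)    # INFO field declared in the ##INFO line
  print(geno(vcf)$GT)    # genotypes: one row per variant, one column per sample
} else {
  message("VariantAnnotation is not installed")
}
```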
Otherwise, I can highly recommend the fread() function in the data.table package. Its skip argument can be a string, in which case fread() starts importing at the first line that contains that substring. For example,
fread("test.vcf", skip = "CHROM")
should work.
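A slightly fuller sketch of the fread() approach on a temporary copy of the excerpt from the question. It assumes the file is tab-separated; sep = "\t", header = TRUE and the colClasses override are additions to the one-liner above, and "#CHROM" is used as a more specific search string than "CHROM". The code does not rely on whether fread() keeps the leading # in the first column name: the setnames() call removes it if present.

```r
library(data.table)

# Temporary copy of the excerpt from the question (tab-separated)
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##contig=<ID=OTU1431,length=253>",
  "##contig=<ID=OTU915,length=253>",
  paste("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
        "/home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam", sep = "\t"),
  paste("Eubacterium_ruminantium_AB008552", "56", ".", "C", "T", "228", ".",
        "DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59",
        "GT:PL", "1/1:255,255,0", sep = "\t")
), vcf_path)

# skip = "#CHROM": start reading at the first line containing this substring
vcf <- fread(vcf_path, skip = "#CHROM", sep = "\t", header = TRUE,
             # keep single-base alleles such as "T" as text
             colClasses = list(character = c("REF", "ALT")))
# Remove a leading "#" from the first column name, if present
setnames(vcf, 1, sub("^#", "", names(vcf)[1]))
vcf[, .(CHROM, POS, REF, ALT, QUAL)]
```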