I am looking for suggestions on how to read text files every n-th line (that is, in groups of n lines) in Raku/Perl 6.
In bioinformatics research, we sometimes need to parse text files in a less than straightforward manner. Fastq files, for example, store data in groups of 4 lines at a time. What is more, these Fastq files come in pairs. So to parse them, we may need to read 4 lines from the first Fastq file, then 4 lines from the second Fastq file, then the next 4 lines from the first Fastq file, then the next 4 lines from the second Fastq file, and so on.
What is the best way to approach this problem? Raku's "IO.lines" approach seems to handle one line at a time, but I am not sure how to handle every n-th line.
An example Fastq file pair: https://github.com/wtwt5237/perl6-for-bioinformatics/tree/master/Come%20on%2C%20sister/fastq
What we tried before with "IO.lines": https://github.com/wtwt5237/perl6-for-bioinformatics/blob/master/Come%20on%2C%20sister/script/benchmark2.p6
Reading 4 lines at a time from 2 files and processing them into a single thing can easily be done with zip and batch:
my @filenames = <file1 file2>;
for zip @filenames.map: *.IO.lines.batch(4) {
    # expect ((a,b,c,d),(e,f,g,h))
}
This will keep producing until at least one of the files is exhausted; the last group from a file may hold fewer than 4 lines. An alternative to batch is rotor: with it, the loop only keeps going while both files still supply a complete group of 4 lines. Other ways of finishing the loop are specifying the :partial flag with rotor, and using roundrobin instead of zip. YMMV.
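To make the difference between these stopping rules concrete, here is a minimal sketch that uses two in-memory lists instead of files. The names @a and @b and their lengths (6 and 10 "lines") are made up for illustration only; with real files you would call .IO.lines on each filename as in the snippet above.

```raku
# Hypothetical stand-ins for two files of unequal length.
my @a = (1..6).map:  { "a$_" };   # 6 lines: one full group plus 2 leftover lines
my @b = (1..10).map: { "b$_" };   # 10 lines: two full groups plus 2 leftover lines

# batch keeps a short final group; zip stops when the shorter list runs out.
my @batched = zip @a.batch(4), @b.batch(4);

# rotor drops incomplete groups, so only complete 4-line groups are paired.
my @strict = zip @a.rotor(4), @b.rotor(4);

# roundrobin keeps going until every list is exhausted.
my @all = roundrobin @a.batch(4), @b.batch(4);

say @batched.elems;   # 2
say @strict.elems;    # 1
say @all.elems;       # 3
```

Which variant fits depends on whether a truncated final record should be processed, skipped, or treated as an error.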
You can use the lines method. Raku sequences (type Seq) are lazy. This means that iterating over an expression like "somefile".IO.lines will only ever hold one line in memory at a time, never the whole file. To read the whole file into memory, you would need to assign the sequence to an Array.
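As a small illustration of the lazy versus eager behaviour, the following sketch writes a scratch file and then reads from it in both ways. The file name lazy-lines-demo.txt and its contents are hypothetical and exist only for this example.

```raku
# Hypothetical scratch file, created only for this illustration.
my $path = $*TMPDIR.add('lazy-lines-demo.txt');
$path.spurt: (1..1000).map({ "line $_" }).join("\n");

# Lazy: head(4) asks the sequence for just the first four lines.
my @first-record = $path.lines.head(4);

# Eager: assigning the whole sequence to an Array reads every line.
my @everything = $path.lines;

say @first-record.elems;   # 4
say @everything.elems;     # 1000

$path.unlink;              # remove the scratch file again
```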
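The pairs method gives you the index of each line along with its content. In combination with the divisibility operator %% we can write
"somefile".IO.lines.pairs.grep({ .key && .key %% 4 }).map({ .value })
to get a sequence of every 4th line in a file. Note that the keys are zero-based and the .key && test excludes index 0, so this selects the lines at indices 4, 8, 12, and so on (the 5th, 9th, 13th, ... lines of the file).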
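Because the indices are zero-based, it can help to try the expression on a small in-memory list first. The sketch below uses a made-up list of 12 strings in place of a file, and also shows two variations of the grep condition that are not part of the answer above: one that starts at the first line and one that counts lines from 1.

```raku
# Hypothetical stand-in for the lines of a file.
my @lines = (1..12).map: { "line$_" };

# The expression from above: indices 4 and 8 (index 0 is excluded).
my @picked = @lines.pairs.grep({ .key && .key %% 4 }).map({ .value });

# Variation: include index 0, i.e. the first line of every group of 4.
my @firsts = @lines.pairs.grep({ .key %% 4 }).map({ .value });

# Variation: count from 1, i.e. the last line of every group of 4.
my @lasts = @lines.pairs.grep({ (.key + 1) %% 4 }).map({ .value });

say @picked;   # [line5 line9]
say @firsts;   # [line1 line5 line9]
say @lasts;    # [line4 line8 line12]
```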