
In Julia, how does one convert a list of ASCII decimals to a string?

tl;dr: I want to convert [125, 119, 48, 126, 40] to the output string }w0~(


To give a real-life example, I am working with sequence data in FASTQ format (here is a link to the library imported).

cat example.fastq outputs the following:

@some/random/identifier
ACTAG
+
}w0~(

The Julia code below demonstrates reading the FASTQ file:

import BioSequences.FASTQ


fastq_stream = FASTQ.Reader(open("example.fastq", "r"))
for record in fastq_stream
    # Still need to learn, why this offset of 33?
    println(
        Vector{Int8}(FASTQ.quality(record, :sanger)) .+ 33
    )
    println(
        String(FASTQ.sequence(record))
    )
    println(
        String(FASTQ.identifier(record))
    )
    break
end
close(fastq_stream)

This code prints the following:

[125, 119, 48, 126, 40]
ACTAG
some/random/identifier

I don't want to have to store this information in a list. I would prefer to convert it to a string. So the output I am looking for here is:

}w0~(
ACTAG
some/random/identifier
dnk8n asked Mar 05 '23 10:03


1 Answer

julia> String(UInt8.([125, 119, 48, 126, 40]))
"}w0~("

Explanation

In Julia, a String is constructed from a sequence of bytes. If you are using ASCII only, each character maps to exactly one byte, so you can work directly on the raw data (which is also the fastest way to do it).
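As a minimal sketch of that byte mapping, the snippet below converts the codes from the question to a String and back again. The helper `phred_to_string` is a hypothetical name introduced here for illustration; it assumes the input holds Sanger-style Phred scores, which the FASTQ format stores as ASCII characters offset by 33.

```julia
# ASCII codes from the question; each value fits in a single byte.
codes = [125, 119, 48, 126, 40]

# String() expects bytes, so convert the Int values to UInt8 first.
s = String(UInt8.(codes))

# Reverse direction: codeunits exposes the bytes of the string.
roundtrip = Int.(codeunits(s))

# Hypothetical helper: turn raw Phred scores into the FASTQ quality line
# by adding the Sanger offset of 33 before building the String.
phred_to_string(scores) = String(UInt8.(scores .+ 33))

println(s)
println(roundtrip == codes)
println(phred_to_string([92, 86, 15, 93, 7]))
```

Note that `UInt8.(codes)` throws an `InexactError` if any value falls outside 0 to 255, which is a useful guard when the input is expected to be ASCII.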

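Note that when a String is created from a Vector{UInt8}, the String takes ownership of the bytes and the original vector is emptied. Because Julia Strings are immutable, this prevents the data from being changed through the vector afterwards, and it also means that no data is copied in the String creation process. Have a look at the example below: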

julia> mybytes = UInt8.([125, 119, 48, 126, 40]);

julia> mystring = String(mybytes)
"}w0~("

julia> mybytes
0-element Array{UInt8,1}
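If the byte vector is still needed afterwards, one option is to pass a copy to `String`. The following is a small sketch of that trade-off, reusing the `mybytes` name from the example above; the names `kept` and `taken` are only illustrative.

```julia
mybytes = UInt8.([125, 119, 48, 126, 40])

# Passing a copy costs one extra allocation but leaves mybytes untouched.
kept = String(copy(mybytes))
println(kept)
println(length(mybytes))

# Passing the vector itself hands its data over to the String,
# and the vector is emptied as a result.
taken = String(mybytes)
println(taken)
println(isempty(mybytes))
```

The copying variant gives up the zero-copy benefit, so it is only worth using when the original bytes are needed again.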

Performance note

Strings in Julia are not interned. In analytics scenarios, always consider using Symbols instead of Strings. In some scenarios, using temperature=:hot instead of temperature="hot" can mean 3x shorter execution time.
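A rough illustration of the difference, with no timing claims attached: a Symbol with a given name exists only once, so comparing two Symbols is an identity check, whereas comparing two Strings with `==` has to look at their contents. The variable names below are made up for this sketch.

```julia
temperature_sym = :hot
temperature_str = "hot"

# Every occurrence of :hot refers to the same interned Symbol,
# including one built at run time from a String.
println(temperature_sym === :hot)
println(Symbol(temperature_str) === temperature_sym)

# Strings are compared by content.
println(temperature_str == "hot")

# Converting between the two is cheap enough for labels and categories.
println(String(temperature_sym) == temperature_str)
```

Symbols suit a small, fixed set of labels such as categories or keys; they are not a replacement for Strings when the text itself has to be processed.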

EDIT - performance test

julia> using Random, BenchmarkTools; Random.seed!(0);

julia> bb = rand(33:126, 1000);

julia> @btime join(Char.($bb));
  31.573 μs (13 allocations: 6.56 KiB)

julia> @btime String(UInt8.($bb));
  711.111 ns (2 allocations: 2.13 KiB)

In this benchmark, String(UInt8.(bb)) is over 40x faster than join(Char.(bb)) and uses about 1/3 of the memory.

Przemyslaw Szufel answered Apr 07 '23 10:04