I am using a new bioinformatics tool called Giggle and have installed its Python wrapper on my system. The scenario is quite specific, but I think the problem is general. This call:
index = Giggle.create("index", "HMEC_hg19_BroadHMM_ALL.bed")
should create an index from one or more .bed files (in this case one). The .bed files look like this:
chr1 10000 10600 15_Repetitive/CNV 0 . 10000 10600 245,245,245
chr1 10600 11137 13_Heterochrom/lo 0 . 10600 11137 245,245,245
chr1 11137 11737 8_Insulator 0 . 11137 11737 10,190,254
chr1 11737 11937 11_Weak_Txn 0 . 11737 11937 153,255,102
chr1 11937 12137 7_Weak_Enhancer 0 . 11937 12137 255,252,4
chr1 12137 14537 11_Weak_Txn 0 . 12137 14537 153,255,102
chr1 14537 20337 10_Txn_Elongation 0 . 14537 20337 0,176,80
It is basically a large tab-delimited file of genomic intervals, each with its chromosome. When I run the above command I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "giggle/giggle.pyx", line 25, in giggle.giggle.Giggle.create
TypeError: expected bytes, str found
I have no idea why this happens. I tried converting the files to other encodings, but nothing worked. The code the error refers to is:
def create(self, char *path, char *glob):
    giggle_bulk_insert(to_bytes(glob), to_bytes(path), 1)
    return Giggle(path)
I am using Python 3.6 on the Windows Subsystem for Linux on Windows 10.
The problem is that in Python 3 a str is a Unicode string, not a byte string as it was in Python 2, while create() is declared with char * parameters, which require bytes. If you install giggle and run your code under Python 2, everything works. In Python 3 you can encode the arguments:
index = Giggle.create("index".encode('utf-8'), "HMEC_hg19_BroadHMM_ALL.bed".encode('utf-8'))
or alternatively
index = Giggle.create(b"index", b"HMEC_hg19_BroadHMM_ALL.bed")
to have explicit byte strings. This worked for me, up to the point where giggle complained that the .bed file was incorrectly formatted (I probably broke the formatting when copying it).
Update: another issue comes up when calling it as described above:
File type not supported 'HMEC_hg19_BroadHMM_ALL.bed'
This is caused by the underlying giggle library only accepting .bed.gz files, as can be seen in python-giggle/lib/giggle/src/file_read.c:
if ( (strlen(i->file_name) > 7) &&
     strcmp(".bed.gz", file_name + strlen(i->file_name) - 7) == 0) {
    i->type = BED;
}
So I assume the README on the python-giggle site is incorrect in claiming that you can call it with .bed files. I tested it with one of the files provided in python-giggle\lib\giggle\test\data and it ran without an error.
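The suffix test quoted from file_read.c can be mirrored on the Python side to catch an unsupported file name before calling create(). This is only a minimal sketch: the helper name looks_like_bed_gz is made up for illustration and is not part of python-giggle, and it covers only the .bed.gz branch shown above, not any other file types the library may accept.

```python
def looks_like_bed_gz(file_name):
    """Mirror the C check: the name is longer than 7 characters
    and ends with the literal suffix '.bed.gz'."""
    suffix = ".bed.gz"
    return len(file_name) > len(suffix) and file_name.endswith(suffix)


# The plain .bed name from the question fails the check,
# the same name with a .bed.gz suffix passes it.
print(looks_like_bed_gz("HMEC_hg19_BroadHMM_ALL.bed"))
print(looks_like_bed_gz("HMEC_hg19_BroadHMM_ALL.bed.gz"))
```

The check only looks at the name, so it says nothing about whether the file contents are compressed in a way giggle can read.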
The create() method expects byte strings:
create(self, char *path, char *glob):
Cython can only convert bytes objects (in Python 3; str in Python 2) to a char array automatically.
Either pass in bytes objects when you call the method (encoding your str objects first), or alter the method signature to accept str (Unicode) strings. See "Accepting strings from Python code" in the Cython tutorial.
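If you would rather not change the Cython source, the first option can be wrapped in a small helper on the calling side. The sketch below is an assumption, not part of python-giggle: as_bytes and create_index are hypothetical names, and giggle_cls stands for the Giggle class from the question. os.fsencode is the standard-library function that turns a str (or path-like) file name into bytes and leaves bytes unchanged.

```python
import os


def as_bytes(value):
    """Return value as bytes; str and path-like values are encoded
    with the filesystem encoding, bytes are passed through."""
    return os.fsencode(value)


def create_index(giggle_cls, path, glob):
    """Hypothetical wrapper: encode both arguments, then call
    giggle_cls.create() with bytes, as its char * signature requires."""
    return giggle_cls.create(as_bytes(path), as_bytes(glob))
```

With the real class this would be called as create_index(Giggle, "index", "HMEC_hg19_BroadHMM_ALL.bed.gz"), so callers can keep passing ordinary str values.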