 

Python 3 - reading a file within a zipped archive places a 'b' character at the start of each line

Tags: python, file, zip

With the code below, I always get strange output that puts a b before every line. Just the letter b.

For example, a sample output looks like this:

[b'2017-06-01,15:19:57,']

The script itself is this:

from zipfile import ZipFile

with ZipFile('myarchive.zip', 'r') as myzip:
    with myzip.open('logs/logfile1.txt') as myfile:
        next(myfile)  # skip the empty first line
        print(myfile.readlines())  # print the remaining lines as a list

The archive has a single folder called "logs", and inside logs there are several text files, each of which starts with an empty first line followed by the data lines (hence the next(myfile)).

It places the b before the data no matter which file I try to read. If there are multiple lines in a file, it outputs something like this:

[b'2017-06-01,15:06:28,start session: \n', b'2017-06-01,15:06:36,stop session']
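The behaviour can be reproduced without the original archive. The sketch below builds a hypothetical in-memory archive with the same layout (a logs/logfile1.txt whose first line is empty) using io.BytesIO; its contents are assumptions standing in for myarchive.zip, not the real file.

```python
import io
from zipfile import ZipFile

# Hypothetical in-memory stand-in for myarchive.zip:
# logs/logfile1.txt starts with an empty line, followed by the log lines.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as myzip:
    myzip.writestr(
        "logs/logfile1.txt",
        "\n2017-06-01,15:06:28,start session: \n2017-06-01,15:06:36,stop session",
    )

with ZipFile(buffer, "r") as myzip:
    with myzip.open("logs/logfile1.txt") as myfile:
        next(myfile)  # skip the empty first line
        lines = myfile.readlines()

print(lines)
# [b'2017-06-01,15:06:28,start session: \n', b'2017-06-01,15:06:36,stop session']
print(type(lines[0]))
# <class 'bytes'>
```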

Why is it placing the pesky b there?

asked by omrakhur on Dec 13 '22 21:12



2 Answers

In Python 3.x, text (str) and binary data (bytes) are distinct types. When Python displays a bytes object, its representation carries a b prefix to mark it as bytes. If you want to treat your bytes as a string, you first need to decode them:

your_string = your_bytes.decode("utf-8") 

Of course, the codec you need depends on how the text was encoded into bytes in the first place.
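If you would rather get str lines straight from the archive, one option (not part of the original answer) is to wrap the binary member in io.TextIOWrapper, which decodes as it reads. A minimal sketch, assuming the log files are UTF-8 and using a hypothetical in-memory archive in place of myarchive.zip:

```python
import io
from zipfile import ZipFile

# Hypothetical in-memory stand-in for myarchive.zip.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as myzip:
    myzip.writestr(
        "logs/logfile1.txt",
        "\n2017-06-01,15:06:28,start session: \n2017-06-01,15:06:36,stop session",
    )

with ZipFile(buffer, "r") as myzip:
    # myzip.open() returns a binary file object; TextIOWrapper decodes it
    # on the fly with the given codec (UTF-8 is an assumption here).
    with io.TextIOWrapper(myzip.open("logs/logfile1.txt"), encoding="utf-8") as myfile:
        next(myfile)  # skip the empty first line
        lines = myfile.readlines()

print(lines)
# ['2017-06-01,15:06:28,start session: \n', '2017-06-01,15:06:36,stop session']
```

The encoding argument is where the codec from the answer comes in: if the logs were written in another encoding, that name goes there instead.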

answered by zwer on Dec 16 '22 09:12



Because a zip archive is a binary format, ZipFile.open() returns a file object in binary mode, so reading from it gives you bytes instead of str.

You can convert the bytes to str using bytes.decode().

For example:

>>> byte_string = b'2017-06-01,15:06:28,start session: \n'
>>> byte_string.decode()
'2017-06-01,15:06:28,start session: \n'

This gives you the desired str.
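Applied to the script from the question, the per-line decode might look like the following sketch. The in-memory archive and its contents are hypothetical stand-ins for myarchive.zip, and UTF-8 is assumed as the codec.

```python
import io
from zipfile import ZipFile

# Hypothetical in-memory stand-in for myarchive.zip.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as myzip:
    myzip.writestr(
        "logs/logfile1.txt",
        "\n2017-06-01,15:06:28,start session: \n2017-06-01,15:06:36,stop session",
    )

with ZipFile(buffer, "r") as myzip:
    with myzip.open("logs/logfile1.txt") as myfile:
        next(myfile)  # skip the empty first line (still bytes: b'\n')
        # Each line read from the binary member is bytes; decode() turns it into str.
        lines = [line.decode("utf-8") for line in myfile]

print(lines)
# ['2017-06-01,15:06:28,start session: \n', '2017-06-01,15:06:36,stop session']
```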

answered by Rahul on Dec 16 '22 11:12
