
split byte string into lines

How can I split a byte string into a list of lines?

In Python 2 I had:

    rest = "some\nlines"
    for line in rest.split("\n"):
        print line

The code above is simplified for brevity. Now, after some regex processing, I have a byte array in rest and I need to iterate over its lines.

Flavius asked Dec 13 '12 10:12



2 Answers

There is no reason to convert to a string. Just pass a bytes separator to split. Split strings with strings and bytes with bytes.

    >>> a = b'asdf\nasdf'
    >>> a.split(b'\n')
    [b'asdf', b'asdf']
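
If the data may end with a newline or contain Windows line endings, bytes.splitlines() might be a better fit than split(b'\n'). The following is only a sketch; the sample value assigned to rest is made up for illustration and is not the asker's actual data.

```python
# Hypothetical sample data standing in for the asker's `rest` value.
rest = b"some\nlines\r\nwith mixed endings\n"

# split(b"\n") splits on LF only: a "\r" stays attached to its line,
# and the trailing newline produces a final empty item.
by_split = rest.split(b"\n")

# splitlines() recognises \n, \r and \r\n, and does not add a trailing empty item.
by_splitlines = rest.splitlines()

# Iterate the lines; each item is still a bytes object.
for line in by_splitlines:
    print(line)

# The same methods exist on bytearray and return bytearray items.
from_bytearray = bytearray(b"a\nb").split(b"\n")
```

Which of the two is appropriate depends on whether empty trailing items and carriage returns matter for the processing that follows.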
Janus Troelsen answered Sep 23 '22 18:09


Decode the bytes into unicode (str) and then use str.split:

    Python 3.2.3 (default, Oct 19 2012, 19:53:16)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = b'asdf\nasdf'
    >>> a.split('\n')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Type str doesn't support the buffer API
    >>> a = a.decode()
    >>> a.split('\n')
    ['asdf', 'asdf']

You can also split on b'\n', but I guess you will need to work with strings rather than bytes anyway. So convert all your input data to str as early as possible, work only with Unicode text inside your code, and convert it back to bytes as late as possible, only when needed for output.
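
A rough sketch of that "decode early, encode late" pattern is shown here. The function name process, the upper-casing step, and the UTF-8 default are assumptions made for illustration; the real encoding depends on where the bytes come from.

```python
def process(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode at the input boundary, encode at the output boundary."""
    # Decode once, as early as possible. UTF-8 is an assumed encoding here.
    text = raw.decode(encoding)

    # From here on, work only with str objects.
    lines = text.split("\n")
    result = "\n".join(line.upper() for line in lines)

    # Encode once, as late as possible, just before output.
    return result.encode(encoding)


print(process(b"asdf\nasdf"))
```

If the bytes are not valid in the assumed encoding, decode() raises UnicodeDecodeError, so the encoding should be chosen to match the actual data source.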

warvariuc answered Sep 21 '22 18:09