Does slicing a bytes object create a whole new copy of the data in Python?

Q: Why use bytearray in Python?

The Python bytearray() built-in converts a string (together with an encoding), a bytes-like object, or an iterable of integers into a mutable sequence of bytes. The result supports the usual methods of both mutable sequences and bytes objects, which allows data to be modified in place instead of building a new object for every change.
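As a small illustration of the mutability described above, the sketch below patches a buffer in place. The buffer contents and variable names are made up for this example.

```python
# A bytearray can be built from bytes, from an iterable of integers,
# or from a str together with an encoding.
buf = bytearray(b"HDR:0000")

# Slice assignment edits the existing buffer; no new object is created.
buf[4:8] = b"0042"

# Mutable-sequence methods such as extend() work as they do on a list.
buf.extend(b"\n")

# Building from text requires an explicit encoding.
from_text = bytearray("héllo", "utf-8")
```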

Q: How do you break bytes in Python?

Solution: To split a byte string into a list of lines, each line being a byte string itself, use the bytes.split(delimiter) method with the newline byte b'\n' as the delimiter.
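A minimal sketch of that approach, with made-up sample data. It also shows bytes.splitlines(), which may be more convenient when the input can contain b'\r\n' line endings.

```python
payload = b"first\nsecond\nthird"

# Split on the newline byte; every element is itself a bytes object.
lines = payload.split(b"\n")

# splitlines() recognizes b"\r\n" as well and does not produce a
# trailing empty element for a final line break.
crlf_lines = b"a\r\nb\r\n".splitlines()
```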

Tags:

python

python-bytearray

Say I have a very large bytes object (loaded from a binary file) and I want to read it part by part, advancing the start position until I reach the end. I use slicing to accomplish this. I'm worried that Python will create a completely new copy each time I ask for a slice, instead of simply giving me a reference to the memory at the position I want.

Simple example:

from pathlib import Path

data = Path("binary-file.dat").read_bytes()
total_length = len(data)
start_pos = 0

while start_pos < total_length:
    # decode_bytes (defined elsewhere) returns the number of bytes it consumed
    bytes_processed = decode_bytes(data[start_pos:])  # <---- ***
    start_pos += bytes_processed

In the above example, does Python create a completely new copy of the bytes object starting from start_pos because of the slicing? If so, what is the best way to avoid copying the data and just pass a pointer to the relevant position in the bytes array?

asked Jun 23 '20 10:06

Tekz

1 Answer

Yes, slicing a bytes object does create a copy, at least as of CPython 3.9.12. The closest the documentation comes to admitting this is in the description of the bytes constructor:

In addition to the literal forms, bytes objects can be created in a number of other ways:

A zero-filled bytes object of a specified length: bytes(10)

From an iterable of integers: bytes(range(20))

Copying existing binary data via the buffer protocol: bytes(obj)

which suggests any creation of a bytes object creates a separate copy of the data. But since I had a hard time finding an explicit confirmation that slicing does the same, I resorted to an empirical test.

>>> b = b'\1' * 100_000_000
>>> qq = [b[1:] for _ in range(20)]

After executing the first line, memory usage of the python3 process in top was about 100 MB. The second line executed after a considerable delay and made memory usage rise to about 2 GB, which matches 20 separate copies of roughly 100 MB each. This seems pretty conclusive. PyPy 7.3.9 targeting Python 3.8 behaves largely the same; though of course, PyPy’s garbage collection is not as eager as CPython’s, so the memory is not freed as soon as the bytes objects become unreachable.
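The same observation can be reproduced without watching top. The following is a rough sketch using the standard tracemalloc module on CPython; the helper name traced_growth and the 10 MB size are choices made for this example, and the exact numbers will vary between interpreter versions.

```python
import tracemalloc

def traced_growth(make):
    """Return how much traced memory is still held right after make()."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = make()  # keep a reference so the allocation stays alive
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

data = b"\1" * 10_000_000
view = memoryview(data)

# Slicing bytes allocates a new object holding nearly all of the data.
bytes_growth = traced_growth(lambda: data[1:])

# Slicing a memoryview allocates only a small view object.
view_growth = traced_growth(lambda: view[1:])

print(f"bytes slice:      {bytes_growth:>12,} bytes")
print(f"memoryview slice: {view_growth:>12,} bytes")
```

On CPython the first figure should be close to the size of the original object, while the second should be a few hundred bytes at most.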

To avoid copying the underlying buffer, wrap your bytes in a memoryview and slice that:

>>> bm = memoryview(b)
>>> qq = [bm[1:] for _ in range(50)]
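Applied to the loop from the question, this could look like the sketch below. The question does not show decode_bytes, so the version here is purely hypothetical: it assumes each record is a 2-byte big-endian length followed by that many payload bytes, and the sample data stands in for the contents of the file.

```python
import struct

records = []

def decode_bytes(buf):
    """Hypothetical decoder: read one length-prefixed record from buf
    and return the number of bytes consumed."""
    (length,) = struct.unpack_from(">H", buf, 0)
    # bytes() copies only this one small record, not the rest of the buffer.
    records.append(bytes(buf[2:2 + length]))
    return 2 + length

data = b"\x00\x03abc\x00\x02hi"  # stand-in for Path(...).read_bytes()
view = memoryview(data)

start_pos = 0
while start_pos < len(view):
    # Slicing the memoryview gives a new view on the same buffer, not a copy.
    start_pos += decode_bytes(view[start_pos:])
```

If the decoder can accept a starting offset (as struct.unpack_from does through its offset argument), passing data and start_pos separately avoids slicing altogether.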

answered Oct 31 '22 20:10

user3840170
