 

Python Reading From and Writing to Binary Files

The following is my question re-worded

Reading the first 10 bytes of a binary file (the operations come later) -

infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')
x = infile.read(10)
for i in x:
    print(i, end=', ')
print(x)
outfile.write(bytes(x, "UTF-8"))

The first print statement gives -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70, 

The second print statement gives -

b'\xff\xd8\xff\xe0\x00\x10JFIF'

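i.e. the repr of the bytes object x: the same ten values, with non-printable bytes shown as hexadecimal escapes and printable ASCII bytes (JFIF) shown as characters.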
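However it is displayed, a bytes object holds plain integers in the range 0 to 255; the `\x..` escapes are only how Python prints bytes that are not printable ASCII. A small sketch, assuming the ten bytes shown above are typed in as a literal instead of being read from infile.jpg:

```python
# The ten bytes printed above, written as a bytes literal
# (standing in for infile.read(10); no file is opened here).
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'

print(x[0])      # 255: indexing a bytes object gives an int
print(x[6:])     # b'JFIF': slicing gives another bytes object
print(list(x))   # [255, 216, 255, 224, 0, 16, 74, 70, 73, 70]
print(x.hex())   # ffd8ffe000104a464946: an explicit hex rendering
```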

outfile.write(bytes(x, "UTF-8"))

raises -

TypeError: encoding or errors without a string argument

Then x must not be a normal string but rather a byte string, which is still iterable?
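Yes: in Python 3, `read()` on a file opened with `'rb'` returns a `bytes` object, which is an iterable sequence of ints rather than a str. The sketch below (reusing the same ten bytes as a literal) shows why `bytes(x, "UTF-8")` fails: an encoding argument is only meaningful when the first argument is a str. The exact wording of the TypeError message varies between Python versions, so it is not reproduced here.

```python
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'

print(type(x))      # <class 'bytes'>
print(type(x[0]))   # <class 'int'>: iteration and indexing yield ints

# An encoding argument only applies when converting *from* str, so
# combining one with a bytes object raises TypeError.
try:
    bytes(x, "UTF-8")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Without an encoding, bytes(x) just makes an unchanged copy.
x_copy = bytes(x)
print(x_copy == x)  # True
```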

If I want to write the contents of x to outfile.jpg unaltered, I use -

outfile.write(x)
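The unaltered copy can also be written with `with` blocks so both files are closed automatically. This is a minimal sketch: `copy_first_bytes` is a hypothetical helper name, and a temporary file with made-up contents stands in for a real infile.jpg.

```python
import os
import tempfile


def copy_first_bytes(src_path, dst_path, n):
    """Copy the first n bytes of src_path to dst_path unchanged (hypothetical helper)."""
    # 'rb' / 'wb' keep everything as bytes; no decoding or encoding happens.
    with open(src_path, 'rb') as infile, open(dst_path, 'wb') as outfile:
        x = infile.read(n)      # bytes object of length <= n
        outfile.write(x)        # write() accepts bytes directly
    return x


# Demo with a temporary file standing in for infile.jpg.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'infile.jpg')
    dst = os.path.join(tmp, 'outfile.jpg')
    with open(src, 'wb') as f:
        f.write(b'\xff\xd8\xff\xe0\x00\x10JFIF' + b'\x00' * 20)

    written = copy_first_bytes(src, dst, 10)
    with open(dst, 'rb') as f:
        result = f.read()
    print(result == written)   # True
```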

Now I try to take each x[i], perform some operation on it (shown below as a bone-simple multiplication by 1), assign the results to y, and write y to outfile.jpg so that it matches what I read from infile.jpg. So I try -

infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')
x = infile.read(10)

yi = len(x)
y = [0 for i in range(yi)]

j = 0
for i in x:
    y[j] = i * 1
    j += 1

for i in x:
    print(i, end=', ')

print(x)

for i in y:
    print(i, end=', ')

print(y)

print(repr(x))
print(repr(y))

outfile.write(y)

The first print statement (iterating through x) gives -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The second print statement gives -

b'\xff\xd8\xff\xe0\x00\x10JFIF'

The third print statement (iterating through y) gives -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The fourth print statement (printing y) gives -

[255, 216, 255, 224, 0, 16, 74, 70, 73, 70]

And finally, printing repr(x) and repr(y), as suggested by Tim, gives, respectively -

b'\xff\xd8\xff\xe0\x00\x10JFIF'
[255, 216, 255, 224, 0, 16, 74, 70, 73, 70]

And the file write statement raises the error -

TypeError: 'list' does not support the buffer interface

What I need is for y to be the same type as x, so that outfile.write(y) writes exactly what outfile.write(x) would.
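`write()` on a binary file needs a bytes-like object, and a list of ints is not one, which is what the "buffer interface" error is saying. A minimal sketch, assuming the per-byte operation always produces ints in the range 0 to 255: do the work on ints, then convert the list with `bytes()` (or use a `bytearray`, which is mutable) before writing.

```python
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'

# Do the per-byte work on ints, collecting results in a list...
y = [i * 1 for i in x]

# ...then turn the list back into bytes before writing it.
# bytes() accepts any iterable of ints in range(0, 256).
y_bytes = bytes(y)
print(type(y_bytes), y_bytes == x)   # <class 'bytes'> True

# A bytearray is the mutable alternative if you want to modify in place.
buf = bytearray(x)
for j in range(len(buf)):
    buf[j] = buf[j] * 1
print(buf == x)                      # True

# Values outside 0..255 cannot be stored in a byte.
try:
    bytes([256])
except ValueError as exc:
    print("ValueError:", exc)
```

With this, `outfile.write(bytes(y))` writes the same data as `outfile.write(x)`.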

I stare into the eyes of the Python, but still I do not see its soul.

brett asked Oct 02 '22 02:10


2 Answers

They're not identical at all - they just display identically after str() is applied to them (which print() does implicitly). Print the repr() of them and you'll see the difference. Example:

>>> x = b'ab'
>>> y = "b'ab'"
>>> print(x)
b'ab'
>>> print(y) # displays identically
b'ab'
>>> print(repr(x)) # but x is really a 2-byte bytes object
b'ab'
>>> print(repr(y)) # and y is really a 5-character string
"b'ab'"

Mixing strings and bytes objects doesn't make sense (well, not in the absence of an explicit encoding - but you're not trying to encode/decode anything here, right?). If you're working with binary files, then you shouldn't be using strings at all - you should be using bytes or bytearray objects.
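For the cases where you do want to cross between the two, the conversion has to name an encoding explicitly. A short sketch; ASCII is an arbitrary choice here that happens to suit the "JFIF" marker. Note that `str()` on a bytes object does not decode it, which is exactly why the two displayed identically above.

```python
# Text -> bytes requires choosing an encoding explicitly...
text = "JFIF"
data = text.encode("ascii")
print(data)                  # b'JFIF'

# ...and bytes -> text requires decoding with one.
print(data.decode("ascii"))  # JFIF

# str() on a bytes object does NOT decode it; it produces the repr-style text.
print(str(data))             # b'JFIF'
print(len(str(data)))        # 7
```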

So the problem isn't really in how you're writing: the logic is fundamentally confused before then.

Can't guess what you want. Please edit the question to show a complete, executable example of what you're trying to accomplish. We don't need JPG files for this - make up some short, arbitrary binary data. Like:

dummy_jpg = b'\x01\x02\xff'
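One way to build such a self-contained example is to swap the files for `io.BytesIO`, which behaves like a file opened in binary mode but lives in memory. This is a sketch under that assumption, reusing the `dummy_jpg` data above and the multiply-by-1 operation from the question.

```python
import io

dummy_jpg = b'\x01\x02\xff'

# io.BytesIO acts like a file opened in binary mode but lives in memory,
# so the problem can be reproduced without a real JPG on disk.
infile = io.BytesIO(dummy_jpg)
outfile = io.BytesIO()

x = infile.read(10)            # at most 10 bytes; only 3 exist here
y = bytes(i * 1 for i in x)    # per-byte operation, result kept as bytes
n_written = outfile.write(y)   # write() returns the number of bytes written

print(n_written)                         # 3
print(outfile.getvalue() == dummy_jpg)   # True
```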
Tim Peters answered Oct 13 '22 11:10


... and this is how you read from and write to files in Python in binary mode.

#open binary files infile and outfile
infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')

#n = bytes to read
n=5

#read bytes of infile to x
x = infile.read(n)

#print x type, x
print()
print('x =', repr(x), type(x))
print()

#output: x = b'\xff\xd8\xff\xe0\x00' <class 'bytes'>

#define y as a list of length xi (placeholder zeros)
xi = len(x)
y = [0 for i in range(xi)]

#print y type, y
print('y =', repr(y), type(y))
print()

#output: y = [0, 0, 0, 0, 0] <class 'list'>

#convert each byte of x to an 8-bit binary string and place in y
#(iterating over bytes yields ints in Python 3, so ord() is not needed)
j=0
for i in x:
    y[j] = '{:08b}'.format(i)
    j += 1

#print y type, and y
print('y =', repr(y), type(y))
print()

#output: y = ['11111111', '11011000', '11111111', '11100000', '00000000'] <class 'list'>

#perform bit-level operations on y[i] here (not done in this example)

#convert each y[i] back to an integer
j=0
for i in y:
    y[j] = int(i, 2)
    j += 1

#print y type, and y
print('y =', repr(y), type(y))
print()

#output: y = [255, 216, 255, 224, 0] <class 'list'>

#convert y (a list of ints in the range 0 to 255) to a bytearray and place in z
z = bytearray(y)

#print z type, and z
print('z =', repr(z), type(z))
print()

#output: z = bytearray(b'\xff\xd8\xff\xe0\x00') <class 'bytearray'>

#output z to outfile
outfile.write(z)

infile.close()
outfile.close()
outfile = open('outfile.jpg', 'rb')

#read bytes of outfile to x
x = outfile.read(n)

#print x type, and x
print('x =', repr(x), type(x))
print()

#output: x = b'\xff\xd8\xff\xe0\x00' <class 'bytes'>

#conclusion: the first n bytes of infile equal the n bytes of outfile (no bit-level operations were applied)

outfile.close()
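The same bytes to binary strings to bytes round trip can be written more compactly with comprehensions, and bit-level operations can also be applied to the ints directly without going through strings. A sketch, reusing the five bytes from the output above; XOR with 0xFF (flipping every bit) is only an arbitrary example operation, not something from the answer.

```python
x = b'\xff\xd8\xff\xe0\x00'

# The same round trip as the code above, written with comprehensions:
# bytes -> 8-bit binary strings -> ints -> bytes.
bits = ['{:08b}'.format(b) for b in x]
print(bits)  # ['11111111', '11011000', '11111111', '11100000', '00000000']
round_trip = bytes(int(s, 2) for s in bits)
print(round_trip == x)  # True

# Bit-level operations can also be applied to the ints directly, with no
# string conversion. XOR with 0xFF (flip every bit) is just an example op.
flipped = bytes(b ^ 0xFF for b in x)
print(list(flipped))    # [0, 39, 0, 31, 255]
restored = bytes(b ^ 0xFF for b in flipped)
print(restored == x)    # True
```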
brett answered Oct 13 '22 11:10