I'm currently working on a multi-threaded downloader using the PycURL module. It downloads a file in parts and merges them afterwards.
The parts are downloaded separately by multiple threads and written to temporary files in binary mode, but when I merge them into a single file (in the correct order), the checksums do not match.
This only happens on Linux. The same script works flawlessly on Windows.
This is the part of the script that merges the files:

    with open(filename, 'wb') as outfile:
        print('Merging temp files ...')
        for tmpfile in self.tempfile_arr:
            with open(tmpfile, 'rb') as infile:
                shutil.copyfileobj(infile, outfile)
        print('Done!')

I tried the write() method as well, but it produces the same issue, and it would use a lot of memory for large files.
If I manually cat the part files into a single file on Linux, the file's checksum matches, so the issue seems to be with how Python merges the files.
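One way to narrow this down is to hash the parts in order and compare that digest with the digest of the merged file. The following is a minimal sketch, not the original script: the helper names (sha256_of, sha256_of_parts, merge_parts) and the 1 MiB chunk size are assumptions made for illustration. Everything is read and written in binary mode and in fixed-size chunks, so memory use stays bounded for large files.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash one file in fixed-size chunks (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_parts(part_paths, chunk_size=1024 * 1024):
    """Hash several files as if they were concatenated in the given order."""
    digest = hashlib.sha256()
    for part in part_paths:
        with open(part, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                digest.update(chunk)
    return digest.hexdigest()


def merge_parts(part_paths, out_path):
    """Concatenate the part files into out_path, all in binary mode."""
    with open(out_path, 'wb') as outfile:
        for part in part_paths:
            with open(part, 'rb') as infile:
                shutil.copyfileobj(infile, outfile)
```

If sha256_of_parts(parts) equals sha256_of(merged) but both differ from the expected checksum, the merge step is probably not at fault and the part files themselves (or their order) are worth inspecting.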
EDIT:
Here are the files and checksums (sha256) that I used to reproduce the issue:
File merged manually using cat
Command used:

    for i in /tmp/pycurl_*_{0..7}; do cat $i >> manually_merged.tar.gz; done

Part files, numbered at the end from 0 through 7
A minimal reproducible example would help, but I'd suspect universal newlines to be the issue: by default, when a file is opened in text mode, Windows-style newlines (\r\n) are translated to Unix-style newlines (\n) on reading. Those Unix-style newlines are then written to the output file instead of the Windows-style ones you were expecting. That would explain the divergence between Python and cat (which does no translation whatsoever).
Try running your script with newline='' (the empty string) passed to open wherever a file is opened in text mode. Note that newline is a text-mode parameter only: files opened with 'rb' or 'wb' are not translated, and open rejects newline in binary mode.
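To see the effect described above in isolation, here is a small sketch (the file name sample.bin and the sample bytes are made up for the demonstration). It writes bytes containing \r\n and reads them back three ways: in text mode with the default newline handling, in text mode with newline='', and in binary mode.

```python
import os
import tempfile

data = b'line one\r\nline two\r\n'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.bin')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(data)

    # Text mode, default newline=None: \r\n is translated to \n on reading.
    with open(path, 'r') as f:
        translated = f.read()

    # Text mode with newline='': line endings are returned untranslated.
    with open(path, 'r', newline='') as f:
        untranslated = f.read()

    # Binary mode: raw bytes, never translated.
    with open(path, 'rb') as f:
        raw = f.read()

print(repr(translated))
print(repr(untranslated))
print(repr(raw))
```

Only the first read loses the \r characters, so a checksum mismatch from this cause requires that at least one file in the pipeline is opened in text mode without newline=''.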