I am trying to copy a large file (> 1 GB) from the hard disk to a USB drive using shutil.copy. A simple script depicting what I am trying to do is:
```python
import shutil

src_file = r"source\to\large\file"   # placeholder paths
dest = r"destination\directory"
shutil.copy(src_file, dest)
```
It takes only 2-3 minutes on Linux, but the same copy takes more than 10-15 minutes under Windows. Can somebody explain why, and suggest a solution, preferably using Python code?
Update 1
I saved the script as test.py. The source file size is 1 GB and the destination directory is on the USB drive. I measured the copy time with ptime. The result is:
```
> ptime.exe test.py
ptime 1.0 for Win32, Freeware - http://www.pc-tools.net/
Copyright(C) 2002, Jem Berkes

=== test.py ===
Execution time: 542.479 s
```
542.479 s is about 9 minutes. I don't think shutil.copy should take 9 minutes to copy a 1 GB file.
Update 2
The health of the USB drive is good, since the same script works well under Linux. I also timed the copy of the same file using Windows' native xcopy. Here is the result:
```
ptime 1.0 for Win32, Freeware - http://www.pc-tools.net/
Copyright(C) 2002, Jem Berkes

=== xcopy F:\test.iso L:\usb\test.iso ===
1 File(s) copied

Execution time: 128.144 s
```
128.144 s is about 2.13 minutes. I still have 1.7 GB of free space even after copying the test file.
Increasing the buffer size had previously given me a 2x improvement in large-file copy performance under Java, and my initial copy performance under Python with the standard library was also much too slow. By increasing the buffer size, I improved large-file copy performance by 2x over the default here as well.
I frequently need to copy very large files (20 GB+). The Python function most commonly used for copies is shutil.copyfile, which has a default buffer size of 16384 bytes.
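A minimal sketch of that idea, assuming the paths from the question and a hand-picked buffer size you would tune for your own hardware: open the files yourself and pass a larger length to shutil.copyfileobj.

```python
import shutil

# Sketch only: paths are the ones from the question, and the buffer size
# is an example value to tune, not a recommendation.
SRC = r"F:\test.iso"
DST = r"L:\usb\test.iso"
BUFFER_SIZE = 16 * 1024 * 1024  # 16 MB instead of the 16 KB default

with open(SRC, "rb") as fsrc, open(DST, "wb") as fdst:
    shutil.copyfileobj(fsrc, fdst, length=BUFFER_SIZE)
```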
Just to add some interesting information: Windows does not like the tiny buffer used internally by the shutil implementation.
I quickly tried the following, using a modified copy of shutil (import myshutil), changing

```python
def copyfileobj(fsrc, fdst, length=16*1024):
```

to

```python
def copyfileobj(fsrc, fdst, length=16*1024*1024):
```
Using a 16 MB buffer instead of 16 KB caused a huge performance improvement.
Maybe Python needs some tuning targeting Windows internal filesystem characteristics?
Edit:
Came to a better solution here. At the start of your file, add the following:
```python
import shutil

def _copyfileobj_patched(fsrc, fdst, length=16*1024*1024):
    """Patches shutil method to hugely improve copy speed"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

shutil.copyfileobj = _copyfileobj_patched
```
This is a simple patch to the current implementation and worked flawlessly here.
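As a usage note (a sketch, assuming a Python version from before the 3.8 overhaul mentioned below, where shutil.copy delegates to copyfile and then to copyfileobj): once the patch has run, an ordinary copy picks up the 16 MB buffer.

```python
import shutil

# Assumes the _copyfileobj_patched assignment above has already executed;
# the paths are the ones from the question.
shutil.copy(r"F:\test.iso", r"L:\usb\test.iso")
```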
Python 3.8+: Python 3.8 includes a major overhaul of shutil, including an increase of the Windows buffer from 16 KB to 1 MB (still less than the 16 MB suggested in this ticket). See the ticket and the diff.
Your problem has nothing to do with Python. In fact, the Windows copy process is really poor compared to Linux's.
You can improve this by using xcopy or robocopy, according to this post: Is (Ubuntu) Linux file copying algorithm better than Windows 7? But in this case, you have to make different calls for Linux and Windows...
```python
import os
import shutil
import sys

source = r"source\to\large\file"   # placeholder paths
target = r"destination\directory"

if sys.platform == 'win32':
    os.system('xcopy "%s" "%s"' % (source, target))
else:
    shutil.copy(source, target)
```
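Since robocopy is also mentioned above, here is a hedged sketch of the same fallback using robocopy via subprocess. The paths are placeholders; robocopy expects a source directory, destination directory and filename rather than full file paths, and its exit codes below 8 (1 means "file(s) copied") indicate success.

```python
import os
import shutil
import subprocess
import sys

source = r"F:\test.iso"      # placeholder paths, as in the question
target_dir = r"L:\usb"
src_dir, filename = os.path.split(source)

if sys.platform == 'win32':
    # robocopy takes (source_dir, dest_dir, filename); a return code of 1
    # just means the file was copied, so don't raise on nonzero exit codes.
    subprocess.run(['robocopy', src_dir, target_dir, filename], check=False)
else:
    shutil.copy(source, os.path.join(target_dir, filename))
```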