
Python copy larger file too slow

I am trying to copy a large file (> 1 GB) from my hard disk to a USB drive using shutil.copy. A simple script showing what I am trying to do:

    import shutil

    src_file = r"source\to\large\file"   # raw strings so the backslashes are not treated as escapes
    dest = r"destination\directory"

    shutil.copy(src_file, dest)

It takes only 2-3 minutes on Linux, but copying the same file takes 10-15 minutes under Windows. Can somebody explain why, and give a solution, preferably using Python code?

Update 1

Saved the script as test.py. The source file size is 1 GB and the destination directory is on the USB drive. I measured the copy time with ptime. Here is the result:

    ptime.exe test.py

    ptime 1.0 for Win32, Freeware - http://www.pc-tools.net/
    Copyright(C) 2002, Jem Berkes <jberkes@pc-tools.net>

    ===  test.py ===

    Execution time: 542.479 s

542.479 s is about 9 minutes. I don't think shutil.copy should take 9 minutes to copy a 1 GB file.

Update 2

The health of the USB drive is good, since the same script works well under Linux. I also timed the copy of the same file with Windows' native xcopy. Here is the result:

    ptime 1.0 for Win32, Freeware - http://www.pc-tools.net/
    Copyright(C) 2002, Jem Berkes <jberkes@pc-tools.net>

    ===  xcopy F:\test.iso L:\usb\test.iso ===

    1 File(s) copied

    Execution time: 128.144 s

128.144 s is about 2.13 minutes. I have 1.7 GB of free space even after copying the test file.

asked Feb 15 '14 by sundar_ima


2 Answers

Just to add some interesting information: Windows does not like the tiny buffer used internally by the shutil implementation.

I quickly tried the following:

  • Copied the original shutil.py file into the example script's folder and renamed it myshutil.py
  • Changed the script's first line to import myshutil
  • Edited myshutil.py and changed copyfileobj from

    def copyfileobj(fsrc, fdst, length=16*1024):

to

    def copyfileobj(fsrc, fdst, length=16*1024*1024):

Using a 16 MB buffer instead of 16 KB caused a huge performance improvement.

Maybe Python needs some tuning targeting Windows internal filesystem characteristics?
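To see the effect yourself, here is a minimal sketch that times shutil.copyfileobj with a few different buffer sizes. It assumes Python 3, and the file paths are placeholders taken from the question:

    import shutil
    import time

    SRC = r"F:\test.iso"       # placeholder: any large source file
    DST = r"L:\usb\test.iso"   # placeholder: destination on the USB drive

    # Time the same copy with a 16 KB, 1 MB, and 16 MB buffer.
    for length in (16 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        start = time.perf_counter()
        with open(SRC, 'rb') as fsrc, open(DST, 'wb') as fdst:
            shutil.copyfileobj(fsrc, fdst, length)
        print("buffer %8d bytes: %.1f s" % (length, time.perf_counter() - start))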

Edit:

I came to a better solution here. At the start of your file, add the following:

    import shutil

    def _copyfileobj_patched(fsrc, fdst, length=16*1024*1024):
        """Patches shutil method to hugely improve copy speed"""
        while 1:
            buf = fsrc.read(length)
            if not buf:
                break
            fdst.write(buf)

    shutil.copyfileobj = _copyfileobj_patched

This is a simple patch to the current implementation and worked flawlessly here.
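Once the patch is applied, any ordinary copy call in the same process picks up the bigger buffer, since on the pre-3.8 versions this targets shutil.copy delegates to copyfileobj for the actual data transfer. A minimal usage sketch with placeholder paths:

    import shutil  # the patch above must already have been applied

    # Placeholder paths; the copy now runs with the 16 MB buffer.
    shutil.copy(r"F:\test.iso", r"L:\usb")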

Python 3.8+: Python 3.8 includes a major overhaul of shutil, including an increase of the Windows buffer from 16 KB to 1 MB (still less than the 16 MB suggested in this ticket). See ticket, diff.
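On 3.8+ the default buffer size lives in the module-level attribute shutil.COPY_BUFSIZE (an internal, undocumented attribute, so treat this as a sketch rather than a supported API). Bumping it achieves the same tuning without replacing copyfileobj:

    import shutil

    # CPython 3.8+ keeps the default copy buffer size here
    # (1 MB on Windows, 64 KB elsewhere); internal attribute, may change.
    shutil.COPY_BUFSIZE = 16 * 1024 * 1024  # try a 16 MB buffer

    shutil.copy(r"F:\test.iso", r"L:\usb")  # placeholder paths

Note that on Linux this may have no visible effect, since there shutil can use a zero-copy fast path (os.sendfile) that bypasses the buffered loop entirely.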

answered Sep 18 '22 by mgruber4


Your problem has nothing to do with Python. In fact, the Windows copy process is simply poor compared to the Linux one.

You can improve this by using xcopy or robocopy according to this post: Is (Ubuntu) Linux file copying algorithm better than Windows 7?. But in this case, you have to make different calls for Linux and Windows...

    import os
    import shutil
    import sys

    source = r"source\to\large\file"   # raw strings so the backslashes are not treated as escapes
    target = r"destination\directory"

    if sys.platform == 'win32':
        os.system('xcopy "%s" "%s"' % (source, target))
    else:
        shutil.copy(source, target)
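If you prefer robocopy, note that it works on directories rather than single destination files. A sketch using subprocess.run instead of os.system (the /J flag for unbuffered I/O and the exit-code convention are standard robocopy behavior; the paths are placeholders):

    import os
    import subprocess

    src = r"F:\test.iso"    # placeholder source file
    dst_dir = r"L:\usb"     # placeholder destination directory

    # robocopy <src_dir> <dst_dir> <filename>; /J copies with unbuffered I/O,
    # which is recommended for very large files.
    result = subprocess.run(
        ["robocopy", os.path.dirname(src), dst_dir, os.path.basename(src), "/J"]
    )
    # robocopy exit codes below 8 mean success (1 == file copied).
    if result.returncode >= 8:
        raise RuntimeError("robocopy failed with code %d" % result.returncode)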

See also:

  • Actual Performance, Perceived Performance, a Jeff Atwood blog post on this subject
  • Windows xcopy is not working in python, which gives the syntax for calling xcopy from Python on Windows
answered Sep 19 '22 by Maxime Lorant