Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python logging extremely slow on Linux server... but fast on Linux development VM?

Using the exact same code, it takes about 50ms every call to a logger method (such as logger.debug) on the server, while on the dev machine it's less than 1ms. The loggers are outputting to files, with a little bit of formatting.

Other than the slowdowns with logging, the server is twice as fast.

I'm developing on Ubuntu 11.04 (Gnome) running inside VMWare on Windows 7. The server is running Ubuntu Server 11.04 (no GUI, pure console). The logging module is the official "logging" module ("import logging ... logger = logging.getLogger('mylogger')").

Any idea what could be causing this? It is extremely frustrating!

Thanks for any help!

EDIT: Both machines return "Python 2.7.1+" for their version. Both machines are running 64-bit Ubuntu.

Hard drive configuration the server is software RAID-1, while in the dev computer just a single drive.

EDIT2: Accepted Fabian's answer as it was thorough, although it didnt't quite solve the problem.

Solution: Writing to the console, period, is extremely slow. I tested writing X to file, and writing X to the console, and it was about 100x slower to the console. I don't know why that would be, but I just ran what I was running with ssh from another computer and everything was solved.

like image 392
Albeit Avatar asked Jul 11 '12 19:07

Albeit


People also ask

Why is my Python code so slow?

In summary: code is slowed down by the compilation and interpretation that occurs during runtime. Compare this to a statically typed, compiled language which runs just the CPU instructions once compilated. It's actually possible to extend Python with compiled modules that are written in C.

How do I fix a slow running Linux server problem?

Limit the amount of memory the app is using (for example, on a web server, limit the number of processes available to serve requests) until the condition abates, or add more memory to the server. App is slow because the server is doing lots of I/O. Look for high values of IO/bi and IO/bo, and CPU/wa.

Why does Python take so long to run?

Python is an interpreted language, which means that the Python code we write must go through many, many stages of abstraction before it can become executable machine code. Just In Time (JIT) Compiler: Other interpreted languages like Java/.

What slows down a Python program?

If your Python code runs too fast, a call to time. sleep() is a simple way to slow down your code.


1 Answers

As noted in the comments, a possible reason is disk speed differences between the development VM and the production machine. Do you have the same kind of drives in both systems, eg. SSD, SATA vs. SCSI, spindle speed and cache, and so on? Your environments are quite different in terms of IO. Desktop Windows and VMWare will use aggressive disk caches while your production Linux machine is likely to err on the safe side and wait for data to be committed to disk more frequently. Maybe the Windows machine has drivers that are better suited for the kind of disk it has, while the server is running without optimisations? The file system differences are also significant, and the hardware is probably different enough to cause significant differences in IO speed. You could also have large differences in CPU and RAM speed. Desktop machines nowadays are often more focused on raw speed, while server hardware will focus more on reliability. You know your setup best, so you can probably compare the two systems in terms of hardware performance.

Beyond that, here's how you can find out what's really going on:

First, write a MWE to test logging. You should base it on your real code, and use logging in a similar way, but here's a small example:

import logging

logging.basicConfig(filename="test.log", level=logging.DEBUG)
logger = logging.getLogger("testlogger")

for i in range(0, 1000000):
    logger.info("iteration: %d", i)

Then run the script under cProfile both in your development and production machine. Be sure to log to the same filesystem as in your problem case, otherwise the results won't be applicable.

python -m cProfile testlogging.py

You will get output that looks like this:

57000501 function calls in 137.072 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.000    0.000    0.000    0.000 UserDict.py:4(__init__)
    1    0.000    0.000    0.000    0.000 __init__.py:1044(_fixupParents)
    1    0.000    0.000    0.000    0.000 __init__.py:1085(Logger)
    2    0.000    0.000    0.000    0.000 __init__.py:1100(__init__)
    1    0.000    0.000    0.000    0.000 __init__.py:1112(setLevel)
.....    .....    .....    .....    ..... ...............................

This should give you an idea of what is causing the slowness on the production machine. Things to look for in particular:

  • Look for lines reading {method 'write' of 'file' objects} and {method 'flush' of 'file' objects}. This will tell you how much time Python has spent writing to files and flushing the data to disk – in this case, the log file. Are there significant differences between the two machines? If so, it's definitely a difference in IO (disk) speed. You should then take a look at the disk setup of the server and see if there is anything you can do to get better disk performance.
  • Look for lines where the first percall column is particularly large. This column is the total time spent in the function divided by the number of calls to that function. Compare between the two machines, and you might find what's causing the difference.
  • Look for lines where the tottime column is particularly large. This column is the total time spent in the function. Again, compare between the two machines, and you may find some reasons for the speed difference.

If you find that disk IO seems to be the issue, you can do an additional test with just raw writing to files. You can probably find a benchmarking program that allows you to test disk throughput, but you could also just write a simple C (or Python) program that writes unformatted data to a file to make sure it really is pure disk performance that is the difference.

Just one last note: performance testing is, like programming, a mix of art, science, and engineering, and while there are patterns and advice you can follow, every case needs a bit of inventiveness to crack. So try things, make sure you don't fool yourself, and have fun! Good luck!

like image 141
Fabian Fagerholm Avatar answered Oct 17 '22 05:10

Fabian Fagerholm