
Why does this code use more and more memory over time?

Python: 3.11
Saxonche: 12.4.2

My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:

import gc
from time import sleep

from saxonche import PySaxonProcessor


xml_str = """
<root>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
    <stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""

while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)

    gc.collect()
    sleep(1)

This script consumes memory at a rate of about 0.5 MB per second. The memory usage does not plateau after a while. I have logs showing that memory usage continues to grow for hours until the server runs out of memory and crashes.

Other things I tried that aren't shown above:

  • Using a PyDocumentBuilder to parse the XML instead of a PySaxonProcessor. It didn't appear to change anything.
  • Deleting the Saxon processor and the return value of parse_xml() using the del Python keyword. No change.

I have to use Saxon instead of lxml because I need XPath 3.0 support.

What am I doing wrong? How do I parse XML using Saxon in a way that doesn't leak?


A few folks have suggested that instantiating the PySaxonProcessor once before the loop will fix the leak. It doesn't. This still leaks:

with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)

        gc.collect()
        sleep(1)
Rainbolt asked Sep 19 '25 11:09


2 Answers

There's clearly a failure to properly clean up once the context manager terminates - i.e., PySaxonProcessor.__exit__ isn't doing what it (probably) should do.

You need to contact the developer(s) as this isn't a Python issue per se. You are not doing anything wrong.

The problem can be replicated as follows:

from saxonche import PySaxonProcessor
import psutil

process = psutil.Process()
prev = process.memory_info().rss

for i in range(1, 101):
    # Create and immediately release a processor; no XML is parsed.
    with PySaxonProcessor(license=False):
        pass
    # Every 10 iterations, print how much the resident set size (RSS) grew.
    if i % 10 == 0:
        m = process.memory_info().rss
        print(f"{m - prev:,}")
        prev = m

Platform:

macOS 14.4.1
Python 3.12.2
M2

Output:

2,228,224
2,244,608
2,260,992
2,244,608
2,228,224
2,244,608
2,244,608
2,228,224
2,228,224
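
To check whether the same growth shows up for other operations (for example the parse_xml() call from the question), the sampling loop can be factored into a small helper. This is only a sketch: rss_deltas, work and read_rss are hypothetical names, not part of saxonche or psutil, and the memory reader is passed in so the helper itself depends on nothing but the standard library.

```python
import gc
from typing import Callable, List


def rss_deltas(work: Callable[[], None],
               read_rss: Callable[[], int],
               iterations: int = 100,
               sample_every: int = 10) -> List[int]:
    """Run work() repeatedly and return memory growth per sampling window.

    read_rss is any zero-argument callable returning the current resident
    set size in bytes (hypothetical parameter, supplied by the caller).
    """
    deltas = []
    prev = read_rss()
    for i in range(1, iterations + 1):
        work()
        if i % sample_every == 0:
            gc.collect()  # rule out garbage that Python has not collected yet
            current = read_rss()
            deltas.append(current - prev)
            prev = current
    return deltas


# Hypothetical usage, assuming psutil and saxonche are installed:
#   import psutil
#   from saxonche import PySaxonProcessor
#   info = psutil.Process()
#   with PySaxonProcessor(license=False) as proc:
#       print(rss_deltas(lambda: proc.parse_xml(xml_text="<root/>"),
#                        lambda: info.memory_info().rss))
```

Deltas that stay roughly constant and positive across windows, as in the output above, suggest a leak. Deltas that fall to around zero after the first few windows suggest one-off warm-up allocation instead.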
Ramrab answered Sep 21 '25 03:09


It looks like a memory leak. I created a bug to track it: https://saxonica.plan.io/issues/6391

The issue is now fixed in SaxonC 12.5, which has been released.
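
To confirm that an environment has picked up the fix, one option is to compare the installed package version against 12.5. The sketch below makes two assumptions: that the distribution is installed under the name saxonche (matching the import in the question), and that saxonche version numbers track SaxonC releases, as the 12.4.2 in the question suggests. The helper names are made up for illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = (12, 5)  # release reported above as containing the fix


def parse_release(text: str) -> Tuple[int, ...]:
    """Turn '12.4.2' into (12, 4, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_leak_fix(installed: str) -> bool:
    """True if the given version string is at or above FIXED_IN."""
    return parse_release(installed) >= FIXED_IN


def installed_saxonche() -> Optional[str]:
    """Return the installed saxonche version, or None if it is not installed.

    Assumes the distribution name is 'saxonche'.
    """
    try:
        return version("saxonche")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_saxonche()
    if found is None:
        print("saxonche is not installed")
    else:
        print(found, "has the fix" if has_leak_fix(found) else "predates the fix")
```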

Norm answered Sep 21 '25 02:09