Can Python's string .format() be made safe for untrusted format strings?

I'm working on a web app where users will be able to supply strings that the server will then substitute variables into.

Preferably I'd like to use PEP 3101 format() syntax and I'm looking at the feasibility of overriding methods in Formatter to make it secure for untrusted input.

Here are the risks I can see with .format() as it stands:

Padding lets you specify arbitrary lengths, so '{:>9999999999}'.format(..) could run the server out of memory and cause a denial of service (DoS). I'd need to disable this.
Format lets you access the fields inside objects, which is useful, but it's creepy that you can access dunder variables and start drilling into bits of the standard library. There's no telling where there might be a getattr() that has side effects or returns something secret. I would whitelist attribute/index access by overriding get_field().
I'd need to catch some exceptions, naturally.
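The two restrictions described above could be sketched roughly as follows. This is only a minimal illustration, not a vetted implementation: the class name RestrictedFormatter, the allowed_fields parameter, and the MAX_WIDTH cap are hypothetical names chosen for this example. It relies on string.Formatter calling get_field() for every replacement field (including fields nested inside a format spec) and format_field() with the fully expanded spec.

```python
import re
import string

MAX_WIDTH = 100  # hypothetical cap on any width/precision number in a format spec


class RestrictedFormatter(string.Formatter):
    """Sketch: exact-match allowlist of field names plus a bounded width."""

    def __init__(self, allowed_fields):
        super().__init__()
        self.allowed_fields = frozenset(allowed_fields)

    def get_field(self, field_name, args, kwargs):
        # field_name is the whole expression, e.g. "user.email" or "0[key]",
        # so anything not listed verbatim (including dunder access) is refused.
        if field_name not in self.allowed_fields:
            raise ValueError("field not allowed")
        return super().get_field(field_name, args, kwargs)

    def format_field(self, value, format_spec):
        # Reject any number in the spec that exceeds the cap (width or precision).
        for number in re.findall(r"\d+", format_spec):
            if len(number) > 3 or int(number) > MAX_WIDTH:
                raise ValueError("width or precision too large")
        return super().format_field(value, format_spec)


class User:
    email = "a@example.com"


fmt = RestrictedFormatter({"name", "user.email"})
print(fmt.format("Hi {name}, we will write to {user.email}", name="Ann", user=User()))
```

With this sketch, both '{name.__class__}' and '{name:>9999999999}' raise ValueError, which the caller would still need to catch.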

My assumptions are:

None of the traditional C format string exploits apply to Python, because specifying a parameter is a bounds-checked access into a collection, rather than directly popping off the thread's stack.
The web framework I'm using escapes every variable that's substituted into a page template, and so long as it's the last stop before output, I'm safe from cross-site scripting attacks emerging from de-escaping.
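The first assumption can be checked directly: out-of-range or missing arguments raise ordinary Python exceptions instead of reading arbitrary memory. The helper name try_format below is made up for this illustration.

```python
def try_format(template, *args, **kwargs):
    """Return the formatted string, or the name of the exception that was raised."""
    try:
        return template.format(*args, **kwargs)
    except (IndexError, KeyError, AttributeError, ValueError) as exc:
        return type(exc).__name__


print(try_format("{0} {1}", "only-one"))   # IndexError
print(try_format("{missing}", present=1))  # KeyError
print(try_format("{0[5]}", [1, 2, 3]))     # IndexError
```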

What are your thoughts? Possible? Impossible? Merely unwise?

Edit: Armin Ronacher outlines a nasty information leak if you don't filter out dunder variable access, but seems to regard securing format() as feasible:

{local_foo.__init__.__globals__[secret_global]}

Source: "Be Careful with Python's New-Style String Format", Armin Ronacher's Thoughts and Writings.
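A small self-contained reproduction of that kind of leak might look like the following. The names SECRET_KEY and Event are hypothetical stand-ins for a module-level secret and an ordinary application object; the point is only that a method defined in Python exposes its module's globals through __globals__.

```python
SECRET_KEY = "s3cr3t"  # hypothetical module-level secret


class Event:
    def __init__(self, name):
        self.name = name


# An attacker-supplied template that walks from the object to the module globals.
template = "{event.__init__.__globals__[SECRET_KEY]}"
leaked = template.format(event=Event("launch"))
print(leaked == SECRET_KEY)
```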

Personally, I didn't actually go the untrusted format() route in my product, but I am updating this for the sake of completeness.

asked Mar 12 '13 08:03

Craig Timpany

1 Answer

Good instinct. Yes, letting an attacker supply an arbitrary format string is a vulnerability in Python.

Denial of service is probably the simplest issue to address. Limiting the size of the string or the number of replacement fields within it mitigates the problem. Pick a limit X such that no reasonable user needs more than X variables in one string and such that the resulting amount of computation cannot be exploited in a DoS attack. Note that a short string can still request a huge padding width, so the width in each format spec needs a cap as well.
Being able to access attributes within an object could be dangerous. However, I don't think the object base class itself exposes anything useful; the object supplied to format() would have to contain (or give a path to) something sensitive. In any case, this type of notation can be limited with a regular expression.
If the format strings are user supplied, then a user might need to see the error message for debugging. However, error messages can contain sensitive information such as local paths or class names. Make sure to limit the information that an attacker can obtain.
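One way to limit what error messages reveal is to log the real exception on the server and hand the user a fixed message. The sketch below assumes hypothetical names (render_user_template, GENERIC_ERROR) and calls plain str.format only to keep the example short; it is not safe on its own and would need the restrictions discussed above.

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_ERROR = "Your template could not be rendered; check the field names and format."


def render_user_template(template, values):
    """Return (ok, text); on failure, text is a fixed message with no internals."""
    try:
        # NOTE: plain str.format is used here only for brevity.
        return True, template.format(**values)
    except (KeyError, IndexError, AttributeError, ValueError, TypeError) as exc:
        # Full details stay in the server log, never in the response.
        logger.warning("template rendering failed: %r", exc)
        return False, GENERIC_ERROR


print(render_user_template("Hi {name}", {"name": "Ann"}))
print(render_user_template("{name.nope}", {"name": "Ann"}))
```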

Look over the Python format string specification and use a regex to forbid any functionality you don't want the user to have.
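As a rough sketch of that regex approach, the pattern below accepts only literal text, escaped braces, and bare {identifier} fields, so attribute access, indexing, conversions, and format specs are all rejected. The names SIMPLE_TEMPLATE, MAX_TEMPLATE_LENGTH, MAX_FIELDS, and is_acceptable, as well as the specific limits, are assumptions made for this example.

```python
import re

# Literal text without braces, "{{" / "}}" escapes, or a bare {identifier} field.
SIMPLE_TEMPLATE = re.compile(r"(?:[^{}]|\{\{|\}\}|\{[A-Za-z][A-Za-z0-9_]*\})*")

MAX_TEMPLATE_LENGTH = 1000  # hypothetical limit on template size
MAX_FIELDS = 20             # hypothetical limit on opening braces


def is_acceptable(template):
    """Return True only for templates made of text and simple named fields."""
    if len(template) > MAX_TEMPLATE_LENGTH:
        return False
    if template.count("{") > MAX_FIELDS:
        return False
    return SIMPLE_TEMPLATE.fullmatch(template) is not None


print(is_acceptable("Hello {name}, you have {count} messages"))
print(is_acceptable("{name.__class__}"))
```

Because the accepted grammar has no format spec at all, the padding problem disappears along with attribute access; the cost is that users lose alignment and number formatting.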

answered Oct 23 '22 20:10

rook
