XXE attacks and disabling remote entity loading when using Python's sax library
- 21 February, 2013 11:22
If you need to work with XML in Python, there are several libraries you can use. One of the more popular ones is xml.sax, which is included in Python's standard library.
Building a basic parser in xml.sax is easy. For example, if we wanted to write a parser to load some XML as follows:
my_xml="""
<person>
<name>John Smith</name>
<company>Smith Pty Ltd</company>
</person>
"""
We would like the code to parse it into a simple Python object:
class MyObject:
    def __init__(self):
        self.name = None
        self.company = None

    def __repr__(self):
        return "%s (%s)" % (self.name, self.company)
Doing so in xml.sax is only a few lines of code. A simple parser for this type of XML would look like this:
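import xml.sax

class MyContentHandler(xml.sax.ContentHandler):
    def __init__(self, obj):
        xml.sax.ContentHandler.__init__(self)
        self.object = obj  # the object to populate
        self.chars = ""

    def startElement(self, name, attrs):
        # Start collecting text afresh for each element
        self.chars = ""

    def endElement(self, name):
        # Store the collected text on the target object
        if name == "name":
            self.object.name = self.chars
        elif name == "company":
            self.object.company = self.chars

    def characters(self, content):
        # May be called several times per element, so accumulate
        self.chars += content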
Given these bits of code, most programmers will then parse the XML like this:
obj = MyObject()
parser = MyContentHandler(obj)
xml.sax.parseString(my_xml, parser)
print(repr(obj))
Running this code will output:
John Smith (Smith Pty Ltd)
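To see why the handler accumulates text in characters() and only stores it in endElement(), it can help to trace the events SAX emits. The following is a minimal sketch; TracingHandler is a hypothetical name introduced here for illustration and is not part of xml.sax.

```python
import xml.sax


class TracingHandler(xml.sax.ContentHandler):
    """Hypothetical handler that records every SAX event it receives."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser is free to deliver the text of one element in
        # several pieces, so each call is recorded separately.
        self.events.append(("chars", content))


handler = TracingHandler()
xml.sax.parseString(b"<person><name>John Smith</name></person>", handler)

for event in handler.events:
    print(event)
```

The text of an element arrives between its start and end events, which is why the content handler resets its buffer in startElement() and reads it back in endElement().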
There is a problem with this type of parsing, however. In recent times there has been growing concern over untrusted XML and the attack vectors that XML parsers introduce. One of the main concerns is known as an XML External Entity (XXE) vulnerability. There are two ways to load content from outside an XML file: an ENTITY declaration and a DTD declaration.
Let's check out an example by modifying our XML to look as follows:
my_xml="""
<!DOCTYPE person [<!ENTITY remote SYSTEM
"http://www.computerworld.com.au/xml.test">]>
<person>
<name>John Smith &remote;</name>
<company>Smith Pty Ltd</company>
</person>
"""
If we run our previous code on this XML, the output will be:
John Smith Hello (Smith Pty Ltd)
This is because the entity called remote causes SAX to reach out to that address and fetch the result. This can cause problems if the target is huge (or never-ending), such as file:///dev/random, or contains sensitive information, such as file:///etc/passwd.
You can verify which HTTP calls are being made by using a tool such as tcpdump, or the great HTTPretty library. For example, HTTPretty lets us do the following:
from httpretty import HTTPretty

HTTPretty.enable()
obj = MyObject()
parser = MyContentHandler(obj)
xml.sax.parseString(my_xml, parser)
print(repr(obj))
print(HTTPretty.latest_requests)
HTTPretty.disable()
This will print out the HTTP requests made between HTTPretty.enable() and HTTPretty.disable(), in this case:
HTTPrettyRequest(headers=Host: www.computerworld.com.au
User-Agent: Python-urllib/1.17
, body="")
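The same behaviour can be observed without any network access by pointing the entity at a local file. The following is a minimal sketch written for Python 3. TextCollector and parse_text are hypothetical names, and the temporary file stands in for a sensitive file such as /etc/passwd. The sketch sets feature_external_ges explicitly in both directions rather than relying on the interpreter's default, which differs between Python versions.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import feature_external_ges


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.text = ""

    def characters(self, content):
        self.text += content


def parse_text(xml_bytes, allow_external):
    """Parse xml_bytes with external general entities switched on or off."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setFeature(feature_external_ges, allow_external)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.text


# Stand-in for a sensitive local file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as secret:
    secret.write(b"TOP-SECRET")

# The entity's system identifier is the path of the temporary file.
document = (
    '<!DOCTYPE person [<!ENTITY local SYSTEM "{0}">]>'
    "<person><name>John Smith &local;</name></person>"
).format(secret.name).encode("utf-8")

try:
    leaked = parse_text(document, True)   # entity is resolved
    safe = parse_text(document, False)    # entity is skipped
finally:
    os.remove(secret.name)

print(repr(leaked))
print(repr(safe))
```

With the feature enabled, the contents of the file should show up in the parsed text; with it disabled, the entity reference should simply be skipped.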
Unfortunately, disabling the loading of foreign entities is a little counterintuitive. The main issue we are faced with is that xml.sax.parseString (and its sibling, xml.sax.parse) do not give you the ability to set any options. This means we have to re-implement part of those functions to get access to the settings we want to change. In our case, the code would look as follows:
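import StringIO  # Python 2 module (io.StringIO on Python 3)
import xml.sax.handler

obj = MyObject()
contentParser = MyContentHandler(obj)
parser = xml.sax.make_parser()
parser.setContentHandler(contentParser)
# Do not load external general entities
parser.setFeature(xml.sax.handler.feature_external_ges, False)
parser.parse(StringIO.StringIO(my_xml))
print(repr(obj))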
Remote entity loading is turned off through the setFeature function. With this change in place, running the code no longer reaches out to the network. It also disables the loading of remote DTDs, which is another vector for such attacks, and will probably make parsing faster, as you are no longer fetching DTD files over the network. This is something that people like the W3C will appreciate.
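Since xml.sax.parseString offers no way to pass this setting, one option is to wrap the steps above in a small helper and use it wherever untrusted XML is parsed. This is only a sketch for Python 3: parse_string_safely and NameHandler are hypothetical names, not part of xml.sax, and the entity URL is a deliberately non-existent placeholder.

```python
import io
import xml.sax
from xml.sax.handler import feature_external_ges


def parse_string_safely(xml_bytes, handler):
    """Hypothetical stand-in for xml.sax.parseString that never
    loads external general entities."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setFeature(feature_external_ges, False)
    parser.parse(io.BytesIO(xml_bytes))
    return parser


class NameHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of <name>."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.in_name = False
        self.name = ""

    def startElement(self, name, attrs):
        self.in_name = (name == "name")

    def endElement(self, name):
        self.in_name = False

    def characters(self, content):
        if self.in_name:
            self.name += content


# The entity points at a placeholder address that does not exist;
# with the feature off the parser should not try to fetch it.
untrusted = (
    b'<!DOCTYPE person [<!ENTITY remote SYSTEM '
    b'"file:///nonexistent/xxe-placeholder">]>'
    b"<person><name>John Smith &remote;</name></person>"
)

handler = NameHandler()
parser = parse_string_safely(untrusted, handler)

print(repr(handler.name))
print(parser.getFeature(feature_external_ges))
```

Returning the parser lets the caller confirm the setting with getFeature, which can be handy in a unit test that guards against the feature being switched back on.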