In this article I will describe how to prevent Saxon from parsing external entities to avoid XXE attacks.
Basically you should be very careful when parsing XML files from untrusted sources. Otherwise this can lead to serious security issues.
XML External Entity Attack
test.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>
If you parse arbitrary XML files from users and do not set up your XML parser correctly, the example above leads to the content of the file "/etc/passwd" being automatically embedded into the parsed XML structure.
Saxon
If your code parses an XML file (test.xml, see above) via Saxon as shown below, the content of "/etc/passwd" is printed out.
// Processor, DocumentBuilder and XdmNode come from net.sf.saxon.s9api
Processor processor = new Processor(false);
DocumentBuilder documentBuilder = processor.newDocumentBuilder();
final XdmNode source = documentBuilder.build(new File("test.xml"));
System.out.println(source.getStringValue());
In other XML parser implementations, there is typically a setter for features, where you can pass in settings like this:
DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
dfactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
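To see the effect of that feature flag in isolation, here is a minimal sketch using only the JDK's built-in JAXP parser. The class name `JaxpXxeDemo`, its helper methods and the temporary "secret" file are made up for this illustration; the temp file stands in for "/etc/passwd" so the demo behaves the same on any machine.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class JaxpXxeDemo {

    // Parses the XML string with external general entities switched off
    // and returns the text content of the root element.
    static String parseWithoutExternalEntities(String xml) throws Exception {
        DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
        dfactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        DocumentBuilder builder = dfactory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Builds the same kind of payload as test.xml, pointing at the given file.
    static String xxePayload(Path target) {
        return "<?xml version=\"1.0\"?>\n"
             + "<!DOCTYPE foo [ <!ELEMENT foo ANY >"
             + " <!ENTITY xxe SYSTEM \"" + target.toUri() + "\" >]>\n"
             + "<foo>&xxe;</foo>";
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a sensitive file such as /etc/passwd.
        Path secret = Files.createTempFile("xxe-demo", ".txt");
        Files.write(secret, "TOP-SECRET".getBytes(StandardCharsets.UTF_8));
        try {
            String text = parseWithoutExternalEntities(xxePayload(secret));
            System.out.println("leaked: " + text.contains("TOP-SECRET"));
        } finally {
            Files.delete(secret);
        }
    }
}
```

With the feature set to false, the parser skips the `&xxe;` reference instead of loading the file it points to, so the secret never ends up in the document.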
In Saxon, this setting is hidden quite well. It has to be passed on to the underlying XML parser by concatenating two strings: Saxon's own key, FeatureKeys.XML_PARSER_FEATURE, and the URI of the parser feature (see below). This took me quite some time to figure out.
// FeatureKeys comes from net.sf.saxon.lib
processor.setConfigurationProperty(
    FeatureKeys.XML_PARSER_FEATURE + "http://xml.org/sax/features/external-general-entities",
    false);
After putting this setting in, the code above no longer prints the content of the included file.
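As far as I understand it, the part of the key after FeatureKeys.XML_PARSER_FEATURE is simply handed to the SAX parser that Saxon uses for source documents. The following sketch shows roughly what that amounts to at the SAX level, using only the JDK's built-in parser and no Saxon at all. The class name `SaxFeatureDemo` and the method `textWithoutExternalEntities` are made up for this illustration, and the sketch is not taken from Saxon's code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
class SaxFeatureDemo {

    // Parses the XML string with a plain SAX XMLReader on which external
    // general entities are disabled, and collects all character data.
    static String textWithoutExternalEntities(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Same feature URI that is appended to XML_PARSER_FEATURE for Saxon.
        reader.setFeature("http://xml.org/sax/features/external-general-entities", false);

        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
                   + "<!DOCTYPE foo [ <!ELEMENT foo ANY >"
                   + " <!ENTITY xxe SYSTEM \"file:///etc/passwd\" >]>\n"
                   + "<foo>&xxe;</foo>";
        // The external entity is skipped, so no file content is collected.
        System.out.println("[" + textWithoutExternalEntities(xml) + "]");
    }
}
```

Seeing it this way also explains why the Saxon key looks so odd: it is a Saxon prefix that says "pass this on to the parser", followed by the feature name the parser itself understands.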
Hope this little article saves time for some of you guys.
If you have any questions or comments, please leave a message below.