lxml etree.parse() gives "Document is empty" when using file input

StephenMarwick
Explorer

I'm writing a Django application within Splunk. Part of this is validating the XML that my form generates.

When I try to validate the form using lxml, the etree.parse() function raises an exception that is displayed in the browser:

Error: XMLSyntaxError at /en-gb/<app>/<form_url>/ Document is empty, line 1, column 1

My unit tests work fine from outside of Splunk.

My validator code:

from lxml import etree
class XmlValidator():
    def __init__(self, xsd_filename):
        xmlschemadoc = etree.parse(xsd_filename)       # this line fails
        self.xmlschema = etree.XMLSchema(xmlschemadoc)

My form code:

from django import forms
from XmlValidator import XmlValidator
class MyForm(forms.Form):
    def __init__(self, schema_filename):
        self.xsd_filename = schema_filename
    <snip>
    def clean(self):
        validator = XmlValidator(self.xsd_filename)
        <snip>

I've added instrumentation to confirm that the file is valid (absolute path, exists, correct permissions, etc.).
I can open the file from within the code, read its content and parse that content; only parsing the file directly by name fails. However, I need to pass in the filename so that the schemas can include other schemas - this is only supported through the file interface.
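
For anyone wanting to reproduce those checks, here is a minimal sketch of the kind of instrumentation described above. It is not the original code: the function name diagnose_xml_file and the keys of the returned dictionary are hypothetical, chosen only for illustration.

```python
import os
from lxml import etree


def diagnose_xml_file(path):
    """Collect basic facts about an XML file before handing it to lxml.

    Hypothetical helper: the name and the report keys are illustrative.
    """
    report = {
        "is_absolute": os.path.isabs(path),
        "exists": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
        "size": None,
        "parses_from_string": False,
    }
    if report["exists"] and report["readable"]:
        # Read the raw bytes ourselves, then parse them in memory. This is
        # the path that worked in the situation described above.
        with open(path, "rb") as handle:
            content = handle.read()
        report["size"] = len(content)
        try:
            etree.fromstring(content)
            report["parses_from_string"] = True
        except etree.XMLSyntaxError:
            pass
    return report
```

If every field looks healthy and etree.parse(path) still reports "Document is empty", the file itself is probably not at fault.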

I'm using:

  • Splunk 6.1, which comes with:
    • Django 1.5.5
    • Python 2.7.5
    • lxml 2.3.2

Any suggestions appreciated.

1 Solution

StephenMarwick
Explorer

The problem is in the parser.

There must be some compatibility issue, because the supplied ETCompatXMLParser works where the default XMLParser doesn't. (I am also using ElementTree within my application, though I don't know whether that is related to the problem.)

Old code:

xmlschemadoc = etree.parse(xsd_filename)       # this line fails

New code:

parser = etree.ETCompatXMLParser()
xmlschemadoc = etree.parse(xsd_filename, parser)       # this line now works
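
Putting the workaround back into the validator class, a minimal sketch might look like the following. The is_valid method is a hypothetical addition for illustration (the original class only shows __init__), and this assumes the schema file can be read from disk by filename.

```python
from lxml import etree


class XmlValidator(object):
    def __init__(self, xsd_filename):
        # Use the ElementTree-compatible parser instead of the default one.
        parser = etree.ETCompatXMLParser()
        # Parsing by filename keeps the schema's location, so relative
        # includes of other schemas can still be resolved.
        xmlschemadoc = etree.parse(xsd_filename, parser)
        self.xmlschema = etree.XMLSchema(xmlschemadoc)

    def is_valid(self, xml_text):
        """Hypothetical helper: True if xml_text parses and matches the schema."""
        try:
            document = etree.fromstring(xml_text)
        except etree.XMLSyntaxError:
            return False
        return self.xmlschema.validate(document)
```

Usage would be along the lines of XmlValidator(xsd_filename).is_valid(xml_text) from the form's clean() method.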

akheraj_splunk
Splunk Employee

Nice find! This had me stumped for a while before I came across your solution.
