I do a lot of work for a UK based Job Search Engine called AllTheTopBananas.com. They index job data from lots of different providers and make it available to search in one place. Some of this job data comes into AllTheTopBananas via feeds, mainly in XML. Recently I was talking to one of their providers who uses .Net, who was telling me that they couldn’t provide a complete export of their data in an XML format, because doing so causes their server to run out of memory.
I’ve seen this problem before in a previous job, and in that case it stemmed from using an XMLDocument to build the feed. The XMLDocument keeps the entire XML data set in memory until you specifically write it out. For XML files that are just one chunk of xml for each row from a database, as in the case outlined here, there is a better way – the XMLTextWriter. The XMLTextWriter works in a similar way to a StreamWriter, so it writes the data out to your XML file immediately, which in turn keeps your memory usage down.
Using the XMLTextWriter is a little different in terms of technique to using an XMLDocument, so I’ll outline an example of how to use one.
Dim xtwOutput As System.Xml.XmlTextWriter
Dim sdrJobs As System.Data.SqlClient.SqlDataReader
xtwOutput = New System.Xml.XmlTextWriter("sample.xml", Text.Encoding.GetEncoding("iso-8859-1"))
'Configure the XMLTextWriter to format the XML nicely, and write the root element's open tag
With xtwOutput
.Formatting = System.Xml.Formatting.Indented
.Indentation = 1
.IndentChar = ControlChars.Tab
.WriteStartDocument()
.WriteStartElement("jobs")
End With
sdrJobs = GetJobsForExport()
While sdrJobs.Read
'Write the <job> element and its children
With xtwOutput
.WriteStartElement("job")
.WriteElementString("title", sdrJobs("title").ToString())
.WriteElementString("description", sdrJobs("description").ToString())
.WriteElementString("location", sdrJobs("location").ToString())
.WriteElementString("salary", sdrJobs("salary").ToString())
.WriteEndElement() '</job>
End With
End While
sdrJobs.Close()
'End the document - this will close any open elements
With xtwOutput
.WriteEndDocument()
End With
xtwOutput.Close()
In the code above, the first thing I do with the XMLTextWriter (line 4) is create a new one, and initialise the encoding to “iso-8859-1″. Setting the encoding is an important step that shouldn’t be overlooked. Consumers of your xml file expect there to be an xml declaration at the top of the file to tell them the encoding of the file. This must match the actual encoding of the file. It is also important to choose an encoding that is able to represent the characters you intend to put in the file – pound signs can sometimes cause problems, for example (That’s pound as in GBP, not #…). I usually use “iso-8859-1″ as it seems to give me the best results, though “utf-8″ is another encoding that seems to work in most cases.
I then set up the XMLTextWriter to indent my output (lines 7-14). This isn’t strictly necessary, but it makes the file much more human-readable. I also write the xml declaration (.WriteStartDocument()) and the root node of the xml file.
I then call a function that returns a SQLDataReader (line 16). I’ve not included the code that actually instantiates the SQLDataReader as that’s not part of the point I’m trying to illustrate.
Within the While sdrJobs.Read…End While (lines 18-30) I output a single job element, along with its children, to the xml file. See below for an example of the output file.
Finally, once the While loop exits I close the SQLDataReader, call WriteEndDocument on the XMLTextWriter to close any open elements and close the XMLTextWriter itself.
The xml output from the code above looks like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<jobs>
<job>
<title> Test Job Title 1 </title>
<description> Test Description 1</description>
<salary> £10,000 pa </salary>
<location> Test Location 1 </location>
</job>
<job>
<title> Test Job Title 2 </title>
<description> Test Description 2</description>
<salary> £10,000 pa </salary>
<location> Test Location 2 </location>
</job>
<job>
<title> Test Job Title 3 </title>
<description> Test Description 3</description>
<salary> £10,000 pa </salary>
<location> Test Location 3 </location>
</job>
</jobs>
As you can see, the formatting I asked for has been carried out, and we have a nice readable xml file. So how much memory did I use to generate it?
Running the code above on a test database where the SQLDataReader returns 50,000 rows, the process uses less than 15MB of memory at its peak, and outputs a 75MB xml file. By contrast, loading the resulting XML file into an XMLDocument uses over 150MB – over 10 times as much memory.
Hopefully that explains how an XMLTextWriter can be used to output large amounts of xml much more efficiently than an XMLDocument is capable of. If you found this article useful, for my next post I’m intending to show what I think is a neat way of combining the XMLTextReader and XMLDocument to read our large xml file back in and parse it.