Designing the Data Layer

By Sven Groot

One of the first things we realised is that a program that looks up tunes in a database is going to need a database. By going with a traditional three-tiered approach, we made our data layer (the third tier, or back end) interchangeable. This is important because, while we don't foresee large amounts of data during our own development and testing, a much larger data set could be needed at some point in the future. This approach allows us to write a relatively simple data layer, which doesn't need a database server, ODBC drivers, a DSN, and whatever else, while still allowing a more scalable solution to be developed at a later date.

The solution we chose for our own "database" is to use XML files. XML files are easy to create, maintain and program against, and I have a large amount of experience in doing so. Our platform of choice would be Microsoft XML Core Services 4.0 (MSXML).

What I then envisioned is this: a system that would allow the user (the programmer using the data layer) to access the songs in sequence using the iterator paradigm defined by the C++ Standard Template Library. This is sufficient, because we will always use a forward-only scan through the database, computing the distance between the Parsons Code the user provided and the Parsons Code in the database for each entry. For performance reasons, I wanted to scan through the XML files, reading and parsing elements only when they were requested by the iterator class. This is a highly scalable approach, as it never needs to hold more than a single song's data (Parsons Code, title, etc.) in memory. I knew that such functionality was readily available in MSXML. Or so I thought!
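To make the forward-only scan concrete, here is a minimal sketch of a search that touches each song exactly once through a pair of iterators. The SONG field names, the find_closest function and the use of Levenshtein edit distance are assumptions for illustration only; the text above does not say which distance measure the project actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type; the field names are assumptions.
struct SONG {
    std::string title;
    std::string parsons;  // Parsons Code of the tune
};

// Levenshtein edit distance, used here only as a stand-in metric.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}

// Single forward pass: only *it, ++it and != are used, so any input
// iterator over SONG values is enough.
template <typename InputIt>
SONG find_closest(InputIt first, InputIt last, const std::string& query) {
    assert(first != last && "the song source must not be empty");
    SONG best = *first;
    std::size_t best_dist = edit_distance(query, best.parsons);
    for (++first; first != last; ++first) {
        const SONG& candidate = *first;
        std::size_t d = edit_distance(query, candidate.parsons);
        if (d < best_dist) {
            best_dist = d;
            best = candidate;
        }
    }
    return best;
}
```

Because the loop never steps backwards and never revisits an element, it would work equally well over a lazily parsed file as over an in-memory container.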

As it turns out, I was confusing a few things. What would be needed for my approach is a pull-model reader that parses individual elements based on calls by the user, keeps no context (it doesn't build an internal tree) and doesn't parse ahead in the file. Such a parser exists, and it is even written by Microsoft. But it's implemented in the System.Xml.XmlTextReader class from the Microsoft .Net Framework, not in MSXML!

That left two alternatives: using a different parser than MSXML, or pre-loading the entire XML file. Because I didn't feel like learning the ins and outs of a new parser, I decided on the latter. Because creating a DOM (Document Object Model) wastes a lot of memory, I decided instead to rely on SAX (Simple API for XML), specifically the SAX2 implementation provided by MSXML4. SAX defines a push-model (event-based) reader that keeps no context. It is in fact quite like what I originally wanted, but because it is a push-model parser, it can't pause and wait for external input before it continues reading; it has to process the whole document at once.

By handling SAX events, I then fill a std::list with SONG objects. This has the disadvantage that you have to keep all songs in memory, but because far less data is stored than for a DOM, it is still more acceptable than using the XMLDOMDocument class.
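The event handling described above can be sketched roughly as follows. This is not the MSXML ISAXContentHandler interface: the class SongListBuilder, its simplified method signatures, the SONG fields and the element names "song", "title" and "parsons" are all assumptions chosen for illustration. The method names merely mirror the usual SAX2 events.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical record type; the field names are assumptions.
struct SONG {
    std::string title;
    std::string parsons;
};

// Simplified stand-in for a SAX2 content handler. A real handler would
// receive these events from the parser while it pushes through the file.
class SongListBuilder {
public:
    void startElement(const std::string& name) {
        if (name == "song") {      // assumed element name
            current_ = SONG();
            in_song_ = true;
        }
        text_.clear();
    }

    // SAX may deliver an element's text in several chunks, so accumulate.
    void characters(const std::string& chunk) { text_ += chunk; }

    void endElement(const std::string& name) {
        if (name == "title") {
            current_.title = text_;
        } else if (name == "parsons") {
            current_.parsons = text_;
        } else if (name == "song") {
            assert(in_song_ && "</song> without a matching <song>");
            songs_.push_back(current_);  // one complete SONG per element
            in_song_ = false;
        }
        text_.clear();
    }

    const std::list<SONG>& songs() const { return songs_; }

private:
    SONG current_;
    std::string text_;
    bool in_song_ = false;
    std::list<SONG> songs_;
};
```

Only the song currently being parsed and the finished list are kept; no element tree is built, which is where the memory saving over a DOM comes from.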

Of course, a std::list defines a bidirectional iterator, while my original approach would only have allowed an input iterator. I have deliberately chosen not to rely on this more flexible iterator type myself, because our search algorithm doesn't need it, and because restricting ourselves to input-iterator operations leaves open the possibility of adding a different data layer (for instance, one developed with Microsoft .Net using the aforementioned XmlTextReader class) that could only offer that weaker iterator.
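One way to enforce that restriction, rather than merely follow it by convention, is to hide the list iterator behind a thin wrapper that exposes only input-iterator operations. The sketch below assumes hypothetical names (SongIterator, SongDatabase, first_title) and SONG fields; it is not taken from the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Hypothetical record type; the field names are assumptions.
struct SONG {
    std::string title;
    std::string parsons;
};

// Wraps std::list's bidirectional iterator but offers no operator--,
// so callers cannot come to depend on stepping backwards.
class SongIterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef SONG value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const SONG* pointer;
    typedef const SONG& reference;

    explicit SongIterator(std::list<SONG>::const_iterator pos) : pos_(pos) {}

    reference operator*() const { return *pos_; }
    pointer operator->() const { return &*pos_; }
    SongIterator& operator++() { ++pos_; return *this; }
    SongIterator operator++(int) { SongIterator old(*this); ++pos_; return old; }
    bool operator==(const SongIterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const SongIterator& o) const { return pos_ != o.pos_; }

private:
    std::list<SONG>::const_iterator pos_;
};

// Hypothetical data-layer facade: the std::list is an implementation detail.
class SongDatabase {
public:
    explicit SongDatabase(const std::list<SONG>& songs) : songs_(songs) {}
    SongIterator begin() const { return SongIterator(songs_.begin()); }
    SongIterator end() const { return SongIterator(songs_.end()); }

private:
    std::list<SONG> songs_;
};

// Example of the only access pattern callers may rely on: read, then advance.
const std::string& first_title(const SongDatabase& db) {
    assert(db.begin() != db.end() && "database must not be empty");
    return db.begin()->title;
}
```

With this in place, a replacement data layer only has to supply the same handful of operations, and code written against SongIterator would not need to change.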