Data Integration Wizard
(Last updated: 08-10-2004)
- Project Members
- Goals
- Abstract
- Introduction
- Design
- Implementation
- Experimentation
- Software Requirements
- Hardware Requirements
- Workplan
- Deliverables
- References
- Project Links
Project Members
Goals
The goal of this project is to develop a wizard on top of the existing Data Conversion Tool. The wizard should allow users to find data in different sources and tables and integrate it into new tables without having to be experts in relational databases, the database structure or even the application domain.
Abstract
The Data Integration Wizard is an application that guides the user through the steps of data integration in a much more linear fashion than the Data Conversion Tool. First, the user finds the data (s)he wants using an intelligent keyword search that matches data on its semantic value rather than on literal equivalence. The user is then guided through creating result tables containing the desired relations, without needing to understand relational database design.
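To illustrate what matching on semantic value rather than literal equivalence could look like, the following C++ sketch resolves both the user's keyword and each column name through a small concept table, so that the keyword "phone" finds columns named "Telephone" or "tel_nr". The concept table, its entries and the functions `conceptOf` and `matchColumns` are hypothetical; in the Wizard such knowledge would come from the XML rules described in the Introduction.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical concept table: each (lower-case) synonym maps to a shared
// concept name. In the planned Wizard this would be loaded from XML rules.
using ConceptTable = std::map<std::string, std::string>;

// Lower-case a string so that "Phone" and "phone" compare equal.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Resolve a word to its concept; unknown words stand for themselves.
std::string conceptOf(const ConceptTable& table, const std::string& word) {
    const std::string key = toLower(word);
    auto it = table.find(key);
    return it != table.end() ? it->second : key;
}

// Return the column names whose concept equals the keyword's concept,
// even when the literal strings differ.
std::vector<std::string> matchColumns(const ConceptTable& table,
                                      const std::string& keyword,
                                      const std::vector<std::string>& columns) {
    const std::string wanted = conceptOf(table, keyword);
    std::vector<std::string> hits;
    for (const auto& col : columns) {
        if (conceptOf(table, col) == wanted) hits.push_back(col);
    }
    return hits;
}
```

Because unknown words resolve to themselves, the sketch falls back to literal (case-insensitive) matching whenever no rule applies.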
Introduction
The Data Integration Wizard will build upon the existing Data Conversion Tool, using it primarily as a back end to perform the actual integration step. Parts of its UI code may also be reusable, but this has yet to be determined.
To determine the semantics of both the keywords entered by the user and the existing data in the database, a set of rules is needed. These rules will use as general a syntax as possible, so that they can easily be extended to fit future scenarios, and will be written in XML (eXtensible Markup Language), possibly using parts of XML Schema (XSD).
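As a minimal sketch of what such rules might express for existing data, the C++ below classifies values against datatype rules defined by regular expressions and measures how well a column's sample values fit a rule. The `DataTypeRule` structure and the functions `classify` and `matchRatio` are hypothetical, the rules are hard-coded rather than read from XML, and `std::regex` (C++11) stands in for Boost.Regex so that the example is self-contained; C++Builder 5 itself predates C++11.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical datatype rule: a name plus a pattern that values of that
// type are expected to match in full. In the planned design such rules
// would be read from the XML rule file; here they are built in code.
struct DataTypeRule {
    std::string name;
    std::regex pattern;
};

// Return the names of every rule whose pattern matches the whole value.
std::vector<std::string> classify(const std::vector<DataTypeRule>& rules,
                                  const std::string& value) {
    std::vector<std::string> types;
    for (const auto& rule : rules) {
        if (std::regex_match(value, rule.pattern)) types.push_back(rule.name);
    }
    return types;
}

// Fraction of sample values in a column that match a rule; a high
// fraction suggests the column holds data of that type.
double matchRatio(const DataTypeRule& rule, const std::vector<std::string>& column) {
    if (column.empty()) return 0.0;
    std::size_t hits = 0;
    for (const auto& v : column) {
        if (std::regex_match(v, rule.pattern)) ++hits;
    }
    return static_cast<double>(hits) / column.size();
}
```

A column whose `matchRatio` for, say, a date rule is close to 1 could then be offered when the user searches with a date-related keyword.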
We will also investigate whether the Wizard can update these rules through an adaptive learning process as the user interacts with it.
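One simple way such adaptive learning could work, sketched here purely as an assumption, is to keep a weight per rule and nudge it toward 1 when the user accepts a match produced by that rule and toward 0 when (s)he rejects it (an exponential moving average). The `RuleWeights` class, the neutral starting weight of 0.5 and the default learning rate of 0.2 are all hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical adaptive rule weighting: each user decision moves the rule's
// weight a fixed fraction (the learning rate) of the way toward 1 (accepted)
// or 0 (rejected).
class RuleWeights {
public:
    explicit RuleWeights(double learningRate = 0.2) : rate_(learningRate) {}

    // Current weight of a rule; rules without feedback default to 0.5.
    double weight(const std::string& rule) const {
        auto it = weights_.find(rule);
        return it != weights_.end() ? it->second : 0.5;
    }

    // Record user feedback on a match suggested by `rule`.
    void feedback(const std::string& rule, bool accepted) {
        const double target = accepted ? 1.0 : 0.0;
        const double w = weight(rule);
        weights_[rule] = w + rate_ * (target - w);
    }

private:
    double rate_;
    std::map<std::string, double> weights_;
};
```

A higher learning rate adapts faster to the user, but also lets a few rejections outweigh long-standing evidence for a rule.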
Design
The design of the application will largely depend on the XML formats used. Details about these formats will be added here later.
Implementation
Our implementation will largely be dictated by the structure of the existing Data Conversion Tool (DCT). As such, our development platform will primarily be Borland C++Builder 5. The DCT also uses Microsoft Visual C++ 6, but only for the back end. Since we do not expect to modify the back end, our work should be mainly in C++Builder.
Other implementation options, such as a web application, have been rejected because making the code we need from the DCT available to such an application would take too much time.
We will need an XML parser; the most likely choice is Microsoft XML Core Services 4.0 (MSXML4). We will probably also need a regular expression library, for which Boost is a likely candidate.
Experimentation
None so far.
Software Requirements
- Borland C++Builder 5 with Update 1
- Microsoft Visual C++ 6 with Service Pack 6
- Microsoft XML Core Services 4.0
- Boost C++ Libraries (version 1.31 with the regex_patch)
Hardware Requirements
No special hardware requirements.
Workplan
- 21-04-2004: Preliminary presentation
- 28-05-2004: Completed XML format and wizard design
- 01-06-2004: Completed initial prototype
- 02-06-2004: Final presentation
- End of June: Completion of project
Deliverables
DataTypeXml source code (2004-10-08 snapshot): [zip]
This archive contains all source files related to parsing the DataTypeXml file and matching strings against the datatypes defined in that file. It is a Borland C++Builder 5 project, including a small console-based sample application that demonstrates its usage. MSXML4 and Boost are required to build it.
DataTypeXml sample application: [zip]
This is a compiled version of the sample application described above. The MSXML4 runtime is required to run it.