Task at hand:
Provide a quick and safe way for mapping the data model across few (6-7) medium size SQL databases (each having 20 - 100 tables, all together few billion records).
This work is part of replacing a suite of legacy apps with a new fancy one. All of them are pushing a large and quite complex stream of data into an ETL process aimed to maintain a large data mart.
First step into this adventure is to be able to generate some (detailed) data dictionaries. They should be used as support in defining the data migration and data flows and hould be understood by both non-technical business stakeholders as well as not so data wise developers.
This work is part of replacing a suite of legacy apps with a new fancy one. All of them are pushing a large and quite complex stream of data into an ETL process aimed to maintain a large data mart.
First step into this adventure is to be able to generate some (detailed) data dictionaries. They should be used as support in defining the data migration and data flows and hould be understood by both non-technical business stakeholders as well as not so data wise developers.
Because there are few distinct teams working together, it is also desired to have a safe mechanism for generating these data dictionaries on an ongoing base. It would be great if changes to data dictionaries could be easily deployed and promoted across various testing and staging environments.
A significant constraint is to be able to provide a first draft of the data dictionaries with a first attempt of cross-domain mapping in less then 4 days. I have to use free/ open source tools.
Obviously, there is not too much time to read all these great books related to Data Dictionaries, metadata management and other cool metaphysics
.
(Cheap) Solutions:
Used long time ago the precursor of Microsoft Visio: Visio Modeler.
The tool is really interesting, especially in area of ORM (Object Role Modeling), but it suffers of few major pitfalls:
A significant constraint is to be able to provide a first draft of the data dictionaries with a first attempt of cross-domain mapping in less then 4 days. I have to use free/ open source tools.
Obviously, there is not too much time to read all these great books related to Data Dictionaries, metadata management and other cool metaphysics
(Cheap) Solutions:
Used long time ago the precursor of Microsoft Visio: Visio Modeler.
The tool is really interesting, especially in area of ORM (Object Role Modeling), but it suffers of few major pitfalls:
- Is (very) old and therefore does not work nicely in extracting physical catalogs from a SQL 2000 database (it has issues in retrieving primary and foreign keys).
- Most important, even the data dictionary tools are quite impressive, there is still the issue of synchronizing back with the real database.
Then I found a very nice little tool called Dictionary Data Creator
It is not by far as sophisticated as VisioModeler, but it has some great advantages:
- It works (slow export and import)
- Provides ability of defining your own additional properties for column descriptions
- It is applied direct on the database schema
- You can export and import .sql and .xml data dictionaries on other database deployments
- The interface is not very nice (I dream a graphic environment such as Visio or EM Diagram Editor
- The width of various columns/ properties cannot be changed
- The height of rows does not scale with the volume of text
- The tool allows Excel export but not the import of the same format. Importing Excel will be a Holly Grail for many non geeky business analysts.
![Reblog this post [with Zemanta]](http://img.zemanta.com/reblog_e.png?x-id=53847912-1203-41c0-a97d-452326d9386e)