Enterprise Information Integration


The relational model introduced by Codd provides a simple logical model with a mathematical foundation based on set-oriented queries. Relational databases evolved over decades, and the use cases for accessing information changed along the way: data marts, reporting, dashboards, security, governance and so on. The RDBMS provided efficient transaction semantics and support. But the world has changed: the global economy became more complex and data semantics became unmanageable. Corporations grew bigger, swallowing the tiny prawns around the globe. There are many database products, such as Oracle, DB2 and SQL Server, and many packaged applications, such as SAP and Siebel, alongside homegrown ones. Data sources are structured, unstructured, semi-structured and, in many cases, exotic. Federated databases, which some called virtual databases, provided a centralized schema that logically maps the schemas of the different databases. They faced challenges in schema integration, query translation and so on.

The use of XML as a standard for data exchange became a boon for all. It brought standards for schema definition (XSD), querying (XQuery) and more. It is not a holy grail, though. The SOA paradigm aims to integrate systems, applications, processes and businesses. As the enterprise information ecosystem has grown into a behemoth, multiple producers and consumers are forced to share data across its domains. The proliferation of data models results in data chaos.

EII decouples data sources from data consumers without moving data. It provides universal data access, facilitating dynamic dashboards and reports. By decoupling application logic from integration, one can easily add or change sources and customize content delivery. It acts as a mediation engine between the source and the client. Standardized data access APIs such as ODBC and JDBC offer a specific set of commands to retrieve and modify data from a generic data source; alternatively, one can use Web services.
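The decoupling described above can be illustrated with a minimal sketch. It is not taken from any EII product: the `Mediator` class, the adapter helpers and the `customer` entity are all hypothetical names, and an in-memory SQLite database plus a plain Python list stand in for real enterprise sources. The point is only that the consumer asks for a logical entity and never learns where the rows live.

```python
import sqlite3
from typing import Callable, Dict, Iterator, List

# A source adapter is any callable that yields rows as dicts on demand.
SourceAdapter = Callable[[], Iterator[dict]]


class Mediator:
    """Hypothetical mediation engine: consumers ask by logical name only."""

    def __init__(self) -> None:
        self._sources: Dict[str, List[SourceAdapter]] = {}

    def register(self, entity: str, adapter: SourceAdapter) -> None:
        self._sources.setdefault(entity, []).append(adapter)

    def fetch(self, entity: str) -> Iterator[dict]:
        # Rows are pulled from each source at request time;
        # nothing is copied into a central store.
        for adapter in self._sources.get(entity, []):
            yield from adapter()


def sqlite_adapter(conn: sqlite3.Connection, sql: str) -> SourceAdapter:
    """Wrap a SQL query so it looks like any other source."""
    def read() -> Iterator[dict]:
        cursor = conn.execute(sql)
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))
    return read


def list_adapter(rows: List[dict]) -> SourceAdapter:
    """Wrap an in-memory list, standing in for a non-relational source."""
    return lambda: iter(rows)


def customer_names(mediator: Mediator) -> List[str]:
    """Consumer code: it knows the logical entity, not the sources."""
    return sorted(row["name"] for row in mediator.fetch("customer"))


# Assumed sample source: a CRM database with one customer.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Acme')")

mediator = Mediator()
mediator.register("customer", sqlite_adapter(crm, "SELECT id, name FROM customers"))
before = customer_names(mediator)

# Adding a second source needs no change to customer_names().
mediator.register("customer", list_adapter([{"id": 2, "name": "Globex"}]))
after = customer_names(mediator)
```

Here `customer_names` stays the same when the second source is registered, which is the property the paragraph attributes to separating application logic from integration.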

Then there are Service Data Objects (SDO). SDO is an object-based programming specification for uniform access to various forms of data. It is designed to complement existing J2EE data access mechanisms, including EJBs, JDBC, and a number of popular frameworks for persistence. It is also intended to support service-oriented architectures (SOA), including Web services. SDO programmers work with data graphs (collections of tree-structured data) that consist of one or more data objects. The data graph structure is well suited to certain types of Web applications, such as those that display master-detail information, and is easily understood by most object-oriented application programmers. It is based on Model Driven Architecture.
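To make the data graph idea concrete, here is a small sketch of a master-detail graph. It is written in Python for brevity and is not the SDO API: `DataObject`, `DataGraph`, `set` and `walk` are hypothetical names, and the change log is only loosely modeled on the change summary that an SDO data graph carries.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional, Tuple


@dataclass
class DataObject:
    """One node in the graph: a typed bag of properties plus child objects."""
    type_name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    children: List["DataObject"] = field(default_factory=list)


@dataclass
class DataGraph:
    """A tree of data objects with a simple log of property changes."""
    root: DataObject
    changes: List[Tuple[str, str, Any, Any]] = field(default_factory=list)

    def set(self, obj: DataObject, name: str, value: Any) -> None:
        # Record (type, property, old value, new value) before updating.
        old = obj.properties.get(name)
        self.changes.append((obj.type_name, name, old, value))
        obj.properties[name] = value

    def walk(self, obj: Optional[DataObject] = None,
             depth: int = 0) -> Iterator[Tuple[int, DataObject]]:
        # Depth-first traversal, yielding each object with its nesting level.
        if obj is None:
            obj = self.root
        yield depth, obj
        for child in obj.children:
            yield from self.walk(child, depth + 1)


# Assumed sample data: an order (master) with two line items (detail).
order = DataObject("Order", {"id": 100, "status": "open"}, [
    DataObject("LineItem", {"sku": "A-1", "qty": 2}),
    DataObject("LineItem", {"sku": "B-7", "qty": 1}),
])
graph = DataGraph(order)
graph.set(order, "status", "shipped")
outline = [(depth, obj.type_name) for depth, obj in graph.walk()]
```

A client can modify the graph while disconnected from the source and hand the recorded changes back later, which is why the structure suits master-detail screens in Web applications.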

An EII system provides a single interface and extensible metadata that can encompass all data sources, speeding the development of applications such as enterprise portals. Development is faster because programmers code against a single virtual data source regardless of how the data is stored. It is like a metadata-based view of the sources. Virtual views optimize the execution of distributed queries. Modeled with metadata in this way, many data sources look like one single entity. Information modeling standards such as XBRL could be used to publish data for reporting purposes. The EII server creates the query for each data source; it is intelligent enough to understand multiple data sources. The data sources are mapped to create the view, i.e. a binding from data to XML to metadata. The metadata is stored in a repository. These reusable integrated logical entities, or data objects, can be exposed as a data service or through JDBC/ODBC.
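A rough sketch of how a metadata-driven virtual view might work is shown below. Everything in it is an assumption made for illustration: the `METADATA` layout, the `build_query` and `query_view` helpers, and the two in-memory SQLite databases standing in for a CRM and a billing system. A real EII server would also handle query optimization, joins and type conversion, none of which is attempted here.

```python
import sqlite3
from typing import Dict, List

# Hypothetical metadata repository: one logical view mapped onto two
# sources that use different table and column names.
METADATA = {
    "customer": {
        "crm": {"table": "clients",
                "columns": {"id": "client_id", "name": "full_name"}},
        "billing": {"table": "accounts",
                    "columns": {"id": "acct_no", "name": "holder"}},
    }
}


def build_query(mapping: dict, attributes: List[str]) -> str:
    """Translate logical attribute names into one source-specific query.

    Names come from trusted metadata, not user input, so plain string
    formatting is acceptable for this sketch.
    """
    select = ", ".join(f'{mapping["columns"][a]} AS {a}' for a in attributes)
    return f'SELECT {select} FROM {mapping["table"]}'


def query_view(view: str, attributes: List[str],
               connections: Dict[str, sqlite3.Connection]) -> List[dict]:
    """Run the generated query on every mapped source and merge the rows."""
    rows: List[dict] = []
    for source, mapping in METADATA[view].items():
        cursor = connections[source].execute(build_query(mapping, attributes))
        rows.extend(dict(zip(attributes, r), source=source) for r in cursor)
    return rows


# Assumed sample sources.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE clients (client_id INTEGER, full_name TEXT)")
crm.execute("INSERT INTO clients VALUES (1, 'Acme')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE accounts (acct_no INTEGER, holder TEXT)")
billing.execute("INSERT INTO accounts VALUES (7, 'Globex')")

result = query_view("customer", ["id", "name"],
                    {"crm": crm, "billing": billing})
```

The caller only names the logical view and its attributes; the per-source SQL is derived from the metadata, so a renamed column or a new source is a change to the repository rather than to application code.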

One of the key issues faced in data integration projects is locating and understanding the data to be integrated. Often, one finds that the data needed for a particular integration application is not even captured in any source in the enterprise, i.e. the developer has to model the domain once again from different sources. Mapping from multiple systems and schemas is a cumbersome process, and human intervention becomes essential, bringing more headaches. Since an EII system also supports writing data, transactions have to be managed across systems, and information retrieval becomes complex when mapping between structured and unstructured data. EII is not a perfect solution, as it is expected to deliver performance and scalability similar to that of an RDBMS. Even when it matures and succeeds, EII will not replace the data warehouse. Depending on the problem to be solved, integrated data will be persisted into a warehouse or virtualized through EII.

References

http://www.ibm.com/developerworks/data/library/techarticle/dm-0407saracco/

http://www.cs.washington.edu/homes/alon/files/eiisigmod05.pdf

http://www.jboss.com/products/platforms/dataservices/

