Assignment title: Information
Integration layer on top of two databases
DESCRIPTION
A major step in information integration is to provide access and retrieve information from different
databases. You have already created an Inventory database in a previous homework. From now on we
will use that database but we will refer to it in this assignment with a different name – let us call it local
DB1 and it belongs to a fictitious company you own (MyComp).
MyComp has been doing really well for the past few years and it has merged with another company
(BuddyComp) which already has an inventory database – let us call it local DB2. The local DB2 database
is actually a set of tables and records that is provided to you as a script with SQL commands. You need to
run this script in your Oracle account to create and populate the tables that comprise the local DB2
database. In reality you do not have two separate databases in Oracle; instead, you have several tables
in your Oracle account:
Tables that make up your Inventory database (from HW2) which we now call local DB1
Tables that are generated by the script and make up local DB2
The purpose of this homework is to create an integration/metadata/ontology layer on top of these two
inventory databases, local DB1 and local DB2. Think of the integration layer as the metadata layer which
contains information describing the two databases. This homework is about creating this metadata
Creation of a metadata layer. You need to create a metadata layer to be used for integration of
information from the participating databases. This metadata layer is a table (or a set of couple of tables)
which contain information about the local database schemas (local DB1 and local DB2, which both
reside in your Oracle account. The information in the integration layer contains the following:
1. Canonical representation. This is the name you use to refer to an entity (table name, field
name) at the metadata layer. For example, you may use the name "Client" in local DB1, and
"Patron" in local DB2. However, these are "local names" representing customers at each local
DB. At the metadata layer you may use the "Customer" name to identify either Clients or
Patrons (this is just an example; there could be different field names in the actual databases).
The canonical representation of an entity is the "global" name of that entity at the metadata
layer. It can be used to describe the two corresponding names of the same entity, one in local
DB1, and the other in local DB2. Customer (canonical name); Client (local DB1); Patron (local
DB2)
2. Data Types and other semantic differences. This is a description of the local data types used for
each column name in each DB. For example, varchar2(20), number, etc. You may also store
function that converts data from the canonical representation to local databases.
3. Additional fields. If you want you may use additional information about the correspondences or
translations between canonical to local.
You may use any structure of table(s) that you think it is appropriate to identify information that is
stored as data in the two participating databases (local DB 1 and local DB 2). This is a very important
task since it will provide the basis for the integration. I would suggest that for each concept
(entity/field/etc.) that is present in each local db, you need to use three representations:
1. a canonical representation (global level) for the concept. This is a representation that you use to
identify the concept globally (see first column in the example table below).
2. a local representation for local DB1: it is the name of the column in local DB1 that represents
that same concept. Also the data type of that column in local DB1 (see columns 2 and 3 in
example table below)
3. a local representation for local DB2: it is the name of the column in local DB2 that represents
that same concept. Also the data type of that column in local DB1 (see columns 4 and 5 in
example table below)
See an example table below for a very simple representation of two concepts: a customer (represented
in both DBs) and a product ID (also in both DBs). Note that all data entered in that table are strings
Canonical
Representation
'Customer' 'Client' 'Varchar2(20)' 'Patron' 'Char(50)'
'Product_ID' 'CD_ID' 'Varchar2(10)' 'CD' 'number'
… … … … …
You may need to add more fields and more tables to be able to capture semantic and syntactic
differences that are present in the two local databases. In essence, you need to create another set of
tables (one or more) that will act as the metadata/integration layer on top of DB1, and DB2. You don't
create views on top of the existing tables of local DB1 and local DB2. You need to create actual tables
with metadata about DB1 tables and fields, and DB2 tables and fields. All the above tables are in your
own account (basically, they are just tables in your account).
Column name
in local DB1
Data Type in local
DB1
Column name
in local DB2
How can one create a metadata/integration layer?
Look at the schema of local DB1.
Look at the schema of local DB2.
Identify semantically similar fields and for each field create a column in the metadata table:
Column 1: Provide a name (canonical representation) of that field.
Column 2: Identify the corresponding name (local DB1 name) of the same field in DB1
Column 3: Identify the data type of that field in DB1
Column 4: Identify a function to map the canonical name to the DB2 name (if applicable)
Column 5: Identify the corresponding name (local DB2 name) of the same field in DB2
Column 6: Identify the data type of that field in DB2
Column 7: Identify a function to map the canonical name to the DB2 name (if applicable)
The results from bullet list above are going to be inserted as a single record in the table which
represents the metadata layer. If you need additional fields, (e.g., how to convert from local to
canonical - such as CONCAT (f1, f2), back and forth) feel free to add additional column(s) in your
integration table and populate them accordingly.
Fields that are not present in both databases have NULL values in some of the columns in the
metadata table.
For aggregate fields type the function that puts them together (e.g. CONCAT, or SUM, etc.)
Add table names also in the metadata table (just as you have added fields). The data type for
tables should be the string 'TABLE'
Let me reiterate that this assignment is about metadata. The final output of this assignment is to
make sure that you have enough information in your metadata (integration) layer.
DELIVERABLES
Create a file in MS Word or pdf. This file should contain the following:
1. Populating Local DB2. Run the script that creates and populates the local DB2 database. It
has been posted on Bb, under Homework Assignments. Download it, study the relational
schema and run it on your Oracle account.
Preservation of local schemas. You are NOT allowed to modify the schema of the two
existing databases (local DB1 and local DB2). However, you may insert more data records in
any existing table.
a. [20 points] Insert records in local DB1: 2 CDs of your choice and 2 books of your choice.
b. [30 points] Insert the same records in the schema of local DB2. Submit the INSERT
commands that you have used for a. and b., and also show a screen shot with the new
records.
2. Creation of the integration (metadata) layer
c. [20 points] E-R diagram of your integration layer (this is a very very simple one)
d. [30 points] Screen-shots with queries showing "select * from …" for each table you
created in the metadata (integration) layer (not the local DB2).