Assignment title: Information
DS 715 – Assignment#4
Performing OLAP with SQL
Objectives:
a. Demonstrate use of SQL to perform OLAP operations;
Estimated time to complete: 40 min reading +45-50 min. solve queries
Points: 25/100.
Tasks:
In this assignment we are going to use the SQL ROLLUP and CUBE
operators to generate and store a three dimensional cube with hierarchies
based on the tables in the database given by the description below:
Tb_Supplier(Supp_ID, Name, City, State)
Tb_Consumer(Con_ID, Name, City, State)
Tb_Product(Prod_ID, Name, Product_Category, Product_Line, Product_Packaging)
Tb_Offers(Supp_ID, Prod_ID, Quantity, Price)
Tb_Requests(Con_ID, Prod_ID, Quantity, Price)
Tb_Transactions(Tran_ID, Supp_ID, Con_ID, Prod_ID, Quantity, Price)
and represented by the diagram:
DS 715 – Assignment#4 Page 2
The stored cube will then be used to efficiently solve SQL OLAP queries.
The ROLLUP Operator
ROLLUP and CUBE are SQL extension operators one would place in the
GROUP BY clause of an SQL SELECT statement in order to compute and
add aggregates to the result set returned by the SELECT statement.
The ROLLUP operator is applied to a set of columns that represent a
concept hierarchy and performs a rollup on one or more levels of the
hierarchy.
For example run the following query against your database:
SELECT State, COUNT(Supp_ID) "#Suppliers"
FROM Tb_Supplier
GROUP BY ROLLUP(State)
and compare the result with what is returned by the following simple
GROUP BY query:
SELECT State, COUNT(Supp_ID) "#Suppliers"
FROM Tb_Supplier
GROUP BY State
Carefully comparing the results of the two queries reveals that the ROLLUP
query adds an extra row to the result, and that row represents the rollup of
the numbers of suppliers per state over all states in the database. This is an
example of rolling up by climbing a hierarchy.
Note that the extra row has a NULL value on the column representing the
state, i.e. the rolled-up column.
As another, very similar example, compare:
SELECT City, COUNT(Supp_ID) "#Suppliers"
FROM Tb_Supplier
GROUP BY ROLLUP(City)
with
SELECT City, COUNT(Supp_ID) "#Suppliers"
FROM Tb_Supplier
GROUP BY City
DS 715 – Assignment#4 Page 3
For an example that shows the effect of climbing two levels in a hierarchy
compare the result of this query:
SELECT State, City, COUNT(Supp_ID) "#Suppliers"
FROM Tb_Supplier
GROUP BY ROLLUP(State, City)
with
SELECT State, City, COUNT(Supp_ID) "#Suppliers"
FROM Tb_Supplier
GROUP BY State, City
One can notice that the ROLLUP adds two types of rows to the result of the
simple GROUP BY query:
- Rows with City=NULL and State a state name where the last column
represents the number of suppliers in each state – rolling-up cities by
state;
- One row with City=NULL and State=NULL where the last column
represents the total number of suppliers – rolling-up states.
Note that in the ROLLUP parameters list the columns are listed in the
descending order of the concept hierarchy (State > City).
Actually, running the query below will return a meaningless result:
SELECT State, City, COUNT(Supp_ID) "#Suppliers"
FROM Tb_Supplier
GROUP BY ROLLUP(City, State)
The CUBE Operator
The CUBE operator is applied on a number of attributes called the cube
dimensions and calculates aggregates on all possible subsets of its
dimensions. This is an example of rollups by dimension reductions.
For example a cube on the supplier, consumer and product dimensions is the
equivalent of the union of the results from 8 different GROUP BY queries.
Run the following query and carefully observe the result it returns:
DS 715 – Assignment#4 Page 4
SELECT DISTINCT S.Name "Supplier Name",
C.Name "Consumer Name",
P.Name "Product Name",
SUM(Quantity) "Total Transactions Quantity",
COUNT(Tran_ID) "Number of Transactions"
FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T
WHERE S.Supp_ID=T.Supp_ID AND
C.Con_ID=T.Con_ID AND
P.Prod_ID=T.Prod_ID
GROUP BY CUBE(S.Name, C.Name, P.Name)
On your sample database this cube returns a total of 396 rows.
There are 8 kinds of rows in this result, and each kind represents a cuboid:
- One row where "Supplier Name"=NULL, "Consumer Name"=NULL,
"Product Name"=NULL, and the last two columns contains the
grand totals of their corresponding aggregates – this is a rollup by
reducing dimensions: supplier, consumer and product and represents
the apex node in the lattice of cuboids;
- Rows where "Supplier Name"=NULL, "Consumer Name"=NULL,
while "Product Name" has the name of a product and the last two
columns represent aggregates over each product – these rows
represent rollup by reducing dimensions: supplier and consumer;
- . . .
- Rows where "Supplier Name", "Consumer Name", and "Product
Name" have all non-null values – this is similar to a GROUP BY
query on the 3 dimension columns, and represent the base node in the
lattice of cuboids, i.e. zero dimensions are reduced.
If we want a group of columns to be handled as one single column by the
CUBE operator we would place those columns within another set of
parenthesis.
In the query example below the group of S.State, S.City, S.Name columns,
and group of C.State, C.City, C.Name columns are handled as one single
column each.
The query produces a result with 433 rows of the same 8 types as the first
CUBE query.
Run the following query and carefully observe the result it returns:
DS 715 – Assignment#4 Page 5
SELECT DISTINCT S.Name "Supplier Name",
S.City "Supplier City",
S.State "Supplier State",
C.Name "Consumer Name",
C.City "Consumer City",
C.State "Consumer State",
P.Name "Product Name",
SUM(Quantity) "Total Transactions Quantity",
COUNT(Tran_ID) "Number of Transactions"
FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T
WHERE S.Supp_ID=T.Supp_ID AND
C.Con_ID=T.Con_ID AND
P.Prod_ID=T.Prod_ID
GROUP BY CUBE((S.State, S.City, S.Name),
(C.State, C.City, C.Name),
P.Name)
Notice that in all rows where "Supplier Name” or "Supplier City" or
"Supplier State" is NULL the other two will also be NULL.
Same is true for "Consumer Name", "Consumer City" and "Consumer State".
After all this is still a 3 dimensional cube.
Indeed this cube contains rollups on supplier state and consumer state, but
not on supplier or consumer name or city.
Cubes with Hierarchies
The columns "Supplier Name", "Supplier City" and "Supplier State",
and "Consumer Name" with "Consumer City" and "Consumer State",
represent two concept hierarchies in the supplier and consumer dimensions.
Adding the missing rollups to the cube above is straightforward and can be
done by adding ROLLUP operator for the two concept hierarchies to the
cube calculation as below (notice the parameter attributes are listed in the
descending order of the concept hierarchy – higher level concepts first):
DS 715 – Assignment#4 Page 6
SELECT DISTINCT S.Name "Supplier Name",
S.City "Supplier City",
S.State "Supplier State",
C.Name "Consumer Name",
C.City "Consumer City",
C.State "Consumer State",
P.Name "Product Name",
SUM(Quantity) "Total Transactions Quantity",
COUNT(Tran_ID) "Number of Transactions"
FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T
WHERE S.Supp_ID=T.Supp_ID AND
C.Con_ID=T.Con_ID AND
P.Prod_ID=T.Prod_ID
GROUP BY CUBE((S.State, S.City, S.Name),
(C.State, C.City, C.Name),
P.Name),
ROLLUP(S.State, S.City, S.Name),
ROLLUP(C.State, C.City, C.Name)
The result is a 3 dimensional cube with two 3-level hierarchies producing on
your database a total of 1414 rows which can be placed in 32 cuboids
(4*4*2=32).
Notice we will have rows where:
"Supplier Name" "Supplier City" "Supplier State"
NULL - - --rollup supplier name
NULL NULL - --rollup supplier name
and city
and
"Consumer Name" "Consumer City" "Consumer State"
NULL - - --rollup consumer name
NULL NULL - --rollup consumer name
and city
and all their combinations, but never any of the patterns below:
DS 715 – Assignment#4 Page 7
"Supp/Cons Name" "Supp/Cons City" "Supp/Cons State"
NULL - NULL
- NULL NULL
- - NULL
Notice that calculating a cube on all the attributes, as in:
GROUP BY cube(S.State, S.City, S.Name, C.State, C.City, C.Name, P.Name)
would generate a result with 9108 rows on our sample database.
This 7 dimensional cube would contain many meaningless rows and would
waste a lot of space while slowing down queries on the cube.
Cube Queries
Most OLAP queries are retrieving the rows of one single cuboid in the
multi-dimensional cube or a sub-set of rows from one cuboid if slice and
dice operations are involved.
Given a pre-calculated cube the rows of each cuboid are easily identified by
a specific combination of NULL values within their columns.
Therefore, one can easily select the rows of a cuboid or any subset of it by a
simple SELECT-FROM –WHERE statement, which is much more efficient
than a multi-join GROUP BY query needed to extract the same rows directly
from the data warehouse tables.
To calculate and store the sample cube with hierarchies from the previous
section simply use the statement:
SELECT DISTINCT S.Name "Supplier Name",
S.City "Supplier City",
S.State "Supplier State",
C.Name "Consumer Name",
C.City "Consumer City",
C.State "Consumer State",
P.Name "Product Name",
SUM(Quantity) "Total Transactions Quantity",
COUNT(Tran_ID) "Number of Transactions"
INTO Tb_Transactions_Cube
FROM Tb_Supplier S, Tb_Consumer C, Tb_Product P, Tb_Transactions T
WHERE S.Supp_ID=T.Supp_ID AND
DS 715 – Assignment#4 Page 8
C.Con_ID=T.Con_ID AND
P.Prod_ID=T.Prod_ID
GROUP BY CUBE((S.State, S.City, S.Name),
(C.State, C.City, C.Name),
P.Name),
ROLLUP(S.State, S.City, S.Name),
ROLLUP(C.State, C.City, C.Name)
which creates the Tb_Transactions_Cube table and populates with the rows
of the cube.
It is now easy to write queries against this cube.
The grand total aggregates, i.e. the apex cuboid, is in the one row where all
columns except the aggregate ones are NULL:
--apex cuboid
SELECT *
FROM Tb_Transactions_Cube
WHERE "Supplier Name" IS NULL
AND "Supplier City" IS NULL
AND "Supplier State" IS NULL
AND "Consumer Name" IS NULL
AND "Consumer City" IS NULL
AND "Consumer State" IS NULL
AND "Product Name" IS NULL
while the base cuboid, which details aggregates by all combinations of
consumers, suppliers and products, is returned by:
--base
SELECT *
FROM Tb_Transactions_Cube
WHERE "Supplier Name" IS NOT NULL
AND "Supplier City" IS NOT NULL
AND "Supplier State" IS NOT NULL
AND "Consumer Name" IS NOT NULL
AND "Consumer City" IS NOT NULL
AND "Consumer State" IS NOT NULL
AND "Product Name" IS NOT NULL
DS 715 – Assignment#4 Page 9
The aggregates by products are given by the rows where product name is not
NULL:
--aggregates by products - rollup supplier and consumer dimensions,
--hierarchies included
SELECT *
FROM Tb_Transactions_Cube
WHERE "Supplier Name" IS NULL
AND "Supplier City" IS NULL
AND "Supplier State" IS NULL
AND "Consumer Name" IS NULL
AND "Consumer City" IS NULL
AND "Consumer State" IS NULL
AND "Product Name" IS NOT NULL
The following query will return nothing, because the configuration below
violates the hierarchies defined in the Tb_Transactions_Cube (e.g. state
cannot be NULL where name is not NULL):
--impossible or non-sense
SELECT *
FROM Tb_Transactions_Cube
WHERE "Supplier Name" IS NOT NULL
AND "Supplier City" IS NOT NULL
AND "Supplier State" IS NULL
AND "Consumer Name" IS NOT NULL
AND "Consumer City" IS NOT NULL
AND "Consumer State" IS NULL
AND "Product Name" IS NOT NULL
Other sample queries:
--aggregates by products sold to consumers in Wisconsin?
SELECT "Product Name", [Total Transactions Quantity],
[Number of Transactions]
FROM Tb_Transactions_Cube
WHERE "Supplier Name" IS NULL
AND "Supplier City" IS NULL
AND "Supplier State" IS NULL
DS 715 – Assignment#4 Page 10
AND "Consumer Name" IS NULL
AND "Consumer City" IS NULL
AND "Consumer State" = 'Wisconsin'
AND "Product Name" IS NOT NULL
--Quantity of milk sold by each supplier from Wisconsin?
--careful [Supplier City] has to be NOT NULL
SELECT [Supplier Name], [Total Transactions Quantity]
FROM Tb_Transactions_Cube
WHERE "Supplier Name" IS NOT NULL
AND "Supplier City" IS NOT NULL
AND "Supplier State"='Wisconsin'
AND "Consumer Name" IS NULL
AND "Consumer City" IS NULL
AND "Consumer State" IS NULL
AND "Product Name" = 'Milk'
--Quantity of milk sold by each supplier from Wisconsin in the state of
--Ilinois?
--careful [Supplier City] has to be NOT NULL
SELECT [Supplier Name], [Total Transactions Quantity]
FROM Tb_Transactions_Cube
WHERE "Supplier Name" IS NOT NULL
AND "Supplier City" IS NOT NULL
AND "Supplier State" = 'Wisconsin'