Assignment title: Management
SIT718 Real World Analytics
Assignment 2
Using aggregation functions for data analysis
Total marks: 40. Weighting: 15%
Due date: 2nd February 2017, 11.30 pm
Building - Energy Efficiency Dataset
In order to design energy-efficient buildings, the heating load (HL) and the
cooling load (CL) must be computed to determine the specifications of the
heating and cooling equipment needed to maintain comfortable indoor air
conditions. Building energy simulation tools are widely used to analyse or forecast
building energy consumption. The dataset provides an energy analysis of the heating
load (denoted y1) and the cooling load (denoted y2) for 768 building
shapes simulated using a building simulator. The dataset comprises
8 features (variables), denoted X1, X2, X3, ..., X8. The details
of these variables are given below:
X1: Relative compactness in percentage (expressed as a decimal). A measure
of building compactness; a high value means highly compact.
X2: Surface area in square meters
X3: Wall area in square meters
X4: Roof area in square meters
X5: Overall height in meters
X6: Orientation. Categorical; see below for description
X7: Glazing area. Categorical; see below for description
X8: Glazing area distribution. Categorical; see below for description
y1: Heating load in kWh/m² per annum
y2: Cooling load in kWh/m² per annum
Three glazing areas (variable X7) are used, expressed as percentages of the floor
area: 10%, 25% and 40%. Five different distribution scenarios were simulated for
each glazing area (variable X8): (1) uniform, with 25% glazing on each side;
(2) north, with 55% on the north side and 15% on each of the other sides;
(3) east, with 55% on the east side and 15% on each of the other sides;
(4) south, with 55% on the south side and 15% on each of the other sides; and
(5) west, with 55% on the west side and 15% on each of the other sides. In addition,
samples with no glazing area (0%) were also obtained. Orientation (variable X6)
takes the values north (2), east (3), south (4) and west (5).
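For plots and tables it can help to turn the numeric codes of X6 and X8 into labelled factors. The sketch below is a hypothetical helper, not part of the assignment files. It maps only the codes listed above; any other code (for example, whichever value marks the no-glazing samples, which this description does not state) becomes NA.

```r
# Hypothetical helpers: map the categorical codes described above to labels.
# Only codes listed in the description are mapped; anything else becomes NA.
orientation.labels <- c("2" = "north", "3" = "east", "4" = "south", "5" = "west")
distribution.labels <- c("1" = "uniform", "2" = "north", "3" = "east",
                         "4" = "south", "5" = "west")

decode_codes <- function(x, labels) {
  # Look each code up by name; unmatched codes give NA
  factor(unname(labels[as.character(x)]), levels = unname(labels))
}

decode_orientation <- function(x) decode_codes(x, orientation.labels)
decode_distribution <- function(x) decode_codes(x, distribution.labels)
```

With a factor on the x-axis, R's `plot()` draws boxplots, e.g. `plot(decode_orientation(my.data[, 6]), my.data[, 9])`, which can read more easily than a scatter plot for a variable with only four values.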
Assignment tasks
1. Understand the data [5 marks]
(i) Download the txt file (ENB2012dataTextFile.txt) from CloudDeakin
and save it to your R working directory.
(ii) Assign the data to a matrix, e.g. using
the.data <- as.matrix(read.table("ENB2012dataTextFile.txt"))
(iii) Decide whether you would like to focus on the heating load (y1) or the cooling
load (y2). This is your variable of interest. Generate a subset of 300 rows,
e.g. using:
(to investigate the heating load:)
my.data <- the.data[sample(1:768,300),c(1:8,9)]
(to investigate the cooling load:)
my.data <- the.data[sample(1:768,300),c(1:8,10)]
(iv) Using scatter plots and histograms, report on the general relationship
between each of the variables and your variable of interest (the heating
load or cooling load). (Include a plot and 1 or 2 sentences for each of the
8 variables.)
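One possible way to produce the plots for step (iv) is sketched below. It assumes `my.data` is the 300-row matrix from step (iii), with columns 1-8 holding X1-X8 and column 9 the chosen load; the function name `explore_data` is hypothetical. Pearson correlations are returned as a rough numeric summary to quote alongside each plot.

```r
# Hypothetical helper for step (iv): scatter plots of each feature against the
# variable of interest, histograms, and Pearson correlations with the target.
explore_data <- function(my.data, target.col = 9) {
  y <- my.data[, target.col]
  feature.cols <- setdiff(seq_len(ncol(my.data)), target.col)
  old.par <- par(mfrow = c(3, 3))  # page 1: 8 scatter plots + 1 histogram
  on.exit(par(old.par))
  for (j in feature.cols) {
    plot(my.data[, j], y, xlab = paste0("X", j), ylab = "load",
         main = paste0("X", j, " vs load"))
  }
  hist(y, main = "Variable of interest", xlab = "load")
  # page 2: distribution of each feature
  for (j in feature.cols) {
    hist(my.data[, j], main = paste0("X", j), xlab = paste0("X", j))
  }
  cors <- sapply(feature.cols, function(j) cor(my.data[, j], y))
  names(cors) <- paste0("X", feature.cols)
  cors
}
```

Calling `explore_data(my.data)` draws both pages and returns a named vector of eight correlations. X6, X7 and X8 take only a few distinct values, so their scatter plots appear as vertical stripes.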
2. Transform the data [10 marks]
(i) Choose any four of the first five variables (X1, X2, X3, X4, X5)
and make appropriate transformations so that the values can be aggregated
in order to predict the variable of interest (the heating or cooling load
that you selected). Assign your transformed data, along with your
transformed variable of interest, to an array (it should have 300 rows and 5
columns). Save it to a txt file titled "name-transformed.txt", e.g. using:
write.table(your.data,"name-transformed.txt")
(ii) Briefly explain the general relationship between each of your transformed variables and your selected heating or cooling load. (2-3 sentences
each)
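Min-max scaling onto [0, 1] is one common transformation for this step; whether the fitting functions in AggWaFit718.R require inputs and output on [0, 1] should be checked in that file. In the sketch below, a variable that is negatively related to the load can be flipped (1 - s) so that larger inputs go with larger outputs. The column choice `c(1, 2, 4, 5)` and the `reverse` flags are hypothetical placeholders, not recommended directions; replace them with what your own plots show. Other transformations (e.g. logarithms or powers) may suit some variables better.

```r
# Min-max scaling onto [0, 1]. With reverse = TRUE the scaled values are
# flipped (1 - s), so a negatively related variable increases with the load.
minmax <- function(x, reverse = FALSE) {
  rng <- range(x)
  if (rng[1] == rng[2]) stop("cannot scale a constant column")
  s <- (x - rng[1]) / (rng[2] - rng[1])
  if (reverse) 1 - s else s
}

# Hypothetical column choice and directions: replace `cols` and `reverse`
# with the four variables you picked and the directions your plots suggest.
transform_data <- function(my.data, cols = c(1, 2, 4, 5),
                           reverse = c(FALSE, TRUE, TRUE, FALSE),
                           target.col = 9) {
  stopifnot(length(cols) == 4, length(reverse) == length(cols))
  out <- sapply(seq_along(cols),
                function(k) minmax(my.data[, cols[k]], reverse[k]))
  out <- cbind(out, minmax(my.data[, target.col]))  # scaled load, last column
  colnames(out) <- c(paste0("X", cols), "y")
  out
}
```

The result is a 300 x 5 matrix that can be passed straight to `write.table(your.data, "name-transformed.txt")` after `your.data <- transform_data(my.data)`. Keep `my.data` as well: the original minima and maxima are needed later to transform new inputs and to convert predictions back to load units.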
3. Build models and investigate the importance of each variable. [20 marks]
(i) Download the AggWaFit718.R file (from CloudDeakin) to your working
directory and load it into the R workspace using:
source("AggWaFit718.R")
(ii) Use the fitting functions to learn the parameters for
• A weighted arithmetic mean,
• Weighted power means with p = 0.5 and p = 2,
• An ordered weighted averaging function, and
• A Choquet integral.
(iii) Include two tables in your report - one on the error measures, and one
summarising the weights/parameters that were learned for your data.
(iv) Compare and interpret the data in your tables. Be sure to comment
on (a) how good each model is, (b) the importance of each of the variables
(the four variables that you selected), and (c) any interaction between
those variables (are they complementary or redundant?) and whether the better
models favour higher or lower inputs. (1-3 paragraphs)
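AggWaFit718.R supplies the fitting functions; their names and the way they encode the fuzzy measure are not reproduced here. As a hedged aid for interpreting the learned parameters, the sketch below re-implements the four aggregation functions from their standard definitions, an orness measure for OWA weights (useful for judging whether a model favours higher or lower inputs), and four error measures (RMSE, mean absolute error, Pearson and Spearman correlation) of the kind that might go into the error table. The decreasing sort order for the OWA and the binary subset encoding for the Choquet integral are assumptions; check them against AggWaFit718.R before comparing numbers.

```r
# Standard definitions, assuming inputs x in [0, 1] and non-negative
# weights w that sum to 1.
wam <- function(x, w) sum(w * x)

power_mean <- function(x, w, p) {
  stopifnot(p != 0)                  # p = 0 is the geometric-mean limit
  sum(w * x^p)^(1 / p)
}

# OWA: weights applied to the inputs sorted in decreasing order
# (some texts and packages sort increasingly instead).
owa <- function(x, w) sum(w * sort(x, decreasing = TRUE))

# Orness of OWA weights (decreasing-order convention): 1 = max, 0 = min,
# 0.5 for the plain mean. Values above 0.5 favour higher inputs.
orness <- function(w) {
  n <- length(w)
  sum(w * (n - seq_len(n)) / (n - 1))
}

# Choquet integral w.r.t. a fuzzy measure v stored as a vector of length
# 2^n - 1: v[k] is the measure of the subset with binary code k
# (bit j - 1 set means input j is in the subset); v(empty set) = 0.
choquet <- function(x, v) {
  n <- length(x)
  stopifnot(length(v) == 2^n - 1)
  ord <- order(x)                    # input indices by increasing value
  subset_code <- function(idx) sum(2^(idx - 1))
  total <- 0
  for (i in seq_len(n)) {
    v.Hi <- v[subset_code(ord[i:n])]                        # ranks i..n
    v.Hnext <- if (i < n) v[subset_code(ord[(i + 1):n])] else 0
    total <- total + x[ord[i]] * (v.Hi - v.Hnext)
  }
  total
}

# Error measures comparing predictions with observed (transformed) loads
error_measures <- function(pred, obs) {
  c(RMSE = sqrt(mean((pred - obs)^2)),
    MAE = mean(abs(pred - obs)),
    Pearson = cor(pred, obs),
    Spearman = cor(pred, obs, method = "spearman"))
}
```

For example, `apply(your.data[, 1:4], 1, wam, w = fitted.w)` gives one prediction per row, where `fitted.w` is a hypothetical name for the weights learned by the fitting function; `error_measures()` can then compare those predictions with `your.data[, 5]`. When the fuzzy measure is additive, `choquet()` reduces to `wam()`, which is a quick sanity check on an encoding.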
4. Use your model for prediction. [5 marks]
(i) Using your best-fitting model, predict the heating or cooling load for
the following input: X1 = 0.69, X2 = 730, X3 = 269.5, X4 = 220.5,
X5 = 7. Give your result and comment on whether you think
it is reasonable. (1-2 sentences)
(ii) Comment generally on the ideal conditions (in terms of your chosen
four variables) under which a low heating or cooling load will result. (1-2
sentences)
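A prediction only makes sense if the new input goes through the same transformation as the training data, using the minimum and maximum of the 300-row sample (not of the new value), and if the model output is mapped back to load units. The sketch below assumes plain min-max scaling of the inputs and an unreversed target; `scale_new`, `unscale_target` and `predict_load` are hypothetical names, and `model` is any function mapping the scaled inputs to a scaled output (for instance a weighted mean with your learned weights). Clipping out-of-range inputs to [0, 1] is one design choice among several.

```r
# Scale a new raw value with the min/max of the training column
scale_new <- function(value, train.col, reverse = FALSE) {
  rng <- range(train.col)
  s <- (value - rng[1]) / (rng[2] - rng[1])
  if (reverse) 1 - s else s
}

# Map a scaled output in [0, 1] back to the original load units
unscale_target <- function(s, train.y) {
  rng <- range(train.y)
  rng[1] + s * (rng[2] - rng[1])
}

predict_load <- function(new.x, train.X, train.y, reverse, model) {
  stopifnot(length(new.x) == ncol(train.X), length(reverse) == length(new.x))
  s <- vapply(seq_along(new.x),
              function(j) scale_new(new.x[[j]], train.X[, j], reverse[j]),
              numeric(1))
  s <- pmin(pmax(s, 0), 1)   # clip inputs outside the sampled range
  unscale_target(model(s), train.y)
}

# Example call (hypothetical column choice, directions and weights):
# new.x <- c(X1 = 0.69, X2 = 730, X4 = 220.5, X5 = 7)
# predict_load(new.x, my.data[, c(1, 2, 4, 5)], my.data[, 9],
#              reverse = c(FALSE, TRUE, TRUE, FALSE),
#              model = function(s) sum(fitted.w * s))
```

Comparing the result with the loads of training rows that have similar input values is one quick way to judge whether it is reasonable.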
Your final submission to the SIT718 CloudDeakin Dropbox
should include the following three files.
1. A report (created in any word processor) covering all of the items above
(items coloured blue usually have explicit instructions about what should
be included). Including plots and tables, it should be only 3-5 pages long.
2. A data file named "name-transformed.txt" (where 'name' is replaced with
your name; you can use your surname or first name, just to help me
distinguish them!).
3. The R code file (that you have written to produce your results) named
"name-code.R" (where 'name' is replaced with your name; you can use
your surname or first name).