Assignment title: Management


SIT718 Real World Analytics
Assignment 2
Using aggregation functions for data analysis
Total marks: 40. Weighting: 15%
Due date: 2nd February 2017, 11.30 pm

Building - Energy Efficiency Dataset

In order to design energy-efficient buildings, the heating load (HL) and the cooling load (CL) must be computed to determine the specifications of the heating and cooling equipment needed to maintain comfortable indoor air conditions. Building energy simulation tools are widely used to analyse or forecast building energy consumption. The dataset provides an energy analysis of the heating load (denoted y1) and the cooling load (denoted y2) for 768 building shapes simulated with a building simulator. The dataset comprises 8 features (variables), denoted X1, X2, X3, ..., X8. The details of these variables are given below:

X1: Relative compactness, a percentage expressed as a decimal. A measure of building compactness; a high value means highly compact.
X2: Surface area in square meters
X3: Wall area in square meters
X4: Roof area in square meters
X5: Overall height in meters
X6: Orientation. Categorical: see below for description
X7: Glazing area. Categorical: see below for description
X8: Glazing area distribution. Categorical: see below for description
y1: Heating load in kWh·m⁻² per annum
y2: Cooling load in kWh·m⁻² per annum

Three glazing areas (variable X7) are used, expressed as percentages of the floor area: 10%, 25%, and 40%. Five different distribution scenarios for each glazing area (variable X8) were simulated:
(1) uniform: 25% glazing on each side,
(2) north: 55% on the north side and 15% on each of the other sides,
(3) east: 55% on the east side and 15% on each of the other sides,
(4) south: 55% on the south side and 15% on each of the other sides, and
(5) west: 55% on the west side and 15% on each of the other sides.
In addition, samples with no glazing area (0%) are also included.
Orientation (variable X6) values are north (2), east (3), south (4) and west (5).

Assignment tasks

1. Understand the data [5 marks]

(i) Download the txt file (ENB2012dataTextFile.txt) from CloudDeakin and save it to your R working directory.
(ii) Assign the data to a matrix, e.g. using
    the.data <- as.matrix(read.table("ENB2012dataTextFile.txt"))
(iii) Decide whether you would like to focus on heating load (y1) or cooling load (y2). This is your variable of interest. Generate a subset of 300 data points, e.g. using:
    (to investigate heating load)
    my.data <- the.data[sample(1:768,300),c(1:8,9)]
    (to investigate cooling load)
    my.data <- the.data[sample(1:768,300),c(1:8,10)]
(iv) Using scatter plots and histograms, report on the general relationship between each of the variables and your variable of interest (the heating load or cooling load). (Include a plot and 1 or 2 sentences for each of the 8 variables.)

2. Transform the data [10 marks]

(i) Choose any four of the first five variables (X1, X2, X3, X4, X5) and make appropriate transformations so that the values can be aggregated in order to predict the variable of interest (the heating or cooling load that you selected). Assign your transformed data, along with your transformed variable of interest, to an array (it should have 300 rows and 5 columns). Save it to a txt file titled "name-transformed.txt".
    write.table(your.data,"name-transformed.txt",)
(ii) Briefly explain the general relationship between each of your transformed variables and your selected heating or cooling load. (2-3 sentences each)

3. Build models and investigate the importance of each variable. [20 marks]

(i) Download the AggWaFit.R file (from CloudDeakin) to your working directory and load it into the R workspace using
    source("AggWaFit718.R")
(ii) Use the fitting functions to learn the parameters for
• A weighted arithmetic mean,
• Weighted power means with p = 0.5 and p = 2,
• An ordered weighted averaging function, and
• A Choquet integral.
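As a rough illustration of what the four function families in 3(ii) compute once their parameters are known, the following base R sketch evaluates each one on a single made-up input. It does not use or reproduce the AggWaFit fitting routines: the weights, the inputs, the function names (wam, wpm, owa, choquet) and the binary indexing of the fuzzy measure are all assumptions chosen for this example.

```r
# Illustrative weights and inputs (made up; not fitted values).
w <- c(0.4, 0.3, 0.2, 0.1)   # non-negative and summing to 1
x <- c(0.2, 0.9, 0.5, 0.7)   # four transformed inputs, each in [0, 1]

# Weighted arithmetic mean: sum of w_i * x_i.
wam <- function(x, w) sum(w * x)

# Weighted power mean with exponent p (p != 0): (sum of w_i * x_i^p)^(1/p).
# p = 1 recovers the weighted arithmetic mean.
wpm <- function(x, w, p) sum(w * x^p)^(1 / p)

# Ordered weighted average: the weights attach to positions after sorting,
# here in non-decreasing order, so w[1] multiplies the smallest input.
# (Sorting in decreasing order is an equally common convention.)
owa <- function(x, w) sum(w * sort(x))

# Discrete Choquet integral with respect to a fuzzy measure v.
# Assumed encoding: v[k] is the measure of the subset whose members are the
# set bits of k, so variable i contributes 2^(i - 1) to the index.
choquet <- function(x, v) {
  n <- length(x)
  ord <- order(x)            # indices of the inputs, smallest first
  xs <- c(0, x[ord])         # sorted inputs with a leading 0
  total <- 0
  for (i in 1:n) {
    idx <- sum(2^(ord[i:n] - 1))   # subset of the inputs ranked i..n
    total <- total + (xs[i + 1] - xs[i]) * v[idx]
  }
  total
}

# An additive fuzzy measure built from w; with it the Choquet integral
# coincides with the weighted arithmetic mean.
n <- length(w)
v.add <- sapply(1:(2^n - 1),
                function(k) sum(w[(k %/% 2^(0:(n - 1))) %% 2 == 1]))

print(c(WAM = wam(x, w),
        PM0.5 = wpm(x, w, 0.5),
        PM2 = wpm(x, w, 2),
        OWA = owa(x, w),
        Choquet = choquet(x, v.add)))
```

With a non-additive measure, the Choquet integral departs from the weighted arithmetic mean, and that difference is what carries information about interaction between variables.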
(iii) Include two tables in your report: one on the error measures, and one summarising the weights/parameters that were learned for your data.
(iv) Compare and interpret the data in your tables. Be sure to comment on
(a) how good the model is,
(b) the importance of each of the variables (the four variables that you have selected), and
(c) any interaction between those variables (are they complementary or redundant?), and whether the better models favour higher or lower inputs.
(1-3 paragraphs)

4. Use your model for prediction. [5 marks]

(i) Using your best-fitting model, predict the heating or cooling load for the following input: X1 = 0.69, X2 = 730, X3 = 269.5, X4 = 220.5, X5 = 7. Give your result and comment on whether you think it is reasonable. (1-2 sentences)
(ii) Comment generally on the ideal conditions (in terms of your chosen four variables) under which a low heating or cooling load will result. (1-2 sentences)

Your final submission, which should be submitted to the SIT718 CloudDeakin Dropbox, should include the following three files.

1. A report (created in any word processor), covering all of the items above (items coloured blue usually have explicit instructions about what should be included). With plots and tables it should only be 3 - 5 pages.
2. A data file named "name-transformed.txt" (where 'name' is replaced with your name - you can use your surname or first name - just to help me distinguish them!).
3. The R code file (that you have written to produce your results) named "name-code.R" (where 'name' is replaced with your name - you can use your surname or first name).
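The sampling, transformation and saving steps in tasks 1(iii) and 2(i) can be sketched as follows. This is a minimal sketch under stated assumptions, not a model answer: the matrix is random stand-in data shaped like the real file, min-max scaling is only one possible transformation, the choice of X1, X2, X3 and X5 is arbitrary, and the helper name minmax and the temporary output path are made up for the example.

```r
set.seed(1)  # reproducible sampling, for this illustration only

# Stand-in for the.data: random numbers shaped like the real file (768 x 10).
# With the real data, use the read.table() call from task 1(ii) instead.
the.data <- matrix(runif(768 * 10, min = 1, max = 50), nrow = 768, ncol = 10)

# Task 1(iii): 300 random rows, the eight features plus heating load (column 9).
my.data <- the.data[sample(1:768, 300), c(1:8, 9)]

# Min-max scaling to [0, 1]; an assumed choice, not the required transformation.
minmax <- function(v) (v - min(v)) / (max(v) - min(v))

# Task 2(i): four of the first five variables (here X1, X2, X3, X5) plus the
# variable of interest, giving 300 rows and 5 columns.
your.data <- cbind(apply(my.data[, c(1, 2, 3, 5)], 2, minmax),
                   minmax(my.data[, 9]))

# Written to a temporary folder here; the assignment asks for the file to be
# saved as "name-transformed.txt" in the working directory.
out.file <- file.path(tempdir(), "name-transformed.txt")
write.table(your.data, out.file, row.names = FALSE, col.names = FALSE)

print(dim(your.data))
```

Whatever transformation is chosen, the input in task 4(i) has to pass through the same one, using the minimum and maximum of the sampled data, and the model output has to be mapped back to the original scale of the load.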