© Deakin University 1 FutureLearn

ASSESSMENT DETAILS
SIT718 Real World Analytics
Assessment Task 3: Problem solving task 2
Using aggregation functions for data analysis

This document supplies detailed information on assessment tasks for this unit.

Key information
• Due: 9th January 2019 11.30pm AEDT
• Weighting: 30%
• Reference style: Harvard

Learning Outcomes
This assessment assesses the following Unit Learning Outcomes (ULO) and related Graduate Learning Outcomes (GLO):
Unit Learning Outcome (ULO)
• ULO1 – assessed through the student's ability to apply knowledge of multivariate functions, data transformations and data distributions to summarise data sets.
• ULO2 – assessed through the student's ability to analyse datasets by interpreting summary statistics, model and function parameters.
• ULO4 – assessed through the student's ability to develop software code to solve computational problems for real world analytics.

Graduate Learning Outcome (GLO)
• GLO1 – Discipline knowledge and capabilities
• GLO4 – Critical thinking
• GLO5 – Problem solving
Purpose
This assignment will test your knowledge and understanding of aggregation functions and their applications for data summarisation and prediction. It will also test your ability in R programming, both in using specific R commands and in using R packages.

Instructions
The work is individual. Solutions and answers to the assignment must be explained concisely and presented carefully. Use of books, articles and/or online resources related to SIT718 Real World Analytics is allowed. Students are expected to refer to suitable literature where appropriate.

The assessment consists of FOUR tasks. Students must attempt all tasks and provide an individual written report prepared in an appropriate word processor.

The detailed problem description and data set will be released to students on Wednesday 5th December 2018.

Submission details
The report must be no more than 7 A4 sides, including Figures, Tables, Appendices and References. The report should be typed. Use a minimum font size of 11pt and 2.5cm side margins. If the page limit is exceeded, only the first 7 pages will be marked.

The assignment (a report in pdf format, software code and/or data) must be submitted via the assignment dropbox in the unit site (accessed in DeakinSync). No e-mail or hardcopy submissions are accepted.

Extension requests
Requests for extensions should be made to Unit/Campus Chairs well in advance of the assessment due date. If you wish to seek an extension for an assignment, apply by email directly to Prof. Maia Angelova (firstname.lastname@example.org) as soon as you become aware that you will have difficulty meeting the scheduled deadline, and at least 3 days before the due date.
When you make your request, you must include appropriate documentation (medical certificate, death notice) and a copy of your draft assignment.

Conditions under which an extension will normally be approved include:
• Medical: to cover medical conditions of a serious nature, e.g. hospitalisation, serious injury or chronic illness. Note: Temporary minor ailments such as headaches, colds and minor gastric upsets are not serious medical conditions and are unlikely to be accepted. However, serious cases of these may be considered.
• Compassionate: e.g. death of a close family member, significant family and relationship problems.
• Hardship/Trauma: e.g. sudden loss or gain of employment, severe disruption to domestic arrangements, victim of crime. Note: Misreading the timetable, exam anxiety or returning home will not be accepted as grounds for consideration.

Special consideration
You may be eligible for special consideration if circumstances beyond your control prevent you from undertaking or completing an assessment task at the scheduled time. See the following link for advice on the application process:
http://www.deakin.edu.au/students/studying/assessment-and-results/special-consideration

Assessment feedback
Students will receive written feedback and model solutions to aid reflection and analysis of problem strategies and solutions for consideration in the upcoming problem-solving task.

Referencing
You must correctly use the Harvard method in this assessment. See the Deakin referencing guide.

Academic integrity, plagiarism and collusion
Plagiarism and collusion constitute extremely serious breaches of academic integrity. They are forms of cheating, and severe penalties are associated with them, including cancellation of marks for a specific assignment, for a specific unit or even exclusion from the course.
If you are ever in doubt about how to properly use and cite a source of information, refer to the referencing site above.

Plagiarism occurs when a student passes off as the student's own work, or copies without acknowledgement as to its authorship, the work of any other person, or resubmits their own work from a previous assessment task.

Collusion occurs when a student obtains the agreement of another person for a fraudulent purpose, with the intent of obtaining an advantage in submitting an assignment or other work.

Work submitted may be reproduced and/or communicated by the university for the purpose of assuring academic integrity of submissions: https://www.deakin.edu.au/students/studysupport/referencing/academic-integrity

Forest Fires Data Set
In order to predict the burned area of forest fires (“UCI Machine Learning Repository: Forest Fires Data Set”, 2017) in the northeast region of Portugal (“Montesinho.Com – Nature Tourism In Montesinho Natural Park”, 2017), analysis of the meteorological and other data is required (see details at “Forest Fires Dataset”, 2017). Also consider the information given at http://cwfis.cfs.nrcan.gc.ca/background/summary/fwi.
For this assignment you are provided with a modified dataset, “Forest718.txt”.

Attribute Information:
X1: x-axis spatial coordinate within the Montesinho park map: 1 to 9 (“Montesinho.Com – Nature Tourism In Montesinho Natural Park”, 2017)
X2: y-axis spatial coordinate within the Montesinho park map: 2 to 9 (“Montesinho.Com – Nature Tourism In Montesinho Natural Park”, 2017)
X3: month – month of the year: ‘jan=1’ to ‘dec=12’
X4: day – day of the week: ‘mon=1’ to ‘sun=7’
X5: FFMC – FFMC index from the FWI system: 18.7 to 96.20 (Happe, 2017)
X6: DMC – DMC index from the FWI system: 1.1 to 291.3 (Happe, 2017)
X7: DC – DC index from the FWI system: 7.9 to 860.6 (Happe, 2017)
X8: ISI – ISI index from the FWI system: 0.0 to 56.10 (Happe, 2017)
X9: temp – temperature in Celsius degrees: 2.2 to 33.30
X10: RH – relative humidity in %: 15.0 to 100
X11: wind – wind speed in km/h: 0.40 to 9.40
X12: rain – outside rain in mm/m2: 0.0 to 6.4
X13=Y: area – the burned area of the forest (in ha): 0.00 to 1090.84

Assignment tasks
1. Understand the data
(i) Download the txt file (Forest718.txt) from FutureLearn and save it to your R working directory. Assign the data to a matrix, e.g. using
the.data <- as.matrix(read.table("Forest718.txt"))
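The R sketch below covers the loading step. It also adds descriptive column names, a check of every column against the ranges in the attribute list, a reproducible version of the 200-row subset described in the next item, and a helper for the scatter plots and histograms requested in this task.

It rests on several assumptions that are not part of the brief:
• The short column names are illustrative.
• The helper names (`load_forest`, `count_out_of_range`, `draw_subset`, `plot_against_area`) are hypothetical.
• The use of `set.seed()` is an addition.
• It assumes the file has 13 columns and no header row, as the brief's `read.table()` call implies.

```r
# Illustrative short names for X1..X13, in the order of the attribute list.
forest_names <- c("x_coord", "y_coord", "month", "day", "FFMC", "DMC", "DC",
                  "ISI", "temp", "RH", "wind", "rain", "area")

# Documented (min, max) ranges copied from the attribute list.
forest_lo <- c(1, 2, 1, 1, 18.7, 1.1, 7.9, 0.0, 2.2, 15.0, 0.40, 0.0, 0.00)
forest_hi <- c(9, 9, 12, 7, 96.2, 291.3, 860.6, 56.1, 33.3, 100, 9.40, 6.4, 1090.84)

# Read Forest718.txt (assumed: 13 columns, no header) into a named matrix.
load_forest <- function(path = "Forest718.txt") {
  the.data <- as.matrix(read.table(path))
  colnames(the.data) <- forest_names
  the.data
}

# Count, per column, the values lying outside the documented range.
count_out_of_range <- function(m) {
  below <- sweep(m, 2, forest_lo, "<")
  above <- sweep(m, 2, forest_hi, ">")
  colSums(below | above)
}

# Draw n rows without replacement; a fixed seed makes the subset reproducible.
draw_subset <- function(the.data, n = 200, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  the.data[sample(1:nrow(the.data), n), 1:13]
}

# Histogram of one column next to its scatter plot against area (column 13).
plot_against_area <- function(my.data, col) {
  old <- par(mfrow = c(1, 2))
  on.exit(par(old))
  label <- colnames(my.data)[col]
  hist(my.data[, col], main = label, xlab = label)
  plot(my.data[, col], my.data[, 13], xlab = label, ylab = "area")
}
```

Typical use:
• `the.data <- load_forest()`
• `count_out_of_range(the.data)`
• `my.data <- draw_subset(the.data, seed = 718)`

The seed value is arbitrary. Fixing one simply lets the same 200 rows be regenerated when the report is revised.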
(ii) Your variable of interest is X13=Y: area – the burned area of the forest (in ha): 0.00 to 1090.84 (the thirteenth column in the dataset). Generate a random subset of 200 observations, e.g. using:
my.data <- the.data[sample(1:517,200),c(1:13)]
[3 marks]
(iii) Choose any FOUR variables from X5 to X11. Using scatter plots and histograms, report on the general relationship between each of the variables and your variable of interest Y. Include 4 scatter plots, 5 histograms and 1 or 2 sentences for each of the variables. [15 marks]

2. Transform the data
(i) For the chosen four variables and the variable of interest Y, make appropriate transformations so that the values can be aggregated in order to predict the variable of interest (the area). Assign your transformed data along with your transformed variable of interest X13=Y to an array (it should be 200 rows and 5 columns). Save it to a txt file titled “name-transformed.txt”. [20 marks]
(ii) Briefly explain the general relationship between each of your transformed variables and your variable of interest (the area). (2-3 sentences each) [5 marks]

3. Build models and investigate the importance of each variable
(i) Download the AggWaFit718.R file (from CloudDeakin) to your working directory and load it into the R workspace using
source("AggWaFit718.R")
(ii) Use the fitting functions to learn the parameters for:
• A weighted arithmetic mean,
• Weighted power means with p = 0.5 and p = 2,
• An ordered weighted averaging function, and
• A Choquet integral.
Include two tables in your report: one on the error measures, and one summarising the weights/parameters that were learned for your data.
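The base-R sketch below spells out the standard definitions of the four model types listed above, plus some common error measures for the first table. It only evaluates a model for given parameters; it does not fit the weights.

It is independent of AggWaFit718.R. That file's function names, input-sorting conventions, fuzzy-measure storage and reported statistics are not described in this brief and may differ. The following are assumptions made for illustration:
• the names `wam`, `power_mean`, `owa`, `choquet` and `error_measures`;
• the non-increasing sort used by `owa`;
• the binary subset encoding of the fuzzy measure.

```r
# Weighted arithmetic mean; w is assumed non-negative and summing to 1.
wam <- function(x, w) sum(w * x)

# Weighted power mean with exponent p (p = 0 gives the weighted geometric mean).
# Inputs are assumed to lie in [0, 1] after the Task 2 transformations.
power_mean <- function(x, w, p) {
  if (p == 0) return(prod(x^w))
  sum(w * x^p)^(1 / p)
}

# OWA: weights are applied to the inputs sorted in non-increasing order, so
# w = c(1, 0, ..., 0) gives the maximum and w = c(0, ..., 0, 1) the minimum.
owa <- function(x, w) sum(w * sort(x, decreasing = TRUE))

# Members of the subset encoded by `code` (bit j - 1 set <=> input j included).
subset_members <- function(code, n) which((code %/% 2^(0:(n - 1))) %% 2 == 1)

# Discrete Choquet integral with respect to a fuzzy measure v stored as a
# vector of length 2^n - 1, where v[code] is the measure of the encoded subset
# (the empty set has measure 0 and is not stored).
choquet <- function(x, v) {
  n <- length(x)
  stopifnot(length(v) == 2^n - 1)
  measure <- function(idx) if (length(idx) == 0) 0 else v[sum(2^(idx - 1))]
  ord <- order(x)                                 # inputs in non-decreasing order
  total <- 0
  for (i in seq_len(n)) {
    upper <- ord[i:n]                             # the n - i + 1 largest inputs
    rest <- if (i < n) ord[(i + 1):n] else integer(0)
    total <- total + x[ord[i]] * (measure(upper) - measure(rest))
  }
  total
}

# Common error measures between observed and predicted (transformed) areas.
error_measures <- function(observed, predicted) {
  c(RMSE = sqrt(mean((observed - predicted)^2)),
    avg_abs_error = mean(abs(observed - predicted)),
    pearson = cor(observed, predicted, method = "pearson"),
    spearman = cor(observed, predicted, method = "spearman"))
}
```

Two properties of these definitions are useful as sanity checks:
• With an additive fuzzy measure, where v of a subset is the sum of its weights, the Choquet integral reduces to the weighted arithmetic mean.
• For fixed weights, the power mean with p = 0.5 never exceeds the arithmetic mean, which in turn never exceeds the power mean with p = 2.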
(iii) Compare and interpret the data in your tables. Be sure to comment on:
a. How good the model is,
b. The importance of each of the variables (the four variables that you have selected),
c. Any interaction between any of those variables (are they complementary or redundant?), and
d. Whether better models favour higher or lower inputs.
(1-3 paragraphs) [10 marks]
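Items b to d are commonly interpreted through standard indices computed from the fitted parameters:
• the Shapley value of each input, which measures importance;
• the pairwise interaction index of a Choquet fuzzy measure, which is positive for complementary inputs and negative for redundant ones;
• the orness of OWA weights, which shows how strongly the model favours higher inputs.

The base-R sketch below computes these indices. It assumes the fuzzy measure is stored as a vector of length 2^n - 1 indexed by the binary code of each subset. That encoding, and the helper names `v_of`, `pick`, `shapley`, `interaction_index` and `owa_orness`, are hypothetical. Check how AggWaFit718.R reports its fitted measure and OWA weights before reusing them.

```r
# Measure of a subset given by member indices; v is indexed by the binary
# code of the subset (bit j - 1 set <=> input j included); empty set = 0.
v_of <- function(v, members) if (length(members) == 0) 0 else v[sum(2^(members - 1))]

# Members of `pool` selected by the bits of `code`.
pick <- function(pool, code) {
  if (length(pool) == 0) return(integer(0))
  pool[(code %/% 2^(0:(length(pool) - 1))) %% 2 == 1]
}

# Shapley value of each input: its weighted average marginal contribution.
shapley <- function(v, n) {
  phi <- numeric(n)
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    for (code in 0:(2^(n - 1) - 1)) {
      A <- pick(others, code)
      k <- length(A)
      coef <- factorial(n - k - 1) * factorial(k) / factorial(n)
      phi[i] <- phi[i] + coef * (v_of(v, c(A, i)) - v_of(v, A))
    }
  }
  phi
}

# Pairwise interaction index: > 0 complementary, < 0 redundant, 0 additive.
interaction_index <- function(v, n, i, j) {
  others <- setdiff(seq_len(n), c(i, j))
  total <- 0
  for (code in 0:(2^length(others) - 1)) {
    A <- pick(others, code)
    k <- length(A)
    coef <- factorial(n - k - 2) * factorial(k) / factorial(n - 1)
    total <- total + coef * (v_of(v, c(A, i, j)) - v_of(v, c(A, i)) -
                             v_of(v, c(A, j)) + v_of(v, A))
  }
  total
}

# Orness of OWA weights applied to inputs sorted in non-increasing order:
# 1 for the maximum, 0 for the minimum, 0.5 for the arithmetic mean.
owa_orness <- function(w) {
  n <- length(w)
  sum(w * (n - seq_len(n)) / (n - 1))
}
```

Reading the results:
• For an additive measure, the Shapley values equal the weights and every interaction index is 0.
• An orness above 0.5 indicates a model that favours higher inputs; below 0.5, lower inputs.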
4. Use your model for prediction
Using your best fitting model, predict the area for the following input:
X5=91.6; X6=181.3; X7=613; X8=7.6; X9=24.6; X10=44; X11=4; X12=0.
Give your result and comment on whether you think it is reasonable. (1-2 sentences)
Comment generally on the ideal conditions (in terms of your chosen four variables) under which an area will result. (1-2 sentences)
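Prediction involves three steps: pass the new input through exactly the transformations used in Task 2, aggregate it, then invert the transformation of Y to return to hectares. The R sketch below shows this under one possible set of choices that the brief does not prescribe:
• the four inputs are X5 (FFMC), X6 (DMC), X9 (temp) and X10 (RH);
• inputs are min-max scaled using the 200-row training subset, with RH reversed on the assumption that it relates negatively to area;
• Y is transformed as log(1 + area) followed by min-max scaling;
• a weighted arithmetic mean with placeholder weights stands in for the best fitting model.

All function names are hypothetical.

```r
# The Task 4 input restricted to the hypothetical choice X5, X6, X9, X10.
new_input <- c(91.6, 181.3, 24.6, 44)
flip <- c(FALSE, FALSE, FALSE, TRUE)   # RH assumed to relate negatively to area

# Input transformation learned from the training subset: min-max scaling,
# clamping to [0, 1], and reversing flipped inputs as 1 - value.
make_input_transform <- function(train_inputs, flip) {
  lo <- apply(train_inputs, 2, min)
  hi <- apply(train_inputs, 2, max)
  function(x) {
    z <- (x - lo) / (hi - lo)
    z <- pmin(pmax(z, 0), 1)   # new values may lie outside the training range
    ifelse(flip, 1 - z, z)
  }
}

# Area transformation: log(1 + area) then min-max scaling; `inverse` maps a
# model output in [0, 1] back to hectares.
make_area_transform <- function(train_area) {
  lo <- min(log1p(train_area))
  hi <- max(log1p(train_area))
  list(forward = function(a) (log1p(a) - lo) / (hi - lo),
       inverse = function(z) expm1(lo + z * (hi - lo)))
}

# Predict with a weighted arithmetic mean; w is a placeholder for the
# parameters of whichever fitted model performed best.
predict_area <- function(x, input_tf, area_tf, w) {
  z <- input_tf(x)
  area_tf$inverse(sum(w * z))
}
```

Using the subset from Task 1, the two transformations would be built as:
• `make_input_transform(my.data[, c(5, 6, 9, 10)], flip)`
• `make_area_transform(my.data[, 13])`

Then call `predict_area(new_input, ...)` with the fitted weights. Clamping is only one way of handling inputs outside the training range. Whatever choice is made should be mentioned when judging whether the prediction is reasonable.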
Your final submission, which should be submitted to the SIT718 CloudDeakin Dropbox, should include the following three files. Please follow the instructions below and do not compress your files.
1. A “name-report.pdf” report (created in any word processor), covering all of the items above (items coloured blue usually have explicit instructions about what should be included). With plots and tables it should only be 3 – 5 pages.
2. A data file named “name-transformed.txt” (where ‘name’ is replaced with your name – you can use your surname or first name – just to help me distinguish them!).
3. The R code file (that you have written to produce your results) named “name-code.R” (where ‘name’ is replaced with your name – you can use your surname or first name).

References
“UCI Machine Learning Repository: Forest Fires Data Set” (2017) Archive.ics.uci.edu. Available at: http://archive.ics.uci.edu/ml/datasets/Forest+Fires (Accessed: 29 April 2017).
“Forest Fires Dataset” (2017) Dsi.uminho.pt. Available at: www.dsi.uminho.pt/~pcortez/forestfires (Accessed: 29 April 2017).
Cortez, P. and Morais, A.D.J.R. (2007) A data mining approach to predict forest fires using meteorological data. Available at: http://www3.dsi.uminho.pt/pcortez/fires.pdf
Happe, H. (2017) Meteomalaga. Malagaweather.com. Available at: https://Malagaweather.com (Accessed: 29 April 2017).
“Montesinho.Com – Nature Tourism In Montesinho Natural Park” (2017) montesinho.com. Available at: https://www.montesinho.com/en (Accessed: 29 April 2017).

Appendix – FWI System (Happe, 2017)
The FWI is based on weather readings taken at noon standard time and rates fire danger at the mid-afternoon peak from 2:00 – 4:00 pm. Weather readings required are:
• Air temperature (in the shade)
• Relative Humidity (in the shade)
• Wind speed (at 10 metres above ground level for an average over 10 minutes)
• Rainfall (for the previous 24 hours)

The Fire Weather Index has six components.
Three Fuel Moisture Codes:
1. Fine Fuel Moisture Code
2. Duff Moisture Code
3. Drought Code
Three Fire Behaviour Indices:
1. Initial Spread Index
2. Build Up Index
3. Fire Weather Index

Three Fuel Moisture Codes
The FWI System evaluates fuel moisture content and relative fire behaviour using past and present weather effects on ground level fuels. The moisture codes reflect the net effects of daily moisture gains and losses.

Fine Fuel Moisture Code – FFMC
This is a numerical rating of the moisture content of surface litter and other cured fine fuels. It shows the relative ease of ignition and flammability of fine fuels. The moisture content of fine fuels is very sensitive to the weather: even a day of rain, or of fine and windy weather, will significantly affect the FFMC rating. The system uses a time lag of two-thirds of a day to accurately measure the moisture content in fine fuels. The FFMC rating is on a scale of 0 to 99. Any figure above 70 is high, and above 90 is extreme.

Duff Moisture Code – DMC
DMC is a numerical rating of the average moisture content of loosely compacted organic layers of moderate depth. The code indicates the depth that fire will burn in moderate duff layers and medium size woody material. Duff layers take longer than surface fuels to dry out, but weather conditions over the past couple of weeks will significantly affect the DMC. The system applies a time lag of 12 days to calculate the DMC. A DMC rating of more than 30 is dry, and above 40 indicates that intensive burning will occur in the duff and medium fuels. Burning off operations should not be carried out when the DMC rating is above 40.

Drought Code – DC
The DC is a numerical rating of the moisture content of deep, compact, organic layers. It is a useful indicator of seasonal drought and shows the likelihood of fire involving the deep duff layers and large logs. A long period of dry weather (the system uses 52 days) is needed to dry out these fuels and affect the Drought Code.
A DC rating of 200 is high, and 300 or more is extreme, indicating that fire will involve deep sub-surface and heavy fuels. Burning off should not be permitted when the DC rating is above 300.

Fire Behaviour Indices
The three behaviour indices are relative to the fuel moisture content and indicate what a fire is likely to do. The lower the moisture content, the higher the Fuel Moisture Codes and the Fire Behaviour Indices, and the more active the fire will be.

Initial Spread Index – ISI
This indicates the rate at which fire will spread in its early stages. It is calculated from the FFMC rating and the wind factor. The open-ended ISI scale starts at zero, and a rating of 10 indicates a high rate of spread shortly after ignition. A rating of 16 or more indicates an extremely rapid rate of spread.

Build-Up Index – BUI
This index shows the amount of fuel available for combustion, indicating how the fire will develop after initial spread. It is calculated from the Duff Moisture Code and the Drought Code. The BUI scale starts at zero and is open-ended. A rating above 40 is high, and above 60 is extreme.

Fire Weather Index – FWI
Information from the ISI and BUI is combined to provide a numerical rating of fire intensity, the Fire Weather Index. The FWI indicates the likely intensity of a fire. The FWI is divided into four fire danger classes:
Low: 0 – 7
Medium: 8 – 16
High: 17 – 31
Extreme: 32+
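The four FWI danger classes amount to a simple lookup, sketched below in R. The sketch assumes each class boundary is an inclusive upper bound, so a non-integer value such as 7.5 falls into Medium; the appendix does not specify this. `fwi_class` is a hypothetical helper name.

```r
# Map FWI values to the danger classes Low (0-7), Medium (8-16),
# High (17-31) and Extreme (32+); intervals are closed on the right.
fwi_class <- function(fwi) {
  cut(fwi, breaks = c(-Inf, 7, 16, 31, Inf),
      labels = c("Low", "Medium", "High", "Extreme"), right = TRUE)
}
```

For example, `fwi_class(c(5, 20, 40))` returns a factor with levels Low, High and Extreme for the three values.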