Assessment item 3 – Data Analysis
Due Date: 24-May-2020
Return Date: 16-Jun-2020
Submission method options: Alternative submission method
In this assignment, you will perform some basic data analysis on a dataset obtained from the Gapminder (http://www.gapminder.org/) website which collects and presents authentic statistics of all countries worldwide.
Download this zip package (https://doms.csu.edu.au/csu/file/8ecc7393-0664-44fc-8288-8a5a29de687b/1/ITC558_202030_A3_dataset.zip), which contains three dataset files: ‘life.csv’, ‘bmi_men.csv’ and ‘bmi_women.csv’. The first file contains the average life expectancy (in years) for most countries worldwide. The other two files contain the average Body Mass Index (BMI) of men and women for the same set of countries. These are plain text files with all data separated by commas. You can also open the files in a spreadsheet application to better understand their contents. All three files have a similar structure: the first row contains the year headers and the first column contains the country names. The data covers 186 countries for the period 1980 to 2008.
Your program should perform the following steps.
(1) Read all the data from files and save into a 2D list and two dictionaries.
The life expectancy data should be stored as a two-dimensional list whose outer list has 186 elements, one per country. Each inner list contains the data for one specific country.
The BMI data from both files should be stored in two dictionaries which map country names to a list of data values. Both dictionaries will contain 186 keys, with each key associated with a list of 29 values (BMI data from 1980 to 2008).
The following diagram illustrates the required data structures. Note that all numbers have been converted from string to float data types.
You should use these collections for the next five steps — do not read the files again.
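As an illustration of step 1, the sketch below reads one file into a list of rows and then derives a dictionary from those rows. It is only one possible approach. The function names `read_table` and `rows_to_dict` are hypothetical, and the code assumes that each inner list starts with the country name followed by the yearly values, that no cell is empty, and that the year headers are plain integers. It uses only built-in functions, since library modules other than matplotlib are not permitted.

```python
def read_table(path):
    """Return (years, rows); each row is [country, value_1, value_2, ...]."""
    # "utf-8-sig" also accepts files that start with a byte-order mark.
    with open(path, encoding="utf-8-sig") as handle:
        lines = [line.strip() for line in handle if line.strip()]

    header = lines[0].split(",")
    years = [int(cell) for cell in header[1:]]

    rows = []
    for line in lines[1:]:
        # Split from the right so that a comma inside a country name
        # (assumption: such names may be quoted) does not shift the values.
        parts = line.rsplit(",", len(years))
        country = parts[0].strip().strip('"')
        rows.append([country] + [float(cell) for cell in parts[1:]])
    return years, rows


def rows_to_dict(rows):
    """Map each country name to its list of float values."""
    return {row[0]: row[1:] for row in rows}


# Possible usage (file names as given in the assignment):
# years, life_data = read_table("life.csv")
# bmi_men = rows_to_dict(read_table("bmi_men.csv")[1])
# bmi_women = rows_to_dict(read_table("bmi_women.csv")[1])
```

Splitting from the right relies on every row having the same number of value columns as the header, which matches the file structure described above.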
(2) Some users may be interested in gender-neutral BMI data. For this purpose, create another Python dictionary bmi_all of the same structure and size as bmi_men (or bmi_women) and populate it with the average of the men’s and women’s BMI values for each country and year. For example, bmi_all for Zimbabwe in 2008 would be 23.3.
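A minimal sketch of step 2 follows. The function name `average_dicts` is hypothetical, and the code assumes that both dictionaries share the same keys and that their lists have equal length, as the description of step 1 implies.

```python
def average_dicts(bmi_men, bmi_women):
    """Build a dictionary holding the mean of men's and women's BMI values."""
    bmi_all = {}
    for country, men_values in bmi_men.items():
        women_values = bmi_women[country]
        # Pair up the values year by year and average each pair.
        bmi_all[country] = [(m + w) / 2 for m, w in zip(men_values, women_values)]
    return bmi_all


# Small made-up example (not taken from the dataset):
sample = average_dicts({"A": [20.0, 22.0]}, {"A": [22.0, 24.0]})
print(sample)  # {'A': [21.0, 23.0]}
```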
(3) Use the bmi_all dictionary from step 2 to calculate worldwide statistics (min, max and median (https://en.wikipedia.org/wiki/Median)) for a user-selected year. See example in the sample-run below. Median value should be displayed with a precision of 3 decimal places.
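One way to approach step 3 without any library module is sketched below. The names `median`, `year_statistics` and `FIRST_YEAR` are hypothetical, and the code assumes that index 0 of every list corresponds to 1980.

```python
FIRST_YEAR = 1980  # assumption: the first value in every list is for 1980


def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def year_statistics(bmi_all, year, first_year=FIRST_YEAR):
    """Return (country with minimum, country with maximum, median value)."""
    index = year - first_year
    by_country = {country: values[index] for country, values in bmi_all.items()}
    lowest = min(by_country, key=by_country.get)
    highest = max(by_country, key=by_country.get)
    return lowest, highest, median(list(by_country.values()))


# Made-up data for illustration only:
demo = {"A": [20.0, 21.0], "B": [25.0, 26.0], "C": [22.0, 30.0], "D": [23.0, 24.0]}
lowest, highest, middle = year_statistics(demo, 1981)
print(f"{lowest} {highest} {middle:.3f}")  # A C 25.000
```

The `:.3f` format specifier gives the three-decimal precision required for the median.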
(4) Compare the latest 5-year BMI data for men against women for the three most populous countries in the world (China, India, United States). First work out the 2004 to 2008 average of men’s BMI for each of these countries, then do the same for women’s BMI. Then display the men’s and women’s BMI values and the percentage difference (https://www.mathsisfun.com/percentage-difference.html) between the two. Display all values with a precision of 2 decimal places.
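The calculation in step 4 might be sketched as follows. The percentage difference used here is the absolute difference divided by the mean of the two values, as defined on the linked page. The function names are hypothetical, and the code assumes that the last five values in each list are those for 2004 to 2008.

```python
def recent_average(values, span=5):
    """Average of the last `span` values (2004 to 2008 for this dataset)."""
    recent = values[-span:]
    return sum(recent) / len(recent)


def percent_difference(a, b):
    """Absolute difference relative to the mean of the two values, in percent."""
    return abs(a - b) / ((a + b) / 2) * 100


def compare_country(country, bmi_men, bmi_women):
    """Return (men's average, women's average, percentage difference)."""
    men = recent_average(bmi_men[country])
    women = recent_average(bmi_women[country])
    return men, women, percent_difference(men, women)


print(f"{percent_difference(5, 3):.2f}%")  # 50.00%
```

Each returned value can be displayed with `:.2f` to obtain the required two decimal places.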
(5) Plot the life expectancy trend of a user-selected country. Your program will prompt the user for a country name (case insensitive) and then create a line chart showing the life expectancy variation over the years. The sample run below shows an example.
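A possible shape for step 5 is sketched below. The function names are hypothetical, and the code assumes that each inner list of the life expectancy data starts with the country name. The chart title and axis labels are illustrative choices, not part of the specification.

```python
def find_country(life_data, name):
    """Return the row whose country matches `name`, ignoring case, or None."""
    wanted = name.strip().lower()
    for row in life_data:
        if row[0].lower() == wanted:
            return row
    return None


def plot_life_expectancy(row, first_year=1980):
    """Draw a line chart of one country's life expectancy over the years."""
    # Imported here so that the lookup above works without matplotlib installed.
    import matplotlib.pyplot as plt

    years = list(range(first_year, first_year + len(row) - 1))
    plt.plot(years, row[1:])
    plt.title(f"Life expectancy in {row[0]}")
    plt.xlabel("Year")
    plt.ylabel("Life expectancy (years)")
    plt.show()


# Made-up values for illustration only:
demo = [["Sri Lanka", 68.2, 68.5]]
print(find_country(demo, "sRi laNka")[0])  # Sri Lanka
print(find_country(demo, "jupiter"))       # None
```

Returning `None` for an unknown name lets the calling code print an error message and prompt again.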
(6) To explore the correlation between BMI and life expectancy, plot worldwide average values of the two on the same chart. For this purpose, your program will create two lists of 29 elements each to store worldwide average BMI and life expectancy data for each year. Refer to sample run for an example.
[Disclaimer: Correlation does not imply causation (https://www.tylervigen.com/spurious-correlations).]
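The two lists of worldwide averages for step 6 could be built along these lines. The function name `yearly_average` is hypothetical, and the code assumes that every country has a complete list of values of the same length.

```python
def yearly_average(series_list):
    """Average a list of equal-length lists column by column (year by year)."""
    count = len(series_list)
    # zip(*series_list) groups together the values of all countries for one year.
    return [sum(column) / count for column in zip(*series_list)]


# Possible usage with the collections from the earlier steps:
# bmi_world = yearly_average(list(bmi_all.values()))
# life_world = yearly_average([row[1:] for row in life_data])

print(yearly_average([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```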
For plotting the charts in steps 5 and 6, use the matplotlib library. Consult textbook section 7-8 to learn how to draw simple charts. The chart for step 6 is rather complex because it contains two y-axes. For this part, please review and adapt the sample code below.
Important Note: Other than matplotlib, you can NOT use any library module or third party module in this assessment.
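A chart with two y-axes can be drawn in matplotlib with `Axes.twinx`, which creates a second axis sharing the same x-axis. The sketch below is one possible arrangement and is not necessarily the same as the subject's own sample code. The function name, colours, labels and title are all illustrative assumptions.

```python
def plot_bmi_vs_life(years, bmi_avg, life_avg):
    """Plot average BMI (left axis) and life expectancy (right axis) by year."""
    if not (len(years) == len(bmi_avg) == len(life_avg)):
        raise ValueError("all three lists must have the same length")

    # Imported here so the length check runs even without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax_bmi = plt.subplots()
    ax_bmi.plot(years, bmi_avg, color="tab:blue")
    ax_bmi.set_xlabel("Year")
    ax_bmi.set_ylabel("Average BMI", color="tab:blue")

    # Second y-axis that shares the x-axis with the first one.
    ax_life = ax_bmi.twinx()
    ax_life.plot(years, life_avg, color="tab:red")
    ax_life.set_ylabel("Average life expectancy (years)", color="tab:red")

    ax_bmi.set_title("Worldwide average BMI and life expectancy")
    plt.show()


# Possible usage with the two 29-element lists from step 6:
# plot_bmi_vs_life(list(range(1980, 2009)), bmi_world, life_world)
```

Colouring each axis label to match its line makes it easier to see which scale belongs to which series.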
Your program should be able to handle the following invalid inputs or error situations.
• Any of the three dataset files does not exist or can’t be read.
• Non-numeric or out-of-range year value provided by user.
• Incorrect country name provided by user.
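The error situations listed above might be handled as sketched below. The function names and the wording of the file error message are hypothetical. The year error message follows the sample run.

```python
FIRST_YEAR, LAST_YEAR = 1980, 2008


def parse_year(text):
    """Return the year as an int, or None if it is non-numeric or out of range."""
    try:
        year = int(text.strip())
    except ValueError:
        return None
    return year if FIRST_YEAR <= year <= LAST_YEAR else None


def ask_year():
    """Keep prompting until the user enters a valid year."""
    prompt = f"Select a year to find statistics ({FIRST_YEAR} to {LAST_YEAR}): "
    while True:
        year = parse_year(input(prompt))
        if year is not None:
            return year
        print("<error> That is an invalid year.")


def read_text_or_none(path):
    """Return the file contents, or None if the file is missing or unreadable."""
    try:
        with open(path, encoding="utf-8-sig") as handle:
            return handle.read()
    except OSError as error:
        # Hypothetical message; OSError covers missing files and permission errors.
        print(f"<error> Cannot read '{path}': {error}")
        return None
```

Keeping the validation in `parse_year`, separate from the `input` loop, makes the abnormal test cases easy to exercise on their own.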
A sample run of the program is given below to clearly demonstrate all the requirements.
A simple data analysis program
— Step 1 —
All dataset has been read into memory.
— Step 2 —
Gender-average BMI data stored in a new dictionary.
— Step 3 —
Select a year to find statistics (1980 to 2008): garbage
<error> That is an invalid year.
Select a year to find statistics (1980 to 2008): 1990
In 1990, countries with minimum and maximum BMI values were ‘Vietnam’ and ‘Tonga’ respectively.
Median BMI value in 1990 was 24.450
— Step 4 —
Men vs women BMI in highest population countries:
*** China *** Men: 22.82
Percent difference: 0.18%
*** India *** Men: 20.92
Percent difference: 1.42%
*** United States *** Men: 28.30
Percent difference: 0.42%
— Step 5 —
Enter the country to visualize life expectancy data: jupiter
<error> ‘jupiter’ is not a valid country.
Enter the country to visualize life expectancy data: sRi laNka
Plot for ‘Sri Lanka’ opens in a new window.
— Step 6 —
Your assignment should consist of the following tasks.
Draw flowcharts that represent the algorithms of step 2 and step 6. Include flowcharts of any functions that are called during these steps. You can draw the flowcharts with a pen/pencil on a piece of paper and scan it for submission, as long as the handwriting is clear and legible.
However, it is strongly recommended to draw the flowcharts using drawing software.
Select six sets of test data that will demonstrate the ‘normal’ operation of your program; that is, test data that will demonstrate what happens when a VALID input is entered. Select four sets of test data that will demonstrate the ‘abnormal’ operation of your program.
Set out the test cases in a tabular form as follows. It is important that the output listings (i.e., screenshots) are not edited in any way.
Test Data Table
Test data type | The reason it was selected | The output expected due to the use of the test data | The screenshot of actual output when the test data is used
Implement your algorithm in Python. Comment your code as necessary to explain it clearly. Run your program using the test data you have selected and complete the final column of the test data table above.
Your submission will consist of:
1. Your algorithm through flowchart/s
2. The table recording your chosen test data and results
3. Source code for your Python implementation
Thus your assignment directory will contain at least two or three files (depending on whether you put the flowchart and the test table in the same file).
It is critically important that your test runs are unmodified outputs from your program, and that these results should be reproducible by the marker running your saved .py python program.
This assessment task will work towards assessing the following learning outcome/s:
• be able to analyse the steps involved in a disciplined approach to problem-solving, algorithm development and coding.
• be able to demonstrate and explain elements of good programming style.
• be able to identify, isolate and correct errors; and evaluate the corrections in all phases of the programming process.
• be able to interpret and implement algorithms and program code.
• be able to apply sound program analysis, design, coding, debugging, testing and documentation techniques to simple programming problems.
• be able to write code in an appropriate coding language.
MARKING CRITERIA AND STANDARDS
Algorithm design is efficient in terms of time and memory.
Flowcharts precisely describe the algorithm design.
Flowcharts do not have any unnecessary components.
Flowcharts have at most one notation error.
Algorithm matches the program code.
Flowcharts follow the convention, contain at most three notation errors and present the algorithm at a high level.
Does not meet Pass criteria.
Test data explores every branch of the program.
To demonstrate comprehensive testing, number of test cases exceeds the required minimum.
Sound justification is provided for the selection of test data.
Diversity is evident among the chosen test data.
Minimum required number of normal and abnormal test cases are collected.
Brief reasoning is provided for the test data.
Selected test data is clearly presented in required table format.
At least two normal and two abnormal test cases are provided.
Does not meet Pass criteria.
Test results are reproducible.
Python code contains only necessary statements and variables.
All exceptions and errors are handled properly, including those which are not part of the specifications.
Program meets all specifications.
Output format is correct as required.
Python code produces correct results.
Output has minor formatting errors.
Functionality is mostly implemented but code may contain minor syntax or logical errors.
Python code is produced that does not execute properly. It may contain many syntax errors and/or produce completely incorrect results.
Code includes function header comments and module level docstrings.
Code design is modular, containing several reusable functions.
Named constants are used instead of magic numbers.
White space is appropriately used for code readability.
Avoids unnecessary global variables.
All variables have meaningful names.
Sufficient inline comments are present.
Indentation is consistent throughout.
Functions are used but they are not generic (reusable).
Uses many global variables.
Most variables have unambiguous names.
Small number of inline comments.
Incomplete or largely dysfunctional code.
Additional Note: The standards outlined for each criterion are cumulative. So, for example, to achieve the standard for High Distinction your work also needs to meet the standards outlined for the Pass, Credit and Distinction levels.
Presentation
You have to prepare and present all source code, the test data table, and the flowchart/s separately and include them all in a single MS Word file identified by your name. See the ‘Requirements’ section below. The Python source code you write should be saved with a name such as ITC558assignment3YourName.py, and a copy of it should then be included as text in the MS Word file named ITC558assignment3YourName.docx.
The other parts of the assignment (such as your flowchart/s and your table of test data) should be included in the same MS Word file and saved as ITC558assignment3YourName.docx.
It is critically important that your test runs are unmodified outputs from your program, and that these results should be reproducible by the marker running your saved ITC558assignment3YourName.py python program.
Requirements
You have to save all the parts of the assignment (as described under ‘Presentation’ above) into a single MS Word document identified by your name, as outlined in the section on presentation.
Failure to adhere to these requirements may disqualify the submission for marking.
Submit your complete assignment in MS Word format to Turnitin, and insert your program source code as an object into your MS Word document (the subject lecturer will explain how to insert the object).