BSB123 Data Analysis
Assessment Item 2:
Research Report (2020 S1) – Understanding Social Media
Date Due: Monday 25th May (11:59pm)
Total Marks: This assessment item (marked out of 50) counts 30% towards your final grade.
Assistance: There will be a series of online assistance sessions beginning immediately after the
mid-semester break, i.e. Thursday 30th April at 1pm. Each session will be split into two (2) components:
a. Component 1: this is where I will discuss the mechanics of each of the questions being asked.
This part of the lecture will be recorded.
b. Component 2: this is where I will answer questions from students individually re any
concerns, issues etc. This part of the assistance session will NOT be recorded.
A recent study of social media use was undertaken in early 2019 to examine whether there were any
patterns by gender, living arrangements, parents’ education and age in the time spent online per
weekend on three social media types, namely Facebook, Instagram and Other (YouTube, Twitter etc.).
The purpose of the study is to understand the habits of the younger population in relation to
different forms of media use. An example of the first 10 observations is shown below.
A key to understanding the data is as follows:
1. Instagram: time spent in minutes per weekend on Instagram
2. Facebook: time spent in minutes per weekend on Facebook
3. Other: time spent in minutes per weekend on other social media platforms
4. Gender: a dummy variable where Female=1 and Male=0
5. Age: is the age of the person who agreed to participate in the study
6. Lives at Home: whether the person lives at home, where Yes=1 and No=0
7. Year 10: highest level of education of parents is Year 10
8. Year 12: highest level of education of parents is Year 12
9. University: highest level of education of parents is a bachelor’s degree (the Bachelor column in the data)
10. Postgraduate: this is the reference category for Year 10, Year 12 and Bachelor, so all comparisons
are made against it
Please note that I have already sorted the numerical allocations for the dummy variables. We will
discuss how to interpret these in the lectures and in the Research report help sessions which begin
on Thursday 23rd April.
Instagram  Facebook  Other  Gender  Age  Lives at Home  Year 10  Year 12  Bachelor
35         38        17     0       19   1              1        0        0
65         26        21     1       19   0              1        0        0
55         44        23     0       18   0              0        0        1
40         29        24     1       18   0              1        0        0
65         40        24     1       19   1              0        1        0
50         64        25     1       17   1              0        1        0
65         41        25     1       18   0              0        1        0
60         24        26     0       19   0              0        0        0
60         54        27     1       18   0              0        0        1
65         73        27     1       19   1              0        0        0
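The dummy coding in the table above can be checked with a short script. The sketch below uses Python rather than Excel and is only an illustration. The names `COLUMNS`, `ROWS` and `parent_education` are hypothetical, and the script assumes that a row with 0 in all three education columns belongs to the Postgraduate reference category.

```python
# First 10 observations from the brief, in the column order of the table header.
# The shortened column names are illustrative, not part of the dataset.
COLUMNS = ["Instagram", "Facebook", "Other", "Gender", "Age",
           "LivesAtHome", "Year10", "Year12", "Bachelor"]
ROWS = [
    (35, 38, 17, 0, 19, 1, 1, 0, 0),
    (65, 26, 21, 1, 19, 0, 1, 0, 0),
    (55, 44, 23, 0, 18, 0, 0, 0, 1),
    (40, 29, 24, 1, 18, 0, 1, 0, 0),
    (65, 40, 24, 1, 19, 1, 0, 1, 0),
    (50, 64, 25, 1, 17, 1, 0, 1, 0),
    (65, 41, 25, 1, 18, 0, 0, 1, 0),
    (60, 24, 26, 0, 19, 0, 0, 0, 0),
    (60, 54, 27, 1, 18, 0, 0, 0, 1),
    (65, 73, 27, 1, 19, 1, 0, 0, 0),
]


def parent_education(record):
    """Decode the three education dummies into a single label."""
    for level in ("Year10", "Year12", "Bachelor"):
        if record[level] == 1:
            return level
    # Assumption: all three dummies equal to 0 means the reference category.
    return "Postgraduate"


records = [dict(zip(COLUMNS, row)) for row in ROWS]
labels = [parent_education(record) for record in records]

for number, label in enumerate(labels, start=1):
    print(f"Observation {number}: parents' education = {label}")
```

Because Postgraduate has no column of its own, each education coefficient in a regression is read as a difference from the Postgraduate group.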
Task 1 (Boxplots and t-tests: Investigating the Data)
1. Construct separate boxplots of time spent online by gender for Facebook, Instagram and other
social media platforms. What can you say about the distributional features (central location,
spread and skewness) shown in each of the boxplots?
2. (a) Considering Facebook, Instagram and Other social media, test whether there is a significant
difference in usage time between males and females at a 5% level of significance.
(b) Considering Facebook, Instagram and Other social media, test whether there is a significant
difference in usage time between those who live at home and those who do not live at home at a
5% level of significance.
3. Write a short summary of the results of Questions 1 & 2, outlining what you have found
and how these results help to address the purpose of the study. What would happen if you were
willing to decrease the significance level?
Note: In all two-sample tests, you should briefly discuss whether it is a one- or two-tailed test, the
test statistic and any assumptions made, and draw a conclusion based on the Excel output.
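The mechanics of Questions 1 and 2 can be sketched outside Excel as a cross-check. The Python below is a minimal illustration on the 10 sample observations only, not the full dataset, and the report itself still needs Excel output. It assumes the unequal-variances (Welch) form of the two-sample t-test, which the brief does not specify. The function names are hypothetical, and quartile conventions differ between tools, so the quartiles may not match Excel's boxplot exactly.

```python
import math
import statistics

# Instagram minutes and Gender (Female=1, Male=0) for the 10 sample rows.
instagram = [35, 65, 55, 40, 65, 50, 65, 60, 60, 65]
gender = [0, 1, 0, 1, 1, 1, 1, 0, 1, 1]


def split_by_dummy(values, dummy):
    """Return (values where dummy == 0, values where dummy == 1)."""
    zeros = [v for v, d in zip(values, dummy) if d == 0]
    ones = [v for v, d in zip(values, dummy) if d == 1]
    return zeros, ones


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum: the values a boxplot displays."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def welch_t(sample_a, sample_b):
    """t statistic and Welch-Satterthwaite degrees of freedom for H0: equal means."""
    va = statistics.variance(sample_a) / len(sample_a)
    vb = statistics.variance(sample_b) / len(sample_b)
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df


males, females = split_by_dummy(instagram, gender)
male_summary = five_number_summary(males)
female_summary = five_number_summary(females)
t_stat, dof = welch_t(males, females)

print("Male summary:  ", male_summary)
print("Female summary:", female_summary)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A two-tailed p-value for the resulting statistic can then be read from the Excel t-test output and compared with the 5% level of significance.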
Task 2 (Regression Analysis and investigating relationships between the variables)
You plan to develop a regression model to investigate how various factors influence time spent on
social media types.
4. Before you conduct any regression analysis, you use Excel to construct a correlation matrix of all
the quantitative variables in the dataset. Based on the correlation matrix, comment briefly on the
associations between each of the dependent variables (Facebook, Instagram and Other)
and the quantitative variables. Write a summary of your findings.
5. You conduct a stepwise regression according to the following procedure for each of the three (3)
dependent variables:
• Gender and Age
• Gender, Age and Living at Home
• Gender, Age, Living at Home, Year 10, Year 12, University
Present the regression output for each of the four regressions in tabular form for each dependent
variable.
6. Based on the regression output obtained in Step 4, answer the following:
(a) Which summary measure in the regression output is used to assess the overall adequacy of
the model? Comment on the overall adequacy of the model obtained in Step 4 for each of the
three dependent variables.
(b) For each of the independent variables, fully interpret the regression coefficients and comment
on their statistical significance. (In discussing statistical significance of a regression coefficient,
you must justify your choice of a one- or two-tailed test.)
7. Considering the correlations found in question 4 and all the coefficients in the regression analysis
from each of the three regressions for Step 4 (one for each dependent variable), are there any
issues regarding the signs or the statistical significance of the coefficients? Discuss fully.
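The correlation matrix and the nested regressions in Questions 4 to 7 can be sketched as follows. This is a minimal Python illustration using only the 10 sample observations and Facebook as the dependent variable. It is not a substitute for the Excel regression output, and with so few rows the R² values are not meaningful in themselves. The helper names (`pearson_r`, `solve`, `ols`) are hypothetical, the education column is named `Bachelor` as in the table header, and only the three predictor sets listed in Question 5 are fitted.

```python
import math

# Sample rows from the brief: quantitative variables and predictors.
quantitative = {
    "Instagram": [35, 65, 55, 40, 65, 50, 65, 60, 60, 65],
    "Facebook":  [38, 26, 44, 29, 40, 64, 41, 24, 54, 73],
    "Other":     [17, 21, 23, 24, 24, 25, 25, 26, 27, 27],
    "Age":       [19, 19, 18, 18, 19, 17, 18, 19, 18, 19],
}
predictors = {
    "Gender":      [0, 1, 0, 1, 1, 1, 1, 0, 1, 1],
    "Age":         quantitative["Age"],
    "LivesAtHome": [1, 0, 0, 0, 1, 1, 0, 0, 0, 1],
    "Year10":      [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    "Year12":      [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "Bachelor":    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(y, columns):
    """Least-squares fit with an intercept; returns (coefficients, R squared)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - sse / sst


# Pairwise correlations between the quantitative variables (Question 4).
corr = {(a, b): pearson_r(quantitative[a], quantitative[b])
        for a in quantitative for b in quantitative}

# Predictor sets added step by step, as listed in Question 5.
steps = [
    ["Gender", "Age"],
    ["Gender", "Age", "LivesAtHome"],
    ["Gender", "Age", "LivesAtHome", "Year10", "Year12", "Bachelor"],
]
results = [ols(quantitative["Facebook"], [predictors[name] for name in step])
           for step in steps]
r_squared = [r2 for _, r2 in results]

for step, r2 in zip(steps, r_squared):
    print(f"{', '.join(step)}: R squared = {r2:.3f}")
```

R² cannot fall when predictors are added to a model that already contains an intercept, which is one reason the adjusted R² and the significance of the individual coefficients also need to be considered when judging each step.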
Task 3 (Summary Report)
Present your findings of all components of the statistical analysis in the form of a professional report
that is to be presented to a board of educators. The educators are interested in whether the factors
in your data can be used to understand the habits of the younger population in relation to different
forms of media use. It is expected that you will outline all your findings in a clear and coherent fashion.
• Use 1½ line spacing and a font size of 11.
• You can, and are encouraged to, include relevant charts and Excel objects in your summary report.
• No referencing is required in your summary report. However, if you wish to include, and refer to,
additional information, you can use any referencing system as long as it is used consistently.
• There is no word limit for Tasks 1 and 2.
• The word limit of 500 (with a tolerance of 10%) applies only to the summary report, and is
exclusive of words in tables, appendices and reference list (if any).
You should submit your response to all three tasks as a single PDF document saved in the format:
• After uploading your research report on Blackboard, it is your responsibility to go back to the
Assignment Upload page to check that your report was properly uploaded.
• Due: 11:59 pm May 25 (Monday) 2020 via Blackboard