Comm215/A, B- Fall 2022- Data Analysis Project DAT
DAP report upload deadline: Dec 21st, 2022, on or before 24:00
Objectives of the DAP:
• The data analysis assignment is designed to apply the basic statistical concepts learned in the course. You will use techniques learned in class in an applied setting.
• Software: handle the data file with Excel add-ins (MegaStat or PHSTAT) or MINITAB. The report must be typed and uploaded as a PDF file.
To be completed in groups of 5 people (or individually)
Due date: Dec 21st, 2022, on or before 24:00, via Moodle as an Assignment in a PDF file.
We need only one copy per team, including:
• 1- Cover Page showing the team number and each participant's Last Name, First Name and Student ID
• 2- Table of Contents
• 3- Project Report, consisting of 3 paragraphs
a. Brief Introduction: The introduction explains what you are doing and why within the context of your analysis. Give an explanation of the nature and the scope of the case, the importance of the problem and the need for its resolution.
b. Discussion: The discussion presents the results for the questions asked below (in the order in which they are asked). You may refer to any tables, charts and illustrations you have included in your appendix (see below).
c. Brief Conclusion: The conclusion draws inferences from your results, and it is also where it will become clear whether you have learned the material. Show what you know by giving insightful and relevant explanations of your results as well as any recommendations you may have.
d. Use the results from the proposed questions to build a narrative that will become your analysis (for each database separately).
o Appendix – include all tables, charts, results and illustrations required to answer the case questions and arrive at your conclusion(s).
These must be organized in the order in which each question appears and labelled with a brief and clear explanation, including a brief analysis, below each table, chart and result.
Finally, the DAP printout should not be more than 12 pages.
• The professor reserves the right to deduct some or all points from a group project which does not adhere to the above criteria.
• Please refer to your course outline and review the section regarding Academic Integrity and Plagiarism! Some examples of plagiarism are:
o Copying directly or improperly citing the works of others
o Participating in unauthorized collaboration
o Paying to have the project completed
o Accepting payment in order to include non-participating members in the group.
The “corporations data” presented in this case study were collected from 58 randomly selected US companies, including Merck, GE, 3M, Kraft Foods, Boeing, United Technologies, etc., within 5 different industries.
The main objective of this study is to see how well a company's Net Income can be predicted from a list of suggested predictors such as
Long term debt, Total assets, Net sales, Operating expense, Income tax, Number of employees and the Type of industry.
For the complete analysis of this assignment, you need to apply statistical methods and procedures which you learnt throughout this course.
Under the heading “Data Analysis Project (DAP – 8%)” you will find:
– The data file called Fall2022_Corporations_Database.xlsx
– DAP’s description.
– Assignment Folder to upload your final result as a PDF file.
The list of variables is on Sheet-2 of the Excel file:
• Net Income in thousands of dollars
• Long term Debt in thousands of dollars
• Total Assets in thousands of dollars
• Net Sales in thousands of dollars
• Operating Expense in thousands of dollars
• Income Tax
• Type of Industry:
1 (Food & Beverage), 2 (Technology), 3 (Retail), 4 (Entertainment) and 5 (Transport)
• Number of Employees
– For each of the following two variables, apply appropriate graphical techniques and descriptive methods to present and interpret the shape and any hidden trend, indicate the most typical values of each distribution, and show measures of dispersion. Finally, for each variable, briefly explain whether the empirical rule applies, and report your recommended average with a measure of dispersion and any outlier(s).
• Type of Industry
• Number of Employees (Note: in grouping the Number of Employees, use an initial starting value of 820 with IW=28,000).
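As a check on the grouping rule in the note above, the class limits can be generated mechanically. The following is a minimal Python sketch and not a required tool (the project itself calls for Excel add-ins or MINITAB). It assumes that IW means the class interval width and that classes are inclusive integer ranges, which agrees with the lowest category 820-28819 quoted in the questions. The function names and the number of classes shown are illustrative only; the number of classes you need depends on the largest value in the data file.

```python
def class_limits(start, width, n_classes):
    """Return inclusive (lower, upper) limits for consecutive integer classes."""
    return [
        (start + i * width, start + (i + 1) * width - 1)
        for i in range(n_classes)
    ]


def class_index(value, start, width):
    """Return the 0-based index of the class that contains `value`."""
    return (value - start) // width


# Illustrative call: the first three classes for start = 820 and IW = 28,000.
for lower, upper in class_limits(820, 28_000, 3):
    print(f"{lower} - {upper}")
```

Counting how many companies fall in each class index gives the frequency distribution used for the histogram and for the rows of the contingency table.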
– Construct a scatter plot between Net Income and Number of Employees, and explain the form of the relationship from the scatter diagram.
– Construct a cross-tabulation/contingency table between Type of Industry and Number of Employees. In grouping the Number of Employees as the row variable, use an initial starting point of 820 with IW=28,000. Comment on the Number of Employees.
– From the contingency table, test whether more than 20% of the employees in the lowest category (820-28819) were working in the Retail Industry (3). Use α =0.10.
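One way to frame this question is as a right-tailed one-proportion z-test with H0: p ≤ 0.20 against H1: p > 0.20. The sketch below assumes the normal approximation is acceptable (check that n·p0 and n·(1 − p0) are both at least 5) and uses only the Python standard library. The function name is hypothetical; `x` (the Retail count in the lowest category) and `n` (the total count in that category) must be read from your own contingency table.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test_upper(x, n, p0, alpha):
    """Right-tailed z-test for a proportion: H0: p <= p0 vs H1: p > p0.

    Returns (z statistic, critical z, p-value, reject H0?).
    """
    p_hat = x / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses p0, as H0 is assumed true
    z = (p_hat - p0) / standard_error
    z_critical = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)
    return z, z_critical, p_value, z > z_critical
```

With α = 0.10 the critical value is about 1.28, so H0 is rejected only when the computed z exceeds that value (equivalently, when the p-value is below 0.10).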
– Construct a 98% confidence interval for the proportion of companies in the Food & Beverage industry. Briefly interpret the CI and show the margin of error.
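The interval here has the usual form p̂ ± z·sqrt(p̂(1 − p̂)/n), where the second term is the margin of error. The following is a minimal sketch under the normal approximation, using only the Python standard library. The function name is hypothetical; `x` is the number of Food & Beverage companies counted in your data, and `n` is the sample size (58 companies according to the case description).

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence):
    """Return (lower limit, upper limit, margin of error) for a proportion."""
    p_hat = x / n
    # Two-sided critical value, e.g. about 2.33 for 98% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin_of_error, p_hat + margin_of_error, margin_of_error
```

Calling `proportion_confidence_interval(x, 58, 0.98)` with your own count `x` returns the two limits together with the margin of error to report.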
– Can it be concluded that the expected Long Term Debt is less than 30% of the average Total Assets (µo)? Use the α = 0.05 significance level.
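A possible reading of this question is a left-tailed one-sample t-test with H0: µ ≥ µo against H1: µ < µo, where µo is fixed at 0.30 times the sample mean of Total Assets. That reading, and the function name below, are assumptions rather than part of the assignment text. The sketch computes only the test statistic and degrees of freedom with the Python standard library; the critical value at α = 0.05 still has to come from a t table or from your statistical software.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Usage outline with your own columns (not run here):
# mu0 = 0.30 * mean(total_assets)
# t, df = one_sample_t_statistic(long_term_debt, mu0)
# Reject H0 if t is below the negative critical value for df at alpha = 0.05.
```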
– Provide a correlation matrix between Net Income and the following independent variables:
Long term Debt, Total Assets, Operating Expense, Income Tax, and Number of Employees.
Briefly interpret the result.
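Each entry of the correlation matrix is a Pearson coefficient r = Sxy / sqrt(Sxx·Syy). The following minimal sketch shows that computation with plain Python; the function names are hypothetical, and the two short columns in the usage lines are toy numbers, not values from the data file.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations from {name: values}."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Toy columns only, to show the call shape.
toy = {"x": [1, 2, 3, 4], "y": [2, 4, 5, 9]}
matrix = correlation_matrix(toy)
print(round(matrix["x"]["y"], 3))
```

When interpreting the matrix, look both at how strongly each predictor correlates with Net Income and at high correlations among the predictors themselves, since the latter can signal multicollinearity in the regression that follows.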
– Develop a multiple regression model using Net Income as the dependent variable and
Long term Debt, Total Assets, Operating Expense, Income Tax and Number of Employees as independent variables.
Your final model should include only significant independent variables (use significance level α = 0.01).
Finally, use your recommended/clean model to predict the Net Income of a company with an Operating Expense of $2,500,000, Long Term Debt = 0.0 and an Income Tax of $190,000.
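When making the prediction, keep the units consistent with the variable list: Operating Expense and Long term Debt are recorded in thousands of dollars, so a dollar amount must be divided by 1,000 before it is plugged into the fitted equation. The sketch below assumes that Income Tax is also recorded in thousands of dollars, which the variable list does not state, so verify this in the data file. The function names are hypothetical, and no coefficients are supplied because they must come from your own fitted model.

```python
def dollars_to_thousands(amount_in_dollars):
    """Convert a dollar amount to thousands of dollars."""
    return amount_in_dollars / 1000


def predict_net_income(intercept, coefficients, inputs):
    """Evaluate y-hat = b0 + sum(b_j * x_j) over the retained predictors."""
    return intercept + sum(
        coefficients[name] * inputs[name] for name in coefficients
    )


# Inputs for the requested prediction, expressed in thousands of dollars.
inputs = {
    "Operating Expense": dollars_to_thousands(2_500_000),
    "Long term Debt": 0.0,
    "Income Tax": dollars_to_thousands(190_000),  # unit is an assumption
}
# prediction = predict_net_income(b0, fitted_coefficients, inputs)
# where b0 and fitted_coefficients are the estimates from your final model.
```

The result is in the same unit as Net Income in the data file, that is, thousands of dollars.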