How do I complete this Data Collection and Preliminary Data Analysis assignment using RStudio? The course is statistics using R.
In this critical thinking assignment, you will select a dataset from two scenario options: The Titanic or Telecom Customers Churn.
After selecting an option, you will review the corresponding dataset and use it to answer questions and complete an exploratory data analysis. You will use R and Excel to complete the assignment.
Scenario Options – choose one:
Option 1:
This set includes data on the passengers of the Titanic. The tragic story had a fortunate outcome for some. If you choose this option, you will explore questions related to the reasons behind the survival of some passengers and the demise of others.
The data can be accessed at:
A description of the data is available at: https://www.kaggle.com/c/titanic/data
Option 2:
The second dataset includes data about the customers of a telecom company, the services they subscribed to, and whether they left the company’s services. Multiple aspects of customer behavior can be analyzed using this dataset.
The data and the variable descriptions can be found at:
Requirements:
Part I: Contextualization
In the introductory section of your report (Part I, 4-5 pages), provide a contextualization of the dataset by doing the following:
- Download the selected dataset, review its description, and state which dataset you selected.
- Summarize the story behind the data. Provide in-text and reference list citations of any outside sources that you consulted to contextualize the data collection efforts.
- Explore the dataset. Explain the attributes of the entities.
- Describe the data captured in the data set.
- Summarize three observations from a cursory visual inspection of the data.
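For the exploration steps above, a first pass in RStudio might look like the minimal sketch below. It assumes the Titanic option and that Kaggle's training file has been saved locally as train.csv (the file name and path are assumptions; adjust them to your download). So that the script runs on its own, a few invented rows with the Kaggle column names stand in for the real file; they are not real passenger records.

```r
# Sketch: first look at the Titanic data (Part I exploration).
# Assumption: with the real data you would instead run
#   titanic <- read.csv("train.csv", stringsAsFactors = FALSE)
# where "train.csv" is the file downloaded from Kaggle (path is an assumption).
# The invented rows below only mimic the Kaggle column names.
titanic <- data.frame(
  PassengerId = 1:6,
  Survived    = c(0L, 1L, 1L, 0L, 1L, 1L),
  Pclass      = c(3L, 1L, 3L, 2L, 1L, 3L),
  Sex         = c("male", "female", "female", "male", "female", "male"),
  Age         = c(22, 38, 26, NA, 35, 4),
  Fare        = c(7.25, 71.28, 7.93, 13.00, 53.10, 16.70),
  Embarked    = c("S", "C", "S", "S", "S", "Q"),
  stringsAsFactors = FALSE
)

str(titanic)      # column names, data types, and first few values
summary(titanic)  # min/max/quartiles for numeric columns, NA counts
head(titanic)     # first rows for a quick visual inspection

# Count of missing values (NA) in each column
colSums(is.na(titanic))

# Frequency tables for categorical attributes
table(titanic$Sex)
table(titanic$Pclass, titanic$Survived)  # class (rows) by survival (columns)
```

Output such as the NA counts and the class-by-survival table is the kind of thing a "cursory visual inspection" observation can be based on, once the script is pointed at the real file.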
Part II: Research Questions
When data is collected, or even before collection starts, we have the purpose of the analysis in mind and plan accordingly. For an analysis to provide us with meaningful insight, we must plan it well and have a clear research question or questions in mind. The term “research” can sound scary, but it simply refers to a path to satisfying curiosity or resolving a problem. For Part II, which should be 2-3 pages, provide an analysis based on the background of the selected dataset:
- Write three questions that this dataset can help answer.
- Explain the importance of these questions in a business or other relevant context.
- Identify the variables from the data set that are relevant to the questions.
- For each question, write a sample statement of the outcome that your analysis may reveal.
- Underline the words in these statements that refer to the variables you identified above.
Part III: Data Preparation and Distribution Analysis
Using an R script, please perform the following:
- Identify the columns that hold the values of the variables you identified in your analysis of the research questions. Write an R script that generates a subset of the data containing only the values for the selected variables, with categorical values translated into numeric codes. If a variable is categorical and its values are not numeric, transform them into numeric values. For example, if the values of a column are ‘Y’ and ‘N’, use R to convert them to “0” and “1”, respectively, when generating the subset.
- Write an R script for a query that provides insights into your research questions.
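One possible way to approach Part III in base R is sketched below, again assuming the Titanic option and using invented stand-in rows in place of the real file. The choice of variables and the 0/1 coding are illustrative, not required by the assignment; ifelse() does the recoding, and aggregate() plays the role of the "query" by computing survival rates per group.

```r
# Sketch: build an analysis subset, recode a text category to numbers,
# and run query-style summaries (Part III).
# Assumption: `titanic` would normally come from read.csv() on the Kaggle file;
# these invented rows stand in for it so the script is self-contained.
titanic <- data.frame(
  PassengerId = 1:6,
  Survived    = c(0L, 1L, 1L, 0L, 1L, 1L),
  Pclass      = c(3L, 1L, 3L, 2L, 1L, 3L),
  Sex         = c("male", "female", "female", "male", "female", "male"),
  Age         = c(22, 38, 26, NA, 35, 4),
  stringsAsFactors = FALSE
)

# 1. Keep only the columns tied to the research questions (illustrative choice)
analysis <- titanic[, c("Survived", "Pclass", "Sex", "Age")]

# 2. Recode the text category into a numeric code: male = 0, female = 1.
#    The same pattern handles the prompt's example: ifelse(x == "Y", 0L, 1L)
#    maps 'Y' to 0 and 'N' to 1.
analysis$Sex <- ifelse(analysis$Sex == "female", 1L, 0L)

# 3. Query-style summaries: survival rate (mean of 0/1 Survived) by group
by_sex   <- aggregate(Survived ~ Sex, data = analysis, FUN = mean)
by_class <- aggregate(Survived ~ Pclass, data = analysis, FUN = mean)
print(by_sex)
print(by_class)

# 4. Distribution of a numeric variable (summary() reports NAs separately)
summary(analysis$Age)
```

Because Survived is coded 0/1, its group mean is the share of passengers who survived, which maps directly onto outcome statements written in Part II. Base R plotting functions such as hist() or boxplot() can be added to visualize distributions for the screenshots requested in the submission.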
Part IV: Reporting
- Write a 250-word executive summary for members of the general public outlining the findings from your analysis.
Your well-written report should be 10-12 pages in length, not including the title or reference pages. Use Saudi Electronic University academic writing standards and APA style guidelines, citing at least two references in support of your work, in addition to your text and assigned readings.
You will upload a zipped file that includes your 10-12-page report and all supporting files, including Word, R code and Excel files, and screenshots that support the presentation of the findings in the report.