Sunday, February 23, 2020

What Is Data Science?

This video is the first in a new "data dictionary" series that breaks down common data science terms in accessible language, for people who are interested in data science but may not know much about it or may not come from a technical background.

Data science describes the activities related to collecting, storing and creating value from data. Creating value from data means using it to do something useful, like making better decisions. By analyzing data we can detect patterns in it and understand the process that generated it. This is the heart of data science: understanding why we observe what we do and then using that knowledge to make better choices.

There are five common steps in the process of going from raw data to valuable insights:

1. Problem Formulation. Figuring out what you want to do and what questions to ask to produce the answers that would allow you to do it.

2. Data Cleaning. Getting the data into a format you can actually work with. This can be a time-consuming process if your data is spread out across multiple sources or if it contains errors that need to be resolved.

3. Exploratory Data Analysis or EDA. This involves getting a sense of the structure of your data by inspecting values, generating summary statistics and creating plots to figure out how you can best use it to address your problem.

4. Predictive Analytics. This describes making models from data (things we've already observed) to predict how things might unfold in the future. We typically use computers to detect patterns in the data and create models for us. This process is known as machine learning.

5. Prescriptive Analytics. This involves recommending actions to take based on predictive models and real-world considerations like costs and strategic objectives.

The final output of data science is to prescribe a course of action that addresses the initial problem as well as possible given constraints like time, cost, availability of data and uncertainty. In the end, data science is all about using data to make better decisions.
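To make the steps a little more concrete, here is a minimal Python sketch of steps 2 through 5 on a tiny made-up dataset. Everything in it is an assumption for illustration: the advertising-spend data, the function names and the simple profit rule are hypothetical, and a straight-line fit stands in for the much richer models used in real machine learning.

```python
from statistics import mean

# Hypothetical raw records: (advertising spend, units sold) stored as text,
# including one missing value and one invalid entry.
raw_rows = [("10", "25"), ("20", "45"), ("", "30"),
            ("30", "65"), ("40", "n/a"), ("50", "105")]


def clean(rows):
    """Step 2, data cleaning: keep only rows where both fields parse as numbers."""
    cleaned = []
    for spend, units in rows:
        try:
            cleaned.append((float(spend), float(units)))
        except ValueError:
            continue  # drop rows with missing or malformed values
    return cleaned


def summarize(values):
    """Step 3, exploratory analysis: a few basic summary statistics."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(points):
    """Step 4, predictive analytics: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept


def recommend(slope, intercept, options, unit_margin):
    """Step 5, prescriptive analytics: pick the spend with the best predicted profit.

    The profit rule (margin per unit times predicted units, minus spend) is an
    illustrative assumption, not a general formula.
    """
    def predicted_profit(spend):
        return unit_margin * (slope * spend + intercept) - spend
    return max(options, key=predicted_profit)


points = clean(raw_rows)
slope, intercept = fit_line(points)
print("rows kept:", len(points))
print("units sold summary:", summarize([y for _, y in points]))
print("fitted line:", slope, intercept)
print("recommended spend:", recommend(slope, intercept, [0, 25, 50], unit_margin=1.0))
```

Step 1, problem formulation, shows up only implicitly here, as the question "how much should we spend on advertising?". Changing `unit_margin` changes the recommendation, which is the point of the prescriptive step: the same prediction can lead to different actions once costs are taken into account.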
