DAT102x: Data Science Professional Project is the final capstone project for Microsoft's 9-course Professional Program Certificate in Data Science series on edX. It is recommended that you complete the required lead-up courses in the program before tackling the capstone project, although you can complete it without doing any of the prerequisite courses as long as you know Python or R and have an Azure Machine Learning account set up. The course consists entirely of a single predictive modeling competition, so your grade is based solely upon how well your model performs.
As a capstone project course, DAT102x doesn't have much in the way of lecture content. The course page on edX just has a few orientation videos that summarize the project and provide details about how to structure and submit your model. The project itself is just like a Kaggle competition: you are handed training data with a prediction target and an evaluation metric and you have to use that data to create a predictive model that is then applied to hidden test data for final evaluation. The only difference is that you have to build and submit your model using the Microsoft Azure Machine Learning platform.
The December 2016 project is a binary classification problem where you have to predict whether borrowers will default on loans based on features like credit score and annual income. The data for this particular session is fairly clean, only requiring minor adjustments like removing columns and dealing with some missing values. As such, the data is easy to throw into a predictive model using R or Python: the main difficulty with modeling and the project as a whole is getting everything to work as a Microsoft Azure predictive modeling web service. For instance, I started by downloading the data set and creating an end-to-end data cleaning, modeling and submission script in a single R file on my local machine, but there was no easy way to convert my code into something that would work as an Azure web service. In the end I abandoned my custom model and used Azure's built-in predictive modeling modules to get a web service up and running. My first submission exceeded the target threshold (70% classification accuracy) for passing the project, although it could take some fiddling with models and hyperparameters to exceed the goal.
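To give a rough idea of the local side of the work described above (dropping columns, filling in missing values, and checking accuracy against the 70% bar), here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy rows, the column names, the median fill, and the cutoff rule standing in for a real model are hypothetical and are not the course data or my actual R script. It also does nothing on the Azure side, which is where the real difficulty was.

```python
from statistics import median

# Hypothetical toy rows; the real course data set is not reproduced here.
rows = [
    {"id": 1, "credit_score": 720, "annual_income": 85000, "default": 0},
    {"id": 2, "credit_score": 580, "annual_income": None, "default": 1},
    {"id": 3, "credit_score": None, "annual_income": 42000, "default": 1},
    {"id": 4, "credit_score": 690, "annual_income": 61000, "default": 0},
    {"id": 5, "credit_score": 610, "annual_income": 38000, "default": 1},
    {"id": 6, "credit_score": 750, "annual_income": 99000, "default": 0},
]


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def impute_median(rows, column):
    """Replace missing (None) values in a column with the observed median."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = median(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Cleaning: remove an identifier column, then fill the gaps.
cleaned = drop_columns(rows, {"id"})
for col in ("credit_score", "annual_income"):
    cleaned = impute_median(cleaned, col)

# Placeholder rule standing in for a real model (an assumption, not a
# recommendation): predict default when the credit score is below a cutoff.
SCORE_CUTOFF = 650
predictions = [1 if row["credit_score"] < SCORE_CUTOFF else 0 for row in cleaned]
labels = [row["default"] for row in cleaned]

# The 70% figure comes from the project; treating "at least 70%" as a pass
# is an assumption of this sketch.
PASS_THRESHOLD = 0.70
score = accuracy(labels, predictions)
passed = score >= PASS_THRESHOLD
print(f"accuracy={score:.2f}, passes={passed}")
```

In practice you would swap the cutoff rule for a proper classifier and score it on held-out data rather than the rows it was tuned on, since the project's final evaluation runs on hidden test data.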
I give DAT102x: Data Science Professional Project 3 out of 5 stars: Fair.