Wednesday, July 22, 2015
edX - Scalable Machine Learning Review
Scalable Machine Learning is a 5-week distributed machine learning course offered by UC Berkeley through the edX platform. It is a follow-up to another UC Berkeley course, Introduction to Big Data with Apache Spark. Although the first course is not a strict prerequisite, Scalable Machine Learning uses the same virtual machine and even has some overlap with the homework labs, so it is beneficial to take Introduction to Big Data first. Scalable Machine Learning teaches distributed machine learning basics using PySpark, Apache Spark’s Python API. Basic proficiency with Python is necessary to pass the course, and some exposure to algorithms and machine learning concepts is helpful. Course evaluation is based primarily on 5 labs distributed as IPython notebooks.
The first two weeks of the course cover machine learning basics and introduce Apache Spark. For students already familiar with machine learning basics who took Introduction to Big Data, there’s not much new to learn during the first two weeks. Week 2 is essentially a clone of week 2 of Introduction to Big Data, including the lab assignment. The final 3 weeks have meatier lecture content and longer labs, each covering a different machine learning technique: linear regression, logistic regression and principal component analysis.
The lecture content is clean and the lecturer speaks clearly. His delivery isn’t perfect, but the only real purpose of the lectures is to serve as background information for the meat of the course: the labs. Each lab is a lengthy IPython notebook with several sections leading you through the process of creating a pipeline for running a machine learning algorithm with PySpark. Much of the code you need is provided for you, but writing the key functions and data transformations necessary to complete the labs can still be time-consuming. Little things like an ambiguous instruction or an uncaught error you made earlier in the assignment can result in bugs that take a while to squash. Despite occasional frustrations, the labs do a good job of interspersing instruction with practical, hands-on learning.
Scalable Machine Learning is a quality introduction to machine learning with PySpark that focuses on labs over lectures. The lectures could be better and some of the instructions and error checks in the labs could be more comprehensive, but this is a great course for those looking to learn by doing.
I give Scalable Machine Learning 4 out of 5 stars: Very Good.