Monday, December 8, 2014

CS1156x: Learning from Data Review


CS1156x: Learning from Data is a 10-week introductory machine learning course offered by Caltech on the edX platform. It focuses on giving students a solid foundation in machine learning theory. Major course topics include the feasibility of learning, linear models, generalization, VC dimension, overfitting, regularization and validation. The course also covers several common machine learning algorithms, including the perceptron, linear regression, logistic regression, neural networks, support vector machines and radial basis functions. As a theory-heavy course, it devotes much of its time to mathematical reasoning and the math behind various machine learning concepts and algorithms. You need a strong mathematical background, including knowledge of linear algebra and calculus, to understand everything in this course. You also need to be able to program in a language that lets you perform matrix and vector operations. The course provides a temporary MATLAB license and forum support for MATLAB; many students also used R and Python.
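To give a flavor of the vector-style programming the course calls for, here is a minimal sketch of the perceptron learning algorithm, one of the algorithms listed above, in plain Python. It is an illustration rather than course code: the toy data set, the function names (dot, sign, train_perceptron, predict) and the iteration cap are all assumptions made for this example.

```python
def dot(w, x):
    """Dot product of two equal-length vectors stored as lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(value):
    """Return +1 for positive values and -1 otherwise."""
    return 1 if value > 0 else -1


def train_perceptron(points, labels, max_iters=1000):
    """Run the perceptron learning algorithm on labels in {-1, +1}.

    A constant 1.0 is prepended to every point so that w[0] acts as the bias.
    Returns the weight vector and the number of updates performed.
    max_iters is an illustrative safety cap, not part of the algorithm.
    """
    data = [[1.0] + list(p) for p in points]
    w = [0.0] * len(data[0])
    updates = 0
    for _ in range(max_iters):
        # Collect every point the current weights classify incorrectly.
        misclassified = [(x, y) for x, y in zip(data, labels)
                         if sign(dot(w, x)) != y]
        if not misclassified:
            break  # all points classified correctly
        # Update rule: nudge the weights toward one misclassified point.
        x, y = misclassified[0]
        w = [wi + y * xi for wi, xi in zip(w, x)]
        updates += 1
    return w, updates


def predict(w, point):
    """Classify a single point with trained weights."""
    return sign(dot(w, [1.0] + list(point)))


# Toy, linearly separable data invented purely for illustration.
points = [(2.0, 1.0), (3.0, 2.5), (-1.0, -2.0), (-2.5, -1.0)]
labels = [1, 1, -1, -1]

w, updates = train_perceptron(points, labels)
print("weights:", w, "updates:", updates)
print("predictions:", [predict(w, p) for p in points])
```

With a matrix-oriented tool such as MATLAB, R or NumPy, the dot products would be single vectorized calls, which is the kind of capability the course expects your language to have.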


Learning from Data is different from most MOOCs in that it isn't optimized for the web. Course content consists of 18 full-length lecture videos recorded on the Caltech campus, each spanning about 75 minutes including 10-15 minute Q&A sessions. Two lectures are posted each week for 9 weeks, along with PDFs of the lecture slides and 8 homeworks that each consist of 10 multiple-choice questions. There are no in-video quizzes or interactive exercises, as the course is basically an online port of the on-campus course. It requires a high level of motivation and attentiveness to get through two very dense 75-minute lectures each week. Despite being multiple choice, the homework problems can be very time-consuming, since many require programming. You get 2 attempts at each question, but each attempt is worth half of your grade, so guessing based on your intuition can be costly. The final is an untimed test that is just like the homework except that it has 20 questions. You need a total score of 50% to earn a certificate.


Although Learning from Data isn't in the typical MOOC format, the professor is a skilled lecturer and manages to keep the lengthy lecture videos engaging. The lecture slides are packed with useful information and the forums were very helpful; students were active in helping one another, and the professor was very active on the forums even though this wasn't the first run of the course. If you take the time to understand each question and answer carefully, the homework reinforces the material more than you would expect from a 10-question multiple-choice quiz.


Overall, Learning from Data is a great course that emphasizes theory while often drawing out its practical implications. The level of mathematical maturity it requires will be a barrier for some students, although you can still get something out of this course if you don't understand all of the math. If I were taking this course as a student on campus I would probably rate it 5/5, but I think they missed some opportunities to make it a truly excellent MOOC by failing to adapt it for the online audience. This course will give you a deeper understanding of machine learning than other intro MOOCs on the same subject, but if you're more interested in learning practical tools and applying machine learning, consider taking MIT's Analytics Edge on edX or Coursera's Machine Learning course.


I give this course 4.5 out of 5 stars: Great.
