Sunday, June 1, 2014

Johns Hopkins Coursera Data Science Specialization Track--Part 2



The second wave of courses in Johns Hopkins' data science specialization track on Coursera is coming to a close, so it's time for another MOOC update. I'll spend a moment reviewing each of the 3 courses in the second wave and then give my thoughts on this month and the specialization track as a whole thus far.


Exploratory Data Analysis

Exploratory Data Analysis is the 4th course in Johns Hopkins’s data science specialization track. The first 2 weeks of the course provide a thorough overview of plotting in R using the base graphics system, the lattice package and the ggplot2 package. Week 3 takes a sudden detour into data clustering and the fairly advanced topics of principal components analysis and singular value decomposition, only to jump back to plotting with a section on color. The clustering section seems a little out of place since there is no introduction explaining the purpose of clustering. What's worse, the SVD and PCA sections require a fairly high level of linear algebra knowledge to understand, and linear algebra is not a prerequisite for this course. I suspect that section will leave many students scratching their heads. Week 4 consists of 2 case studies where the professor shows you how to perform an exploratory analysis on a couple of different data sets.

If this course consisted only of the plotting lectures, I’d give it a 4 out of 5. The plotting lectures that make up the bulk of the course are well done, and this course provides more instructor face time and live examples in R than any of the 3 courses in the first wave of the data science track. Unfortunately, there are no interactive exercises or in-lecture quizzes, and the principal components analysis and singular value decomposition sections are too advanced for this course. It would have been better if they had left the SVD and PCA functions as black boxes in R and simply explained in general terms what they do and how to interpret their output. Still, the quality overview of R plotting makes this course worth a look.

I give this course 3.5 out of 5 stars: Good.


Reproducible Research

Reproducible Research is the 5th course in the Johns Hopkins data science track. As the title states, the course is all about making research and data analysis reproducible in R.

The first 2.5 weeks of lecture material are great. They provide a well-organized overview of how to create reproducible research in R using R Markdown and the knitr package, taking plenty of time to talk about best practices. Thankfully, Roger Peng has added a little box with his face in it as he talks over his slides for many of his videos, which makes the content a lot more engaging than it is in some of the other Johns Hopkins courses that only have voiceovers.

The final 1.5 weeks of lecture video material are not as useful or engaging. Week 4 seems a bit lazy in that it takes the form of recordings of lectures given sometime in the past, and the videos in the second half of week 3 only have voiceovers, with an echo that makes them hard to listen to.

All in all, the first 2.5 weeks of this course are definitely worth checking out if you have any interest in learning about reproducible research, but you might want to skip through some of the content at the end of the course.

I give this course 4 out of 5 stars: Very Good.


Statistical Inference

Statistical Inference is the 6th course in the Johns Hopkins data science specialization. This course is basically an introduction to statistics in R. The course covers many different topics in the span of 4 weeks, from basic probability and distributions to t-tests, p-values and statistical power. The lectures take the form of slideshows with a lot of dense mathematical notation, small text and mediocre voiceovers. The course tries to cover too much ground too fast, and the material isn't presented in a way that is engaging or easy to understand. I don’t think the lecturer’s face was shown once in the entire course. That’s not to say there isn't good information in the lecture slides, but the presentation and execution are poor.

I give this course 1.5 out of 5 stars: Very Bad.


Thoughts Thus Far

This month of the data science track had a lot more meat to it than the first month, but the lack of interactive exercises, in-lecture quizzes and instructor face time detracts from the value of the courses. Some of the material seems like it was tossed together at the last minute, especially in the later weeks of the courses. I'm starting to feel quite glad I didn't pay for verified certificates because I'm finding myself skipping through a lot of dull sections. The courses provide good information, but they seem more useful as reference material than learning material because without interactive exercises, in-lecture quizzes and quality homework, it is hard to retain things for more than a few days.

