Friday, April 27, 2018

Kaggle Learn: Data Visualization Level 1 Review



Data Visualization Level 1 is a primer on plotting data in Python offered through Kaggle Learn, an educational portal now available on the popular data science competition website Kaggle.com. The course consists of a series of 11 programming notebooks covering a variety of data visualization topics, such as plot types, styling and faceting. You should know the basics of Python, and preferably pandas, before taking this course; Kaggle Learn also has a pandas introduction to get you up to speed. Unlike traditional MOOCs, this course has no graded element, but each notebook offers a few programming exercises that let you get your feet wet testing out some of the code and concepts presented in the tutorial.


Data Visualization Level 1 focuses on teaching you how to make plots using the data visualization functionality built into pandas. The Python data visualization ecosystem can be confusing, since many different packages all do similar things, so sticking with the relatively simple plotting tools built into pandas keeps the focus on the visualizations themselves rather than on learning the potentially complicated syntax of a new library. The course does introduce Seaborn, but its syntax is generally more approachable than the pure matplotlib code that other courses I've seen jump straight into.


The notebooks themselves are well organized and visually appealing: it's easy to follow along, and the plots are right there for you to look at and play around with in a forked notebook. The exercises at the end of the lessons challenge you to write the code necessary to reproduce plots that are shown at the end of the notebooks. This would be a great exercise, except that when you fork the notebook to work on them, the hidden code used to display the plots in the first place is revealed, so you are basically given the solutions before you even get started. I'm sure this could be remedied with various workarounds, so it's a shame that it hasn't been fixed yet, because it seems like one of the better ways to give users hands-on experience making visualizations.
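

To give a sense of why the pandas-first approach is gentle on beginners, here is a minimal sketch of a bar chart made with pandas' built-in plotting. It is not code from the course: the country counts are made-up toy data, and the labels are purely illustrative. The point is that a single method call on a Series produces a labeled plot, with matplotlib doing the drawing behind the scenes.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd

# Hypothetical toy data: number of reviews per country (not from the course)
counts = pd.Series({"US": 120, "France": 95, "Italy": 80}, name="reviews")

# One pandas call draws a bar chart; the index becomes the x-axis labels
ax = counts.plot.bar(title="Reviews by country")

# The return value is an ordinary matplotlib Axes, so it can be tweaked further
ax.set_ylabel("Number of reviews")
```

Because pandas hands back a regular matplotlib Axes, you can drop down to matplotlib methods like `set_ylabel` only when the pandas defaults aren't enough, rather than writing everything in matplotlib from the start.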


Data Visualization Level 1 is a nice introduction to data visualization in Python that doesn't get bogged down introducing packages with complicated syntax. It does, however, have 3 optional sections at the end that introduce some additional plotting packages, such as plotnine, a data viz tool that seeks to recreate the grammar of graphics plotting syntax used by R's popular ggplot2 library. If you're looking for a comprehensive tour of a data visualization package or of plot design principles, you won't find it here, but if you're interested in starting to make plots in Python with a minimal amount of pain, this is a good place to start.


I give Data Visualization Level 1 a rating of 4 out of 5 stars: Very Good.
