Wednesday, December 5, 2018

Learning Data Science: The Role of Online Courses

In this post, I will feature my entry to the 2018 Kaggle ML & DS Survey Challenge, which tasks users with telling a data story based on Kaggle's annual user survey. My entry, Learning Data Science: The Role of Online Courses, focuses on the role of massive open online courses (MOOCs) in the data science education of Kaggle users. If you're interested in viewing the R code I used to generate plots, seeing the interactive plots or leaving me a comment, follow this link to my project's Kaggle page.



Introduction

Back in 2011, if you wanted to learn online, your options were limited. You could scour the internet for articles, blog posts, forums and the occasional video to piece together understanding of a topic of interest, but structured online courses could only really be found on MIT's OpenCourseWare platform. Although MIT OCW offered a massive repository of course materials recorded from live lectures, its content was not optimized for the web. Content was uploaded en masse and “as is”, leaving users to sift through mountains of material, without a consistent structure, that often lacked key course components.
This was the learning landscape I entered when I took my first online course. I had several friends in college who were into computer science, but I didn't know anything about it myself, so I decided to try to learn the basics of programming online. Eventually, I stumbled upon an MIT OCW course called MIT 6.00: Introduction to Computer Science and Programming. I had never used OCW before and certainly had never heard of Python, but it seemed like as good a place to start as any. After all, it was just an intro course and with MIT's expert educators on my side, how hard could it be?
It was hard. The lectures were long and there was no interactivity. The concepts made sense, but there wasn't an easy way to quickly put them into practice. The first programming assignment, which required a relatively trivial calculation, had me pulling out my hair, on the verge of quitting. Later I discovered that the solution required a for loop, a construct that was covered in a teaching assistant session that was separate from the main lectures. It was a struggle, but I learned a lot. I learned the basics of Python, but also that an archive of on-campus content was not an ideal format for online education. It was the Web 1.0 of online learning. But Web 2.0 was coming fast.
In the first half of 2012, three new platforms for massive open online courses, also known as “MOOCs”, appeared: Udacity, Coursera and edX. With them came an explosion of new learning content that was more accessible and interactive than ever before, often created in partnership with top universities, all available for free. I started taking courses on all three platforms and also began writing reviews of the courses I took to help other learners decide which courses to take. In 2014, I took the first offering of MIT's The Analytics Edge on edX, which taught the basics of R and also introduced Kaggle, which was used as a platform to administer the final project. Since then, I've taken over 100 MOOCs, written over 100 course reviews and participated in dozens more Kaggle competitions.
The 2018 Kaggle Machine Learning survey challenge presents an opportunity for you to join me in exploring the intersection of two of my greatest interests: online courses and Kaggle. In particular, we will explore the role machine learning MOOCs have played in the data science education of Kagglers. Do most Kagglers learn from MOOCs? Which MOOC platforms do Kagglers use most? How do Kagglers perceive the quality of online courses versus traditional education? In this inquiry we will take a deep dive into MOOC usage by the Kaggle community, answering all of these questions and more. Along the way, I will provide my insights as a prolific MOOC-taker and explore how reality aligns with--or defies--my expectations.

Where Do Kagglers Learn? 

We will begin our exploration of MOOC usage in the Kaggle community by looking at the overall contribution of online courses to the data science education of Kagglers. Question 35 of the Kaggle ML & DS Survey asks: "What percentage of your current machine learning/data science training falls under each category?" The response categories for this question are: Self Taught, Online Courses (MOOCs), Work, University, Kaggle and Other, with an accompanying optional write-in text response. The responses to each category are numeric values indicating the percentage that each source contributed to a user's overall data science education.

Note: Throughout our exploration I will use the terms "online courses" and "MOOCs" interchangeably, the term "Kagglers" to refer to Kaggle users, the term "MOOCers" to refer to those who take online courses and the term "non-MOOCers" to refer to those who do not take online courses. I will treat the Kaggle survey data as representative of the Kaggle community as a whole and thus "Kagglers" will refer both to all Kaggle users and all survey respondents.

Learning Source Participation 


We will begin our exploration by investigating which learning sources Kagglers participate in most often. We'll consider any learning source with a non-zero value for survey question 35 to be a source that a user participates in.
Let's start by creating a plot to check the number of Kagglers that use each learning source. Each time we make a plot, I will note my main expectation of what we will find ahead of time based on my knowledge of Kaggle and online learning and then contrast my expectation against the reality of our observations.
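For readers who want to see the participation rule in code, here is a minimal sketch in Python (the original plots were made in R). The field names and the toy responses are made up for illustration and are not survey data; the only thing taken from the text is the rule that any non-zero value for question 35 counts as participation.

```python
# Hypothetical toy responses to question 35: percent of training per source.
# These numbers are invented for illustration; they are NOT survey data.
responses = [
    {"Self Taught": 50, "Online Courses": 30, "Work": 20, "University": 0, "Kaggle": 0},
    {"Self Taught": 0, "Online Courses": 0, "Work": 40, "University": 60, "Kaggle": 0},
    {"Self Taught": 40, "Online Courses": 40, "Work": 0, "University": 10, "Kaggle": 10},
    {"Self Taught": 70, "Online Courses": 0, "Work": 30, "University": 0, "Kaggle": 0},
]


def participation_rates(rows):
    """Return the share of respondents with a non-zero value for each source."""
    sources = rows[0].keys()
    return {s: sum(1 for r in rows if r[s] > 0) / len(rows) for s in sources}


rates = participation_rates(responses)

# Print sources from most to least used.
for source, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source}: {rate:.0%}")
```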

Expectation:

  • Kaggle will be the most frequently used learning source, since this is a survey of Kaggle users.

Reality:

  • Only 45.4% of Kagglers consider Kaggle a notable learning source.
  • Self-teaching is the most common learning source, with almost 80% participation, while over 70% of Kagglers use MOOCs.
  • Roughly half of users learn data science from work and half learn from university.
The plot shows that MOOCs are the second most common learning source among Kagglers after self-study. Since self-teaching is not well-defined, there could be significant overlap between self-teaching and learning from online courses. Personally, much of my data science knowledge comes from online courses and yet I consider myself self-taught.
Let's explore further by creating the same plot, looking only at Kagglers who take online courses.

Expectation:

  • Most MOOCers will also report being self-taught.

Reality:

  • 83.4% of MOOCers also consider themselves self-taught.
  • 48.5% of MOOCers also learn from university.
  • Over half of MOOCers also learn from Kaggle.
Let's continue this line of exploration by recreating the plot above for the subset of Kagglers who do not participate in MOOCs.

Expectation:

  • University participation among non-MOOCers will be higher than it is among those who take online courses.

Reality:

  • 53.1% of non-MOOCers learn from university--4.6 percentage points higher than university participation among MOOCers.
  • 70.7% of non-MOOCers consider themselves self-taught--12.7 percentage points lower than the self-teaching rate among MOOCers.
  • MOOCers are over 50% more likely to report Kaggle as an important learning source than non-MOOCers.
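The observations above mix two kinds of comparison: gaps in percentage points (an absolute difference between two rates) and "more likely" statements (a relative difference). As a small sketch of the arithmetic, using only the rates quoted above and two helper functions I named for illustration:

```python
def percentage_point_gap(rate_a, rate_b):
    """Absolute gap between two rates that are both given in percent."""
    return round(rate_a - rate_b, 1)


def relative_gap(rate_a, rate_b):
    """How much larger rate_a is than rate_b, as a percent of rate_b."""
    return round((rate_a / rate_b - 1) * 100, 1)


# University participation: non-MOOCers (53.1%) vs MOOCers (48.5%).
university_gap = percentage_point_gap(53.1, 48.5)
university_relative = relative_gap(53.1, 48.5)

# Self-teaching: MOOCers (83.4%) vs non-MOOCers (70.7%).
self_taught_gap = percentage_point_gap(83.4, 70.7)

print(university_gap, university_relative, self_taught_gap)
```

So a 4.6 percentage point gap in university participation corresponds to non-MOOCers being roughly 9.5% more likely to learn from university in relative terms.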
In addition to these observations, close inspection of the plot suggests that non-MOOCers use fewer learning sources than online course-takers. Let's investigate further by making a boxplot showing the spread and average number of learning sources of MOOCers vs non-MOOCers.

Expectation:

  • MOOCers will report using more learning sources on average than non-MOOCers.

Reality:

  • MOOCers use about 1.3 more learning sources than non-MOOCers on average.
The plot confirms our suspicion that MOOCers tend to participate in a wider variety of learning sources than non-MOOCers. This is perhaps not a completely fair comparison, since we compared a group that we know uses a specific learning source to another that we know does not use that source--the MOOC-taking subgroup can have as many as 7 total learning sources, while non-MOOCers are limited to 6. Still, it shows that MOOCers do use more learning sources on average.
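To make the comparison concrete, here is a rough Python sketch of how the number of learning sources per respondent could be counted and averaged by group. The toy responses are invented, not survey data. The last line, which leaves the MOOC source itself out of the count for MOOCers, is one possible way to address the fairness caveat; it is my own illustration and not something the original analysis did.

```python
from statistics import mean

# Hypothetical toy responses (percent of training per source), NOT survey data.
responses = [
    {"Self Taught": 50, "Online Courses": 30, "Work": 20, "University": 0, "Kaggle": 0},
    {"Self Taught": 40, "Online Courses": 40, "Work": 0, "University": 10, "Kaggle": 10},
    {"Self Taught": 0, "Online Courses": 0, "Work": 40, "University": 60, "Kaggle": 0},
    {"Self Taught": 70, "Online Courses": 0, "Work": 30, "University": 0, "Kaggle": 0},
]


def source_count(row):
    """Number of learning sources with a non-zero contribution."""
    return sum(1 for value in row.values() if value > 0)


moocers = [r for r in responses if r["Online Courses"] > 0]
non_moocers = [r for r in responses if r["Online Courses"] == 0]

mooc_mean = mean(source_count(r) for r in moocers)
non_mooc_mean = mean(source_count(r) for r in non_moocers)

# Assumption: to compare like with like, drop the MOOC source from the count.
mooc_mean_other = mean(source_count(r) - 1 for r in moocers)

print(mooc_mean, non_mooc_mean, mooc_mean_other)
```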

Learning Source Importance 


Thus far we have only looked at whether Kagglers use a learning source or not, ignoring the degree to which each source contributes to total learning. Next, we'll explore which learning sources Kagglers consider most important, by investigating which sources contributed the most to total data science education.
First we will make side by side boxplots showing the percentage of contribution of each learning source. We'll leave out the "Other" and "Text" fields, since relatively few users listed those as being important.

Expectation:

  • MOOCs will have a lower average learning contribution than other learning sources, since we know MOOC users spread their time across many learning sources.

Reality:

  • MOOCs had the second highest learning contribution of all sources.
  • University only contributed 18% to total data science education on average.
  • Self-teaching contributed the most to data science education.
  • Kaggle contributed relatively little to total data science education.
The plot seems to conflict with our earlier observation that MOOC students tend to use a broader range of learning sources than non-MOOCers. If MOOCers use a broad range of educational sources, we might expect MOOCs to contribute a relatively small amount to total education.
This conflict reveals a potential problem with the plot: it includes Kagglers who don't use each learning source along with those who do, which may give a warped representation of the importance of each learning source. We aren't really interested in how important a learning source is to people who don't use it--we're interested in how important each source is to those who actually use it. Let's remake the last plot, filtering out zero-valued "non-use" responses.
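A tiny Python sketch shows why the zeros matter. The values below are made up for a single learning source and are not survey data; the point is only that averaging over everyone pulls the figure down, while averaging over actual users does not.

```python
from statistics import mean

# Hypothetical contributions (percent of total training) for one source.
# Three respondents do not use the source at all. NOT survey data.
university_pct = [0, 0, 0, 40, 60, 20]

# Average over all respondents, including non-users.
mean_all = mean(university_pct)

# Average over respondents who actually use the source.
mean_users = mean(v for v in university_pct if v > 0)

print(mean_all, mean_users)
```

In this toy case the average contribution doubles once non-users are filtered out, even though no individual response changed.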

Expectation:

  • MOOCs will have a lower average learning contribution than university and work.

Reality:

  • MOOCs contribute about 35% of overall data science education on average for those who use them.
  • University contributes a similar amount to data science education as self-teaching and MOOCs among those who use each.
The new plot shows that MOOCs did not contribute more than University among Kagglers who actually participated in each. Still, the learning contribution of MOOCs is higher than I expected relative to university and work.
Let's see how the picture changes when we separate MOOCers from non-MOOCers. Given that we know non-MOOCers use fewer learning sources on average, we would expect them to concentrate heavily in the few learning sources they do use.

Expectation:

  • The average learning contribution of university will be higher for non-MOOCers than it is for MOOCers.

Reality:

  • The average learning contribution of university for non-MOOCers is nearly double that reported by MOOCers.
  • All learning sources (aside from MOOCs) have higher learning contributions among non-MOOCers.
The plot confirms that those who do not take online courses tend to rely more heavily on a few learning sources than MOOCers, who tend to spread learning out across more sources. This should come as no surprise given non-MOOCers have fewer total learning sources between which they distribute their time.
Now that we have an idea of the overall participation rates and importance of learning sources, let's investigate how learning source usage differs among different subsets of the Kaggle community.

Learning Source Demographics 

Learning Sources and Gender

First let's break learning source participation down by gender. In particular, we'll look at the learning source participation of males vs females. Before we make a plot, let's check how many respondents fall into each group.
Gender    Count    Proportion_of_Users
Female    4010     16.8%
Male      19430    81.4%
Men accounted for over 80% of respondents but there were still more than 4000 women in the survey. Now let's compare learning source participation with a plot.

Expectation:

  • Learning source participation does not vary by gender.

Reality:

  • Male and female Kagglers both learn from MOOCs and work at similar rates.
  • A greater proportion of males learn from self-study and Kaggle.
  • A greater proportion of females learn from university.
Let's break this comparison down further by looking at the importance of each learning source, again filtering out non-participants.

Expectation:

  • Learning source importance does not vary by gender.

Reality:

  • Learning source importance is generally similar across genders, but the median contribution of university is 40% for females and 30% for males.
Overall, gender appears to have little relation to Kagglers' use of online courses. The biggest takeaway is that female Kagglers participate in university at higher rates than males and tend to attribute more of their total learning to university. Since the survey data is limited to Kaggle users, however, we can't reasonably use these results to make inferences about the greater data science community outside of Kaggle.

Learning Sources and Age 

Next, we will investigate how learning source participation varies with age. First let's check the breakdown of how many respondents fall into each age group.
Age      Count    Proportion_of_Users
18-21    3037     12.7%
22-24    5141     21.5%
25-29    6159     25.8%
30-34    3776     15.8%
35-39    2253     9.4%
40-44    1360     5.7%
45-49    858      3.6%
50-54    582      2.4%
55-59    328      1.4%
60-69    273      1.1%
70-79    53       0.2%
80+      39       0.2%
Most users fall in the 18 to 34 year age brackets, with each successive age bracket beyond 34 having fewer users. Some of the older age groups have relatively few respondents so plots of those groups may exhibit more variability than the plots that include many responses.
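The proportions in the table can be reproduced from the counts. The sketch below is in Python and assumes the denominator is simply the sum of the listed age groups; under that assumption the rounded shares match the table.

```python
# Counts copied from the age table above.
age_counts = {
    "18-21": 3037, "22-24": 5141, "25-29": 6159, "30-34": 3776,
    "35-39": 2253, "40-44": 1360, "45-49": 858, "50-54": 582,
    "55-59": 328, "60-69": 273, "70-79": 53, "80+": 39,
}

# Assumption: every respondent falls into exactly one listed age group.
total = sum(age_counts.values())

# Share of respondents per age group, in percent, rounded to one decimal.
proportions = {age: round(100 * n / total, 1) for age, n in age_counts.items()}

# Combined share of the four youngest brackets (18 to 34).
young = ("18-21", "22-24", "25-29", "30-34")
under_35_share = sum(age_counts[a] for a in young) / total

for age, pct in proportions.items():
    print(f"{age}: {pct}%")
print(f"18-34 combined: {under_35_share:.1%}")
```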
Let's go ahead and create a plot of learning source participation by age.

Expectation:

  • MOOC participation will be more common among younger Kagglers.

Reality:

  • MOOC participation is fairly consistent across age groups, with between 68% and 74% participation for every age group under 60.
  • Participation in learning from work is relatively low for users aged 18-24.
It makes sense that the youngest Kagglers are less likely to learn from work than the average user, since younger Kagglers may not yet be in the workforce.

Learning Sources and Country 

Next, we will investigate learning source participation by country. The survey data includes responses from 147 countries, but we will filter down to the 20 countries with the most respondents in the interest of readability. We'll also exclude the responses "Other" and "I do not wish to disclose my location".
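As a sketch of this filtering step in Python: drop the two excluded responses, count respondents per country and keep the most common ones. The function name and the toy list of countries are hypothetical; only the two excluded labels come from the survey description above.

```python
from collections import Counter

# Responses excluded from the country comparison, as described above.
EXCLUDED = {"Other", "I do not wish to disclose my location"}


def top_countries(countries, n):
    """Return the n most frequent countries, ignoring excluded responses."""
    counts = Counter(c for c in countries if c not in EXCLUDED)
    return [country for country, _ in counts.most_common(n)]


# Hypothetical toy data, NOT survey data.
sample = (
    ["India"] * 5
    + ["United States of America"] * 4
    + ["Other"] * 6
    + ["Japan"] * 2
    + ["Nigeria"] * 1
    + ["I do not wish to disclose my location"] * 3
)

top3 = top_countries(sample, 3)
print(top3)
```

Note that "Other" would have ranked first in this toy sample had it not been excluded, which is why the exclusion happens before counting.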

Expectation:

  • I have no idea what to expect!

Reality:

  • MOOC participation is close to 70% in many countries.
  • Nigeria has the highest MOOC participation rate at 89% followed by India at 82%.
  • Japan has a low MOOC participation rate of 39%--25 percentage points lower than the next lowest country.
Interestingly, Japan--the country that uses MOOCs the least--also has the lowest participation in learning data science from university. This low MOOC participation rate suggests Japan could be a growth market for online course platforms.

Learning Sources by Programming Language 

To conclude our exploration of MOOC participation, let's compare learning source usage by programming language preference. As a prolific MOOC-taker I'm aware that most data science courses use Python or R, so I'm curious whether there is any appreciable difference in MOOC participation rates according to Kagglers' most-used programming language.
First, let's look at the number of Kagglers that use each language.
Programming Language     Number of Users
Python                   8180
R                        2046
SQL                      1211
Java                     903
C/C++                    739
C#/.NET                  432
Javascript/Typescript    408
MATLAB                   355
SAS/STATA                228
PHP                      191
Visual Basic/VBA         135
Other                    117
Scala                    106
Bash                     59
Ruby                     55
Go                       46
Julia                    11
As expected, Python and R dominate as the most commonly used languages among Kagglers. Now let's make a plot of participation rates.

Expectation:

  • Python and R users will have higher MOOC participation rates than users of other languages, since Python and R are the most common languages used to teach data science MOOCs.

Reality:

  • MOOC participation rates are close to 70% across most languages, with 74% of Python users, 71% of R users and 73% of SQL users participating in MOOCs.
  • Scala users have the highest participation in MOOCs at 78% and Bash users have the lowest at 50%.

Where Do Kagglers Learn: Key Findings

This concludes our exploration of where Kagglers learn. Let's summarize our key findings thus far:
  • Self-teaching is the most common learning method among Kagglers.
  • About 70% of Kagglers use MOOCs and MOOC use is relatively consistent across gender and age.
  • MOOCers tend to use many different learning sources, while non-MOOCers tend to use a smaller number of learning sources more heavily.
  • The biggest demographic difference in MOOC usage we discovered was variation across countries. This suggests country-related factors such as income, language and access to traditional education may play an important role in MOOC participation.

Which MOOC Platforms Do Kagglers Use? 

In the first section, we learned that the majority of Kagglers use online courses for at least part of their data science education and that MOOC use was consistent across many subgroups within the community. In this section, we will explore which MOOC platforms Kagglers use. Specifically, we will look at survey questions 36 and 37, which read: "Which (MOOC) platforms have you used?" and "On which platforms have you spent the most time?" respectively.
From my personal experience, I know that Coursera is one of the oldest MOOC platforms and has the largest catalog of free university-style courses of any platform. It was also the first platform I'm aware of that offered data science courses taught by industry experts, such as Andrew Ng's machine learning course and Geoffrey Hinton's (now retired) course "Neural Networks for Machine Learning", which were both available in 2012. Consequently, I expect Coursera to lead in terms of overall MOOC platform popularity, with the other long-running platforms edX and Udacity rounding out the top three. I have used many other platforms in the past, including DataCamp, Codecademy and Kaggle's learning materials, so it will be interesting to see how these platforms stack up against the old guard.

MOOC Platform Participation 


Let's begin our investigation of MOOC platforms by plotting the overall participation rate of each platform.

Expectation:

  • Coursera will have the highest participation rate, followed by edX and Udacity.

Reality:

  • 38% of Kagglers use Coursera, more than double the usage rate of any other MOOC platform.
  • Five MOOC platforms form a "second tier" below Coursera with usage rates between 15% and 18%: Udemy, DataCamp, Kaggle Learn, Udacity and edX.
Interestingly, Udemy and DataCamp--two platforms that primarily offer paid content--are used slightly more than Udacity and edX, which are perhaps the most similar platforms to Coursera. Since use of one MOOC platform does not exclude you from using others, it could be that while many MOOCers use Coursera, they also use one or more platforms from the second tier.
Let's check how many platforms MOOCers on Kaggle use, excluding the responses of "Other" and "None".
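Here is a minimal Python sketch of that count. Each respondent is represented as a set of platform names, the "Other" and "None" responses are ignored, and I assume respondents left with zero platforms are dropped from the MOOCer group. The toy data are invented and are not survey responses.

```python
# Responses that do not count as a named platform.
IGNORED = {"Other", "None"}

# Hypothetical answers to "Which (MOOC) platforms have you used?" NOT survey data.
platform_sets = [
    {"Coursera"},
    {"Coursera", "edX", "Udacity"},
    {"Udemy", "Other"},
    {"DataCamp", "Kaggle Learn"},
    {"None"},
]


def platform_count(platforms):
    """Number of named platforms a respondent uses."""
    return len(platforms - IGNORED)


counts = [platform_count(p) for p in platform_sets]

# Assumption: respondents with no named platform are not counted as MOOCers here.
moocer_counts = [c for c in counts if c > 0]

# Share of MOOCers who use more than one platform.
multi_share = sum(1 for c in moocer_counts if c > 1) / len(moocer_counts)

print(counts, multi_share)
```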

Observations

  • Over 70% of MOOCers on Kaggle use multiple online course platforms.
  • Roughly 30% of MOOCers stick with just one platform.
Before we look further at users of multiple MOOC platforms, let's make a plot to investigate which sites single-platform learners use the most.

Expectation:

  • Coursera will dominate in participation among single-platform users.

Reality:

  • 44% of single-platform MOOCers use Coursera, which is more than 3 times the participation rate of the next most used platform.
  • Relatively few Kagglers use only Udacity or only edX compared to their overall participation rates.
Now let's turn our attention to users of multiple MOOC platforms. Since Coursera is a dominant presence in the MOOC market, let's make a plot to investigate which secondary platforms Coursera users also tend to use.

Expectation:

  • edX will be the most popular secondary platform among Coursera users, since it came out around the same time and has a similar course structure.

Reality:

  • Udemy, DataCamp, edX and Udacity all have similar participation rates among Coursera users, with about a third of Coursera users participating in each.
Coursera users don't appear to favor any secondary platform in particular, but does the same hold true for users of the "second tier" platforms?
Let's make a series of plots like the one above for each of the second tier platforms to find out.

Expectation:

  • edX users will also use Coursera at a high rate, since the platforms are similar.

Reality:

  • 78% of edX and Udacity users on Kaggle also use Coursera.
  • Kaggle Learn users participated in the fewest alternate MOOC platforms.
This time the plot aligned with my expectations: edX and Udacity users use Coursera at a higher rate than users of other platforms, which makes sense because all three platforms came out around the same time and began as purely free sites. They have each since added various paid course options, but also maintain a healthy amount of free content. Udemy and DataCamp require paying money to access most content, so it makes sense that users of those platforms would be somewhat less likely to branch out, as they have incentive to make the most of their investment by sticking to the platform they pay for.

MOOC Platform Importance 


Now that we have a sense of which MOOC platforms Kagglers use, let's consider survey question 37: "On which platforms have you spent the most time?" Since we know almost 30% of MOOCers on Kaggle use a single platform and that Coursera is the most popular platform, Coursera probably leads in terms of usage time as well. Let's investigate with a plot.

Expectation:

  • Coursera will be the most heavily-used platform.

Reality:

  • Almost 40% of Kagglers spend the most time on Coursera.
  • The paid sites Udemy and DataCamp are cited as most-used platforms more often than the other second tier MOOC platforms.
It could be that Coursera dominates because it is well known, but when users find alternatives, they end up spending more time on other sites. Let's compare the top 5 MOOC platforms external to Kaggle head to head, only looking at users who actually use each pair of platforms.
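One way to set up such a head-to-head comparison is sketched below in Python. For each pair of platforms, it keeps only respondents who use both and then counts how many name each one as the platform where they spent the most time. The record layout, the function name and the toy records are assumptions for illustration, and I assume question 37 yields a single most-used platform per respondent; the data are not survey responses.

```python
from itertools import combinations

# Hypothetical records: platforms used (Q36) and the most-used platform (Q37).
users = [
    {"used": {"Coursera", "Udemy"}, "most_time": "Coursera"},
    {"used": {"Coursera", "Udemy"}, "most_time": "Udemy"},
    {"used": {"Coursera", "Udemy", "edX"}, "most_time": "Coursera"},
    {"used": {"Coursera"}, "most_time": "Coursera"},
    {"used": {"Udemy", "edX"}, "most_time": "edX"},
]


def head_to_head(records, a, b):
    """Among users of both a and b, count who spends the most time on each."""
    both = [r for r in records if {a, b} <= r["used"]]
    return {
        a: sum(r["most_time"] == a for r in both),
        b: sum(r["most_time"] == b for r in both),
        "users_of_both": len(both),
    }


platforms = ["Coursera", "Udemy", "edX"]
results = {pair: head_to_head(users, *pair) for pair in combinations(platforms, 2)}

for pair, tally in results.items():
    print(pair, tally)
```

Restricting each match-up to users of both platforms keeps a platform from winning simply because more people have heard of it.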

Expectation:

  • Users of the paid sites Udemy and DataCamp will tend to spend more time using those platforms than they do on non-paid sites.

Reality:

  • Coursera dominates in head-to-head match-ups of total time usage, even against paid MOOC platforms.
Since Coursera is a long-established platform that was first to the market with quality data science content, it makes sense that Kagglers have used it the most. It would be interesting to know how much Kagglers have used the various MOOC platforms on a shorter time scale, such as in the past 6 months or past year. This would give us a better idea of which platforms are currently popular, rather than potentially skewing results in favor of platforms like Coursera that have offered data science content the longest.

MOOC Platform Demographics 


Now that we have a sense of overall MOOC platform participation, let's investigate further by breaking it down by the subgroups we explored in the first section: gender, age, country and programming language preference. Let's start with the gender breakdown.

Expectation:

  • Platform preferences do not vary by gender.

Reality:

  • DataCamp is the only platform of the top 6 that female Kagglers participate in at a higher rate than males.
  • Males are more than twice as likely to use Fast.AI.
DataCamp is the only subscription-based platform of the bunch, which suggests a monthly fee structure may be more appealing to female Kagglers than males.
Next let's investigate MOOC platform participation by age. We'll drop learning platforms outside of Coursera and the "second tier" for readability.

Expectation:

  • Platform participation will not change much across age groups, since earlier we saw that overall MOOC participation is steady among Kagglers aged 18 to 60.

Reality:

  • MOOC platform participation rates generally rise with age from 18 to 34.
  • MOOC platform participation remains fairly consistent through the middle ages.
  • Platform participation drops off among the oldest Kagglers.
The fact that MOOC platform usage rises on the whole from 18 to 34 makes sense intuitively, because the more experience you have, the more time you've had to try different things. Younger Kagglers may also be busy with university, giving them less time to branch out and try many different MOOC platforms.

Let's continue our exploration by breaking MOOC platform use down by country. In the interest of readability, we will look at the top 16 countries in terms of total responses recorded.

Expectation:

  • Coursera will dominate in total participation across countries.

Reality:

  • MOOC platforms have similar participation rates across many countries in North America and Europe, with the USA, Canada, UK and Germany having Coursera participation rates around 40% and second tier platform participation rates in the 15 to 20% range.
  • Russian Kagglers participate in Coursera at the highest rate of any of the top 16 countries, but also have the lowest usage of Udemy and the second lowest usage of DataCamp.
  • Kagglers from China and Japan both report relatively low usage of MOOC platforms outside of Kaggle Learn.
These observations suggest that paid content and language could be significant barriers to MOOC platform participation in certain countries.  
Finally, let's make a plot breaking down MOOC platform participation by programming language preferences.

Expectation:

  • Python users will have relatively high participation rates in Coursera and Udacity, since those platforms offer many courses in Python.

Reality:

  • Coursera is the most popular platform across all languages and Udacity participation is the highest among Python users.
  • MOOC platform participation tends to be higher across the board among Python, R and SQL users.
  • 42% of R users also use DataCamp, while only 14% of R users use Udacity.
Since MOOCs tend to use the most popular programming languages to teach data science content, it isn't surprising that Python and R users report relatively high usage of most MOOC platforms. Kagglers who use less common data science languages likely have to rely more heavily upon learning sources like work, university and self-teaching to learn what isn't available on MOOCs.

Which MOOC Platforms Do Kagglers Use: Key Findings


To conclude this section, let's summarize our most important observations on MOOC platform use:
  • Coursera dominates as the most popular MOOC platform across all demographics.
  • A second tier of MOOC platforms--Udemy, DataCamp, edX, Udacity and Kaggle Learn--have similar participation rates.
  • Most MOOCers learn from more than one online course platform.
  • Of the demographics we explored, MOOC platform participation varies most noticeably by country.

How Do Kagglers Perceive MOOCs? 

Our exploration thus far has shown us where Kagglers learn and which MOOC platforms are most popular. In this final section, we'll explore how Kagglers view the quality of MOOCs as a learning source versus traditional education. To this end, we will investigate survey question 39: "How do you perceive the quality of online learning platforms as compared to the quality of the education provided by traditional brick and mortar institutions?"

MOOC Perception vs Traditional Education 


Let's begin our exploration of MOOC quality by looking at the totals for each response to question 39, filtering out users who did not respond. Up till now, my experience with Kaggle and MOOCs has given me some idea of what to expect from the data, but I have no idea how people view MOOCs compared to traditional education. Thus for this subsection, I will not note my expectations before making plots.

Observations

  • Over half of Kagglers view MOOCs as being slightly better or much better than brick and mortar institutions.
  • Fewer than 15% of Kagglers view MOOCs as worse than brick and mortar institutions.
The plot shows Kagglers generally hold a favorable opinion of online courses vs traditional learning. It isn't clear, however, whether it is the high opinion of MOOC quality that led to this result or a low opinion of brick and mortar institutions. This is not a question we can answer with the data we have at hand, but we can investigate whether perceptions vary based on Kagglers' most used learning sources and MOOC platforms.
Let's plot perceptions of MOOCs again, breaking them down by Kagglers' most-used learning sources.

Observations

  • 70% of Kagglers who use MOOCs as their primary learning source consider MOOCs to be higher quality than traditional brick and mortar institutions.
  • Kagglers who use university as a primary learning source have mixed opinions of MOOCs, with 35% viewing MOOCs as better than traditional institutions and 25% viewing MOOCs as worse than traditional institutions.

Next, let's see how perceptions of MOOCs vary according to Kagglers' most-used MOOC platforms.

Observations

  • Coursera and DataCamp users had similar perceptions of MOOC quality with around 55% of users regarding MOOCs as better than brick and mortar institutions.
  • 65% of Udemy and Udacity users perceived MOOCs as higher quality than traditional education.
Interestingly, Kaggle Learn users were the most likely to report MOOCs as being "slightly better" than brick and mortar institutions. This perhaps suggests Kaggle Learn has some good content but could benefit from expanding the range and depth of its course offerings to convert more users from the lukewarm "slightly better" response to "much better".
Based on our earlier explorations, we know that many users of "second tier" MOOC platforms use multiple learning portals for taking online courses. Let's investigate how perceptions of MOOCs vs traditional education vary by the number of MOOC platforms Kagglers use.

Observations

  • Fewer than half of respondents who used a single MOOC platform view MOOCs as better than brick and mortar institutions and 11% had no opinion.
  • The more MOOC platforms Kagglers use, the more likely they are to view MOOCs as better than brick and mortar institutions.
The plot shows that prolific MOOCers were unlikely to favor traditional education vs MOOCs. This suggests that as Kagglers try more MOOC platforms, they increasingly favor MOOCs vs traditional education, although it could be the case that users with low opinions of traditional education are predisposed to trying many MOOC platforms.

MOOC Perception Demographics 


We will finish our exploration of MOOCs by investigating how perceptions of online courses vary by the demographic subgroups we explored in the first two sections: gender, age, country and programming language preference. I expect that groups with high MOOC platform participation rates will also tend to have high opinions of MOOCs.
Let's begin by plotting MOOC perceptions by gender.

Expectation:

  • Males will have a higher opinion of MOOCs than females on average, since our earlier explorations showed female Kagglers have higher university participation rates and placed more importance on it.

Reality:

  • Male Kagglers held a slightly more positive view of MOOCs vs traditional learning institutions compared to females.

Next, let's break perceptions down by age.

Expectation:

  • Middle-aged Kagglers will have a more positive view of MOOCs vs traditional learning than the youngest Kagglers, since our earlier explorations revealed that young Kagglers use relatively few MOOC platforms.

Reality:

  • 64% of users in the 18-21 age group and 58% of users in the 22-24 age group perceived MOOCs as better than traditional education.
  • Opinions of MOOCs were consistent among working-aged Kagglers with about half of users in the 25 to 60 age ranges perceiving MOOCs as better than traditional education.
The high opinion of MOOCs vs traditional education among younger Kagglers seems surprising given our earlier observations that younger users use fewer MOOC platforms and that using more platforms is associated with higher opinions of MOOCs. It goes to show that general trends in the population don't necessarily extend to specific subgroups within the data.
Now let's break perceptions of MOOCs down by country.

Expectation:

  • Kagglers from countries with high MOOC participation rates like India and Brazil will have higher than average perceptions of MOOCs vs traditional education.

Reality:

  • Kagglers from India have the highest opinion of MOOCs with 80% perceiving MOOC quality as better than traditional institutions.
  • Of the 16 most represented countries in the survey, Kagglers from the USA held the highest opinion of traditional brick and mortar institutions vs MOOCs with 35% of respondents favoring MOOCs and 27% favoring traditional learning.
  • Users from India, China and Japan reported very low opinions of traditional brick and mortar institutions vs MOOCs, with only 6% of respondents from each country favoring traditional education.
Intuitively, we would expect users from countries with ready access to high quality higher education to report higher opinions of brick and mortar institutions. This suggests that countries with particularly low views of traditional education either do not have many high quality brick and mortar institutions or do not provide adequate access to higher education.
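One way to put the country figures on a common scale is a simple "net favorability" gap: the share favoring MOOCs minus the share favoring traditional education. The Python sketch below works this out for the two countries where both shares are stated in the observations above. The function name `summarize` and the treatment of the remaining share as neutral or no-opinion responses are assumptions for illustration, and the published percentages are rounded, so the results are approximate.

```python
# Shares (percent of respondents) as stated in the observations above.
# Only the USA and India have both figures given in the text.
stated_shares = {
    "USA": {"favor_moocs": 35, "favor_traditional": 27},
    "India": {"favor_moocs": 80, "favor_traditional": 6},
}


def summarize(shares):
    """Return the MOOC-minus-traditional gap and the leftover share.

    Assumption: every respondent who favors neither side is either
    neutral or has no opinion, so the leftover is 100 minus both shares.
    """
    net = shares["favor_moocs"] - shares["favor_traditional"]
    other = 100 - shares["favor_moocs"] - shares["favor_traditional"]
    return {"net_points": net, "neutral_or_no_opinion": other}


summary = {country: summarize(s) for country, s in stated_shares.items()}
for country, row in summary.items():
    print(
        f"{country}: net {row['net_points']:+d} points, "
        f"{row['neutral_or_no_opinion']}% neutral or no opinion"
    )
```

With the stated figures, the gap is 8 percentage points in favor of MOOCs for the USA and 74 for India, so even the country with the highest opinion of traditional institutions still leans slightly toward MOOCs.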

Finally, let's see how perceptions of MOOCs vary with programming language preferences.

Expectation:

  • Users of less popular data science languages will have lower overall opinions of MOOCs vs traditional education than users of Python, R and SQL.

Reality:

  • Kagglers view MOOCs as higher quality than traditional education regardless of programming language preference.

Perception of MOOCs: Key Findings


Let's summarize our key findings about the perception of MOOCs: 
  • Over half of Kagglers view MOOCs as better quality than traditional education.
  • Perception of MOOC quality is highest among Kagglers who use many MOOC platforms.
  • Young Kagglers are the most likely to favor MOOCs over traditional education.
  • Kagglers from India, China and Japan held especially favorable opinions of MOOCs vs traditional education.

Conclusion 

MOOCs have come a long way since I took my first programming course on OpenCourseWare. As the accessibility of machine learning MOOCs has grown, they've established themselves as a staple learning source among the data science community, used by over 70% of Kagglers. Coursera is a dominant force in the MOOC market, with more than twice as many users on Kaggle as any other platform, but many MOOCers use multiple platforms. Kagglers also generally have positive perceptions of the quality of MOOCs versus traditional brick and mortar institutions, which is good news for online course providers as they seek sustainable business models that strike the right balance between free and paid content.
Online courses let you learn wherever you are, at your own pace, to fulfill your personal learning goals. They democratize education in an unprecedented way, allowing anyone with interest and internet access to take courses taught by top universities and industry experts. My hope is that our exploration of MOOC usage on Kaggle has given you new insights into the role of online courses in data science education and piqued your curiosity to try a new course, explore a new platform or perhaps to take a MOOC for the first time. There's little to lose and a lot to learn!
