Monday, July 13, 2015

Introduction to R Part 1: R Setup


This is the first of what I envision as an ongoing series of posts aimed at providing an introduction to the R programming language for data analysis and predictive modeling. This guide does not assume prior programming experience and focuses mainly on the use of R as a tool for data analysis rather than programming. It is meant to be practical and actionable above all else, so it will focus on giving you the tools you need to work with data rather than delving into low-level details of the language itself.

R is an open source programming language that is popular for statistics, data analysis and data science. R was built for and by statisticians, so many common statistical operations are built right into the base language, making it a great tool for data analysis. Only Python rivals R’s popularity for data exploration, analysis and predictive modeling. This introduction is not going to spend much time discussing Python or its pros and cons vs. R, but suffice it to say both languages are extremely useful and it is worth your time to learn both. In my experience, it is easier to learn how to program in Python, but it is easier to get started with data analysis in R because all the tools you need are either baked in or one simple download away.

Before getting into the meat of the guide, you’ll need to get R set up on your computer. Thankfully, R installation and package management are simple. You can download R and many of the most popular R packages from CRAN, the Comprehensive R Archive Network. Just click the link, download the appropriate version of R for your operating system listed under “Download and Install R” and run the installer.

Getting an editor for R is equally simple. RStudio is the de facto standard for editing and interacting with R, and it is very widely used. Just click the link, download the appropriate installer for your operating system under the list “Installers for Supported Platforms” and run the installer. Once you’ve installed RStudio, launch it to start an interactive R programming session.
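
Once a session is running, a quick sanity check is to ask R which version it is. The sketch below also shows what installing a package from CRAN typically looks like. The package name ggplot2 is only an illustrative choice, not something this guide requires yet, and the install line is commented out because it needs an internet connection.

```r
# Print the version of R that this session is running
R.version.string

# Installing a package from CRAN usually takes a single call.
# ggplot2 is just an example package name (an assumption for illustration).
# install.packages("ggplot2")

# After installing, a package is loaded into the session with library()
# library(ggplot2)
```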


A brief intro to RStudio

When you open RStudio, you’ll see a window with 4 panes:



The bottom left pane is the interactive R console that you can use to type commands into R and view output. Try typing in a simple math expression like 5*3 and then press enter:
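
For reference, here is roughly what that exchange looks like. The line starting with # shows the output R prints; the [1] simply marks the first element of the result.

```r
# Type this at the console prompt and press Enter
5 * 3
# [1] 15
```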




*Note: You can press the up arrow key when using the console to cycle through commands you entered previously.

The top left pane is a text editor you can use to write code, run code and save it for later. Under the "File" menu, point to "New File" and then select "R Script" to create a new code file:





Running simple commands in the interactive R console is quick and easy, but typing your code into the editor makes it easier to catch errors and rerun code. To run code from the editor, highlight the code you have written and click the “Run” button near the top of the window, or hold down the control key (the command key on a Mac) and press enter. The code and output will appear in the console window.
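
If you want something to try this with, here is a small sketch of a script you could paste into the editor. The variable name and the numbers are made up for illustration; the functions used are part of base R.

```r
# A tiny example script: store a few made-up values and summarize them
heights <- c(150, 160, 170, 180)

# Average of the values
mean(heights)
# [1] 165

# Smallest and largest values
range(heights)
# [1] 150 180
```

Highlight all of the lines and run them, and each command should be echoed in the console followed by its output.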

The bottom right pane is a utility window with tabs labeled Files, Plots, Packages and Help. The main use of this window for our purposes will be to view plots and look at R documentation. If you're ever having trouble with an R function, you can view its documentation by typing the name of the function preceded by a question mark into the console. For instance, typing ?plot into the console pulls up documentation on R's plot function. Alternatively, you can use the help() function to call up documentation, such as by typing help(plot).
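
Both ways of asking for help can be typed straight into the console. In RStudio the documentation should open in the Help tab.

```r
# Shortcut form: a question mark followed by the function name
?plot

# Equivalent function form
help(plot)

# The function name can also be given as a quoted string
help("plot")
```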




The top right pane is an object explorer that lists all of the various variables, data structures and functions you have loaded into your R environment. There's not much to see here right now, but the environment pane provides useful summary information about the data you’re working with. You can also click the History tab to view a searchable list of commands you've run.
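
The same information is available from the console. As a minimal sketch, the code below stores a value under a made-up name and then uses ls() to list what is in the environment; the stored object should also show up in the environment pane.

```r
# Store the result of a calculation under a name (the name is arbitrary)
total <- 5 * 3

# List the names of the objects in the current environment;
# the output should include "total"
ls()
```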

Now that you have R and RStudio set up, you’re ready to start learning how to use R for data analysis!

Next time: Introduction to R Part 2: R Arithmetic


*Special Note: I've added an Index page with links to all 30 parts of the Introduction to R series.
