Introduction to R

Tutorial objectives

The objectives of this tutorial are to:

  • Download and install R
  • Download and install RStudio
  • Learn the basics of R including:
    • basic mathematical functions
    • variable assignment
    • installing packages
    • loading packages
  • Be introduced to data visualization

Downloading and installing R

The first step is to download and install R, which is freely available from the Comprehensive R Archive Network (CRAN). You can choose any of the “mirrors” you want, but it is best to use the one that is closest to you geographically (e.g., the one at Oregon State University).

Downloading and installing RStudio

The second step is to download and install RStudio, which is an integrated development environment (IDE) for R. In a nutshell, it makes working with R much easier. You will want to install the RStudio Desktop version that is appropriate for your operating system. After you have installed R and RStudio, you can proceed to the next step.

Getting started with R

R is a programming language that was developed to help researchers analyze quantitative data (e.g., do statistics). As such, R can be used for both simple calculations and complex statistical analyses. See below for some of the simple things that you can do with R. Note that anything following a “#” on a line is a comment and is ignored by R. In the examples below, lines beginning with “##” show the output that R prints.

addition:

1+2
## [1] 3

multiplication:

5*4
## [1] 20

division:

18/32
## [1] 0.5625

power calculations:

5^2
## [1] 25

square root calculations:

sqrt(25)
## [1] 5

In short, you can use R as a calculator if you wish.
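
As with a calculator, R follows the usual order of operations, and parentheses can be used to change it. The short sketch below illustrates this, along with the operators for integer division and remainders. The specific numbers are chosen only for illustration.

```r
1 + 2 * 3      # multiplication is done before addition
## [1] 7
(1 + 2) * 3    # parentheses are evaluated first
## [1] 9
17 %/% 5       # integer division: how many whole times 5 goes into 17
## [1] 3
17 %% 5        # remainder after integer division
## [1] 2
```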

Assigning variables

We can also save values (or other objects) by assigning them to a variable with a name of our choosing. We do this using the assignment operator “<-”.

VarName1 <- 5^2

Once we have saved a value (or other object), we can use it later in various ways.

print(VarName1) #display the value using the "print" function
## [1] 25
VarName1 - 5
## [1] 20
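
Variables can also be used to define other variables, and assigning to an existing name overwrites its old value. Here is a minimal sketch of both ideas. The name VarName2 is made up for this example. Keep in mind that names in R are case-sensitive, so VarName1 and varname1 would be two different variables.

```r
VarName1 <- 5^2            # store 25 under the name VarName1
VarName2 <- VarName1 / 5   # define a second (hypothetical) variable from the first
VarName1 + VarName2
## [1] 30
VarName1 <- VarName1 + 1   # reassigning replaces the old value of VarName1
VarName1
## [1] 26
```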

Loading and installing packages

A large number of helpful packages are available for R. To load an installed package, so that its functions and datasets become available, we use the function library(), which takes a package name as an argument.

library("psych")

If you haven’t installed the package “psych”, this command will fail with an error. In that case, you will first need to install the package using the function install.packages(), which takes the name of the package you want to install as an argument.

install.packages("psych")

After installing the package, we can then load it. Note that we do NOT have to install packages each time we use R (we only have to do that once). We do, however, have to load the package each time we start a new R session.

library("psych")
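
Because installing is needed only once, it can be convenient to check whether a package is already installed before calling install.packages(). The sketch below shows one possible way to do this. The helper install_if_missing() is a hypothetical function written for this example (it is not part of R), and it relies on requireNamespace(), which reports whether a package is available without loading it.

```r
# Hypothetical helper (not part of R): install a package only if it is missing.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # runs only when the package is not yet installed
  }
  # Report (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call finds the package and installs nothing
install_if_missing("stats")
```

With this helper defined, calling install_if_missing("psych") followed by library("psych") would install the package the first time and simply load it afterwards.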

If we want to get details regarding the use of a particular package, we can use the help() function, which will open the documentation for the chosen package.

help(psych)

Playing with some data in R

R comes with a number of datasets pre-installed. Later on, we will load our own datasets, but for now, let’s play with one that comes with R, called “mtcars”. First, let’s see what kind of data mtcars comprises. We can do this using the help() function.

help(mtcars)

After running this code, you should see a description of the dataset in RStudio’s Help pane (or in a separate window). As noted there, mtcars comprises data from the 1974 Motor Trend magazine on 32 cars (1973-74 models). We can get a statistical summary of these data by using the function summary().

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
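
Besides summary(), a few other base R functions give a quick overview of a dataset. The sketch below shows three that are often useful as a first look.

```r
dim(mtcars)       # number of rows (cars) and columns (variables)
## [1] 32 11
names(mtcars)     # the names of the variables
head(mtcars, 3)   # the first three rows of the data
```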

As we can see above, the dataset includes 11 variables (characteristics) for each of the 32 cars. We can easily plot the relationship between some of these variables using the function plot(). Note that we can access particular variables in our data by using the data frame name (e.g., mtcars) followed by a dollar sign ($) and the variable name.

plot(mtcars$mpg, mtcars$hp)

This plot seems to show a negative relationship between a car’s horsepower and its fuel efficiency (MPG), which is likely what we would expect.
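
As a rough sketch of how this could be taken one step further, the code below redraws the plot with descriptive axis labels and a title, and then puts a number on the relationship using the correlation function cor(). The label text and the variable name mpg_hp_cor are made up for this example.

```r
# Same scatter plot, now with axis labels and a title
plot(mtcars$mpg, mtcars$hp,
     xlab = "Fuel efficiency (miles per gallon)",
     ylab = "Gross horsepower",
     main = "Horsepower vs. fuel efficiency in mtcars")

# Pearson correlation between the two variables; a negative value means
# that cars with more horsepower tend to get fewer miles per gallon
mpg_hp_cor <- cor(mtcars$mpg, mtcars$hp)   # hypothetical variable name
mpg_hp_cor
```

The correlation comes out at about -0.78, which agrees with the downward trend visible in the plot.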

Installing and testing ggplot2

One particularly useful package that we will be using a lot this term is ggplot2. The commands for installing and loading ggplot2 are included below.

install.packages("ggplot2")
library("ggplot2")
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

Once ggplot2 is installed and loaded, we can quickly make nicer (and more sophisticated) plots. Below is an example of a rather simple one: a scatter plot with a line of best fit. We will learn more about ggplot2 in upcoming classes!

ggplot(data = mtcars, aes(x = mpg, y = hp)) +
  geom_point() +                # draw one point per car
  geom_smooth(method = "lm")    # add a line of best fit from a linear model
## `geom_smooth()` using formula 'y ~ x'

The end (for now)

This is the end of our introduction to R for now. More will follow in upcoming classes.