Introduction to R for Medical Students

A Practical Primer

Lathan Liou, MPhil

11/02/23

Objectives

Get R/RStudio on your computer
Be able to take input data and generate output results using R
- Be able to load libraries and import data
- Know what a vector and dataframe are
- Understand what a function is and how common functions work
- Learn data wrangling basics

Getting R and RStudio

Download R [Mac][Windows]
Download RStudio here
- RStudio is an Integrated Development Environment (IDE), which provides a visual and interactive interface to make coding in R easier
- R is the language maintained by volunteers whereas RStudio is a product maintained by a company called Posit

RStudio

R Packages

Base R comes installed, but much of the power of R comes from being open source: anyone can write and share packages that add new functions
install.packages("package_name") installs a package for the first time (only need to do once)
- The quotation marks "" are required
- For this session, run install.packages("tidyverse") in your console
library(package_name) loads a package into your current working session

High Yield Basic Concepts

Assignment: assign a value to an object name (e.g. x <- 10)
- Note: object names shouldn’t contain spaces. Use snake_case, camelCase, or dot.case instead
Functions: your “action verbs”, which take input arguments and return an output (e.g. mean())
Help: ?function_name
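
A minimal sketch tying assignment and a function call together. The object names and values below are made up for illustration.

```r
# Assign a single value to an object name (snake_case, no spaces)
resting_hr <- 60

# Assign several values at once with c()
heart_rates <- c(60, 70, 80)

# Call a function on the object and store what it returns
avg_hr <- mean(heart_rates)
avg_hr

# To open the help page for a function, run ?mean in the console
```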

A Little More About Functions

x <- c(2,4,6)
mean(x)

[1] 4

This is how to write your own function:
- name of your function
- syntax: function() {}
- arguments: what you pass in as inputs

x <- c(2,4,6)
my_mean <- function(x) {
  # x is an argument
  out <- sum(x)/length(x)
  return(out)
}
my_mean(x)

[1] 4
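
Functions can take more than one argument, and an argument can have a default value. The sketch below is only an illustration: c_to_f and its digits argument are hypothetical names, not part of any package.

```r
# Hypothetical helper: convert a temperature from Celsius to Fahrenheit
# digits has a default value, so it can be left out when calling the function
c_to_f <- function(temp_c, digits = 1) {
  temp_f <- temp_c * 9 / 5 + 32
  # The last evaluated expression is returned, so return() is optional
  round(temp_f, digits)
}

# Uses the default digits = 1
c_to_f(37)

# Works on a whole vector, and overrides the default
c_to_f(c(36.5, 38.2), digits = 0)
```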

Vectors

Basic data structure in R
Function c() combines its arguments into a vector
```
x <- c(2,4,6)
```
Indexing [] retrieves elements of a vector by position (or by name for a named vector)
```
x[2]
```
```
[1] 4
```
Vectors can consist of numbers, characters, or dates, but a single vector holds only one data type; if you mix types (e.g. numbers and characters), R converts them all to one type
- structures_profs <- c("Ki Mak", "Jeffrey Laitman", "Dani Curcio")
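
A short sketch of the indexing and type rules above. The named vector and its values are invented for the example.

```r
x <- c(2, 4, 6)

# Retrieve several elements at once by passing a vector of positions
x[c(1, 3)]

# A named vector can be indexed by name (names and values are made up)
doses_mg <- c(morning = 5, evening = 10)
doses_mg["evening"]

# Mixing types does not raise an error: R silently converts everything
# to one type, here character
mixed <- c(1, "a")
class(mixed)
```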

Useful Vector Functions

length(): number of elements in vector

# number of elements in vector
length(x)

[1] 3

mean(): mean of elements in vector

# mean of elements in vector
mean(x)

[1] 4

Be careful if you have NA values (which is fairly common in most datasets)

x2 <- c(2, 4, 6, NA)

# will return NA
mean(x2)

[1] NA

# will return what you're looking for
mean(x2, na.rm = TRUE)

[1] 4
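
Before removing missing values it can help to see how many there are. A small sketch using the same x2 vector:

```r
x2 <- c(2, 4, 6, NA)

# is.na() flags which elements are missing
is.na(x2)

# Count the missing values (each TRUE counts as 1)
sum(is.na(x2))

# Other summary functions such as sum() and max() also accept na.rm
sum(x2, na.rm = TRUE)
max(x2, na.rm = TRUE)
```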

Data Frames

Tidy data principles
- Each row is an observation
- Each column is a variable
- Each cell contains one value
How do data frames relate to vectors?
- Imagine a data frame as a bunch of vertical vectors next to each other

data.frame() creates a data frame (see also tibble(), the tidyverse version)

df <- data.frame(x = c(2,4,6),
                 y = c(1,2,3))
df
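
To see the "vertical vectors" idea in practice, here is a small sketch using the same df. The total column is added purely for illustration.

```r
df <- data.frame(x = c(2, 4, 6),
                 y = c(1, 2, 3))

# Each column is a vector, so vector functions work on it
df$x
mean(df$x)

# Number of rows and columns
dim(df)

# Add a new column computed from existing ones
df$total <- df$x + df$y
df
```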

Importing Data as Data Frames

Data come in many file formats
Common formats include .csv (comma-separated), .tsv (tab-separated), and .txt (space- or tab-delimited)
- Read with readr package: readr::read_csv(), readr::read_tsv() or readr::read_delim()
- :: is the namespace operator: package::function() tells R which package to look in for the function
Software-specific formats include:
- Excel (.xls, .xlsx)
  - Read with readxl package: readxl::read_excel()
- Stata (.dta)
  - Read with haven package: haven::read_dta()
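
A minimal import sketch that runs without any extra packages. It swaps in base R's utils::read.csv() instead of readr, and the file and its contents are made up for the example.

```r
# Write a small made-up table to a temporary .csv file so there is something to import
csv_path <- tempfile(fileext = ".csv")
example <- data.frame(id = c(1, 2, 3), age = c(34, 51, 27))
utils::write.csv(example, csv_path, row.names = FALSE)

# Import it as a data frame; package::function() says where the function lives
imported <- utils::read.csv(csv_path)
imported

# With readr installed, the equivalent call would be readr::read_csv(csv_path)
```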

Viewing your Data

Either click the object in the Environment panel
Or use the View() function (it’s cleaner to type this into your console)
Use str() to understand data types (numeric, character, date, etc.) in your data
Use colnames() (or names()) to view column names and rownames() to view row names of a data frame
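
A quick sketch of these inspection functions on iris, a data frame that ships with R:

```r
# Structure: one line per column with its type and first few values
str(iris)

# Column names (for a data frame, names() and colnames() give the same result)
colnames(iris)
names(iris)

# Row names (shown for the first few rows only)
head(rownames(iris))

# Number of rows and columns
dim(iris)
```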

Accessing your Data

How do you select specific columns?

Use either the $ operator or the [[ ]] operator

# head() shows only the first six rows
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

head(iris$Sepal.Length)

[1] 5.1 4.9 4.7 4.6 5.0 5.4

head(iris[["Sepal.Length"]])

[1] 5.1 4.9 4.7 4.6 5.0 5.4
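
Single brackets behave differently from $ and [[ ]]. A short sketch of the difference, again using iris:

```r
# Single brackets keep the result as a data frame
one_col_df <- iris["Sepal.Length"]
class(one_col_df)

# $ and [[ ]] pull the column out as a plain vector
one_col_vec <- iris[["Sepal.Length"]]
class(one_col_vec)

# [rows, columns] selects rows and columns at once
iris[1:3, c("Sepal.Length", "Species")]
```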

Data Wrangling Verbs You Should Know

Disclaimer: A lot of functions will be introduced in the next couple of slides, so please bear with me. We will practice these afterwards and you can always refer to the cheatsheet referenced in the last slide.
select(): select variables you want to keep
filter(): select rows you want to keep based on condition(s)
mutate(): create or modify variables
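
A minimal sketch of the three verbs on iris, assuming the dplyr package (part of the tidyverse) is installed. The sepal_ratio column is a made-up example variable.

```r
library(dplyr)

# select(): keep only the columns you need
small <- select(iris, Species, Sepal.Length, Sepal.Width)

# filter(): keep only the rows that meet every condition
setosa <- filter(small, Species == "setosa", Sepal.Length > 5)

# mutate(): add a new column computed from existing ones
with_ratio <- mutate(setosa, sepal_ratio = Sepal.Length / Sepal.Width)
head(with_ratio)
```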

More Data Wrangling Verbs You Should Know

summarize(): collapse a data frame into summary statistics (one row overall, or one row per group)
count(): tabulate counts for each level of a variable
group_by(): group rows by one or more variables; useful in conjunction with `summarize()` and `count()`
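
A short sketch of grouped summaries on iris, assuming dplyr is installed. The summary column names are chosen only for the example.

```r
library(dplyr)

# Without grouping, summarize() returns a single row
overall <- summarize(iris, mean_sepal = mean(Sepal.Length))
overall

# After group_by(), summarize() returns one row per group
by_species <- group_by(iris, Species)
per_species <- summarize(by_species,
                         mean_sepal = mean(Sepal.Length),
                         n_flowers = n())
per_species

# count() tabulates how many rows fall in each level of a variable
species_counts <- count(iris, Species)
species_counts
```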

Even More Data Wrangling Verbs You Should Know

pivot_longer(): reshape data from wide to long format (columns become rows)
separate(): split one character column into several columns
%>%: the pipe operator, which passes the result on its left as the first argument to the function on its right
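
A small sketch chaining these together, assuming dplyr and tidyr are installed. The wide data frame and its blood pressure values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one row per patient, one column per visit
wide <- data.frame(patient = c("a", "b"),
                   bp_visit1 = c(120, 135),
                   bp_visit2 = c(118, 130))

long <- wide %>%
  # Stack the two visit columns into name/value pairs
  pivot_longer(cols = c(bp_visit1, bp_visit2),
               names_to = "measure",
               values_to = "value") %>%
  # Split "bp_visit1" into "bp" and "visit1" at the underscore
  separate(measure, into = c("test", "visit"), sep = "_")

long
```

Each %>% hands the data frame on its left to the next function, so the steps read top to bottom in the order they run.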