2 Introduction to R

2.1 What is R?

R is a free, open-source programming language and environment designed specifically for statistical computing and data analysis. Much of R’s power comes from its modular design: a core system provides basic functionality, and thousands of specialized packages, contributed by statisticians, data scientists, and researchers worldwide, extend it.
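Even the core system is itself organized as packages. Assuming a standard R installation, you can list the packages that ship with every copy of R (marked with priority "base") and the ones attached automatically when a session starts:

```r
# Packages bundled with every R installation have priority "base"
base_pkgs <- unique(rownames(installed.packages(priority = "base")))
print(base_pkgs)

# Packages attached automatically when a session starts
print(search())
```

The exact set of attached packages can differ slightly between setups, but `base`, `stats`, and `utils` are always part of R.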

2.2 The package ecosystem: R’s superpower

R’s strength lies in its package system. Think of R as a smartphone: the base R installation is like the phone’s operating system, providing essential functions, and packages are like apps, each adding capabilities for specific tasks.

2.2.1 Base R vs. packages

Base R includes fundamental functions for:
- Basic arithmetic and statistics
- Data structures (vectors, data frames, lists)
- Simple graphics
- File input/output
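Each item in the list above is available without installing any package. A minimal sketch using only base R functions; the variable names are illustrative, and the plot and CSV file are written to temporary files:

```r
# Basic arithmetic and statistics
scores <- c(85, 92, 78)
total <- sum(scores)     # 255
average <- mean(scores)  # 85

# Data structures: a data frame built from vectors, and a list
results <- data.frame(id = 1:3, score = scores)
summary_list <- list(n = nrow(results), average = average)

# Simple graphics, drawn into a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
barplot(scores, names.arg = c("A", "B", "C"))
invisible(dev.off())

# File input/output with a temporary CSV file
csv_file <- tempfile(fileext = ".csv")
write.csv(results, csv_file, row.names = FALSE)
results_back <- read.csv(csv_file)
```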

Packages extend R with specialized tools for:
- Advanced statistical methods
- Machine learning algorithms
- Data visualization
- Web scraping
- Bioinformatics
- Finance and economics
- PX-files (PC-Axis statistical data files)
- And much more!

2.2.2 CRAN: The package repository

The Comprehensive R Archive Network (CRAN) hosts over 19,000 packages. This massive ecosystem means that whatever data analysis task you’re facing, someone has likely created a package to help.

# Install packages from CRAN
install.packages("tidyverse")    # Collection of data science packages

2.3 Key package collections

2.3.1 The tidyverse

The tidyverse is a collection of packages designed for data science with a consistent philosophy and grammar:

library(tidyverse)

# This attaches the core tidyverse packages
# (nine as of tidyverse 2.0.0, which added lubridate):
# - ggplot2: data visualization
# - dplyr: data manipulation
# - tidyr: data tidying
# - readr: data import
# - purrr: functional programming
# - tibble: modern data frames
# - stringr: string manipulation
# - forcats: working with factors
# - lubridate: dates and times

R has package ecosystems for virtually every field. To find packages for a specific field, start with the CRAN Task Views, which are curated overviews of packages by topic, and then search the web.

2.4 Basic R concepts (built on packages)

2.4.1 Data structures

Even basic data operations benefit from packages:

# Base R data frame
students_base <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)

# With tibble (tidyverse package) - enhanced data frames
library(tibble)
students_tibble <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)

students_tibble  # Better printing: shows dimensions and column types
#> # A tibble: 3 × 3
#>   name      age grade
#>   <chr>   <dbl> <dbl>
#> 1 Alice      20    85
#> 2 Bob        22    92
#> 3 Charlie    19    78
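One concrete difference in behavior is column subsetting. A base data frame simplifies a single selected column to a plain vector unless you pass `drop = FALSE`, whereas a tibble always returns a tibble, so `students_tibble[, "age"]` is a 3 × 1 tibble. The base R side can be checked without any packages:

```r
students_base <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)

# Default drop = TRUE: a single selected column becomes a numeric vector
age_vec <- students_base[, "age"]

# drop = FALSE keeps the data-frame structure (what a tibble does by default)
age_df <- students_base[, "age", drop = FALSE]

class(age_vec)  # "numeric"
class(age_df)   # "data.frame"
```

Code that expects a table back is therefore less likely to break silently with a tibble.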

2.4.2 Data manipulation

Base R can manipulate data, but packages make it easier:

# Base R
high_grades <- students_base[students_base$grade > 80, ]
mean_age <- mean(students_base$age)

# With dplyr (tidyverse package)
library(dplyr)
high_grades <- students_tibble %>%
  filter(grade > 80)

# summarise() returns a one-row tibble (column avg_age), not a bare number
mean_age <- students_tibble %>%
  summarise(avg_age = mean(age))
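Since R 4.1.0, base R also has a native pipe operator, `|>`, so the same two steps can be written as a pipeline without loading a package. In this sketch, which assumes R 4.1.0 or later, `subset()` plays the role of `filter()`:

```r
students_base <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)

# Rows with a grade above 80 (base R counterpart of dplyr's filter())
high_grades <- students_base |>
  subset(grade > 80)

# A single number, unlike summarise(), which returns a one-row tibble
mean_age <- students_base$age |>
  mean()
```

Choosing between `%>%` and `|>` is largely a matter of style; `%>%` comes from the magrittr package and is re-exported by dplyr.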

2.5 Package management

2.5.1 Installing and loading packages

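# Install once (per R installation); the package stays in your library
install.packages("dplyr")

# Load in every R session that uses it
library(dplyr)

# Alternative: require() returns FALSE with a warning, instead of stopping
# with an error, if the package is not installed; mainly useful in conditionals
require(dplyr)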

# Install multiple packages at once
install.packages(c("dplyr", "ggplot2", "readr"))
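Because `install.packages()` downloads and reinstalls a package even when it is already present, scripts often install only what is missing. A minimal sketch; `install_if_missing()` is a hypothetical helper written for illustration, not a function from base R or any package:

```r
# Hypothetical helper: installs only the packages in `pkgs` that are not yet installed
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise,
  # without attaching it to the search path
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)  # names of the packages that had to be installed
}

# Usage: install whichever of these are missing, then attach as usual
# install_if_missing(c("dplyr", "ggplot2", "readr"))
# library(dplyr)
```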

2.6 Why the package approach works

  1. Specialization: Experts in each field contribute domain-specific tools
  2. Quality control: CRAN requires packages to pass automated checks (R CMD check) before they are accepted, which catches many technical problems
  3. Community-driven: Thousands of contributors improve and maintain packages
  4. Modularity: Only load what you need, keeping R fast and efficient
  5. Innovation: New methods are quickly available as packages
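The modularity point is visible in the search path: a package’s functions become available by name only after it is attached. A small sketch using splines, a package that ships with R but is not attached at startup (the same applies to any installed package):

```r
# Packages currently attached
before <- search()

# Attach an additional package that ships with R
library(splines)

# The new entry on the search path
added <- setdiff(search(), before)
print(added)
```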

2.7 Getting started workflow

  1. Install base R and RStudio
  2. Identify your needs (data visualization, statistical modeling, etc.)
  3. Find relevant packages using CRAN Task Views or online resources
  4. Install and explore packages with documentation and examples
  5. Combine packages to build powerful analysis workflows
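For step 4, R’s built-in help system works the same way for every package. A small sketch that lists the objects exported by stats, which is attached by default; the commented lines open documentation and are meant for interactive sessions:

```r
# Names of the objects exported by an attached package
stats_functions <- ls("package:stats")
length(stats_functions)
head(stats_functions)

# Interactive documentation:
# help(package = "stats")      # index of a package's help pages
# ?median                      # help page for one function
# example(median)              # run the examples from that help page
# vignette(package = "dplyr")  # long-form guides, if the package provides any
```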

R’s package-centric design means you’re never starting from scratch: you’re building on the work of thousands of experts who’ve solved similar problems. This collaborative approach makes R incredibly powerful for data analysis across virtually any domain!