# Install packages from CRAN
install.packages("tidyverse") # Collection of data science packages
2 Introduction to R
2.1 What is R?
R is a free, open-source programming language and environment specifically designed for statistical computing and data analysis. What makes R truly powerful is its modular design - R consists of a core system that provides basic functionality, which is then extended through thousands of specialized packages contributed by statisticians, data scientists, and researchers worldwide.
2.2 The package ecosystem: R’s superpower
R’s strength lies in its package system. Think of R as a smartphone: the base R installation is like the phone’s operating system, providing essential functions. Packages are like apps - each one adds specific capabilities for different tasks.
2.2.1 Base R vs. packages
Base R includes fundamental functions for:
- Basic arithmetic and statistics
- Data structures (vectors, data frames, lists)
- Simple graphics
- File input/output
Packages extend R with specialized tools for:
- Advanced statistical methods
- Machine learning algorithms
- Data visualization
- Web scraping
- Bioinformatics
- Finance and economics
- PX-files
- And much more!
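As a small illustration of the base R capabilities listed above, the following sketch uses only functions that ship with R. The values and the temporary file are made up for the example.

```r
# Basic arithmetic and statistics
scores <- c(85, 92, 78)      # a numeric vector (illustrative values)
avg_score <- mean(scores)    # arithmetic mean
sd_score <- sd(scores)       # sample standard deviation

# Data structures: a data frame and a list
df <- data.frame(id = 1:3, score = scores)
info <- list(n = nrow(df), columns = names(df))

# File input/output: write to a temporary CSV file and read it back
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
df_back <- read.csv(tmp)

# Simple graphics (uncomment to draw a bar plot on the default device)
# barplot(df$score, names.arg = df$id, ylab = "Score")
```

Nothing here needs `library()` or `install.packages()`; everything beyond this kind of functionality is where packages come in.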
2.2.2 CRAN: The package repository
The Comprehensive R Archive Network (CRAN) hosts over 19,000 packages. This massive ecosystem means that whatever data analysis task you’re facing, someone has likely created a package to help.
2.3 Key package collections
2.3.1 The tidyverse
The tidyverse is a collection of packages designed for data science with a consistent philosophy and grammar:
library(tidyverse)
# This attaches the core tidyverse packages (9 as of tidyverse 2.0.0):
# - ggplot2: data visualization
# - dplyr: data manipulation
# - tidyr: data tidying
# - readr: data import
# - purrr: functional programming
# - tibble: modern data frames
# - stringr: string manipulation
# - forcats: working with factors
# - lubridate: dates and times
R has package ecosystems for virtually every field. If you need a package for a specific domain, CRAN Task Views, which group packages by topic, are a good starting point alongside a general web search.
2.4 Basic R concepts (built on packages)
2.4.1 Data structures
Even basic data operations benefit from packages:
# Base R data frame
students_base <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)
# With tibble (tidyverse package) - enhanced data frames
library(tibble)
students_tibble <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)
# Better printing and behavior
students_tibble
# A tibble: 3 × 3
  name      age grade
  <chr>   <dbl> <dbl>
1 Alice      20    85
2 Bob        22    92
3 Charlie    19    78
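One concrete case of the "better behavior" is column subsetting. The sketch below rebuilds the example table and compares how `[` behaves on a data frame and on a tibble. It assumes the tibble package is installed and skips the tibble half otherwise.

```r
students_base <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)

# A data frame drops to a plain vector when a single column is selected
base_col <- students_base[, "age"]
is.data.frame(base_col)   # FALSE

# `$` on a data frame partially matches column names: this returns `grade`
students_base$gr

if (requireNamespace("tibble", quietly = TRUE)) {
  students_tibble <- tibble::as_tibble(students_base)

  # A tibble stays a tibble when a single column is selected with `[`
  tibble_col <- students_tibble[, "age"]
  is.data.frame(tibble_col)   # TRUE

  # Use [[ ]] (or $ with the full name) to extract the vector itself
  students_tibble[["age"]]
}
```

The tibble behavior is more predictable inside functions, because the result type does not depend on how many columns happen to be selected.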
2.4.2 Data manipulation
Base R can manipulate data, but packages make it easier:
# Base R
high_grades <- students_base[students_base$grade > 80, ]
mean_age <- mean(students_base$age)
mean_age
# With dplyr (tidyverse package)
library(dplyr)
high_grades <- students_tibble %>%
  filter(grade > 80)
mean_age <- students_tibble %>%
  summarise(avg_age = mean(age))
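The two approaches return different kinds of objects: base R `mean()` gives a single number, while `summarise()` gives a one-row table. The following is a minimal sketch of getting a plain value out of the dplyr version. It assumes dplyr is installed, and names such as `avg_age_value` are only for illustration.

```r
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(20, 22, 19),
  grade = c(85, 92, 78)
)

# Base R: a plain numeric value
mean_age_base <- mean(students$age)

if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # summarise() returns a one-row data frame with an avg_age column
  mean_age_tbl <- students %>%
    summarise(avg_age = mean(age))

  # pull() extracts that column as an ordinary vector
  avg_age_value <- mean_age_tbl %>% pull(avg_age)

  # Several steps can be chained in one pipeline
  high_grade_names <- students %>%
    filter(grade > 80) %>%
    pull(name)
}
```

Keeping results as tables is convenient when more steps follow in the pipeline; `pull()` is useful at the end, when a single value or vector is needed.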
2.5 Package management
2.5.1 Installing and loading packages
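# Install once per R installation (repeat after a major R upgrade)
install.packages("dplyr")
# Load in each R session
library(dplyr)
# Alternative: require() returns FALSE with a warning instead of
# stopping with an error when the package is not installed
require(dplyr)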
# Install multiple packages at once
install.packages(c("dplyr", "ggplot2", "readr"))
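A common pattern is to install only the packages that are missing, so a script can be rerun without reinstalling everything. The sketch below shows one way to do that with base R; `missing_packages()` is a hypothetical helper written for this example, not a function from R or any package.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("dplyr", "ggplot2", "readr")
to_install <- missing_packages(wanted)

# Install only what is missing (uncomment to actually install)
# if (length(to_install) > 0) install.packages(to_install)

# A single function can be called without attaching its package via `::`
stats::median(c(20, 22, 19))
```

The `package::function()` form is also handy when two attached packages export functions with the same name, since it makes explicit which one is meant.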
2.6 Why the package approach works
- Specialization: Experts in each field contribute domain-specific tools
- Quality control: CRAN checks every submission against its repository policy, including automated R CMD check runs, which sets a baseline for package quality
- Community-driven: Thousands of contributors improve and maintain packages
- Modularity: Only load what you need, keeping R fast and efficient
- Innovation: New methods are quickly available as packages
2.7 Getting started workflow
- Install base R and RStudio
- Identify your needs (data visualization, statistical modeling, etc.)
- Find relevant packages using CRAN Task Views or online resources
- Install and explore packages with documentation and examples
- Combine packages to build powerful analysis workflows
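The "explore" step of the workflow above can be done from the R console. The sketch below uses the stats package, which ships with R, as a stand-in for any installed package. The commented-out calls open help pages or run examples, so they are best tried interactively.

```r
# List the functions and datasets a package makes available
stats_exports <- ls("package:stats")
head(stats_exports)

# See which packages are attached in the current session
attached <- search()

# Documentation entry points (uncomment to try interactively)
# help(package = "stats")       # index of all documented topics
# ?median                       # help page for a single function
# example("median")             # run the examples from that help page
# vignette(package = "dplyr")   # long-form guides, if dplyr is installed
```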
R’s package-centric design means you’re never starting from scratch - you’re building on the work of thousands of experts who’ve solved similar problems. This collaborative approach makes R incredibly powerful for data analysis across virtually any domain!