Intro to R Programming - Lesson 1

Published: 2025-08-17

Overview

This tutorial will introduce you to the fundamentals of R programming, focusing on core data structures and basic operations.


1. Getting Started

What is R?

  • Open-source programming language for statistical computing and graphics
  • Used widely in data analysis, machine learning, and research
  • Extensible through packages (CRAN has over 18,000 packages)

Setting Up

# Check R version
version$version.string

# Install a package (example)
install.packages("ggplot2")

# Load a package
library(ggplot2)
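
Calling install.packages() every time a script runs downloads the package again. A minimal sketch of one way to avoid that follows; the helper names is_installed and install_if_missing are hypothetical and not part of R or of this lesson.

```r
# Hypothetical helper: TRUE if the package can be found in the library paths
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical helper: install only when the package is missing
install_if_missing <- function(pkg) {
  if (!is_installed(pkg)) {
    install.packages(pkg)
  }
  invisible(is_installed(pkg))
}

# "stats" ships with R, so this check succeeds without any download
has_stats <- is_installed("stats")
print(has_stats)

# Uncomment to use with ggplot2 (needs network access the first time)
# install_if_missing("ggplot2")
# library(ggplot2)
```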

2. Basic Data Structures

Vectors

The most fundamental data structure in R - a sequence of elements of the same type.

# Create numeric vectors
x <- c(1, 3, 5, 7, 9)  # c() = combine function
print(x)
y <- 2:6               # Colon operator for sequences
print(y)

# Create character vectors
names <- c("Alice", "Bob", "Charlie")
print(names)

# Create logical vectors
logical_vec <- c(TRUE, FALSE, TRUE, TRUE)
print(logical_vec)

# Vector operations
x + 2       # Add 2 to each element
x * y       # Element-wise multiplication
sum(x)      # Sum of elements
mean(x)     # Mean of elements
length(x)   # Number of elements

# Indexing (R uses 1-based indexing)
x[3]        # Third element
x[c(1, 4)]  # First and fourth elements
x[x > 5]    # Elements greater than 5
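
Because every element of a vector must share one type, R silently converts mixed input to a common type, and it recycles the shorter vector in arithmetic. The sketch below illustrates both behaviours; the variable names are chosen only for this example.

```r
# Mixing types: everything is coerced to the most general type (character here)
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
print(mixed)   # "1" "a" "TRUE"

# Logical values become 1 and 0 when combined with numbers
nums <- c(1.5, TRUE, FALSE)
class(nums)    # "numeric"
print(nums)    # 1.5 1.0 0.0

# Recycling: the shorter vector is repeated to match the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)
print(recycled)  # 11 22 13 24
```

Recycling is convenient for operations such as x + 2, but it can hide length mismatches, so it is worth checking length() when two vectors are expected to line up.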

Matrices

Two-dimensional arrays with elements of the same type.

# Create a matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)
print(mat)

mat <- matrix(1:12, nrow = 3, byrow = TRUE)  # Fill by rows
print(mat)

# Matrix operations
dim(mat)       # Dimensions (rows, columns)
nrow(mat)      # Number of rows
ncol(mat)      # Number of columns

# Indexing matrices
mat[2, 3]      # Element at row 2, column 3
mat[, 2]       # Entire second column
mat[1:2, ]     # First two rows

# Matrix arithmetic
mat * 2        # Multiply all elements by 2
t(mat)         # Transpose matrix
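
The * operator on matrices works element by element. True matrix multiplication uses the %*% operator instead. A small sketch with two illustrative 2 x 2 matrices (a and b are names chosen only for this example):

```r
# matrix() fills column by column unless byrow = TRUE
a <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
b <- matrix(5:8, nrow = 2)   # rows: (5, 7) and (6, 8)

# Element-wise product: each entry of a times the matching entry of b
elementwise <- a * b
print(elementwise)           # rows: (5, 21) and (12, 32)

# Matrix product: rows of a combined with columns of b
product <- a %*% b
print(product)               # rows: (23, 31) and (34, 46)
```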

Factors

Specialized vectors for categorical data (as discussed in “R in Action” Chapter 2).

# Create a factor
gender <- c("Male", "Female", "Male", "Male", "Female")
gender_factor <- factor(gender)
print(gender_factor)

# Examine the factor
gender_factor
levels(gender_factor)  # Categories
nlevels(gender_factor) # Number of categories

# Ordered factors
satisfaction <- c("Low", "Medium", "High", "Medium", "Low")

# These categories are ranked: "Low" < "Medium" < "High".
# The ranking is set by the order of the levels argument.
satisfaction_factor <- factor(satisfaction, 
                             levels = c("Low", "Medium", "High"),
                             ordered = TRUE)

# Check ordering
satisfaction_factor[1] < satisfaction_factor[3]  # Should be TRUE
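
Without the levels argument, factor() sorts the levels alphabetically, which is rarely the intended ranking. The sketch below contrasts the two cases and shows how to count categories with table(); the names default_factor and ordered_factor are used only for this illustration.

```r
satisfaction <- c("Low", "Medium", "High", "Medium", "Low")

# Default: levels are sorted alphabetically
default_factor <- factor(satisfaction)
levels(default_factor)       # "High" "Low" "Medium"

# Explicit levels give the intended ranking
ordered_factor <- factor(satisfaction,
                         levels = c("Low", "Medium", "High"),
                         ordered = TRUE)

# Count observations per category
counts <- table(ordered_factor)
print(counts)                # Low 2, Medium 2, High 1

# Factors are stored as integer codes that point into the levels
codes <- as.integer(ordered_factor)
print(codes)                 # 1 2 3 2 1
```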

Data Frames

Tabular data structure (most commonly used in data analysis).

# Create a data frame
id <- 1:5
name <- c("Alice", "Bob", "Charlie", "Diana", "Eve")
age <- c(25, 30, 35, 40, 45)
salary <- c(50000, 60000, 70000, 80000, 90000)

# From ?data.frame:
# "The function data.frame() creates data frames, tightly coupled collections
# of variables which share many of the properties of matrices and of lists,
# used as the fundamental data structure by most of R's modeling software."
# stringsAsFactors = FALSE keeps text columns as character vectors
# (this has been the default since R 4.0.0).
employees <- data.frame(id, name, age, salary, stringsAsFactors = FALSE)

# Examine the data frame
head(employees)  # First few rows
str(employees)   # Structure of the data frame
summary(employees)  # Summary statistics

# Accessing elements
employees$age    # Age column
employees[, 3]   # Third column
employees[2, ]   # Second row
employees[employees$age > 30, ]  # Rows where age > 30
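
Beyond reading values, the same indexing syntax can add columns, select columns by name, and reorder rows. The following sketch rebuilds the employees data frame so that it runs on its own; the monthly column and the over_30 and by_salary names are additions for this example only.

```r
employees <- data.frame(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  age = c(25, 30, 35, 40, 45),
  salary = c(50000, 60000, 70000, 80000, 90000)
)

# Add a derived column
employees$monthly <- employees$salary / 12

# Filter rows and keep only selected columns
over_30 <- employees[employees$age > 30, c("name", "salary")]
print(over_30)

# Sort rows by salary, highest first
by_salary <- employees[order(-employees$salary), ]
print(by_salary$name)
```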

3. Basic Operations & Functions

Built-in Functions

# Math functions
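sqrt(25)  # Square root: 5
log(10)   # Natural logarithm (base e), about 2.303
exp(1)    # e raised to the power 1, about 2.718
max(x)    # Largest element
min(x)    # Smallest element
sd(x)     # Sample standard deviation (n - 1 denominator)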

# Statistical functions
set.seed(123)  # For reproducibility
random_numbers <- rnorm(100)  # 100 random numbers from normal distribution
mean(random_numbers)
median(random_numbers)
quantile(random_numbers)
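
Two details of these functions are easy to miss: sd() computes the sample standard deviation (dividing by n - 1), and log() uses base e unless a base is given. A short sketch that checks both, using the x vector from the vectors section:

```r
x <- c(1, 3, 5, 7, 9)

# sd() by hand: square root of the sum of squared deviations over (n - 1)
n <- length(x)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))
all.equal(sd(x), manual_sd)   # TRUE

# Logarithms in other bases
natural_log <- log(10)        # base e
base10 <- log(100, base = 10) # same result as log10(100)
base2 <- log2(8)
print(c(natural_log, base10, base2))
```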

Control Structures

# If-else statements
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is less than or equal to 5")
}

# For loops
for (i in 1:5) {
  print(i^2)
}

# Apply functions (a compact alternative to an explicit loop)
sapply(1:5, function(x) x^2)

# Vectorized arithmetic (preferred over loops when an operation supports it)
(1:5)^2
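
The loop and apply styles can be compared side by side. The sketch below computes the same squares four ways and also shows ifelse(), the vectorized counterpart of if/else; all variable names are chosen for this illustration.

```r
values <- 1:5

# Explicit loop with a pre-allocated result vector
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# sapply: applies the function to each element and simplifies the result
squares_sapply <- sapply(values, function(v) v^2)

# vapply: like sapply, but the expected result type is declared up front
squares_vapply <- vapply(values, function(v) v^2, numeric(1))

# Vectorized arithmetic: no loop or apply call needed
squares_vec <- values^2

# ifelse() tests every element at once
size_label <- ifelse(values > 2, "big", "small")
print(size_label)  # "small" "small" "big" "big" "big"
```

vapply() is often the safer choice inside functions because it raises an error if a result does not match the declared type, whereas sapply() may quietly return a list.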

4. Practice Exercise

Let’s create and analyze a dataset about students:

# 1. Create the following vectors:
student_id <- 1:10
names <- c("Anna", "Ben", "Claire", "Dan", "Eva", 
           "Frank", "Grace", "Henry", "Ivy", "Jack")
gender <- factor(c("F", "M", "F", "M", "F", "M", "F", "M", "F", "M"))
test_scores <- c(85, 76, 92, 68, 90, 72, 88, 79, 95, 81)

# 2. Combine them into a data frame called 'students'
students <- data.frame(student_id, names, gender, test_scores)

# 3. Find the average test score
mean_score <- mean(students$test_scores)
print(paste("Average score:", mean_score))

# 4. Find how many students scored above 80
high_performers <- students[students$test_scores > 80, ]
n_high_performers <- nrow(high_performers)
print(paste("Number of high performers:", n_high_performers))

# 5. Calculate average score by gender
female_scores <- students$test_scores[students$gender == "F"]
male_scores <- students$test_scores[students$gender == "M"]
print(paste("Female average:", mean(female_scores)))
print(paste("Male average:", mean(male_scores)))
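
Step 5 filters each group by hand, which becomes tedious with more categories. As an alternative sketch, base R's tapply() and aggregate() compute a statistic per group in one call. The data frame is rebuilt here with only the two columns needed so the example runs on its own.

```r
students <- data.frame(
  gender = factor(c("F", "M", "F", "M", "F", "M", "F", "M", "F", "M")),
  test_scores = c(85, 76, 92, 68, 90, 72, 88, 79, 95, 81)
)

# tapply: apply mean() to test_scores within each gender level
avg_by_gender <- tapply(students$test_scores, students$gender, mean)
print(avg_by_gender)   # F 90.0, M 75.2

# aggregate: same calculation, returned as a data frame
avg_table <- aggregate(test_scores ~ gender, data = students, FUN = mean)
print(avg_table)
```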

5. Resources & Next Steps

  • “R in Action” by Robert Kabacoff (Chapters 1-3 for further study)
  • R Documentation: ?function_name or help(function_name)
  • CRAN: https://cran.r-project.org/
  • Online tutorials: RStudio Cheatsheets, DataCamp, Coursera