Working with Confusion Matrices
Implementing classification metrics in yardstick requires building confusion matrices and indexing them correctly.
Creating Confusion Matrices
Use yardstick_table() to create weighted confusion matrices:
xtab <- yardstick::yardstick_table(truth, estimate, case_weights = case_weights)
What yardstick_table Returns
yardstick_table() returns a base R table object (which is technically an array):
xtab <- yardstick_table(truth, estimate)
class(xtab)
# [1] "table"
# It's a 2D array with dimnames
dimnames(xtab)
# $truth
# [1] "yes" "no"
#
# $estimate
# [1] "yes" "no"
Structure:
- Rows represent actual truth values
- Columns represent predicted estimate values
- Cell values are counts (or weighted counts if case_weights is provided)
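To make the layout concrete without depending on yardstick, the sketch below builds a plain base R table with the orientation described above (truth in rows, estimate in columns). Using table() as a stand-in is an assumption made purely for illustration. On a real yardstick_table() result, names(dimnames(xtab)) is a quick way to confirm which dimension holds the truth values.

```r
# Stand-in for yardstick_table(): a base R table with truth in rows.
# This mirrors the layout described in this guide; it is not yardstick's own code.
truth <- factor(c("yes", "yes", "no", "no", "no"), levels = c("yes", "no"))
estimate <- factor(c("yes", "no", "no", "no", "no"), levels = c("yes", "no"))
xtab <- table(truth = truth, estimate = estimate)

# The names of the dimnames say which margin is which.
dim_names <- names(dimnames(xtab))

# Row sums recover the number of actual cases per class,
# column sums the number of predictions per class.
actual_counts <- rowSums(xtab)
predicted_counts <- colSums(xtab)
```

Here two observations are truly "yes" but only one is predicted "yes", so the row and column totals differ, which makes it easy to see which margin is which.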
How Case Weights are Incorporated
When you provide case_weights, they’re summed within each cell:
truth <- factor(c("A", "A", "B", "B"))
estimate <- factor(c("A", "B", "A", "B"))
weights <- c(1, 2, 3, 4)
xtab <- yardstick_table(truth, estimate, case_weights = weights)
#      estimate
# truth A B
#     A 1 2   # truth "A": weight predicted as "A" (correct) and as "B" (incorrect)
#     B 3 4   # truth "B": weight predicted as "A" (incorrect) and as "B" (correct)
Without weights, each cell would simply hold a count (1, 1, 1, 1).
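The example above has exactly one observation per cell, so the summation is not visible. As a hedged illustration, the following sketch reproduces the idea with base R's xtabs(), which sums a weight column within each cell. The data frame df and its column w are hypothetical, and xtabs() is only a stand-in for what yardstick_table() is described as doing here.

```r
# Two observations share the A/A cell, so their weights are added together.
df <- data.frame(
  truth = factor(c("A", "A", "A", "B"), levels = c("A", "B")),
  estimate = factor(c("A", "A", "B", "B"), levels = c("A", "B")),
  w = c(1, 2, 3, 4)
)

# Weighted table: each cell holds the sum of the weights that fall into it.
weighted <- xtabs(w ~ truth + estimate, data = df)

# Unweighted table: each cell holds a plain count.
unweighted <- table(truth = df$truth, estimate = df$estimate)

weighted["A", "A"]   # sum of weights 1 and 2
unweighted["A", "A"] # number of observations in the cell
```

Cells with no observations stay at zero in both tables, and the weighted table always sums to the total weight.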
Accessing Elements Correctly
Critical: Use character names, not integer indices:
# Good: use level names
tp <- xtab["yes", "yes"]
fp <- xtab["no", "yes"]
# Bad: numeric indices can be confusing
tp <- xtab[1, 1] # Which level is row 1?
Pattern for binary metrics
# Determine event and control levels
event <- if (identical(event_level, "first")) {
  levels(truth)[1]
} else {
  levels(truth)[2]
}
control <- setdiff(levels(truth), event)
# Access using names
tp <- xtab[event, event] # True positives: actual event, predicted event
fp <- xtab[control, event] # False positives: actual control, predicted event
fn <- xtab[event, control] # False negatives: actual event, predicted control
tn <- xtab[control, control] # True negatives: actual control, predicted control
Remember:
- First index (row) = actual truth
- Second index (column) = prediction
- Format: xtab[truth_value, predicted_value]
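The pattern above can be wrapped in a small helper so that every binary metric extracts its cells the same way. The function name binary_cells() is hypothetical, and the sketch assumes the truth-in-rows layout described in this guide. It reads the levels from the table's row names rather than from truth, and uses a base R table() as a stand-in for a yardstick_table() result.

```r
# Hypothetical helper: extract the four binary cells by level name.
# Assumes rows = truth and columns = estimate, as described in this guide.
binary_cells <- function(xtab, event_level = "first") {
  lvls <- rownames(xtab)
  stopifnot(length(lvls) == 2, identical(lvls, colnames(xtab)))

  event <- if (identical(event_level, "first")) lvls[1] else lvls[2]
  control <- setdiff(lvls, event)

  c(
    tp = xtab[event, event],     # actual event, predicted event
    fp = xtab[control, event],   # actual control, predicted event
    fn = xtab[event, control],   # actual event, predicted control
    tn = xtab[control, control]  # actual control, predicted control
  )
}

truth <- factor(c("yes", "yes", "yes", "no", "no"), levels = c("yes", "no"))
estimate <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
xtab <- table(truth = truth, estimate = estimate)

cells_first <- binary_cells(xtab, event_level = "first")
cells_second <- binary_cells(xtab, event_level = "second")
```

Switching event_level swaps the roles of tp and tn, while fp and fn trade places, because the same four cells are simply read from the other class's point of view.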
Confusion Matrix for Multiclass
For multiclass, the table is larger:
truth <- factor(c("A", "A", "B", "B", "C", "C"))
estimate <- factor(c("A", "B", "A", "C", "C", "A"))
xtab <- yardstick_table(truth, estimate)
#      estimate
# truth A B C
#     A 1 1 0
#     B 1 0 1
#     C 1 0 1
Extracting per-class metrics
# True positives: diagonal elements
tp <- diag(xtab)
# [1] 1 0 1 (for A, B, C respectively)
# Total actual per class: row sums
actual_per_class <- rowSums(xtab)
# [1] 2 2 2
# Total predicted per class: column sums
predicted_per_class <- colSums(xtab)
# [1] 3 1 2
# False positives for each class: column sum minus diagonal
fp <- colSums(xtab) - diag(xtab)
# [1] 2 1 1
# False negatives for each class: row sum minus diagonal
fn <- rowSums(xtab) - diag(xtab)
# [1] 1 2 1
Efficient multiclass patterns
Use vectorized operations:
# Good: matrix operations
tp <- diag(xtab)
fp <- colSums(xtab) - tp
fn <- rowSums(xtab) - tp
# Calculate per-class precision
precision <- tp / (tp + fp)
# Calculate per-class recall
recall <- tp / (tp + fn)
Common Mistakes with Table Indexing
Mistake 1: Row/column confusion
# Correct understanding:
tp <- xtab[event, event] # actual event, predicted event
# Wrong interpretation:
# Thinking columns are truth - THEY ARE NOT
# Rows = actual truth, Columns = predictions
Remember: xtab[truth_value, predicted_value]
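One way to guard against this mistake is to check the table's dimension names before indexing. The helper below is a hypothetical sketch: assert_truth_in_rows() is not part of yardstick, and the expected names c("truth", "estimate") are taken from the dimnames output shown earlier in this guide. If your installed version prints different dimension names, adjust the expected argument accordingly.

```r
# Hypothetical guard: fail early if the table is not laid out as expected.
assert_truth_in_rows <- function(xtab, expected = c("truth", "estimate")) {
  dims <- names(dimnames(xtab))
  if (!identical(dims, expected)) {
    stop(
      "Unexpected table layout: dimensions are (",
      paste(dims, collapse = ", "), "), expected (",
      paste(expected, collapse = ", "), ")",
      call. = FALSE
    )
  }
  invisible(xtab)
}

truth <- factor(c("yes", "no", "no"), levels = c("yes", "no"))
estimate <- factor(c("yes", "yes", "no"), levels = c("yes", "no"))

# Base R stand-ins: one table with truth in rows, one with the margins flipped.
ok <- table(truth = truth, estimate = estimate)
flipped <- table(estimate = estimate, truth = truth)

assert_truth_in_rows(ok) # passes silently

# The flipped table is rejected; capture the message instead of stopping.
result <- tryCatch(
  assert_truth_in_rows(flipped),
  error = function(e) conditionMessage(e)
)
```

A check like this costs almost nothing and turns a silent swap of false positives and false negatives into an immediate error.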
Mistake 2: Assuming numeric indices
# Fragile: depends on factor level order
tp <- xtab[1, 1]
fp <- xtab[2, 1]
# Robust: uses level names
tp <- xtab[event, event]
fp <- xtab[control, event]
Mistake 3: Not handling zero counts
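# A zero cell is still a valid numeric value
fp <- xtab[control, event] # Could be 0
# Zeros need no special handling in the arithmetic
sensitivity <- tp / (tp + fn) # fn = 0 gives 1; tp + fn = 0 gives NaN (0 / 0)
Zero counts are valid, and the arithmetic handles them without errors. A ratio of counts becomes NaN when numerator and denominator are both zero, and Inf when only the denominator is zero. Sensitivity can only hit the NaN case (no actual events), which is expected.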
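The behaviour of zero cells can be confirmed with plain arithmetic. The sketch below uses made-up per-class counts, not output from yardstick, to show which combinations give an ordinary value and which give NaN or Inf.

```r
# Illustrative per-class counts (hypothetical values).
tp <- c(3, 0, 2)
fn <- c(0, 0, 1)

# Class 1: fn = 0 gives a perfect score.
# Class 2: tp + fn = 0 gives 0 / 0, which is NaN.
# Class 3: an ordinary ratio.
sensitivity <- tp / (tp + fn)

# Flag the undefined entries rather than special-casing the division.
undefined <- is.nan(sensitivity)

# Inf appears only when a nonzero numerator is divided by zero.
nonzero_over_zero <- 1 / 0
```

Detecting undefined results after the division, with is.nan(), keeps the metric code vectorized and leaves the decision about how to report them to the caller.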
Factor Level Ordering and the Table
The table rows and columns follow factor level order:
truth <- factor(c("B", "A"), levels = c("A", "B"))
estimate <- factor(c("B", "A"), levels = c("A", "B"))
xtab <- yardstick_table(truth, estimate)
#      estimate
# truth A B
#     A 1 0
#     B 0 1
# Levels order matches table order:
rownames(xtab) # "A", "B"
colnames(xtab) # "A", "B"
This is why it’s important to specify factor levels explicitly in tests.
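The risk becomes visible when levels are left implicit: factor() sorts the unique values, so "no" comes before "yes". The following sketch uses base R table() as a stand-in for yardstick_table() to show how the same data yields a different first row depending on how the factor was built.

```r
values <- c("yes", "no", "yes")

# Implicit levels are sorted, so "no" becomes the first level.
implicit <- factor(values)
# Explicit levels keep "yes" first.
explicit <- factor(values, levels = c("yes", "no"))

xtab_implicit <- table(truth = implicit, estimate = implicit)
xtab_explicit <- table(truth = explicit, estimate = explicit)

# Integer indexing silently points at different classes.
first_implicit <- rownames(xtab_implicit)[1]
first_explicit <- rownames(xtab_explicit)[1]

# Name-based indexing returns the same cell either way.
same_by_name <- xtab_implicit["yes", "yes"] == xtab_explicit["yes", "yes"]
```

With integer indices, xtab[1, 1] counts the "no" cases in one table and the "yes" cases in the other, which is exactly the fragility that name-based indexing and explicit levels avoid.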
Multiclass Averaging
Multiclass metrics can be averaged in three ways: macro, macro-weighted, and micro.
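As a compact preview of the three estimators detailed in the subsections that follow, the sketch below computes all of them for precision on one small, deliberately imbalanced table. The matrix is hypothetical data built by hand with truth in rows, matching the layout used in this guide, and the truth frequencies used as weights therefore come from the row sums.

```r
# Hypothetical 3-class confusion matrix: rows = truth, columns = estimate.
xtab <- matrix(
  c(4, 1, 0,
    1, 2, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(truth = c("A", "B", "C"), estimate = c("A", "B", "C"))
)

tp <- diag(xtab)
fp <- colSums(xtab) - tp
per_class_precision <- tp / (tp + fp)

# Macro: every class counts equally.
macro <- mean(per_class_precision)

# Macro-weighted: classes weighted by how often they occur in truth.
macro_weighted <- weighted.mean(per_class_precision, rowSums(xtab))

# Micro: pool the counts first, then compute a single ratio.
micro <- sum(tp) / (sum(tp) + sum(fp))
```

Because class A is both the most frequent and the best predicted, the macro-weighted and micro results come out higher than the plain macro average, which gives the rare class C the same say as A.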
Macro averaging
Unweighted average across classes (equal weight for each class):
# Calculate per-class metric
per_class_precision <- tp / (tp + fp)
# Macro average: equal weight
wt <- rep(1, n_classes)
macro_precision <- weighted.mean(per_class_precision, wt)
# Or simply: mean(per_class_precision)
Macro-weighted averaging
Weighted by class frequency in truth:
# Calculate per-class metric
per_class_precision <- tp / (tp + fp)
# Macro-weighted: weight by class frequency
class_counts <- rowSums(xtab) # truth totals per class (rows hold the truth values)
macro_weighted_precision <- weighted.mean(per_class_precision, class_counts)
Micro averaging
Pool all classes, calculate once:
# Aggregate first
total_tp <- sum(tp)
total_fp <- sum(fp)
# Then calculate
micro_precision <- total_tp / (total_tp + total_fp)
Performance Tips
Cache confusion matrix calculations
Don’t recalculate the confusion matrix multiple times:
# Good: calculate once, reuse
metric_impl <- function(truth, estimate, estimator, event_level, case_weights) {
  # Calculate confusion matrix once
  xtab <- yardstick::yardstick_table(truth, estimate, case_weights = case_weights)
  # Use it multiple times
  metric_estimator_impl(xtab, estimator, event_level)
}
# Bad: calculate multiple times
metric_impl <- function(truth, estimate, estimator, event_level, case_weights) {
  # Calculating in each helper call
  binary_result <- metric_binary(truth, estimate, event_level, case_weights)
  # Confusion matrix calculated again inside metric_binary
}
Use matrix operations
# Good: vectorized operations
tp <- diag(xtab)
fp <- colSums(xtab) - tp
fn <- rowSums(xtab) - tp
# Avoid: looping
tp <- numeric(n_classes)
for (i in seq_len(n_classes)) {
  tp[i] <- xtab[i, i]
}
Testing with Confusion Matrices
When testing, create simple, explicit test data:
test_that("confusion matrix indexing works correctly", {
  truth <- factor(c("yes", "yes", "no", "no"), levels = c("yes", "no"))
  estimate <- factor(c("yes", "no", "yes", "no"), levels = c("yes", "no"))
  xtab <- yardstick_table(truth, estimate)
  # Verify structure
  expect_equal(xtab["yes", "yes"], 1) # TP
  expect_equal(xtab["no", "yes"], 1) # FP
  expect_equal(xtab["yes", "no"], 1) # FN
  expect_equal(xtab["no", "no"], 1) # TN
})
Next Steps
- Implement class metrics: class-metrics.md
- Handle case weights: case-weights.md
- Understand the metric system: metric-system.md