Combining Metrics with metric_set()
metric_set() allows you to combine multiple yardstick metrics into a single function that calculates all of them at once. This is more efficient than calling metrics individually and integrates seamlessly with tidymodels workflows.
Note for Source Development: If you’re contributing directly to the yardstick package, see how metric_set() validates and combines metrics internally; the Source Development Guide has details.
Overview
Use when:
You want to calculate multiple metrics on the same data
You’re using tidymodels workflows (tune, recipes, workflows)
You want to avoid repeating metric calculations
You need consistent metric evaluation across resamples
Key benefits:
Efficiency: Shared calculations performed once (e.g., confusion matrix)
Convenience: One function call instead of many
Integration: Works with tune package for model tuning
Consistency: All metrics use same data preprocessing
Implementation:
Metric set creation: R/metric_set.R (implements metric_set() and its validation)
Compatibility checking: Validates that metric types can be combined
Function generation: Creates a composite function that calls each metric (see the sketch after these lists)
Usage in tidymodels:
Tune integration: Used by tune_grid() and tune_bayes() for model evaluation
Resampling: Applied consistently across all resamples
Workflow integration: Works with fit_resamples() and last_fit()
Test patterns:
Metric set creation: tests/testthat/test-metric_set.R
Compatibility validation: Tests for valid/invalid metric combinations
Integration tests: Tests with tune and workflows packages
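For intuition, here is a toy sketch of the core idea. It is not yardstick’s actual implementation (which also validates metric classes and handles estimators, probability columns, case weights, and grouping), and toy_metric_set() is a made-up name: a metric set is essentially one function that calls every metric on the same data and row-binds the one-row result tibbles.
library(yardstick)
library(dplyr)
library(rlang)

toy_metric_set <- function(...) {
  metrics <- list(...)
  function(data, truth, estimate) {
    truth <- enquo(truth)
    estimate <- enquo(estimate)
    # Call each metric on the same data, then stack the one-row tibbles
    bind_rows(lapply(metrics, function(metric) {
      metric(data, truth = !!truth, estimate = !!estimate)
    }))
  }
}

# Behaves like a simplified metric_set() for numeric metrics
toy <- toy_metric_set(rmse, mae)
toy(solubility_test, solubility, prediction)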
Basic Usage
library(yardstick)
# Create a metric set
my_metrics <- metric_set(rmse, rsq, mae)
# Use it like any other metric function
my_metrics(data, truth = actual, estimate = predicted)
# Returns tibble with all metrics:
# .metric .estimator .estimate
# rmse standard 0.123
# rsq standard 0.891
# mae standard 0.098
Compatibility Rules
Metrics in a set must be of compatible types:
✓ Valid Combinations
1. All numeric metrics:
numeric_metrics <- metric_set(rmse, mae, rsq, huber_loss)
2. Mix of class and class probability metrics:
class_metrics <- metric_set(accuracy, precision, recall, roc_auc, pr_auc)
3. Mix of survival metrics (any combination):
surv_metrics <- metric_set(
concordance_survival, # Static
brier_survival, # Dynamic
brier_survival_integrated # Integrated
)
✗ Invalid Combinations
Cannot mix metric types:
# ERROR: Cannot mix numeric and classification
metric_set(rmse, accuracy)
# ERROR: Cannot mix classification and survival
metric_set(accuracy, concordance_survival)
Function Signatures by Type
The returned function has different arguments depending on metric types:
Numeric Metrics
regression_metrics <- metric_set(rmse, mae, rsq)
# Signature:
regression_metrics(
data,
truth,
estimate,
na_rm = TRUE,
case_weights = NULL,
...
)
# Usage:
regression_metrics(test_data, truth = y, estimate = y_pred)
Class/Probability Metrics
class_metrics <- metric_set(accuracy, roc_auc, pr_auc)
# Signature:
class_metrics(
data,
truth,
..., # For probability columns
estimate, # Must be named!
estimator = NULL,
na_rm = TRUE,
event_level = yardstick_event_level(),
case_weights = NULL
)
# Usage - note estimate is named:
class_metrics(
test_data,
truth = obs,
VF:L, # Probability columns
estimate = pred # Named argument!
)
Survival Metrics
surv_metrics <- metric_set(concordance_survival, brier_survival)
# Signature:
surv_metrics(
data,
truth,
..., # For survival predictions
estimate, # Named for time predictions
na_rm = TRUE,
case_weights = NULL
)
Important: Named estimate Argument
⚠️ For class/probability and survival metric sets, you MUST name the estimate argument.
class_metrics <- metric_set(accuracy, roc_auc)
# ✓ Correct
class_metrics(data, truth = obs, estimate = pred)
# ✗ Wrong - estimate captured by ...
class_metrics(data, truth = obs, pred)
# Error: Can't find estimate column
Why? The estimate argument comes after ... in the signature, so an unnamed estimate gets captured by the dots instead.
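This is ordinary R argument matching rather than anything yardstick-specific. A minimal, hypothetical illustration (f() below is a made-up function with the same argument order as a class metric set):
f <- function(data, truth, ..., estimate = NULL) {
  list(n_dots = ...length(), estimate_supplied = !is.null(estimate))
}

f("data", "obs", "pred")            # "pred" falls into ... ; estimate stays NULL
f("data", "obs", estimate = "pred") # estimate is matched because it is named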
Working with Groups
Metric sets respect dplyr::group_by():
metrics <- metric_set(accuracy, kap, roc_auc)
# Compute metrics for each resample
hpc_cv |>
group_by(Resample) |>
metrics(truth = obs, VF:L, estimate = pred)
# Returns one row per metric per group:
# .metric .estimator .estimate Resample
# accuracy multiclass 0.709 Fold01
# kap multiclass 0.583 Fold01
# roc_auc hand_till 0.901 Fold01
# accuracy multiclass 0.713 Fold02
# ...
Using metric_tweak() with metric_set()
Use metric_tweak() to set custom defaults for metrics before adding them to a set:
# Create tweaked version with custom parameter
f2_meas <- metric_tweak("f2_meas", f_meas, beta = 2)  # class metric with custom beta
mase12 <- metric_tweak("mase12", mase, m = 12)        # numeric metric (would go in a numeric set)
# Add to metric set
my_metrics <- metric_set(
precision,
recall,
f_meas, # Default beta = 1
f2_meas # Custom beta = 2
)
my_metrics(data, truth = obs, estimate = pred)
# Both f_meas and f2_meas calculated with different beta values
Why this matters: Once metrics are in a set, you can’t change their parameters. Tweak them first.
Complete Examples
Regression Workflow
library(yardstick)
library(dplyr)
# Define metric set
regression_metrics <- metric_set(
rmse,
mae,
rsq,
huber_loss
)
# Use on test data
results <- regression_metrics(
solubility_test,
truth = solubility,
estimate = prediction
)
results
# .metric .estimator .estimate
# rmse standard 0.789
# mae standard 0.582
# rsq standard 0.892
# huber_loss standard 0.341
Classification with Probabilities
# Mix class and probability metrics
class_metrics <- metric_set(
accuracy,
precision,
recall,
f_meas,
roc_auc,
pr_auc
)
# Use with class probabilities
results <- class_metrics(
two_class_example,
truth = truth,
Class1, # Probability column
estimate = predicted
)
results
# .metric .estimator .estimate
# accuracy binary 0.838
# precision binary 0.819
# recall binary 0.875
# f_meas binary 0.846
# roc_auc binary 0.939
# pr_auc binary 0.946
Multiclass Classification
multi_metrics <- metric_set(
accuracy,
bal_accuracy,
kap,
roc_auc,
precision,
recall
)
# Specify macro averaging for precision/recall
hpc_cv |>
multi_metrics(
truth = obs,
VF:L, # Probability columns
estimate = pred,
estimator = "macro"
)
Cross-Validation
library(rsample)
library(purrr) # map()
library(tidyr) # unnest()
# Define metrics once
cv_metrics <- metric_set(rmse, rsq, mae)
# Use across all folds
cv_results <- vfold_cv(training_data, v = 10) |>
mutate(
metrics = map(splits, function(split) {
# Fit model on the analysis set (fit_model() is a placeholder for your own code)
model <- fit_model(analysis(split))
# Add predictions as a column so yardstick can select them
holdout <- assessment(split)
holdout$.pred <- predict(model, holdout)
# Calculate all metrics at once
cv_metrics(
holdout,
truth = outcome,
estimate = .pred
)
})
)
# Aggregate across folds
cv_results |>
unnest(metrics) |>
group_by(.metric) |>
summarize(mean = mean(.estimate), se = sd(.estimate) / sqrt(n()))
With Groupwise Metrics
# Create groupwise metric
accuracy_diff <- new_groupwise_metric(
fn = accuracy,
name = "accuracy_diff",
aggregate = function(x) diff(range(x$.estimate))
)
# Combine with regular metrics
fairness_metrics <- metric_set(
accuracy,
precision,
recall,
accuracy_diff(protected_attr) # Add groupwise metric
)
fairness_metrics(data, truth = obs, estimate = pred)
Creating Custom Metrics for metric_set()
To use your custom metric in a set, wrap it with the appropriate new_*_metric():
# Define your metric function
my_custom_metric <- function(data, truth, estimate, na_rm = TRUE, ...) {
# Implementation: compute a single numeric `result` from truth and estimate
# ...
tibble(
.metric = "my_custom",
.estimator = "standard",
.estimate = result
)
}
# Wrap with new_*_metric() - required for metric_set()
my_custom_metric <- new_numeric_metric(
my_custom_metric,
direction = "maximize"
)
# Now it works in metric sets
my_metrics <- metric_set(rmse, mae, my_custom_metric)
Key requirements (a fuller, runnable sketch follows this list):
1. Must be wrapped with new_*_metric()
2. Must follow standard yardstick signature patterns
3. Must return standard yardstick output format
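For a fully working version, the pattern from yardstick’s “Custom performance metrics” vignette can be followed. The sketch below defines a hypothetical median-absolute-error metric (medae); it assumes a recent yardstick (>= 1.0) that exports numeric_metric_summarizer(), and it skips the input checks and S3 generic the vignette adds:
library(yardstick)
library(rlang)

# Vector version does the computation (case weights ignored in this simplified sketch)
medae_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  if (na_rm) {
    keep <- !is.na(truth) & !is.na(estimate)
    truth <- truth[keep]
    estimate <- estimate[keep]
  }
  stats::median(abs(truth - estimate))
}

# Data frame version delegates to the vector version
medae <- function(data, truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "medae",
    fn = medae_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}
medae <- new_numeric_metric(medae, direction = "minimize")

# Usable alongside built-in metrics
my_set <- metric_set(rmse, mae, medae)
my_set(solubility_test, truth = solubility, estimate = prediction)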
Using with tune Package
Metric sets integrate with tune for model tuning:
library(tune)
library(workflows)
# Define metrics for tuning
tune_metrics <- metric_set(
rmse,
rsq,
mae
)
# Use in tune_grid()
tune_results <- tune_grid(
workflow,
resamples = cv_folds,
grid = param_grid,
metrics = tune_metrics # Pass metric set
)
# Best models selected based on all metrics
show_best(tune_results, metric = "rmse")
Performance Benefits
Metric sets are more efficient than individual calls:
# Inefficient - confusion matrix calculated 3 times
accuracy(data, truth, estimate)
precision(data, truth, estimate)
recall(data, truth, estimate)
# Efficient - confusion matrix calculated once, shared
metrics <- metric_set(accuracy, precision, recall)
metrics(data, truth, estimate = estimate)
Shared calculations (an informal timing comparison follows this list):
Confusion matrices (for class metrics)
ROC curves (for ROC-based metrics)
Group-by operations
Missing value handling
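An informal way to compare the two approaches on your own data (timings vary with data size, metric mix, and yardstick version; this is illustrative, not a rigorous benchmark):
library(yardstick)

# Repeat the built-in example data to get something large enough to time
big <- two_class_example[rep(seq_len(nrow(two_class_example)), 100), ]
cls_set <- metric_set(accuracy, precision, recall)

# Three separate calls
system.time(for (i in 1:20) {
  accuracy(big, truth = truth, estimate = predicted)
  precision(big, truth = truth, estimate = predicted)
  recall(big, truth = truth, estimate = predicted)
})

# One metric set call
system.time(for (i in 1:20) cls_set(big, truth = truth, estimate = predicted))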
Advanced Patterns
Conditional Metrics
# Select metrics based on the data (is_binary could be, e.g., nlevels(data$obs) == 2)
metrics <- if (is_binary) {
metric_set(accuracy, sensitivity, specificity, roc_auc)
} else {
metric_set(accuracy, bal_accuracy, kap)
}
metrics(data, truth = obs, estimate = pred)
Parameterized Sets
create_metric_set <- function(include_auc = TRUE) {
base_metrics <- list(accuracy, precision, recall)
if (include_auc) {
base_metrics <- c(base_metrics, list(roc_auc))
}
do.call(metric_set, base_metrics)
}
# Use
metrics <- create_metric_set(include_auc = TRUE)
Multiple Tweaked Versions
# Different F-measures
f0.5 <- metric_tweak("f0.5_meas", f_meas, beta = 0.5)
f1 <- f_meas
f2 <- metric_tweak("f2_meas", f_meas, beta = 2)
# All in one set
f_metrics <- metric_set(f0.5, f1, f2)
Troubleshooting
Error: Cannot mix metric types
# Error
metric_set(rmse, accuracy)
Solution: Keep metrics of compatible types together.
Error: estimate not found
# Wrong
class_metrics(data, truth, pred)
Solution: Name the estimate argument:
class_metrics(data, truth, estimate = pred)
Error: Metric doesn’t work in set
my_metric <- function(data, truth, estimate) { ... }
metric_set(rmse, my_metric) # Error
Solution: Wrap custom metrics:
my_metric <- new_numeric_metric(my_metric, direction = "minimize")
metric_set(rmse, my_metric) # Works
Best Practices
- Define once, use everywhere: Create metric sets at the top of your analysis
- Name your sets: Use descriptive names like classification_metrics, not metrics
- Use with groups: Leverage group-aware behavior for cross-validation
- Tweak before combining: Set custom parameters with metric_tweak() first
- Keep compatible types: Don’t mix numeric, class, and survival metrics
- Named estimate: Always name the estimate argument for class/survival metric sets
- Integration: Use with the tune package for consistent tuning metrics
Common Metric Sets
# Standard regression
regression_std <- metric_set(rmse, mae, rsq)
# Regression with alternatives
regression_robust <- metric_set(mae, huber_loss, mape)
# Binary classification
binary_clf <- metric_set(
accuracy, sensitivity, specificity,
roc_auc, pr_auc
)
# Multiclass classification
multiclass_clf <- metric_set(
accuracy, bal_accuracy, kap,
roc_auc # Uses hand_till method for multiclass
)
# Survival analysis
survival_std <- metric_set(
concordance_survival,
brier_survival,
brier_survival_integrated
)
# Fairness analysis
fairness_set <- metric_set(
accuracy,
demographic_parity(group), # group = column holding the sensitive attribute
equal_opportunity(group)
)
See Also
Metric System - Understanding basic metric architecture
Groupwise Metrics - Creating disparity metrics
metric_tweak() - Customizing metric parameters
?metric_set - Full documentation