Combining Metrics with metric_set()
metric_set() allows you to combine multiple yardstick metrics into a single function that calculates all of them at once. This is more efficient than calling metrics individually and integrates seamlessly with tidymodels workflows.
Note for Source Development: If you’re contributing directly to the yardstick package, see how metric_set() validates and combines metrics internally; the Source Development Guide has details.
Overview
Use when:
You want to calculate multiple metrics on the same data
You’re using tidymodels workflows (tune, recipes, workflows)
You want to avoid repeating metric calculations
You need consistent metric evaluation across resamples
Key benefits:
Efficiency: Shared calculations performed once (e.g., confusion matrix)
Convenience: One function call instead of many
Integration: Works with tune package for model tuning
Consistency: All metrics use same data preprocessing
Implementation:
Metric set creation: R/metric_set.R (implements metric_set() and its validation)
Compatibility checking: Validates that metric types can be combined
Function generation: Creates a composite function that calls each metric (see the sketch after these lists)
Usage in tidymodels:
Tune integration: Used by tune_grid() and tune_bayes() for model evaluation
Resampling: Applied consistently across all resamples
Workflow integration: Works with fit_resamples() and last_fit()
Test patterns:
Metric set creation: tests/testthat/test-metric_set.R
Compatibility validation: Tests for valid/invalid metric combinations
Integration tests: Tests with tune and workflows packages
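For intuition, here is a toy sketch of the core idea. It is not yardstick’s actual implementation (which also validates metric classes and handles estimators, probability columns, case weights, and grouping), and toy_metric_set() is a made-up name: a metric set is essentially one function that calls every metric on the same data and row-binds the one-row result tibbles.
library(yardstick)
library(dplyr)
library(rlang)

toy_metric_set <- function(...) {
  metrics <- list(...)
  function(data, truth, estimate) {
    truth <- enquo(truth)
    estimate <- enquo(estimate)
    # Call each metric on the same data, then stack the one-row tibbles
    bind_rows(lapply(metrics, function(metric) {
      metric(data, truth = !!truth, estimate = !!estimate)
    }))
  }
}

# Behaves like a simplified metric_set() for numeric metrics
toy <- toy_metric_set(rmse, mae)
toy(solubility_test, solubility, prediction)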
Basic Usage
library(yardstick)
# Create a metric set
my_metrics <- metric_set(rmse, rsq, mae)
# Use it like any other metric function
my_metrics(data, truth = actual, estimate = predicted)
# Returns tibble with all metrics:
# .metric .estimator .estimate
# rmse standard 0.123
# rsq standard 0.891
# mae standard 0.098
Compatibility Rules
Metrics in a set must be of compatible types:
✓ Valid Combinations
1. All numeric metrics:
numeric_metrics <- metric_set(rmse, mae, rsq, huber_loss)
2. Mix of class and class probability metrics:
class_metrics <- metric_set(accuracy, precision, recall, roc_auc, pr_auc)
3. Mix of survival metrics (any combination):
surv_metrics <- metric_set(
concordance_survival, # Static
brier_survival, # Dynamic
brier_survival_integrated # Integrated
)
✗ Invalid Combinations
Cannot mix metric types:
# ERROR: Cannot mix numeric and classification
metric_set(rmse, accuracy)
# ERROR: Cannot mix classification and survival
metric_set(accuracy, concordance_survival)
Function Signatures by Type
The returned function has different arguments depending on metric types:
Numeric Metrics
regression_metrics <- metric_set(rmse, mae, rsq)
# Signature:
regression_metrics(
data,
truth,
estimate,
na_rm = TRUE,
case_weights = NULL,
...
)
# Usage:
regression_metrics(test_data, truth = y, estimate = y_pred)
Class/Probability Metrics
class_metrics <- metric_set(accuracy, roc_auc, pr_auc)
# Signature:
class_metrics(
data,
truth,
..., # For probability columns
estimate, # Must be named!
estimator = NULL,
na_rm = TRUE,
event_level = yardstick_event_level(),
case_weights = NULL
)
# Usage - note estimate is named:
class_metrics(
test_data,
truth = obs,
VF:L, # Probability columns
estimate = pred # Named argument!
)
Survival Metrics
surv_metrics <- metric_set(concordance_survival, brier_survival)
# Signature:
surv_metrics(
data,
truth,
..., # For survival predictions
estimate, # Named for time predictions
na_rm = TRUE,
case_weights = NULL
)
Important: Named estimate Argument
⚠️ For class/probability and survival metric sets, you MUST name the estimate argument.
class_metrics <- metric_set(accuracy, roc_auc)
# ✓ Correct
class_metrics(data, truth = obs, estimate = pred)
# ✗ Wrong - estimate captured by ...
class_metrics(data, truth = obs, pred)
# Error: Can't find estimate column
Why? The estimate argument comes after ... in the signature, so an unnamed estimate gets captured by the dots instead.
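This is ordinary R argument matching rather than anything yardstick-specific. A minimal, hypothetical illustration (f() below is a made-up function with the same argument order as a class metric set):
f <- function(data, truth, ..., estimate = NULL) {
  list(n_dots = ...length(), estimate_supplied = !is.null(estimate))
}

f("data", "obs", "pred")            # "pred" falls into ... ; estimate stays NULL
f("data", "obs", estimate = "pred") # estimate is matched because it is named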
Working with Groups
Metric sets respect dplyr::group_by():
metrics <- metric_set(accuracy, kap, roc_auc)
# Compute metrics for each resample
hpc_cv |>
group_by(Resample) |>
metrics(truth = obs, VF:L, estimate = pred)
# Returns one row per metric per group:
# .metric .estimator .estimate Resample
# accuracy multiclass 0.709 Fold01
# kap multiclass 0.583 Fold01
# roc_auc hand_till 0.901 Fold01
# accuracy multiclass 0.713 Fold02
# ...
Using metric_tweak() with metric_set()
Use metric_tweak() to set custom defaults for metrics before adding them to a set:
# Create tweaked version with custom parameter
f2_meas <- metric_tweak("f2_meas", f_meas, beta = 2)  # class metric with custom beta
mase12 <- metric_tweak("mase12", mase, m = 12)        # numeric metric (would go in a numeric set)
# Add to metric set
my_metrics <- metric_set(
precision,
recall,
f_meas, # Default beta = 1
f2_meas # Custom beta = 2
)
my_metrics(data, truth = obs, estimate = pred)
# Both f_meas and f2_meas calculated with different beta values
Why this matters: Once metrics are in a set, you can’t change their parameters. Tweak them first.
Complete Examples
Regression Workflow
library(yardstick)
library(dplyr)
# Define metric set
regression_metrics <- metric_set(
rmse,
mae,
rsq,
huber_loss
)
# Use on test data
results <- regression_metrics(
solubility_test,
truth = solubility,
estimate = prediction
)
results
# .metric .estimator .estimate
# rmse standard 0.789
# mae standard 0.582
# rsq standard 0.892
# huber_loss standard 0.341
Classification with Probabilities
# Mix class and probability metrics
class_metrics <- metric_set(
accuracy,
precision,
recall,
f_meas,
roc_auc,
pr_auc
)
# Use with class probabilities
results <- class_metrics(
two_class_example,
truth = truth,
Class1, # Probability column
estimate = predicted
)
results
# .metric .estimator .estimate
# accuracy binary 0.838
# precision binary 0.819
# recall binary 0.875
# f_meas binary 0.846
# roc_auc binary 0.939
# pr_auc binary 0.946
Multiclass Classification
multi_metrics <- metric_set(
accuracy,
bal_accuracy,
kap,
roc_auc,
precision,
recall
)
# Specify macro averaging for precision/recall
hpc_cv |>
multi_metrics(
truth = obs,
VF:L, # Probability columns
estimate = pred,
estimator = "macro"
)
Cross-Validation
library(rsample)
library(purrr) # map()
library(tidyr) # unnest()
# Define metrics once
cv_metrics <- metric_set(rmse, rsq, mae)
# Use across all folds
cv_results <- vfold_cv(training_data, v = 10) |>
mutate(
metrics = map(splits, function(split) {
# Fit model on the analysis set (fit_model() is a placeholder for your own code)
model <- fit_model(analysis(split))
# Add predictions as a column so yardstick can select them
holdout <- assessment(split)
holdout$.pred <- predict(model, holdout)
# Calculate all metrics at once
cv_metrics(
holdout,
truth = outcome,
estimate = .pred
)
})
)
# Aggregate across folds
cv_results |>
unnest(metrics) |>
group_by(.metric) |>
summarize(mean = mean(.estimate), se = sd(.estimate) / sqrt(n()))
With Groupwise Metrics
# Create groupwise metric
accuracy_diff <- new_groupwise_metric(
fn = accuracy,
name = "accuracy_diff",
aggregate = function(x) diff(range(x$.estimate))
)
# Combine with regular metrics
fairness_metrics <- metric_set(
accuracy,
precision,
recall,
accuracy_diff(protected_attr) # Add groupwise metric
)
fairness_metrics(data, truth = obs, estimate = pred)
Creating Custom Metrics for metric_set()
To use your custom metric in a set, wrap it with the appropriate new_*_metric():
# Define your metric function
my_custom_metric <- function(data, truth, estimate, na_rm = TRUE, ...) {
# Implementation: compute a single numeric `result` from truth and estimate
# ...
tibble(
.metric = "my_custom",
.estimator = "standard",
.estimate = result
)
}
# Wrap with new_*_metric() - required for metric_set()
my_custom_metric <- new_numeric_metric(
my_custom_metric,
direction = "maximize"
)
# Now it works in metric sets
my_metrics <- metric_set(rmse, mae, my_custom_metric)
Key requirements (a fuller, runnable sketch follows this list):
1. Must be wrapped with new_*_metric()
2. Must follow standard yardstick signature patterns
3. Must return standard yardstick output format
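For a fully working version, the pattern from yardstick’s “Custom performance metrics” vignette can be followed. The sketch below defines a hypothetical median-absolute-error metric (medae); it assumes a recent yardstick (>= 1.0) that exports numeric_metric_summarizer(), and it skips the input checks and S3 generic the vignette adds:
library(yardstick)
library(rlang)

# Vector version does the computation (case weights ignored in this simplified sketch)
medae_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  if (na_rm) {
    keep <- !is.na(truth) & !is.na(estimate)
    truth <- truth[keep]
    estimate <- estimate[keep]
  }
  stats::median(abs(truth - estimate))
}

# Data frame version delegates to the vector version
medae <- function(data, truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "medae",
    fn = medae_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}
medae <- new_numeric_metric(medae, direction = "minimize")

# Usable alongside built-in metrics
my_set <- metric_set(rmse, mae, medae)
my_set(solubility_test, truth = solubility, estimate = prediction)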
Using with tune Package
Metric sets integrate with tune for model tuning:
library(tune)
library(workflows)
# Define metrics for tuning
tune_metrics <- metric_set(
rmse,
rsq,
mae
)
# Use in tune_grid()
tune_results <- tune_grid(
workflow,
resamples = cv_folds,
grid = param_grid,
metrics = tune_metrics # Pass metric set
)
# Best models selected based on all metrics
show_best(tune_results, metric = "rmse")
Performance Benefits
Metric sets are more efficient than individual calls:
# Inefficient - confusion matrix calculated 3 times
accuracy(data, truth, estimate)
precision(data, truth, estimate)
recall(data, truth, estimate)
# Efficient - confusion matrix calculated once, shared
metrics <- metric_set(accuracy, precision, recall)
metrics(data, truth, estimate = estimate)
Shared calculations (an informal timing comparison follows this list):
Confusion matrices (for class metrics)
ROC curves (for ROC-based metrics)
Group-by operations
Missing value handling
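An informal way to compare the two approaches on your own data (timings vary with data size, metric mix, and yardstick version; this is illustrative, not a rigorous benchmark):
library(yardstick)

# Repeat the built-in example data to get something large enough to time
big <- two_class_example[rep(seq_len(nrow(two_class_example)), 100), ]
cls_set <- metric_set(accuracy, precision, recall)

# Three separate calls
system.time(for (i in 1:20) {
  accuracy(big, truth = truth, estimate = predicted)
  precision(big, truth = truth, estimate = predicted)
  recall(big, truth = truth, estimate = predicted)
})

# One metric set call
system.time(for (i in 1:20) cls_set(big, truth = truth, estimate = predicted))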
Advanced Patterns
Conditional Metrics
# Select metrics based on the data (is_binary could be, e.g., nlevels(data$obs) == 2)
metrics <- if (is_binary) {
metric_set(accuracy, sensitivity, specificity, roc_auc)
} else {
metric_set(accuracy, bal_accuracy, kap)
}
metrics(data, truth = obs, estimate = pred)
Parameterized Sets
create_metric_set <- function(include_auc = TRUE) {
base_metrics <- list(accuracy, precision, recall)
if (include_auc) {
base_metrics <- c(base_metrics, list(roc_auc))
}
do.call(metric_set, base_metrics)
}
# Use
metrics <- create_metric_set(include_auc = TRUE)
Multiple Tweaked Versions
# Different F-measures
f0.5 <- metric_tweak("f0.5_meas", f_meas, beta = 0.5)
f1 <- f_meas
f2 <- metric_tweak("f2_meas", f_meas, beta = 2)
# All in one set
f_metrics <- metric_set(f0.5, f1, f2)
Troubleshooting
Error: Cannot mix metric types
# Error
metric_set(rmse, accuracy)
Solution: Keep metrics of compatible types together.
Error: estimate not found
# Wrong
class_metrics(data, truth, pred)
Solution: Name the estimate argument:
class_metrics(data, truth, estimate = pred)
Error: Metric doesn’t work in set
my_metric <- function(data, truth, estimate) { ... }
metric_set(rmse, my_metric) # Error
Solution: Wrap custom metrics:
my_metric <- new_numeric_metric(my_metric, direction = "minimize")
metric_set(rmse, my_metric) # Works
Best Practices
- Define once, use everywhere: Create metric sets at the top of your analysis
- Name your sets: Use descriptive names like classification_metrics, not metrics
- Use with groups: Leverage group-aware behavior for cross-validation
- Tweak before combining: Set custom parameters with metric_tweak() first
- Keep compatible types: Don’t mix numeric, class, and survival metrics
- Named estimate: Always name the estimate argument for class/survival metric sets
- Integration: Use with the tune package for consistent tuning metrics
Common Metric Sets
# Standard regression
regression_std <- metric_set(rmse, mae, rsq)
# Regression with alternatives
regression_robust <- metric_set(mae, huber_loss, mape)
# Binary classification
binary_clf <- metric_set(
accuracy, sensitivity, specificity,
roc_auc, pr_auc
)
# Multiclass classification
multiclass_clf <- metric_set(
accuracy, bal_accuracy, kap,
roc_auc # Uses hand_till method for multiclass
)
# Survival analysis
survival_std <- metric_set(
concordance_survival,
brier_survival,
brier_survival_integrated
)
# Fairness analysis
fairness_set <- metric_set(
accuracy,
demographic_parity(group), # group = column holding the sensitive attribute
equal_opportunity(group)
)
See Also
Metric System - Understanding basic metric architecture
Groupwise Metrics - Creating disparity metrics
metric_tweak() - Customizing metric parameters
?metric_set - Full documentation