Source Development Guide: Contributing to Yardstick

Complete guide for contributing new metrics to the yardstick package itself.


When to Use This Guide

Use this guide if you are:

  • Contributing a PR directly to the yardstick package

  • Working inside the yardstick repository

  • Adding metrics that should be part of yardstick core

  • Modifying existing yardstick metrics

Don’t use this guide if you are:

  • Creating custom metrics in your own package or analysis code, outside the yardstick repository

Prerequisites

Clone the Yardstick Repository

# Clone from GitHub
git clone https://github.com/tidymodels/yardstick.git
cd yardstick

# Create a feature branch
git checkout -b feature/add-metric-name

See Repository Access for more details.

Install Development Dependencies

# Install yardstick's development dependencies (Imports, Suggests, etc.)
devtools::install_dev_deps()

# Load the package for development
devtools::load_all()

Understanding Yardstick’s Architecture

Package Organization

yardstick/
├── R/
│   ├── num-*.R          # Numeric metrics
│   ├── class-*.R        # Classification metrics
│   ├── prob-*.R         # Probability metrics
│   ├── surv-*.R         # Survival metrics
│   ├── aaa-*.R          # Core infrastructure
│   └── utils-*.R        # Internal utilities
├── tests/testthat/
│   ├── test-num-*.R
│   ├── test-class-*.R
│   └── _snaps/          # Snapshot test outputs
└── man-roxygen/         # Documentation templates

File Naming Conventions

Source files must follow a strict naming scheme:

  • Numeric: R/num-[name].R (e.g. R/num-mae.R)

  • Class: R/class-[name].R (e.g. R/class-accuracy.R)

  • Probability: R/prob-[name].R (e.g. R/prob-roc_auc.R)

  • Survival: R/surv-[name].R (e.g. R/surv-concordance_survival.R)

Test files must match:

  • R/num-mae.R → tests/testthat/test-num-mae.R
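The mapping from source file to test file is mechanical, as this base-R sketch shows. test_path_for() is a hypothetical helper written only for illustration (it is not part of yardstick, devtools, or usethis); it simply encodes the convention above.

```r
# Hypothetical helper (not part of yardstick): map a metric source file to the
# test file that the naming convention expects.
test_path_for <- function(src) {
  # Only the four metric prefixes described above are accepted
  if (!grepl("^R/(num|class|prob|surv)-[^/]+\\.R$", src)) {
    stop("Unexpected source file name: ", src, call. = FALSE)
  }
  file.path("tests", "testthat", paste0("test-", basename(src)))
}

test_path_for("R/num-mae.R")                    # "tests/testthat/test-num-mae.R"
test_path_for("R/surv-concordance_survival.R")  # "tests/testthat/test-surv-concordance_survival.R"
```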

Working with Internal Functions

✅ You MUST Use Internal Helpers

When developing yardstick itself, always use existing internal functions - don’t reimplement what already exists:

# ✅ CORRECT - Use internal helper
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick_mean(errors, case_weights = case_weights)
}

# ❌ WRONG - Reimplementing existing functionality
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  if (is.null(case_weights)) {
    mean(errors)
  } else {
    weighted.mean(errors, w = as.double(case_weights))
  }
}

Why this matters: Internal helpers ensure consistency across all yardstick metrics and handle edge cases correctly. Reviewers will request changes if you reimplement existing functionality.

Common Internal Helpers

yardstick_mean() - Weighted Mean

Handles case weights consistently. Simplified, its logic looks like this:

yardstick_mean <- function(x, case_weights = NULL) {
  if (is.null(case_weights)) {
    mean(x)
  } else {
    if (inherits(case_weights, c("hardhat_importance_weights",
                                 "hardhat_frequency_weights"))) {
      case_weights <- as.double(case_weights)
    }
    weighted.mean(x, w = case_weights)
  }
}
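To see what the weighted branch computes, the base-R check below (it does not call yardstick) shows that integer frequency weights give the same result as repeating rows, and that equal weights reproduce the unweighted mean.

```r
# Absolute errors for three observations: 0.5, 0.0, 1.0
errors <- abs(c(1, 2, 3) - c(1.5, 2, 2))

unweighted <- mean(errors)                                  # 0.5
equal_w    <- stats::weighted.mean(errors, w = c(3, 3, 3))  # 0.5, same as unweighted
weighted   <- stats::weighted.mean(errors, w = c(1, 1, 2))  # 0.625, third row counts twice
replicated <- mean(c(errors, errors[3]))                    # 0.625, explicit repetition

c(unweighted = unweighted, equal_w = equal_w,
  weighted = weighted, replicated = replicated)
```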

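finalize_estimator() - Estimator Selection

For multiclass metrics, resolve the estimator inside the _vec() function, where truth is available as a factor:

accuracy_vec <- function(truth, estimate, estimator = NULL, na_rm = TRUE,
                         case_weights = NULL, ...) {
  # Validates a user-supplied estimator, or picks a default from the levels
  # of truth; metric_class controls which finalize_estimator_internal()
  # method is dispatched to
  estimator <- finalize_estimator(truth, estimator, metric_class = "accuracy")

  # Rest of implementation
}

finalize_estimator_internal() is the S3 generic behind finalize_estimator(); extend it only if your metric does not support exactly the standard binary, macro, micro, and macro_weighted estimators.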
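The selection logic for the common case can be sketched in base R. pick_estimator() is a hypothetical stand-in, not yardstick code; it assumes the usual default of keeping a valid user choice and otherwise choosing "binary" for two-level factors and "macro" for more levels. Individual metrics may define different defaults.

```r
# Hypothetical stand-in for estimator selection (not yardstick's code).
# Assumptions: NULL means "choose a default"; a supplied value is validated
# against a fixed set of allowed estimators.
pick_estimator <- function(truth, estimator = NULL,
                           allowed = c("binary", "macro", "macro_weighted", "micro")) {
  stopifnot(is.factor(truth))

  if (!is.null(estimator)) {
    if (!(estimator %in% allowed)) {
      stop("`estimator` must be one of: ", paste(allowed, collapse = ", "),
           call. = FALSE)
    }
    return(estimator)
  }

  # Default: two levels -> binary, more levels -> macro averaging
  if (nlevels(truth) > 2) "macro" else "binary"
}

two   <- factor(c("yes", "no", "yes"))
three <- factor(c("a", "b", "c", "a"))

pick_estimator(two)             # "binary"
pick_estimator(three)           # "macro"
pick_estimator(three, "micro")  # "micro"
```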

Validation Functions

# These provide consistent error messages
check_numeric_metric(truth, estimate, case_weights)
check_class_metric(truth, estimate, case_weights, estimator)
check_prob_metric(truth, estimate, case_weights, estimator)
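As an illustration of the kind of checks these helpers centralize, here is a hypothetical base-R validator for numeric inputs. It is not yardstick's implementation, and the real functions check more and word their errors differently; the point is that every metric should fail the same way on the same bad input.

```r
# Hypothetical sketch of numeric-metric input checks (not yardstick's code)
check_numeric_inputs <- function(truth, estimate, case_weights = NULL) {
  if (!is.numeric(truth)) stop("`truth` must be numeric.", call. = FALSE)
  if (!is.numeric(estimate)) stop("`estimate` must be numeric.", call. = FALSE)
  if (length(truth) != length(estimate)) {
    stop("`truth` and `estimate` must have the same length.", call. = FALSE)
  }
  if (!is.null(case_weights) && length(case_weights) != length(truth)) {
    stop("`case_weights` must have the same length as `truth`.", call. = FALSE)
  }
  invisible(TRUE)
}

# A character estimate fails with one predictable message
tryCatch(check_numeric_inputs(1:3, letters[1:3]), error = conditionMessage)
```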

Finding Internal Functions

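# List every object in the namespace, including unexported helpers
# (for an installed package, ls("package:yardstick") shows only exports)
ls(asNamespace("yardstick"), all.names = TRUE)

# Search in source (shell command, run from the repository root)
# grep -r "yardstick_" R/

# View the source of an internal function
yardstick:::yardstick_mean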
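The difference between exported objects and everything in a namespace is easy to check. The sketch below uses base R's stats package so it runs without yardstick installed; during development, substitute "yardstick" and a "^yardstick_" pattern.

```r
# Exported objects vs. everything defined in a namespace
pkg <- "stats"

exported   <- getNamespaceExports(pkg)
everything <- ls(asNamespace(pkg), all.names = TRUE)
internal   <- setdiff(everything, exported)

# Filter by a name pattern, as you would with "^yardstick_"
sort(grep("^weighted", everything, value = TRUE))
length(internal)
```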

See Best Practices (Source) for the complete guide to internal functions.

❌ NO Package Prefix in Source Development

CRITICAL: When developing yardstick itself, never use yardstick:: prefix:

# ✅ CORRECT - No prefix (same package)
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick_mean(errors, case_weights = case_weights)
}

mae.data.frame <- function(data, truth, estimate, na_rm = TRUE,
                           case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

# ❌ WRONG - Don't prefix your own package
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick::yardstick_mean(errors, case_weights = case_weights)
}

mae.data.frame <- function(data, truth, estimate, na_rm = TRUE,
                           case_weights = NULL, ...) {
  yardstick::numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

Why: You’re developing the package itself - these functions are in the same namespace and don’t need prefixing.


Step-by-Step Implementation

Step 1: Choose Your Metric Type

Determine which category your metric falls into:

  • Numeric (regression)

  • Class (classification with classes)

  • Probability (classification with probabilities)

  • Survival (time-to-event)

See the main SKILL.md for the complete decision tree.

Step 2: Create Source File

Create R/num-[name].R (or class-, prob-, etc.):

# R/num-mae.R

#' Mean Absolute Error
#'
#' @family numeric metrics
#' @family accuracy metrics
#' @templateVar fn mae
#' @template return
#'
#' @inheritParams rmse
#'
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}

mae <- new_numeric_metric(mae, direction = "minimize")

#' @export
#' @rdname mae
mae.data.frame <- function(data, truth, estimate, na_rm = TRUE,
                           case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

#' @export
#' @rdname mae
mae_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  check_numeric_metric(truth, estimate, case_weights)

  if (na_rm) {
    result <- yardstick_remove_missing(truth, estimate, case_weights)
    truth <- result$truth
    estimate <- result$estimate
    case_weights <- result$case_weights
  } else if (yardstick_any_missing(truth, estimate, case_weights)) {
    return(NA_real_)
  }

  mae_impl(truth, estimate, case_weights)
}

# Internal implementation
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)

  # Use internal helper
  yardstick_mean(errors, case_weights = case_weights)
}
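To see how the layers hand off to each other without any yardstick machinery, here is a self-contained base-R toy. The toy_mae* names are hypothetical; column names are passed as strings instead of tidy-eval, and NA removal is done inline where yardstick uses yardstick_remove_missing(). Only the generic → .data.frame() → _vec() → _impl() flow mirrors the real file.

```r
# Hypothetical toy version of the mae layering, base R only.

# Core calculation: no validation, no NA handling
toy_mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  if (is.null(case_weights)) {
    mean(errors)
  } else {
    stats::weighted.mean(errors, w = as.double(case_weights))
  }
}

# Vector interface: validate, then handle missing values
toy_mae_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL) {
  stopifnot(is.numeric(truth), is.numeric(estimate),
            length(truth) == length(estimate))
  missing <- is.na(truth) | is.na(estimate)
  if (!is.null(case_weights)) missing <- missing | is.na(case_weights)

  if (any(missing)) {
    if (!na_rm) return(NA_real_)
    keep <- !missing
    truth <- truth[keep]
    estimate <- estimate[keep]
    if (!is.null(case_weights)) case_weights <- case_weights[keep]
  }
  toy_mae_impl(truth, estimate, case_weights)
}

# Generic plus data frame method returning a one-row result shaped like
# yardstick's output
toy_mae <- function(data, ...) UseMethod("toy_mae")

toy_mae.data.frame <- function(data, truth, estimate, na_rm = TRUE, ...) {
  value <- toy_mae_vec(data[[truth]], data[[estimate]], na_rm = na_rm)
  data.frame(.metric = "toy_mae", .estimator = "standard", .estimate = value)
}

df <- data.frame(obs = c(1, 2, 3, NA), pred = c(1.5, 2.5, 2.5, 4))
toy_mae(df, "obs", "pred")                 # .estimate 0.5 (NA row dropped)
toy_mae(df, "obs", "pred", na_rm = FALSE)  # .estimate NA
```

Keeping _impl() free of validation and NA handling is what lets other code reuse the core calculation on already-cleaned inputs.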

Step 3: Document with Templates and Examples

REQUIRED: All exported functions MUST include @examples:

#' Mean Absolute Error
#'
#' @family numeric metrics
#' @family accuracy metrics
#' @templateVar fn mae
#' @template return
#'
#' @inheritParams rmse
#'
#' @examples
#' # Basic usage
#' mae(solubility_test, solubility, prediction)
#'
#' # With case weights
#' library(dplyr)
#' solubility_test %>%
#'   mutate(weight = 1:nrow(.)) %>%
#'   mae(solubility, prediction, case_weights = weight)
#'
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}

Yardstick templates:

#' @templateVar fn mae
#' @template return
#' @template event_first

Templates are defined in the man-roxygen/ directory and handle common documentation patterns.

Why @examples matters: Examples are critical for usability - they show users how to actually use the metric. Reviewers will request examples if missing.

Step 4: Create Test File

Create tests/testthat/test-num-mae.R:

test_that("mae works correctly", {
  # solubility_test is a numeric dataset exported by yardstick
  df <- solubility_test

  result <- mae(df, solubility, prediction)

  # Use snapshot testing
  expect_snapshot(result)
})

test_that("mae works with numeric vectors", {
  truth <- c(1, 2, 3, 4, 5)
  estimate <- c(1.5, 2.5, 2.5, 3.5, 4.5)

  expect_equal(mae_vec(truth, estimate), 0.5)
})

test_that("mae handles NA correctly", {
  df <- solubility_test
  df$solubility[1:10] <- NA

  result_remove <- mae(df, solubility, prediction, na_rm = TRUE)
  expect_false(is.na(result_remove$.estimate))

  result_keep <- mae(df, solubility, prediction, na_rm = FALSE)
  expect_true(is.na(result_keep$.estimate))
})

test_that("mae validates input types", {
  df <- data.frame(
    truth = 1:5,
    estimate = letters[1:5]
  )

  expect_snapshot(error = TRUE, {
    mae(df, truth, estimate)
  })
})

test_that("mae works with case weights", {
  df <- solubility_test
  df$weights <- seq_len(nrow(df))

  result_weighted <- mae(df, solubility, prediction, case_weights = weights)

  # Compare against a direct weighted-mean calculation
  expect_equal(
    result_weighted$.estimate,
    weighted.mean(abs(df$solubility - df$prediction), w = df$weights)
  )
})

See Testing Patterns (Source) for comprehensive testing guide.

Step 5: Run Tests and Check

# Document
devtools::document()

# Load
devtools::load_all()

# Test
devtools::test()

# Full check
devtools::check()

File Creation Guidelines for PRs

═══════════════════════════════════════════════════════ ⚠️⚠️⚠️ CRITICAL: PR FILE DISCIPLINE ⚠️⚠️⚠️ ═══════════════════════════════════════════════════════

🛑 STOP! STOP! STOP! 🛑

Before you create even ONE file, read this ENTIRE section.

You will create EXACTLY 2 files. That’s it. Two. Not 3. Not 4. Not 5. TWO FILES ONLY.

═══════════════════════════════════════════════════════

MANDATORY Pre-Flight Checklist

READ EACH LINE. CHECK EACH BOX. DO NOT SKIP THIS.

Before creating ANY files:

═══════════════════════════════════════════════════════

The ONLY Files You Will Create

  1. R/[type]-[metric_name].R - Metric function with complete roxygen documentation

    • Example: R/num-mae.R, R/class-accuracy.R, R/prob-roc_auc.R

    • Contains the generic plus the .data.frame() method, _vec(), and _impl()

    • Complete roxygen documentation with @examples

  2. tests/testthat/test-[type]-[metric_name].R - Comprehensive test suite

    • Example: tests/testthat/test-num-mae.R

    • Or add tests to existing file like tests/testthat/test-class-metrics.R if appropriate

That’s it. Two files. Nothing else.

═══════════════════════════════════════════════════════

Files You Will ABSOLUTELY NOT Create

🛑 INSTRUCTIONS FOR CLAUDE: STOP IMMEDIATELY IF YOU ARE ABOUT TO CREATE ANY FILE NOT LISTED IN “THE ONLY FILES YOU WILL CREATE” SECTION ABOVE. 🛑

❌ NEVER CREATE:

  • ❌ README.md (yardstick already has one)

  • ❌ README.txt (yardstick already has a README)

  • ❌ NEWS_entry.md (maintainer adds to NEWS.md)

  • ❌ IMPLEMENTATION_SUMMARY.md

  • ❌ IMPLEMENTATION_NOTES.md

  • ❌ IMPLEMENTATION_NOTES.txt

  • ❌ QUICKSTART.md

  • ❌ QUICK_REFERENCE.md

  • ❌ example_usage.R

  • ❌ USAGE_EXAMPLE.R

  • ❌ metric_examples.R

  • ❌ test_examples.R

  • ❌ METRIC_DESIGN.md

  • ❌ VALIDATION_APPROACH.md

  • ❌ PR_CHECKLIST.md

  • ❌ PR_DESCRIPTION.md

  • ❌ PR_SUMMARY.md

  • ❌ INDEX.md

  • ❌ FILE_GUIDE.md

  • ❌ MANIFEST.md

  • ❌ INTEGRATION_GUIDE.md

  • ❌ SUMMARY.md

  • ❌ SUMMARY.txt

  • ❌ OVERVIEW.md

  • ❌ verification_script.R

  • ❌ check_metric.R

  • ❌ pkgdown_update.txt

  • ❌ pkgdown_addition.yml

  • ❌ WORKFLOW_COMMANDS.sh

  • ❌ setup.sh

  • ❌ ANY other .md, .txt, .yml, .sh, or helper files

═══════════════════════════════════════════════════════

Where Content Actually Goes

CRITICAL: Everything has a place. No separate files.

Content | ❌ WRONG | ✅ CORRECT
--- | --- | ---
Examples | example_usage.R | roxygen @examples in R file
Metric design rationale | METRIC_DESIGN.md | roxygen @details in R file
Implementation notes | IMPLEMENTATION_NOTES.txt | roxygen @details in R file
Usage instructions | QUICKSTART.md | roxygen @examples in R file
PR description | PR_DESCRIPTION.md | Conversation with user
NEWS entry | NEWS_entry.md | Mention in conversation
Test examples | test_examples.R | Tests in test file
Validation approach | VALIDATION_APPROACH.md | Comments in test file

═══════════════════════════════════════════════════════

FINAL CHECK Before Creating Files

🛑 STOP RIGHT HERE. ANSWER THESE QUESTIONS: 🛑

  1. Am I about to create exactly 2 files? (YES)
  2. Is one file R/[type]-[metric_name].R and the other tests/testthat/test-[type]-[metric_name].R? (YES)
  3. Am I about to create ANY .md, .txt, .yml, or .sh files? (NO)
  4. Have I put all examples in roxygen @examples? (YES)
  5. Have I put all notes in roxygen @details? (YES)

If you answered incorrectly to ANY question above, STOP. Re-read this section.

═══════════════════════════════════════════════════════

Why This Matters

PRs to yardstick should contain ONLY code and tests. Period.

Extra documentation files:

  • Clutter the repository

  • Duplicate roxygen documentation

  • Create maintenance burden

  • Slow down PR review

  • Get deleted by maintainers anyway

The yardstick maintainers have explicitly requested: CODE AND TESTS ONLY.

When you submit a PR:

  • Code goes in R/ with roxygen docs

  • Tests go in tests/testthat/

  • Everything else (PR description, NEWS entry, examples) you discuss with the user in conversation

No exceptions. No “helpful” documentation files. Just code and tests.

═══════════════════════════════════════════════════════


Documentation Patterns

Using @template

#' @templateVar fn mae
#' @template return

Available templates (in man-roxygen/):

  • @template return - Standard return value

  • @template event_first - Event level for class metrics

  • @template multiclass - Multiclass documentation

Using @templateVar

Define variables before templates:

#' @templateVar fn mae
#' @templateVar metric_fn mae

Inheriting Parameters

Use @inheritParams extensively:

#' @inheritParams rmse

This copies the rmse documentation for every parameter you have not documented yourself.


Multiclass Metrics

Supporting Multiple Estimators

For class metrics:

accuracy.data.frame <- function(data, truth, estimate,
                                estimator = NULL, na_rm = TRUE,
                                case_weights = NULL, ...) {
  # The estimator is passed through as-is; accuracy_vec() finalizes it
  class_metric_summarizer(
    name = "accuracy",
    fn = accuracy_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    estimator = estimator,
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

Implement Binary and Estimator Variants

accuracy_vec <- function(truth, estimate, estimator = NULL,
                         na_rm = TRUE, case_weights = NULL, ...) {
  # Finalize estimator: validate the user's choice or pick a default
  estimator <- finalize_estimator(truth, estimator, metric_class = "accuracy")

  check_class_metric(truth, estimate, case_weights, estimator)

  # ... NA handling ...

  if (is_binary(estimator)) {
    accuracy_binary(truth, estimate, case_weights)
  } else {
    accuracy_estimator_impl(truth, estimate, estimator, case_weights)
  }
}
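The estimator variants differ only in how per-class results are combined. The base-R sketch below computes per-class precision from a confusion table and combines it three ways. It illustrates the standard definitions of macro, micro, and macro-weighted averaging (weighting by how often each class occurs in truth) and is not yardstick's internal code.

```r
truth    <- factor(c("a", "a", "b", "b", "c", "c", "c"), levels = c("a", "b", "c"))
estimate <- factor(c("a", "b", "b", "b", "c", "a", "c"), levels = c("a", "b", "c"))

# Rows = predicted class, columns = true class
tab <- table(estimate = estimate, truth = truth)

tp <- diag(tab)                        # correct predictions per class
predicted <- rowSums(tab)              # how often each class was predicted
precision_by_class <- tp / predicted   # a = 1/2, b = 2/3, c = 2/2

# Macro: unweighted mean of the per-class values
macro <- mean(precision_by_class)      # (1/2 + 2/3 + 1) / 3

# Micro: pool the counts first, then compute one ratio
micro <- sum(tp) / sum(predicted)      # 5/7

# Macro-weighted: weight each class by its frequency in truth
class_freq <- colSums(tab) / sum(tab)  # 2/7, 2/7, 3/7
macro_weighted <- sum(precision_by_class * class_freq)  # 16/21

c(macro = macro, micro = micro, macro_weighted = macro_weighted)
```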

Using Internal Test Data

Available Test Helpers

# Binary classification
data <- data_altman()

# Three-class data
data <- data_three_class()

# Cross-validation data
data <- data_hpc_cv1()

See Testing Patterns (Source) for complete list.


Snapshot Testing

Yardstick uses snapshots extensively:

test_that("mae returns correct structure", {
  df <- solubility_test
  result <- mae(df, solubility, prediction)

  # Snapshot entire result
  expect_snapshot(result)
})

test_that("mae errors appropriately", {
  expect_snapshot(error = TRUE, {
    mae_vec(1:5, letters[1:5])
  })
})

Reviewing Snapshots

# Review snapshot changes
testthat::snapshot_review()

# Accept changes
testthat::snapshot_accept()

Consistency with Existing Metrics

Study Similar Metrics

Before implementing:

  • For numeric: R/num-mae.R, R/num-rmse.R

  • For class: R/class-accuracy.R, R/class-precision.R

  • For probability: R/prob-roc_auc.R

Match Function Structure

# 1. Generic (exported)
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}

# 2. Wrap with new_*_metric
mae <- new_numeric_metric(mae, direction = "minimize")

# 3. Data frame method (exported)
#' @export
#' @rdname mae
mae.data.frame <- function(...) {
  numeric_metric_summarizer(...)
}

# 4. Vector function (exported, documented on the mae page)
#' @export
#' @rdname mae
mae_vec <- function(...) {
  # ... validation and NA handling ...
  mae_impl(...)
}

# 5. Implementation (NOT exported - internal)
mae_impl <- function(truth, estimate, case_weights = NULL) {
  # Core calculation
}

Creating New Internal Helpers

When to Create

Create internal helpers when:

  • Logic is shared by 2+ metrics

  • Complex calculation used repeatedly

  • Abstraction improves clarity

Naming and Documentation

#' Calculate weighted mean with case weight handling
#'
#' @param x Numeric vector
#' @param case_weights Optional case weights
#'
#' @return Numeric scalar
#' @keywords internal
#' @noRd
yardstick_mean <- function(x, case_weights = NULL) {
  # Implementation
}

Use:

  • @noRd to skip .Rd generation, so no help page is created for the helper

  • @keywords internal to signal intent (it has no effect once @noRd is present, but is harmless)

  • Don’t use @export


Error Messages

Use cli for consistent errors:

if (invalid) {
  cli::cli_abort(
    "{.arg estimator} must be {.val binary}, {.val macro}, or {.val micro}, not {.val {estimator}}.",
    call = call
  )
}

Always pass the call argument so the error message names the user-facing function rather than an internal helper.


PR Submission

Before Submitting

  1. Run full check:

    devtools::check()
  2. Update NEWS.md:

    ## yardstick (development version)
    
    * Added `mae()` metric for mean absolute error (#123).
  3. Commit changes:

    git add .
    git commit -m "Add mae() metric"
    git push origin feature/add-metric-name

Creating the PR

  1. Go to https://github.com/tidymodels/yardstick

  2. Click “New pull request”

  3. Select your branch

  4. Fill in description:

    • What metric does this add?

    • Why is it useful?

    • Reference any related issues

Review Process

The tidymodels team will review your PR. Common feedback:

  • Add more tests

  • Match existing documentation style

  • Use internal helpers

  • Add examples

  • Fix code style issues

See Troubleshooting (Source) for common review feedback.


Reference Documentation

Source Development

Metric Types

Core Concepts

Shared References


Next Steps

  1. Clone yardstick repository
  2. Create feature branch
  3. Implement your metric following this guide
  4. Test thoroughly using internal test data
  5. Run devtools::check()
  6. Submit PR to tidymodels/yardstick

Getting Help