Source Development Guide: Contributing to Yardstick

Complete guide for contributing new metrics to the yardstick package itself.

When to Use This Guide

✅ Use this guide if you are:

Contributing a PR directly to the yardstick package
Working inside the yardstick repository
Adding metrics that should be part of yardstick core
Modifying existing yardstick metrics

❌ Don’t use this guide if you are:

Creating a new package that extends yardstick → Use Extension Development Guide
Building standalone metrics → Use Extension Development Guide

Prerequisites

Clone the Yardstick Repository

# Clone from GitHub
git clone https://github.com/tidymodels/yardstick.git
cd yardstick

# Create a feature branch
git checkout -b feature/add-metric-name

See Repository Access for more details.

Install Development Dependencies

# Install the dependencies needed to develop yardstick
devtools::install_dev_deps()

# Load the package for development
devtools::load_all()

Understanding Yardstick’s Architecture

Package Organization

yardstick/
├── R/
│   ├── num-*.R          # Numeric metrics
│   ├── class-*.R        # Classification metrics
│   ├── prob-*.R         # Probability metrics
│   ├── surv-*.R         # Survival metrics
│   ├── aaa-*.R          # Core infrastructure
│   └── utils-*.R        # Internal utilities
├── tests/testthat/
│   ├── test-num-*.R
│   ├── test-class-*.R
│   └── _snaps/          # Snapshot test outputs
└── man-roxygen/         # Documentation templates

File Naming Conventions

Source files must follow strict naming:

Numeric: R/num-[name].R → R/num-mae.R
Class: R/class-[name].R → R/class-accuracy.R
Probability: R/prob-[name].R → R/prob-roc_auc.R
Survival: R/surv-[name].R → R/surv-concordance_survival.R

Test files must match:

R/num-mae.R → tests/testthat/test-num-mae.R
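
The mapping from source file to test file is mechanical, which a small helper can make explicit. The sketch below uses only base R; `test_file_for()` is a hypothetical name chosen for illustration and is not part of yardstick.

```r
# Hypothetical helper (not part of yardstick): derive the expected test
# file path from a source file path, following the convention above.
test_file_for <- function(source_path) {
  file.path("tests", "testthat", paste0("test-", basename(source_path)))
}

test_file_for("R/num-mae.R")
test_file_for("R/class-accuracy.R")
```

If the path this returns does not exist for a new metric, the test file still needs to be created.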

Working with Internal Functions

✅ You MUST Use Internal Helpers

When developing yardstick itself, always use existing internal functions - don’t reimplement what already exists:

# ✅ CORRECT - Use internal helper
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick_mean(errors, case_weights = case_weights)
}

# ❌ WRONG - Reimplementing existing functionality
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  if (is.null(case_weights)) {
    mean(errors)
  } else {
    weighted.mean(errors, w = as.double(case_weights))
  }
}

Why this matters: Internal helpers ensure consistency across all yardstick metrics and handle edge cases correctly. Reviewers will request changes if you reimplement existing functionality.

Common Internal Helpers

`yardstick_mean()` - Weighted Mean

Handles case weights consistently. A simplified view of its logic:

yardstick_mean <- function(x, case_weights = NULL) {
  if (is.null(case_weights)) {
    mean(x)
  } else {
    if (inherits(case_weights, c("hardhat_importance_weights",
                                 "hardhat_frequency_weights"))) {
      case_weights <- as.double(case_weights)
    }
    weighted.mean(x, w = case_weights)
  }
}
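
To see what the helper changes in practice, the weighted and unweighted results can be worked out by hand. The sketch below re-declares the simplified logic under the hypothetical name `mean_sketch()` so that it runs outside the package; the input numbers are made up for illustration.

```r
# Stand-in for the simplified helper above, so the example runs in base R.
# `mean_sketch` is a hypothetical name, not a yardstick function.
mean_sketch <- function(x, case_weights = NULL) {
  if (is.null(case_weights)) {
    mean(x)
  } else {
    weighted.mean(x, w = as.double(case_weights))
  }
}

truth    <- c(1, 2, 3, 4)
estimate <- c(1.5, 2, 2, 4)
errors   <- abs(truth - estimate)   # 0.5, 0, 1, 0

# Unweighted: (0.5 + 0 + 1 + 0) / 4
unweighted <- mean_sketch(errors)

# Weighted: (0.5 * 1 + 0 * 1 + 1 * 2 + 0 * 4) / (1 + 1 + 2 + 4)
weighted <- mean_sketch(errors, case_weights = c(1, 1, 2, 4))
```

Because every metric routes through the same helper, this weighting behavior stays identical across metrics.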

`finalize_estimator_internal()` - Estimator Selection

For multiclass metrics:

accuracy.data.frame <- function(data, truth, estimate,
                                estimator = NULL, ...,
                                call = rlang::caller_env()) {
  estimator <- finalize_estimator_internal(
    estimator,
    metric_class = "accuracy",
    call = call
  )

  # Rest of implementation
}

Validation Functions

# These provide consistent error messages
check_numeric_metric(truth, estimate, case_weights)
check_class_metric(truth, estimate, case_weights, estimator)
check_prob_metric(truth, estimate, case_weights, estimator)

Finding Internal Functions

# List everything in the namespace, including internal (unexported) functions
ls(asNamespace("yardstick"), all.names = TRUE)

# Search in source
# grep -r "yardstick_" R/

# View source
yardstick:::yardstick_mean
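
Exported and internal functions can be told apart by comparing the namespace contents with its exports. The sketch below uses only base R and is demonstrated on the always-available `stats` package so that it runs without yardstick installed; substituting `"yardstick"` is the assumed use. `internal_functions()` is a hypothetical helper name.

```r
# Hypothetical helper: names of functions that are defined in a package
# namespace but are not exported from it.
internal_functions <- function(pkg) {
  ns <- asNamespace(pkg)
  all_names <- ls(ns, all.names = TRUE)
  is_fn <- vapply(
    all_names,
    function(nm) is.function(get(nm, envir = ns, inherits = FALSE)),
    logical(1)
  )
  setdiff(all_names[is_fn], getNamespaceExports(pkg))
}

# Demonstrated on `stats`; use "yardstick" when working in the repository.
stats_internal <- internal_functions("stats")
head(stats_internal)
```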

See Best Practices (Source) for complete guide to internal functions.

❌ NO Package Prefix in Source Development

CRITICAL: When developing yardstick itself, never use the yardstick:: prefix:

# ✅ CORRECT - No prefix (same package)
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick_mean(errors, case_weights = case_weights)
}

mae.data.frame <- function(data, truth, estimate, ...) {
  numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    ...
  )
}

# ❌ WRONG - Don't prefix your own package
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick::yardstick_mean(errors, case_weights = case_weights)
}

mae.data.frame <- function(data, truth, estimate, ...) {
  yardstick::numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    ...
  )
}

Why: You’re developing the package itself - these functions are in the same namespace and don’t need prefixing.

Step-by-Step Implementation

Step 1: Choose Your Metric Type

Determine which category your metric falls into:

Numeric (regression)
Class (classification with classes)
Probability (classification with probabilities)
Survival (time-to-event)

See the main SKILL.md for the complete decision tree.

Step 2: Create Source File

Create R/num-[name].R (or class-, prob-, etc.):

# R/num-mae.R

#' Mean Absolute Error
#'
#' @family numeric metrics
#' @family accuracy metrics
#' @templateVar fn mae
#' @template return
#'
#' @inheritParams rmse
#'
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}

mae <- new_numeric_metric(mae, direction = "minimize")

#' @export
#' @rdname mae
mae.data.frame <- function(data, truth, estimate, na_rm = TRUE,
                           case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

#' @export
#' @rdname mae
mae_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  check_numeric_metric(truth, estimate, case_weights)

  if (na_rm) {
    result <- yardstick_remove_missing(truth, estimate, case_weights)
    truth <- result$truth
    estimate <- result$estimate
    case_weights <- result$case_weights
  } else if (yardstick_any_missing(truth, estimate, case_weights)) {
    return(NA_real_)
  }

  mae_impl(truth, estimate, case_weights)
}

# Internal implementation
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)

  # Use internal helper
  yardstick_mean(errors, case_weights = case_weights)
}
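
The missing-value branch in `mae_vec()` is the part that is easiest to get wrong, so here is a base R sketch of the same control flow. It is not yardstick's code: `complete.cases()` stands in for `yardstick_remove_missing()` and `yardstick_any_missing()`, case weights are left out for brevity, and `mae_vec_sketch()` is a hypothetical name.

```r
# Hypothetical base R approximation of the na_rm handling shown above.
mae_vec_sketch <- function(truth, estimate, na_rm = TRUE) {
  keep <- complete.cases(truth, estimate)

  if (na_rm) {
    # Drop every position where either vector is missing
    truth <- truth[keep]
    estimate <- estimate[keep]
  } else if (!all(keep)) {
    # Missing values present and not removed: the metric is undefined
    return(NA_real_)
  }

  mean(abs(truth - estimate))
}

truth    <- c(1, 2, NA, 4)
estimate <- c(1.5, 2, 3, 3)

removed <- mae_vec_sketch(truth, estimate)                  # mean of 0.5, 0, 1
kept    <- mae_vec_sketch(truth, estimate, na_rm = FALSE)   # NA
```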

Step 3: Document with Templates and Examples

REQUIRED: All exported functions MUST include @examples:

#' Mean Absolute Error
#'
#' @family numeric metrics
#' @family accuracy metrics
#' @templateVar fn mae
#' @template return
#'
#' @inheritParams rmse
#'
#' @examples
#' # Basic usage
#' mae(solubility_test, solubility, prediction)
#'
#' # With case weights
#' library(dplyr)
#' solubility_test %>%
#'   mutate(weight = 1:nrow(.)) %>%
#'   mae(solubility, prediction, case_weights = weight)
#'
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}

Yardstick templates:

#' @templateVar fn mae
#' @template return

Templates are defined in the man-roxygen/ directory and handle common documentation patterns. Class metrics additionally use @template event_first; it does not apply to numeric metrics such as mae().

Why @examples matters: Examples are critical for usability - they show users how to actually use the metric. Reviewers will request examples if missing.

Step 4: Create Test File

Create tests/testthat/test-num-mae.R:

test_that("mae works correctly", {
  # data_altman() holds factor columns, so use the numeric solubility_test data
  df <- solubility_test

  result <- mae(df, solubility, prediction)

  # Use snapshot testing
  expect_snapshot(result)
})

test_that("mae works with numeric vectors", {
  truth <- c(1, 2, 3, 4, 5)
  estimate <- c(1.5, 2.5, 2.5, 3.5, 4.5)

  expect_equal(mae_vec(truth, estimate), 0.5)
})

test_that("mae handles NA correctly", {
  df <- solubility_test
  df$solubility[1:10] <- NA

  result_remove <- mae(df, solubility, prediction, na_rm = TRUE)
  expect_false(is.na(result_remove$.estimate))

  result_keep <- mae(df, solubility, prediction, na_rm = FALSE)
  expect_true(is.na(result_keep$.estimate))
})

test_that("mae validates input types", {
  df <- data.frame(
    truth = 1:5,
    estimate = letters[1:5]
  )

  expect_snapshot(error = TRUE, {
    mae(df, truth, estimate)
  })
})

test_that("mae works with case weights", {
  df <- solubility_test
  df$weights <- seq_len(nrow(df))

  result_unweighted <- mae(df, solubility, prediction)
  result_weighted <- mae(df, solubility, prediction, case_weights = weights)

  expect_false(
    result_unweighted$.estimate == result_weighted$.estimate
  )
})
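
One caveat when writing a case-weights test: if every absolute error is identical, as in the vector test above where each error is 0.5, weighting cannot change the result and an inequality check would fail for the wrong reason. The base R arithmetic below, with made-up numbers, shows the effect and how an exact expected value can be derived by hand.

```r
truth   <- c(1, 2, 3, 4, 5)
weights <- c(1, 2, 3, 4, 5)

# Constant errors: the weighted and unweighted means coincide.
constant_errors <- abs(truth - c(1.5, 2.5, 2.5, 3.5, 4.5))   # all 0.5
constant_plain    <- mean(constant_errors)
constant_weighted <- weighted.mean(constant_errors, w = weights)

# Varying errors: weighting changes the result.
varying_errors <- abs(truth - c(1, 2, 3, 4, 7))              # 0, 0, 0, 0, 2
unweighted <- mean(varying_errors)                           # 2 / 5
weighted   <- weighted.mean(varying_errors, w = weights)     # (2 * 5) / 15
```

Values derived this way can back an `expect_equal()` assertion, which is stricter than only checking that the weighted and unweighted results differ.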

See Testing Patterns (Source) for comprehensive testing guide.

Step 5: Run Tests and Check

# Document
devtools::document()

# Load
devtools::load_all()

# Test
devtools::test()

# Full check
devtools::check()

File Creation Guidelines for PRs

═══════════════════════════════════════════════════════ ⚠️⚠️⚠️ CRITICAL: PR FILE DISCIPLINE ⚠️⚠️⚠️ ═══════════════════════════════════════════════════════

🛑 STOP! STOP! STOP! 🛑

Before you create even ONE file, read this ENTIRE section.

You will create EXACTLY 2 files. That’s it. Two. Not 3. Not 4. Not 5. TWO FILES ONLY.

═══════════════════════════════════════════════════════

The ONLY Files You Will Create

1. R/[type]-[metric_name].R - Metric function with complete roxygen documentation
   - Example: R/num-mae.R, R/class-accuracy.R, R/prob-roc_auc.R
   - Contains all three functions: _impl(), _vec(), .data.frame() method
   - Complete roxygen documentation with @examples
2. tests/testthat/test-[type]-[metric_name].R - Comprehensive test suite
   - Example: tests/testthat/test-num-mae.R
   - Or add tests to existing file like tests/testthat/test-class-metrics.R if appropriate

That’s it. Two files. Nothing else.

═══════════════════════════════════════════════════════

Files You Will ABSOLUTELY NOT Create

🛑 INSTRUCTIONS FOR CLAUDE: STOP IMMEDIATELY IF YOU ARE ABOUT TO CREATE ANY FILE NOT LISTED IN “THE ONLY FILES YOU WILL CREATE” SECTION ABOVE. 🛑

❌ NEVER CREATE:

❌ README.md (yardstick already has one)
❌ README.txt (yardstick already has one)
❌ NEWS_entry.md (maintainer adds to NEWS.md)
❌ IMPLEMENTATION_SUMMARY.md
❌ IMPLEMENTATION_NOTES.md
❌ IMPLEMENTATION_NOTES.txt
❌ QUICKSTART.md
❌ QUICK_REFERENCE.md
❌ example_usage.R
❌ USAGE_EXAMPLE.R
❌ metric_examples.R
❌ test_examples.R
❌ METRIC_DESIGN.md
❌ VALIDATION_APPROACH.md
❌ PR_CHECKLIST.md
❌ PR_DESCRIPTION.md
❌ PR_SUMMARY.md
❌ INDEX.md
❌ FILE_GUIDE.md
❌ MANIFEST.md
❌ INTEGRATION_GUIDE.md
❌ SUMMARY.md
❌ SUMMARY.txt
❌ OVERVIEW.md
❌ verification_script.R
❌ check_metric.R
❌ pkgdown_update.txt
❌ pkgdown_addition.yml
❌ WORKFLOW_COMMANDS.sh
❌ setup.sh
❌ ANY other .md, .txt, .yml, .sh, or helper files

═══════════════════════════════════════════════════════

Where Content Actually Goes

CRITICAL: Everything has a place. No separate files.

Content	❌ WRONG	✅ CORRECT
Examples	example_usage.R	roxygen @examples in R file
Metric design rationale	METRIC_DESIGN.md	roxygen @details in R file
Implementation notes	IMPLEMENTATION_NOTES.txt	roxygen @details in R file
Usage instructions	QUICKSTART.md	roxygen @examples in R file
PR description	PR_DESCRIPTION.md	Conversation with user
NEWS entry	NEWS_entry.md	Mention in conversation
Test examples	test_examples.R	Tests in test file
Validation approach	VALIDATION_APPROACH.md	Comments in test file

═══════════════════════════════════════════════════════

FINAL CHECK Before Creating Files

🛑 STOP RIGHT HERE. ANSWER THESE QUESTIONS: 🛑

Am I about to create exactly 2 files? (YES/NO)
Are both files either R/*.R or tests/testthat/test-*.R? (YES/NO)
Am I about to create ANY .md, .txt, .yml, or .sh files? (NO)
Have I put all examples in roxygen @examples? (YES)
Have I put all notes in roxygen @details? (YES)

If you answered incorrectly to ANY question above, STOP. Re-read this section.
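
The checklist can also be applied mechanically to a list of planned paths before anything is written. The following is a minimal base R sketch meant to be run interactively, not a file to add to the PR; `unexpected_files()` and its patterns are assumptions derived from the naming rules in this guide.

```r
# Hypothetical interactive check: return any path that is neither a metric
# source file nor a testthat test file.
unexpected_files <- function(paths) {
  allowed <- c(
    "^R/(num|class|prob|surv)-[^/]+\\.R$",
    "^tests/testthat/test-[^/]+\\.R$"
  )
  # TRUE for each path that matches at least one allowed pattern
  is_allowed <- Reduce(`|`, lapply(allowed, grepl, x = paths))
  paths[!is_allowed]
}

planned <- c("R/num-mae.R", "tests/testthat/test-num-mae.R", "SUMMARY.md")
unexpected_files(planned)
```

An empty result means every planned file fits the two allowed locations.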

═══════════════════════════════════════════════════════

Why This Matters

PRs to yardstick should contain ONLY code and tests. Period.

Extra documentation files:

Clutter the repository
Duplicate roxygen documentation
Create maintenance burden
Slow down PR review
Get deleted by maintainers anyway

The yardstick maintainers have explicitly requested: CODE AND TESTS ONLY.

When you submit a PR:

Code goes in R/ with roxygen docs
Tests go in tests/testthat/
Everything else (PR description, NEWS entry, examples) you discuss with the user in conversation

No exceptions. No “helpful” documentation files. Just code and tests.

═══════════════════════════════════════════════════════

Documentation Patterns

Using @template

#' @templateVar fn mae
#' @template return

Available templates (in man-roxygen/):

@template return - Standard return value
@template event_first - Event level for class metrics
@template multiclass - Multiclass documentation

Using @templateVar

Define variables before templates:

#' @templateVar fn mae
#' @templateVar metric_fn mae

Inheriting Parameters

Use @inheritParams extensively:

#' @inheritParams rmse

This inherits the documentation for every parameter that the function shares by name with rmse() and does not document itself.

Multiclass Metrics

Supporting Multiple Estimators

For class metrics:

accuracy.data.frame <- function(data, truth, estimate,
                                estimator = NULL, na_rm = TRUE,
                                case_weights = NULL, ...,
                                call = rlang::caller_env()) {
  # Finalize estimator
  estimator <- finalize_estimator_internal(
    estimator,
    metric_class = "accuracy",
    call = call
  )

  class_metric_summarizer(
    name = "accuracy",
    fn = accuracy_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    estimator = estimator,
    na_rm = na_rm,
    case_weights = !!enquo(case_weights),
    error_call = call
  )
}

Implement Binary and Estimator Variants

accuracy_vec <- function(truth, estimate, estimator = NULL,
                        na_rm = TRUE, case_weights = NULL, ...,
                        call = rlang::caller_env()) {
  # ... validation ...

  if (is_binary(estimator)) {
    accuracy_binary(truth, estimate, case_weights)
  } else {
    accuracy_estimator_impl(truth, estimate, estimator, case_weights)
  }
}
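
As a rough illustration of the binary-versus-estimator split, the base R sketch below computes recall either for a single event level or as a macro average over all levels. The functions are hypothetical stand-ins, not yardstick internals, and they skip validation and case weights.

```r
# Recall for one event level: true positives / actual positives.
recall_binary_sketch <- function(truth, estimate, event = levels(truth)[1]) {
  sum(truth == event & estimate == event) / sum(truth == event)
}

# Macro average: unweighted mean of the per-class recalls.
recall_macro_sketch <- function(truth, estimate) {
  per_class <- vapply(
    levels(truth),
    function(lvl) recall_binary_sketch(truth, estimate, event = lvl),
    numeric(1)
  )
  mean(per_class)
}

# Choose the variant from the estimator, mirroring the if/else above.
recall_sketch <- function(truth, estimate, estimator = "binary") {
  switch(
    estimator,
    binary = recall_binary_sketch(truth, estimate),
    macro = recall_macro_sketch(truth, estimate),
    stop("Unsupported estimator: ", estimator)
  )
}

lvls     <- c("a", "b", "c")
truth    <- factor(c("a", "a", "b", "b", "c", "c"), levels = lvls)
estimate <- factor(c("a", "b", "b", "b", "c", "a"), levels = lvls)

macro_recall <- recall_sketch(truth, estimate, estimator = "macro")  # mean of 0.5, 1, 0.5
```

Keeping the binary calculation in its own function lets the macro variant reuse it once per level instead of duplicating the formula.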

Using Internal Test Data

Available Test Helpers

# Numeric (regression) data shipped with the package
data <- solubility_test

# Binary classification
data <- data_altman()

# Three-class data
data <- data_three_class()

# Cross-validation data
data <- data_hpc_cv1()

See Testing Patterns (Source) for complete list.

Snapshot Testing

Yardstick uses snapshots extensively:

test_that("mae returns correct structure", {
  df <- solubility_test
  result <- mae(df, solubility, prediction)

  # Snapshot entire result
  expect_snapshot(result)
})

test_that("mae errors appropriately", {
  expect_snapshot(error = TRUE, {
    mae_vec(1:5, letters[1:5])
  })
})

Reviewing Snapshots

# Review snapshot changes
testthat::snapshot_review()

# Accept changes
testthat::snapshot_accept()

Consistency with Existing Metrics

Study Similar Metrics

Before implementing:

For numeric: R/num-mae.R, R/num-rmse.R
For class: R/class-accuracy.R, R/class-precision.R
For probability: R/prob-roc_auc.R

Match Function Structure

# 1. Generic (exported)
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}

# 2. Wrap with new_*_metric
mae <- new_numeric_metric(mae, direction = "minimize")

# 3. Data frame method (exported)
#' @export
#' @rdname mae
mae.data.frame <- function(...) {
  numeric_metric_summarizer(...)
}

# 4. Vector method (exported)
#' @export
#' @rdname mae
mae_vec <- function(...) {
  # ... validation and NA handling ...
  mae_impl(...)
}

# 5. Implementation (NOT exported - internal)
mae_impl <- function(truth, estimate, case_weights = NULL) {
  # Core calculation
}
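
The layering can be exercised end to end outside the package with a base R sketch. It is an approximation only: column names are passed as strings instead of yardstick's tidy evaluation, `new_numeric_metric()` and the summarizer are left out, and every `sketch_` name is hypothetical.

```r
# 1. Generic
sketch_mae <- function(data, ...) {
  UseMethod("sketch_mae")
}

# 2. Data frame method: extract columns, delegate, return a one-row result
sketch_mae.data.frame <- function(data, truth, estimate, ...) {
  value <- sketch_mae_vec(data[[truth]], data[[estimate]])
  data.frame(.metric = "mae", .estimator = "standard", .estimate = value)
}

# 3. Vector interface: validate, then delegate
sketch_mae_vec <- function(truth, estimate) {
  stopifnot(is.numeric(truth), is.numeric(estimate))
  sketch_mae_impl(truth, estimate)
}

# 4. Implementation: the calculation only
sketch_mae_impl <- function(truth, estimate) {
  mean(abs(truth - estimate))
}

df <- data.frame(obs = c(1, 2, 3), pred = c(1, 2.5, 2))
result <- sketch_mae(df, truth = "obs", estimate = "pred")   # errors: 0, 0.5, 1
```

Each layer has one job, so a new metric usually only needs a different `_impl()` body and name.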

Creating New Internal Helpers

When to Create

Create internal helpers when:

Logic is shared by 2+ metrics
Complex calculation used repeatedly
Abstraction improves clarity

Naming and Documentation

#' Calculate weighted mean with case weight handling
#'
#' @param x Numeric vector
#' @param case_weights Optional case weights
#'
#' @return Numeric scalar
#' @keywords internal
#' @noRd
yardstick_mean <- function(x, case_weights = NULL) {
  # Implementation
}

Use:

@keywords internal to mark as internal
@noRd to skip documentation generation
Don’t use @export

Error Messages

Use cli for consistent errors:

if (invalid) {
  cli::cli_abort(
    "{.arg estimator} must be {.val binary}, {.val macro}, or {.val micro}, not {.val {estimator}}.",
    call = call
  )
}

Always pass the call argument so the error is reported against the user-facing function rather than the internal helper.
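
The effect of forwarding `call` can be seen with a base R stand-in. This sketch does not use cli or rlang: it passes a call object to `simpleError()`, whereas the yardstick pattern passes an environment from `rlang::caller_env()`. Both function names are hypothetical.

```r
# Base R stand-in for the cli pattern above: the helper raises the error
# but attributes it to the call supplied by the user-facing function.
check_estimator_sketch <- function(estimator, call = NULL) {
  valid <- c("binary", "macro", "micro")
  if (!estimator %in% valid) {
    msg <- sprintf(
      "`estimator` must be one of %s, not \"%s\".",
      paste0('"', valid, '"', collapse = ", "),
      estimator
    )
    stop(simpleError(msg, call = call))
  }
  invisible(estimator)
}

# User-facing function: captures its own call and forwards it.
metric_sketch <- function(estimator) {
  this_call <- sys.call()
  check_estimator_sketch(estimator, call = this_call)
}

err <- tryCatch(metric_sketch("weighted"), error = identity)
conditionCall(err)      # the user-facing call, not the helper's
conditionMessage(err)
```

Without the forwarded call, the error would point at the helper, which users never invoke directly.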

PR Submission

Before Submitting

Run full check:
```
devtools::check()
```

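Propose a NEWS.md entry (share it in the conversation rather than as a separate file, per the file guidelines above):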

## yardstick (development version)

* Added `mae()` metric for mean absolute error (#123).

Commit changes:

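# Stage only the metric and test files, plus the files regenerated by
# devtools::document() and the snapshot tests
git add R/num-mae.R tests/testthat/test-num-mae.R
git add NAMESPACE man/ tests/testthat/_snaps/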
git commit -m "Add mae() metric"
git push origin feature/add-metric-name

Creating the PR

Go to https://github.com/tidymodels/yardstick
Click “New pull request”
Select your branch
Fill in description:
- What metric does this add?
- Why is it useful?
- Reference any related issues

Review Process

The tidymodels team will review your PR. Common feedback:

Add more tests
Match existing documentation style
Use internal helpers
Add examples
Fix code style issues

See Troubleshooting (Source) for common review feedback.

Reference Documentation