Source Development Guide: Contributing to Yardstick
Complete guide for contributing new metrics to the yardstick package itself.
When to Use This Guide
✅ Use this guide if you are:
Contributing a PR directly to the yardstick package
Working inside the yardstick repository
Adding metrics that should be part of yardstick core
Modifying existing yardstick metrics
❌ Don’t use this guide if you are:
Creating a new package that extends yardstick → Use Extension Development Guide
Building standalone metrics → Use Extension Development Guide
Prerequisites
Clone the Yardstick Repository
```sh
# Clone from GitHub
git clone https://github.com/tidymodels/yardstick.git
cd yardstick

# Create a feature branch
git checkout -b feature/add-metric-name
```

See Repository Access for more details.
Install Development Dependencies
```r
# Install yardstick with all dependencies
devtools::install_dev_deps()

# Load the package for development
devtools::load_all()
```

Understanding Yardstick’s Architecture
Package Organization
```
yardstick/
├── R/
│   ├── num-*.R     # Numeric metrics
│   ├── class-*.R   # Classification metrics
│   ├── prob-*.R    # Probability metrics
│   ├── surv-*.R    # Survival metrics
│   ├── aaa-*.R     # Core infrastructure
│   └── utils-*.R   # Internal utilities
├── tests/testthat/
│   ├── test-num-*.R
│   ├── test-class-*.R
│   └── _snaps/     # Snapshot test outputs
└── man-roxygen/    # Documentation templates
```
File Naming Conventions
Source files must follow strict naming:
Numeric: `R/num-[name].R` → `R/num-mae.R`
Class: `R/class-[name].R` → `R/class-accuracy.R`
Probability: `R/prob-[name].R` → `R/prob-roc_auc.R`
Survival: `R/surv-[name].R` → `R/surv-concordance_survival.R`
Test files must match:
`R/num-mae.R` → `tests/testthat/test-num-mae.R`
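If you use usethis, matching source and test files can be created in one step. This is a convenience sketch, assuming usethis is installed as a development dependency:

```r
# Convenience sketch: create matching source and test files with usethis
# (assumes usethis is installed; names follow yardstick's conventions)
usethis::use_r("num-mae")      # creates R/num-mae.R
usethis::use_test("num-mae")   # creates tests/testthat/test-num-mae.R
```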
Working with Internal Functions
✅ You MUST Use Internal Helpers
When developing yardstick itself, always use existing internal functions - don’t reimplement what already exists:
```r
# ✅ CORRECT - Use internal helper
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick_mean(errors, case_weights = case_weights)
}

# ❌ WRONG - Reimplementing existing functionality
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  if (is.null(case_weights)) {
    mean(errors)
  } else {
    weighted.mean(errors, w = as.double(case_weights))
  }
}
```

Why this matters: Internal helpers ensure consistency across all yardstick metrics and handle edge cases correctly. Reviewers will request changes if you reimplement existing functionality.
Common Internal Helpers
yardstick_mean() - Weighted Mean
Handles case weights consistently:
```r
yardstick_mean <- function(x, case_weights = NULL) {
  if (is.null(case_weights)) {
    mean(x)
  } else {
    if (inherits(case_weights, c("hardhat_importance_weights",
                                 "hardhat_frequency_weights"))) {
      case_weights <- as.double(case_weights)
    }
    weighted.mean(x, w = case_weights)
  }
}
```

finalize_estimator_internal() - Estimator Selection
For multiclass metrics:
```r
accuracy.data.frame <- function(data, truth, estimate,
                                estimator = NULL, ...,
                                call = rlang::caller_env()) {
  estimator <- finalize_estimator_internal(
    estimator,
    metric_class = "accuracy",
    call = call
  )
  # Rest of implementation
}
```

Validation Functions
```r
# These provide consistent error messages
check_numeric_metric(truth, estimate, case_weights)
check_class_metric(truth, estimate, case_weights)
check_prob_metric(truth, estimate, case_weights)
```

Finding Internal Functions
```r
# List all internal functions
ls("package:yardstick", all.names = TRUE)

# Search in source
# grep -r "yardstick_" R/

# View source
yardstick:::yardstick_mean
```

See Best Practices (Source) for a complete guide to internal functions.
❌ NO Package Prefix in Source Development
CRITICAL: When developing yardstick itself, never use yardstick:: prefix:
```r
# ✅ CORRECT - No prefix (same package)
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick_mean(errors, case_weights = case_weights)
}

mae.data.frame <- function(data, truth, estimate, ...) {
  numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    ...
  )
}

# ❌ WRONG - Don't prefix your own package
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  yardstick::yardstick_mean(errors, case_weights = case_weights)
}

mae.data.frame <- function(data, truth, estimate, ...) {
  yardstick::numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    ...
  )
}
```

Why: You’re developing the package itself; these functions are in the same namespace and don’t need prefixing.
Step-by-Step Implementation
Step 1: Choose Your Metric Type
Determine which category your metric falls into:
Numeric (regression)
Class (classification with classes)
Probability (classification with probabilities)
Survival (time-to-event)
See the main SKILL.md for the complete decision tree.
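Whichever category you choose, the generic is wrapped with the matching constructor, which records the optimization direction. A sketch of how the constructors line up (the survival constructor name is from memory and worth double-checking against the source):

```r
# Each metric type has a matching constructor that records direction
mae      <- new_numeric_metric(mae, direction = "minimize")
accuracy <- new_class_metric(accuracy, direction = "maximize")
roc_auc  <- new_prob_metric(roc_auc, direction = "maximize")
# Survival metrics use new_dynamic_survival_metric() and related constructors
```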
Step 2: Create Source File
Create R/num-[name].R (or class-, prob-, etc.):
```r
# R/num-mae.R

#' Mean Absolute Error
#'
#' @family numeric metrics
#' @family accuracy metrics
#' @templateVar fn mae
#' @template return
#' @template event_first
#'
#' @inheritParams rmse
#'
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}
mae <- new_numeric_metric(mae, direction = "minimize")

#' @export
#' @rdname mae
mae.data.frame <- function(data, truth, estimate, na_rm = TRUE,
                           case_weights = NULL, ...) {
  numeric_metric_summarizer(
    name = "mae",
    fn = mae_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

#' @export
mae_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
  check_numeric_metric(truth, estimate, case_weights)

  if (na_rm) {
    result <- yardstick_remove_missing(truth, estimate, case_weights)
    truth <- result$truth
    estimate <- result$estimate
    case_weights <- result$case_weights
  } else if (yardstick_any_missing(truth, estimate, case_weights)) {
    return(NA_real_)
  }

  mae_impl(truth, estimate, case_weights)
}

# Internal implementation
mae_impl <- function(truth, estimate, case_weights = NULL) {
  errors <- abs(truth - estimate)
  # Use internal helper
  yardstick_mean(errors, case_weights = case_weights)
}
```

Step 3: Document with Templates and Examples
REQUIRED: All exported functions MUST include @examples:
```r
#' Mean Absolute Error
#'
#' @family numeric metrics
#' @family accuracy metrics
#' @templateVar fn mae
#' @template return
#' @template event_first
#'
#' @inheritParams rmse
#'
#' @examples
#' # Basic usage
#' mae(solubility_test, solubility, prediction)
#'
#' # With case weights
#' library(dplyr)
#' solubility_test %>%
#'   mutate(weight = 1:nrow(.)) %>%
#'   mae(solubility, prediction, case_weights = weight)
#'
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}
```

Yardstick templates:
```r
#' @templateVar fn mae
#' @template return
#' @template event_first
```

Templates are defined in the man-roxygen/ directory and handle common documentation patterns.
Why @examples matters: Examples are critical for usability - they show users how to actually use the metric. Reviewers will request examples if missing.
Step 4: Create Test File
Create tests/testthat/test-num-mae.R:
```r
test_that("mae works correctly", {
  # Use built-in example data suited to a numeric metric
  df <- solubility_test
  result <- mae(df, solubility, prediction)

  # Use snapshot testing
  expect_snapshot(result)
})

test_that("mae works with numeric vectors", {
  truth <- c(1, 2, 3, 4, 5)
  estimate <- c(1.5, 2.5, 2.5, 3.5, 4.5)
  expect_equal(mae_vec(truth, estimate), 0.5)
})

test_that("mae handles NA correctly", {
  df <- solubility_test
  df$solubility[1:10] <- NA

  result_remove <- mae(df, solubility, prediction, na_rm = TRUE)
  expect_false(is.na(result_remove$.estimate))

  result_keep <- mae(df, solubility, prediction, na_rm = FALSE)
  expect_true(is.na(result_keep$.estimate))
})

test_that("mae validates input types", {
  df <- data.frame(
    truth = 1:5,
    estimate = letters[1:5]
  )
  expect_snapshot(error = TRUE, {
    mae(df, truth, estimate)
  })
})

test_that("mae works with case weights", {
  df <- solubility_test
  df$weights <- seq_len(nrow(df))

  result_unweighted <- mae(df, solubility, prediction)
  result_weighted <- mae(df, solubility, prediction, case_weights = weights)

  expect_false(
    result_unweighted$.estimate == result_weighted$.estimate
  )
})
```

See Testing Patterns (Source) for a comprehensive testing guide.
Step 5: Run Tests and Check
```r
# Document
devtools::document()

# Load
devtools::load_all()

# Test
devtools::test()

# Full check
devtools::check()
```

File Creation Guidelines for PRs
═══════════════════════════════════════════════════════ ⚠️⚠️⚠️ CRITICAL: PR FILE DISCIPLINE ⚠️⚠️⚠️ ═══════════════════════════════════════════════════════
🛑 STOP! STOP! STOP! 🛑
Before you create even ONE file, read this ENTIRE section.
You will create EXACTLY 2 files. That’s it. Two. Not 3. Not 4. Not 5. TWO FILES ONLY.
═══════════════════════════════════════════════════════
MANDATORY Pre-Flight Checklist
READ EACH LINE. CHECK EACH BOX. DO NOT SKIP THIS.
Before creating ANY files:
═══════════════════════════════════════════════════════
The ONLY Files You Will Create
R/[type]-[metric_name].R - Metric function with complete roxygen documentation
Example: `R/num-mae.R`, `R/class-accuracy.R`, `R/prob-roc_auc.R`
Contains all three functions: `_impl()`, `_vec()`, and the `.data.frame()` method
Complete roxygen documentation with @examples
tests/testthat/test-[type]-[metric_name].R - Comprehensive test suite
Example: `tests/testthat/test-num-mae.R`
Or add tests to an existing file like `tests/testthat/test-class-metrics.R` if appropriate
That’s it. Two files. Nothing else.
═══════════════════════════════════════════════════════
Files You Will ABSOLUTELY NOT Create
🛑 INSTRUCTIONS FOR CLAUDE: STOP IMMEDIATELY IF YOU ARE ABOUT TO CREATE ANY FILE NOT LISTED IN “THE ONLY FILES YOU WILL CREATE” SECTION ABOVE. 🛑
❌ NEVER CREATE:
❌ README.md (yardstick already has one)
❌ README.txt (yardstick already has one)
❌ NEWS_entry.md (maintainer adds to NEWS.md)
❌ IMPLEMENTATION_SUMMARY.md
❌ IMPLEMENTATION_NOTES.md
❌ IMPLEMENTATION_NOTES.txt
❌ QUICKSTART.md
❌ QUICK_REFERENCE.md
❌ example_usage.R
❌ USAGE_EXAMPLE.R
❌ metric_examples.R
❌ test_examples.R
❌ METRIC_DESIGN.md
❌ VALIDATION_APPROACH.md
❌ PR_CHECKLIST.md
❌ PR_DESCRIPTION.md
❌ PR_SUMMARY.md
❌ INDEX.md
❌ FILE_GUIDE.md
❌ MANIFEST.md
❌ INTEGRATION_GUIDE.md
❌ SUMMARY.md
❌ SUMMARY.txt
❌ OVERVIEW.md
❌ verification_script.R
❌ check_metric.R
❌ pkgdown_update.txt
❌ pkgdown_addition.yml
❌ WORKFLOW_COMMANDS.sh
❌ setup.sh
❌ ANY other .md, .txt, .yml, .sh, or helper files
═══════════════════════════════════════════════════════
Where Content Actually Goes
CRITICAL: Everything has a place. No separate files.
| Content | ❌ WRONG | ✅ CORRECT |
|---|---|---|
| Examples | example_usage.R | roxygen @examples in R file |
| Metric design rationale | METRIC_DESIGN.md | roxygen @details in R file |
| Implementation notes | IMPLEMENTATION_NOTES.txt | roxygen @details in R file |
| Usage instructions | QUICKSTART.md | roxygen @examples in R file |
| PR description | PR_DESCRIPTION.md | Conversation with user |
| NEWS entry | NEWS_entry.md | Mention in conversation |
| Test examples | test_examples.R | Tests in test file |
| Validation approach | VALIDATION_APPROACH.md | Comments in test file |
═══════════════════════════════════════════════════════
FINAL CHECK Before Creating Files
🛑 STOP RIGHT HERE. ANSWER THESE QUESTIONS: 🛑
- Am I about to create exactly 2 files? (YES/NO)
- Are both files either `R/[type]-[name].R` or `tests/testthat/test-[type]-[name].R`? (YES/NO)
- Am I about to create ANY .md, .txt, .yml, or .sh files? (NO)
- Have I put all examples in roxygen @examples? (YES)
- Have I put all notes in roxygen @details? (YES)
If you answered incorrectly to ANY question above, STOP. Re-read this section.
═══════════════════════════════════════════════════════
Why This Matters
PRs to yardstick should contain ONLY code and tests. Period.
Extra documentation files:
Clutter the repository
Duplicate roxygen documentation
Create maintenance burden
Slow down PR review
Get deleted by maintainers anyway
The yardstick maintainers have explicitly requested: CODE AND TESTS ONLY.
When you submit a PR:
Code goes in R/ with roxygen docs
Tests go in tests/testthat/
Everything else (PR description, NEWS entry, examples) you discuss with the user in conversation
No exceptions. No “helpful” documentation files. Just code and tests.
═══════════════════════════════════════════════════════
Documentation Patterns
Using @template
```r
#' @templateVar fn mae
#' @template return
```

Available templates (in man-roxygen/):

`@template return` - Standard return value
`@template event_first` - Event level for class metrics
`@template multiclass` - Multiclass documentation
Using @templateVar
Define variables before templates:
```r
#' @templateVar fn mae
#' @templateVar metric_fn mae
```

Inheriting Parameters
Use @inheritParams extensively:
```r
#' @inheritParams rmse
```

This inherits all parameters from the rmse documentation.
Multiclass Metrics
Supporting Multiple Estimators
For class metrics:
```r
accuracy.data.frame <- function(data, truth, estimate,
                                estimator = NULL, na_rm = TRUE,
                                case_weights = NULL, ...,
                                call = rlang::caller_env()) {
  # Finalize estimator
  estimator <- finalize_estimator_internal(
    estimator,
    metric_class = "accuracy",
    call = call
  )

  class_metric_summarizer(
    name = "accuracy",
    fn = accuracy_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    estimator = estimator,
    na_rm = na_rm,
    case_weights = !!enquo(case_weights),
    call = call
  )
}
```

Implement Binary and Estimator Variants
```r
accuracy_vec <- function(truth, estimate, estimator = NULL,
                         na_rm = TRUE, case_weights = NULL, ...,
                         call = rlang::caller_env()) {
  # ... validation ...

  if (is_binary(estimator)) {
    accuracy_binary(truth, estimate, case_weights)
  } else {
    accuracy_estimator_impl(truth, estimate, estimator, case_weights)
  }
}
```

Using Internal Test Data
Available Test Helpers
```r
# Binary classification
data <- data_altman()

# Three-class data
data <- data_three_class()

# Cross-validation data
data <- data_hpc_cv1()
```

See Testing Patterns (Source) for the complete list.
Snapshot Testing
Yardstick uses snapshots extensively:
```r
test_that("mae returns correct structure", {
  df <- solubility_test
  result <- mae(df, solubility, prediction)

  # Snapshot entire result
  expect_snapshot(result)
})

test_that("mae errors appropriately", {
  expect_snapshot(error = TRUE, {
    mae_vec(1:5, letters[1:5])
  })
})
```

Reviewing Snapshots
```r
# Review snapshot changes
testthat::snapshot_review()

# Accept changes
testthat::snapshot_accept()
```

Consistency with Existing Metrics
Study Similar Metrics
Before implementing:
For numeric: `R/num-mae.R`, `R/num-rmse.R`
For class: `R/class-accuracy.R`, `R/class-precision.R`
For probability: `R/prob-roc_auc.R`
Match Function Structure
```r
# 1. Generic (exported)
#' @export
mae <- function(data, ...) {
  UseMethod("mae")
}

# 2. Wrap with new_*_metric
mae <- new_numeric_metric(mae, direction = "minimize")

# 3. Data frame method (exported)
#' @export
#' @rdname mae
mae.data.frame <- function(...) {
  numeric_metric_summarizer(...)
}

# 4. Vector method (exported)
#' @export
mae_vec <- function(...) {
  # ... validation and NA handling ...
  mae_impl(...)
}

# 5. Implementation (NOT exported - internal)
mae_impl <- function(truth, estimate, case_weights = NULL) {
  # Core calculation
}
```

Creating New Internal Helpers
When to Create
Create internal helpers when:
Logic is shared by 2+ metrics
Complex calculation used repeatedly
Abstraction improves clarity
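For example, if several metrics needed a weighted sum, a shared helper mirroring yardstick_mean() might look like this (yardstick_sum is a hypothetical name used only for illustration, not an existing yardstick function):

```r
# Hypothetical helper, mirroring yardstick_mean(); not part of yardstick
yardstick_sum <- function(x, case_weights = NULL) {
  if (is.null(case_weights)) {
    sum(x)
  } else {
    sum(x * as.double(case_weights))
  }
}
```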
Naming and Documentation
```r
#' Calculate weighted mean with case weight handling
#'
#' @param x Numeric vector
#' @param case_weights Optional case weights
#'
#' @return Numeric scalar
#' @keywords internal
#' @noRd
yardstick_mean <- function(x, case_weights = NULL) {
  # Implementation
}
```

Use:
- `@keywords internal` to mark the function as internal
- `@noRd` to skip .Rd documentation generation
- Don't use `@export`
Error Messages
Use cli for consistent errors:
```r
if (invalid) {
  cli::cli_abort(
    "{.arg estimator} must be {.val binary}, {.val macro}, or {.val micro}, not {.val {estimator}}.",
    call = call
  )
}
```

Always pass the call parameter for better error context.
PR Submission
Before Submitting
Run full check:
```r
devtools::check()
```

Update NEWS.md:

```
## yardstick (development version)

* Added `mae()` metric for mean absolute error (#123).
```

Commit changes:

```sh
git add .
git commit -m "Add mae() metric"
git push origin feature/add-metric-name
```
Creating the PR
Go to https://github.com/tidymodels/yardstick
Click “New pull request”
Select your branch
Fill in description:
What metric does this add?
Why is it useful?
Reference any related issues
Review Process
The tidymodels team will review your PR. Common feedback:
Add more tests
Match existing documentation style
Use internal helpers
Add examples
Fix code style issues
See Troubleshooting (Source) for common review feedback.
Reference Documentation
Source Development
Testing Patterns (Source) - Testing with internal helpers
Best Practices (Source) - Code style and internal functions
Troubleshooting (Source) - Common issues
Metric Types
Core Concepts
Next Steps
- Clone yardstick repository
- Create feature branch
- Implement your metric following this guide
- Test thoroughly using internal test data
- Run `devtools::check()`
- Submit PR to tidymodels/yardstick
Getting Help
Check Troubleshooting (Source)
Study existing metrics in the repository
Review Best Practices (Source)
Open an issue on GitHub for questions
Tag maintainers in your PR