Troubleshooting Yardstick Source Development
Context: This guide is for source development - contributing to the yardstick package directly.
Key focus: Working with package internals, integration testing, and yardstick-specific issues.
For extension development (creating new packages), see Troubleshooting (Extension).
Working with Yardstick Internals
Finding Internal Functions
Problem: How do I know what internal functions are available?
Solutions:
# List all objects including internals
ls("package:yardstick", all.names = TRUE)
# Filter for yardstick_ helpers
ls("package:yardstick", all.names = TRUE, pattern = "yardstick_")
# View source of internal function
yardstick:::yardstick_mean
# Search in source files
# In terminal from yardstick root:
# grep -r "yardstick_" R/When Internal Functions Change
Problem: Internal function behavior changed and broke my metric.
Solution: 1. Check git history: git log --all --full-history -- R/your-file.R 2. Review recent PRs that modified the internal function 3. Update your metric to match new behavior 4. Add tests to prevent future breakage
Prevention: - Document WHY you’re using an internal (in comments) - Add tests that would catch internal changes - Prefer stable internals (e.g., yardstick_mean)
Internal Function Not Found
Problem:
Error: object 'yardstick_helper' not foundCauses & Solutions:
- Function doesn’t exist
- Check spelling
- Search codebase:
grep -r "yardstick_helper" R/
- Function was removed
- Check git history
- Implement functionality yourself
- Ask package maintainers
- Function is in different file
- Make sure file is sourced before yours
- Check NAMESPACE
metric_set() Integration Issues
Metric Not Working in metric_set()
Problem:
metrics <- metric_set(mae, rmse, my_custom_metric)
metrics(data, truth, estimate)
# Error or my_custom_metric not in resultsCauses & Solutions:
- Metric not wrapped with
new_*_metric()
# Bad
mae <- function(data, ...) {
UseMethod("mae")
}
# Good
mae <- function(data, ...) {
UseMethod("mae")
}
mae <- new_numeric_metric(mae, direction = "minimize")- Wrong metric class
# Make sure class hierarchy is correct
class(mae)
# [1] "mae" "numeric_metric" "metric" "function"
# For class metrics
class(accuracy)
# [1] "accuracy" "class_metric" "metric" "function"- Missing direction or range
mae <- new_numeric_metric(
mae,
direction = "minimize", # Required
range = c(0, Inf) # Optional but recommended
)metric_set() Returns Wrong Estimator
Problem: All metrics in set show same estimator, but they shouldn’t.
Example:
metrics <- metric_set(accuracy, precision, recall)
result <- metrics(df, truth, estimate)
# All show "binary" but some should show "macro"Cause: Metrics in a set must have compatible estimators.
Solution: - For binary data: All metrics will use “binary” - For multiclass data: Specify estimator explicitly
# Multiclass with specific estimator
result <- metrics(df, truth, estimate, estimator = "macro")Mixed Metric Types in metric_set()
Problem:
metrics <- metric_set(mae, accuracy) # numeric + class
# Error: Can't mix metric typesCause: You can’t mix numeric, class, and probability metrics in one set.
Solution: Create separate metric sets:
numeric_metrics <- metric_set(mae, rmse, mse)
class_metrics <- metric_set(accuracy, precision, recall)Case Weight Issues
Case Weights Not Working
Problem: Weighted and unweighted results are identical.
Diagnosis:
# Check if weights are being used
df <- data.frame(
truth = 1:5,
estimate = c(1.5, 2.5, 2.5, 3.5, 4.5),
weights = c(1, 1, 1, 1, 100) # Heavy weight on last
)
result_unweighted <- mae(df, truth, estimate)
result_weighted <- mae(df, truth, estimate, case_weights = weights)
# These should be different!
result_unweighted$.estimate == result_weighted$.estimateSolutions:
- Using
yardstick_mean()correctly
# Good - passes weights
mae_impl <- function(truth, estimate, case_weights = NULL) {
errors <- abs(truth - estimate)
yardstick_mean(errors, case_weights = case_weights)
}
# Bad - ignores weights
mae_impl <- function(truth, estimate, case_weights = NULL) {
errors <- abs(truth - estimate)
mean(errors) # Doesn't use case_weights!
}- Converting hardhat weights
# yardstick_mean() handles this, but if implementing manually:
if (!is.null(case_weights)) {
if (inherits(case_weights, c("hardhat_importance_weights",
"hardhat_frequency_weights"))) {
case_weights <- as.double(case_weights)
}
weighted.mean(x, w = case_weights)
}- Weights column must be hardhat type
# Create proper hardhat weights
df$wt <- hardhat::importance_weights(df$weights)
# Then use in metric
mae(df, truth, estimate, case_weights = wt)Case Weights Causing Errors
Problem:
Error: All case weights must be non-negativeCause: Negative weights in data.
Solution: Validate input data has non-negative weights.
Package Check Issues
Check Fails: Examples Too Long
Problem:
checking examples ... ERROR
Examples with CPU time > 2.5 times elapsed timeSolution: Use \donttest{} for slow examples:
#' @examples
#' # Quick example
#' mae(data, truth, estimate)
#'
#' \donttest{
#' # Slower example with cross-validation
#' result <- tune::fit_resamples(...)
#' }
Check Fails: Tests Too Long
Problem: Test suite takes too long.
Solution: Make tests faster: - Use smaller test datasets - Skip slow tests on CRAN: skip_on_cran() - Parallelize independent tests
Check Fails: Snapshot Changes
Problem:
checking tests ... ERROR
Snapshot test failedSolution:
# Review changes
testthat::snapshot_review()
# If changes are intentional, accept them
testthat::snapshot_accept()
# If unintentional, fix code and re-testCheck Fails: Undefined Global Variables
Problem:
checking R code for possible problems ... NOTE
accuracy: no visible binding for global variable '.estimate'Cause: Using bare column names in dplyr/rlang code.
Solution: Use .data pronoun:
# Good
data |> dplyr::mutate(new_col = .data$.estimate * 2)
# Avoid
data |> dplyr::mutate(new_col = .estimate * 2)Integration Testing Issues
Metric Doesn’t Work with tune
Problem: Metric works alone but fails in tune::tune_grid().
Diagnosis:
# Test with tune
library(tune)
library(rsample)
# Your metric should work here
result <- tune_grid(
model,
recipe,
resamples = vfold_cv(data),
metrics = metric_set(mae, rmse, your_metric)
)Common Issues:
- Metric returns wrong structure
- Must return tibble with
.metric,.estimator,.estimate
- Must return tibble with
- Metric doesn’t handle grouped data
- Use metric summarizers, they handle groups
- Metric fails on resampled data
- Test with
rsample::vfold_cv()data
- Test with
Metric Doesn’t Work with workflows
Problem: Metric works with tune but not workflows.
Solution: Make sure metric works with model predictions:
library(workflows)
library(parsnip)
wf <- workflow() |>
add_model(linear_reg()) |>
add_formula(mpg ~ .)
fit <- fit(wf, mtcars)
preds <- predict(fit, mtcars)
# Your metric should work with these predictions
mae(bind_cols(mtcars, preds), truth = mpg, estimate = .pred)Git and PR Issues
Merge Conflicts
Problem: Your branch has conflicts with main.
Solution:
# Update your local main
git checkout main
git pull origin main
# Rebase your branch
git checkout your-branch
git rebase main
# Resolve conflicts in each file
# After resolving each:
git add resolved-file.R
git rebase --continue
# Force push (since rebase rewrites history)
git push --force-with-lease origin your-branchPR Build Fails
Problem: PR checks fail on GitHub Actions.
Common failures:
- R CMD check fails
- Run locally:
devtools::check() - Fix all errors, warnings, notes
- Run locally:
- Tests fail
- Run locally:
devtools::test() - Fix failing tests
- Run locally:
- Code coverage drops
- Add tests for new code
- Aim for >90% coverage on new code
- Lint failures
- Run:
lintr::lint_package() - Fix style issues
- Run:
PR Review Feedback
Common requests:
- “Add tests for edge cases”
- Add tests for NA handling
- Add tests for empty data
- Add tests for perfect predictions
- “Update documentation”
- Add
@examples - Clarify parameter descriptions
- Add
@detailssection
- Add
- “Use existing internal helper”
- Reviewer points out existing function
- Refactor to use it
- “Match style of existing metrics”
- Review similar metrics
- Match their structure and naming
Yardstick-Specific Issues
Estimator Column Missing
Problem: Result is missing .estimator column.
Solution: Always use metric summarizers:
# They automatically add .estimator
numeric_metric_summarizer(...)
class_metric_summarizer(...)
prob_metric_summarizer(...)Direction Attribute Missing
Problem:
attr(my_metric, "direction")
# NULLSolution: Set when creating metric:
mae <- new_numeric_metric(
mae,
direction = "minimize" # or "maximize" or "zero"
)Metric Class Hierarchy Wrong
Problem: metric_set() doesn’t recognize your metric.
Check class hierarchy:
class(your_metric)
# Should be: c("your_metric", "numeric_metric", "metric", "function")
# or: c("your_metric", "class_metric", "metric", "function")
# or: c("your_metric", "prob_metric", "metric", "function")Solution: Use the right new_*_metric() function:
# For numeric metrics
new_numeric_metric(fn, direction = "minimize")
# For class metrics
new_class_metric(fn, direction = "maximize")
# For probability metrics
new_prob_metric(fn, direction = "maximize")Performance Issues
Metric is Slow
Problem: Metric takes too long on large datasets.
Diagnosis:
# Profile your code
profvis::profvis({
mae(large_data, truth, estimate)
})Common bottlenecks:
Non-vectorized operations
# Slow for (i in seq_along(truth)) { errors[i] <- abs(truth[i] - estimate[i]) } # Fast errors <- abs(truth - estimate)Unnecessary copies
# Slow - creates copy data <- data |> dplyr::filter(!is.na(truth)) # Fast - works with indices valid <- !is.na(truth) truth <- truth[valid]Inefficient aggregation
# Use built-in functions mean(errors) # Fast # Avoid manual calculation sum(errors) / length(errors) # Slower
Getting Help
Check Existing Issues
Search yardstick GitHub issues for similar problems: https://github.com/tidymodels/yardstick/issues
Ask Tidymodels Team
For yardstick-specific questions: - Open an issue on GitHub - Ask in tidymodels Slack (if you have access) - Tag maintainers in your PR
Reference Existing Metrics
Study similar metrics in the codebase: - Look at R/num-mae.R for simple numeric metrics - Look at R/class-accuracy.R for class metrics - Look at R/prob-roc_auc.R for probability metrics
Next Steps
- Review Testing Patterns (Source) for testing guidance
- Check Best Practices (Source) for coding standards
- See Extension Troubleshooting for common R package issues