Source Development Guide: Contributing to Recipes
Complete guide for contributing new preprocessing steps to the recipes package itself.
When to Use This Guide
✅ Use this guide if you are:
Contributing a PR directly to the recipes package
Working inside the recipes repository
Adding steps that should be part of recipes core
Modifying existing recipes steps
❌ Don’t use this guide if you are:
Creating a new package that extends recipes → Use Extension Development Guide
Building standalone steps → Use Extension Development Guide
Prerequisites
Clone the Recipes Repository
# Clone from GitHub
git clone https://github.com/tidymodels/recipes.git
cd recipes
# Create a feature branch
git checkout -b feature/add-step-name
See Repository Access for more details.
Install Development Dependencies
# Install recipes with all dependencies
devtools::install_dev_deps()
# Load the package for development
devtools::load_all()
Understanding Recipes Architecture
Package Organization
recipes/
├── R/
│ ├── center.R # Normalization steps
│ ├── dummy.R # Encoding steps
│ ├── pca.R # Dimension reduction
│ ├── filter.R # Row operations
│ ├── aaa-*.R # Core infrastructure
│ └── utils-*.R # Internal utilities
├── tests/testthat/
│ ├── test-center.R
│ ├── test-dummy.R
│ └── helper-*.R # Test helpers
└── man/ # Documentation
File Naming Conventions
Source files: R/[step_name].R (the step name without the step_ prefix)
- Examples: center.R, normalize.R, pca.R, dummy.R
Test files: tests/testthat/test-[step_name].R
- Examples: test-center.R, test-normalize.R
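As a quick illustration of the naming convention, the sketch below derives both paths from a step function name. The helper `step_file_paths()` is hypothetical (it is not part of recipes) and assumes the file stem is the step name with its `step_` prefix dropped, as in the examples above.

```r
# Hypothetical helper, not part of recipes: derive the conventional
# source and test file paths from a step function name.
step_file_paths <- function(step_fn) {
  # Drop the "step_" prefix: "step_center" -> "center"
  stem <- sub("^step_", "", step_fn)
  c(
    source = file.path("R", paste0(stem, ".R")),
    test = file.path("tests", "testthat", paste0("test-", stem, ".R"))
  )
}

step_file_paths("step_center")
```

Running it for `"step_center"` yields the `R/center.R` and `tests/testthat/test-center.R` pair used throughout this guide.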
Working with Internal Functions
✅ You CAN Use Internal Functions
When developing recipes itself, internal functions are available:
# ✅ GOOD - You're developing the package
prep.step_center <- function(x, training, info = NULL, ...) {
# Use internal function (no prefix needed)
col_names <- recipes_eval_select(x$terms, training, info)
# Core calculation
# ...
}
Common Internal Helpers
recipes_eval_select() - Variable Selection
The most important function for resolving variable selections:
# Converts all_numeric(), manual selections, etc. to column names
col_names <- recipes_eval_select(x$terms, training, info)
get_case_weights() - Extract Case Weights
wts <- get_case_weights(info, training)
were_weights_used <- are_weights_used(wts, unsupervised = TRUE)
if (isFALSE(were_weights_used)) {
  wts <- NULL
}
check_type() - Validate Column Types
# Validates that columns are the correct type
check_type(training[, col_names], types = c("double", "integer"))
check_new_data() - Validate Columns Exist
# In bake(), check required columns exist
check_new_data(col_names, object, new_data)
remove_original_cols() - Handle keep_original_cols
# For create-new-columns steps
new_data <- remove_original_cols(new_data, object, original_cols)
print_step() - Standard Printing
print_step(x$columns, x$terms, x$trained, title, width,
  case_weights = x$case_weights)
sel2char() - Convert Selections to Strings
# For tidy() on untrained steps
term_names <- sel2char(x$terms)
See Best Practices (Source) for complete guide to internal functions.
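To make the purpose of the bake()-time column check concrete, here is a base R sketch of the kind of validation that `check_new_data()` is used for. The function name `check_cols_present()` and its error message are assumptions made for illustration; this is not the recipes implementation, which reports errors through cli.

```r
# Conceptual stand-in for a bake()-time column check (hypothetical name;
# this is not the recipes implementation).
check_cols_present <- function(col_names, new_data) {
  missing_cols <- setdiff(col_names, names(new_data))
  if (length(missing_cols) > 0) {
    stop(
      "Required column(s) missing from `new_data`: ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(new_data)
}

# Passes silently: both columns are present
check_cols_present(c("disp", "hp"), mtcars)

# Fails: `disp` is absent from the data passed at bake time
no_disp <- mtcars[, setdiff(names(mtcars), "disp")]
res <- tryCatch(
  check_cols_present(c("disp", "hp"), no_disp),
  error = function(e) conditionMessage(e)
)
res
```

Failing early with the missing column names is friendlier than letting the transformation itself fail with a subscript error.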
Step-by-Step Implementation
Step 1: Choose Your Step Type
Determine which category your step falls into:
Modify-in-place: Transforms existing columns
Create-new-columns: Generates new columns
Row-operation: Filters or removes rows
See Step Architecture for decision tree.
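The three categories can be told apart by what happens to the shape of the data. The base R sketch below mimics the result a step's bake() would return in each case; it uses `mtcars` and plain data frame operations rather than recipes code, so treat it as an analogy only.

```r
# Base R analogues of the three step categories (not recipes code).
df <- mtcars[, c("mpg", "disp", "cyl")]

# Modify-in-place: same columns, transformed values
in_place <- df
in_place$disp <- in_place$disp - mean(in_place$disp)

# Create-new-columns: indicator columns derived from `cyl`
cyl_factor <- factor(df$cyl)
indicators <- model.matrix(~ cyl_factor - 1)
new_cols <- cbind(df[, c("mpg", "disp")], indicators)

# Row-operation: same columns, fewer rows
filtered <- df[df$mpg > 20, , drop = FALSE]

dim(in_place)
dim(new_cols)
dim(filtered)
```

If the column set is unchanged, the step is modify-in-place; if columns are added, it is create-new-columns; if only the row count changes, it is a row operation.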
Step 2: Create Source File
Create R/[step_name].R:
# R/center.R
#' Center numeric variables
#'
#' `step_center()` creates a *specification* of a recipe step that will
#' normalize numeric data to have a mean of zero.
#'
#' @inheritParams step_normalize
#' @param ... One or more selector functions to choose variables for this step.
#' See [selections()] for more details.
#' @param role Not used by this step since no new variables are created.
#' @param na_rm A logical value indicating whether NA values should be removed
#' when computing means.
#' @param means A named numeric vector of means. This is `NULL` until computed
#' by [prep()].
#'
#' @template step-return
#'
#' @family normalization steps
#' @export
#'
#' @details
#' Centering data means that the average of the variable is subtracted from the
#' data. `step_center` estimates the variable means from the data used in the
#' `training` argument of [prep()]. [bake()] then applies the centering to new
#' data sets using these means.
#'
#' @template case-weights-unsupervised
#'
#' @examplesIf rlang::is_installed("modeldata")
#' data(biomass, package = "modeldata")
#'
#' biomass_tr <- biomass[biomass$dataset == "Training", ]
#' biomass_te <- biomass[biomass$dataset == "Testing", ]
#'
#' rec <- recipe(
#' HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#' data = biomass_tr
#' )
#'
#' center_trans <- rec |>
#' step_center(carbon, hydrogen)
#'
#' center_obj <- prep(center_trans, training = biomass_tr)
#'
#' transformed_te <- bake(center_obj, biomass_te)
#'
step_center <- function(
recipe,
...,
role = NA,
trained = FALSE,
means = NULL,
na_rm = TRUE,
skip = FALSE,
id = rand_id("center")
) {
add_step(
recipe,
step_center_new(
terms = enquos(...),
trained = trained,
role = role,
means = means,
na_rm = na_rm,
skip = skip,
id = id,
case_weights = NULL
)
)
}
step_center_new <- function(terms, role, trained, means, na_rm, skip, id,
case_weights) {
step(
subclass = "center",
terms = terms,
role = role,
trained = trained,
means = means,
na_rm = na_rm,
skip = skip,
id = id,
case_weights = case_weights
)
}
#' @export
prep.step_center <- function(x, training, info = NULL, ...) {
# 1. Resolve variable selections (internal function)
col_names <- recipes_eval_select(x$terms, training, info)
# 2. Validate column types (internal function)
check_type(training[, col_names], types = c("double", "integer"))
# 3. Get case weights (internal function)
wts <- get_case_weights(info, training)
were_weights_used <- are_weights_used(wts, unsupervised = TRUE)
if (isFALSE(were_weights_used)) {
wts <- NULL
}
# 4. Calculate means
means <- vapply(
training[, col_names],
weighted_mean_na, # Illustrative helper from "Example Internal Helper" in this guide; swap in the package's own helper if one exists
numeric(1),
wts = wts,
na_rm = x$na_rm
)
# 5. Check for issues
inf_cols <- col_names[is.infinite(means) | is.nan(means)]
if (length(inf_cols) > 0) {
cli::cli_warn(
"Column{?s} {.var {inf_cols}} returned Inf or NaN."
)
}
# 6. Return updated step
step_center_new(
terms = x$terms,
role = x$role,
trained = TRUE,
means = means,
na_rm = x$na_rm,
skip = x$skip,
id = x$id,
case_weights = were_weights_used
)
}
#' @export
bake.step_center <- function(object, new_data, ...) {
col_names <- names(object$means)
# Validate columns exist (internal function)
check_new_data(col_names, object, new_data)
# Apply transformation
for (col_name in col_names) {
new_data[[col_name]] <- new_data[[col_name]] - object$means[[col_name]]
}
new_data
}
#' @export
print.step_center <- function(x, width = max(20, options()$width - 30), ...) {
title <- "Centering for "
# Use internal print helper
print_step(names(x$means), x$terms, x$trained, title, width,
case_weights = x$case_weights)
invisible(x)
}
#' @rdname tidy.recipe
#' @export
tidy.step_center <- function(x, ...) {
if (is_trained(x)) {
res <- tibble(
terms = names(x$means),
value = unname(x$means)
)
} else {
# Use internal helper
term_names <- sel2char(x$terms)
res <- tibble(
terms = term_names,
value = na_dbl
)
}
res$id <- x$id
res
}
Step 3: Create Test File
Create tests/testthat/test-center.R:
test_that("centering works correctly", {
rec <- recipe(mpg ~ ., data = mtcars) |>
step_center(disp, hp)
prepped <- prep(rec, training = mtcars)
results <- bake(prepped, mtcars)
expect_equal(mean(results$disp), 0, tolerance = 1e-7)
expect_equal(mean(results$hp), 0, tolerance = 1e-7)
# Check tidy output
trained_tidy <- tidy(prepped, 1)
expect_equal(trained_tidy$value, c(mean(mtcars$disp), mean(mtcars$hp)))
})
test_that("centering handles selectors", {
rec <- recipe(mpg ~ ., data = mtcars) |>
step_center(all_numeric_predictors())
prepped <- prep(rec, training = mtcars)
# Check correct columns selected
selected <- names(prepped$steps[[1]]$means)
expect_true("disp" %in% selected)
expect_false("mpg" %in% selected) # outcome, not predictor
})
test_that("centering works with case weights", {
mtcars_weighted <- mtcars
mtcars_weighted$wt_col <- hardhat::importance_weights(seq_len(nrow(mtcars)))
rec <- recipe(mpg ~ ., data = mtcars_weighted) |>
update_role(wt_col, new_role = "case_weights") |>
step_center(disp)
prepped <- prep(rec, training = mtcars_weighted)
# Weighted mean should differ from unweighted
expect_false(
prepped$steps[[1]]$means[["disp"]] == mean(mtcars$disp)
)
})
test_that("centering validates input", {
df <- data.frame(x = letters[1:5], y = 1:5)
rec <- recipe(~ ., data = df) |>
step_center(x)
expect_error(prep(rec, training = df))
})
test_that("centering handles NA", {
mtcars_na <- mtcars
mtcars_na$disp[1:5] <- NA
rec <- recipe(mpg ~ ., data = mtcars_na) |>
step_center(disp, na_rm = TRUE)
prepped <- prep(rec, training = mtcars_na)
baked <- bake(prepped, mtcars_na)
# NA values remain NA
expect_true(all(is.na(baked$disp[1:5])))
expect_false(any(is.na(baked$disp[6:nrow(mtcars_na)])))
})
Testing
INSTRUCTIONS FOR CLAUDE: Create tests based on features present.
Essential Tests (ALL steps) - 8-10 tests minimum
Core functionality (3-4 tests):
Basic correctness (transformation works)
Multiple columns (if applicable)
Single column (if applicable)
Variable selection (1-2 tests):
Works with recipes selectors (all_numeric(), all_predictors())
Manual column selection
NA handling (1 test):
- Verify NA behavior (preserve, remove, or error)
Infrastructure (2-3 tests):
print() method works
tidy() method works (before and after prep)
Integration in recipe pipeline
Feature-Specific Tests (Add ONLY if applicable)
If step computes statistics (+2 tests):
Case weights: frequency weights
Case weights: importance weights
If skip parameter present (+1 test):
- skip = TRUE and FALSE behavior
If keep_original_cols parameter (+1 test):
- keep_original_cols = TRUE and FALSE
If multiple custom parameters (+2 tests):
Parameter combinations
Parameter validation
If complex statistical operations (+2-3 tests):
Edge cases (zero variance, all same values)
Boundary conditions
Target Test Counts
Per-row operations: 8-12 tests
Statistical operations: 12-18 tests
Complex calculations: 18-25 tests
See Testing Patterns (Source) for comprehensive guide and internal test helpers.
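The infrastructure checklist asks for a tidy() test before and after prep, which the Step 3 test file does not yet include. A minimal sketch of such a test follows, assuming recipes and testthat are installed and that `tidy.step_center()` returns the `terms`, `value`, and `id` columns shown in Step 2. The id `"center_demo"` is an arbitrary value chosen for the example.

```r
library(recipes)
library(testthat)

test_that("tidy() works before and after prep", {
  rec <- recipe(mpg ~ ., data = mtcars) |>
    step_center(disp, hp, id = "center_demo")

  # Untrained: selections are reported as text and values are NA
  untrained <- tidy(rec, number = 1)
  expect_equal(untrained$terms, c("disp", "hp"))
  expect_true(all(is.na(untrained$value)))

  # Trained: values hold the estimated means
  trained <- tidy(prep(rec, training = mtcars), number = 1)
  expect_equal(trained$terms, c("disp", "hp"))
  expect_equal(trained$value, c(mean(mtcars$disp), mean(mtcars$hp)))
  expect_equal(unique(trained$id), "center_demo")
})
```

Inside the recipes repository the `library()` calls are unnecessary, because `devtools::test()` loads the package and testthat for you.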
Step 4: Run Tests and Check
# Document
devtools::document()
# Load
devtools::load_all()
# Test
devtools::test()
# Full check
devtools::check()
Documentation Patterns
Using @inheritParams
Recipes uses extensive parameter inheritance:
#' @inheritParams step_normalize
This inherits the documentation for every parameter that is not documented locally from step_normalize.
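A minimal roxygen2 sketch of parameter inheritance, using two hypothetical functions (`parent_mean()` and `shifted_mean()`, neither of which exists in recipes). Parameters documented locally with `@param` take precedence; the rest are copied from the named source when `devtools::document()` runs.

```r
#' Mean with optional NA removal (hypothetical parent function)
#'
#' @param x A numeric vector.
#' @param na_rm A logical: should missing values be removed?
parent_mean <- function(x, na_rm = TRUE) {
  mean(x, na.rm = na_rm)
}

#' Shifted mean (hypothetical child function)
#'
#' `x` and `na_rm` are not documented here, so roxygen2 copies their
#' descriptions from `parent_mean()`. `shift` is documented locally.
#'
#' @inheritParams parent_mean
#' @param shift A single number subtracted from the mean.
shifted_mean <- function(x, na_rm = TRUE, shift = 0) {
  parent_mean(x, na_rm = na_rm) - shift
}

shifted_mean(c(1, 2, 3, NA), shift = 1)
```

Inheritance matches on argument names, so keeping names such as `na_rm` consistent across steps is what makes this pattern work.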
Using @template
#' @template step-return
#' @template case-weights-unsupervised
Available templates are in templates or inline documentation.
Cross-Referencing Steps
#' @seealso [step_normalize()], [step_scale()]
#' @family normalization steps
Documentation Files to Create
INSTRUCTIONS FOR CLAUDE:
Create ONLY these files by default:
1. R/[step_name].R - Complete implementation
2. tests/testthat/test-[step_name].R - Test suite
3. README.md - Overview with basic usage example (200-300 lines)
Do NOT create unless user explicitly requests:
❌ IMPLEMENTATION_SUMMARY.md
❌ QUICKSTART.md
❌ example_usage.R
❌ Additional documentation files
If user wants more documentation, they will ask (e.g., “add comprehensive documentation”).
The Three-Function Pattern
Every step needs these three functions:
1. Step Constructor (User-Facing)
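#' @export
step_center <- function(recipe, ..., role = NA, trained = FALSE,
                        skip = FALSE, id = rand_id("center")) {
  # Capture the selectors as quosures and append the step to the recipe
  add_step(
    recipe,
    step_center_new(terms = enquos(...), role = role, trained = trained,
                    skip = skip, id = id)
  )
}
2. Step Initialization (Internal)
step_center_new <- function(terms, role, trained, ...) {
  # Remaining fields (skip, id, step-specific values) pass through `...`
  step(subclass = "center", terms = terms, role = role, trained = trained, ...)
}
3. S3 Methods (All Exported)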
#' @export
prep.step_center <- function(x, training, info = NULL, ...) { }
#' @export
bake.step_center <- function(object, new_data, ...) { }
#' @export
print.step_center <- function(x, ...) { }
#' @export
tidy.step_center <- function(x, ...) { }
Step Type Best Practices
Modify-in-Place Steps
# Use role = NA
step_center <- function(recipe, ..., role = NA) { } # other arguments omitted
# No keep_original_cols parameter
# Columns modified in place
Create-New-Columns Steps
# Use role = "predictor"
step_dummy <- function(recipe, ..., role = "predictor",
                       keep_original_cols = FALSE) { } # other arguments omitted
# In bake(), handle keep_original_cols
new_data <- remove_original_cols(new_data, object, original_cols)
Row-Operation Steps
# Default skip = TRUE
step_filter <- function(recipe, ..., skip = TRUE) { } # other arguments omitted
# Do not test object$skip inside bake.step_filter(): bake() on a recipe already
# skips steps with skip = TRUE for new data, while prep() still applies the step
# to the training set. An early return here would disable the filter during prep().
Creating New Internal Helpers
When to Create
Create internal helpers when:
Logic is shared by 2+ steps
Complex operation used repeatedly
Abstraction improves clarity
Example Internal Helper
#' Calculate weighted mean with case weight handling
#'
#' @param x Numeric vector
#' @param wts Numeric weights (or NULL)
#' @param na_rm Remove NA values?
#'
#' @return Numeric scalar
#' @keywords internal
#' @noRd
weighted_mean_na <- function(x, wts = NULL, na_rm = TRUE) {
if (is.null(wts)) {
mean(x, na.rm = na_rm)
} else {
if (inherits(wts, c("hardhat_importance_weights",
"hardhat_frequency_weights"))) {
wts <- as.double(wts)
}
weighted.mean(x, w = wts, na.rm = na_rm)
}
}
Use:
@keywords internal
@noRd
Don’t export
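To show how a helper like this feeds into prep(), the sketch below applies a simplified stand-in over the columns of a toy data frame with `vapply()`. The function `wmean()` and the `toy` data are assumptions for illustration; `wmean()` handles plain numeric weights only and omits the hardhat class handling.

```r
# Simplified stand-in for a weighted-mean helper: plain numeric weights only
wmean <- function(x, wts = NULL, na_rm = TRUE) {
  if (is.null(wts)) {
    return(mean(x, na.rm = na_rm))
  }
  weighted.mean(x, w = wts, na.rm = na_rm)
}

toy <- data.frame(a = c(1, 2, 3, NA), b = c(10, 20, 30, 40))
wts <- c(1, 1, 2, 1)

# One value per column, named by column, as prep() would store it
unweighted <- vapply(toy, wmean, numeric(1), wts = NULL)
weighted <- vapply(toy, wmean, numeric(1), wts = wts)

unweighted
weighted
```

Because `vapply()` keeps the column names, the result can be stored directly as the step's named vector of means and indexed by column name in bake().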
Error Messages
Use cli for consistent errors:
if (bad_input) {
cli::cli_abort(
"{.arg na_rm} must be a single logical value, not {.obj_type_friendly {na_rm}}.",
call = call
)
}
Pass the call parameter for better error context:
prep.step_center <- function(x, training, info = NULL, ...,
call = rlang::caller_env()) {
if (problem) {
cli::cli_abort("Error message", call = call)
}
}
PR Submission
Before Submitting
Run full check:
devtools::check()
Update NEWS.md:
## recipes (development version)
* Added `step_center()` for centering numeric predictors (#123).
Commit changes:
git add .
git commit -m "Add step_center()"
git push origin feature/add-step-center
Creating the PR
Go to https://github.com/tidymodels/recipes
Click “New pull request”
Select your branch
Fill in description:
What does this step do?
Why is it useful?
Reference any related issues
Review Process
Common feedback:
Add tests for all selectors
Match existing documentation style
Use internal helpers
Add more examples
Fix code style issues
See Troubleshooting (Source) for common review feedback.
Reference Documentation
Source Development
Testing Patterns (Source) - Testing with internal helpers
Best Practices (Source) - Code style and internal functions
Troubleshooting (Source) - Common issues
Step Types
Core Concepts
Next Steps
- Clone recipes repository
- Create feature branch
- Implement your step following this guide
- Test thoroughly with recipes test patterns
- Run devtools::check()
- Submit PR to tidymodels/recipes
Getting Help
Check Troubleshooting (Source)
Study existing steps in the repository
Review Best Practices (Source)
Open an issue on GitHub for questions
Tag maintainers in your PR