Troubleshooting Recipes Source Development
Context: This guide is for source development - contributing to the recipes package directly.
Key focus: Working with package internals, prep/bake workflow issues, and recipes-specific problems.
For extension development (creating new packages), see Troubleshooting (Extension).
Working with Recipes Internals
Finding Internal Functions
Problem: How do I know what internal functions are available?
Solutions:
# List all objects including internals
# (ls("package:recipes") would show exported objects only)
ls(asNamespace("recipes"), all.names = TRUE)
# Filter for specific patterns
ls(asNamespace("recipes"), all.names = TRUE, pattern = "^check_")
ls(asNamespace("recipes"), all.names = TRUE, pattern = "^get_")
# View source of internal function
recipes:::check_type
recipes:::recipes_eval_select
# Search in source files
# In terminal from recipes root:
# grep -r "check_type" R/
When Internal Functions Change
Problem: Internal function changed and broke my step.
Solution:
1. Check git history: git log --all --full-history -- R/your-file.R
2. Review recent PRs that modified the function
3. Update your step to match new behavior
4. Add regression tests
Prevention:
Document WHY you’re using an internal function
Add tests that would catch interface changes
Use stable internals when possible
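One way to write a test that would catch an interface change is to pin the argument names your step relies on. The sketch below is a minimal illustration in base R: `internal_helper()` is a hypothetical stand-in, and `missing_args()` and `check_interface()` are illustrative names, not part of recipes. In the package you would point the check at the real internal and wrap it in a testthat test.

```r
# Hypothetical stand-in for an internal function your step depends on.
internal_helper <- function(quos, data, info, ...) NULL

# Returns the required argument names that `fn` does not have.
missing_args <- function(fn, required) {
  setdiff(required, names(formals(fn)))
}

# Fails loudly if the helper's interface drifts away from what the step uses.
check_interface <- function(fn, required) {
  gone <- missing_args(fn, required)
  if (length(gone) > 0) {
    stop(
      "Internal interface changed; missing arguments: ",
      paste(gone, collapse = ", ")
    )
  }
  invisible(TRUE)
}

check_interface(internal_helper, c("quos", "data", "info"))
```

Pinning only the arguments you actually pass keeps the test from failing on unrelated additions to the helper.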
Internal Function Not Found
Problem:
Error: object 'internal_helper' not found
Causes & Solutions:
Function doesn’t exist
Check spelling
Search codebase
Function was removed/renamed
Check git history
Implement functionality yourself
Need to use different helper
- Ask maintainers for recommendation
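Before digging through git history, it can help to check whether the name exists at all and whether something similar does. The following is a rough sketch using only base R; `find_in_namespace()` is an illustrative helper, not a recipes function, and it is demonstrated on the stats package so that it runs anywhere. Swap in "recipes" when working in the package.

```r
# Look up `name` among all objects (exported or not) in a package namespace.
# Also returns approximate matches, which helps spot typos and renames.
find_in_namespace <- function(name, pkg) {
  all_objects <- ls(asNamespace(pkg), all.names = TRUE)
  list(
    exists = name %in% all_objects,
    exported = name %in% getNamespaceExports(pkg),
    similar = agrep(name, all_objects, max.distance = 0.2, value = TRUE)
  )
}

# Demonstration on a package that ships with R.
res <- find_in_namespace("weighted.mean", "stats")
res$exists
res$exported
```

If `exists` is FALSE but `similar` is non-empty, the function was probably renamed or misspelled rather than removed.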
Variable Selection Issues
Selector Doesn’t Work
Problem: all_numeric() or other selector isn’t working.
Symptom:
rec <- recipe(mpg ~ ., data = mtcars) |>
step_center(all_numeric())
prep(rec)
# Error: Can't select columns that don't exist
Cause: The selector is not resolved to column names inside the step's prep() method.
Solution: Use recipes_eval_select() in prep():
prep.step_center <- function(x, training, info = NULL, ...) {
# This resolves the selector
col_names <- recipes_eval_select(x$terms, training, info)
# Now col_names contains actual column names
# ...
}
Wrong Columns Selected
Problem: Selector picks wrong columns (e.g., includes outcome).
Diagnosis:
rec <- recipe(mpg ~ ., data = mtcars) |>
step_center(all_numeric())
prepped <- prep(rec)
# Check what was selected
# step_center stores its trained columns as the names of the means
names(prepped$steps[[1]]$means)
Solution: Use the right selector:
# Includes outcome
all_numeric()
# Excludes outcome
all_numeric_predictors()
# Only specific role
has_role("predictor")
Selector Fails on New Data
Problem: Step works on training data but fails on new data.
Symptom:
bake(prepped_rec, new_data)
# Error: Column 'x' doesn't exist
Cause: Column was in training but not in new data.
Solution: Use check_new_data() in bake():
bake.step_center <- function(object, new_data, ...) {
col_names <- names(object$means)
# This will give clear error if columns missing
check_new_data(col_names, object, new_data)
# ...
}
Manual Selection Not Working
Problem: Selecting columns by name doesn’t work.
Example:
step_center(disp, hp) # Doesn't work
Cause: The step's prep() method does not resolve tidyselect input.
Solution: The user code is correct; bare column names are valid tidyselect syntax. Resolve them in prep():
# This handles both:
# - step_center(disp, hp)
# - step_center(all_numeric())
col_names <- recipes_eval_select(x$terms, training, info)
prep/bake Workflow Issues
prep() Fails
Problem: Error during prep().
Common causes:
- Wrong column types
prep.step_center <- function(x, training, info = NULL, ...) {
col_names <- recipes_eval_select(x$terms, training, info)
# Add type validation
check_type(training[, col_names], types = c("double", "integer"))
# This will error if types wrong
}
- Missing values cause issues
# If your calculation can't handle NA
if (any(is.na(training[, col_names])) && !x$na_rm) {
cli::cli_warn("NA values found but {.arg na_rm = FALSE}.")
}
- Insufficient data
if (nrow(training) == 0) {
cli::cli_abort("Training data has 0 rows.")
}
bake() Fails
Problem: Error during bake().
Common causes:
- Required columns missing
bake.step_center <- function(object, new_data, ...) {
col_names <- names(object$means)
# This catches missing columns
check_new_data(col_names, object, new_data)
# ...
}
- Wrong data types in new data
# New data has different types than training
# Solution: Validate types in bake() or document assumptions
- Step wasn’t trained
if (!object$trained) {
cli::cli_abort("Step must be trained before baking.")
}
Parameters Not Stored
Problem: Parameters calculated in prep() aren’t available in bake().
Example:
prep.step_center <- function(x, training, info = NULL, ...) {
col_names <- recipes_eval_select(x$terms, training, info)
# Calculate means
means <- colMeans(training[, col_names])
# PROBLEM: Where do means go?
}
Solution: Return them in the updated step:
prep.step_center <- function(x, training, info = NULL, ...) {
col_names <- recipes_eval_select(x$terms, training, info)
means <- colMeans(training[, col_names])
# Return updated step with parameters
step_center_new(
terms = x$terms,
role = x$role,
trained = TRUE,
means = means, # Store parameters here
na_rm = x$na_rm,
skip = x$skip,
id = x$id,
case_weights = NULL
)
}
Step Applied Twice
Problem: Transformation applied multiple times.
Example:
rec <- recipe(mpg ~ ., data = mtcars) |>
step_center(disp)
prepped <- prep(rec)
baked1 <- bake(prepped, mtcars)
baked2 <- bake(prepped, baked1) # Applied again!
Cause: bake() applies the trained transformation to whatever data it receives; it does not track whether the data was already baked.
Solution: Document that bake() should only be used on original (unprocessed) data, or make the transformation truly idempotent.
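The effect is easy to reproduce without recipes. The sketch below uses plain vectors as a stand-in for a trained centering step (`center()` and `trained_mean` are illustrative names): the mean is learned once, and applying the same shift to already-centered data moves it a second time.

```r
# Stand-in for a trained centering step: the mean is learned once from training data.
x <- c(2, 4, 6, 8)
trained_mean <- mean(x) # 5

# Stand-in for bake(): subtract the stored mean from whatever it is given.
center <- function(values, center_by) values - center_by

once <- center(x, trained_mean) # centered as intended
twice <- center(once, trained_mean) # shifted a second time

mean(once) # 0
mean(twice) # -5
```

A transformation like this cannot be made idempotent without extra bookkeeping, which is why documenting the expected input is usually the simpler fix.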
Case Weight Issues
Weights Not Working
Problem: Case weights seem to be ignored.
Diagnosis:
# Test if weights matter
mtcars_weighted <- mtcars
# Unsupervised steps such as step_center() only use frequency weights
mtcars_weighted$case_wts <- hardhat::frequency_weights(seq_len(nrow(mtcars)))
# recipe() assigns the case_weights role to hardhat weight columns automatically
rec <- recipe(mpg ~ ., data = mtcars_weighted) |>
step_center(disp)
prepped_weighted <- prep(rec, training = mtcars_weighted)
prepped_unweighted <- prep(
recipe(mpg ~ ., data = mtcars) |> step_center(disp),
training = mtcars
)
# These should differ
prepped_weighted$steps[[1]]$means
prepped_unweighted$steps[[1]]$means
Solutions:
- Extract weights in prep()
wts <- get_case_weights(info, training)
were_weights_used <- are_weights_used(wts, unsupervised = TRUE)
if (isFALSE(were_weights_used)) {
wts <- NULL
}
- Use weights in calculations
if (is.null(wts)) {
means <- colMeans(training[, col_names], na.rm = x$na_rm)
} else {
# Convert hardhat weights and use
wts <- as.double(wts)
means <- vapply(
training[, col_names],
function(col) weighted.mean(col, w = wts, na.rm = x$na_rm),
numeric(1)
)
}
Weights Not Recognized
Problem:
Error: No case weights found
Cause: Weight column is not a hardhat case weights vector, so it never receives the case_weights role.
Solution:
# User must create the column with a hardhat constructor;
# recipe() then assigns the case_weights role automatically
df$wt <- hardhat::importance_weights(df$weight_values)
rec <- recipe(y ~ ., data = df) |>
step_your_step(...)
Role and Column Management
role = NA vs role = “predictor”
Problem: Confusion about when to use role = NA.
Solution:
role = NA: Modify-in-place steps (preserve existing role)
step_center <- function(recipe, ..., role = NA, ...)
role = “predictor”: Create-new-columns steps (assign role to new cols)
step_dummy <- function(recipe, ..., role = "predictor", ...)
Column Role Changed Unexpectedly
Problem: Column role changed after step.
Check:
rec <- recipe(mpg ~ ., data = mtcars) |>
step_your_step(disp)
prepped <- prep(rec)
# Check role
prepped$var_info |>
dplyr::filter(variable == "disp") |>
dplyr::pull(role)
Solution:
Use role = NA for modify-in-place steps
Document role behavior clearly
Original Columns Not Removed
Problem: Original columns remain after create-new-columns step.
Cause: Not using remove_original_cols().
Solution:
bake.step_dummy <- function(object, new_data, ...) {
# Create new dummy columns
# ...
# Remove originals (unless keep_original_cols = TRUE)
new_data <- remove_original_cols(
new_data,
object,
names(object$levels) # Original column names
)
new_data
}
Skip Parameter Issues
Step Applied to Test Data When It Shouldn’t Be
Problem: Row-operation step applied to test data.
Example:
rec <- recipe(~ ., data = mtcars) |>
step_filter(mpg > 20) # Should only filter training
prepped <- prep(rec, training = mtcars)
baked <- bake(prepped, new_data = test_data)
# Test data was filtered! (Probably wrong)
Solution: Use skip = TRUE for row-operation steps:
step_filter <- function(recipe, ..., skip = TRUE, ...) {
# ...
}
# In bake()
bake.step_filter <- function(object, new_data, ...) {
if (object$skip) {
return(new_data)
}
# Apply filter only if skip = FALSE
# ...
}
Skip Parameter Ignored
Problem: skip = TRUE but step still applied in bake().
Cause: Not checking skip in bake().
Solution:
bake.step_your_step <- function(object, new_data, ...) {
# Always check skip first
if (object$skip) {
return(new_data)
}
# Apply transformation
# ...
}
Integration Issues
Step Doesn’t Work with tune
Problem: Step works alone but fails in tune_grid().
Diagnosis:
library(recipes)
library(parsnip) # linear_reg()
library(rsample) # vfold_cv()
library(tune)
library(workflows)
# Test in workflow
wf <- workflow() |>
add_recipe(
recipe(mpg ~ ., data = mtcars) |>
step_your_step(...)
) |>
add_model(linear_reg())
# Test with resamples
result <- fit_resamples(wf, resamples = vfold_cv(mtcars))
Common Issues:
Step fails on resampled data
Some folds might have different characteristics
Add validation for edge cases
Step modifies outcome accidentally
Check that selectors don’t include outcome
Use all_predictors(), not all_numeric()
Step is too slow
Optimize calculations
Avoid unnecessary copies
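Validation for edge cases in resampled folds might look like the sketch below. `validate_training()` is a hypothetical helper, and the specific checks (empty data, all-missing columns, constant columns) are examples rather than a list recipes prescribes. It uses base `stop()` and `warning()` to stay self-contained; inside the package, `cli::cli_abort()` and `cli::cli_warn()` would match the style used elsewhere in this guide.

```r
# Hypothetical validation helper to call from a prep() method.
# `training` is a data frame, `col_names` the resolved column names.
validate_training <- function(training, col_names) {
  if (nrow(training) == 0) {
    stop("Training data has 0 rows.")
  }
  cols <- training[, col_names, drop = FALSE]

  # A fold can end up with a column that is entirely missing.
  all_missing <- names(cols)[
    vapply(cols, function(col) all(is.na(col)), logical(1))
  ]
  if (length(all_missing) > 0) {
    stop("Columns with only missing values: ", paste(all_missing, collapse = ", "))
  }

  # A fold can also contain a single distinct value in a column.
  constant <- names(cols)[
    vapply(cols, function(col) length(unique(col[!is.na(col)])) == 1, logical(1))
  ]
  if (length(constant) > 0) {
    warning("Constant columns in this fold: ", paste(constant, collapse = ", "))
  }
  invisible(col_names)
}

validate_training(mtcars, c("disp", "hp"))
```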
Recipe Doesn’t Work After Adding Step
Problem: Recipe fails after adding your step.
Diagnosis:
# Test incrementally
rec1 <- recipe(mpg ~ ., data = mtcars) |>
step_normalize(all_numeric_predictors())
prep(rec1) # Works
rec2 <- rec1 |>
step_your_step(disp)
prep(rec2) # Fails
Common causes:
Your step expects data in certain format
Your step modifies data unexpectedly
Your step doesn’t preserve required columns
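To narrow down which of these causes applies, it can help to compare the data going into your step with the data coming out. The sketch below is a base R illustration; `compare_columns()` is a hypothetical helper, and the `before` and `after` data frames are made up to stand in for the baked output of the recipe without and with your step.

```r
# Report columns that were dropped, added, or changed class between two data frames.
compare_columns <- function(before, after) {
  shared <- intersect(names(before), names(after))
  class_of <- function(df, cols) {
    vapply(df[cols], function(col) class(col)[1], character(1))
  }
  list(
    dropped = setdiff(names(before), names(after)),
    added = setdiff(names(after), names(before)),
    type_changed = shared[class_of(before, shared) != class_of(after, shared)]
  )
}

# Made-up stand-ins for the data before and after the new step.
before <- mtcars[, c("mpg", "disp", "hp")]
after <- data.frame(
  mpg = before$mpg,
  disp = as.character(before$disp),
  disp_sq = before$disp^2
)

changes <- compare_columns(before, after)
changes
```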
Grouped Data Issues
Problem: Step fails with grouped data frames.
Solution: Make sure step preserves grouping:
bake.step_center <- function(object, new_data, ...) {
# Save groups
groups <- dplyr::group_vars(new_data)
# Do transformation
# ...
# Restore groups
if (length(groups) > 0) {
new_data <- dplyr::group_by(new_data, !!!rlang::syms(groups))
}
new_data
}
Package Check Issues
Check Fails: Example Errors
Problem:
checking examples ... ERROR
Error in step_your_step(...): object not found
Causes:
Example uses unexported function
- Make sure all functions in examples are exported
Example uses undeclared dependency
Add package to Suggests: usethis::use_package("pkg", "Suggests")
Use @examplesIf if package optional
Example too complex
Simplify
Use \donttest{} for slow examples
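The @examplesIf guard mentioned above goes in the roxygen block of the documented function. A minimal sketch, where `optional_summary()` and the package name "pkg" are placeholders rather than anything in recipes:

```r
#' Example helper with an optional dependency in its examples
#'
#' `optional_summary()` and the package name "pkg" are placeholders.
#'
#' @param x A numeric vector.
#' @return The mean of `x`.
#' @examplesIf rlang::is_installed("pkg")
#' # Only runs when the suggested package "pkg" is installed
#' optional_summary(c(1, 2, 3))
#' @export
optional_summary <- function(x) {
  mean(x)
}
```

With this guard, R CMD check skips the example on machines where the suggested package is missing instead of failing.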
Check Fails: Tests Too Slow
Problem: Test suite exceeds time limit.
Solutions:
# Skip slow tests on CRAN
test_that("slow operation works", {
skip_on_cran()
# Slow test here
})
# Use smaller datasets
test_data <- mtcars[1:10, ] # Instead of full mtcars
# Reduce iterations
for (i in 1:10) { # Instead of 1:1000
# ...
}
Check Fails: Undefined Global Variables
Problem:
checking R code for possible problems ... NOTE
step_center: no visible binding for global variable 'disp'
Cause: Using bare column names without proper NSE handling.
Solution: Use .data pronoun:
# In dplyr operations
data |> dplyr::mutate(new = .data$disp * 2)
# Or declare global variables (less preferred)
utils::globalVariables(c("disp", "hp"))
Performance Issues
Step is Slow
Problem: Step takes too long on large datasets.
Diagnosis:
# Profile
profvis::profvis({
prep(rec, training = large_data)
})
Common bottlenecks:
Non-vectorized operations
# Slow
for (i in seq_len(nrow(data))) {
data[i, col] <- transform(data[i, col])
}
# Fast
data[[col]] <- transform(data[[col]])
Repeated column selections
# Slow - selects each time
for (col in cols) {
new_data <- new_data |> dplyr::mutate(...)
}
# Fast - vectorized
for (col in cols) {
new_data[[col]] <- transform(new_data[[col]])
}
Inefficient aggregations
# Use built-in functions
colMeans(data)
# Avoid manual
vapply(data, mean, numeric(1))
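When replacing a row-wise loop with a vectorized call, it is worth confirming that both versions give the same result before comparing timings. A small sketch in base R, using `log1p()` as an arbitrary element-wise transformation and made-up data:

```r
set.seed(1)
data <- data.frame(a = runif(2000), b = runif(2000))

# Row-by-row version: one data frame assignment per element.
slow <- data
loop_time <- system.time({
  for (i in seq_len(nrow(slow))) {
    slow[i, "a"] <- log1p(slow[i, "a"])
  }
})

# Vectorized version: one assignment for the whole column.
fast <- data
vec_time <- system.time({
  fast[["a"]] <- log1p(fast[["a"]])
})

# Same result either way; only the elapsed time should differ.
all.equal(slow$a, fast$a)
loop_time[["elapsed"]]
vec_time[["elapsed"]]
```

Timings vary by machine, so treat the elapsed values as a relative comparison only.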
Git and PR Issues
Merge Conflicts in NAMESPACE
Problem: NAMESPACE has conflicts.
Solution:
# Don't edit NAMESPACE manually
# Instead, resolve R code conflicts and regenerate
git checkout your-branch
# Fix R code conflicts
Rscript -e "devtools::document()" # Regenerates NAMESPACE
git add NAMESPACE
git commit
PR Build Fails
Common failures:
R CMD check errors
Run locally: devtools::check()
Fix all errors, warnings, notes
Test failures
Run: devtools::test()
Check that tests pass locally first
Code style issues
Run: styler::style_pkg()
Run: lintr::lint_package()
Common Review Feedback
“Add tests for selectors”
Requested tests:
test_that("step works with all_numeric()", { ... })
test_that("step works with all_numeric_predictors()", { ... })
test_that("step works with manual selection", { ... })
test_that("step works with has_role()", { ... })
“Add case weight tests”
test_that("step respects case weights", {
# Test with and without weights
# Results should differ
})
“Use internal helper”
Reviewer suggests existing internal function:
Check if it exists
Review its implementation
Refactor to use it
“Match style of existing steps”
Look at similar steps
Match their structure
Use same helper functions
Getting Help
Check Existing Issues
Search recipes GitHub: https://github.com/tidymodels/recipes/issues
Study Existing Steps
Look at similar steps:
Normalization: R/center.R, R/scale.R
Encoding: R/dummy.R, R/novel.R
Filtering: R/filter.R
Ask Tidymodels Team
Open GitHub issue
Ask in tidymodels forums
Tag maintainers in PR
Next Steps
Review Testing Patterns (Source) for testing guidance
Check Best Practices (Source) for coding standards
See Extension Troubleshooting for general R package issues