Recipe Helper Functions Reference

The recipes package provides helper functions to standardize common operations in recipe steps. Use these instead of implementing your own versions.

Overview

Function	Purpose	Typical Usage
`recipes_eval_select()`	Convert quosures to column names	`prep()` method
`check_type()`	Validate column types	`prep()` method
`check_new_data()`	Verify columns exist in new data	`bake()` method
`check_name()`	Prevent column name conflicts	When creating new columns
`get_case_weights()`	Extract case weights from info	`prep()` method
`are_weights_used()`	Check if weights should be used	`prep()` method
`rand_id()`	Generate unique step IDs	Step constructor
`print_step()`	Standard step printing	`print()` method
`remove_original_cols()`	Handle keep_original_cols	`bake()` method
`sel2char()`	Convert selections to strings	`tidy()` method
`is_trained()`	Check training status	`tidy()` method
`add_step()`	Add step to recipe	Step constructor
`yardstick_any_missing()`	Check for NA values	Both `prep()` and `bake()`
`yardstick_remove_missing()`	Remove rows with NAs	Both `prep()` and `bake()`

Variable Selection and Resolution

recipes_eval_select()

Purpose: Resolves tidyselect expressions to actual column names.

When to use: In prep() to convert user’s variable selections (like all_numeric()) to actual column names.

Signature:

recipes_eval_select(quos, data, info)

Arguments: - quos: Quosures from rlang::enquos(...) - data: Training data frame - info: Recipe info object (from prep() parameter)

Example:

prep.step_yourname <- function(x, training, info = NULL, ...) {
  # Convert user selection to actual column names
  col_names <- recipes::recipes_eval_select(x$terms, training, info)
  # col_names is now a character vector like c("mpg", "disp", "hp")

  # ... rest of prep
}

Returns: Character vector of column names

sel2char()

Purpose: Converts tidyselect expressions to human-readable strings.

When to use: In tidy() method for untrained steps to show what will be selected.

Example:

tidy.step_yourname <- function(x, ...) {
  if (recipes::is_trained(x)) {
    # Use actual column names from trained step
    res <- tibble::tibble(terms = names(x$params))
  } else {
    # Convert selection to readable strings
    term_names <- recipes::sel2char(x$terms)
    res <- tibble::tibble(terms = term_names)
  }
  res$id <- x$id
  res
}

Returns: Character vector of selection names (e.g., "all_numeric()", "disp")

Validation Functions

check_type()

Purpose: Validates that columns are of expected types.

When to use: In prep() after resolving column names, before computing parameters.

Signature:

check_type(dat, types = NULL)

Arguments: - dat: Data frame subset with columns to check - types: Character vector of allowed types

Example:

prep.step_yourname <- function(x, training, info = NULL, ...) {
  col_names <- recipes::recipes_eval_select(x$terms, training, info)

  # Validate columns are numeric or integer
  recipes::check_type(
    training[, col_names],
    types = c("double", "integer")
  )

  # If wrong type, check_type() throws an error
  # ... continue with prep
}

Common type values: - "double": Numeric values - "integer": Integer values - "factor": Factor/categorical - "logical": Boolean - "character": Text

Behavior: Throws error if any column doesn’t match allowed types.

check_new_data()

Purpose: Validates that required columns exist in new data.

When to use: At the start of bake() to ensure columns needed by the step are present.

Signature:

check_new_data(col_names, object, new_data)

Arguments: - col_names: Character vector of required column names - object: The trained step object - new_data: New data frame to validate

Example:

bake.step_yourname <- function(object, new_data, ...) {
  col_names <- object$columns  # or names(object$params)

  # Check columns exist in new_data
  recipes::check_new_data(col_names, object, new_data)

  # If missing columns, check_new_data() throws informative error
  # ... continue with bake
}

Behavior: Throws error with helpful message if columns are missing.

check_name()

Purpose: Checks if proposed column name already exists, prevents conflicts.

When to use: In bake() for create-new-columns steps before adding new columns.

Signature:

check_name(new_names, data, object, newname)

Arguments: - new_names: Character vector of proposed new column names - data: Data frame where columns will be added - object: Step object - newname: Alternative name to suggest if conflict exists

Example:

bake.step_yourname <- function(object, new_data, ...) {
  # Generate new column names
  new_col_names <- paste0(object$columns, "_transformed")

  # Check for conflicts
  new_col_names <- recipes::check_name(
    new_col_names,
    new_data,
    object,
    newname = "transformed"
  )

  # ... create new columns with validated names
}

Returns: Modified names if conflicts exist, original names otherwise.

Case Weights

get_case_weights()

Purpose: Extracts case weight column from recipe info.

When to use: In prep() to get case weights for weighted computations.

Signature:

get_case_weights(info, data)

Arguments: - info: Recipe info object - data: Training data

Example:

prep.step_yourname <- function(x, training, info = NULL, ...) {
  col_names <- recipes::recipes_eval_select(x$terms, training, info)

  # Get case weights if present
  wts <- recipes::get_case_weights(info, training)
  were_weights_used <- recipes::are_weights_used(wts, unsupervised = TRUE)

  if (isFALSE(were_weights_used)) {
    wts <- NULL
  }

  # Use wts in weighted computations if not NULL
  # ... rest of prep
}

Returns: Weight vector or NULL if no weights specified.

are_weights_used()

Purpose: Determines if case weights should be used for this operation.

When to use: After get_case_weights() to decide whether to use them.

Signature:

are_weights_used(wts, unsupervised = FALSE)

Arguments: - wts: Weights from get_case_weights() - unsupervised: Whether this is an unsupervised operation (TRUE for most recipe steps)

Example:

wts <- recipes::get_case_weights(info, training)
were_weights_used <- recipes::are_weights_used(wts, unsupervised = TRUE)

if (isFALSE(were_weights_used)) {
  wts <- NULL
}

# Store the fact that weights were used
step_yourname_new(
  # ... other params,
  case_weights = were_weights_used
)

Returns: Logical indicating whether weights should be used.

Step Construction

add_step()

Purpose: Adds a step to a recipe.

When to use: In your step constructor function.

Signature:

add_step(recipe, step_object)

Example:

step_yourname <- function(recipe, ...) {
  recipes::add_step(
    recipe,
    step_yourname_new(
      terms = rlang::enquos(...),
      # ... other parameters
    )
  )
}

Returns: Updated recipe with step added.

rand_id()

Purpose: Generates unique identifier for a step.

When to use: As default value for id parameter in step constructor.

Signature:

rand_id(prefix)

Example:

step_yourname <- function(
  recipe,
  ...,
  id = recipes::rand_id("yourname")  # Default unique ID
) {
  # ... function body
}

Returns: Character string like "yourname_a7b2c".

Column Operations

remove_original_cols()

Purpose: Removes original columns based on keep_original_cols parameter.

When to use: In bake() for create-new-columns steps, after adding new columns.

Signature:

remove_original_cols(data, object, col_names)

Arguments: - data: Data frame with both original and new columns - object: Trained step object (must have keep_original_cols field) - col_names: Character vector of original column names

Example:

bake.step_yourname <- function(object, new_data, ...) {
  col_names <- object$columns

  # Create new columns
  new_cols <- create_new_columns(new_data[, col_names], object$params)
  new_data <- vctrs::vec_cbind(new_data, new_cols)

  # Remove originals if keep_original_cols = FALSE
  new_data <- recipes::remove_original_cols(new_data, object, col_names)

  new_data
}

Returns: Data frame with original columns removed (if keep_original_cols = FALSE).

Important: Only use for steps with keep_original_cols parameter. The helper correctly handles role preservation and column ordering.

Status and Printing

is_trained()

Purpose: Checks if a step has been trained.

When to use: In tidy() method to decide what to return.

Example:

tidy.step_yourname <- function(x, ...) {
  if (recipes::is_trained(x)) {
    # Return actual learned values
    res <- tibble::tibble(
      terms = names(x$params),
      value = unname(x$params)
    )
  } else {
    # Return placeholders
    term_names <- recipes::sel2char(x$terms)
    res <- tibble::tibble(
      terms = term_names,
      value = rlang::na_dbl
    )
  }
  res$id <- x$id
  res
}

Returns: Logical, TRUE if step has been prepped.

print_step()

Purpose: Provides standardized printing for recipe steps.

When to use: In print() method.

Signature:

print_step(col_names, terms, trained, title, width, case_weights = NULL)

Arguments: - col_names: Resolved column names (if trained) or NULL - terms: Original quosures from step - trained: Whether step is trained - title: Description of operation - width: Maximum width for printing - case_weights: Whether case weights were used

Example:

print.step_yourname <- function(x, width = max(20, options()$width - 30), ...) {
  title <- "Centering for "
  recipes::print_step(
    x$columns,      # NULL if untrained
    x$terms,        # Original selection
    x$trained,      # TRUE/FALSE
    title,
    width,
    case_weights = x$case_weights
  )
  invisible(x)
}

Output:

Centering for disp, hp, ... (3 columns) [trained]

NA Handling

yardstick_any_missing()

Purpose: Checks if any values are NA across multiple vectors.

When to use: When na_rm = FALSE, to decide whether to return NA.

Example:

if (!na_rm) {
  if (yardstick::yardstick_any_missing(truth, estimate, case_weights)) {
    return(NA_real_)
  }
}

Returns: Logical, TRUE if any NA values exist.

yardstick_remove_missing()

Purpose: Removes rows with NA values from multiple vectors.

When to use: When na_rm = TRUE, to filter out missing values.

Example:

if (na_rm) {
  result <- yardstick::yardstick_remove_missing(truth, estimate, case_weights)
  truth <- result$truth
  estimate <- result$estimate
  case_weights <- result$case_weights
}

Returns: List with filtered vectors (maintains alignment).

Best Practices

Always use helpers: Don’t reimplement functionality that helpers provide
Check early: Validate in prep(), trust in bake()
Consistent patterns: Use helpers the same way across all steps
Error messages: Helpers provide consistent, user-friendly errors

Common Patterns

Complete prep() pattern

prep.step_yourname <- function(x, training, info = NULL, ...) {
  # 1. Resolve selections
  col_names <- recipes::recipes_eval_select(x$terms, training, info)

  # 2. Validate types
  recipes::check_type(training[, col_names], types = c("double", "integer"))

  # 3. Get case weights
  wts <- recipes::get_case_weights(info, training)
  were_weights_used <- recipes::are_weights_used(wts, unsupervised = TRUE)
  if (isFALSE(were_weights_used)) {
    wts <- NULL
  }

  # 4. Compute parameters
  params <- compute_params(training[, col_names], wts, x$your_param)

  # 5. Return trained step
  step_yourname_new(
    terms = x$terms,
    trained = TRUE,
    columns = col_names,
    params = params,
    case_weights = were_weights_used,
    # ... other fields
  )
}

Complete bake() pattern

bake.step_yourname <- function(object, new_data, ...) {
  # 1. Get column names
  col_names <- object$columns

  # 2. Validate columns exist
  recipes::check_new_data(col_names, object, new_data)

  # 3. Apply transformation
  for (col in col_names) {
    new_data[[col]] <- apply_transform(new_data[[col]], object$params[[col]])
  }

  # 4. Return modified data
  new_data
}

Internal Helpers (Source Development Only)

When contributing to recipes itself, all the helpers listed above can be used without the recipes:: prefix. They’re internal functions available directly in the package environment.

Additional Internal Helpers

When developing recipes source code, you may also encounter:

Variable selection internals: Functions that support recipes_eval_select()
Type checking internals: Extended validation beyond check_type()
Column name utilities: Functions for managing column names and conflicts
Role management: Functions for assigning and updating column roles

Usage in Source Development

# Extension development (requires recipes:: prefix)
col_names <- recipes::recipes_eval_select(x$terms, training, info)
recipes::check_type(training[, col_names], types = c("double", "integer"))

# Source development (no prefix needed)
col_names <- recipes_eval_select(x$terms, training, info)
check_type(training[, col_names], types = c("double", "integer"))

When to Create New Internal Helpers

If contributing to recipes and you find yourself duplicating logic across multiple steps:

Check existing internals first: Browse R/aaa-*.R and R/utils-*.R files
Consider generalization: Will this helper be useful for other steps?
Document thoroughly: Use @keywords internal and @noRd
Don’t export: Internal helpers should not be in NAMESPACE

See the Source Development Guide for complete patterns and examples.

Next Steps

Understand step architecture: step-architecture.md
Implement modify-in-place steps: modify-in-place-steps.md
Implement create-new-columns steps: create-new-columns-steps.md
Implement row-operation steps: row-operation-steps.md
Add optional methods: optional-methods.md
Review best practices: package-extension-requirements.md#best-practices