Hyperparameter Tuning

Overview

Hyperparameters are model settings that cannot be learned from data. Tuning searches for values that optimize out-of-sample performance.

Method	Best For	Pros	Cons
Grid search	Few parameters, small grids	Exhaustive, reproducible	Exponential scaling
Racing	Large grids	Efficient, discards poor configs early	May miss borderline good configs
Bayesian optimization	Expensive models, many parameters	Smart exploration	Overhead for simple problems

For tabular data, grid search and racing are well optimized and should be the default choices.
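The "exponential scaling" entry in the table can be made concrete with a little arithmetic. The sketch below uses only base R and assumes, purely for illustration, a regular grid with 5 levels per parameter; the variable names are hypothetical.

```r
# Size of a full factorial grid: levels ^ number_of_parameters
levels_per_param <- 5

grid_sizes <- sapply(1:4, function(n_params) {
  # Build a full factorial grid with n_params columns of 5 levels each
  param_levels <- rep(list(seq_len(levels_per_param)), n_params)
  nrow(expand.grid(param_levels))
})

# One row per number of tuned parameters
data.frame(n_params = 1:4, n_configs = grid_sizes)
```

Each added parameter multiplies the number of configurations by the number of levels, which is why fixed-size space-filling grids and racing become attractive as the parameter count grows.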

Marking Parameters for Tuning

In tidymodels, use tune() as a placeholder for parameters to optimize.

tidymodels

library(tidymodels)

# Model with tunable parameters
model_spec <- rand_forest(
 mtry = tune(),
 min_n = tune(),
 trees = 500
) |>
 set_engine("ranger") |>
 set_mode("classification")

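# Recipe with tunable parameters
recipe_spec <- recipe(outcome ~ ., data = train_data) |>
 step_pca(all_numeric_predictors(), num_comp = tune())

# Combine both into the workflow (wf) used by the tuning calls below
wf <- workflow(recipe_spec, model_spec)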
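To check which arguments have actually been marked, the parameter set can be extracted from a specification. This is a minimal sketch; the object names rf_spec and marked are hypothetical and not part of the original example.

```r
library(tidymodels)

# Two arguments are marked with tune(); trees is fixed at 500
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 500) |>
  set_engine("ranger") |>
  set_mode("classification")

# Collect every tune() placeholder into a dials parameter set
marked <- extract_parameter_set_dials(rf_spec)

# The ids of the parameters that will be tuned
marked$id
```

Only the arguments set to tune() appear in the result; trees does not, because it was given a fixed value.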

Grid Search

Evaluates every candidate in a predefined grid of parameter values.

When to use: Almost all cases.

Considerations:
- Space-filling designs cover parameter space more efficiently
- Typical grid sizes: 10-50 configurations

tidymodels

Space-filling grid (recommended):

We can make a specific grid from a parameter set object (such as params, created under Parameter Ranges below):

grid <- grid_space_filling(params, size = 25)

However, it is best to simply provide an integer to the grid argument of the tuning functions (see below).
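As a rough illustration of why space-filling designs cover the space more efficiently, the sketch below compares a regular grid and a space-filling grid of the same nominal size. The choice of penalty() and mixture() as the two parameters is an assumption made only for this example.

```r
library(tidymodels)

# Regular grid: 5 levels per parameter -> 5 * 5 = 25 combinations
reg_grid <- grid_regular(penalty(), mixture(), levels = 5)

# Space-filling grid: 25 points spread across the same two-dimensional space
sf_grid <- grid_space_filling(penalty(), mixture(), size = 25)

# Total number of candidates in the regular grid
nrow(reg_grid)

# Distinct values explored per parameter: the regular grid revisits
# the same 5 penalty values, while the space-filling design varies them
n_distinct(reg_grid$penalty)
n_distinct(sf_grid$penalty)
```

For the same budget of model fits, the space-filling design typically tries more distinct values of each parameter.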

Running the grid search:

resamples <- vfold_cv(train_data, v = 10, strata = outcome)

tune_results <- tune_grid(
 wf,
 resamples = resamples,
 grid = 20,
 metrics = metric_set(roc_auc, accuracy)
)
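For a version that runs end to end, here is a small self-contained sketch of the same pattern. The data set (mtcars), the decision tree model with the rpart engine, and all object names are assumptions chosen so the example needs no external data; they are not part of the original workflow.

```r
library(tidymodels)

set.seed(1)

# Small resampling scheme on a built-in data set
folds <- vfold_cv(mtcars, v = 5)

# A model with two parameters marked for tuning
tree_spec <- decision_tree(cost_complexity = tune(), min_n = tune()) |>
  set_engine("rpart") |>
  set_mode("regression")

tree_wf <- workflow(mpg ~ ., tree_spec)

# An integer grid asks tune_grid() to build a space-filling grid itself
res <- tune_grid(
  tree_wf,
  resamples = folds,
  grid = 5,
  metrics = metric_set(rmse)
)

# Top candidates by resampled RMSE
show_best(res, metric = "rmse", n = 3)
```

Passing an integer to grid keeps the call short and lets the tuning function choose ranges and finalize data-dependent parameters where it can.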

Racing Methods

Starts with all configurations, then eliminates poor performers after initial resamples. Efficient for large and/or wide grids.

When to use: Wide grids; want efficiency without giving up grid search’s coverage.

Considerations:
- Requires enough resamples for statistical comparison (≥5 resamples)
- Can’t use with validation sets (only one assessment, so no racing is possible)
- ANOVA racing (tune_race_anova()) is most common; win/loss racing (tune_race_win_loss()) is also available

tidymodels

library(finetune)

# ANOVA racing - eliminates configs significantly worse than best
# Set the seed first (any fixed value); racing evaluates resamples in random order
set.seed(123)

tune_results <- tune_race_anova(
 wf,
 resamples = resamples,
 grid = 20,
 metrics = metric_set(roc_auc)
)
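How aggressively configurations are eliminated can be adjusted through a control object. The following is a sketch only; the specific values are illustrative, not recommendations from this guide.

```r
library(finetune)

# burn_in: resamples evaluated for every configuration before any elimination
# alpha: significance level used when deciding whether to drop a configuration
# verbose_elim: log how many configurations remain after each resample
race_ctrl <- control_race(
  burn_in = 3,
  alpha = 0.05,
  verbose_elim = TRUE
)

# The control object would then be passed to the racing call, e.g.
# tune_race_anova(wf, resamples = resamples, grid = 20, control = race_ctrl)
```

With 10-fold cross-validation and burn_in = 3, all configurations are fit on the first three folds and only the survivors are fit on the remaining seven.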

Initial Model Suggestions

Unless the user is interested in a specific model, we suggest starting with two disparate models:

- A regularized model that is linear in its parameters, such as linear regression, logistic regression, or multinomial regression.
- A boosted tree that uses early stopping.

Generally, do not suggest models that are similar to ones already considered. For example, do not suggest random forest after a different tree ensemble has been evaluated.

Linear Models

Propose a simple model early in the process. Use a recipe that includes any existing feature engineering and also standardizes the predictors after a zero-variance filter.

tidymodels

For example, for a regression data set where we have observed that predictor x1 has a nonlinear relationship with the outcome:

library(tidymodels)

# Model with tunable parameters
glmnet_spec <- linear_reg(penalty = tune(), mixture = tune()) |>
 set_engine("glmnet")

# Recipe 
spline_rec <- recipe(outcome ~ ., data = train_data) |>
 step_spline_natural(x1, deg_free = tune()) |> 
 step_dummy(all_factor_predictors()) |> 
 step_zv(all_predictors()) |> 
 step_normalize(all_predictors())
 
glmnet_wflow <- workflow(spline_rec, glmnet_spec)

Boosting

Also propose a boosting model early in the process. Make sure that the number of trees is set to a specific value and the argument stop_iter = 5 is used.

For example:

library(tidymodels)

# Model with a fixed number of trees and early stopping
bst_spec <- boost_tree(
  trees = 1000,
  learn_rate = tune(),
  mtry = tune(),
  min_n = tune(),
  stop_iter = 5
) |> 
 set_engine("xgboost") |> 
 set_mode("regression")

# Recipe 
indicator_rec <- recipe(outcome ~ ., data = train_data) |>
 step_dummy(all_factor_predictors(), one_hot = TRUE)
 
bst_wflow <- workflow(indicator_rec, bst_spec)


Working with Tuning Results

Viewing results

# Summary of all configurations
collect_metrics(tune_results)

# Best configuration by metric
show_best(tune_results, metric = "roc_auc", n = 5)

# Single best
select_best(tune_results, metric = "roc_auc")

Visualizing tuning

# Performance vs parameters
autoplot(tune_results)

# Custom visualization
tune_results |>
 collect_metrics() |>
 filter(.metric == "roc_auc") |>
 ggplot(aes(x = mtry, y = mean, color = factor(min_n))) +
 geom_point() +
 geom_line()

Finalizing the model

# Select best parameters
best_params <- select_best(tune_results, metric = "roc_auc")

# Update workflow with best parameters
final_wf <- finalize_workflow(wf, best_params)

# Fit to full training set
final_fit <- fit(final_wf, data = train_data)

If rsample's initial_split() was used to make the initial split (stored here as init_split), there is a simpler API:

# Select best parameters
best_params <- select_best(tune_results, metric = "roc_auc")

# Update workflow with best parameters
final_wf <- finalize_workflow(wf, best_params)

# Fit on the full training set and evaluate on the test set
final_fit <- last_fit(final_wf, split = init_split)
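The object returned by last_fit() carries both the test-set results and the fitted workflow. A minimal sketch of retrieving them, assuming mtcars and a plain linear regression purely so the example is self-contained:

```r
library(tidymodels)

set.seed(1)

# Hypothetical split and workflow for illustration
init_split <- initial_split(mtcars, prop = 0.75)
lm_wf <- workflow(mpg ~ wt + hp, linear_reg())

# Fits on the training portion, evaluates on the test portion
final_fit <- last_fit(lm_wf, split = init_split)

# Test-set performance
collect_metrics(final_fit)

# The fitted workflow, ready for predicting on new data
fitted_wf <- extract_workflow(final_fit)
test_preds <- predict(fitted_wf, new_data = testing(init_split))
```

In a tuning context, the workflow passed in would be the finalized one (final_wf above) rather than this untuned example.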

Parameter Ranges

tidymodels provides sensible defaults via the dials package. Customize when needed.

# View default range
mtry()

# Customize range (mtry depends on number of predictors)
params <- extract_parameter_set_dials(wf) |>
 update(mtry = mtry(range = c(2, 20)))

# Finalize data-dependent parameters using the predictors only
params <- params |>
 finalize(train_data |> select(-outcome))
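To see what finalization does, the sketch below inspects the range of mtry before it is used in a grid. It assumes mtcars with mpg as the outcome and uses hypothetical object names; it is an illustration rather than part of the original workflow.

```r
library(tidymodels)

rf_spec <- rand_forest(mtry = tune(), min_n = tune()) |>
  set_engine("ranger") |>
  set_mode("regression")

# mtry's upper bound is unknown until the number of predictors is known
rf_params <- extract_parameter_set_dials(rf_spec)

# Finalize with the predictor columns only (outcome removed)
predictors <- mtcars |> select(-mpg)
rf_params <- finalize(rf_params, predictors)

# The upper bound now reflects the number of predictor columns
mtry_range <- rf_params |>
  extract_parameter_dials("mtry") |>
  range_get()

mtry_range$upper
ncol(predictors)
```

Once finalized, the parameter set can be passed to grid functions such as grid_space_filling().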

Parallel Processing

Tuning is embarrassingly parallel—each configuration can run independently.

Before proposing potentially long-running computations, like resampling or model fitting, first use parallel::detectCores() to determine the maximum number of cores available, then ask the user if they would like to use parallel processing and, if so, how many cores you are allowed to use. Keep using the extra cores throughout the work unless the user asks you to stop.

When computing statistics over a large number of columns, use the future package to parallelize these computations. Do not use the parallel, mirai, or foreach packages for parallel execution.

When using tidymodels functions, such as tune::fit_resamples() or tune::tune_grid(), ask about parallel processing and use the future package to create local workers.

library(future)
plan("multisession")

# tune_grid, tune_race_anova, etc. will use parallel processing automatically
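One way the core-count step described above might look in practice is sketched here. The cap of two workers stands in for whatever number the user approves; it and the variable names are assumptions for illustration.

```r
library(future)

# Maximum cores reported by the system (can be NA on some platforms)
max_cores <- parallel::detectCores()
if (is.na(max_cores)) max_cores <- 1

# Hypothetical user-approved limit; never request more than is available
approved_cores <- 2
n_workers <- max(1, min(approved_cores, max_cores))

# Create local background R sessions as workers
plan(multisession, workers = n_workers)

# Confirm how many workers are active
nbrOfWorkers()

# Return to sequential processing when the work is finished
plan(sequential)
```

Here parallel::detectCores() is used only to query the hardware; the parallel execution itself goes through future, in line with the guidance above.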