Data-Dependent Parameters
Creating parameters with unknown bounds resolved by training data
This guide covers how to create parameters whose ranges depend on dataset characteristics using unknown() and the finalization system.
Understanding unknown()
The Placeholder
unknown() is a special placeholder for parameter bounds that cannot be determined until you see the data:
# Extension pattern
mtry <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range, # Upper bound is unknown
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(mtry = "# Randomly Selected Predictors"),
finalize = dials::get_p
)
}
mtry()
#> # Randomly Selected Predictors (quantitative)
#> Range: [1, ?]
#> ^^^ Question mark indicates unknown bound
Why unknown() Exists
Some parameters depend on dataset properties:
- Number of predictors: Can’t sample more features than exist
- Number of observations: Sample size must be ≤ dataset size
- Number of columns: PCA components ≤ number of variables
Without seeing the data, we can’t set sensible upper bounds.
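As a small sketch of what this means in practice, the snippet below uses the built-in dials::mtry() with has_unknowns() and is_unknown() to check which bounds are still placeholders before and after finalization. It assumes the dials package is installed; mtcars is only stand-in data.

```r
library(dials)

# mtry() ships with dials and has an unknown upper bound by default
param <- mtry()

# TRUE while any bound is still a placeholder
before <- has_unknowns(param)

# Inspect each bound individually
bounds <- range_get(param)
lower_unknown <- is_unknown(bounds$lower) # lower bound is fixed at 1
upper_unknown <- is_unknown(bounds$upper) # upper bound is the placeholder

# After finalization with predictor data, no placeholders remain
finalized <- finalize(param, mtcars[, -1])
after <- has_unknowns(finalized)

c(before = before, lower_unknown = lower_unknown,
  upper_unknown = upper_unknown, after = after)
```

A check like has_unknowns() is a convenient guard before passing a parameter to a grid function.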
Where unknown() Appears
Typically in the upper bound of the range:
range = c(1L, unknown()) # Most common: lower fixed, upper unknown
range = c(unknown(), 100L) # Rare: upper fixed, lower unknown
range = c(unknown(), unknown()) # Very rare: both bounds unknown
When Parameters Need Data-Dependent Ranges
Use unknown() and finalization when:
Number of Predictors Determines Upper Bound
Examples:
- mtry(): Randomly selected predictors in random forests
- num_comp(): Number of PCA components
- max_features: Maximum features to select
- num_terms(): Number of model terms based on predictors
Reason: Can’t select more features than exist in the dataset
# Can't set upper bound without knowing # predictors
mtry <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(mtry = "# Randomly Selected Predictors"),
finalize = dials::get_p # Will set upper = ncol(x)
)
}
Number of Observations Affects Range
Examples:
- sample_size(): Rows to sample
- min_n(): Minimum observations in node
- bootstrap_sample(): Bootstrap sample size
Reason: Sample size must be ≤ number of rows
sample_size <- function(range = c(dials::unknown(), dials::unknown())) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = NULL,
label = c(sample_size = "# Observations Sampled"),
finalize = dials::get_n_frac_range # Resolves both bounds from nrow(x)
)
}
Complex Data-Dependent Logic
Examples:
- num_initial_terms(): MARS terms based on earth package formula
- Custom bounds based on multiple data properties
- Heuristic-based range adjustment
Reason: Upper bound follows package-specific or domain-specific rules
num_initial_terms <- function(range = c(1L, dials::unknown())) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = NULL,
label = c(num_initial_terms = "# Initial MARS Terms"),
finalize = get_initial_mars_terms # Custom logic
)
}
The Finalization System
Overview
Finalization resolves unknown() bounds using training data:
Parameter with unknown() → finalize(param, data) → Parameter with known bounds
The finalize Argument
When creating a parameter, provide a finalize function:
my_param <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(my_param = "My Parameter"),
finalize = get_my_bound # Finalize function
)
}
Finalize Function Signature
finalize_function <- function(object, x) {
# object: Parameter object with unknown() bounds
# x: Predictor data (matrix, data frame, or tibble)
#
# Returns: Parameter object with resolved bounds
}
Key points:
- Takes parameter object and predictor data
- Examines data properties (ncol, nrow, etc.)
- Updates the parameter’s range
- Returns modified parameter object
Built-in Finalize Functions
dials provides several built-in finalize functions for common cases.
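For orientation, the sketch below calls four of the built-in finalizers directly on the same parameter and data so their effects can be compared side by side. The 50 x 100 matrix is made-up illustration data, and frac = 0.5 is passed explicitly to get_n_frac() rather than relying on its default.

```r
library(dials)

# Illustration data: 50 observations, 100 predictors
x <- matrix(rnorm(50 * 100), ncol = 100)

# Built-in parameter with an unknown upper bound
param <- mtry()

# Each finalizer takes (object, x) and returns the parameter with a resolved range
upper_p      <- range_get(get_p(param, x))$upper                  # columns: 100
upper_n      <- range_get(get_n(param, x))$upper                  # rows: 50
upper_n_frac <- range_get(get_n_frac(param, x, frac = 0.5))$upper # floor(0.5 * 50) = 25
upper_log_p  <- range_get(get_log_p(param, x))$upper              # log10(100) = 2

c(p = upper_p, n = upper_n, n_frac = upper_n_frac, log_p = upper_log_p)
```

Note that get_log_p() works on the base-10 scale, so it pairs most naturally with a parameter that uses a log10 transformation.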
get_p()
Sets upper bound to number of predictors (columns)
# Extension pattern
mtry <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(mtry = "# Randomly Selected Predictors"),
finalize = dials::get_p
)
}
# Usage
param <- mtry()
param
#> Range: [1, ?]
finalized <- dials::finalize(param, mtcars[, -1]) # 10 predictors
finalized
#> Range: [1, 10]
Use for: Parameters bounded by number of features/predictors
get_n()
Sets upper bound to number of observations (rows)
# Extension pattern
max_samples <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(max_samples = "# Maximum Samples"),
finalize = dials::get_n
)
}
# Usage
param <- max_samples()
finalized <- dials::finalize(param, mtcars[, -1]) # 32 rows
finalized
#> Range: [1, 32]
Use for: Parameters bounded by number of observations
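get_n_frac() and get_n_frac_range()
get_n_frac() sets the upper bound to a fraction of the number of observations (1/3 by default). get_n_frac_range() sets both bounds as fractions of the observations (1/10 and 5/10 of the rows by default), so it is the one to use when both bounds are unknown.
# Extension pattern
sample_prop <- function(range = c(dials::unknown(), dials::unknown())) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = NULL,
label = c(sample_prop = "# Sampled Observations"),
finalize = dials::get_n_frac_range
)
}
# Usage
param <- sample_prop()
finalized <- dials::finalize(param, mtcars[, -1]) # 32 rows
finalized
#> Range: [3, 16] # floor(0.1 * 32) = 3, floor(0.5 * 32) = 16
Use for: Sample sizes as proportion of dataset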
get_log_p()
Sets upper bound to the base-10 log of the number of predictors
# Extension pattern
sparse_features <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(sparse_features = "# Sparse Features"),
finalize = dials::get_log_p
)
}
# Usage
param <- sparse_features()
large_data <- matrix(rnorm(100 * 100), ncol = 100)
finalized <- dials::finalize(param, large_data)
finalized
#> Range: [1, 2] # log10(100) = 2
Use for: Parameters that scale logarithmically with predictors
Creating Custom Finalize Functions
For complex logic, create custom finalize functions using range_get() and range_set().
Pattern
# Extension pattern
custom_finalize <- function(object, x) {
# 1. Calculate new bound(s) based on data
new_upper <- calculate_upper_bound(x)
# 2. Get current range
bounds <- dials::range_get(object)
# 3. Update bound(s)
bounds$upper <- new_upper
# 4. Set new range and return
dials::range_set(object, bounds)
}
range_get()
Extract current range from parameter:
param <- mtry()
bounds <- dials::range_get(param)
bounds
#> $lower
#> [1] 1
#>
#> $upper
#> unknown()
Returns a list with $lower and $upper.
range_set()
Set new range on parameter:
new_bounds <- list(lower = 1, upper = 10)
updated_param <- dials::range_set(param, new_bounds)
updated_param
#> Range: [1, 10]
Takes a parameter and a list with $lower and $upper; returns the updated parameter.
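Putting range_get() and range_set() together, a custom finalizer can also check whether the bound is still unknown before overwriting it, so that a range supplied by the user survives finalization. The following is a minimal sketch under that assumption; get_half_p() and half_features() are hypothetical names invented for this illustration, not part of dials.

```r
library(dials)

# Hypothetical finalizer: upper bound = half the predictors, applied only
# when the upper bound is still unknown
get_half_p <- function(object, x) {
  bounds <- range_get(object)
  if (!is_unknown(bounds$upper)) {
    return(object) # keep a bound the user already supplied
  }
  bounds$upper <- max(1L, as.integer(floor(ncol(x) / 2)))
  range_set(object, bounds)
}

# Hypothetical parameter wired to that finalizer
half_features <- function(range = c(1L, unknown()), trans = NULL) {
  new_quant_param(
    type = "integer",
    range = range,
    inclusive = c(TRUE, TRUE),
    trans = trans,
    label = c(half_features = "# Features (half of predictors)"),
    finalize = get_half_p
  )
}

# Unknown upper bound: resolved from the data (10 predictors, so 5)
resolved <- finalize(half_features(), mtcars[, -1])
resolved_upper <- range_get(resolved)$upper

# User-supplied upper bound: left untouched by the guard
kept <- finalize(half_features(range = c(1L, 3L)), mtcars[, -1])
kept_upper <- range_get(kept)$upper
```

Without the is_unknown() guard, finalization would replace a user-supplied upper bound with the data-derived one, which may or may not be the behavior you want.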
Complete Examples
Example 1: Using Built-in get_p()
Number of PCA components:
# Extension pattern
num_comp <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(num_comp = "# Principal Components"),
finalize = dials::get_p
)
}
# Usage
num_comp()
#> # Principal Components (quantitative)
#> Range: [1, ?]
# Finalize with data
param <- num_comp()
finalized <- dials::finalize(param, mtcars[, -1])
finalized
#> # Principal Components (quantitative)
#> Range: [1, 10]
# Now can generate grid
grid <- dials::grid_regular(finalized, levels = 5)
grid
#> # A tibble: 5 × 1
#> num_comp
#> <int>
#> 1 1
#> 2 3
#> 3 5
#> 4 8
#> 5 10
Example 2: Using Built-in get_n()
Maximum observations to use:
# Extension pattern
max_obs <- function(range = c(10L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(max_obs = "# Maximum Observations"),
finalize = dials::get_n
)
}
# Usage
param <- max_obs()
finalized <- dials::finalize(param, mtcars[, -1]) # 32 rows
finalized
#> Range: [10, 32]
Example 3: Custom Finalize with Simple Logic
Maximum features to select (80% of predictors):
# Extension pattern
max_features <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(max_features = "# Maximum Features"),
finalize = get_max_features
)
}
get_max_features <- function(object, x) {
# Set upper bound to 80% of predictors
upper_bound <- floor(0.8 * ncol(x))
upper_bound <- max(1L, upper_bound) # At least 1
upper_bound <- as.integer(upper_bound)
# Update range
bounds <- dials::range_get(object)
bounds$upper <- upper_bound
dials::range_set(object, bounds)
}
# Usage
param <- max_features()
finalized <- dials::finalize(param, mtcars[, -1]) # 10 predictors
finalized
#> Range: [1, 8] # floor(0.8 * 10) = 8
Example 4: Custom Finalize with Complex Logic
MARS initial terms (earth package formula):
# Extension pattern
num_initial_terms <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(num_initial_terms = "# Initial MARS Terms"),
finalize = get_initial_mars_terms
)
}
get_initial_mars_terms <- function(object, x) {
# Earth package formula: min(200, max(20, 2 * ncol(x))) + 1
p <- ncol(x)
upper_bound <- min(200, max(20, 2 * p)) + 1
upper_bound <- as.integer(upper_bound)
# Update range
bounds <- dials::range_get(object)
bounds$upper <- upper_bound
dials::range_set(object, bounds)
}
# Usage
param <- num_initial_terms()
# Small dataset (10 predictors)
finalized_small <- dials::finalize(param, mtcars[, -1])
finalized_small
#> Range: [1, 41] # min(200, max(20, 2*10)) + 1 = 41
# Large dataset (100 predictors)
large_data <- matrix(rnorm(100 * 100), ncol = 100)
finalized_large <- dials::finalize(param, large_data)
finalized_large
#> Range: [1, 201] # min(200, max(20, 2*100)) + 1 = 201
Example 5: Updating Both Bounds
Neighbor range based on data size:
# Extension pattern
neighbors_adaptive <- function(range = c(dials::unknown(), dials::unknown())) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = NULL,
label = c(neighbors_adaptive = "# Adaptive Neighbors"),
finalize = get_adaptive_neighbors
)
}
get_adaptive_neighbors <- function(object, x) {
n <- nrow(x)
# Set lower bound: at least 3, or 1% of data
lower_bound <- max(3L, floor(0.01 * n))
# Set upper bound: at most 50, or 10% of data
upper_bound <- min(50L, floor(0.10 * n))
# Ensure lower < upper
if (lower_bound >= upper_bound) {
lower_bound <- max(1L, upper_bound - 1L)
}
# Update range
bounds <- list(
lower = as.integer(lower_bound),
upper = as.integer(upper_bound)
)
dials::range_set(object, bounds)
}
# Usage
param <- neighbors_adaptive()
# Small dataset (32 rows)
finalized_small <- dials::finalize(param, mtcars[, -1])
finalized_small
#> Range: [2, 3] # lower 3 >= upper 3, so lower falls back to max(1, 3 - 1) = 2
# Large dataset (1000 rows)
large_data <- matrix(rnorm(1000 * 10), ncol = 10)
finalized_large <- dials::finalize(param, large_data)
finalized_large
#> Range: [10, 50] # floor(0.01*1000)=10 to min(50, floor(0.10*1000))=50
How Finalization Works in tune Workflows
Manual Finalization
Explicitly finalize before tuning:
# Define parameter with unknown bound
mtry_param <- mtry()
# Finalize with training data
mtry_finalized <- dials::finalize(mtry_param, train_x)
# Use in grid
grid <- dials::grid_regular(mtry_finalized, levels = 5)
# Tune
tune::tune_grid(model_spec, resamples, grid = grid)
Automatic Finalization in tune
The tune package automatically finalizes parameters during tuning:
# Model with tunable parameter
rf_spec <- parsnip::rand_forest(mtry = tune::tune()) |>
parsnip::set_engine("ranger") |>
parsnip::set_mode("regression")
# Create workflow
wf <- workflows::workflow() |>
workflows::add_model(rf_spec) |>
workflows::add_formula(mpg ~ .)
# tune_grid automatically finalizes mtry using training data
results <- tune::tune_grid(
wf,
resamples = rsample::vfold_cv(mtcars),
grid = 10 # Grid will use finalized mtry
)
Workflow with Finalization
# Extract parameter set
params <- workflows::extract_parameter_set_dials(wf)
params
#> Collection of 1 parameters for tuning
#> id parameter type object class
#> mtry mtry nparam[?]
#> Model parameters needing finalization: mtry
# Finalize parameters
params_finalized <- dials::finalize(params, mtcars[, -1])
params_finalized
#> Collection of 1 parameters for tuning
#> id parameter type object class
#> mtry mtry nparam[+]
# Generate grid with finalized parameters
grid <- dials::grid_regular(params_finalized, levels = 5)
Extension vs Source Patterns
Extension Development
Use dials:: prefix throughout:
# Parameter definition
mtry <- function(range = c(1L, dials::unknown()), trans = NULL) {
dials::new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(mtry = "# Randomly Selected Predictors"),
finalize = dials::get_p # Built-in finalize
)
}
# Custom finalize function
custom_finalize <- function(object, x) {
upper_bound <- calculate_bound(x)
bounds <- dials::range_get(object)
bounds$upper <- upper_bound
dials::range_set(object, bounds)
}
# Usage
dials::finalize(param, data)
Source Development
No dials:: prefix needed:
# Parameter definition
mtry <- function(range = c(1L, unknown()), trans = NULL) {
new_quant_param(
type = "integer",
range = range,
inclusive = c(TRUE, TRUE),
trans = trans,
label = c(mtry = "# Randomly Selected Predictors"),
finalize = get_p # Built-in finalize
)
}
# Custom finalize function
custom_finalize <- function(object, x) {
upper_bound <- calculate_bound(x)
bounds <- range_get(object)
bounds$upper <- upper_bound
range_set(object, bounds)
}
# Usage
finalize(param, data)
Testing Data-Dependent Parameters
Essential Tests
- Parameter with unknown bound: Object created correctly
- Finalization works: finalize() resolves unknown bounds
- Bounds are sensible: Finalized range makes sense for data
- Grid generation: Finalized parameter works with grid functions
- Edge cases: Small datasets, single column, etc.
Example Test Suite
# tests/testthat/test-my-data-dependent-param.R
test_that("my_param creates parameter with unknown bound", {
param <- my_param()
expect_s3_class(param, "quant_param")
expect_equal(param$range$lower, 1L)
expect_true(dials::is_unknown(param$range$upper)) # unknown() is a call, not an S3 object
})
test_that("my_param finalizes with data", {
param <- my_param()
finalized <- dials::finalize(param, mtcars[, -1])
expect_s3_class(finalized, "quant_param")
expect_type(finalized$range$upper, "integer")
expect_false(dials::is_unknown(finalized$range$upper))
})
test_that("my_param finalized range is sensible", {
param <- my_param()
finalized <- dials::finalize(param, mtcars[, -1])
# Upper bound should be number of predictors
expect_equal(finalized$range$upper, ncol(mtcars) - 1)
expect_true(finalized$range$upper >= finalized$range$lower)
})
test_that("finalized my_param works with grid functions", {
param <- my_param()
finalized <- dials::finalize(param, mtcars[, -1])
grid <- dials::grid_regular(finalized, levels = 5)
expect_equal(nrow(grid), 5)
expect_true(all(grid$my_param >= finalized$range$lower))
expect_true(all(grid$my_param <= finalized$range$upper))
})
test_that("my_param handles small datasets", {
small_data <- mtcars[1:5, 1:3] # 5 rows, 3 columns (outcome + 2 predictors)
param <- my_param()
finalized <- dials::finalize(param, small_data[, -1])
expect_equal(finalized$range$upper, 2L)
expect_true(finalized$range$lower <= finalized$range$upper)
})
test_that("my_param handles single column", {
single_col <- data.frame(x = rnorm(10))
param <- my_param()
finalized <- dials::finalize(param, single_col)
expect_equal(finalized$range$upper, 1L)
})
Best Practices
- Use built-in finalize functions when possible: get_p(), get_n() cover most cases
- Document finalization logic: Explain in @details how bounds are determined
- Handle edge cases: Single column, single row, empty data
- Ensure lower < upper: Always check bounds are valid after finalization
- Use integer types appropriately: Cast to integer with as.integer() for integer parameters
- Test with various data sizes: Small, medium, and large datasets
- Consider reasonable bounds: Avoid extreme values that don’t make sense
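The edge-case and bound-validity practices above can be bundled into a small check that runs after finalization. The sketch below is one possible shape for such a check; check_finalized_range() is a hypothetical helper written for this illustration and is not a dials function.

```r
library(dials)

# Hypothetical helper: stop early if a finalized parameter has unusable bounds
check_finalized_range <- function(param) {
  if (has_unknowns(param)) {
    stop("Parameter still has unknown bounds; call finalize() first.")
  }
  bounds <- range_get(param)
  if (bounds$lower > bounds$upper) {
    stop("Lower bound ", bounds$lower, " exceeds upper bound ", bounds$upper, ".")
  }
  invisible(param)
}

# Single-column edge case: the built-in mtry() finalizes to lower = upper = 1,
# which is still a valid (if degenerate) range
one_col <- data.frame(x = rnorm(10))
checked <- check_finalized_range(finalize(mtry(), one_col))

# An unfinalized parameter is rejected with an informative message
message_text <- tryCatch(
  check_finalized_range(mtry()),
  error = function(e) conditionMessage(e)
)
```

Whether a degenerate range such as lower equal to upper should pass or fail depends on the parameter, so the comparison in the helper is a design choice to adjust as needed.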
Next Steps
Learn More
- Quantitative parameters: Quantitative Parameters Guide
- Grid integration: Grid Integration Guide
- Parameter system: Parameter System Overview
Implementation Guides
- Extension development: Extension Development Guide
- Source development: Source Development Guide
Last Updated: 2026-03-31