Troubleshooting Parsnip Source Development
Common issues and solutions when contributing to the parsnip package source code.
Registration Issues
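Most registration problems come down to a missing or mis-ordered set_* call. As context for the issues below, a minimal registration sketch (the engine name "myengine" and package "mypkg" are hypothetical placeholders):

```r
library(parsnip)

# Hypothetical engine "myengine" for the existing linear_reg() model
set_model_engine("linear_reg", mode = "regression", eng = "myengine")
set_dependency("linear_reg", eng = "myengine", pkg = "mypkg")

# Verify that the engine is now listed
show_engines("linear_reg")
```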
Issue: Engine Not Found
Symptom:
linear_reg() |> set_engine("myengine")
#> Error: Engine 'myengine' is not registered for linear_reg
Causes:
1. Forgot to register the engine with set_model_engine()
2. Registration code not executed
3. Typo in the engine name
Solution:
# Ensure this runs before use
set_model_engine(
model = "linear_reg",
mode = "regression",
eng = "myengine"
)
# Check registration
parsnip::show_engines("linear_reg")
Issue: Mode Not Available
Symptom:
linear_reg() |> set_mode("classification")
#> Error: linear_reg can only be used for regression
Causes:
1. Model only supports one mode
2. Mode not registered with set_model_mode()
Solution:
# For single-mode models, this is expected
# Don't try to change the mode
# For multi-mode models, ensure mode is registered
set_model_mode(
model = "boost_tree",
mode = "classification"
)
Issue: Prediction Type Not Available
Symptom:
predict(fit, data, type = "conf_int")
#> Error: `type = 'conf_int'` is not available
Causes:
1. Forgot to register the prediction type with set_pred()
2. Engine doesn't support this prediction type
3. Wrong mode for the prediction type
Solution:
# Register the prediction type
set_pred(
model = "linear_reg",
eng = "lm",
mode = "regression",
type = "conf_int",
value = list(...)
)
# Check available prediction types
# Should be documented in the model help file
Fit Method Issues
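For reference, the issues below all concern pieces of a set_fit() registration, whose general shape looks like this sketch (stats::lm as the fitting function; "myengine" is a hypothetical engine name):

```r
library(parsnip)

set_fit(
  model = "linear_reg",
  eng = "myengine", # hypothetical engine name
  mode = "regression",
  value = list(
    interface = "formula", # "formula", "matrix", or "data.frame"
    protect = c("formula", "data"), # arguments users cannot override
    func = c(pkg = "stats", fun = "lm"), # the function that does the fitting
    defaults = list() # engine arguments filled in when the user sets nothing
  )
)
```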
Issue: Formula Not Passed Correctly
Symptom:
fit(spec, mpg ~ hp, data = mtcars)
#> Error: object 'mpg' not found
Causes:
1. Wrong interface type
2. Formula being evaluated too early
3. Missing protect = c("formula", "data")
Solution:
set_fit(
...,
value = list(
interface = "formula", # Correct interface
protect = c("formula", "data"), # Protect arguments
func = c(pkg = "stats", fun = "lm"),
...
)
)
Issue: Matrix Conversion Fails
Symptom:
fit(spec, y ~ x, data = data)
#> Error: (list) object cannot be coerced to type 'double'
Causes:
1. Data contains non-numeric columns that can't be converted
2. Factor encoding failed
3. Missing data not handled
Solution:
# Check data types before conversion
pre = function(data, object) {
# Remove non-numeric columns engine can't handle
numeric_cols <- sapply(data, is.numeric)
data[, numeric_cols, drop = FALSE]
}
# Or handle factors explicitly
pre = function(data, object) {
# Convert factors to indicators
model.matrix(~ . - 1, data = data)
}
Issue: Arguments Not Translated
Symptom:
linear_reg(penalty = 0.1) |> set_engine("glmnet") |> fit(...)
# Warning: unused argument (penalty = 0.1)
Causes:
1. Missing set_model_arg() registration
2. Wrong argument name in the translation
Solution:
# Register argument translation
set_model_arg(
model = "linear_reg",
eng = "glmnet",
parsnip = "penalty", # parsnip name
original = "lambda", # glmnet name
func = list(pkg = "dials", fun = "penalty"),
has_submodel = TRUE
)
Predict Method Issues
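As a reference point for the issues below, a typical set_pred() registration for numeric predictions might look like this sketch ("myengine" is hypothetical; the args entries mirror the pattern used in parsnip's own model data files):

```r
library(parsnip)

set_pred(
  model = "linear_reg",
  eng = "myengine", # hypothetical engine name
  mode = "regression",
  type = "numeric",
  value = list(
    pre = NULL, # optional: adjust new_data before predicting
    post = function(results, object) {
      tibble::tibble(.pred = as.numeric(results)) # standardized column name
    },
    func = c(fun = "predict"),
    args = list(object = quote(object$fit), newdata = quote(new_data))
  )
)
```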
Issue: Wrong Column Names
Symptom:
predict(fit, data)
#> prediction
#> 1 21.4
Should have a .pred column name.
Causes: 1. Engine returns data frame with wrong names 2. Missing post function to standardize
Solution:
set_pred(
...,
value = list(
post = function(results, object) {
tibble::tibble(.pred = as.numeric(results))
},
...
)
)
Issue: Predictions Wrong Type
Symptom:
predict(fit, data, type = "class")
#> .pred_class
#> 1 "A" # character, should be factorCauses: 1. Engine returns character instead of factor 2. Missing type conversion in post
Solution:
post = function(results, object) {
tibble::tibble(
.pred_class = factor(results, levels = object$lvl)
)
}
Issue: Predictions Wrong Dimensions
Symptom:
predict(fit, data[1:5, ])
#> .pred
#> 1 21.4 # Only 1 row, should be 5
Causes:
1. Engine returns aggregated results
2. Prediction function not working correctly
Solution:
# Debug the prediction function
set_pred(
...,
value = list(
post = function(results, object) {
# Check dimensions
# Note: new_data must be visible in this scope for the check to work
if (length(results) != nrow(new_data)) {
rlang::abort(c(
"Prediction length doesn't match data rows",
"i" = sprintf("Got %d predictions for %d rows",
length(results), nrow(new_data))
))
}
tibble::tibble(.pred = results)
},
...
)
)
Issue: Probability Columns Don't Sum to 1
Symptom:
predict(fit, data, type = "prob")
#> .pred_A .pred_B
#> 1 0.8 0.3 # Sum is 1.1, not 1.0
Causes:
1. Engine returns unnormalized probabilities
2. Conversion error in post
Solution:
post = function(results, object) {
# Normalize probabilities
row_sums <- rowSums(results)
results <- results / row_sums
# Convert to tibble with correct names
results <- as.data.frame(results)
names(results) <- paste0(".pred_", names(results))
tibble::as_tibble(results)
}
Encoding Issues
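The encoding issues below all trace back to the options passed to set_encoding(). For orientation, the full set of knobs, with the common lm-style values shown ("myengine" is a hypothetical engine name):

```r
library(parsnip)

set_encoding(
  model = "linear_reg",
  eng = "myengine", # hypothetical engine name
  mode = "regression",
  options = list(
    predictor_indicators = "traditional", # "none", "traditional", or "one_hot"
    compute_intercept = TRUE, # build an intercept column so contrasts are correct
    remove_intercept = TRUE, # drop it again before the engine sees the data
    allow_sparse_x = FALSE # accept sparse matrices in fit_xy()?
  )
)
```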
Issue: Too Many Dummy Variables
Symptom:
# 3-level factor creates 3 dummy columns instead of 2
fit <- fit(spec, y ~ x, data = data)
# Model matrix has columns: x_A, x_B, x_C
Causes:
1. Using one-hot encoding instead of traditional
2. Wrong predictor_indicators setting
Solution:
set_encoding(
...,
options = list(
predictor_indicators = "traditional" # n-1 encoding
)
)Issue: Intercept Handled Twice
Symptom:
# Singular matrix error or perfect collinearity
fit <- fit(spec, y ~ x, data = data)
#> Error: system is computationally singular
Causes:
1. Both parsnip and the engine adding an intercept
2. Engine expects no intercept in the matrix
Solution:
set_encoding(
...,
options = list(
compute_intercept = TRUE, # Build the intercept so factor contrasts are correct
remove_intercept = TRUE # ...then drop it before the matrix reaches the engine
)
)
Issue: Factor Levels Mismatch
Symptom:
predict(fit, new_data)
#> Error: factor x has new levels not in training data
Causes:
1. New data has levels not present in the training data
2. Levels not properly preserved
Solution:
# In prediction, check and fix levels
set_pred(
...,
value = list(
pre = function(new_data, object) {
# Ensure factor levels match training
for (col in names(new_data)) {
if (is.factor(new_data[[col]])) {
train_levels <- object$lvl[[col]] # assumes per-predictor levels were stored at fit time
if (!is.null(train_levels)) {
new_data[[col]] <- factor(new_data[[col]], levels = train_levels)
}
}
}
new_data
},
...
)
)
Package Dependency Issues
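A quick way to see which packages a spec will need at fit time is required_pkgs(), which reads the registered dependencies:

```r
library(parsnip)

spec <- linear_reg() |> set_engine("glmnet")
required_pkgs(spec) # lists the packages registered via set_dependency()
```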
Issue: Package Not Available
Symptom:
linear_reg() |> set_engine("glmnet")
#> Error: package 'glmnet' is required but not installed
Causes:
1. Missing set_dependency() call
2. Package not in Imports or Suggests
Solution:
# Register dependency
set_dependency(
model = "linear_reg",
eng = "glmnet",
pkg = "glmnet",
mode = "regression"
)
# Add to DESCRIPTION
# Suggests: glmnet
Issue: Function Not Found
Symptom:
fit(spec, y ~ x, data = data)
#> Error: could not find function "glmnet"
Causes:
1. Wrong package name in func
2. Package not loaded
3. Function not exported
Solution:
# Use full package specification
set_fit(
...,
value = list(
func = c(pkg = "glmnet", fun = "glmnet"), # Full path
...
)
)
# For internal functions (source development only)
set_fit(
...,
value = list(
func = c(pkg = "parsnip", fun = "internal_helper"),
...
)
)
Multi-Mode Issues
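When debugging mode problems, show_model_info() prints the registered modes, engines, arguments, and prediction modules in one place:

```r
library(parsnip)

# Which modes and engines does boost_tree() know about?
show_model_info("boost_tree")
```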
Issue: Wrong Mode Selected
Symptom:
boost_tree() |> fit(Species ~ ., data = iris)
#> Error: Please set the mode
Causes:
1. Model supports multiple modes but none was specified
2. Default mode is "unknown"
Solution:
# Explicitly set mode
spec <- boost_tree() |> set_mode("classification")
# Or set mode in constructor
spec <- boost_tree(mode = "classification")
Issue: Mode-Specific Behavior Not Working
Symptom:
# Both modes using same defaults when they should differ
fit_reg <- fit(boost_tree(mode = "regression"), y ~ x, data = data)
fit_cls <- fit(boost_tree(mode = "classification"), y ~ x, data = data)
# Both use the same objective function
Causes:
1. Forgot to register mode-specific settings
2. set_fit() defaults not mode-specific
Solution:
# Register different defaults for each mode
set_fit(
model = "boost_tree",
eng = "xgboost",
mode = "regression",
value = list(
defaults = list(objective = "reg:squarederror") # Regression
)
)
set_fit(
model = "boost_tree",
eng = "xgboost",
mode = "classification",
value = list(
defaults = list(objective = "multi:softprob") # Classification
)
)
Testing Issues
Issue: Tests Fail When Package Not Installed
Symptom:
test_that("glmnet engine works", {
spec <- linear_reg() |> set_engine("glmnet")
fit <- fit(spec, mpg ~ ., data = mtcars)
# Error: package 'glmnet' is required
})
Solution:
# Skip test if package not available
test_that("glmnet engine works", {
skip_if_not_installed("glmnet")
spec <- linear_reg() |> set_engine("glmnet")
fit <- fit(spec, mpg ~ ., data = mtcars)
expect_s3_class(fit, "model_fit")
})
Issue: Formula and XY Tests Give Different Results
Symptom:
test_that("formula and xy equivalent", {
fit_formula <- fit(spec, y ~ x, data = data)
fit_xy <- fit_xy(spec, x = data[, "x"], y = data$y)
pred_formula <- predict(fit_formula, data)
pred_xy <- predict(fit_xy, data)
expect_equal(pred_formula, pred_xy)
# Fails: predictions differ
})
Causes:
1. The formula interface includes an intercept; the xy interface doesn't
2. Factor encoding differs
3. Different data preprocessing
Solution:
# Use appropriate tolerance
expect_equal(pred_formula, pred_xy, tolerance = 1e-5)
# Or investigate why they differ
test_that("understand difference", {
fit_formula <- fit(spec, y ~ x, data = data)
fit_xy <- fit_xy(spec, x = data[, "x", drop = FALSE], y = data$y)
# Check model matrices
mm_formula <- model.matrix(y ~ x, data = data)
mm_xy <- as.matrix(data[, "x", drop = FALSE])
# Should they be the same?
print(head(mm_formula))
print(head(mm_xy))
})
Documentation Issues
Issue: Examples Don’t Work
Symptom:
# In documentation
#' @examples
#' linear_reg() |> set_engine("glmnet")
# Error when running examples: package not available
Solution:
# Wrap in \dontrun{} if package may not be installed
#' @examples
#' # Default engine always works
#' linear_reg() |> set_engine("lm")
#'
#' # Optional engine
#' \dontrun{
#' linear_reg() |> set_engine("glmnet")
#' }
# Or use \donttest{} for CRAN
#' @examples
#' \donttest{
#' if (requireNamespace("glmnet", quietly = TRUE)) {
#' linear_reg() |> set_engine("glmnet")
#' }
#' }
Issue: Unclear Engine Requirements
Problem: Users don’t know what packages are needed.
Solution:
# Document clearly
#' @details
#' ## Engines
#'
#' **lm** (default):
#' - No additional packages required
#' - Uses [stats::lm()]
#'
#' **glmnet**:
#' - Requires: glmnet package
#' - Uses elastic net regularization
#' - Set penalty and mixture arguments
#'
#' **keras**:
#' - Requires: keras and tensorflow packages
#' - Neural network implementation
#' - Set epochs via set_engine()
Performance Issues
Issue: Slow Data Conversion
Symptom: Fitting takes much longer than expected.
Causes:
1. Unnecessary data conversions
2. Repeated matrix conversions
3. Inefficient factor encoding
Solution:
# Cache conversions if possible
set_fit(
...,
value = list(
pre = function(data, object) {
# Do the conversion once and reuse the result; note that R copies
# function arguments, so this cache only persists if the modified
# object is stored back on the fit
if (is.null(object$converted_data)) {
object$converted_data <- expensive_conversion(data)
}
object$converted_data
},
...
)
)
# Or optimize conversion
pre = function(data, object) {
# Use faster methods
data.matrix(data) # Instead of as.matrix()
}
Issue: Memory Usage Too High
Symptom: R runs out of memory during fit or predict.
Causes:
1. Copying large datasets unnecessarily
2. Creating intermediate objects
3. Not releasing memory
Solution:
# Avoid unnecessary copies
post = function(results, object) {
# Don't create intermediate objects
tibble::tibble(.pred = results) # Direct conversion
# Instead of:
# temp <- as.vector(results)
# temp2 <- as.numeric(temp)
# tibble::tibble(.pred = temp2)
}
# Clean up after conversion
post = function(results, object) {
result_tibble <- tibble::tibble(.pred = results)
rm(results) # Release memory
gc() # Force garbage collection
result_tibble
}
Version Compatibility Issues
Issue: Function Signature Changed
Symptom:
fit(spec, y ~ x, data = data)
# Error: argument "newarg" is missingPackage updated and added required argument.
Solution:
# Check version and use appropriate signature
set_fit(
...,
value = list(
pre = function(data, object) {
if (packageVersion("package") >= "2.0") {
# New signature
list(data = data, newarg = default_value)
} else {
# Old signature
list(data = data)
}
},
...
)
)
Issue: Deprecated Function
Symptom:
# Warning: 'old_function' is deprecated, use 'new_function'
Solution:
# Update to new function
set_fit(
...,
value = list(
func = c(pkg = "package", fun = "new_function"), # Updated
...
)
)
# Or support both versions
set_fit(
...,
value = list(
pre = function(data, object) {
# Choose function based on availability
if (exists("new_function", where = "package:package")) {
object$func_to_use <- "new_function"
} else {
object$func_to_use <- "old_function"
}
data
},
...
)
)
Debugging Strategies
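A useful first step before any of the techniques below: print the translated specification. translate() shows exactly which engine function and arguments parsnip will use, so registration mistakes are often visible immediately:

```r
library(parsnip)

linear_reg(penalty = 0.1) |>
  set_engine("glmnet") |>
  translate() # prints the model fit template, including the lambda mapping
```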
Enable Verbose Output
# In registration
set_fit(
...,
value = list(
pre = function(data, object) {
message("Pre-processing data...")
message("Data dimensions: ", nrow(data), " x ", ncol(data))
data
},
...
)
)
Inspect Intermediate Results
# Add debugging post-processing
post = function(results, object) {
message("Raw results class: ", class(results))
message("Raw results structure: ", str(results))
formatted <- tibble::tibble(.pred = results)
message("Formatted results: ", str(formatted))
formatted
}
Use Browser
# Add interactive debugging
set_pred(
...,
value = list(
post = function(results, object) {
browser() # Stop here for inspection
tibble::tibble(.pred = results)
},
...
)
)
Test Registration Directly
# Check registration exists
env <- parsnip:::get_model_env()
ls(env) # See all registered models
# Get specific registration
fit_info <- parsnip:::get_from_env("linear_reg_fit")
pred_info <- parsnip:::get_from_env("linear_reg_predict")
Getting Help
Check Existing Implementations
Look at similar models in parsnip source:
# How did linear_reg implement glmnet?
# R/linear_reg_data.R - find glmnet section
# How did boost_tree handle multi-mode?
# R/boost_tree_data.R - see both modes
Run Parsnip Tests
# Run specific test file
testthat::test_file("tests/testthat/test-linear_reg.R")
# Run all tests
devtools::test()
Check Package Status
# Full package check
devtools::check()
# Quick check
devtools::load_all()
devtools::test()
Summary
Common issue categories:
- Registration - Missing set_* calls, typos, wrong names
- Fit method - Interface type, argument translation, data conversion
- Predict method - Column names, data types, dimensions
- Encoding - Dummy variables, intercept, factor levels
- Dependencies - Missing packages, wrong function names
- Multi-mode - Mode selection, mode-specific behavior
- Testing - Package availability, interface equivalence
- Documentation - Example failures, unclear requirements
Debugging checklist:
When stuck:
1. Look at similar models in the parsnip source
2. Run existing tests to see the patterns
3. Use verbose output and browser()
4. Ask on GitHub issues or tidymodels Slack