Roxygen Documentation
Guide to documenting R functions using roxygen2 comments.
Basic Structure
Roxygen comments start with #' and appear directly above the function:
#' Short title (one line)
#'
#' Longer description paragraph that explains what the function does
#' in more detail. Can span multiple lines.
#'
#' @param parameter_name Description of parameter
#' @return Description of what the function returns
#' @export
#' @examples
#' # Example code
#' result <- your_function(data)
your_function <- function(parameter_name) {
  # Function body
}
Complete Templates
Numeric/Regression Metric
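The template in this section documents MSE as the mean of squared differences between `truth` and `estimate`. As a quick sanity check of what its `@examples` data should produce, here is a minimal base-R sketch of that formula. `mse_sketch()` is a hypothetical helper written for illustration only; it is not the package's `mse_vec()`.

```r
# Same sample data as the template's @examples
truth    <- c(1, 2, 3, 4, 5)
estimate <- c(1.1, 2.2, 2.9, 4.1, 5.2)

# Hypothetical helper mirroring the documented formula:
# (1 / n) * sum((truth_i - estimate_i)^2)
mse_sketch <- function(truth, estimate, na_rm = TRUE) {
  if (na_rm) {
    # Drop pairs where either value is missing
    keep <- !is.na(truth) & !is.na(estimate)
    truth <- truth[keep]
    estimate <- estimate[keep]
  }
  mean((truth - estimate)^2)
}

mse_value <- mse_sketch(truth, estimate)
print(mse_value)  # approximately 0.022
```

With `na_rm = FALSE` and a missing value in either vector, the sketch returns `NA`, which matches the "(or `NA`)" wording in the template's `@return`.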
#' Mean Squared Error
#'
#' Calculate the mean squared error between truth and estimate.
#'
#' @family numeric metrics
#'
#' @param data A data frame containing the columns specified by `truth` and
#' `estimate`.
#' @param truth The column identifier for the true results (numeric). This
#' should be an unquoted column name.
#' @param estimate The column identifier for the predicted results (numeric).
#' This should be an unquoted column name.
#' @param na_rm A logical value indicating whether NA values should be stripped
#' before the computation proceeds. Default is `TRUE`.
#' @param case_weights The optional column identifier for case weights. This
#' should be an unquoted column name. Default is `NULL`.
#' @param ... Not currently used.
#'
#' @return
#' A tibble with columns `.metric`, `.estimator`, and `.estimate` and 1 row of
#' values.
#'
#' For grouped data frames, the number of rows returned will be the same as the
#' number of groups.
#'
#' For `mse_vec()`, a single numeric value (or `NA`).
#'
#' @details
#' `mse()` is a metric that should be minimized. The output ranges from 0 to
#' Inf, with 0 indicating perfect predictions.
#'
#' The formula for MSE is:
#'
#' \deqn{\frac{1}{n} \sum_{i=1}^{n} (truth_i - estimate_i)^2}
#'
#' @examples
#' # Create sample data
#' df <- data.frame(
#'   truth = c(1, 2, 3, 4, 5),
#'   estimate = c(1.1, 2.2, 2.9, 4.1, 5.2)
#' )
#'
#' # Basic usage
#' mse(df, truth, estimate)
#'
#' # Vector interface
#' mse_vec(df$truth, df$estimate)
#'
#' @export
mse <- function(data, ...) {
  UseMethod("mse")
}
Class/Classification Metric
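The template in this section documents accuracy as the proportion of correct predictions. A minimal base-R sketch of that computation, using the same sample data as the template's `@examples`, might look like the following. `accuracy_sketch()` is a hypothetical helper for illustration, not the package's `accuracy_vec()`, and it assumes `truth` and `estimate` share the same factor levels.

```r
# Hypothetical helper: correct predictions sit on the diagonal of the
# confusion matrix, so accuracy = sum(diagonal) / sum(all cells).
# Assumes truth and estimate have identical factor levels.
accuracy_sketch <- function(truth, estimate) {
  confusion <- table(truth, estimate)
  sum(diag(confusion)) / sum(confusion)
}

# Binary case from the template: (TP + TN) / (TP + TN + FP + FN)
truth_bin    <- factor(c("yes", "yes", "no", "no"))
estimate_bin <- factor(c("yes", "no", "yes", "no"))
binary_acc <- accuracy_sketch(truth_bin, estimate_bin)

# Multiclass case from the template: same rule, larger matrix
truth_multi    <- factor(c("A", "B", "C", "A", "B", "C"))
estimate_multi <- factor(c("A", "B", "A", "A", "C", "C"))
multi_acc <- accuracy_sketch(truth_multi, estimate_multi)

print(c(binary = binary_acc, multiclass = multi_acc))
```

Here 2 of the 4 binary predictions and 4 of the 6 multiclass predictions are correct, so the two values are 0.5 and about 0.667.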
#' Accuracy
#'
#' Calculate the accuracy of predictions, the proportion of correct predictions.
#'
#' @family class metrics
#'
#' @param data A data frame containing the columns specified by `truth` and
#' `estimate`.
#' @param truth The column identifier for the true class results (factor). This
#' should be an unquoted column name.
#' @param estimate The column identifier for the predicted class results
#' (factor). This should be an unquoted column name.
#' @param estimator One of "binary", "macro", "macro_weighted", or "micro" to
#' specify the type of averaging to be done. Default is `NULL` which
#' automatically selects based on the number of classes.
#' @param na_rm A logical value indicating whether NA values should be stripped
#' before the computation proceeds. Default is `TRUE`.
#' @param case_weights The optional column identifier for case weights. This
#' should be an unquoted column name. Default is `NULL`.
#' @param event_level A string either "first" or "second" to specify which level
#' of truth to consider as the "event". Default is "first".
#' @param ... Not currently used.
#'
#' @return
#' A tibble with columns `.metric`, `.estimator`, and `.estimate` and 1 row of
#' values.
#'
#' For grouped data frames, the number of rows returned will be the same as the
#' number of groups.
#'
#' For `accuracy_vec()`, a single numeric value (or `NA`).
#'
#' @section Multiclass:
#'
#' Accuracy extends naturally to multiclass scenarios, so no averaging is
#' needed. With more than 2 classes, the `.estimator` column reports
#' "multiclass".
#'
#' @details
#' `accuracy()` is a metric that should be maximized. The output ranges from 0
#' to 1, with 1 indicating perfect predictions.
#'
#' The formula for binary classification is:
#'
#' \deqn{\frac{TP + TN}{TP + TN + FP + FN}}
#'
#' @examples
#' # Binary classification
#' df <- data.frame(
#'   truth = factor(c("yes", "yes", "no", "no")),
#'   estimate = factor(c("yes", "no", "yes", "no"))
#' )
#'
#' accuracy(df, truth, estimate)
#'
#' # Multiclass
#' df_multi <- data.frame(
#'   truth = factor(c("A", "B", "C", "A", "B", "C")),
#'   estimate = factor(c("A", "B", "A", "A", "C", "C"))
#' )
#'
#' accuracy(df_multi, truth, estimate)
#'
#' @export
accuracy <- function(data, ...) {
  UseMethod("accuracy")
}
Recipe Step
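The template in this section documents a centering step: means are estimated from the training data and then subtracted from any data set that is baked. The base-R sketch below illustrates that estimate-then-apply split. It is not the recipes implementation; `center_with()` is a hypothetical helper, and the numbers are made-up toy values rather than the biomass data.

```r
# Made-up toy data (not the biomass data used in the template)
train <- data.frame(carbon = c(40, 50, 60), hydrogen = c(5, 6, 7))
test  <- data.frame(carbon = c(45, 65), hydrogen = c(5.5, 7.5))

# "prep" stage: estimate the means from the training data only
means <- colMeans(train[c("carbon", "hydrogen")], na.rm = TRUE)

# "bake" stage: subtract the stored training means from any data set.
# Hypothetical helper for illustration.
center_with <- function(data, means) {
  for (col in names(means)) {
    data[[col]] <- data[[col]] - means[[col]]
  }
  data
}

train_centered <- center_with(train, means)
test_centered  <- center_with(test, means)

print(colMeans(train_centered))  # zero for both columns
print(colMeans(test_centered))   # not zero: test data has different means
```

The training columns come out with a mean of zero, while the test columns keep an offset (5 and 0.5 here), since the training means, not the test means, are subtracted.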
#' Center Numeric Variables
#'
#' `step_center()` creates a *specification* of a recipe step that will
#' normalize numeric data to have a mean of zero.
#'
#' @param recipe A recipe object. The step will be added to the sequence of
#' operations for this recipe.
#' @param ... One or more selector functions to choose variables for this step.
#' See [recipes::selections()] for more details.
#' @param role Not used by this step since no new variables are created.
#' @param trained A logical to indicate if the quantities for preprocessing have
#' been estimated.
#' @param means A named numeric vector of means. This is `NULL` until computed by
#' [prep()].
#' @param na_rm A logical value indicating whether NA values should be removed
#' when computing means.
#' @param columns A character vector of column names that will be populated
#' (eventually) by the selectors passed in `...`. This is `NULL` until
#' computed by [prep()].
#' @param skip A logical. Should the step be skipped when the recipe is baked by
#' [bake()]? While all operations are baked when [prep()] is run, some
#' operations may not be able to be conducted on new data. Care should be
#' taken when using `skip = TRUE` as it may affect the computations for
#' subsequent operations.
#' @param id A character string that is unique to this step to identify it.
#'
#' @return An updated version of `recipe` with the new step added to the
#' sequence of any existing operations.
#'
#' @family normalization steps
#' @export
#'
#' @details
#'
#' Centering data means that the mean of the data is subtracted from each value,
#' resulting in a transformed variable with a mean of zero.
#'
#' The step estimates the means from the data used in the `training` argument
#' of [prep()]. [bake()] then applies the centering to new data sets using
#' these means.
#'
#' # Tidying
#'
#' When you [`tidy()`][recipes::tidy.recipe()] this step, a tibble is returned with
#' columns `terms`, `value`, and `id`:
#'
#' \describe{
#' \item{terms}{character, the selectors or variables selected}
#' \item{value}{numeric, the mean of the variable}
#' \item{id}{character, id of this step}
#' }
#'
#' # Case weights
#'
#' This step performs an unsupervised operation that can utilize case weights.
#' As a result, case weights are only used with frequency weights. For more
#' information, see the documentation in
#' [recipes::case_weights] and the examples on `tidymodels.org`.
#'
#' @examplesIf rlang::is_installed("modeldata")
#' data(biomass, package = "modeldata")
#'
#' biomass_tr <- biomass[biomass$dataset == "Training", ]
#' biomass_te <- biomass[biomass$dataset == "Testing", ]
#'
#' rec <- recipe(
#'   HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
#'   data = biomass_tr
#' )
#'
#' # Center carbon and hydrogen
#' step_centered <- rec |>
#'   step_center(carbon, hydrogen)
#'
#' step_centered
#'
#' # Train the step
#' step_trained <- prep(step_centered, training = biomass_tr)
#'
#' # Apply to test data
#' transformed_te <- bake(step_trained, biomass_te)
#'
#' # Means are near zero but not exactly zero, because bake() subtracts the
#' # training-set means
#' mean(transformed_te$carbon)
#' mean(transformed_te$hydrogen)
#'
#' # View learned parameters
#' tidy(step_trained, number = 1)
step_center <- function(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  means = NULL,
  na_rm = TRUE,
  columns = NULL,
  skip = FALSE,
  id = recipes::rand_id("center")
) {
  # Implementation...
}
Special Formatting
LaTeX equations
#' \deqn{formula} # Display equation (centered, separate line)
#' \eqn{inline} # Inline equation (within text)
Examples:
#' \deqn{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}
#' \eqn{x_i} represents the i-th observation
Code formatting
#' `function_name()` - inline code
#' [package::function()] - link to function
#' \code{code} - alternative inline code
Lists
#' Bullet list:
#' - Item 1
#' - Item 2
#'
#' Numbered list:
#' 1. First
#' 2. Second
Sections
#' @section Title:
#'
#' Content in this section.
Common Mistakes
Missing parameter documentation
# Bad: undocumented parameter
#' @export
my_function <- function(x, y) { ... }
# Good: all parameters documented
#' @param x Description
#' @param y Description
#' @export
my_function <- function(x, y) { ... }
Inconsistent parameter names
# Bad: documentation doesn't match function signature
#' @param data Data frame
my_function <- function(df) { ... } # Parameter is 'df', not 'data'
# Good: names match
#' @param df Data frame
my_function <- function(df) { ... }
Using @template without templates
# Bad: @template only finds files in your own package's man-roxygen/
# directory, so a template that lives in another package is not found
#' @template return-metric
# Good: Write out the documentation
#' @return A tibble with columns `.metric`, `.estimator`, and `.estimate`
Missing @export
If users should call your function, it needs @export:
# Don't forget this!
#' @export
user_facing_function <- function() { ... }
Generating Documentation
After writing roxygen comments:
# Generate documentation and update NAMESPACE
devtools::document()
This:
- creates man/*.Rd files (documentation)
- updates NAMESPACE with exports and imports
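Two of the mistakes listed under Common Mistakes (missing parameter documentation and names that do not match the signature) can be spotted before generating anything. The following is a minimal base-R sketch of such a check. `documented_params()` and `check_params()` are hypothetical helpers, not part of roxygen2 or devtools, and the parsing is deliberately naive: it only looks at `@param` tags written directly in the block and ignores inherited parameters.

```r
# Hypothetical helper: pull parameter names out of roxygen comment lines.
documented_params <- function(roxygen_lines) {
  tagged <- grep("^#'[[:space:]]*@param[[:space:]]+", roxygen_lines, value = TRUE)
  names_field <- sub(
    "^#'[[:space:]]*@param[[:space:]]+([^[:space:]]+).*$", "\\1", tagged
  )
  # A single @param tag may document several comma-separated names
  as.character(unlist(strsplit(names_field, ",", fixed = TRUE)))
}

# Hypothetical helper: compare documented names with the function signature.
check_params <- function(roxygen_lines, fn) {
  documented <- documented_params(roxygen_lines)
  actual <- names(formals(fn))
  list(
    undocumented = setdiff(actual, documented),  # in signature, not in docs
    unknown      = setdiff(documented, actual)   # in docs, not in signature
  )
}

# The "inconsistent parameter names" mistake: docs say `data`, signature says `df`
bad_docs <- c("#' @param data Data frame", "#' @export")
bad_fn   <- function(df) NULL
bad_result <- check_params(bad_docs, bad_fn)

# The corrected version: every argument documented under its real name
good_docs <- c("#' @param x Description", "#' @param y Description", "#' @export")
good_fn   <- function(x, y) NULL
good_result <- check_params(good_docs, good_fn)

print(bad_result)
print(good_result)
```

For the mismatched example the sketch reports `df` as undocumented and `data` as unknown; for the corrected example both entries are empty.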
Previewing Documentation
# After documenting
devtools::document()
devtools::load_all()
# View in help
?your_function
Next Steps
- Learn about package imports: package-imports.md
- Follow best practices: package-extension-requirements.md#best-practices
- Set up testing: package-extension-requirements.md#testing-requirements