Linear Predictor Survival Metrics
Linear predictor survival metrics evaluate linear predictor values (from Cox models or similar) against right-censored survival data. These metrics assess the prognostic separation provided by the linear predictor.
Overview
Use when: - Truth is a Surv object (from the survival package) - Predictions are linear predictor values from a survival model - You want to measure prognostic separation or explained variation
Key characteristics: - Similar structure to static survival metrics - Predictions are unbounded (not probabilities) - Typically used with Cox proportional hazards models - Returns .estimator = "standard"
Examples: Royston’s D statistic, R² based on D
Reference implementation: R/surv-royston.R in yardstick repository
Pattern: Three-Function Approach
1. Implementation Function
# Internal calculation logic
my_metric_impl <- function(truth, estimate, case_weights = NULL) {
# truth: Surv object
# estimate: numeric vector (linear predictor values)
# case_weights: numeric vector or NULL
if (is.null(case_weights)) {
case_weights <- rep(1, length(estimate))
} else {
case_weights <- vctrs::vec_cast(case_weights, to = double())
}
# Transform linear predictor to normal scores (Blom's method)
bns <- normal_score_blom(estimate, case_weights)
# Fit Cox model with normal scores
fit <- survival::coxph(truth ~ bns, weights = case_weights)
est <- unname(stats::coef(fit))
# Calculate metric (e.g., R²_D for Royston)
est^2 / (est^2 + pi^2 / 6)
}
# Helper: Blom's normal scores transformation
normal_score_blom <- function(x, case_weights) {
# Implementation details...
}2. Vector Interface
#' @export
my_metric_vec <- function(truth, estimate, na_rm = TRUE, case_weights = NULL, ...) {
# Validate na_rm parameter
check_bool(na_rm)
# Validate inputs
check_linear_pred_survival_metric(truth, estimate, case_weights)
# Handle missing values
if (na_rm) {
result <- yardstick_remove_missing(truth, estimate, case_weights)
truth <- result$truth
estimate <- result$estimate
case_weights <- result$case_weights
} else if (yardstick_any_missing(truth, estimate, case_weights)) {
return(NA_real_)
}
# Call implementation
my_metric_impl(truth, estimate, case_weights)
}3. Data Frame Method
#' My Linear Predictor Survival Metric
#'
#' Description of what this metric measures.
#'
#' @family linear pred survival metrics
#' @templateVar fn my_metric
#' @template return
#'
#' @param data A data frame
#' @param truth Unquoted column with Surv object
#' @param estimate Unquoted column with linear predictor values
#' @param na_rm Remove missing values (default TRUE)
#' @param case_weights Optional case weights column
#' @param ... Not currently used
#'
#' @export
my_metric <- function(data, ...) {
UseMethod("my_metric")
}
my_metric <- new_linear_pred_survival_metric(
my_metric,
direction = "maximize", # or "minimize"
range = c(0, 1) # or other appropriate range
)
#' @export
#' @rdname my_metric
my_metric.data.frame <- function(data, truth, estimate, na_rm = TRUE,
case_weights = NULL, ...) {
linear_pred_survival_metric_summarizer(
name = "my_metric",
fn = my_metric_vec,
data = data,
truth = !!rlang::enquo(truth),
estimate = !!rlang::enquo(estimate),
na_rm = na_rm,
case_weights = !!rlang::enquo(case_weights)
)
}Complete Example: Royston’s D Statistic
Royston’s D measures prognostic separation based on the standard deviation of the prognostic index.
# R/royston_survival.R
# 1. Helper: Blom's normal score transformation
normal_score_blom <- function(x, case_weights) {
# Include observations with zero weights
x_0 <- tibble::tibble(.row = seq_along(x), x = x)
rankits <- tibble::tibble(
.row = rep(seq_along(x), times = case_weights),
x = rep(x, times = case_weights),
) |>
dplyr::mutate(
x_first = rank(.data$x, ties.method = "first"),
# Blom's transformation
z = stats::qnorm((.data$x_first - 3 / 8) / (dplyr::n() + 1 / 4))
) |>
# Average over ties
dplyr::mutate(s = mean(.data$z), .by = "x") |>
dplyr::slice(1, .by = .row)
dplyr::left_join(x_0, rankits, by = c(".row")) |>
dplyr::pull("s")
}
# 2. Implementation function
royston_survival_impl <- function(truth, estimate, case_weights) {
if (is.null(case_weights)) {
case_weights <- rep(1, length(estimate))
} else {
case_weights <- vctrs::vec_cast(case_weights, to = double())
}
# Transform to normal scores
bns <- normal_score_blom(estimate, case_weights)
# Fit Cox model
fit <- survival::coxph(truth ~ bns, weights = case_weights)
est <- unname(stats::coef(fit))
# Calculate R²_D
est^2 / (est^2 + pi^2 / 6)
}
# 3. Vector interface
#' @export
royston_survival_vec <- function(truth, estimate, na_rm = TRUE,
case_weights = NULL, ...) {
check_bool(na_rm)
check_linear_pred_survival_metric(truth, estimate, case_weights)
if (na_rm) {
result <- yardstick_remove_missing(truth, estimate, case_weights)
truth <- result$truth
estimate <- result$estimate
case_weights <- result$case_weights
} else if (yardstick_any_missing(truth, estimate, case_weights)) {
return(NA_real_)
}
royston_survival_impl(truth, estimate, case_weights)
}
# 4. Data frame method
#' @export
royston_survival <- function(data, ...) {
UseMethod("royston_survival")
}
royston_survival <- new_linear_pred_survival_metric(
royston_survival,
direction = "maximize",
range = c(0, 1)
)
#' @export
#' @rdname royston_survival
royston_survival.data.frame <- function(data, truth, estimate, na_rm = TRUE,
case_weights = NULL, ...) {
linear_pred_survival_metric_summarizer(
name = "royston_survival",
fn = royston_survival_vec,
data = data,
truth = !!enquo(truth),
estimate = !!enquo(estimate),
na_rm = na_rm,
case_weights = !!enquo(case_weights)
)
}Key Validation Function
check_linear_pred_survival_metric(truth, estimate, case_weights)This validates: - truth is a Surv object - estimate is a numeric vector - Lengths match - case_weights are valid (if provided)
Input Format
Truth: Surv Object
library(survival)
# Create Surv object
truth <- Surv(
time = c(5, 8, 10, 12),
event = c(1, 0, 1, 1)
)Estimate: Linear Predictor Values
# Numeric vector of linear predictor values (unbounded)
# From Cox model: linear_pred = X * beta
estimate <- c(-0.5, 0.2, 0.8, 1.2)
# Or from predict.coxph(..., type = "lp")Understanding Linear Predictors
Linear predictors from Cox models represent: - Log hazard ratio relative to baseline - Prognostic index: higher values = higher risk - Unbounded: can be any real number
# Example Cox model
fit <- survival::coxph(Surv(time, event) ~ age + sex, data = df)
# Linear predictor for new data
lp <- predict(fit, newdata = test_data, type = "lp")
# lp = beta_age * age + beta_sex * sexTransformations
Many linear predictor metrics use transformations to improve properties:
Blom’s Normal Score Transformation
# Rank-based transformation to approximate normality
# Used in Royston's D
normal_score_blom <- function(x, case_weights) {
# 1. Replicate based on weights
# 2. Rank observations
# 3. Transform ranks to normal quantiles
# 4. Average over ties
# See complete implementation above
}This transformation: - Removes dependence on scale of linear predictor - Makes metric more robust - Focuses on rank ordering rather than absolute values
Testing
# tests/testthat/test-royston_survival.R
test_that("royston_survival works correctly", {
df <- data.frame(
time = c(5, 8, 10, 12, 15),
event = c(1, 0, 1, 1, 1),
lp = c(-0.5, 0.2, 0.8, 1.2, 1.5)
)
df$surv_obj <- survival::Surv(df$time, df$event)
result <- royston_survival(df, truth = surv_obj, estimate = lp)
expect_equal(result$.metric, "royston_survival")
expect_equal(result$.estimator, "standard")
expect_true(result$.estimate >= 0 && result$.estimate <= 1)
})
test_that("royston_survival validates inputs", {
df <- data.frame(
time = c(5, 8, 10),
event = c(1, 0, 1),
lp = c("a", "b", "c") # Wrong type
)
df$surv_obj <- survival::Surv(df$time, df$event)
expect_error(royston_survival(df, truth = surv_obj, estimate = lp))
})
test_that("royston_survival handles perfect separation", {
# Perfect prognostic separation
df <- data.frame(
time = c(1, 2, 3, 4, 5),
event = c(1, 1, 1, 1, 1),
lp = c(-2, -1, 0, 1, 2) # Perfect ordering
)
df$surv_obj <- survival::Surv(df$time, df$event)
result <- royston_survival(df, truth = surv_obj, estimate = lp)
# Should be close to 1 (perfect)
expect_true(result$.estimate > 0.9)
})
test_that("royston_survival works with case weights", {
df <- data.frame(
time = c(5, 8, 10, 12),
event = c(1, 0, 1, 1),
lp = c(-0.5, 0.2, 0.8, 1.2),
weights = c(1, 2, 1, 1)
)
df$surv_obj <- survival::Surv(df$time, df$event)
result <- royston_survival(df, truth = surv_obj, estimate = lp,
case_weights = weights)
expect_true(is.numeric(result$.estimate))
})Common Patterns
Using Cox Models
# Fit Cox model within metric
fit <- survival::coxph(truth ~ transformed_estimate, weights = case_weights)
# Extract coefficient
est <- unname(stats::coef(fit))
# Calculate metric based on coefficientHandling Case Weights
if (is.null(case_weights)) {
case_weights <- rep(1, length(estimate))
} else {
case_weights <- vctrs::vec_cast(case_weights, to = double())
}Rank-Based Calculations
# Use ranks instead of raw values for robustness
ranks <- rank(estimate, ties.method = "average")
# Or use normal score transformation
normal_scores <- normal_score_blom(estimate, case_weights)Key Differences from Other Metric Types
| Aspect | Linear Pred Survival | Static Survival | Dynamic Survival |
|---|---|---|---|
| Estimate type | Linear predictor | Any numeric | Survival probabilities |
| Range | Unbounded | Often bounded | Probabilities (0-1) |
| Typical source | Cox model | Various | Survival curves |
| Use case | Prognostic separation | Overall concordance | Time-specific |
Statistical Background
Royston’s D Statistic
- Measures prognostic separation in survival models
- Related to standard deviation of prognostic index
- R²_D represents explained variation on log hazard scale
- D = β * κ where β is coefficient, κ is scaling constant
Interpretation
# R²_D values
# 0.0 = no prognostic separation
# 0.5 = moderate separation
# 1.0 = perfect separation (impossible in practice)Best Practices
- Use with Cox models: These metrics are designed for Cox model output
- Apply transformations: Use normal scores or other transformations as appropriate
- Validate inputs: Use
check_linear_pred_survival_metric() - Handle case weights: Convert to numeric with
vctrs::vec_cast() - Document statistical basis: Explain the underlying statistical model
- Consider alternatives: Compare with concordance for validation
- Check for ties: Handle tied predictions appropriately
Common Metrics
- Royston’s D: Prognostic separation statistic
- R²_D: Explained variation based on D
- Somers’ D: Rank correlation (can also use concordance)
Dependencies
Linear predictor survival metrics typically depend on:
# In DESCRIPTION
Imports:
survival,
vctrs,
dplyr,
tibble,
stats
# In R/package.R
#' @importFrom survival Surv coxph
#' @importFrom stats coef qnormSee Also
- Static Survival Metrics - Other overall survival metrics
- Dynamic Survival Metrics - Time-dependent metrics
- Metric System - Understanding metric architecture