
# Tabular Data Machine Learning
This skill guides the process of developing predictive models for tabular data with proper validation practices.
## Data Spending Strategy
Always partition data into:
- Training set: Used for all feature engineering, feature selection, and model development
- Test set: Reserved for final model evaluation only—requires explicit user permission before use
A common split is 75% training / 25% testing. Use stratified sampling:

- Classification: stratify by outcome class
- Regression: create temporary quartile groups and stratify by those
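As a sketch, the split and stratification above can be expressed in Python with scikit-learn (the skill itself is framework-agnostic; the data and seeds here are purely illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=1)

# Classification: 75/25 split stratified directly on the outcome class.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Regression: build temporary quartile groups of the outcome and
# stratify on those instead of the continuous values.
y_cont = np.random.default_rng(1).normal(size=1000)
quartiles = np.digitize(y_cont, np.quantile(y_cont, [0.25, 0.5, 0.75]))
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(
    X, y_cont, test_size=0.25, stratify=quartiles, random_state=1
)
```

Stratification keeps the class (or quartile) proportions nearly identical across the two partitions.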
See references/data-spending.md for specific instructions for data splitting.
### Test Set Rules
- NEVER predict on test data during model development
- NEVER calculate test set metrics without explicit user permission
- NEVER use test data to compare models or tune hyperparameters
- DO ask: “If you have completed model development, may I evaluate the final model on the test set?”
- DO wait for explicit confirmation before proceeding
Self-check: If you’re writing predict(..., test_data) without prior user permission, STOP—you’re making an error.
Exception: Basic verification after splitting (e.g., nrow(test_data), glimpse(test_data)) to confirm the split worked.
## Empirical Validation
Always use out-of-sample predictions to measure performance:
- Large datasets (≥10,000 rows): Use a single validation set
- Small to medium datasets: Use 10-fold cross-validation or appropriate resampling
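A minimal sketch of the small-to-medium-data case, shown here in Python with scikit-learn (the model and data are hypothetical stand-ins):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, random_state=1)

# 10-fold cross-validation: every performance estimate comes from
# predictions on data the model did not see during fitting.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc"
)
print(f"CV ROC-AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```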
See references/resampling.md for resampling methods and implementation.
### Validation Rules
- NEVER directly predict on training data to measure performance
- DO develop and compare models using only CV or validation set results
- DO select final model(s) based on out-of-sample performance
## Performance Metrics
See references/evaluation.md for specific instructions for computing performance metrics.
### Classification
Ask the user whether they prioritize:

- Class separation: Use ROC-AUC or PR-AUC
- Calibrated probabilities: Use Brier score
Default set: ROC-AUC, Brier score, and accuracy.
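The default classification set can be computed on held-out predictions as follows; this Python/scikit-learn version is illustrative, with hypothetical data and a hypothetical validation split:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)
prob = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

roc_auc = roc_auc_score(y_val, prob)       # class separation
brier = brier_score_loss(y_val, prob)      # probability calibration (lower is better)
acc = accuracy_score(y_val, prob >= 0.5)   # hard-class accuracy at a 0.5 cutoff
```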
### Regression
- RMSE: Primary accuracy metric (sensitive to outliers)
- MAE: Accuracy metric less sensitive to outliers
- R²: Measures variance explained (supplement to RMSE/MAE, not a replacement)
Default set: RMSE and R².
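The regression metrics can likewise be computed on held-out predictions; this scikit-learn sketch uses synthetic data purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, noise=10.0, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=1)
pred = LinearRegression().fit(X_tr, y_tr).predict(X_val)

rmse = mean_squared_error(y_val, pred) ** 0.5  # sensitive to outliers
mae = mean_absolute_error(y_val, pred)         # less sensitive to outliers
r2 = r2_score(y_val, pred)                     # variance explained
```

Note that RMSE is always at least as large as MAE on the same predictions, which is one reason to report both.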
## Model Optimization
The modeling process is iterative. Three main levers for improvement:
- Feature engineering: Modify predictors so the model does less work
- Model selection: Choose appropriate algorithm for data characteristics
- Hyperparameter tuning: Optimize parameters that can’t be estimated from data
All steps must be validated using out-of-sample data.
### Optimization Rules
- NEVER use data outside the training set to determine feature engineering steps
- NEVER engineer features and then evaluate directly on training data
- DO treat feature engineering and model training as a single process
- DO use CV or validation set to measure combined feature engineering + model performance
- DO use CV or validation set to select best tuning parameters
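One common way to honor these rules is to bundle preprocessing and the model into a single pipeline, so the feature engineering is re-fit inside every resample. A scikit-learn sketch (the scaler and model are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=1)

# The scaler is fit only on each fold's training portion, so the
# CV scores measure feature engineering + model as one unit.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=10, scoring="roc_auc")
```

Fitting the scaler on all rows before cross-validating would leak information from the held-out folds into the preprocessing step.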
### Feature Engineering
See references/feature-engineering.md for:

- Common feature engineering techniques
- Model-specific requirements (mandatory vs. helpful transformations)
### Model Tuning
- Use parameter ranges provided by the modeling framework
- Use space-filling designs for grid search when available
- Use racing methods for efficiency (except with validation sets)
- Visualize tuning results to show performance vs. parameter relationships
It is a good idea to propose two models to the user:

- a regularized linear (or logistic) model such as glmnet
- a boosted tree with an early-stopping argument that halts after 5 poor iterations
See references/tuning.md for details on tuning methods and implementation.
## Model Evaluation
Without tuning: Resample the model or use a validation set. Report out-of-sample metrics.
With tuning: Select metric to optimize, identify optimal tuning parameters.
For the best model, present:

- Numeric metric results
- Appropriate visualizations (see below)
### Evaluation Visualizations
Classification:

- ROC or PR curves
- Calibration curves

Regression:

- Observed vs. predicted plots
- Residual plots
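The coordinates behind the classification plots can be computed directly; this scikit-learn sketch (hypothetical data and model) produces the arrays that a ROC plot and a calibration plot would draw:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)
prob = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

# ROC curve coordinates: false-positive vs. true-positive rates.
fpr, tpr, _ = roc_curve(y_val, prob)

# Calibration curve: observed event rate vs. mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_val, prob, n_bins=10)
```

For regression, the analogous plots need only the held-out predictions: plot `y_val` against `pred` for observed vs. predicted, and `y_val - pred` against `pred` for residuals.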
See references/evaluation.md for metrics, visualizations, and implementation.
## Final Model
Once the user selects a final model, fit it on the entire training set.
### Test Set Evaluation
After receiving user permission, evaluate on the test set with:

- Numeric metrics
- Same visualizations as model evaluation (ROC/PR curves, calibration, observed vs. predicted, residuals)