References

SurvivalModels.AcceleratedFaillureTimeType
AcceleratedFaillureTime(T, Δ, baseline, X1, X2)
fit(AcceleratedFaillureTime, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit an Accelerated Failure Time (AFT) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, x) = h_0\left(t \exp(x^\top \beta)\right) \exp(x^\top \beta)\]

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices (only X1 is used in AFT).

You can also use the fit() interface with a formula and DataFrame.

source
SurvivalModels.AcceleratedHazardType
AcceleratedHazard(T, Δ, baseline, X1, X2)
fit(AcceleratedHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit an Accelerated Hazard (AH) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, z) = h_0\left(t \exp(z^\top \alpha)\right)\]

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices (only X2 is used in AH).

You can also use the fit() interface with a formula and DataFrame.

source
SurvivalModels.CoxApproxType
CoxApprox(T, Δ, X)
fit(CoxApprox, @formula(Surv(T,Δ)~X), data = ...)

The fourth implementation of the Cox proportional hazards model uses Hessian approximation based on a pre-calculated estimation. This version was created for when it might be difficult to work with full Hessian , offering faster iterations by using a Hessian approximation.

Fields:

  • X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features
  • T::Vector{Float64}: The observed times sorted in ascending order
  • Δ::Vector{Bool}: The event indicator vector (true for event, false for censoring)
  • sX::Vector{Float64}: Sum of X' multiplied by Δ
  • G::Vector{Float64}: Stores the gradient vector
  • η::Vector{Float64}: ηi = Xiβ
  • A::Vector{Float64}: Ai = exp(ηi)
  • B::Vector{Float64}: Stores the majoration elements of the Hessian matrix
  • C::Vector{Float64}: Used in the mkA! function
  • K::Vector{Int64}: Number of events at each unique observed event time
  • loss::Vector{Float64}: Stores the current negative partial log-likelihood value, used in CoxLLH getβ
source
SurvivalModels.CoxDefaultType
CoxDefault(T, Δ, X)
fit(CoxDefault, @formula(Surv(T,Δ)~X), data = ...)
fit(Cox, @formula(Surv(T,Δ)~X), data = ...)

The third implementation of the Cox proportional hazards model represents a highly optimized and significantly faster iteration compared to previous implementation, CoxV2.

This is the default implementation called when you ask for a Cox model.

Fields:

  • Xᵗ::Matrix{Float64}: The design matrix of covariates, transposed (m rows, n columns)
  • sX::Vector{Float64}: Sum of X' multiplied by Δ
  • T::Vector{Float64}: The observed times sorted in descending order
  • Δ::Vector{Bool}: The event indicator vector (true for event, false for censoring)
  • loss::Vector{Float64}: Stores the current negative partial log-likelihood value
  • G::Vector{Float64}: Stores the gradient vector
  • H::Matrix{Float64}: Stores the Hessian matrix
  • S₁::Vector{Float64}: Sum of rⱼxₖⱼ
  • S₂::Matrix{Float64}: Sum of rⱼxₖⱼ * xⱼ
  • μ::Vector{Float64}: Updates the gradient and Hessian
  • η::Vector{Float64}: ηi = Xiβ
  • r::Vector{Float64}: ri = exp(ηi)
  • R::Vector{UnitRange{Int64}}: A vector of the risk ranges for each output time.
source
SurvivalModels.CoxHessianType
CoxHessian(T, Δ, X)
fit(CoxHessian, @formula(Surv(T,Δ)~X), data = ...)

The second implementation of the Cox proportional hazards model uses a Newton-Raphson-like iterative update that directly calculates and utilizes the gradient and Hessian matrix. This version is updating coefficients via the update! function.

Fields:

  • X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features
  • T::Vector{Float64}: The observed times, sorted in ascending order
  • Δ::Vector{Int64}: The event indicator vector (true for event, false for censoring)
  • R::BitMatrix: A boolean risk matrix, where 'R[i,j]' is 'true' if individual 'j' is at risk at time 'T[i]'
source
SurvivalModels.CoxMethodType
StatsBase.fit(Cox, @formula(Surv(T,Δ)~predictors), dataset)

Arguments:

  • Cox: The model to fit, e.g. Cox. Could be specified to any of CoxNM, CoxOptim, CoxApprox or CoxDefault if you want different sovers to be used, see their own documentations. Default is CoxDefault.
  • formula: A StatsModels.FormulaTerm specifying the survival model
  • df: A DataFrame containing the variables specified in the formula

Returns:

  • predictor: A Vector{String} containing the names of the predictor variables included in the model

  • beta: A Vector{Float64} containing the estimated regression coefficients (β​) for each predictor

  • se: A Vector{Float64} containing the standard errors of the estimated regression coefficients

  • loglikelihood: A Vector{Float64} containing the log-likelihood of the fitted model. This value is repeated for each predictor row

  • coef: A vector of the estimated coefficients

  • formula: The applied formula

Example:

ovarian = dataset("survival", "ovarian")
ovarian.FUTime = Float64.(ovarian.FUTime) (Time column needs to be Float64 type)
ovarian.FUStat = Bool.(ovarian.FUStat) (Status column needs to be Bool type)
model = fit(Cox, @formula(Surv(FUTime, FUStat) ~ Age + ECOG_PS), ovarian)

We need to add details about the different prediction types here.

Types:

  • Cox : the base abstract type
  • CoxGrad<:Cox : abstract type for Cox models that are solved using gradient-based optimization
  • CoxLLH<:CoxGrad : abstract type for Cox models that are solved by optimizing the log-likelihood
source
SurvivalModels.CoxNMType
CoxNM(T, Δ, X)
fit(CoxNM, @formula(Surv(T,Δ)~X), data = ...)

An implementation of the Cox proportional hazards model that minimizes the negative partial log-likelihood function (cox_nllh). This version uses the Nelder-Mead method, a derivative-free optimization algorithm.

Fields:

  • X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features
  • T::Vector{Float64}: The observed times, sorted in ascending order
  • Δ::Vector{Bool}: The event indicator vector (true for event, false for censoring)
source
SurvivalModels.CoxOptimType
CoxOptim(T, Δ, X)
fit(CoxOptim, @formula(Surv(T,Δ)~X), data = ...)

The first implementation of the Cox proportional hazards model uses optimization libraries (Optimization.jl, Optim.jl) for coefficient estimation. It uses the BFGS algorithm to minimize the negative partial log-likelihood.

Fields:

  • X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features.
  • T::Vector{Float64}: The observed times, sorted in ascending order
  • Δ::Vector{Int64}: The event indicator vector (true for event, false for censoring)
source
SurvivalModels.GeneralHazardType
GeneralHazard(T, Δ, baseline, X1, X2)
fit(GeneralHazard, @formula(Surv(T, Δ) ~ x1 + x2), @formula(Surv(T, Δ) ~ z1 + z2), df)
fit(GeneralHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit a General Hazard (GH) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, x, z) = h_0\left(t \exp(z^\top \alpha)\right) \exp(x^\top \beta)\]

Maximum likelihood estimation in General Hazards models using provided baseline distribution, provided hazard structure (through the method argument), provided design matrices..

Parameters T,Δ represent observed times and statuses, while X1, X2 should contain covariates. The number of columns in design matrices can be zero.

Hazard structures are defined by the method, which should be <:AbstractGHMethod, available possibilities are PHMethod(), AFTMethod(), AHMethod() and GHMethod().

The baseline distribution should be provided as a <:Distributions.ContinuousUnivariateDistribution object from Distributions.jl or compliant, e.g. from SurvivalDistributions.jl.

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices.

You can also use the fit() interface with:

  • Two formulas (for X1 and X2): for full GH models.
  • One formula: for PH, AFT, or AH models (the unused matrix will be ignored).

Example: Direct usage

using SurvivalModels, Distributions, Optim
T = [2.0, 3.0, 4.0, 5.0, 8.0]
Δ = [1, 1, 0, 1, 0]
X1 = [1.0 2.0; 2.0 1.0; 3.0 1.0; 4.0 2.0; 5.0 1.0]
X2 = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 0.0; 1.0 1.0]
model = GeneralHazard(T, Δ, Weibull, X1, X2)

Example: Using the fit() interface

using SurvivalModels, DataFrames, Distributions, Optim, StatsModels
df = DataFrame(time=T, status=Δ, x1=X1[:,1], x2=X1[:,2], z1=X2[:,1], z2=X2[:,2])
model = fit(GeneralHazard, @formula(Surv(time, status) ~ x1 + x2), @formula(Surv(time, status) ~ z1 + z2), df)
# Or for PH/AFT/AH models:
model_ph = fit(ProportionalHazard, @formula(Surv(time, status) ~ x1 + x2), df)

References:

source
SurvivalModels.GeneralHazardModelType
GeneralHazardModel{Method, B}

A flexible parametric survival model supporting Proportional Hazards (PH), Accelerated Failure Time (AFT), Accelerated Hazards (AH), and General Hazards (GH) structures.

Fields

  • T: Vector of observed times.
  • Δ: Vector of event indicators (true if event, false if censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1: Covariate matrix for the first linear predictor (e.g., PH/AFT).
  • X2: Covariate matrix for the second linear predictor (e.g., AH/GH).
  • α: Coefficient vector for X2.
  • β: Coefficient vector for X1.

Construction

You can construct a model directly by providing all parameters:

model = GeneralHazardModel(
    GHMethod(),
    T, Δ, Weibull(1.0, 2.0),
    X1, X2,
    α, β
)

or fit it from data using the fit interface.

Supported methods:

  • ProportionalHazard: For PH models.
  • AcceleratedFaillureTime: For AFT models.
  • AcceleratedHazard: For AH models.
  • GeneralHazard: For full GH models.
source
SurvivalModels.KaplanMeierType
KaplanMeier(T, Δ)
fit(KaplanMeier, @formula(Surv(T, Δ) ~ 1), df)

Efficient Kaplan-Meier estimator.

Mathematical Description

Suppose we observe $n$ individuals, with observed times $T_1, T_2, \ldots, T_n$ and event indicators $\Delta_1, \Delta_2, \ldots, \Delta_n$ ($\Delta_i = 1$ if the event occurred, $0$ if censored).

Let $t_1 < t_2 < \cdots < t_k$ be the ordered unique event times.

  • $d_j$: number of events at time $t_j$
  • $Y_j$: number of individuals at risk just before $t_j$

The Kaplan-Meier estimator of the survival function $S(t)$ is:

\[\hat{S}(t) = \prod_{t_j \leq t} \left(1 - \frac{d_j}{Y_j}\right)\]

This product runs over all event times $t_j$ less than or equal to $t$.

The Greenwood estimator for the variance of $\hat{S}(t)$ is:

\[\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_j \leq t} \frac{d_j}{Y_j (Y_j - d_j)}\]

Arguments

  • T: Vector of event or censoring times.
  • Δ: Event indicator vector (1 if event, 0 if censored).

Stores

  • t: Sorted unique event times.
  • ∂N: Number of uncensored deaths at each time point.
  • Y: Number of at risk individuals at each time point.
  • ∂Λ: Increments of cumulative hazard.
  • ∂σ: Greenwood variance increments.

Example: Direct usage

using SurvivalModels
T = [2, 3, 4, 5, 8]
Δ = [1, 1, 0, 1, 0]
km = KaplanMeier(T, Δ)

Example: Using the fit() interface

using SurvivalModels, DataFrames, StatsModels
df = DataFrame(time=T, status=Δ)
km2 = fit(KaplanMeier, @formula(Surv(time, status) ~ 1), df)
source
SurvivalModels.LogRankTestType
LogRankTest(T, Δ, group, strata)
fit(LogRankTest, @formula(Surv(T, Δ) ~ gr), data = ...)
fit(LogRankTest, @formula(Surv(T, Δ) ~ Strata(st) + gr), data = ...)

Performs the stratified log-rank test for comparing survival distributions across groups.

Arguments

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1 = event, 0 = censored).
  • group: Vector indicating group membership (e.g., treatment arm).
  • strata: Vector indicating strata membership (e.g., baseline strata).
  • gr and st are the variables in the DataFrame defining the groups and strata for the fit interface.

Returns

A LogRankTest object with the following fields:

  • stat: Chi-square test statistic.
  • df: Degrees of freedom (number of groups minus 1).
  • pval: P-value of the test.

Notes

  • Implements the stratified log-rank test by aggregating test statistics and variances over strata.
  • Suitable for right-censored survival data with stratification.
source
SurvivalModels.ProportionalHazardType
ProportionalHazard(T, Δ, baseline, X1, X2)
fit(ProportionalHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit a Proportional Hazards (PH) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, x) = h_0(t) \exp(x^\top \beta)\]

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices (only X1 is used in PH).

You can also use the fit() interface with a formula and DataFrame.

source
SurvivalModels.greenwoodMethod
greenwood(S::KaplanMeier, t)

Compute the Greenwood variance estimate for the Kaplan-Meier survival estimator at time t.

The Greenwood formula provides an estimate of the variance of the Kaplan-Meier survival function at a given time point. For a fitted Kaplan-Meier object S, the variance at time t is:

```math \widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum{tj < t} \frac{dj}{Yj (Yj - dj)}

source
SurvivalModels.harrells_cMethod
harrells_c(times, statuses, risk_scores)
harrels_c(C::Cox)

Compute Harrell's concordance index (C-index) for survival models.

Arguments

  • times: Vector of observed times (Float64).
  • statuses: Vector of event indicators (Bool; true if event, false if censored).
  • risk_scores: Vector of predicted risk scores (higher means higher risk).

Returns

  • The C-index, a value between 0 and 1 indicating the proportion of all usable patient pairs in which predictions and outcomes are concordant.

Details

The C-index measures the discriminative ability of a survival model: it is the probability that, for a randomly chosen pair of comparable subjects, the subject with the higher predicted risk actually experiences the event before the other. Tied risk scores count as half-concordant.

You can also call harrels_c(C::Cox) to compute the C-index for a fitted Cox model.

source
SurvivalModels.simGHMethod
simGH(n, model::GeneralHazardModel)

This function simulate times to event from a general hazard model, whatever the structure it has (AH, AFT, PH, GH), and whatever its baseline distribution.

Returns a vector containing the simulated times to event

References:

source
SurvivalModels.summaryMethod
summary(model::Cox)

Compute a statistical summary table for a fitted Cox proportional hazards model.

This function calculates standard errors, z-scores, p-values, and confidence intervals.

Returns

A DataFrame with the following columns:

  • predictor: Name of the covariate.
  • β: Estimated regression coefficient.
  • e_β: Hazard ratio (exp(β)).
  • se: Standard error of the coefficient.
  • z_scores: Wald statistic (β / se).
  • p_values: Two-sided p-value.
  • ci_lower_β: Lower bound of the 95% confidence interval for β.
  • ci_upper_β: Upper bound of the 95% confidence interval for β.

Example

model = fit(Cox, @formula(Surv(time, status) ~ age + sex), df)

# Get the full summary dataframe
summ = summary(model)

# Extract standard errors specifically
standard_errors = summ.se
source
[1]
E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American statistical association 53, 457–481 (1958).
[2]
M. Greenwood. The natural duration of cancer. Reports on Public Health and Medical Subjects 33, 1–26 (1926).
[3]
N. Mantel. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer chemotherapy reports 50, 163–170 (1966).
[4]
D. R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
[5]
Y. Chen and N. Jewell. On a general class of semiparametric hazards regression models. Biometrika 88, 687–702 (2001).
[6]
F. Rubio, L. Remontet, N. Jewell and A. Belot. On a general structure for hazard-based regression models: an application to population-based cancer research. Statistical Methods in Medical Research 28, 2404–2417 (2019).
[7]
J. Kalbfleisch and R. Prentice. The statistical analysis of failure time data (John Wiley & Sons, 2011).
[8]
D. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
[9]
Y. Chen and M. Wang. Analysis of accelerated hazards models. Journal of the American Statistical Association 95, 608–618 (2000).
[10]
A. Eletti, G. Marra, M. Quaresma, R. Radice and F. Rubio. A unifying framework for flexible excess hazard modelling with applications in cancer epidemiology. Journal of the Royal Statistical Society: Series C NA, in press (2022).
[11]
F. Rubio, B. Rachet, B. Giorgi, C. Maringe and A. Belot. On models for the estimation of the excess mortality hazard in case of insufficiently stratified life tables. Biostatistics, na–na (2019).
[12]
M. Nikulin and F. Haghighi. On the power generalized Weibull family: model for cancer censored data. Metron – International Journal of Statistics 67, 75–86 (2009).
[13]
E. Stacy. A generalization of the gamma distribution. The Annals of Mathematical Statistics 33, 1187–1192 (1962).