References
SurvivalModels.AcceleratedFaillureTime — Type
AcceleratedFaillureTime(T, Δ, baseline, X1, X2)
fit(AcceleratedFaillureTime, @formula(Surv(T, Δ) ~ x1 + x2), df)Fit an Accelerated Failure Time (AFT) model with a specified baseline distribution and covariates.
Hazard function
\[h(t \,|\, x) = h_0\left(t \exp(x^\top \beta)\right) \exp(x^\top \beta)\]
T: Vector of observed times.Δ: Vector of event indicators (1=event, 0=censored).baseline: Baseline distribution (e.g., Weibull()).X1,X2: Covariate matrices (onlyX1is used in AFT).
You can also use the fit() interface with a formula and DataFrame.
SurvivalModels.AcceleratedHazard — Type
AcceleratedHazard(T, Δ, baseline, X1, X2)
fit(AcceleratedHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)Fit an Accelerated Hazard (AH) model with a specified baseline distribution and covariates.
Hazard function
\[h(t \,|\, z) = h_0\left(t \exp(z^\top \alpha)\right)\]
T: Vector of observed times.Δ: Vector of event indicators (1=event, 0=censored).baseline: Baseline distribution (e.g., Weibull()).X1,X2: Covariate matrices (onlyX2is used in AH).
You can also use the fit() interface with a formula and DataFrame.
SurvivalModels.CoxApprox — Type
CoxApprox(T, Δ, X)
fit(CoxApprox, @formula(Surv(T,Δ)~X), data = ...)The fourth implementation of the Cox proportional hazards model uses Hessian approximation based on a pre-calculated estimation. This version was created for when it might be difficult to work with full Hessian , offering faster iterations by using a Hessian approximation.
Fields:
- X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features
- T::Vector{Float64}: The observed times sorted in ascending order
- Δ::Vector{Bool}: The event indicator vector (true for event, false for censoring)
- sX::Vector{Float64}: Sum of X' multiplied by Δ
- G::Vector{Float64}: Stores the gradient vector
- η::Vector{Float64}: ηi = Xiβ
- A::Vector{Float64}: Ai = exp(ηi)
- B::Vector{Float64}: Stores the majoration elements of the Hessian matrix
- C::Vector{Float64}: Used in the mkA! function
- K::Vector{Int64}: Number of events at each unique observed event time
- loss::Vector{Float64}: Stores the current negative partial log-likelihood value, used in CoxLLH getβ
SurvivalModels.CoxDefault — Type
CoxDefault(T, Δ, X)
fit(CoxDefault, @formula(Surv(T,Δ)~X), data = ...)
fit(Cox, @formula(Surv(T,Δ)~X), data = ...)The third implementation of the Cox proportional hazards model represents a highly optimized and significantly faster iteration compared to previous implementation, CoxV2.
This is the default implementation called when you ask for a Cox model.
Fields:
- Xᵗ::Matrix{Float64}: The design matrix of covariates, transposed (m rows, n columns)
- sX::Vector{Float64}: Sum of X' multiplied by Δ
- T::Vector{Float64}: The observed times sorted in descending order
- Δ::Vector{Bool}: The event indicator vector (true for event, false for censoring)
- loss::Vector{Float64}: Stores the current negative partial log-likelihood value
- G::Vector{Float64}: Stores the gradient vector
- H::Matrix{Float64}: Stores the Hessian matrix
- S₁::Vector{Float64}: Sum of rⱼxₖⱼ
- S₂::Matrix{Float64}: Sum of rⱼxₖⱼ * xⱼ
- μ::Vector{Float64}: Updates the gradient and Hessian
- η::Vector{Float64}: ηi = Xiβ
- r::Vector{Float64}: ri = exp(ηi)
- R::Vector{UnitRange{Int64}}: A vector of the risk ranges for each output time.
SurvivalModels.CoxHessian — Type
CoxHessian(T, Δ, X)
fit(CoxHessian, @formula(Surv(T,Δ)~X), data = ...)The second implementation of the Cox proportional hazards model uses a Newton-Raphson-like iterative update that directly calculates and utilizes the gradient and Hessian matrix. This version is updating coefficients via the update! function.
Fields:
- X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features
- T::Vector{Float64}: The observed times, sorted in ascending order
- Δ::Vector{Int64}: The event indicator vector (true for event, false for censoring)
- R::BitMatrix: A boolean risk matrix, where 'R[i,j]' is 'true' if individual 'j' is at risk at time 'T[i]'
SurvivalModels.CoxMethod — Type
StatsBase.fit(Cox, @formula(Surv(T,Δ)~predictors), dataset)Arguments:
- Cox: The model to fit, e.g. Cox. Could be specified to any of
CoxNM, CoxOptim, CoxApproxorCoxDefaultif you want different sovers to be used, see their own documentations. Default isCoxDefault. - formula: A StatsModels.FormulaTerm specifying the survival model
- df: A DataFrame containing the variables specified in the formula
Returns:
predictor: A Vector{String} containing the names of the predictor variables included in the model
beta: A Vector{Float64} containing the estimated regression coefficients (β) for each predictor
se: A Vector{Float64} containing the standard errors of the estimated regression coefficients
loglikelihood: A Vector{Float64} containing the log-likelihood of the fitted model. This value is repeated for each predictor row
coef: A vector of the estimated coefficients
formula: The applied formula
Example:
ovarian = dataset("survival", "ovarian")
ovarian.FUTime = Float64.(ovarian.FUTime) (Time column needs to be Float64 type)
ovarian.FUStat = Bool.(ovarian.FUStat) (Status column needs to be Bool type)
model = fit(Cox, @formula(Surv(FUTime, FUStat) ~ Age + ECOG_PS), ovarian)We need to add details about the different prediction types here.
Types:
- Cox : the base abstract type
- CoxGrad<:Cox : abstract type for Cox models that are solved using gradient-based optimization
- CoxLLH<:CoxGrad : abstract type for Cox models that are solved by optimizing the log-likelihood
SurvivalModels.CoxNM — Type
CoxNM(T, Δ, X)
fit(CoxNM, @formula(Surv(T,Δ)~X), data = ...)An implementation of the Cox proportional hazards model that minimizes the negative partial log-likelihood function (cox_nllh). This version uses the Nelder-Mead method, a derivative-free optimization algorithm.
Fields:
- X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features
- T::Vector{Float64}: The observed times, sorted in ascending order
- Δ::Vector{Bool}: The event indicator vector (true for event, false for censoring)
SurvivalModels.CoxOptim — Type
CoxOptim(T, Δ, X)
fit(CoxOptim, @formula(Surv(T,Δ)~X), data = ...)The first implementation of the Cox proportional hazards model uses optimization libraries (Optimization.jl, Optim.jl) for coefficient estimation. It uses the BFGS algorithm to minimize the negative partial log-likelihood.
Fields:
- X::Matrix{Float64}: The design matrix of covariates, where rows correspond to individuals and columns to features.
- T::Vector{Float64}: The observed times, sorted in ascending order
- Δ::Vector{Int64}: The event indicator vector (true for event, false for censoring)
SurvivalModels.GeneralHazard — Type
GeneralHazard(T, Δ, baseline, X1, X2)
fit(GeneralHazard, @formula(Surv(T, Δ) ~ x1 + x2), @formula(Surv(T, Δ) ~ z1 + z2), df)
fit(GeneralHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)Fit a General Hazard (GH) model with a specified baseline distribution and covariates.
Hazard function
\[h(t \,|\, x, z) = h_0\left(t \exp(z^\top \alpha)\right) \exp(x^\top \beta)\]
Maximum likelihood estimation in General Hazards models using provided baseline distribution, provided hazard structure (through the method argument), provided design matrices..
Parameters T,Δ represent observed times and statuses, while X1, X2 should contain covariates. The number of columns in design matrices can be zero.
Hazard structures are defined by the method, which should be <:AbstractGHMethod, available possibilities are PHMethod(), AFTMethod(), AHMethod() and GHMethod().
The baseline distribution should be provided as a <:Distributions.ContinuousUnivariateDistribution object from Distributions.jl or compliant, e.g. from SurvivalDistributions.jl.
T: Vector of observed times.Δ: Vector of event indicators (1=event, 0=censored).baseline: Baseline distribution (e.g., Weibull()).X1,X2: Covariate matrices.
You can also use the fit() interface with:
- Two formulas (for
X1andX2): for full GH models. - One formula: for PH, AFT, or AH models (the unused matrix will be ignored).
Example: Direct usage
using SurvivalModels, Distributions, Optim
T = [2.0, 3.0, 4.0, 5.0, 8.0]
Δ = [1, 1, 0, 1, 0]
X1 = [1.0 2.0; 2.0 1.0; 3.0 1.0; 4.0 2.0; 5.0 1.0]
X2 = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 0.0; 1.0 1.0]
model = GeneralHazard(T, Δ, Weibull, X1, X2)Example: Using the fit() interface
using SurvivalModels, DataFrames, Distributions, Optim, StatsModels
df = DataFrame(time=T, status=Δ, x1=X1[:,1], x2=X1[:,2], z1=X2[:,1], z2=X2[:,2])
model = fit(GeneralHazard, @formula(Surv(time, status) ~ x1 + x2), @formula(Surv(time, status) ~ z1 + z2), df)
# Or for PH/AFT/AH models:
model_ph = fit(ProportionalHazard, @formula(Surv(time, status) ~ x1 + x2), df)References:
SurvivalModels.GeneralHazardModel — Type
GeneralHazardModel{Method, B}A flexible parametric survival model supporting Proportional Hazards (PH), Accelerated Failure Time (AFT), Accelerated Hazards (AH), and General Hazards (GH) structures.
Fields
T: Vector of observed times.Δ: Vector of event indicators (true if event, false if censored).baseline: Baseline distribution (e.g.,Weibull()).X1: Covariate matrix for the first linear predictor (e.g., PH/AFT).X2: Covariate matrix for the second linear predictor (e.g., AH/GH).α: Coefficient vector forX2.β: Coefficient vector forX1.
Construction
You can construct a model directly by providing all parameters:
model = GeneralHazardModel(
GHMethod(),
T, Δ, Weibull(1.0, 2.0),
X1, X2,
α, β
)or fit it from data using the fit interface.
Supported methods:
ProportionalHazard: For PH models.AcceleratedFaillureTime: For AFT models.AcceleratedHazard: For AH models.GeneralHazard: For full GH models.
SurvivalModels.KaplanMeier — Type
KaplanMeier(T, Δ)
fit(KaplanMeier, @formula(Surv(T, Δ) ~ 1), df)Efficient Kaplan-Meier estimator.
Mathematical Description
Suppose we observe $n$ individuals, with observed times $T_1, T_2, \ldots, T_n$ and event indicators $\Delta_1, \Delta_2, \ldots, \Delta_n$ ($\Delta_i = 1$ if the event occurred, $0$ if censored).
Let $t_1 < t_2 < \cdots < t_k$ be the ordered unique event times.
- $d_j$: number of events at time $t_j$
- $Y_j$: number of individuals at risk just before $t_j$
The Kaplan-Meier estimator of the survival function $S(t)$ is:
\[\hat{S}(t) = \prod_{t_j \leq t} \left(1 - \frac{d_j}{Y_j}\right)\]
This product runs over all event times $t_j$ less than or equal to $t$.
The Greenwood estimator for the variance of $\hat{S}(t)$ is:
\[\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_j \leq t} \frac{d_j}{Y_j (Y_j - d_j)}\]
Arguments
T: Vector of event or censoring times.Δ: Event indicator vector (1if event,0if censored).
Stores
t: Sorted unique event times.∂N: Number of uncensored deaths at each time point.Y: Number of at risk individuals at each time point.∂Λ: Increments of cumulative hazard.∂σ: Greenwood variance increments.
Example: Direct usage
using SurvivalModels
T = [2, 3, 4, 5, 8]
Δ = [1, 1, 0, 1, 0]
km = KaplanMeier(T, Δ)Example: Using the fit() interface
using SurvivalModels, DataFrames, StatsModels
df = DataFrame(time=T, status=Δ)
km2 = fit(KaplanMeier, @formula(Surv(time, status) ~ 1), df)SurvivalModels.LogRankTest — Type
LogRankTest(T, Δ, group, strata)
fit(LogRankTest, @formula(Surv(T, Δ) ~ gr), data = ...)
fit(LogRankTest, @formula(Surv(T, Δ) ~ Strata(st) + gr), data = ...)Performs the stratified log-rank test for comparing survival distributions across groups.
Arguments
T: Vector of observed times.Δ: Vector of event indicators (1= event,0= censored).group: Vector indicating group membership (e.g., treatment arm).strata: Vector indicating strata membership (e.g., baseline strata).grandstare the variables in the DataFrame defining the groups and strata for thefitinterface.
Returns
A LogRankTest object with the following fields:
stat: Chi-square test statistic.df: Degrees of freedom (number of groups minus 1).pval: P-value of the test.
Notes
- Implements the stratified log-rank test by aggregating test statistics and variances over strata.
- Suitable for right-censored survival data with stratification.
SurvivalModels.ProportionalHazard — Type
ProportionalHazard(T, Δ, baseline, X1, X2)
fit(ProportionalHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)Fit a Proportional Hazards (PH) model with a specified baseline distribution and covariates.
Hazard function
\[h(t \,|\, x) = h_0(t) \exp(x^\top \beta)\]
T: Vector of observed times.Δ: Vector of event indicators (1=event, 0=censored).baseline: Baseline distribution (e.g., Weibull()).X1,X2: Covariate matrices (onlyX1is used in PH).
You can also use the fit() interface with a formula and DataFrame.
SurvivalModels.greenwood — Method
greenwood(S::KaplanMeier, t)Compute the Greenwood variance estimate for the Kaplan-Meier survival estimator at time t.
The Greenwood formula provides an estimate of the variance of the Kaplan-Meier survival function at a given time point. For a fitted Kaplan-Meier object S, the variance at time t is:
```math \widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum{tj < t} \frac{dj}{Yj (Yj - dj)}
SurvivalModels.harrells_c — Method
harrells_c(times, statuses, risk_scores)
harrels_c(C::Cox)Compute Harrell's concordance index (C-index) for survival models.
Arguments
times: Vector of observed times (Float64).statuses: Vector of event indicators (Bool; true if event, false if censored).risk_scores: Vector of predicted risk scores (higher means higher risk).
Returns
- The C-index, a value between 0 and 1 indicating the proportion of all usable patient pairs in which predictions and outcomes are concordant.
Details
The C-index measures the discriminative ability of a survival model: it is the probability that, for a randomly chosen pair of comparable subjects, the subject with the higher predicted risk actually experiences the event before the other. Tied risk scores count as half-concordant.
You can also call harrels_c(C::Cox) to compute the C-index for a fitted Cox model.
SurvivalModels.simGH — Method
simGH(n, model::GeneralHazardModel)This function simulate times to event from a general hazard model, whatever the structure it has (AH, AFT, PH, GH), and whatever its baseline distribution.
Returns a vector containing the simulated times to event
References:
SurvivalModels.summary — Method
summary(model::Cox)Compute a statistical summary table for a fitted Cox proportional hazards model.
This function calculates standard errors, z-scores, p-values, and confidence intervals.
Returns
A DataFrame with the following columns:
predictor: Name of the covariate.β: Estimated regression coefficient.e_β: Hazard ratio (exp(β)).se: Standard error of the coefficient.z_scores: Wald statistic (β / se).p_values: Two-sided p-value.ci_lower_β: Lower bound of the 95% confidence interval for β.ci_upper_β: Upper bound of the 95% confidence interval for β.
Example
model = fit(Cox, @formula(Surv(time, status) ~ age + sex), df)
# Get the full summary dataframe
summ = summary(model)
# Extract standard errors specifically
standard_errors = summ.se- [1]
- E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American statistical association 53, 457–481 (1958).
- [2]
- M. Greenwood. The natural duration of cancer. Reports on Public Health and Medical Subjects 33, 1–26 (1926).
- [3]
- N. Mantel. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer chemotherapy reports 50, 163–170 (1966).
- [4]
- D. R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
- [5]
- Y. Chen and N. Jewell. On a general class of semiparametric hazards regression models. Biometrika 88, 687–702 (2001).
- [6]
- F. Rubio, L. Remontet, N. Jewell and A. Belot. On a general structure for hazard-based regression models: an application to population-based cancer research. Statistical Methods in Medical Research 28, 2404–2417 (2019).
- [7]
- J. Kalbfleisch and R. Prentice. The statistical analysis of failure time data (John Wiley & Sons, 2011).
- [8]
- D. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
- [9]
- Y. Chen and M. Wang. Analysis of accelerated hazards models. Journal of the American Statistical Association 95, 608–618 (2000).
- [10]
- A. Eletti, G. Marra, M. Quaresma, R. Radice and F. Rubio. A unifying framework for flexible excess hazard modelling with applications in cancer epidemiology. Journal of the Royal Statistical Society: Series C NA, in press (2022).
- [11]
- F. Rubio, B. Rachet, B. Giorgi, C. Maringe and A. Belot. On models for the estimation of the excess mortality hazard in case of insufficiently stratified life tables. Biostatistics, na–na (2019).
- [12]
- M. Nikulin and F. Haghighi. On the power generalized Weibull family: model for cancer censored data. Metron – International Journal of Statistics 67, 75–86 (2009).
- [13]
- E. Stacy. A generalization of the gamma distribution. The Annals of Mathematical Statistics 33, 1187–1192 (1962).