General Hazard Models

Hazard and cumulative hazard functions

The hazard and the cumulative hazard functions play a crucial role in survival analysis. These functions define the likelihood function in the presence of censored observations. Thus, they are important in many context. For more information about these functions, see Short course on Parametric Survival Analysis .

In Julia, hazard and cumulative hazard functions can be fetched through the hazard(dist, t) and cumhaz(dist, t) functions from SurvivalDistributions.jl, and can be aplied to any distributions complient with Distributions.jl's API. Note that SurvivalDistributions.jl also contains a few more distributions relevant to survival analysis. See also the (deprecated) HazReg.jl Julia Package.

Here are a few plots of hazard curves for some known distributions:

using Distributions, Plots, StatsBase, SurvivalDistributions
function hazard_cumhazard_plot(dist, distname; tlims=(0,10))
      plt1 = plot(t -> hazard(dist, t),
            xlabel = "x", ylabel = "Hazard", title = "$distname distribution",
            xlims = tlims, xticks = tlims[1]:1:tlims[2], label = "",
            xtickfont = font(16, "Courier"), ytickfont = font(16, "Courier"),
            xguidefontsize=18, yguidefontsize=18, linewidth=3,
            linecolor = "blue")
      plt2 = plot(t -> cumhazard(dist, t),
            xlabel = "x", ylabel = "Cumulative Hazard", title = "$distname distribution",
            xlims = tlims, xticks = tlims[1]:1:tlims[2], label = "",
            xtickfont = font(16, "Courier"), ytickfont = font(16, "Courier"),
            xguidefontsize=18, yguidefontsize=18, linewidth=3,
            linecolor = "blue")
      return plot(plt1, plt2)
end
hazard_cumhazard_plot (generic function with 1 method)

LogNormal

hazard_cumhazard_plot(LogNormal(0.5, 1), "LogNormal")
Example block output

LogLogistic

hazard_cumhazard_plot(LogLogistic(1, 0.5), "LogLogistic")
Example block output

Weibull

hazard_cumhazard_plot(Weibull(3, 0.5), "Weibull")
Example block output

Gamma

hazard_cumhazard_plot(Gamma(3, 0.5), "Gamma")
Example block output

General Hazard Models

The GH model is formulated in terms of the hazard structure

\[h(t; \alpha, \beta, \theta, {\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the covariates that affect the hazard level; $\tilde{\bf x} \in {\mathbb R}^q$ are the covariates the affect the time level (typically $\tilde{\bf x} \subset {\bf x}$); $\alpha \in {\mathbb R}^q$ and $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

This hazard structure leads to an identifiable model as long as the baseline hazard is not a hazard associated to a member of the Weibull family of distributions [5].

SurvivalModels.GeneralHazardModelType
GeneralHazardModel{Method, B}

A flexible parametric survival model supporting Proportional Hazards (PH), Accelerated Failure Time (AFT), Accelerated Hazards (AH), and General Hazards (GH) structures.

Fields

  • T: Vector of observed times.
  • Δ: Vector of event indicators (true if event, false if censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1: Covariate matrix for the first linear predictor (e.g., PH/AFT).
  • X2: Covariate matrix for the second linear predictor (e.g., AH/GH).
  • α: Coefficient vector for X2.
  • β: Coefficient vector for X1.

Construction

You can construct a model directly by providing all parameters:

model = GeneralHazardModel(
    GHMethod(),
    T, Δ, Weibull(1.0, 2.0),
    X1, X2,
    α, β
)

or fit it from data using the fit interface.

Supported methods:

  • ProportionalHazard: For PH models.
  • AcceleratedFaillureTime: For AFT models.
  • AcceleratedHazard: For AH models.
  • GeneralHazard: For full GH models.
source
SurvivalModels.GeneralHazardType
GeneralHazard(T, Δ, baseline, X1, X2)
fit(GeneralHazard, @formula(Surv(T, Δ) ~ x1 + x2), @formula(Surv(T, Δ) ~ z1 + z2), df)
fit(GeneralHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit a General Hazard (GH) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, x, z) = h_0\left(t \exp(z^\top \alpha)\right) \exp(x^\top \beta)\]

Maximum likelihood estimation in General Hazards models using provided baseline distribution, provided hazard structure (through the method argument), provided design matrices..

Parameters T,Δ represent observed times and statuses, while X1, X2 should contain covariates. The number of columns in design matrices can be zero.

Hazard structures are defined by the method, which should be <:AbstractGHMethod, available possibilities are PHMethod(), AFTMethod(), AHMethod() and GHMethod().

The baseline distribution should be provided as a <:Distributions.ContinuousUnivariateDistribution object from Distributions.jl or compliant, e.g. from SurvivalDistributions.jl.

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices.

You can also use the fit() interface with:

  • Two formulas (for X1 and X2): for full GH models.
  • One formula: for PH, AFT, or AH models (the unused matrix will be ignored).

Example: Direct usage

using SurvivalModels, Distributions, Optim
T = [2.0, 3.0, 4.0, 5.0, 8.0]
Δ = [1, 1, 0, 1, 0]
X1 = [1.0 2.0; 2.0 1.0; 3.0 1.0; 4.0 2.0; 5.0 1.0]
X2 = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 0.0; 1.0 1.0]
model = GeneralHazard(T, Δ, Weibull, X1, X2)

Example: Using the fit() interface

using SurvivalModels, DataFrames, Distributions, Optim, StatsModels
df = DataFrame(time=T, status=Δ, x1=X1[:,1], x2=X1[:,2], z1=X2[:,1], z2=X2[:,2])
model = fit(GeneralHazard, @formula(Surv(time, status) ~ x1 + x2), @formula(Surv(time, status) ~ z1 + z2), df)
# Or for PH/AFT/AH models:
model_ph = fit(ProportionalHazard, @formula(Surv(time, status) ~ x1 + x2), df)

References:

source

Accelerated Failure Time (AFT) model

The AFT model is formulated in terms of the hazard structure

\[h(t; \beta, \theta, {\bf x}) = h_0\left(t \exp\{{\bf x}^{\top}\beta\}; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the available covariates; $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

SurvivalModels.AcceleratedFaillureTimeType
AcceleratedFaillureTime(T, Δ, baseline, X1, X2)
fit(AcceleratedFaillureTime, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit an Accelerated Failure Time (AFT) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, x) = h_0\left(t \exp(x^\top \beta)\right) \exp(x^\top \beta)\]

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices (only X1 is used in AFT).

You can also use the fit() interface with a formula and DataFrame.

source

Proportional Hazards (PH) model

The PH model is formulated in terms of the hazard structure

\[h(t; \beta, \theta, {\bf x}) = h_0\left(t ; \theta\right) \exp\{{\bf x}^{\top}\beta\}.\]

where ${\bf x}\in{\mathbb R}^p$ are the available covariates; $\beta \in {\mathbb R}^p$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

SurvivalModels.ProportionalHazardType
ProportionalHazard(T, Δ, baseline, X1, X2)
fit(ProportionalHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit a Proportional Hazards (PH) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, x) = h_0(t) \exp(x^\top \beta)\]

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices (only X1 is used in PH).

You can also use the fit() interface with a formula and DataFrame.

source

Accelerated Hazards (AH) model

The AH model is formulated in terms of the hazard structure

\[h(t; \alpha, \theta, \tilde{\bf x}) = h_0\left(t \exp\{\tilde{\bf x}^{\top}\alpha\}; \theta\right) .\]

where $\tilde{\bf x}\in{\mathbb R}^q$ are the available covariates; $\alpha \in {\mathbb R}^q$ are the regression coefficients; and $\theta \in \Theta$ is the vector of parameters of the baseline hazard $h_0(\cdot)$.

SurvivalModels.AcceleratedHazardType
AcceleratedHazard(T, Δ, baseline, X1, X2)
fit(AcceleratedHazard, @formula(Surv(T, Δ) ~ x1 + x2), df)

Fit an Accelerated Hazard (AH) model with a specified baseline distribution and covariates.

Hazard function

\[h(t \,|\, z) = h_0\left(t \exp(z^\top \alpha)\right)\]

  • T: Vector of observed times.
  • Δ: Vector of event indicators (1=event, 0=censored).
  • baseline: Baseline distribution (e.g., Weibull()).
  • X1, X2: Covariate matrices (only X2 is used in AH).

You can also use the fit() interface with a formula and DataFrame.

source

Available baseline hazards

The current version of the simGH command implements the following parametric baseline hazards for the models discussed in the previous section.

Simulating times to event from a general hazard structure with simGH

The simGH command from the HazReg.jl Julia package allows one to simulate times to event from the following models:

  • General Hazard (GH) model [5] [6].
  • Accelerated Failure Time (AFT) model [7].
  • Proportional Hazards (PH) model [8].
  • Accelerated Hazards (AH) model [9].

A description of these hazard models is presented below as well as the available baseline hazards.

SurvivalModels.simGHFunction
simGH(n, model::GeneralHazardModel)

This function simulate times to event from a general hazard model, whatever the structure it has (AH, AFT, PH, GH), and whatever its baseline distribution.

Returns a vector containing the simulated times to event

References:

source

Illustrative example

In this example, we simulate $n=1,000$ times to event from the GH, PH, AFT, and AH models with PGW baseline hazards, using the simGH() function. This functionality was ported from HazReg.jl

PGW-GH model

using SurvivalModels, Distributions, DataFrames, Random, SurvivalDistributions
using SurvivalModels: simGH

# Simulte design matrices
n = 100
Random.seed!(123)
des = randn(n, 2)
des_t = randn(n, 2)

# True parameters
theta0 = [0.1, 2.0, 5.0]
alpha0 = [0.5, 0.8]
beta0 = [-0.5, 0.75]

# Construct the model directly (no optimization)
model = GeneralHazard(zeros(n), trues(n),
    PowerGeneralizedWeibull(theta0...),
    des, des_t, alpha0, beta0)

# Simulate event times
simdat = simGH(n, model)

# Administrative censoring.
cens = 10
status = simdat .< cens
simdat = min.(simdat, cens)

# Model fit from dataframe interface.
df = DataFrame(time=simdat, status=status, x1=des[:,1], x2=des[:,2], z1=des_t[:,1], z2=des_t[:,2])
model = fit(GeneralHazard{PowerGeneralizedWeibull},
    @formula(Surv(time, status) ~ x1 + x2),
    @formula(Surv(time, status) ~ z1 + z2),
    df)

result = DataFrame(
    Parameter = ["θ₁", "θ₂", "θ₃", "α₁", "α₂","β₁", "β₂"],
    True      = vcat(theta0, alpha0, beta0),
    Fitted    = vcat(params(model.baseline)..., model.α, model.β)
)
7×3 DataFrame
RowParameterTrueFitted
StringFloat64Float64
1θ₁0.10.125543
2θ₂2.01.75385
3θ₃5.03.92871
4α₁0.50.492821
5α₂0.81.17079
6β₁-0.5-0.508637
7β₂0.750.739206

Of course, increasing hte numebr of observations would increase the quality of the fitted values. You can also use "subset" models (PH, AH, AFT) through the convenient constructors as follows:

PGW-PH model

model = ProportionalHazard(zeros(n), trues(n),
    PowerGeneralizedWeibull(theta0...),
    des, zeros(n,0),  # X2 is empty for PH
    zeros(0), beta0
)

# Simulate event times and censor them
simdat = simGH(n, model)
cens = 10
status = simdat .< cens
simdat = min.(simdat, cens)


# Build the model and fit it:
df = DataFrame(time=simdat, status=status, x1=des[:,1], x2=des[:,2])
model = fit(ProportionalHazard{PowerGeneralizedWeibull},
    @formula(Surv(time, status) ~ x1 + x2), df)

result = DataFrame(
    Parameter = ["θ₁", "θ₂", "θ₃", "β₁", "β₂"],
    True      = vcat(theta0, beta0),
    Fitted    = vcat(params(model.baseline)..., model.β)
)
5×3 DataFrame
RowParameterTrueFitted
StringFloat64Float64
1θ₁0.10.144986
2θ₂2.01.65966
3θ₃5.03.4394
4β₁-0.5-0.572272
5β₂0.750.735524

PGW-AFT model

# Construct the model directly (no optimization)
model = AcceleratedFaillureTime(
    zeros(n), trues(n), PowerGeneralizedWeibull(theta0...),
    des, zeros(n,0),  # X2 is empty for AFT
    zeros(0), beta0
)

# Simulate event times
simdat = simGH(n, model)

# Censoring
cens = 10
status = simdat .< cens
simdat = min.(simdat, cens)

df = DataFrame(time=simdat, status=status, x1=des[:,1], x2=des[:,2])
model = fit(AcceleratedFaillureTime{PowerGeneralizedWeibull},
    @formula(Surv(time, status) ~ x1 + x2), df)

result = DataFrame(
    Parameter = ["θ₁", "θ₂", "θ₃", "β₁", "β₂"],
    True      = vcat(theta0, beta0),
    Fitted    = vcat(params(model.baseline)..., model.β)
)
5×3 DataFrame
RowParameterTrueFitted
StringFloat64Float64
1θ₁0.10.0914492
2θ₂2.02.20273
3θ₃5.05.82919
4β₁-0.5-0.549269
5β₂0.750.731007

PGW-AH model

# Construct the model directly (no optimization)
model = AcceleratedHazard(zeros(n), trues(n),
    PowerGeneralizedWeibull(theta0...),
    zeros(n,0), des_t,  # X1 is empty for AH
    alpha0, zeros(0)
)

# Simulate event times
simdat = simGH(n, model)
cens = 10
status = simdat .< cens
simdat = min.(simdat, cens)

df = DataFrame(time=simdat, status=status, z1=des_t[:,1], z2=des_t[:,2])
model = fit(AcceleratedHazard{PowerGeneralizedWeibull},
    @formula(Surv(time, status) ~ z1 + z2), df)

result = DataFrame(
    Parameter = ["θ₁", "θ₂", "θ₃", "α₁", "α₂"],
    True      = vcat(theta0, alpha0),
    Fitted    = vcat(params(model.baseline)..., model.α)
)
5×3 DataFrame
RowParameterTrueFitted
StringFloat64Float64
1θ₁0.10.0869285
2θ₂2.02.42805
3θ₃5.05.73151
4α₁0.50.374832
5α₂0.80.916131
[5]
Y. Chen and N. Jewell. On a general class of semiparametric hazards regression models. Biometrika 88, 687–702 (2001).
[6]
F. Rubio, L. Remontet, N. Jewell and A. Belot. On a general structure for hazard-based regression models: an application to population-based cancer research. Statistical Methods in Medical Research 28, 2404–2417 (2019).
[7]
J. Kalbfleisch and R. Prentice. The statistical analysis of failure time data (John Wiley & Sons, 2011).
[8]
D. Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34, 187–202 (1972).
[9]
Y. Chen and M. Wang. Analysis of accelerated hazards models. Journal of the American Statistical Association 95, 608–618 (2000).