Getting Started

In many population-based studies, the specific cause of death is unidentified, unreliable or even unavailable. Relative survival analysis addresses this scenario, previously unexplored in general survival analysis. Different methods were created with the aim to construct a consistant and reliable estimator for this purpose.

Net survival settings

Note first that, for any positive random variable $X$, we will use extensively the following functions (that each fully characterize the distribution of $X$):

Function symbol and definitionFunction Name
$S_X(t) = \mathbb P(X > t)$Survival function
$\Lambda_X(t) = -\ln S_X(t)$Cumulative Hazard function
$\lambda_X(t) = \partial \Lambda_X(t)$Instantaneous hazard function

Consider a study that consists of censored survival times from a specific cause. Such a study consists of several random objects:

Random variableNameIs it observed ?
$E$"Excess" lifetime
$P$"Population" lifetime
$O = E \wedge P$"Overall" lifetime
$C$"Censoring" time
$\mathbf D$Vector of covariates
$T = O \wedge C$Event time
$\Delta = \mathbf{1}\{T \leq C\}$Event status

It is important to note that we do not observe a a potential indicator $\mathbf{1}\{E \geq P\}$. This is one of the key differences between net survival and standard survival. The standard Net survival analysis solves this problem by assuming that the underlying times $E$ and $P$ are independent from each other.

One central hypothesis in net survival (on top of non-informative censoring) is that $P$ and $E$ are independent. This independence can be written in terms of hazard rates $\lambda_O(t) = \lambda_P(t) + \lambda_E(t)$, or in terms of survival functions $S_O(t) = S_P(t)S_E(t)$.

The population hazard for each individual $\lambda_{P_i}$ is usally drawn from a reference life table, and may depend on covariates $\mathbf D_i$ such as age and date, sex, country, race, etc... See the RateTables.jl package for more details on the potential covariates. On the other hand, the excess mortality is assumed to be i.i.d. between individuals and not to depend on covariates at all. Thus, we mostly omit these covariates from our notations.

Available Estimators

The estimation of net survival is usually discussed in terms of the estimation of the cumulative excess hazard $\Lambda_E(t)$ and/or the instantaneous hazard $\lambda_E = \partial\Lambda_E$. To describe the estimators, we use the following counting processes notations, similar to standard survival analysis(see e.g. [6] or [7]).

  • The uncensored event indicatrix $\partial N_i(t)$ for individual $i$ at time $t$
  • The total number of uncensored events process $\partial N(t) = \sum_i \partial N_i(t)$ at time $t$
  • The at-risk indicatrix $Y_i(t)$, for whether an individual is still at risk
  • The total number at risk process $Y(t) = \sum_i Y_i(t)$ at time $t$

With these definitions and assumptions in mind, we will now present the four different methods implemented in this package, commonly used in literature, to estimate the excess hazard function $\partial\Lambda_E(t)$ and its variance. Recall that we estimate the variance as $\sigma_E^2(t) = \int_{0}^t \partial\sigma_E^2(s)$.

NameReferenceProposed (partial) Excess Hazard $\partial\hat{\Lambda}_E(s)$Proposed (partial) Variance $\partial\hat{\sigma}_E^2(s)$
Ederer I[1]$\frac{\sum_i N_i(s)}{\sum_i Y_i(s)} - \frac{\sum_i S_{P_i}(s)\partial\Lambda_{P_i}(s)}{\sum_i S_{P_i}(s)}$$\frac{\sum_i N_i(s)}{\left(\sum_i Y_i(s)\right)^2}$
Ederer II[2]$\frac{\sum_i N_i(s)}{\sum_i Y_i(s)} - \frac{\sum_i Y_i(s)\partial\Lambda_{P_i}(s)}{\sum_i Y_i(s)}$$\frac{\sum_i N_i(s)}{\left(\sum_i Y_i(s)\right)^2}$
Hakulinen[3]$\frac{\sum_i N_i(s)}{\sum_i Y_i(s)} - \frac{\sum_i \frac{Y_i(s)}{ S_{P_i}(s)}\partial\Lambda_{P_i}(s)}{\sum_i \frac{Y_i(s)}{ S_{P_i}(s)}}$$\frac{\sum_i N_i(s)}{\left(\sum_i Y_i(s)\right)^2}$
Pohar Perme[4]$\frac{\sum_i \frac{\partial N_i(s)}{S_{P_i}(s)} - \sum_i \frac{Y_i(s)}{S_{P_i}(s)}\partial\Lambda_{P_i}(s)}{\sum_i \frac{Y_i(s)}{S_{P_i}(s)}}$$\frac{\sum_{i=1}^n \frac{\partial N_i(s)}{S^2_{P_i}}}{\left(\sum_i \frac{Y_i(s)}{S_{p_i}(s)}\right)^2}$

where, in the variances, it is understood that when no more individuals are at risk $0/0$ gives $0$.

The Pohar Perme estimator [4] is the newest addition to relative survival analysis between the four methods, particularly designed to handle situations where covariates may change over time. It is trusted from the field (see e.g. [8] and [9]) that only this estimator should really be used, the other ones being included mostly for historical reasons and comparisons.

NetSurvival.PoharPermeType
PoharPerme

This method estimates net survival probabilities by applying the following estimation:

\[\partial\hat{\Lambda}_E(t) = \frac{\sum_i \frac{dN_i(u)}{S_{P_i}(u)} - \sum_i \frac{Y_i(u)}{S_{P_i}(u)}d\Lambda_{P_i}(u)}{\sum_i \frac{Y_i(u)}{S_{P_i}(u)}}\]

To fit the Pohar Perme to your data based on a certain rate table, apply the example below to your code :

fit(PoharPerme, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)

References:

  • [4] Perme, Maja Pohar and Stare, Janez and Estève, Jacques (2012). On Estimation in Relative Survival.
source
NetSurvival.EdererIType
EdererI

This method estimates net survival probabilities by applying the following estimation:

\[\partial\hat{\Lambda}_E(t) = \frac{\sum_i N_i(u)}{\sum_i Y_i(u)} - \frac{\sum_i S_{P_i}(u)d\Lambda_{P_i}(u)}{\sum_i S_{P_i}(u)}\]

To call this function:

fit(EdererI, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
source
NetSurvival.EdererIIType
EdererII

This method estimates net survival probabilities by applying the following estimation:

\[\partial\hat{\Lambda}_E(t) = \frac{\sum_i N_i(u)}{\sum_i Y_i(u)} - \frac{\sum_i Y_i(u)d\Lambda_{P_i}(u)}{\sum_i Y_i(u)}\]

To call this function:

fit(EdererII, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
source
NetSurvival.HakulinenType
Hakulinen

To call this function:

fit(Hakulinen, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
source

Crude Mortality

The crude mortality rate is the global mortality rate (from all causes of death) for a population. This measure does not use the cause of death information which, as we previously mentioned, can be unreliable and incomplete. It is given as:

\[M_E(t) = \int_0^t S_O(u-) \lambda_E(u)du\]

There exists a few estimators of this quantity, the most known one being the Cronin-Feuer estimator [10], given by:

\[\hat{M}_E(t) = \int_0^t \hat{S}_O(u-) \hat{\lambda}_E(u)du\]

where, traditionally, Kaplan-Meier is used to estimate the overall survival function $\hat{S}_O$ and Ederer II is used for the excess hazard rate $\hat{\lambda}_E$ and the population hazard rate $\hat{\lambda}_P$.

Our implementation is a bit more permissive, as any net survival estimators can be used for $\hat{\lambda}_E$. Of course, the default is still the original Ederer II which provides the original Cronin-Feuer estimator.

NetSurvival.CrudeMortalityType
CrudeMortality

This method calculates the crude mortality and presents both the excess mortality and population mortality rates.

The default Cronin-Feuer estimator can be fitted to data with the following interface:

fit(CrudeMortality, args...)

where the args are passed to fit(EdererII,args...) to compute the excess hazard.

A more direct syntax can be used, specifying directly the estimator for the excess hazard:

CrudeMortality(fit(EdererII,args...))

To use another excess hazard estimator, simply replace EdererII with the method of your choice (PoharPerme, EdererI, Hakulinen).

References:

  • [10] Cronin, Kathleen A and Feuer, Eric J (2000). Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival.
source

Grafféo Log-Rank Test

The Grafféo Log-Rank Test [5] was constructed as a complement to the Pohar Perme estimator, aiming to compare the net survival functions provided by the latter. The test is designed to compare these functions across multiple groups, including stratified covariables, and to ultimately determine, with the given p-value, which covariables are impactful to the study.

The null $(H_0)$ hypothesis tests the following assumption:

\[\forall t \in [0,T], \; \; \Lambda_{E,g_1}(t) = \Lambda_{E,g_2}(t) = ... = \Lambda_{E,g_k}(t),\]

where $G = \{g_1,...,g_k\}$ is a partition of $1,...,n$ consisting of disjoint groups of individuals that we wish to compare to each other. For all group $g \in G$, let's denote the numerator and denominator of the Pohar Perme (partial) excess hazard estimators, restricted to individuals in the group, by:

  • $\partial N_{E,g}(s) = \sum_{i \in g} \frac{\partial N_i(s)}{S_{P_i}(s)} - \frac{Y_i(s)}{S_{P_i}(s)}\partial\Lambda_{P_i}(s)$
  • $Y_{E,g}(s) = \sum_{i \in g} \frac{Y_i(s)}{S_{P_i}(s)}$
  • $R_{g}(s) = \frac{Y_{E,g}(s)}{\sum_{g\in G} Y_{E,g}(s)}$

Then, define the vector $\mathbf Z = \left(Z_{g_r}: \; r \in 1,...,k-1 \right)$ with entries:

\[Z_g(T) = N_{E,g}(s) - \int_{0}^T Y_{E,g}(s) \partial\hat{\Lambda}_E(s)\]

The test statistic is then given by:

\[U(T) = \mathbf Z(T)'\hat{\Sigma}_Z^{-1} \mathbf Z(T)\]

where the entries of the $\hat{\Sigma}_Z$ matrix are given by:

\[\sigma_{g,h}(T) = \int_0^T \sum_{\ell \in G} \left(\delta_{g,\ell} - R_g(t) \right)\left(\delta_{h,\ell} - R_h(t)\right) \left(\sum_{i\in\ell} \frac{\partial N_i(s)}{S^2_{P_i}}\right)\]

Under $H_0$, the statistic $U(T)$ is asymptotically $\chi^2(k-1)$-distributed. We thus reject the $H_0$ hypothesis when the p-value obtained is under $0.05$, admitting the notable difference between the groups.

NetSurvival.GraffeoTestType
GraffeoTest

The Grafféo test is a log-rank type test and is typically used in net survival analysis to determine the impact of certain covariates in the study.

The null $(H_0)$ hypothesis tests the following assumption:

\[\forall t \in [0,T], \Lambda_{E,g_1}(t) = \Lambda_{E,g_2}(t) = ... = \Lambda_{E,g_k}(t)\]

where $G = {g_1,...,g_k}$ is a partition of $1,...,n$ consisting of disjoint groups of individuals that we wish to compare to each other. For all group $g \in G$, let's denote the numerator and denominator of the Pohar Perme (partial) excess hazard estimators, restricted to individuals in the group, by:

  • $\partial N_{E,g}(s) = \sum_{i \in g} \frac{\partial N_i(s)}{S_{P_i}(s)} - \frac{Y_i(s)}{S_{P_i}(s)}\partial\Lambda_{P_i}(s)$
  • $Y_{E,g}(s) = \sum_{i \in g} \frac{Y_i(s)}{S_{P_i}(s)}$
  • $R_{g}(s) = \frac{Y_{E,g}(s)}{\sum_{g\in G} Y_{E,g}(s)}$

Then, define the vector $\mathbf Z = \left(Z_{g_r}: r \in 1,...,k-1 \right)$ with entries:

\[Z_g(T) = N_{E,g}(s) - \int_{0}^T Y_{E,g}(s) \partial\hat{\Lambda}_E(s)\]

The test statistic is then given by:

\[U(T) = \mathbf Z(T)'\hat{\Sigma}_Z^{-1} \mathbf Z(T)\]

where the entries of the $\hat{\Sigma}_Z$ matrix are given by:

\[\sigma_{g,h}(T) = \int_0^T \sum_{\ell \in G} \left(\delta_{g,\ell} - R_g(t) \right)\left(\delta_{h,\ell} - R_h(t)\right) \left(\sum_{i\in\ell} \frac{\partial N_i(s)}{S^2_{P_i}}\right)\]

Under $H_0$, the statistic $U(T)$ is asymptotically $\chi^2(k-1)$-distributed.

To apply the test to your data based on a certain rate table, apply the example below to your code :

fit(GraffeoTest, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)

If you wish to stratify a covariate:

fit(GraffeoTest, @formula(Surv(time,status)~covariable1 + Strata(covariable2)), data, ratetable)

The produced test statistics is supposed to follow a chi squared distribution under $(H_0)$. You can fetch the results using the .stat, .df and .pval fields of the returned object.

References:

  • [5] Grafféo, Nathalie and Castell, Fabienne and Belot, Aurélien and Giorgi, Roch (2016). A Log-Rank-Type Test to Compare Net Survival Distributions.
source

Nessie

The Nessie function estimates the sample size by yearly intervals as well as averages an estimated lifespan left for a given group.

This function is highly dependant on the Life function taken from the RateTables.jl package which you can find documented here.

The sample size is thus taken by the following formula:

\[ESS(t) = \sum_i^N S_{P_i}(t)\]

While the estimated lifepsan is directly taken from the expectation function.

NetSurvival.nessieFunction
nessie

To call this function, use the formula below:

nessie(@formula(Surv(time,status)~covariate), data, ratetable)
source

Relaxing the independence assumption

The independence assumption of the random vector $(E,P)$ can be relaxed by specifying a dependence structure for this random vector, defined by a copula from Copulas.jl. The description of the underlying method to compute the net survival, its variance and associated log-rank tests under these dependence assumptions are given in [11].

The generalization of the Pohar Perme estimator with a given copula C::Copulas.Copula can be used as follows:

C = FrankCopula(2,-10)
fit(GenPoharPerme(C), @formula(Surv(time, status)~ x1 + x2), data, ratetable)
fit(GraffeoTest(C), @formula(Surv(time, status)~ x1 + x2), data, ratetable)

By default, GraffeoTest(C::Copula) uses GenPoharPerme(C) as a method to compute net survival in each category, but this modification also allows to use other methods, such as:

fit(GraffeoTest(Ederer1()), @formula(Surv(time, status)~ x1 + x2), data, ratetable)

Even if these results are not supported by any theoretical work and are probably meaningless, it is fun to to see that the code goes through, thanks to the modularity of Julia's dispatch, even on routes that were not designed for.

NetSurvival.GenPoharPermeType
GenPoharPerme

This method estimates net survival probabilities under the generalized pohar perme method, taking into account a copula for (E,P) as described in the reference paper. You may use it as:

fit(GenPoharPerme(copula), @formula(Surv(time, status)~ x1 + x2), data, ratetable)

Note that GenPoharPerme(::IndependenceCopula) simply returns PoharPerme.

References:

  • [11] Laverny, O., Grafféo, N., & Giorgi, R. (2025). Non-parametric estimation of net survival under dependence between death causes. arXiv preprint arXiv:2502.09273.
source

Available Datasets

Two classroom datasets are provided in the package:

NetSurvival.colrecConstant
colrec

The colrec dataset provides an example of relative survival dataset from slovenia, and should be paired with the slopop ratetable in the RateTables.jl package. It contains the following columns:

  • time, in days: follow-up time since diagnosis,
  • status, boolean: observed death (1) or censored (0),
  • age, in days: age at diagnosis,
  • year, in days since 0000-00-00: date of diagnosis,
  • sex, :male or :female,
  • stage, cancer stage, can be 1,2,3 or 99 (unknown),
  • site, cancer site, can be :rectum or :colon.

The 5971 patients were diagnosed with colon or rectal cancer in 1994-2000. The original source of the dataset is the relsurv R package. Data were provided by the Slovene Cancer Registry. The original data collected were slightly modified for privacy purpose and legal reason. Because of this data perturbation, neither dates nor ages correspond strictly to their real original values. This dataset is therefore provided as a pedagogical and methodological example and should not be used for medical research purposes.

References:

  • [8] Pohar Perme, Maja and Pavlic, Klemen (2018). Nonparametric relative survival analysis with the R package relsurv. Journal of Statistical Software
  • [12] Zadnik V, Primic Žakelj M, Krajc M (2012). Cancer Burden in Slovenia in Comparison with the Burden in Other European Countries. Zdravniški Vestnik
  • [13] Zadnik V, Žagar T, Primic Žakelj M (2016). Cancer Patients’ Survival: Standard Calculation Methods and Some Considerations Regarding Their Interpretation. Zdravstveno Varstvo
source
NetSurvival.ccolonConstant
ccolon

The ccolon dataset provides an example of relative survival dataset from france, and should be paired with the survexp_fr ratetable in the RateTables.jl package. It contains the following columns:

  • time, in days: follow-up time since diagnosis,
  • status, boolean: observed death (1) or censored (0),
  • age, in days: age at diagnosis,
  • year, in days since 0000-00-00: date of diagnosis,
  • sex, :male or :female,
  • stage, interger from 0 to 3, correspnding to Cancer TNM (tumor node metastatis) stages at diagnosis, either I, II, III, IIIb or IV.
  • side, the primary tumor location, either :right or :left of the colon.

This dataset has been studied in [14], [15] and [11]. It contains population-based survival data on cases of colorectal cancer from the Registry of Digestive Cancers in Burgundy, France, diagnosed between 1976 and 1990. Patients were censored 1) after 10 years of followup and 2) on the 31 December 1994. The original data collected were slightly modified for privacy purpose and legal reason. Because of this data perturbation, neither dates nor ages correspond strictly to their real original values. This dataset is therefore provided as a pedagogical and methodological example and should not be used for medical research purposes.

References:

  • [14] Giorgi, Roch and Abrahamowicz, Michal and Quantin, Catherine and Bolard, Philippe and Esteve, Jacques and Gouvernet, Joanny and Faivre, Jean. A relative survival regression model using B-spline functions to model non-proportional hazards. Statistics in medicine
  • [15] Wolski, Anna and Graff{'e}o, Nathalie and Giorgi, Roch and {the CENSUR working survival group}. A Permutation Test Based on the Restricted Mean Survival Time for Comparison of Net Survival Distributions in Non-Proportional Excess Hazard Settings. Statistical Methods in Medical Research.
  • [11] Laverny, O., Grafféo, N., & Giorgi, R. (2025). Non-parametric estimation of net survival under dependence between death causes. arXiv preprint arXiv:2502.09273.
source

References

[1]
F. Ederer. The relative survival rate: a statistical methodology. Natl. Cancer Inst. Monogr. 6, 101–121 (1961).
[2]
F. Ederer and H. Heise. The effect of eliminating deaths from cancer in general survival rates, methodological notes 11. End Result Evaluation Section, National Cancer Institute (1959).
[3]
T. Hakulinen. On long-term relative survival rates. Journal of Chronic Diseases 30, 431–443 (1977).
[4]
M. Pohar Perme, J. Stare and J. Estève. On Estimation in Relative Survival. Biometrics 68, 113–120 (2011).
[5]
N. Grafféo, F. Castell, A. Belot and R. Giorgi. A Log-Rank-Type Test to Compare Net Survival Distributions. Biometrics 72, 760–769 (2016).
[6]
T. R. Fleming and D. P. Harrington. Counting Processes and Survival Analysis. Vol. 625 (John Wiley & Sons, 2013).
[7]
P. K. Andersen, O. Borgan, R. D. Gill and N. Keiding. Statistical Models Based on Counting Processes. Springer Series in Statistics (Springer US, New York, NY, 1993).
[8]
M. Pohar Perme and K. Pavlic. Nonparametric relative survival analysis with the R package relsurv. Journal of Statistical Software 87, 1–27 (2018).
[9]
H. Charvat and A. Belot. Mexhaz: An R package for fitting flexible hazard-based regression models for overall and excess mortality with a random effect. Journal of Statistical Software 98, 1–36 (2021).
[10]
K. A. Cronin and E. J. Feuer. Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival. Statistics in medicine 19, 1729–1740 (2000).
[11]
[12]
V. Zadnik, M. P. Žakelj and M. Krajc. Cancer burden in Slovenia in comparison with the burden in other European countries. Slovenian Medical Journal 81 (2012).
[13]
V. Zadnik, T. Žagar and M. P. Žakelj. Cancer patients’ survival: standard calculation methods and some considerations regarding their interpretation. Slovenian Journal of Public Health 55, 144–151 (2016).
[14]
R. Giorgi, M. Abrahamowicz, C. Quantin, P. Bolard, J. Esteve, J. Gouvernet and J. Faivre. A relative survival regression model using B-spline functions to model non-proportional hazards. Statistics in medicine 22, 2767–2784 (2003).
[15]
A. Wolski, N. Grafféo, R. Giorgi and the CENSUR working survival group. A Permutation Test Based on the Restricted Mean Survival Time for Comparison of Net Survival Distributions in Non-Proportional Excess Hazard Settings. Statistical Methods in Medical Research 29, 1612–1623 (2020).