Getting Started
In many population-based studies, the specific cause of death is unidentified, unreliable or even unavailable. Relative survival analysis addresses this scenario, previously unexplored in general survival analysis. Different methods were created with the aim to construct a consistant and reliable estimator for this purpose.
Net survival settings
Note first that, for any positive random variable $X$, we will use extensively the following functions (that each fully characterize the distribution of $X$):
Function symbol and definition | Function Name |
---|---|
$S_X(t) = \mathbb P(X > t)$ | Survival function |
$\Lambda_X(t) = -\ln S_X(t)$ | Cumulative Hazard function |
$\lambda_X(t) = \partial \Lambda_X(t)$ | Instantaneous hazard function |
Consider a study that consists of censored survival times from a specific cause. Such a study consists of several random objects:
Random variable | Name | Is it observed ? |
---|---|---|
$E$ | "Excess" lifetime | ✘ |
$P$ | "Population" lifetime | ✘ |
$O = E \wedge P$ | "Overall" lifetime | ✘ |
$C$ | "Censoring" time | ✘ |
$\mathbf D$ | Vector of covariates | ✔ |
$T = O \wedge C$ | Event time | ✔ |
$\Delta = \mathbf{1}\{T \leq C\}$ | Event status | ✔ |
It is important to note that we do not observe a a potential indicator $\mathbf{1}\{E \geq P\}$. This is one of the key differences between net survival and standard survival. The standard Net survival analysis solves this problem by assuming that the underlying times $E$ and $P$ are independent from each other.
One central hypothesis in net survival (on top of non-informative censoring) is that $P$ and $E$ are independent. This independence can be written in terms of hazard rates $\lambda_O(t) = \lambda_P(t) + \lambda_E(t)$, or in terms of survival functions $S_O(t) = S_P(t)S_E(t)$.
The population hazard for each individual $\lambda_{P_i}$ is usally drawn from a reference life table, and may depend on covariates $\mathbf D_i$ such as age and date, sex, country, race, etc... See the RateTables.jl package for more details on the potential covariates. On the other hand, the excess mortality is assumed to be i.i.d. between individuals and not to depend on covariates at all. Thus, we mostly omit these covariates from our notations.
Available Estimators
The estimation of net survival is usually discussed in terms of the estimation of the cumulative excess hazard $\Lambda_E(t)$ and/or the instantaneous hazard $\lambda_E = \partial\Lambda_E$. To describe the estimators, we use the following counting processes notations, similar to standard survival analysis(see e.g. [6] or [7]).
- The uncensored event indicatrix $\partial N_i(t)$ for individual $i$ at time $t$
- The total number of uncensored events process $\partial N(t) = \sum_i \partial N_i(t)$ at time $t$
- The at-risk indicatrix $Y_i(t)$, for whether an individual is still at risk
- The total number at risk process $Y(t) = \sum_i Y_i(t)$ at time $t$
With these definitions and assumptions in mind, we will now present the four different methods implemented in this package, commonly used in literature, to estimate the excess hazard function $\partial\Lambda_E(t)$ and its variance. Recall that we estimate the variance as $\sigma_E^2(t) = \int_{0}^t \partial\sigma_E^2(s)$.
Name | Reference | Proposed (partial) Excess Hazard $\partial\hat{\Lambda}_E(s)$ | Proposed (partial) Variance $\partial\hat{\sigma}_E^2(s)$ |
---|---|---|---|
Ederer I | [1] | $\frac{\sum_i N_i(s)}{\sum_i Y_i(s)} - \frac{\sum_i S_{P_i}(s)\partial\Lambda_{P_i}(s)}{\sum_i S_{P_i}(s)}$ | $\frac{\sum_i N_i(s)}{\left(\sum_i Y_i(s)\right)^2}$ |
Ederer II | [2] | $\frac{\sum_i N_i(s)}{\sum_i Y_i(s)} - \frac{\sum_i Y_i(s)\partial\Lambda_{P_i}(s)}{\sum_i Y_i(s)}$ | $\frac{\sum_i N_i(s)}{\left(\sum_i Y_i(s)\right)^2}$ |
Hakulinen | [3] | $\frac{\sum_i N_i(s)}{\sum_i Y_i(s)} - \frac{\sum_i \frac{Y_i(s)}{ S_{P_i}(s)}\partial\Lambda_{P_i}(s)}{\sum_i \frac{Y_i(s)}{ S_{P_i}(s)}}$ | $\frac{\sum_i N_i(s)}{\left(\sum_i Y_i(s)\right)^2}$ |
Pohar Perme | [4] | $\frac{\sum_i \frac{\partial N_i(s)}{S_{P_i}(s)} - \sum_i \frac{Y_i(s)}{S_{P_i}(s)}\partial\Lambda_{P_i}(s)}{\sum_i \frac{Y_i(s)}{S_{P_i}(s)}}$ | $\frac{\sum_{i=1}^n \frac{\partial N_i(s)}{S^2_{P_i}}}{\left(\sum_i \frac{Y_i(s)}{S_{p_i}(s)}\right)^2}$ |
where, in the variances, it is understood that when no more individuals are at risk $0/0$ gives $0$.
The Pohar Perme estimator [4] is the newest addition to relative survival analysis between the four methods, particularly designed to handle situations where covariates may change over time. It is trusted from the field (see e.g. [8] and [9]) that only this estimator should really be used, the other ones being included mostly for historical reasons and comparisons.
NetSurvival.PoharPerme
— TypePoharPerme
This method estimates net survival probabilities by applying the following estimation:
\[\partial\hat{\Lambda}_E(t) = \frac{\sum_i \frac{dN_i(u)}{S_{P_i}(u)} - \sum_i \frac{Y_i(u)}{S_{P_i}(u)}d\Lambda_{P_i}(u)}{\sum_i \frac{Y_i(u)}{S_{P_i}(u)}}\]
To fit the Pohar Perme to your data based on a certain rate table, apply the example below to your code :
fit(PoharPerme, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
References:
- [4] Perme, Maja Pohar and Stare, Janez and Estève, Jacques (2012). On Estimation in Relative Survival.
NetSurvival.EdererI
— TypeEdererI
This method estimates net survival probabilities by applying the following estimation:
\[\partial\hat{\Lambda}_E(t) = \frac{\sum_i N_i(u)}{\sum_i Y_i(u)} - \frac{\sum_i S_{P_i}(u)d\Lambda_{P_i}(u)}{\sum_i S_{P_i}(u)}\]
To call this function:
fit(EdererI, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
NetSurvival.EdererII
— TypeEdererII
This method estimates net survival probabilities by applying the following estimation:
\[\partial\hat{\Lambda}_E(t) = \frac{\sum_i N_i(u)}{\sum_i Y_i(u)} - \frac{\sum_i Y_i(u)d\Lambda_{P_i}(u)}{\sum_i Y_i(u)}\]
To call this function:
fit(EdererII, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
NetSurvival.Hakulinen
— TypeHakulinen
To call this function:
fit(Hakulinen, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
Crude Mortality
The crude mortality rate is the global mortality rate (from all causes of death) for a population. This measure does not use the cause of death information which, as we previously mentioned, can be unreliable and incomplete. It is given as:
\[M_E(t) = \int_0^t S_O(u-) \lambda_E(u)du\]
There exists a few estimators of this quantity, the most known one being the Cronin-Feuer estimator [10], given by:
\[\hat{M}_E(t) = \int_0^t \hat{S}_O(u-) \hat{\lambda}_E(u)du\]
where, traditionally, Kaplan-Meier is used to estimate the overall survival function $\hat{S}_O$ and Ederer II is used for the excess hazard rate $\hat{\lambda}_E$ and the population hazard rate $\hat{\lambda}_P$.
Our implementation is a bit more permissive, as any net survival estimators can be used for $\hat{\lambda}_E$. Of course, the default is still the original Ederer II which provides the original Cronin-Feuer estimator.
NetSurvival.CrudeMortality
— TypeCrudeMortality
This method calculates the crude mortality and presents both the excess mortality and population mortality rates.
The default Cronin-Feuer estimator can be fitted to data with the following interface:
fit(CrudeMortality, args...)
where the args
are passed to fit(EdererII,args...)
to compute the excess hazard.
A more direct syntax can be used, specifying directly the estimator for the excess hazard:
CrudeMortality(fit(EdererII,args...))
To use another excess hazard estimator, simply replace EdererII
with the method of your choice (PoharPerme
, EdererI
, Hakulinen
).
References:
- [10] Cronin, Kathleen A and Feuer, Eric J (2000). Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival.
Grafféo Log-Rank Test
The Grafféo Log-Rank Test [5] was constructed as a complement to the Pohar Perme estimator, aiming to compare the net survival functions provided by the latter. The test is designed to compare these functions across multiple groups, including stratified covariables, and to ultimately determine, with the given p-value, which covariables are impactful to the study.
The null $(H_0)$ hypothesis tests the following assumption:
\[\forall t \in [0,T], \; \; \Lambda_{E,g_1}(t) = \Lambda_{E,g_2}(t) = ... = \Lambda_{E,g_k}(t),\]
where $G = \{g_1,...,g_k\}$ is a partition of $1,...,n$ consisting of disjoint groups of individuals that we wish to compare to each other. For all group $g \in G$, let's denote the numerator and denominator of the Pohar Perme (partial) excess hazard estimators, restricted to individuals in the group, by:
- $\partial N_{E,g}(s) = \sum_{i \in g} \frac{\partial N_i(s)}{S_{P_i}(s)} - \frac{Y_i(s)}{S_{P_i}(s)}\partial\Lambda_{P_i}(s)$
- $Y_{E,g}(s) = \sum_{i \in g} \frac{Y_i(s)}{S_{P_i}(s)}$
- $R_{g}(s) = \frac{Y_{E,g}(s)}{\sum_{g\in G} Y_{E,g}(s)}$
Then, define the vector $\mathbf Z = \left(Z_{g_r}: \; r \in 1,...,k-1 \right)$ with entries:
\[Z_g(T) = N_{E,g}(s) - \int_{0}^T Y_{E,g}(s) \partial\hat{\Lambda}_E(s)\]
The test statistic is then given by:
\[U(T) = \mathbf Z(T)'\hat{\Sigma}_Z^{-1} \mathbf Z(T)\]
where the entries of the $\hat{\Sigma}_Z$ matrix are given by:
\[\sigma_{g,h}(T) = \int_0^T \sum_{\ell \in G} \left(\delta_{g,\ell} - R_g(t) \right)\left(\delta_{h,\ell} - R_h(t)\right) \left(\sum_{i\in\ell} \frac{\partial N_i(s)}{S^2_{P_i}}\right)\]
Under $H_0$, the statistic $U(T)$ is asymptotically $\chi^2(k-1)$-distributed. We thus reject the $H_0$ hypothesis when the p-value obtained is under $0.05$, admitting the notable difference between the groups.
NetSurvival.GraffeoTest
— TypeGraffeoTest
The Grafféo test is a log-rank type test and is typically used in net survival analysis to determine the impact of certain covariates in the study.
The null $(H_0)$ hypothesis tests the following assumption:
\[\forall t \in [0,T], \Lambda_{E,g_1}(t) = \Lambda_{E,g_2}(t) = ... = \Lambda_{E,g_k}(t)\]
where $G = {g_1,...,g_k}$ is a partition of $1,...,n$ consisting of disjoint groups of individuals that we wish to compare to each other. For all group $g \in G$, let's denote the numerator and denominator of the Pohar Perme (partial) excess hazard estimators, restricted to individuals in the group, by:
- $\partial N_{E,g}(s) = \sum_{i \in g} \frac{\partial N_i(s)}{S_{P_i}(s)} - \frac{Y_i(s)}{S_{P_i}(s)}\partial\Lambda_{P_i}(s)$
- $Y_{E,g}(s) = \sum_{i \in g} \frac{Y_i(s)}{S_{P_i}(s)}$
- $R_{g}(s) = \frac{Y_{E,g}(s)}{\sum_{g\in G} Y_{E,g}(s)}$
Then, define the vector $\mathbf Z = \left(Z_{g_r}: r \in 1,...,k-1 \right)$ with entries:
\[Z_g(T) = N_{E,g}(s) - \int_{0}^T Y_{E,g}(s) \partial\hat{\Lambda}_E(s)\]
The test statistic is then given by:
\[U(T) = \mathbf Z(T)'\hat{\Sigma}_Z^{-1} \mathbf Z(T)\]
where the entries of the $\hat{\Sigma}_Z$ matrix are given by:
\[\sigma_{g,h}(T) = \int_0^T \sum_{\ell \in G} \left(\delta_{g,\ell} - R_g(t) \right)\left(\delta_{h,\ell} - R_h(t)\right) \left(\sum_{i\in\ell} \frac{\partial N_i(s)}{S^2_{P_i}}\right)\]
Under $H_0$, the statistic $U(T)$ is asymptotically $\chi^2(k-1)$-distributed.
To apply the test to your data based on a certain rate table, apply the example below to your code :
fit(GraffeoTest, @formula(Surv(time,status)~covariable1 + covariable2), data, ratetable)
If you wish to stratify a covariate:
fit(GraffeoTest, @formula(Surv(time,status)~covariable1 + Strata(covariable2)), data, ratetable)
The produced test statistics is supposed to follow a chi squared distribution under $(H_0)$. You can fetch the results using the .stat
, .df
and .pval
fields of the returned object.
References:
- [5] Grafféo, Nathalie and Castell, Fabienne and Belot, Aurélien and Giorgi, Roch (2016). A Log-Rank-Type Test to Compare Net Survival Distributions.
Nessie
The Nessie function estimates the sample size by yearly intervals as well as averages an estimated lifespan left for a given group.
This function is highly dependant on the Life
function taken from the RateTables.jl
package which you can find documented here.
The sample size is thus taken by the following formula:
\[ESS(t) = \sum_i^N S_{P_i}(t)\]
While the estimated lifepsan is directly taken from the expectation
function.
NetSurvival.nessie
— Functionnessie
To call this function, use the formula below:
nessie(@formula(Surv(time,status)~covariate), data, ratetable)
Relaxing the independence assumption
The independence assumption of the random vector $(E,P)$ can be relaxed by specifying a dependence structure for this random vector, defined by a copula from Copulas.jl. The description of the underlying method to compute the net survival, its variance and associated log-rank tests under these dependence assumptions are given in [11].
The generalization of the Pohar Perme estimator with a given copula C::Copulas.Copula
can be used as follows:
C = FrankCopula(2,-10)
fit(GenPoharPerme(C), @formula(Surv(time, status)~ x1 + x2), data, ratetable)
fit(GraffeoTest(C), @formula(Surv(time, status)~ x1 + x2), data, ratetable)
By default, GraffeoTest(C::Copula)
uses GenPoharPerme(C)
as a method to compute net survival in each category, but this modification also allows to use other methods, such as:
fit(GraffeoTest(Ederer1()), @formula(Surv(time, status)~ x1 + x2), data, ratetable)
Even if these results are not supported by any theoretical work and are probably meaningless, it is fun to to see that the code goes through, thanks to the modularity of Julia's dispatch, even on routes that were not designed for.
NetSurvival.GenPoharPerme
— TypeGenPoharPerme
This method estimates net survival probabilities under the generalized pohar perme method, taking into account a copula for (E,P) as described in the reference paper. You may use it as:
fit(GenPoharPerme(copula), @formula(Surv(time, status)~ x1 + x2), data, ratetable)
Note that GenPoharPerme(::IndependenceCopula)
simply returns PoharPerme
.
References:
- [11] Laverny, O., Grafféo, N., & Giorgi, R. (2025). Non-parametric estimation of net survival under dependence between death causes. arXiv preprint arXiv:2502.09273.
Available Datasets
Two classroom datasets are provided in the package:
NetSurvival.colrec
— Constantcolrec
The colrec
dataset provides an example of relative survival dataset from slovenia, and should be paired with the slopop
ratetable in the RateTables.jl
package. It contains the following columns:
time
, in days: follow-up time since diagnosis,status
, boolean: observed death (1) or censored (0),age
, in days: age at diagnosis,year
, in days since 0000-00-00: date of diagnosis,sex
,:male
or:female
,stage
, cancer stage, can be 1,2,3 or 99 (unknown),site
, cancer site, can be:rectum
or:colon
.
The 5971 patients were diagnosed with colon or rectal cancer in 1994-2000. The original source of the dataset is the relsurv
R package. Data were provided by the Slovene Cancer Registry. The original data collected were slightly modified for privacy purpose and legal reason. Because of this data perturbation, neither dates nor ages correspond strictly to their real original values. This dataset is therefore provided as a pedagogical and methodological example and should not be used for medical research purposes.
References:
- [8] Pohar Perme, Maja and Pavlic, Klemen (2018). Nonparametric relative survival analysis with the R package relsurv. Journal of Statistical Software
- [12] Zadnik V, Primic Žakelj M, Krajc M (2012). Cancer Burden in Slovenia in Comparison with the Burden in Other European Countries. Zdravniški Vestnik
- [13] Zadnik V, Žagar T, Primic Žakelj M (2016). Cancer Patients’ Survival: Standard Calculation Methods and Some Considerations Regarding Their Interpretation. Zdravstveno Varstvo
NetSurvival.ccolon
— Constantccolon
The ccolon
dataset provides an example of relative survival dataset from france, and should be paired with the survexp_fr
ratetable in the RateTables.jl
package. It contains the following columns:
time
, in days: follow-up time since diagnosis,status
, boolean: observed death (1) or censored (0),age
, in days: age at diagnosis,year
, in days since 0000-00-00: date of diagnosis,sex
,:male
or:female
,stage
, interger from 0 to 3, correspnding to Cancer TNM (tumor node metastatis) stages at diagnosis, either I, II, III, IIIb or IV.side
, the primary tumor location, either:right
or:left
of the colon.
This dataset has been studied in [14], [15] and [11]. It contains population-based survival data on cases of colorectal cancer from the Registry of Digestive Cancers in Burgundy, France, diagnosed between 1976 and 1990. Patients were censored 1) after 10 years of followup and 2) on the 31 December 1994. The original data collected were slightly modified for privacy purpose and legal reason. Because of this data perturbation, neither dates nor ages correspond strictly to their real original values. This dataset is therefore provided as a pedagogical and methodological example and should not be used for medical research purposes.
References:
- [14] Giorgi, Roch and Abrahamowicz, Michal and Quantin, Catherine and Bolard, Philippe and Esteve, Jacques and Gouvernet, Joanny and Faivre, Jean. A relative survival regression model using B-spline functions to model non-proportional hazards. Statistics in medicine
- [15] Wolski, Anna and Graff{'e}o, Nathalie and Giorgi, Roch and {the CENSUR working survival group}. A Permutation Test Based on the Restricted Mean Survival Time for Comparison of Net Survival Distributions in Non-Proportional Excess Hazard Settings. Statistical Methods in Medical Research.
- [11] Laverny, O., Grafféo, N., & Giorgi, R. (2025). Non-parametric estimation of net survival under dependence between death causes. arXiv preprint arXiv:2502.09273.
References
- [1]
- F. Ederer. The relative survival rate: a statistical methodology. Natl. Cancer Inst. Monogr. 6, 101–121 (1961).
- [2]
- F. Ederer and H. Heise. The effect of eliminating deaths from cancer in general survival rates, methodological notes 11. End Result Evaluation Section, National Cancer Institute (1959).
- [3]
- T. Hakulinen. On long-term relative survival rates. Journal of Chronic Diseases 30, 431–443 (1977).
- [4]
- M. Pohar Perme, J. Stare and J. Estève. On Estimation in Relative Survival. Biometrics 68, 113–120 (2011).
- [5]
- N. Grafféo, F. Castell, A. Belot and R. Giorgi. A Log-Rank-Type Test to Compare Net Survival Distributions. Biometrics 72, 760–769 (2016).
- [6]
- T. R. Fleming and D. P. Harrington. Counting Processes and Survival Analysis. Vol. 625 (John Wiley & Sons, 2013).
- [7]
- P. K. Andersen, O. Borgan, R. D. Gill and N. Keiding. Statistical Models Based on Counting Processes. Springer Series in Statistics (Springer US, New York, NY, 1993).
- [8]
- M. Pohar Perme and K. Pavlic. Nonparametric relative survival analysis with the R package relsurv. Journal of Statistical Software 87, 1–27 (2018).
- [9]
- H. Charvat and A. Belot. Mexhaz: An R package for fitting flexible hazard-based regression models for overall and excess mortality with a random effect. Journal of Statistical Software 98, 1–36 (2021).
- [10]
- K. A. Cronin and E. J. Feuer. Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival. Statistics in medicine 19, 1729–1740 (2000).
- [11]
- O. Laverny, N. Grafféo and R. Giorgi. Non-parametric estimation of net survival under dependence between death causes (2025), arXiv:2502.09273 [math.ST].
- [12]
- V. Zadnik, M. P. Žakelj and M. Krajc. Cancer burden in Slovenia in comparison with the burden in other European countries. Slovenian Medical Journal 81 (2012).
- [13]
- V. Zadnik, T. Žagar and M. P. Žakelj. Cancer patients’ survival: standard calculation methods and some considerations regarding their interpretation. Slovenian Journal of Public Health 55, 144–151 (2016).
- [14]
- R. Giorgi, M. Abrahamowicz, C. Quantin, P. Bolard, J. Esteve, J. Gouvernet and J. Faivre. A relative survival regression model using B-spline functions to model non-proportional hazards. Statistics in medicine 22, 2767–2784 (2003).
- [15]
- A. Wolski, N. Grafféo, R. Giorgi and the CENSUR working survival group. A Permutation Test Based on the Restricted Mean Survival Time for Comparison of Net Survival Distributions in Non-Proportional Excess Hazard Settings. Statistical Methods in Medical Research 29, 1612–1623 (2020).