RateTables.jl

The RateTables.jl Julia package provides daily rate table objects extracted from census datasets, tailored for person-year computations. This functionality is similar to R's ratetable class. You can install and load it through:

using Pkg
Pkg.add(url="https://github.com/JuliaSurv/RateTables.jl")

RateTables objects

Loading this package exports several constant RateTable objects, which are listed below. Since they are exported, you can simply call them after using RateTables as follows:

using RateTables
hmd_rates
RateTable(:country, :sex)

This hmd_rates rate table represents mortality rates extracted from the Human Mortality Database (HMD). The output of the REPL shows that we have a RateTable object with two covariates, :country and :sex. You can query the values that a given covariate can take as follows:

available_covariates(hmd_rates, :sex)
(:female, :male, :total)

For this specific dataset, the number of countries is large and calling available_covariates(hmd_rates, :country) is not very useful. Thus, for convenience and only for this dataset, we provide details on the country codes separately in another constant object called hmd_countries:

hmd_countries
Dict{Symbol, String} with 50 entries:
  :fracnp  => "France, Civilian Population"
  :gbr_np  => "United Kingdom"
  :deutw   => "West Germany"
  :dnk     => "Denmark"
  :hkg     => "Hong Kong"
  :fratnp  => "France, Total Population"
  :jpn     => "Japan"
  :nzl_np  => "New Zealand"
  :gbrcenw => "England and Wales, Civilian National Population"
  :lva     => "Latvia"
  :cze     => "Czechia"
  :prt     => "Portugal"
  :esp     => "Spain"
  :svn     => "Slovenia"
  :bel     => "Belgium"
  :gbr_sco => "Scotland"
  :irl     => "Ireland"
  :nzl_nm  => "New Zealand -- Non-Maori"
  :che     => "Switzerland"
  ⋮        => ⋮

You can then use these covariates to subset the RateTable object:

brt = hmd_rates[:svn,:male]
BasicRateTable:
    ages, in years from 0 to 110 (in days from 0.0 to 40176.509999999995)
    date, in years from 1983 to 2019 (in days from 724272.9029999999 to 737421.579) 

The result is an object of type BasicRateTable, which is the core of the implementation. These objects have a very strict internal layout: they mostly hold a matrix of daily hazard rates, indexed by age (yearly) and date (yearly too). The show function displays the ranges of values for both ages and dates. When constructing the life tables, we handled irregularities in the source data so that they all have exactly this shape (yearly intervals on both axes).

The most important thing that you can do with them is querying mortality rates, which is done through the daily_hazard function.

Daily hazard

Recall that the daily hazard rate of mortality is defined as $-\log(1 - q_x)/365.241$ for an annual death rate $q_x$. We present an alternative approach to displaying mortality tables that is particularly convenient for person-year computations. To obtain daily rates from the tables, you can use the daily_hazard function. Its arguments need to be in the following specific format:

  • The age parameter should be provided in days, with the conversion factor being 1 year = 365.241 days.
  • The date parameter should be provided in days as well, with the same conversion factor.
  • The format of other covariates may vary between rate tables, but it's essential to consider that their order is significant.
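
As a minimal sketch of the conversions described above, the following snippet goes from an annual death probability to a daily hazard and back, and builds age and date arguments in days. The helper names annual_to_daily_hazard and daily_hazard_to_annual are hypothetical and are not part of RateTables.jl; the value of qx is purely illustrative.

```julia
# Conversion factor used throughout this page: 1 year = 365.241 days.
const DAYS_PER_YEAR = 365.241

# Daily hazard from an annual death probability qx (hypothetical helper).
annual_to_daily_hazard(qx) = -log1p(-qx) / DAYS_PER_YEAR

# Inverse: annual death probability from a constant daily hazard.
daily_hazard_to_annual(h) = -expm1(-h * DAYS_PER_YEAR)

qx = 0.001                       # illustrative value, not taken from any table
h  = annual_to_daily_hazard(qx)  # about 2.74e-6 per day

# Ages and dates are both expressed in days with the same factor.
age_days  = 20 * DAYS_PER_YEAR         # twenty years old
date_days = 2010 * DAYS_PER_YEAR + 10  # ten days into the year 2010
```

Using log1p and expm1 instead of log and exp only improves floating-point accuracy for small rates; the formulas are the same.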

The sex covariate typically has values such as :male and :female, and sometimes :total. For the hmd_rates table, we have seen previously that the two additional covariates are country and sex. Recall that you can use the available_covariates function to obtain this information.

There are several querying syntaxes, all lowering to the same code. You are free to choose the one that you prefer. Depending on the syntax, the order of the passed arguments can be significant. For instance, the daily hazard rate for a Slovene male on his 20th birthday, which happens to fall on the tenth of January 2010, can be queried using any of the following syntaxes:

c = :svn # Slovenia
s = :male
a = 20 * 365.241 # twenty years old, in days
d = 2010 * 365.241 + 10 # tenth of January 2010, in days

v1 = daily_hazard(hmd_rates, a, d, c, s)
v2 = daily_hazard(hmd_rates, a, d; country=c, sex=s)
v3 = daily_hazard(hmd_rates, a, d; sex=s, country=c) # when using kwargs syntax, the order of additional covariates does not matter.
v4 = daily_hazard(hmd_rates[c, s], a, d) # here, the order of the arguments (c,s) matters.
(v1,v2,v3,v4)
(1.8021431794632215e-6, 1.8021431794632215e-6, 1.8021431794632215e-6, 1.8021431794632215e-6)

For completeness, this package also includes datasets commonly used in R for census data, in particular the relsurv::slopop dataset pertaining to Slovenia:

daily_hazard(slopop, a, d; sex=s)
8.66087235766869e-7

Note the discrepancy with the HMD data: the source of the information is not exactly the same, and so the rates don't match perfectly. Another example with additional covariates is the survival::survexp.usr dataset, which includes race as a covariate. In this case, the calling structure remains similar:

r = :white
v1 = daily_hazard(survexp_usr, a, d, s, r)
v2 = daily_hazard(survexp_usr, a, d; sex=s, race=r)
v3 = daily_hazard(survexp_usr, a, d; race=r, sex=s)
v4 = daily_hazard(survexp_usr[s, r], a, d)
(v1,v2,v3,v4)
(2.785811075678776e-6, 2.785811075678776e-6, 2.785811075678776e-6, 2.785811075678776e-6)

Please note that retrieving these daily hazards is a performance-sensitive operation that is heavily optimized for speed, since it is often used within critical loops. As such, we prioritized the performance of our fetching algorithms over the convenience of other parts of the implementation. The core algorithm is as follows:

  • Fetch the right BasicRateTable from a dictionary using the provided covariates
  • Convert the provided ages and dates from days to years
  • Index the rate matrix at corresponding indices.
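
The three steps above can be sketched with a toy stand-in for the real types. Everything in this snippet (ToyTable, toy_daily_hazard, the tables dictionary, the clamping at the edges of the matrix) is a hypothetical illustration under simple assumptions, not the package's actual implementation.

```julia
const DAYS_PER_YEAR = 365.241

# Hypothetical stand-in for a BasicRateTable: a matrix of daily hazards
# indexed by (age in years, calendar year), plus the first age and year covered.
struct ToyTable
    rates::Matrix{Float64}
    first_age::Int
    first_year::Int
end

function toy_daily_hazard(t::ToyTable, age_days, date_days)
    # Step 2: convert days to whole years.
    age_year  = floor(Int, age_days  / DAYS_PER_YEAR)
    date_year = floor(Int, date_days / DAYS_PER_YEAR)
    # Step 3: index the matrix. Clamping to the edges is an assumption of this sketch.
    i = clamp(age_year  - t.first_age  + 1, 1, size(t.rates, 1))
    j = clamp(date_year - t.first_year + 1, 1, size(t.rates, 2))
    return t.rates[i, j]
end

# Step 1: fetch the right table from a dictionary keyed by covariates.
# Ages 0 to 110 and years 2008 to 2010, with made-up constant rates.
tables = Dict(
    (:male,)   => ToyTable(fill(2e-6, 111, 3), 0, 2008),
    (:female,) => ToyTable(fill(1e-6, 111, 3), 0, 2008),
)

toy_daily_hazard(tables[(:male,)], 20.5 * DAYS_PER_YEAR, 2010.5 * DAYS_PER_YEAR)
```

Once the covariates have selected a table, the remaining work is a division, a floor and a matrix read, which is why the lookup can be made cheap enough for tight loops.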

If you feel like you are not getting top fetching performance, please open an issue.

Life random variables

The Life function is used to extract individual life profiles (as random variables compliant with Distributions.jl's API) from a RateTable, by using covariates such as age, gender, and health status or others. Once these life profiles are established, they serve as foundational elements for various analytical practices such as survival probability estimations, expected lifespan calculations, and simulations involving random variables related to life expectancy.

When applying it to a male individual aged $7000$ days (about $19$ years) in $1990$, we get the outcome below:

L = Life(slopop[:male], 7000, 1990*365.241)
Life(
∂t: [3.637978807091713e-12, 304.81999999999607, 60.421000000003914, 304.81999999999607, 60.421000000003914, 304.81999999999607, 60.421000000003914, 304.81999999999607, 60.421000000003914, 304.81999999999607  …  365.241, 365.241, 365.241, 365.241, 365.241, 365.241, 365.241, 365.241, 365.241, 365.241]
λ: [5.5910580526963e-6, 3.17782866861003e-6, 3.45195218341532e-6, 4.76812723189004e-6, 4.60357074517423e-6, 3.39712528476538e-6, 3.45195218341532e-6, 4.1922227932534e-6, 5.20699288852109e-6, 4.21964406742172e-6  …  0.0011935251398196602, 0.0012526563832215363, 0.0011598216701132712, 0.0011970556602975712, 0.0013592036116259977, 0.000734484865047132, 0.0014403998891054917, 0.0014403998891054917, 0.0014403998891054917, 0.0014403998891054917]
)

Since hazard rates are constant on each cell of a rate table, the survival function decays exponentially within each cell, and the life expectation can be computed exactly through the following formula:

\[\mathbf{E}(P) = \int_0^\infty S_p(t)\,dt = \sum_{j=0}^\infty \frac{S_p(t_j)}{\lambda_p(t_j)}\left(1 - \exp\left(-\lambda_p(t_j)(t_{j+1}-t_j)\right)\right)\]

Two approximations are made when the life gets out of the life table:

  • The last line of the rate table is assumed to last forever. Indeed, the last line represents persons who are already 110 years old, so assuming that their future death rates are constant is not much of an issue.
  • When on the other hand a life exits the ratetable from the right, i.e. into the future but at a young age, we assume the last column of the rate table to define the future for this person.
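
To make the cell-by-cell computation concrete, here is a minimal sketch that integrates the survival function exactly over each cell of a piecewise-constant hazard, treating the last cell as infinitely wide. The function toy_expectation is hypothetical and only mirrors the idea; it assumes strictly positive hazards and takes the cell widths and hazards as plain vectors, named ∂t and λ after the fields printed in the Life output above.

```julia
# Expected remaining lifetime (in days) under piecewise-constant hazards.
# ∂t[j] is the width of cell j in days and λ[j] > 0 its daily hazard.
# The last cell is treated as infinitely wide, so ∂t[end] is ignored.
function toy_expectation(∂t, λ)
    S = 1.0   # survival probability at the start of the current cell
    E = 0.0   # accumulated expectation
    n = length(λ)
    for j in 1:n
        if j == n
            E += S / λ[j]                 # integral of S * exp(-λu) over [0, ∞)
        else
            p = exp(-λ[j] * ∂t[j])        # probability of surviving cell j
            E += S * (1 - p) / λ[j]       # exact integral of S(t) over cell j
            S *= p
        end
    end
    return E
end

# Sanity check: with a constant hazard the sum telescopes to 1/λ.
toy_expectation([10.0, 20.0, 5.0], fill(0.01, 3))
```

With a constant hazard of 0.01 per day the lifetime is exponential, so the result should be close to 100 days regardless of how the cells are cut.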

All this is implemented as a method for the Distributions.expectation function, since Life objects are random variables:

expectation(L)/365.241
57.70236509448671

In this example, we get $57.7$ years left, implying a total life expectancy of about $77$ years for the given individual.

These random variables comply with Distributions.jl's API.

Exported RateTables

RateTables.frpop (Constant)
frpop

French census data, sourced from the Human Mortality Database (not exactly the same series as hmd_rates[:fr]).

Segmented by sex ∈ (:male, :female)

RateTables.hmd_rates (Constant)
hmd_rates

RateTable providing daily hazard rates for both sexes for several countries. They are derived from annual death probabilities (qₓ's) from the Human Mortality Database.

Segmented by country ∈ keys(hmd_countries) and sex ∈ (:male, :female, :total).

The list of country codes is given with details in the hmd_countries constant.

RateTables.slopop (Constant)
slopop

Slovene census data. Corresponds to the slopop ratetable from R's relsurv package. Segmented by sex ∈ (:male, :female).

RateTables.survexp_fr (Constant)
survexp_fr

French census data, drawn from R's package survexp.fr. Death rates are available from 1977 to 2019 for males and females aged from 0 to 99. Segmented by sex ∈ (:male, :female).

Source: https://www.insee.fr/fr/statistiques/fichier/5390366/fm_t68.xlsx

References: Institut National de la Statistique et des Etudes Economiques

RateTables.survexp_mn (Constant)
survexp_mn

Census data set for the US population, drawn from R's package survival. RateTable survexp_mn gives total Minnesota population, by age and sex, 1970 to 2013. Segmented by sex ∈ (:male, :female)

RateTables.survexp_us (Constant)
survexp_us

Census data set for the US population, drawn from R's package survival. RateTable survexp_us gives total United States population, by age and sex, 1940 to 2012. Segmented by sex ∈ (:male, :female)

RateTables.survexp_usr (Constant)
survexp_usr

Census data set for the US population, drawn from R's package survival. RateTable survexp_usr gives the United States population, by age, sex and race, 1940 to 2014. Race is white or black. For 1960 and 1970 the black population values were not reported separately, so the nonwhite values were used. (Over the years, the reported tables have differed with respect to reporting non-white and/or black.) Segmented by sex ∈ (:male, :female) and race ∈ (:white, :black).


Other docstrings

RateTables.Life (Type)
Life(brt::BasicRateTable,a,d)

This function returns a random variable that corresponds to a Life extracted from the BasicRateTable at age a and date d.

This works by checking whether the individual is closer to the oldest age than to the last year of the rate table, computing at each step the time difference and the hazard value. Younger individuals, who reach the last year first, are assumed to go through the last column from then on, whatever their age.

RateTables.RateTable (Type)
RateTable

This class contains daily rate tables used in person-years computation.

Each of these tables contains the daily hazard rate for a matched subject from the population, defined as $-\log(1-q_x)/365.241$ for $q_x$ the 1 year probability of death as reported in the original source tables. The tables are given in terms of hazard per day for computational convenience.

RateTables.cumhazard (Method)
cumhazard

Assuming the last box is infinitely wide, we calculate the cumulative hazard from ∂t and λ taken from the Life function.
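
As an illustration of this accumulation, the sketch below sums hazard times time spent over successive cells up to a horizon t, with the last cell extended to infinity. The name toy_cumhazard and its signature are hypothetical: the actual cumhazard method operates on a Life object and its exact signature is not shown on this page.

```julia
# Cumulative hazard at time t (in days) for piecewise-constant hazards.
# ∂t[j] is the width of cell j and λ[j] its daily hazard; the last cell is
# treated as infinitely wide, so ∂t[end] is ignored.
function toy_cumhazard(∂t, λ, t)
    H = 0.0
    remaining = t
    n = length(λ)
    for j in 1:n
        width = j == n ? Inf : ∂t[j]
        step = min(remaining, width)   # time actually spent in cell j
        H += λ[j] * step
        remaining -= step
        remaining <= 0 && break
    end
    return H
end

H = toy_cumhazard([10.0, 10.0], [0.1, 0.2], 15.0)  # 0.1*10 + 0.2*5
S = exp(-H)                                        # corresponding survival probability
```

The survival probability at t follows directly as exp(-H), which is the quantity integrated when computing the expectation.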

RateTables.daily_hazard (Method)
daily_hazard(rt::BasicRateTable, age, date)
daily_hazard(rt::RateTable,      age, date, args...)
daily_hazard(rt::RateTable,      age, date; kwargs...)

This function queries daily hazard values from a given BasicRateTable or RateTable. The parameters age and date have to be in days (1 year = 365.241 days). Any additional args and kwargs are used to subset the RateTable.
