Formulas for the Kaplan–Meier estimator
Survival data are assumed when outcome.type = "survival". Let $t_1 < t_2 < \cdots < t_J$ represent the distinct event times. For each event time $t_j$, let $Y_j$ be the size of the risk set (the number of surviving observations) just prior to $t_j$, and let $d_j$ be the number of failures at $t_j$. The Kaplan–Meier estimator of the survival function $S(t)$ is as below.
Kaplan–Meier estimator
$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{Y_j}\right)$$
Note that the estimator is defined to be right-continuous, so the events at $t$ are included in the estimate of $S(t)$.
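As an illustration of the product-limit form $\hat{S}(t) = \prod_{t_j \le t} (1 - d_j / Y_j)$, the following is a minimal Python sketch, not the code used by cifcurve(). The function name `kaplan_meier` and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t_j, Y_j, d_j, S_hat(t_j))] at each distinct event time.

    times  : observed follow-up times
    events : 1 = failure, 0 = right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    rows, surv = [], 1.0
    for t in sorted(deaths):
        # Y_j: observations still under follow-up just prior to t_j
        at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        # Right-continuous: failures at t_j already reduce S_hat(t_j)
        surv *= 1.0 - d / at_risk
        rows.append((t, at_risk, d, surv))
    return rows


# Hypothetical toy data: six subjects, two of them censored
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
for t, y, d, s in km:
    print(f"t={t}  Y={y}  d={d}  S_hat={s:.4f}")
```

In this toy data set the estimate steps through 5/6, 2/3, 4/9 and 0 at times 1, 2, 3 and 5. Censored observations leave the curve unchanged and only shrink later risk sets.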
The variance or standard error of the Kaplan–Meier estimator is often calculated with the Greenwood formula. This formula is derived from a binomial argument, so extension to the weighted case is ad hoc. Alternatively, Tsiatis (1981) proposes a slightly different formula based on a counting process argument which includes the weighted case.
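Greenwood variance
$$\widehat{\mathrm{Var}}_G\{\hat{S}(t)\} = \hat{S}(t)^2 \sum_{j:\, t_j \le t} \frac{d_j}{Y_j (Y_j - d_j)}$$
Tsiatis variance
$$\widehat{\mathrm{Var}}_T\{\hat{S}(t)\} = \hat{S}(t)^2 \sum_{j:\, t_j \le t} \frac{d_j}{Y_j^2}$$
Here $d_j$ and $Y_j$ denote the number of failures and the size of the risk set at event time $t_j$.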
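The two variance formulas differ only in the per-event-time term: $d_j / \{Y_j (Y_j - d_j)\}$ for Greenwood and $d_j / Y_j^2$ for Tsiatis. The sketch below, assuming a precomputed table of $(Y_j, d_j)$ pairs, accumulates both sums alongside the survival estimate. The function name `km_variances` is hypothetical and this is not the package's implementation.

```python
def km_variances(risk_table):
    """risk_table: list of (Y_j, d_j) ordered by event time, with d_j < Y_j.

    Returns [(S_hat, greenwood_var, tsiatis_var)] at each event time.
    """
    surv, green_sum, tsiatis_sum, out = 1.0, 0.0, 0.0, []
    for y, d in risk_table:
        surv *= 1.0 - d / y
        green_sum += d / (y * (y - d))   # Greenwood term
        tsiatis_sum += d / (y * y)       # Tsiatis term
        out.append((surv, surv**2 * green_sum, surv**2 * tsiatis_sum))
    return out


# Hypothetical risk table: (number at risk, number of failures)
result = km_variances([(6, 1), (5, 1), (3, 1)])
for s, vg, vt in result:
    print(f"S_hat={s:.4f}  Greenwood SE={vg**0.5:.4f}  Tsiatis SE={vt**0.5:.4f}")
```

Because $Y_j - d_j \le Y_j$, the Tsiatis variance is never larger than the Greenwood variance. Before any censoring occurs, the Greenwood variance equals the binomial variance $\hat{S}(t)\{1 - \hat{S}(t)\}/n$, which is 5/216 at the first event time here.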
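Xie and Liu (2005) considered a Greenwood-type variance for the weighted Kaplan–Meier estimator. Suppose that the survival data consist of $(T_i, \delta_i, w_i)$, $i = 1, \ldots, n$, an independent sample of right-censored survival data with observed times $T_i$, event indicators $\delta_i$, and weights $w_i$. Let $d_j^w$ and $Y_j^w$ be the weighted number of failures and the weighted number at risk, respectively, at time $t_j$. The weighted Kaplan–Meier estimator of the survival function is
$$\hat{S}^w(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j^w}{Y_j^w}\right)$$
The variance proposed by Xie and Liu (2005) is expressed as an adjusted Greenwood formula,
Greenwood variance for weighted Kaplan–Meier
$$\widehat{\mathrm{Var}}_G\{\hat{S}^w(t)\} = \hat{S}^w(t)^2 \sum_{j:\, t_j \le t} \frac{d_j^w / Y_j^w}{M_j \left(1 - d_j^w / Y_j^w\right)}$$
where $M_j$ is an adjustment factor defined as
$$M_j = \frac{\left(\sum_{i:\, T_i \ge t_j} w_i\right)^2}{\sum_{i:\, T_i \ge t_j} w_i^2}$$
The Tsiatis-type variance is calculated as follows in the same spirit.
Tsiatis variance for weighted Kaplan–Meier
$$\widehat{\mathrm{Var}}_T\{\hat{S}^w(t)\} = \hat{S}^w(t)^2 \sum_{j:\, t_j \le t} \frac{d_j^w / Y_j^w}{M_j}$$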
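The following Python sketch shows one way the weighted quantities could be computed. It assumes the adjustment factor is the effective sample size of the risk set, $M_j = (\sum w_i)^2 / \sum w_i^2$, and that it replaces $Y_j$ in the Greenwood and Tsiatis terms. The function name `weighted_km` and the data are hypothetical, and the code is not taken from cifcurve().

```python
def weighted_km(times, events, weights):
    """Return [(t_j, S_hat_w, greenwood_var, tsiatis_var)] per event time."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, green_sum, tsiatis_sum, rows = 1.0, 0.0, 0.0, []
    for t in event_times:
        risk_w = [w for u, w in zip(times, weights) if u >= t]
        y_w = sum(risk_w)  # weighted number at risk
        d_w = sum(w for u, e, w in zip(times, events, weights)
                  if u == t and e == 1)  # weighted number of failures
        m = y_w**2 / sum(w * w for w in risk_w)  # assumed adjustment factor M_j
        frac = d_w / y_w
        surv *= 1.0 - frac
        tsiatis_sum += frac / m
        if frac < 1.0:
            green_sum += frac / (m * (1.0 - frac))
        # The Greenwood term is undefined once the curve reaches zero
        green_var = surv**2 * green_sum if surv > 0.0 else float("nan")
        rows.append((t, surv, green_var, surv**2 * tsiatis_sum))
    return rows


# Hypothetical data; unit weights should reproduce the unweighted results
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]
unit = weighted_km(times, events, [1.0] * 6)
doubled = weighted_km(times, events, [2.0] * 6)
```

With equal weights $M_j$ reduces to $Y_j$, so the sketch returns the unweighted Greenwood and Tsiatis variances, and multiplying all weights by a constant leaves every output unchanged.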
Formulas for the Aalen-Johansen estimator
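Competing risks (outcome.type = "competing-risk") arise in studies in which individuals are exposed to two or more mutually exclusive failure events. When a failure occurs, we observe the time to event $T$ and the cause of failure $\epsilon$. Suppose that $\epsilon = 1$ and $\epsilon = 2$ represent the event of interest and the competing risk, respectively. Let $d_{jk}$ be the number of failures of cause $k$ at time $t_j$, and now the total number of failures at $t_j$ is $d_j = d_{j1} + d_{j2}$.
The Aalen-Johansen estimator of the cumulative incidence function (CIF) for cause $k$ is as below.
Aalen-Johansen estimator
$$\hat{F}_k(t) = \sum_{j:\, t_j \le t} \hat{S}(t_{j-1}) \frac{d_{jk}}{Y_j}$$
where $\hat{S}(t)$ is the overall survival function, that is, the Kaplan–Meier estimator that counts failures from any cause as events, with $\hat{S}(t_0) = 1$, and $Y_j$ is the number at risk just prior to $t_j$.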
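A minimal Python sketch of the Aalen-Johansen recursion $\hat{F}_k(t) = \sum_{t_j \le t} \hat{S}(t_{j-1})\, d_{jk} / Y_j$ follows. The coding of causes (0 for censoring, positive integers for failure causes), the function name `aalen_johansen`, and the toy data are assumptions made for this example and do not describe cifcurve() internals.

```python
def aalen_johansen(times, causes, cause=1):
    """Return [(t_j, F_hat_k(t_j))] at each distinct failure time.

    causes : 0 = censored, 1, 2, ... = cause of failure
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv_prev, cif, rows = 1.0, 0.0, []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_all = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        d_k = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        # Probability of surviving all causes up to t_{j-1}, then failing
        # from cause k at t_j
        cif += surv_prev * d_k / at_risk
        # Update the overall (all-cause) Kaplan-Meier survival
        surv_prev *= 1.0 - d_all / at_risk
        rows.append((t, cif))
    return rows


# Hypothetical toy data: one censored subject at time 3
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 0, 1, 2, 1]
cif1 = aalen_johansen(times, causes, cause=1)
cif2 = aalen_johansen(times, causes, cause=2)
```

In this example the cause-specific estimates end at 11/18 and 7/18. They sum to one because the overall survival estimate drops to zero at the last failure time.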
Two variance estimators of the Aalen-Johansen estimator are commonly used: one based on counting process theory (Aalen, 1978) and the other based on the delta method.
Aalen variance
Delta method variance
Variance based on influence functions
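It is known that the Aalen-Johansen estimator can be expanded under regularity conditions as
$$\sqrt{n}\{\hat{F}_k(t) - F_k(t)\} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} IF_{ik}(t) + o_p(1)$$
and the process $\sqrt{n}\{\hat{F}_k(t) - F_k(t)\}$ converges weakly to a tight Gaussian process. Here $IF_{ik}(t)$ is the influence function, the contribution of the $i$-th observation to the Aalen-Johansen estimator, and may be written as
$$IF_{ik}(t) = \int_0^t \frac{S(s-)}{y(s)}\, dM_{ik}(s) - \int_0^t \frac{F_k(t) - F_k(s)}{y(s)}\, dM_i(s)$$
where $M_i(s)$ and $M_{ik}(s)$ are the martingale processes of the total count and the count of cause $k$ for the $i$-th observation, respectively, and $y(s) = E\{Y_i(s)\}$ is the expectation of the at-risk process $Y_i(s)$. A consistent variance estimator for $\hat{F}_k(t)$ is $n^{-2} \sum_{i=1}^{n} \widehat{IF}_{ik}(t)^2$, where $\widehat{IF}_{ik}(t)$ replaces the unknown quantities with their estimates.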
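To make the idea of an influence-function variance concrete, the sketch below treats the simplest special case, complete data with no censoring. It does not implement the general martingale expression and is not the package's code. Without censoring the Aalen-Johansen estimator reduces to the empirical proportion of subjects who have failed from the cause of interest by time $t$, the estimated influence value of subject $i$ is the indicator minus that proportion, and the variance estimate is assumed to be the sum of squared influence values divided by $n^2$. The function name `if_variance_uncensored` is hypothetical.

```python
def if_variance_uncensored(times, causes, t, cause=1):
    """Influence-function variance of the CIF at time t, no censoring.

    Returns (F_hat, variance_estimate).
    """
    n = len(times)
    # Indicator that subject i failed from the cause of interest by time t
    indicators = [1.0 if (u <= t and c == cause) else 0.0
                  for u, c in zip(times, causes)]
    f_hat = sum(indicators) / n
    # Estimated influence values: indicator minus the estimate
    influence = [x - f_hat for x in indicators]
    variance = sum(v * v for v in influence) / n**2
    return f_hat, variance


# Hypothetical complete data: five subjects, two causes
f_hat, var_hat = if_variance_uncensored(
    times=[1, 2, 3, 4, 5], causes=[1, 2, 1, 1, 2], t=3, cause=1
)
print(f_hat, var_hat**0.5)
```

In this special case the result coincides with the binomial variance $\hat{F}(t)\{1 - \hat{F}(t)\}/n$, which gives a quick sanity check for any more general implementation.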
Confidence interval options
Standard errors. The default in cifcurve() with weights=NULL is the Greenwood SE when outcome.type="survival" and the delta SE when outcome.type="competing-risk". The default in cifcurve() when weights are supplied is the SE based on influence functions. By default cifcurve() rescales the Greenwood/Tsiatis quantities so that std.err is reported on the probability scale; set report.survfit.std.err = TRUE to return the conventional log-survival SEs from survival::survfit().
Confidence intervals. cifcurve() constructs intervals on the probability scale using the requested transformation: "arcsine-square root"/"arcsin"/"a" (default), "plain", "log", "log-log", or "logit". Passing "none"/"n" skips interval computation entirely. The function back-transforms the bounds to the probability scale, clips them to [0, 1], and replaces undefined values with NA so that interval endpoints remain well behaved in plots and summaries.
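The transformations listed above can be illustrated with the usual delta-method construction: transform the estimate, scale the standard error by the derivative of the transformation, form a Wald interval, and back-transform. The Python sketch below follows that textbook recipe under the assumption that the SE is given on the probability scale. The function `transformed_ci`, its argument names, and its handling of boundary values are hypothetical and may differ from what cifcurve() does internally.

```python
import math


def transformed_ci(p, se, conf_type="arcsin", z=1.959964):
    """Wald interval for a probability p with standard error se.

    Returns None for "none", (nan, nan) where the transformation is
    undefined (a simplification: any p outside (0, 1) for non-plain types).
    """
    if conf_type not in {"none", "plain", "log", "log-log", "logit", "arcsin"}:
        raise ValueError(f"unknown conf_type: {conf_type}")
    if conf_type == "none":
        return None
    if conf_type == "plain":
        lo, hi = p - z * se, p + z * se
    elif not 0.0 < p < 1.0:
        return (float("nan"), float("nan"))
    elif conf_type == "log":
        lo, hi = (p * math.exp(s * z * se / p) for s in (-1, 1))
    elif conf_type == "log-log":
        g, se_g = math.log(-math.log(p)), se / (p * abs(math.log(p)))
        # The transformation is decreasing, so the signs are swapped
        lo, hi = (math.exp(-math.exp(g + s * z * se_g)) for s in (1, -1))
    elif conf_type == "logit":
        g, se_g = math.log(p / (1 - p)), se / (p * (1 - p))
        lo, hi = (1 / (1 + math.exp(-(g + s * z * se_g))) for s in (-1, 1))
    else:  # "arcsin": arcsine-square root
        g, se_g = math.asin(math.sqrt(p)), se / (2 * math.sqrt(p * (1 - p)))
        lo = math.sin(max(0.0, g - z * se_g)) ** 2
        hi = math.sin(min(math.pi / 2, g + z * se_g)) ** 2
    # Clip to the probability scale
    return (min(1.0, max(0.0, lo)), min(1.0, max(0.0, hi)))


for kind in ("plain", "log", "log-log", "logit", "arcsin"):
    lo, hi = transformed_ci(0.5, 0.1, kind)
    print(f"{kind:8s} [{lo:.3f}, {hi:.3f}]")
```

Only the plain interval is symmetric around the estimate in general. The other transformations keep the bounds inside [0, 1] by construction, so clipping mainly matters for the plain and log types.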