Skip to contents

Model formula and outcome.type in cifcurve() and polyreg()

Common ground

  • Both functions expect a Surv() or Event() response on the LHS of the model formula so that follow-up time and event indicators are supplied in the same way. Internally the helpers validate non-negative times, check that the event codes match the user-specified code.event1 and code.event2 arguments, and derive indicators used downstream.
  • The outcome.type argument is normalized by the same utility, meaning that the same set of synonyms (e.g. "C", "competing risk", or "S") are accepted in both workflows and mapped onto the canonical labels used in the estimators.

Key differences

  • In cifcurve() the RHS of the formula specifies the grouping variable whose curves are estimated; weights (if any) are supplied separately. There is no dedicated exposure argument because all strata are handled through the formula term that is turned into a factor for plotting and estimation.
  • In polyreg() the nuisance.model formula describes only the outcome and nuisance covariates (i.e. confounders), while the exposure is passed separately through the exposure argument. The exposure design matrix, reference level handling, and scaling of nuisance covariates are handled explicitly for regression fitting.
  • cifcurve() focuses on two estimands, selecting between the Kaplan-Meier and Aalen-Johansen estimators via outcome.type = "survival" or "competing-risk". In contrast, polyreg() supports additional time-constant and binomial estimands ("proportional-survival", "proportional-competing-risk", "binomial") and enforces time-point requirements accordingly—for example time.point is mandatory for the effects at a fixed time, while time-constant-effect (e.g. proportional subdistribution hazards) models can either use all observed event times or a user-specified grid supplied via a (vector) time.point.

Event coding conventions incifmodeling and CDISC ADaM ADTTE

When working with survival data, one common source of confusion arises from inconsistent event coding conventions between the standard input of Surv(), provided by the survival package, and the CDISC ADaM ADTTE (Analysis Data Model for Time-to-Event Endpoints) used in regulatory clinical trials. With Surv(), event is often coded as 1 and censoring is coded as 0. By contrast, the CDISC ADaM ADTTE (commonly used in pharmaceutical submissions and SDTM-based analyses) follows the opposite convention. CNSR = 0 means that event occurred and CNSR = 1 represents censoring, which aligns with the broader ADaM convention (where 1 = TRUE).

To accommodate differences in coding conventions, one can explicitly specify event and censoring codes using the code.event1 and code.censoring arguments in cifcurve(), cifplot(), cifpanel() and polyreg(). This allows users to perform accurate survival analysis using functions from the cifmodeling package while maintaining the CDISC ADaM ADTTE coding scheme.

polyreg(Event(AVAL, CNSR) ~ ARM, data = adtte, 
        outcome.type = "survival", code.event1=0, code.censoring=1)

Key arguments of msummary() that are helpful when reporting polyreg() results

  • output controls the destination of the table (e.g., "markdown", "latex", "html", or a file path such as 'results/cif_table.docx').
  • coef_map or coef_rename renames model terms for interpretability.
  • statistic specifies which uncertainty summaries to display. Examples: statistic = "p.value" for p-values, statistic = "conf.int" for Wald CIs, statistic = "t" for t statistics, and statistic = "z" for z statistics, instead of SEs (default).

You can also supply multiple model summaries as a named list to compare estimates across specifications in a single table. See vignette('modelsummary') for a comprehensive overview of additional layout and styling options.

Processing pipeline of cifcurve()

A user-specified model formula in Event() or Surv() is first parsed by util_read_surv() to extract the time-to-event, the event indicator, optional weights, and any stratification variables. Those components are then passed to the back-end estimators implemented in src/calculateAJ_Rcpp.cpp. The C++ routine supports both weighted and stratified data, so the heavy lifting for the Kaplan-Meier or Aalen-Johansen estimates (and associated at-risk, event, censor counts and influence functions) happens efficiently regardless of how the input was specified. The error argument determines how SEs are computed.

Processing pipeline of polyreg()

Input pre-processing mirrors the survival workflow by calling util_check_outcome_type(), reg_check_effect.measure(), and reg_check_input() to parse the format of inputs, event codes, exposure levels, and requested estimands before createAnalysisDataset() builds the modeling frame and reg_normalize_covariate() rescales covariates by default. Starting values are obtained from the nuisance model fits via getInitialValues() or getInitialValuesProportional(), or, when necessary, from user-supplied data frame of starting values. Inverse-probability-of-censoring weights for survival and competing-risks outcomes are calculated by calculateIPCW(), which in turn wraps the calculateKM() C++ routine to obtain the Kaplan-Meier estimates of censoring, while the proportional models assemble matrices through calculateIPCWMatrix().

The weighted estimating equations are then solved by solveEstimatingEquation(), which dispatches to specific score functions such as estimating_equation_ipcw(), estimating_equation_survival(), and estimating_equation_proportional() and iterates with nleqslv/Levenberg-Marquardt updates until the convergence criteria in assessConvergence() are met. Sandwich variances come from calculateCov() or calculateCovSurvival() and are rescaled back to the original covariate scale by reg_normalize_estimate(), while optional bootstrap intervals reuse the same solver inside the boot resampling loop. The resulting influence functions, weights, and fitted CIFs are retained in diagnostic.statistics for downstream diagnostics.

Reproducibility and conventions for polyreg()

  • If convergence warnings appear, relax/tighten tolerances or cap the parameter bound (optim.parameter1 to optim.parameter3) and inspect report.optim.convergence = TRUE.
  • If necessary, try modifying other optim.parameter, providing user-specified initial values, or reducing the number of nuisance parameters (e.g. provide a time.point grid that contributes to estimation when using "proportional-survival" and "proportional-competing-risk").
  • Set boot.seed for reproducible bootstrap results.
  • Match CDISC ADaM conventions via code.event1 = 0, code.censoring = 1 (and, if applicable, code.event2 for competing events).