Study Design I: A Story of Coffee Chat and Research Hypothesis
Keywords: hypothesis/outcome/population, observational study, study design, language/writing
A daughter beginning her first study and her father, a statistician
Me: “Dad…you really are a professor of statistics, right?”
Dad: “In Japan, yes. One of many.”
Me: “My boss keeps telling me it’s time I do ‘some research’ and present at a conference. He’s really into supporting cancer patients in returning to work since the Basic Plan to Promote Cancer Control Programs was revised in 2016. Long story short, I think he wants me to run some kind of survey. I’m not against it, but this is clearly your domain, isn’t it? Statistics and all that.”
Dad: “Half my domain, half yours. You see the patients, I only see the data. Do you have anything clinical you want to know? Any hypothesis?”
Me: “Nope. Not really.”
Dad: “Without at least a working hypothesis, it becomes hard to design a good study.”
Me: “Seriously? I thought I could just collect the data first and improvise the rest. Guess not. Hmm…if I had to say, I’d want to know what kind of patients have difficulty returning to work.”
Dad: “For example, that men might have higher return-to-work rates than women?”
Me: “I’m interested in sex differences too, but I’m more curious about cancer stage or complications. I work in gastrointestinal surgery, and I wonder whether patients with stomas can return to the same job they had before surgery. Yeah…I’ll start drafting a questionnaire. Thanks!”
Dad: “Hold on a moment. Before you get too excited—have you decided on your study population yet?”
Me: “Hm? Patients who had cancer surgery at our hospital.”
Dad: “Do gastric cancer patients ever receive a stoma…?”
Me: “It’s extremely rare in gastric cancer, practically negligible.”
Dad: “Right. If your main interest is stomas, it might make sense to focus on cancers where stomas are common, like rectal cancer.”
Me: “…yeah, okay. I guess that’s true when you put it that way.”
Dad: “It’s important to think about who should be surveyed at the design stage. In general, narrowing the target population allows you to ask more detailed questions. And it also improves comparability between groups. If you want to compare patients with and without stomas, you’d want the cancer type to be consistent.”
Me: “There you go again with the technical terms. But fine—I do get the point. If patients’ backgrounds vary too much, it becomes hard to compare them. Keeping them similar makes things easier.”
Dad: “There are statistical methods, and R functions that implement them, to improve comparability too, like glm() for regression adjustment or CBPS() for propensity score weighting. But on the other hand, a broader study population increases generalizability. If you want to estimate return-to-work rates among cancer survivors, you don’t necessarily need to limit the cancer type. Ideally, you’d collect data from multiple institutions.”
Me: “True…it feels strange to call it ‘a survey of cancer survivors’ if it’s only our hospital. Generalizability means gathering data that is useful for other hospitals too, right?”
Dad: “In class, I usually tell my students something like this:”
When your clinical question is still vague,
try to express it using PICO or PECO before you design the study.
Dad: “It sounds fancy, but we simply call that ‘structuring’ a study.”
Me: “Structuring…? That sounds like something from a meeting. Okay, I’ve heard the word before.”
Dad: “Then let’s go through it lightly. In PICO/PECO, P stands for Patients or Population. Deciding who your patients are is a key element in study design.”
- Which patients or population are being studied (P)
- What exposure or intervention is of interest (E/I)
- What they are compared against (C)
- What outcomes will be evaluated (O)
Me: “I’ve never heard of Exposure or Comparison before…But in my case, it would be patients with versus without a stoma, right? And what’s Outcome?”
Dad: “Exactly. And O is the outcome: treatment results, prognosis, or other patient endpoints. In statistical analysis, the outcome variable is the most important part. So it’s crucial to define it before you begin.”
Me: “Can you even define it before the data comes in?”
Dad: “You should, because once the data is collected, you can’t go back and revise it. The outcome needs to be decided before finalizing the questionnaire. Right…you probably don’t have a clear picture of your outcome yet. But the outcome determines which data you’ll collect and which statistical methods you’ll use. By the way, which software are you planning to use once the data is ready?”
Me: “R. My seniors in the department use it.”
Dad: “Are you comfortable with R?”
Me: “I took a class in college, but I’ve forgotten most of it.”
Dad: “Then you’ll need to relearn it. And which R functions you use depends entirely on your outcome.”
In daily clinical practice, many questions naturally arise. A question emerging directly from clinical experience is called a clinical question.
When starting clinical research, what matters is expressing that clinical question as a research hypothesis. A research hypothesis consists of a minimal set of essential elements, often summarized using PICO/PECO:
- P: Patients/Population
- I/E: Intervention/Exposure
- C: Comparison
- O: Outcome
The I (Intervention) or E (Exposure) differs depending on whether your study is an interventional trial or an observational study such as a cohort, case-control, or cross-sectional study. All of these can be structured using PICO/PECO.
Here are three examples of research hypotheses using the cancer survivor survey as a theme.
Example: Research hypothesis 1
Among rectal cancer patients after curative resection in Japan, is there a difference in return-to-work within one year between those with and without a stoma?
- P: Rectal cancer patients after curative resection
- E: Stoma present
- C: Stoma absent
- O: Returned to work within one year after surgery (yes/no)
Here you’d compare the return-to-work proportion between the stoma and non-stoma groups.
What to choose for C (Comparison) is important. If you only know the return-to-work rate in the stoma group, it’s hard to say whether that number is “high” or “low”.
Having a clear comparison group improves interpretability.
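As a rough illustration of what comparing the two groups in research hypothesis 1 could look like in R, here is a minimal sketch using only base R. The data are simulated and entirely hypothetical: the sample size, the variable names (stoma, rtw), and the return-to-work probabilities are assumptions chosen for illustration, not survey results.

```r
# Hypothetical, simulated data: NOT real survey results.
set.seed(1)
n <- 200
stoma <- rep(c("yes", "no"), each = n / 2)

# Assumed return-to-work probabilities, chosen only for illustration
p_rtw <- ifelse(stoma == "yes", 0.55, 0.70)
rtw <- rbinom(n, size = 1, prob = p_rtw)  # 1 = returned within one year

dat <- data.frame(
  stoma = factor(stoma, levels = c("no", "yes")),
  rtw = rtw
)

# 2x2 table: exposure (rows) by outcome (columns)
tab <- table(stoma = dat$stoma, rtw = dat$rtw)
print(tab)

# Return-to-work proportion in each group (row-wise proportions)
prop_by_group <- prop.table(tab, margin = 1)[, "1"]
print(prop_by_group)

# Two-sample comparison of proportions with a 95% confidence interval
# for the difference. prop.test() takes counts of "successes" and group sizes.
successes <- tab[, "1"]
totals <- rowSums(tab)
res <- prop.test(successes, totals)
print(res$estimate)
print(res$conf.int)
```

The point of the sketch is the shape of the analysis: one binary exposure (E versus C), one binary outcome (O), and a comparison of two proportions. Without the non-stoma row of the table, there would be nothing to compare the stoma group against.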
Example: Research hypothesis 2
Among cancer patients after curative surgery, does providing a guidebook on balancing work and cancer treatment increase the proportion who return to work within one year?
- P: Patients with cancer after curative surgery
- E: Received the guidebook
- C: Did not receive the guidebook
- O: Returned to work within one year after surgery (yes/no)
Here P is broader, so the results could potentially apply to a wider range of patients. This is an example of higher generalizability.
Example: Research hypothesis 3 (exploratory)
“Are sex, age, stage, adjuvant chemotherapy, performance status, comorbidities, stoma, preoperative employment status, household income, cancer insurance, and use of employment support services associated with return-to-work within one year?”
In exploratory studies like this, P is clear, but E and C are not just one thing — we’re screening multiple potential risk factors.
So “structuring the question” doesn’t mean you have to force every study into a clean PECO shape. What matters is to identify the minimal set of elements needed to design a coherent study for your question.
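For an exploratory question like research hypothesis 3, one common approach (not necessarily the one a given study would adopt) is a logistic regression with several candidate factors entered together. The sketch below uses base R's glm() on simulated data. Only a subset of the factors listed above is included, and every variable name, category, and coefficient in the data-generating step is an assumption made for illustration.

```r
# Hypothetical, simulated data: NOT real survey results.
set.seed(2)
n <- 300
dat <- data.frame(
  sex   = factor(sample(c("female", "male"), n, replace = TRUE)),
  age   = round(runif(n, 30, 65)),
  stage = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  stoma = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Assumed data-generating model (log-odds scale), for illustration only
lp <- 1.5 - 0.03 * (dat$age - 50) -
  0.5 * (dat$stoma == "yes") -
  0.4 * (dat$stage == "III")
dat$rtw <- rbinom(n, size = 1, prob = plogis(lp))  # 1 = returned within one year

# Logistic regression: binary outcome, several candidate factors at once
fit <- glm(rtw ~ sex + age + stage + stoma, data = dat, family = binomial)

# Odds ratios with Wald 95% confidence intervals
est <- coef(summary(fit))
z <- qnorm(0.975)
or_table <- data.frame(
  OR    = exp(est[, "Estimate"]),
  lower = exp(est[, "Estimate"] - z * est[, "Std. Error"]),
  upper = exp(est[, "Estimate"] + z * est[, "Std. Error"]),
  row.names = rownames(est)
)
print(round(or_table, 2))
```

Each row of the resulting table plays the role of its own E versus C comparison, which is why an exploratory study does not reduce to a single PECO statement.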
Me: “I think I see. Coming up with an original hypothesis still feels intimidating, but breaking it down into pieces like this? I actually don’t mind that part. I suppose it’s something I’ll need to learn eventually.”
Dad: “Exactly. Plus, translating your thinking into precise terminology early on improves communication. Definitions of diseases and outcomes, detailed treatment regimens, and exposure specifics can vary subtly across clinicians and facilities. If the terminology is vague, confusion follows.”
When formulating study design, the Latin expression “ceteris paribus” often appears. Which of the following best describes its meaning?
1. If there is a causal relationship, it will inevitably occur
2. A universally generalizable law
3. All other conditions being equal
4. Observing a sufficiently large amount of data
The correct answer is 3.
The concept of “ceteris paribus” is very close to the idea of comparability between groups.
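To make the link between “all other conditions being equal” and comparability concrete, here is a minimal simulated sketch in base R. It assumes a hypothetical situation in which advanced stage makes both a stoma and a failure to return to work more likely, while the stoma itself has no effect. All names and probabilities are invented for illustration. The crude comparison mixes the two stage groups together; the adjusted model, using glm() for regression adjustment, compares patients with and without a stoma at the same stage.

```r
# Hypothetical, simulated data: NOT real survey results.
set.seed(3)
n <- 5000
advanced <- rbinom(n, 1, 0.4)  # 1 = advanced stage (assumed prevalence)

# Assumption: advanced stage makes a stoma more likely
stoma <- rbinom(n, 1, ifelse(advanced == 1, 0.6, 0.2))

# Assumption: return to work depends on stage only, NOT on stoma
rtw <- rbinom(n, 1, ifelse(advanced == 1, 0.5, 0.8))

dat <- data.frame(advanced, stoma, rtw)

# Crude comparison: stage is ignored
crude <- glm(rtw ~ stoma, data = dat, family = binomial)

# Adjusted comparison: stoma effect with stage held equal
adjusted <- glm(rtw ~ stoma + advanced, data = dat, family = binomial)

print(exp(coef(crude)["stoma"]))     # crude odds ratio for stoma
print(exp(coef(adjusted)["stoma"]))  # stage-adjusted odds ratio for stoma

# Return-to-work proportions within each stage stratum
strata <- with(dat, tapply(rtw, list(advanced = advanced, stoma = stoma), mean))
print(round(strata, 2))
```

Under these assumptions the crude odds ratio should tend to fall below 1, while the adjusted one should sit close to 1, because within each stage the stoma and non-stoma groups were generated with the same return-to-work probability. Restricting P to a single stage at the design stage would aim at the same comparability that the model seeks through adjustment.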
Episodes and R-script
Data Have Types: A Coffee-Chat Guide to R Functions for Common Outcomes
[Outcomes: The Bridge from Data Collection to Analysis]
[A First Step into Survival and Competing Risks Analysis with R]
[When Bias Creeps In: Selection, Information, and Confounding in Clinical Surveys]