"Is the difference statistically significant?" "Is the p-value below 0.05?" These come up constantly in research reviews — and most people don't understand what statistical significance actually means.
This article covers the minimum stats knowledge you need to read survey results in practice, without being a statistician.
Clearing up misconceptions
"Statistically significant" means:
The observed difference is larger than random variation alone would be expected to produce.
Nothing more, nothing less. Common misconceptions:
| Misconception | Correct reading |
|---|---|
| Significant = important | Significant only means the difference is unlikely to be chance alone. Importance is a separate question |
| Not significant = no effect | Small samples often fail to reach significance even for real effects |
| p=0.04 → p=0.06 flips the conclusion | 0.05 is a convention threshold — nothing substantively flipped |
| Significant = causal | Correlation ≠ causation |
What a p-value is
p-value:
Assuming "no difference" is true, the probability that a difference at least as large as the one observed could occur by chance.
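p = 0.03 → if there were truly no difference, a gap at least this large would appear in about 3% of samples
p = 0.30 → a gap this large would appear in about 30% of samples even with no real difference, so it's hard to call significant
By convention, p < 0.05 (under 5%) is "statistically significant." This is a social convention, not a fundamental threshold. Note that a p-value is not the probability that the result is a fluke, nor the probability that "no difference" is true; it only describes how surprising the data would be if there were no difference.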
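The definition is easier to see with a concrete calculation. The sketch below uses a hypothetical preference survey (illustrative numbers, not real data): 100 respondents choose between two designs and 60 pick B. Under the "no difference" assumption every response is a 50/50 coin flip, so the p-value is the share of all possible outcomes that are at least as lopsided as 60/40. The function name `exact_p_value` exists only for this example.

```python
from math import comb

def exact_p_value(successes: int, n: int) -> float:
    """Two-sided exact p-value for H0: the true proportion is 0.5.

    Adds up the probability of every outcome at least as far from n/2
    as the observed one, assuming each response is a fair coin flip.
    """
    observed_gap = abs(successes - n / 2)
    favourable = 0
    for k in range(n + 1):
        if abs(k - n / 2) >= observed_gap:
            favourable += comb(n, k)  # number of ways to get k "B" answers
    return favourable / 2 ** n

# Hypothetical data: 60 of 100 respondents prefer design B.
p = exact_p_value(60, 100)
print(f"p = {p:.3f}")
```

With these numbers the p-value lands slightly above 0.05, so a 60/40 split among 100 people narrowly misses the conventional threshold even though it looks decisive at a glance.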
What a confidence interval is
Confidence interval (CI):
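A range computed so that, across repeated surveys, about 95% of such ranges would contain the true value (95% is the common choice).
NPS = 32 (95% CI: 28–36)
→ Plausible values for the true NPS run from 28 to 36; the method that produced this range captures the true value about 95% of the time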
Wide CI = small sample or high variance. Narrow CI = large sample or low variance.
In practice, arguing from confidence intervals is often easier to follow than arguing from p-values.
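For readers who want to see where such an interval comes from, here is a minimal sketch for NPS. It assumes a simple random sample and a normal approximation, scores each response as +1 (promoter), 0 (passive) or -1 (detractor), and uses hypothetical counts chosen to reproduce an NPS of 32. The function name `nps_with_ci` is illustrative, not part of any library.

```python
from math import sqrt
from statistics import NormalDist

def nps_with_ci(promoters: int, passives: int, detractors: int,
                confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (NPS, lower, upper) on the usual -100..100 scale.

    Normal approximation; works best with a few hundred responses or more.
    """
    n = promoters + passives + detractors
    p_pro = promoters / n
    p_det = detractors / n
    nps = p_pro - p_det
    variance = p_pro + p_det - nps ** 2  # variance of one +1/0/-1 score
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sqrt(variance / n)
    return 100 * nps, 100 * (nps - margin), 100 * (nps + margin)

# Hypothetical counts: 1,500 responses in total.
nps, low, high = nps_with_ci(promoters=780, passives=420, detractors=300)
print(f"NPS = {nps:.0f} (95% CI: {low:.0f} to {high:.0f})")
```

Because the margin shrinks with the square root of n, halving the width of the interval takes roughly four times as many responses.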
The hard part — pitfalls of statistical significance
Pitfall 1: Significant ≠ important
Situation: n=10,000 survey, satisfaction differs by 0.05 points
Verdict: Statistically significant (p < 0.001)
Decision: Does 0.05 points matter? → Basically not
Large samples make trivial differences significant. "Significant therefore important" is wrong.
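A short sketch makes the sample-size effect visible. It runs a two-sided z-test on the same 0.05-point gap at three sample sizes, assuming equal group sizes and a standard deviation of 1.0 (both are assumptions for illustration; the situation above does not state them).

```python
from math import sqrt
from statistics import NormalDist

def mean_diff_p_value(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided z-test p-value for a difference between two group means.

    Assumes equal group sizes, a shared standard deviation, and samples
    large enough for the normal approximation.
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = abs(diff) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

# Same 0.05-point gap, assumed SD of 1.0, three hypothetical sample sizes.
for n in (100, 1_000, 10_000):
    print(f"n per group = {n:>6}: p = {mean_diff_p_value(0.05, 1.0, n):.4f}")
```

The gap never changes, yet the p-value drops from roughly 0.7 at n=100 to below 0.001 at n=10,000. The p-value is tracking the amount of data, not the size of the effect.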
Decision matrix:
- Significant AND practically large → genuinely important
- Significant but practically small → not important
- Not significant but practically large → increase sample, retest
- Not significant and practically small → ignore
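The matrix can be written down as a small helper, which is handy when many results need the same triage. This is a sketch: `smallest_effect_that_matters` is a hypothetical parameter standing in for a business judgment that statistics cannot supply.

```python
def read_result(p_value: float, effect: float,
                smallest_effect_that_matters: float,
                alpha: float = 0.05) -> str:
    """Map a test result onto the four cells of the decision matrix."""
    significant = p_value < alpha
    practically_large = abs(effect) >= smallest_effect_that_matters
    if significant and practically_large:
        return "genuinely important"
    if significant:
        return "not important"
    if practically_large:
        return "increase sample, retest"
    return "ignore"

# The n=10,000 situation: a 0.05-point gap with p < 0.001, judged against
# a hypothetical minimum meaningful gap of 0.3 points.
print(read_result(p_value=0.0005, effect=0.05, smallest_effect_that_matters=0.3))
```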
Pitfall 2: Not significant ≠ no effect
Situation: n=30, new feature satisfaction +1 point
Verdict: Not significant (p = 0.18)
Decision (mistaken): concluded "no effect"
Small samples often fail to reach significance even when there's a real effect. "Not significant = no difference" is wrong.
The correct reading: "with the current sample size we can't tell; increase n and re-measure."
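How much to increase n depends on the smallest difference worth detecting and on how noisy the scores are. The sketch below uses the standard normal-approximation formula for comparing two means. The standard deviation of 3 and the 80% power target are assumptions for illustration, and `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(min_diff: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate respondents needed per group to detect `min_diff`.

    Two-sided test, equal group sizes, normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_power) * sd / min_diff) ** 2)

# Hypothetical: detect a 1-point lift when scores have an SD of about 3.
print(n_per_group(min_diff=1.0, sd=3.0))
```

Under these assumptions the requirement is well over a hundred respondents per group, which is why a 30-person sample can easily miss a real 1-point lift. Halving the difference you want to detect roughly quadruples the required n.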
Pitfall 3: Multiple comparisons trap
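Test 20 questions at the 5% level and, even if nothing real is going on, you should expect about one "significant" result by chance (20 × 5% = 1). With independent tests, the chance of at least one false positive is about 64% (1 − 0.95^20).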
Counters:
- Bonferroni correction (divide the significance threshold by the number of comparisons, e.g. 0.05 / 20 = 0.0025)
- Distinguish exploratory analysis from confirmatory analysis
- For multi-comparison situations, look at confidence intervals together
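The arithmetic behind this pitfall, and the Bonferroni counter, fit in a few lines. This is a sketch that assumes the tests are independent; the p-values in the example are made up for illustration.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests
    when no real effect exists anywhere."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag p-values that survive a Bonferroni-corrected threshold."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

print(f"{chance_of_false_positive(20):.0%} chance of at least one false positive")

# Hypothetical p-values from four segment comparisons.
p_values = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_significant(p_values))  # threshold is 0.05 / 4 = 0.0125
```

Bonferroni is deliberately conservative: in the example, two results that would pass at 0.05 no longer count, which is the price of keeping the overall false-positive rate near 5%.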
Pitfall 4: Confusing correlation with causation
"NPS and retention are significantly correlated" ≠ "high NPS causes retention":
- A third factor (e.g., product usability) could drive both
- Reverse causation (retention drives satisfaction)
- Selection bias (people who answer NPS surveys skew toward customers already inclined to stay)
Statistical significance makes chance an unlikely explanation; it does not rule out non-causal explanations.
Using p-values and CIs in practice
Pattern 1: A/B test effect validation
Send A: n=2,000, purchase rate 5.2%
Send B: n=2,000, purchase rate 6.8%
Diff: +1.6 percentage points (p ≈ 0.03)
Conclusion: Statistically significant. Adopt B.
Caveat: Whether 1.6 points matters commercially is a separate question.
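The p-value for a comparison like this depends on the sample size as much as on the rates, so it is worth recomputing from raw counts rather than eyeballing. Below is a minimal sketch of a pooled two-proportion z-test, one common choice for A/B purchase rates. The counts are illustrative: 104 and 136 purchases out of 2,000 recipients each correspond to 5.2% and 6.8%.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates.

    Pooled z-test; assumes independent groups and enough conversions in
    each group for the normal approximation to hold.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Illustrative counts: 5.2% vs 6.8% with 2,000 recipients per arm.
p_ab = two_proportion_p_value(conv_a=104, n_a=2_000, conv_b=136, n_b=2_000)
print(f"p = {p_ab:.3f}")
```

Running the same rates through the function with a few hundred recipients per arm gives a much larger p-value, which is a quick way to check whether a planned test is big enough to detect the lift you care about.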
Pattern 2: Satisfaction over time
Last quarter: NPS 28 (CI 24–32)
This quarter: NPS 32 (CI 28–36)
Diff: +4 points
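Verdict: CIs overlap heavily, and the 95% CI for the change itself (about -1.7 to +9.7, assuming independent samples) includes 0; can't claim improvement.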
Continue tracking.
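Overlap between two intervals is only a rough check; the sharper question is whether the interval for the change itself includes zero. The sketch below backs out each quarter's standard error from its reported 95% CI and combines them. It assumes the two quarters are independent samples and that the reported intervals are symmetric normal-approximation intervals; `diff_ci_from_cis` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def diff_ci_from_cis(est_a: float, ci_a: tuple[float, float],
                     est_b: float, ci_b: tuple[float, float],
                     confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (difference, lower, upper) for est_b - est_a."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se_a = (ci_a[1] - ci_a[0]) / (2 * z)  # half-width divided by z
    se_b = (ci_b[1] - ci_b[0]) / (2 * z)
    se_diff = sqrt(se_a ** 2 + se_b ** 2)  # independent samples assumed
    diff = est_b - est_a
    return diff, diff - z * se_diff, diff + z * se_diff

diff, low, high = diff_ci_from_cis(28, (24, 36 - 4), 32, (28, 36))
print(f"Change: {diff:+.0f} points (95% CI: {low:.1f} to {high:.1f})")
```

Here the interval for the change runs from slightly below zero to nearly ten points, so the data are compatible with anything from no change to a sizable improvement.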
Pattern 3: Cross-segment comparison
Segment A: satisfaction 4.2 (n=80)
Segment B: satisfaction 3.8 (n=80)
Diff: +0.4 (p = 0.07)
Verdict: Borderline. Not significant at 5%, significant at 10%.
Action: Combine with other indicators for a holistic call.
Balancing statistical strictness with practical judgment
In practice:
- Perfect statistical strictness is impossible — full random sampling and full independence are rare in real work
- "Approximately right" is the working bar — waiting for 100% certainty paralyzes you
- Consider the cost of being wrong each way — false negative cost vs. false positive cost
Statistics is decision support, not a decision substitute.
Prefer "effect size" over p-values
Modern statistics has moved past p-value worship. The push is toward reporting effect size alongside p-values:
- Cohen's d — mean difference standardized by SD
- Pearson's r — correlation strength
- Odds ratio, risk ratio — for categorical data
The modern best practice is to report effect size + CI + p-value, not the p-value alone.
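As an example of the first measure, here is a minimal sketch of Cohen's d using the pooled standard deviation. The two score lists are made-up 5-point satisfaction ratings, used only to show the calculation.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Mean difference (b minus a) divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_variance)

# Hypothetical satisfaction scores for two segments.
segment_a = [3, 4, 4, 5, 4, 3, 5, 4]
segment_b = [4, 5, 4, 5, 5, 4, 5, 4]
d = cohens_d(segment_a, segment_b)
print(f"Cohen's d = {d:.2f}")
```

Cohen's rough benchmarks are 0.2 for a small effect, 0.5 for medium and 0.8 for large, but they are conventions. What counts as large in your product is still a practical judgment.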
Practical decision rules
Even without formal stats training, you can mostly judge correctly with these:
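Rule 1: Look at CI before looking at the difference
"NPS 28 → 32" is less informative than "NPS 28 (CI 24–32) → 32 (CI 28–36)." If the CIs overlap heavily, don't claim a difference. Overlap is only a rough check, though: two CIs can overlap slightly while the difference is still significant, so when the call is close, look at the CI of the difference itself.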
Rule 2: n<30 → treat as qualitative trend
Numbers from samples under 30 carry little statistical weight. Frame them as "directional" or "worth probing in interviews."
Rule 3: Multiple metrics moving together
"NPS up, retention up, open-text tone trending positive": when multiple indicators move in the same direction, the result is more credible even if no single metric reaches formal significance. The support is strongest when the indicators come from independent data sources.
Rule 4: Repeated confirmation over time
A single survey calling something "significant" is weaker than the same trend appearing across multiple consecutive rounds.
Where Repoan fits
Repoan provides statistics-aware analysis without requiring statistics knowledge:
- Auto-displayed CIs — 95% CI alongside NPS / CSAT numbers
- Time-series significance check — auto-flag whether a change is within error margin
- Cross-segment difference testing — display segment comparisons with p-values
- AI interpretation — auto-generated commentary on practical importance of differences
- Multiple comparison correction — auto-applied for segment-level analysis
Summary
Using statistical significance correctly:
- Significant ≠ important; not significant ≠ no effect
- Look at CIs and effect size, not just p-values
- Big samples make trivial differences significant
- Correlation ≠ causation
- "Multiple indicators × repeated rounds" beats "one-shot significant"
- Statistics supports decisions; humans make decisions
Statistical significance is important but not sufficient. The orgs that truly use data well aren't the ones with the stats experts — they're the ones with a bridge between practical and statistical judgment.