P-Value Calculator
Compute and interpret p-values for hypothesis tests with AI-powered step-by-step solutions
What is a P-Value?
A p-value is the probability of observing test results as extreme as, or more extreme than, the actual results — assuming the null hypothesis is true.
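Formally, for a test statistic $T$ with observed value $t_{\text{obs}}$:
- Right-tailed: $p = P(T \ge t_{\text{obs}} \mid H_0)$
- Left-tailed: $p = P(T \le t_{\text{obs}} \mid H_0)$
- Two-tailed: $p = P(|T| \ge |t_{\text{obs}}| \mid H_0)$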
Interpretation: a small p-value means the observed data would be surprising if $H_0$ were true, so we have evidence against $H_0$. A large p-value means the data are consistent with $H_0$, but it does not prove $H_0$ is true.
Decision rule: compare $p$ to a pre-chosen significance level $\alpha$ (typically 0.05):
- $p \le \alpha$ → reject $H_0$ ('statistically significant')
- $p > \alpha$ → fail to reject $H_0$ (not enough evidence)
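The definition and decision rule above can be made concrete with an exact calculation. The following is a minimal sketch, assuming a hypothetical coin-flip experiment (60 heads in 100 flips, with $H_0$: the coin is fair); the function name `binomial_right_tail_p` and the data are illustrative, not part of the original page.

```python
from math import comb


def binomial_right_tail_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Right-tailed p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 60 heads in 100 flips. H0: fair coin, H1: biased toward heads.
p_value = binomial_right_tail_p(60, 100)

# Decision rule: compare p to a significance level chosen in advance.
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"p = {p_value:.4f}, decision: {decision}")
```

The sum runs over every outcome at least as extreme as the one observed, which is exactly the "as extreme as, or more extreme than" clause in the definition. Here the p-value comes out near 0.028, so the sketch rejects $H_0$ at $\alpha = 0.05$.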
What p-value is NOT:
- It is not the probability that $H_0$ is true.
- It is not the probability that the alternative is true.
- It is not a measure of effect size.
- It does not distinguish 'practical significance' from 'statistical significance'.
How to Compute and Use P-Values
Step-by-Step
- State the hypotheses $H_0$ and $H_1$.
- Choose a test appropriate for the data (z-test, t-test, chi-square, F-test, ...).
- Compute the test statistic from the data.
- Determine the tail(s) based on $H_1$: right-tailed ($>$), left-tailed ($<$), or two-tailed ($\ne$).
- Find the p-value from the test's distribution.
- Compare $p$ to $\alpha$ and conclude.
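The steps above can be sketched end to end for a one-sample z-test. All numbers here (null mean 100, sample mean 103, known standard deviation 15, sample size 100) are hypothetical values chosen for illustration, and a z-test is assumed to be appropriate because the population standard deviation is treated as known.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses (hypothetical). H0: mu = 100, H1: mu > 100.
mu_0 = 100.0

# Hypothetical summary data: sample mean, known population sd, sample size.
sample_mean, sigma, n = 103.0, 15.0, 100

# Steps 2-3: choose the one-sample z-test and compute its statistic.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Step 4: H1 uses ">", so the test is right-tailed.
# Step 5: p-value from the standard normal distribution.
p_value = 1.0 - NormalDist().cdf(z)

# Step 6: compare to alpha and conclude.
alpha = 0.05
conclusion = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}, conclusion: {conclusion}")
```

With these inputs the statistic is $z = 2$ and the right-tailed p-value is about 0.023, so the sketch rejects $H_0$ at the 0.05 level.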
P-Values from a Z-Statistic
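For a standard normal $Z$ with observed value $z$, where $\Phi$ is the standard normal CDF:
- Right-tailed: $p = P(Z \ge z) = 1 - \Phi(z)$
- Left-tailed: $p = P(Z \le z) = \Phi(z)$
- Two-tailed: $p = P(|Z| \ge |z|) = 2\,(1 - \Phi(|z|))$
Quick reference: $|z| = 1.96$ → two-tailed $p \approx 0.05$. $|z| = 2.576$ → two-tailed $p \approx 0.01$.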
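The three tail formulas map directly onto the standard normal CDF. Below is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `z_p_value` and its `tail` argument are assumptions made for this example.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str = "two") -> float:
    """P-value for an observed z-statistic under a standard normal null."""
    phi = NormalDist().cdf  # standard normal CDF
    if tail == "right":
        return 1.0 - phi(z)
    if tail == "left":
        return phi(z)
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Check the quick-reference values.
for z in (1.96, 2.576):
    print(f"|z| = {z}: two-tailed p = {z_p_value(z):.4f}")
```

Both quick-reference values round to 0.05 and 0.01 respectively, and for any $z$ the right-tailed and left-tailed p-values sum to 1.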
P-Values from a T-Statistic
Use the t-distribution with $n - 1$ degrees of freedom (or as specified by the test). The tail logic is the same as for z, but the distribution has slightly heavier tails for small df, so the same statistic yields a larger p-value.
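To see the heavier tails numerically, here is a dependency-free sketch that integrates the t density with Simpson's rule. This is an illustrative approach only: in practice a statistics library would normally supply the t CDF, and the function names `t_pdf` and `t_two_tailed_p` are invented for this example.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value 2 * P(T >= |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # By symmetry, P(T >= |t|) = 0.5 - area between 0 and |t|.
    return 2.0 * (0.5 - area_0_to_t)


# Same statistic, different reference distributions.
p_t = t_two_tailed_p(2.0, df=5)
p_z = 2.0 * (1.0 - NormalDist().cdf(2.0))
print(f"t = 2.0, df = 5: p = {p_t:.4f}; z = 2.0: p = {p_z:.4f}")
```

For a statistic of 2.0, the t-distribution with 5 degrees of freedom gives a noticeably larger two-tailed p-value than the standard normal, which is the practical consequence of the heavier tails.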
P-Values from a Chi-Square Statistic
Chi-square tests are inherently right-tailed because $\chi^2 \ge 0$ and larger values indicate worse fit to $H_0$: $p = P(\chi^2_{\text{df}} \ge \chi^2_{\text{obs}})$
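The right-tail probability can be sketched without external libraries by using the power series for the regularized lower incomplete gamma function. This is one possible numerical approach, not necessarily what any particular calculator uses; the name `chi2_right_tail_p` is hypothetical, and the series loses precision for very large statistics where the p-value is extremely small.

```python
from math import exp, lgamma, log


def chi2_right_tail_p(x: float, df: float) -> float:
    """Right-tailed p-value P(chi2_df >= x) using a series for P(a, y)."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    # Series: gamma(a, y) = y^a e^-y * sum_{n>=0} y^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= y / (a + n)
        total += term
        n += 1
    lower = total * exp(-y + a * log(y) - lgamma(a))  # regularized P(a, y)
    return max(0.0, 1.0 - lower)


# Common 5% critical values for df = 1 and df = 2.
for stat, df in ((3.841, 1), (5.991, 2)):
    print(f"chi2 = {stat}, df = {df}: p = {chi2_right_tail_p(stat, df):.4f}")
```

Both printed p-values land close to 0.05, matching the usual chi-square critical-value tables.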
One-Tailed vs Two-Tailed: Which to Use?
- Two-tailed: when you care about deviation from $H_0$ in either direction. Default in most academic settings.
- One-tailed: when the alternative hypothesis is directional and pre-specified ($H_1: \mu > \mu_0$, not $H_1: \mu \ne \mu_0$). Halves the p-value if the direction matches.
Never pick the tail after seeing the data — that's p-hacking.
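A quick simulation illustrates why choosing the tail after the fact is a problem. The sketch below assumes $H_0$ is true (the z-statistic is drawn from a standard normal) and compares a right-tailed test fixed in advance with a one-tailed test whose direction follows the sign of the observed statistic. The function name, seed, and trial count are arbitrary choices for this example.

```python
import random
from statistics import NormalDist


def false_positive_rate(post_hoc_tail: bool, alpha: float = 0.05,
                        trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated null experiments that reject H0 at level alpha."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)  # H0 is true: z ~ N(0, 1)
        if post_hoc_tail:
            # One-tailed test in whichever direction the data happen to point.
            p = 1.0 - phi(abs(z))
        else:
            # Right-tailed test chosen before seeing the data.
            p = 1.0 - phi(z)
        rejections += p <= alpha
    return rejections / trials


pre_specified = false_positive_rate(post_hoc_tail=False)
post_hoc = false_positive_rate(post_hoc_tail=True)
print(f"pre-specified tail: {pre_specified:.3f}, post-hoc tail: {post_hoc:.3f}")
```

The pre-specified test rejects in roughly 5% of null experiments, while the post-hoc version rejects in roughly 10%, twice the nominal rate.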
Common Significance Thresholds
| $\alpha$ | Common label |
|---|---|
| 0.10 | suggestive |
| 0.05 | standard |
| 0.01 | strong |
| 0.001 | very strong |
The American Statistical Association has warned against treating $p < 0.05$ as a bright line: context and effect size matter more than crossing a threshold.
Common Mistakes to Avoid
- 'P-value is the probability $H_0$ is true': WRONG. The p-value is computed assuming $H_0$ is true; it does not measure how likely $H_0$ is.
- Treating $p = 0.049$ and $p = 0.051$ as fundamentally different: they aren't. The 0.05 threshold is a convention, not a phase transition.
- Picking the tail after seeing the data: if you see $z < 0$ and switch to a left-tailed test, you've doubled your false-positive rate. Pre-specify.
- Confusing significance with effect size: a tiny effect with a huge sample can be 'highly significant' yet practically irrelevant. Always report effect sizes alongside p-values.
- Multiple comparisons inflation: when running 20 tests at $\alpha = 0.05$, about one false positive is expected by chance even if every null hypothesis is true. Use Bonferroni or FDR corrections.
- '$p > 0.05$ proves $H_0$': NO. Failing to reject is not the same as accepting. It just means the data don't provide enough evidence against $H_0$ at this sample size.
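The multiple-comparisons point above can be sketched in a few lines. The example computes the expected number of false positives and the family-wise error rate for 20 tests (assuming the tests are independent and every null is true), then applies Bonferroni and Benjamini-Hochberg (a common FDR procedure) to a made-up list of p-values. Function names and the p-values are hypothetical.

```python
def familywise_error_rate(m: int, alpha: float) -> float:
    """P(at least one false positive) across m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank k with p_(k) <= (k / m) * alpha
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


expected_false_positives = 20 * 0.05
fwer = familywise_error_rate(20, 0.05)
print(f"expected false positives: {expected_false_positives:.1f}, FWER: {fwer:.3f}")

# Hypothetical p-values from five tests.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20]
print("Bonferroni:", bonferroni_reject(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg_reject(p_values))
```

On this made-up list, Bonferroni rejects the two smallest p-values and Benjamini-Hochberg rejects the three smallest, reflecting that FDR control is less conservative than family-wise control.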
Frequently Asked Questions
What does $p < 0.05$ mean?
It means the observed data (or more extreme data) would occur in less than 5% of repeated samples if the null hypothesis were true. By convention, this is treated as 'statistically significant', but it doesn't mean the null hypothesis is necessarily false, and it doesn't measure the size of the effect.
Why isn't the p-value the probability that $H_0$ is true?
The p-value is calculated *assuming* $H_0$ is true, so it is conditional on $H_0$. Computing $P(H_0 \text{ true} \mid \text{data})$ requires Bayesian methods with a prior probability for $H_0$, which the frequentist p-value does not use.
When should I use a one-tailed test?
Only when the research question is genuinely directional and pre-specified before seeing the data, e.g., a new drug must perform *better* than placebo to be useful, with worse performance equivalent to no effect. Choosing the tail post-hoc is p-hacking.
What is p-hacking?
P-hacking is the practice of running many analyses (different subsets, transformations, exclusions) and reporting only the significant ones, or switching test directions after seeing the data. It inflates false-positive rates and is a major contributor to the replication crisis.