P-Value Calculator

Compute and interpret p-values for hypothesis tests with AI-powered step-by-step solutions


Example inputs:

  • p-value for z = 2.1, two-tailed
  • p-value for t = 1.8 with 19 degrees of freedom, right-tailed
  • p-value for chi-square = 7.5 with 3 df
  • Is p = 0.03 significant at alpha = 0.05?

What is a P-Value?

A p-value is the probability of observing test results as extreme as, or more extreme than, the actual results, assuming the null hypothesis H₀ is true.

Formally, for a test statistic T with observed value t:

  • Right-tailed: p = P(T ≥ t | H₀)
  • Left-tailed: p = P(T ≤ t | H₀)
  • Two-tailed: p = 2 · P(T ≥ |t| | H₀)

Interpretation: a small p-value means the observed data would be surprising if H₀ were true, so we have evidence against H₀. A large p-value means the data are consistent with H₀, but it does not prove H₀ is true.

Decision rule: compare p to a pre-chosen significance level α (typically 0.05):

  • p < α → reject H₀ ('statistically significant')
  • p ≥ α → fail to reject H₀ (not enough evidence)
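The decision rule is a one-line comparison in code. A minimal sketch; the function name `decide` is illustrative, and `alpha` must be fixed before the data are collected:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p < alpha, otherwise fail to reject."""
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not enough evidence)"
```

Note that p = α itself falls on the 'fail to reject' side of the strict inequality.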

What a p-value is NOT:

  • It is not the probability that H₀ is true.
  • It is not the probability that the alternative H₁ is true.
  • It is not a measure of effect size.
  • It does not distinguish 'practical significance' from 'statistical significance'.

How to Compute and Use P-Values

Step-by-Step

  1. State the hypotheses H₀ and H₁.
  2. Choose a test appropriate for the data (z-test, t-test, chi-square, F-test, ...).
  3. Compute the test statistic from the data.
  4. Determine the tail(s) based on H₁: right-tailed (>), left-tailed (<), or two-tailed (≠).
  5. Find the p-value from the test's distribution.
  6. Compare to α and conclude.
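The six steps can be sketched end-to-end for a two-tailed one-sample z-test with known population σ. This is a minimal illustration, not a production routine; the function name and the sample numbers are made up:

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, computed from the error function in the stdlib."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 vs H1: mu != mu0 (steps 3-6)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # step 3: test statistic
    p = 2.0 * (1.0 - phi(abs(z)))               # steps 4-5: two-tailed p-value
    verdict = "reject H0" if p < alpha else "fail to reject H0"  # step 6
    return z, p, verdict

# Hypothetical data chosen so that z = 2.1, matching the example below:
# sample mean 103, hypothesized mean 100, sigma = 10, n = 49
z, p, verdict = one_sample_z_test(103, 100, 10, 49)
```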

P-Values from a Z-Statistic

For a standard normal Z:

  • Right-tailed: p = 1 − Φ(z)
  • Left-tailed: p = Φ(z)
  • Two-tailed: p = 2(1 − Φ(|z|))

Quick reference: z = 1.96 → two-tailed p ≈ 0.05; z = 2.576 → two-tailed p ≈ 0.01.
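The three z-tail formulas translate directly into code, since Φ can be built from the standard library's error function. A sketch:

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_right(z: float) -> float:
    return 1.0 - phi(z)                 # p = 1 - Phi(z)

def p_left(z: float) -> float:
    return phi(z)                       # p = Phi(z)

def p_two(z: float) -> float:
    return 2.0 * (1.0 - phi(abs(z)))    # p = 2(1 - Phi(|z|))
```

These reproduce the quick-reference values: `p_two(1.96)` is about 0.05 and `p_two(2.576)` about 0.01.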

P-Values from a T-Statistic

Use the t-distribution with n − 1 degrees of freedom (or as specified by the test). Same tail logic as z, but the distribution has slightly heavier tails for small df.
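In practice one would call a library routine such as `scipy.stats.t.sf`, but the right-tail t probability can also be computed with only the standard library via the regularized incomplete beta function. A sketch using the classic continued-fraction evaluation (Numerical-Recipes style); the helper names are illustrative:

```python
import math

def _betacf(a, b, x):
    """Continued fraction for the incomplete beta (modified Lentz method)."""
    MAXIT, EPS, FPMIN = 200, 3e-12, 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    if abs(d) < FPMIN: d = FPMIN
    d = 1.0 / d
    h = d
    for m in range(1, MAXIT + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < FPMIN: d = FPMIN
        c = 1.0 + aa / c
        if abs(c) < FPMIN: c = FPMIN
        d = 1.0 / d
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < FPMIN: d = FPMIN
        c = 1.0 + aa / c
        if abs(c) < FPMIN: c = FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < EPS:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return max(0.0, min(1.0, x))
    ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(ln_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def t_sf(t, df):
    """Right-tail p-value P(T_df >= t) for Student's t."""
    p = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    return p if t >= 0 else 1.0 - p
```

For t = 1.8 with df = 19 this gives roughly 0.044, matching the worked example below.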

P-Values from a Chi-Square Statistic

Chi-square tests are inherently right-tailed because χ² ≥ 0 and larger values indicate worse fit to H₀:

p = P(χ²_df ≥ observed)
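For integer df, this right-tail probability can be built up with only the standard library, using the standard recurrence Q(ν+2, x) = Q(ν, x) + (x/2)^(ν/2) · e^(−x/2) / Γ(ν/2 + 1), starting from Q(1, x) = erfc(√(x/2)) or Q(2, x) = e^(−x/2). A sketch; in practice one would reach for `scipy.stats.chi2.sf`:

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Right-tail P(chi2_df >= x) for integer df, via the recurrence
    Q(df) = Q(df - 2) + (x/2)^((df-2)/2) * exp(-x/2) / Gamma(df/2)."""
    if df % 2 == 0:
        q, nu = math.exp(-x / 2.0), 2              # Q(2, x) = exp(-x/2)
    else:
        q, nu = math.erfc(math.sqrt(x / 2.0)), 1   # Q(1, x) = erfc(sqrt(x/2))
    while nu < df:
        nu += 2
        q += (x / 2.0) ** ((nu - 2) / 2.0) * math.exp(-x / 2.0) / math.gamma(nu / 2.0)
    return q
```

`chi2_sf(7.5, 3)` gives about 0.058, matching the worked example below, and `chi2_sf(3.841, 1)` recovers the familiar 0.05.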

One-Tailed vs Two-Tailed: Which to Use?

  • Two-tailed: when you care about deviation from H₀ in either direction. Default in most academic settings.
  • One-tailed: when the alternative hypothesis is directional and pre-specified (H₁: μ > 0, not μ ≠ 0). Halves the two-tailed p-value when the observed direction matches.

Never pick the tail after seeing the data — that's p-hacking.

Common Significance Thresholds

  α       Common label
  0.10    suggestive
  0.05    standard
  0.01    strong
  0.001   very strong

The American Statistical Association has warned against treating α = 0.05 as a bright line — context and effect size matter more than crossing a threshold.

Common Mistakes to Avoid

  • 'The p-value is the probability H₀ is true': WRONG. The p-value is computed assuming H₀ is true; it does not measure how likely H₀ is.
  • Treating p = 0.049 and p = 0.051 as fundamentally different: they aren't. The 0.05 threshold is a convention, not a phase transition.
  • Picking the tail after seeing the data: if you see z = −2 and switch to a left-tailed test, you've doubled your false-positive rate. Pre-specify.
  • Confusing significance with effect size: a tiny effect with a huge sample can be 'highly significant' yet practically irrelevant. Always report effect sizes alongside p-values.
  • Multiple-comparisons inflation: if you run 20 tests at α = 0.05, about one false positive is expected by chance. Use Bonferroni or FDR corrections.
  • 'p > 0.05 proves H₀': NO. Failing to reject is not the same as accepting. It just means the data don't provide enough evidence against H₀ at this sample size.

Examples

Example 1: two-tailed p-value for z = 2.1

Step 1: Look up Φ(2.1) ≈ 0.9821
Step 2: Right-tail probability: 1 − 0.9821 = 0.0179
Step 3: Two-tailed p-value: 2 × 0.0179 = 0.0358
Answer: p ≈ 0.0358 (significant at α = 0.05)

Example 2: right-tailed p-value for t = 1.8 with 19 degrees of freedom

Step 1: Use the t-distribution with df = 19
Step 2: From t-tables: P(T₁₉ ≥ 1.8) ≈ 0.0438
Step 3: Compare to common thresholds: significant at α = 0.05, not at α = 0.01
Answer: p ≈ 0.044 (significant at α = 0.05)

Example 3: p-value for chi-square = 7.5 with 3 df

Step 1: Chi-square tests are right-tailed
Step 2: Find P(χ²₃ ≥ 7.5) from a chi-square table
Step 3: Critical values for df = 3: 6.25 at α = 0.10 and 7.81 at α = 0.05
Step 4: 7.5 lies between them, so 0.05 < p < 0.10
Step 5: More precisely, p ≈ 0.058
Answer: p ≈ 0.058 (not significant at α = 0.05, suggestive at α = 0.10)

Frequently Asked Questions

What does p < 0.05 mean?

It means the observed data (or more extreme data) would occur in less than 5% of repeated samples if the null hypothesis were true. By convention, this is treated as 'statistically significant' — but it doesn't mean the null hypothesis is necessarily false, and it doesn't measure the size of the effect.

Why isn't the p-value the probability that H₀ is true?

The p-value is calculated *assuming* H₀ is true — it is conditional on H₀. Computing P(H₀ true | data) requires Bayesian methods with a prior probability for H₀, which the frequentist p-value does not use.

When is a one-tailed test appropriate?

Only when the research question is genuinely directional and pre-specified before seeing the data — e.g., a new drug must perform *better* than placebo to be useful, with worse performance equivalent to no effect. Choosing the tail post-hoc is p-hacking.

What is p-hacking?

P-hacking is the practice of running many analyses (different subsets, transformations, exclusions) and reporting only the significant ones, or switching test directions after seeing the data. It inflates false-positive rates and is a major contributor to the replication crisis.
