Mastering t-Tests, Chi-Square, and Regression for STATS 101G/108
A comprehensive tutorial on key statistical tests covered in STATS 101G and STATS 108: t-tests, F-tests, Chi-square tests, and regression inference. Learn with timely examples from May 2026.
Introduction: Why These Tests Matter in May 2026
As we move through 2026, data-driven decisions are everywhere—from AI model evaluations to sports analytics and financial forecasts. In STATS 101G and STATS 108, you'll encounter four essential tools: the t-test, F-test, Chi-square test, and regression inference. This tutorial breaks down each concept with formulas, interpretations, and real-world examples tied to current trends. Whether you're comparing app engagement rates or analyzing election polls, these methods are your statistical Swiss Army knife.
1. The t-Test: Comparing Means
1.1 Single Mean t-Test
The t-test for a single mean tests whether the population mean μ equals a hypothesized value. The test statistic is:
t0 = (x̄ - μ0) / (s / √n)

where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size. Degrees of freedom: df = n - 1.
Example: Suppose a popular AI chatbot claims an average response time of 1.2 seconds. You sample 30 interactions in May 2026 and find x̄ = 1.5 seconds, s = 0.4 seconds. Compute t0 = (1.5 - 1.2) / (0.4 / √30) ≈ 4.11. With df = 29, the p-value is very small, so you reject the claim—the chatbot is slower than advertised.
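The chatbot example can be checked with a few lines of Python (standard library only); the numbers below are the ones quoted above.

```python
import math

# Values from the chatbot response-time example above
x_bar, mu0, s, n = 1.5, 1.2, 0.4, 30

t0 = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1

print(f"t0 = {t0:.2f}, df = {df}")  # t0 ≈ 4.11, df = 29
```

To get a p-value you would compare t0 to a t distribution with 29 degrees of freedom (e.g., via scipy.stats.t.sf); with t0 above 4, the p-value is far below 0.05.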
1.2 Two-Sample t-Test (Independent Samples)
When comparing means from two independent groups (e.g., test scores for two teaching methods), the test statistic is:
t0 = (x̄1 - x̄2) / SE

where SE is the standard error of the difference. Under the equal-variance assumption, SE = sp · √(1/n1 + 1/n2), where sp² is the pooled variance. For unequal variances, use Welch's approximation.
Example: In the 2026 NBA playoffs, you want to compare average points per game between two teams. Team A (n=10) averages 110.2, Team B (n=10) averages 105.6. Under the equal-variance assumption, t0 = (110.2 - 105.6) / (sp · √(1/10 + 1/10)), where sp is computed from the two teams' sample standard deviations. If the resulting p-value is below 0.05, you conclude the difference is significant.
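Here is a sketch of the pooled two-sample calculation. The team means come from the example, but the text does not give the standard deviations, so s1 and s2 below are illustrative placeholders.

```python
import math

# Team means and sizes from the example above
x1, x2, n1, n2 = 110.2, 105.6, 10, 10
# Hypothetical sample standard deviations (not given in the text)
s1, s2 = 6.0, 6.5

# Pooled variance under the equal-variance assumption
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(sp2) * math.sqrt(1/n1 + 1/n2)

t0 = (x1 - x2) / se
df = n1 + n2 - 2
print(f"t0 = {t0:.2f} on df = {df}")
```

With these made-up standard deviations the statistic comes out around 1.6, which would not reach significance at α = 0.05 on 18 degrees of freedom; the conclusion depends entirely on the actual variability in the data.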
1.3 Difference Between Two Proportions
For proportions from two independent samples (e.g., approval ratings for two political candidates), the test statistic is:
z = (p̂1 - p̂2) / SE

where SE = √(p̂(1 - p̂)(1/n1 + 1/n2)) and p̂ is the pooled proportion.
A related situation: one sample of size n answering many yes/no items. Each item can be analyzed with a binomial (one-proportion) test. For example, a survey of 100 students in May 2026 asks 10 yes/no questions; you can test whether the proportion answering 'yes' to a specific question equals 0.5.
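The two-proportion z statistic is easy to compute by hand. The approval counts below are hypothetical (the text names the scenario but not the numbers).

```python
import math

# Hypothetical approval counts for two candidates (illustrative only)
x1, n1 = 260, 500   # candidate A: 260 of 500 approve
x2, n2 = 230, 500   # candidate B: 230 of 500 approve

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 = p2

se = math.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
z = (p1 - p2) / se
print(f"z = {z:.2f}")  # z ≈ 1.90
```

A z of about 1.90 corresponds to a two-sided p-value just above 0.05, so this hypothetical difference would be borderline rather than clearly significant.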
2. The F-Test: Comparing Variances
The F-test statistic is:
f0 = s1² / s2²

where s1² and s2² are the sample variances. The test assesses equality of two population variances; the same statistic also underlies ANOVA for comparing multiple means.
Example: In finance, you compare the volatility (variance) of two cryptocurrencies in May 2026. If f0 = 1.8 and the critical value at α=0.05 is 2.0, you fail to reject equal variances.
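The F statistic itself is just a ratio of sample variances. The daily-return variances below are invented to reproduce the f0 = 1.8 quoted in the example.

```python
# Hypothetical daily-return variances for two cryptocurrencies
# (chosen to match the f0 = 1.8 in the example above)
s1_sq = 0.0090  # coin A sample variance
s2_sq = 0.0050  # coin B sample variance

# By convention, put the larger variance in the numerator
f0 = s1_sq / s2_sq
print(f"f0 = {f0:.2f}")

# Compare f0 to the F critical value with (n1 - 1, n2 - 1) degrees of
# freedom; with f0 = 1.80 below a critical value of 2.0, we fail to
# reject the hypothesis of equal variances.
```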
3. The Chi-Square Test: Goodness-of-Fit and Independence
The Chi-square test statistic is:
χ² = Σ (observed - expected)² / expected

For a test of independence, the expected count in cell (i, j) = (row total × column total) / grand total.
Example: A gaming company wants to know if player preference for three game genres (Action, Puzzle, RPG) is equally distributed. In a sample of 300 players in May 2026, observed counts are 120, 90, 90. Expected under equal distribution: 100 each. Compute χ² = (20²/100)+(10²/100)+(10²/100)=6.0. With df=2, p-value ≈ 0.05—borderline significant.
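The goodness-of-fit calculation from the gaming example can be reproduced directly; for df = 2 the chi-square p-value even has the closed form e^(−χ²/2).

```python
import math

# Observed genre counts from the example: Action, Puzzle, RPG
observed = [120, 90, 90]
n = sum(observed)
k = len(observed)
expected = [n / k] * k  # equal-preference null: 100 per genre

chi_sq = sum((o - e)**2 / e for o, e in zip(observed, expected))
df = k - 1

# Closed-form p-value, valid only for df = 2
p = math.exp(-chi_sq / 2)
print(f"chi-square = {chi_sq:.1f}, df = {df}, p ≈ {p:.4f}")
```

This gives χ² = 6.0 and p ≈ 0.0498, confirming the "borderline significant" verdict: the result just sneaks under α = 0.05.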
4. Regression Inference: Intercept and Slope
In simple linear regression Y = β0 + β1X + ε, we test hypotheses about β0 and β1 using t-tests with df = n-2. The test statistic for slope:
t = (b1 - β1,hypothesized) / SE(b1)

where SE(b1) = s / √(Σ(xi - x̄)²) and s is the regression standard error.
Example: In May 2026, you model app downloads (Y) vs. advertising spend (X). With n=50, b1=2.3, SE(b1)=0.5, test H0: β1=0: t=2.3/0.5=4.6, p<0.001—significant positive relationship.
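To see where SE(b1) comes from, here is the full least-squares calculation on a tiny, made-up dataset (the x and y values below are illustrative, not the app-downloads data from the example).

```python
import math

# Tiny hypothetical dataset: ad spend (x) vs. downloads (y)
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar)**2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx                  # slope estimate
b0 = y_bar - b1 * x_bar         # intercept estimate

# Regression standard error s from the residual sum of squares
sse = sum((yi - (b0 + b1 * xi))**2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(sxx)      # SE(b1) = s / sqrt(Σ(xi - x̄)²)
t = b1 / se_b1                  # tests H0: β1 = 0, with df = n - 2

print(f"b1 = {b1:.2f}, SE(b1) = {se_b1:.3f}, t = {t:.1f}")
```

Because this toy data is nearly linear, the t statistic is enormous; with real, noisier data the same formulas apply but t shrinks toward the values you see in the example above.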
5. Putting It All Together: A May 2026 Case Study
Imagine you're analyzing a new AI tutoring app's effectiveness. You collect data from 100 students: 50 use the app (Group A), 50 use traditional methods (Group B). You measure test scores (means), pass rates (proportions), and variance in scores.
- t-test for means: Compare average scores. If t0=2.1, p=0.038, significant at α=0.05.
- F-test for variances: Check if score variability differs. f0=1.3, p=0.34—no significant difference.
- Chi-square test: Test if pass/fail rates are independent of group. χ²=4.2, df=1, p=0.04—significant.
- Regression: Model score vs. hours of app usage. Slope b1=5.2, p<0.001—more usage predicts higher scores.
This integrated approach gives a complete picture.
Conclusion: Confidence in Statistical Inference
Mastering these tests—t-test, F-test, Chi-square, and regression—will serve you well in STATS 101G/108 and beyond. Practice with real datasets from current events: sports stats from the 2026 season, election polls, or AI benchmarks. Remember, the key is understanding when to use each test and how to interpret the results. Happy analyzing!