Mastering Cross Section and Panel Data Analysis: A Guide for ECON3173 Using World Bank Enterprise Survey Data
Learn how to analyze cross-sectional and panel data using World Bank Enterprise Survey Data (WBESD) for ECON3173. This tutorial covers data preparation, probit models, fixed effects, and causal inference—all with practical Stata examples and timely analogies.
Introduction: Why Cross Section and Panel Data Matter in 2026
In today's data-driven economy, it matters to understand how firms access external finance and how that access affects their performance. The World Bank Enterprise Survey Data (WBESD) let you explore these relationships with cross-sectional and panel data techniques. As of June 2026, financial inclusion is a prominent policy topic, with fintech innovations and AI-driven credit scoring reshaping access to capital for small businesses. This tutorial walks through the key concepts and Stata implementations you need for your ECON3173 individual project, without giving away the answers.
Understanding the Project Theme: Access to External Finance and Firm Performance
The core question is: Does gaining access to credit cause firms to expand output? You'll model the probability of having a loan (cross-section) and then test the causal impact using panel data. This mirrors real-world policy debates, such as how microfinance programs in developing countries affect entrepreneurship. By using WBESD, you're working with actual firm-level data from multiple economies, similar to how analysts at the IMF or World Bank assess financial constraints.
Part A: Data Management and Exploratory Analysis
Data Preparation: Appending and Renaming Variables
First, combine the survey waves for multiple years and economies with Stata's append command, which stacks the observations from each file. For example, with data for Kenya (2013, 2018) and Vietnam (2015, 2020), append stacks all four files into one dataset. The result is a panel only if the same firm can be linked across waves through a common identifier such as firmid, so confirm that every file contains it before appending. Rename variables as specified in Table 1 (e.g., k8 becomes creditline).
* Stack the survey waves into one dataset
use "kenya_2013.dta", clear
append using "kenya_2018.dta"
append using "vietnam_2015.dta"
append using "vietnam_2020.dta"
* Rename survey codes to the names given in Table 1
rename k8 creditline
rename b4 female_owner
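The sketch below is self-contained and runs the append-then-check workflow on two tiny waves. The firm IDs and answers are made up and are not WBES data. It also confirms that each firm-year appears once and builds a within-firm wave counter, since survey years are several years apart rather than consecutive. The tempfile name wave2013 and the variable wave are hypothetical helpers.

```stata
* Two made-up waves (hypothetical firm IDs and answers, not WBES data)
clear
input firmid year k8
1 2013 2
2 2013 1
3 2013 2
end
tempfile wave2013
save `wave2013'

clear
input firmid year k8
1 2018 1
2 2018 1
3 2018 -9
end

* Stack the waves, then rename as in Table 1
append using `wave2013'
rename k8 creditline

* Each firm-year must appear exactly once; isid stops with an error otherwise
isid firmid year

* Within-firm wave counter (1, 2, ...) because survey years are not consecutive
bysort firmid (year): gen byte wave = _n

* Declare the panel and show each firm's participation pattern
xtset firmid year
xtdescribe
```

In your own do-file, isid firmid year on the real stacked data is the check that matters. If it fails, duplicates report firmid year helps locate the problem.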
Generating Key Dummy Variables
Create creditdum (1 if the firm has a loan) and Femaledum (1 if at least one owner is female). Also generate the natural log of sales, ln_sales = ln(sales). These variables enter the regressions below.
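* Assumes Yes = 1 and No = 2; other codes (e.g., negative "don't know") become missing
gen creditdum = (creditline == 1) if inlist(creditline, 1, 2)
gen Femaledum = (female_owner == 1) if inlist(female_owner, 1, 2)
* ln() returns missing for zero, negative, or missing sales
gen ln_sales = ln(sales)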
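Survey questions usually carry special codes for non-answers, and coding a dummy as "1 if Yes, else 0" silently turns those codes into "No". The sketch below uses made-up answers and assumes the coding 1 = Yes, 2 = No, -9 = Don't know. Check the WBES codebook for the codes your waves actually use. The variable naive is a hypothetical name, used only for comparison.

```stata
* Made-up answers; assumed coding: 1 = Yes, 2 = No, -9 = Don't know
clear
input creditline female_owner
1 2
2 1
-9 2
1 .
end

* Naive coding: "don't know" (-9) ends up counted as "no loan"
gen naive = (creditline == 1) if !missing(creditline)

* Safer coding: only valid Yes/No answers receive a 0/1 value
gen creditdum = (creditline == 1) if inlist(creditline, 1, 2)
gen Femaledum = (female_owner == 1) if inlist(female_owner, 1, 2)
list
```

Row 3 shows the difference. The naive dummy records a firm that did not answer as having no loan, while creditdum leaves that firm out.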
Exploratory Analysis: Summary Statistics and Graphs
Before modeling, explore your data. Use summarize and tabulate to check distributions. For instance, what percentage of firms have credit? How does this vary by country? Create histograms of sales and bar charts of credit access by year. This step is like a "data health check"—similar to how analysts at a startup might explore user engagement data before building a predictive model.
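The exploratory commands named above can be sketched on simulated stand-in data so that the block runs on its own. The variable country, the 40% credit share, and the sales distribution are all hypothetical. Replace the simulation section with your appended WBES file.

```stata
* Simulated stand-in data (hypothetical values) so the commands run on their own
clear
set seed 3173
set obs 400
gen country = cond(_n <= 200, "Kenya", "Vietnam")
gen year = cond(country == "Kenya", cond(mod(_n, 2), 2013, 2018), cond(mod(_n, 2), 2015, 2020))
gen creditdum = runiform() < 0.4
gen sales = exp(rnormal(12, 1.5))
gen ln_sales = ln(sales)

* Overall means, then the share of firms with credit by country (row percentages)
summarize creditdum sales ln_sales
tabulate country creditdum, row

* Sales are typically right-skewed, so plot the log
histogram ln_sales, normal
graph bar (mean) creditdum, over(year) ytitle("Share of firms with credit")
```

Because creditdum is 0/1, its mean in summarize is the share of firms with credit, and graph bar (mean) plots that share for each survey year.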
Part B: Determinants of Credit Access (Cross-Sectional Analysis)
Modeling Probability of Having a Loan with Probit/Logit
Use a probit model to estimate the probability that a firm has a loan, given firm characteristics like size, age, and female ownership. The dependent variable is creditdum. For example:
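* Probit for Pr(creditdum = 1); firm age enters as a continuous regressor
probit creditdum i.size c.age i.Femaledum, vce(robust)
* Average marginal effects (discrete changes for variables entered with i.)
margins, dydx(*) post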
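Interpret the marginal effects in percentage points. Because size enters as a factor variable, margins reports discrete changes relative to the base category. For example, moving from a small to a medium-sized firm changes the predicted probability of having a loan by X percentage points, averaged over the sample. This is analogous to how a credit scoring model (like those used by AI lenders) predicts loan approval from borrower attributes.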
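For reference, the quantities that margins, dydx(*) reports after probit can be written out as below. These are the standard average-marginal-effect formulas, where x_i collects the regressors and the hatted betas are the probit estimates. margins computes the discrete-change version only for regressors entered with the i. prefix. A 0/1 dummy entered without the prefix is treated as continuous and receives the derivative instead.

```latex
\Pr(\text{creditdum}_i = 1 \mid x_i) = \Phi(x_i'\beta)

% Continuous regressor x_j (e.g., firm age): average derivative
\widehat{\text{AME}}_j = \frac{1}{N}\sum_{i=1}^{N} \phi(x_i'\hat\beta)\,\hat\beta_j

% Binary or categorical regressor d (e.g., Femaledum, or size category k vs. the base)
\widehat{\text{AME}}_d = \frac{1}{N}\sum_{i=1}^{N}\Big[\Phi\big(x_i'\hat\beta \mid d_i = 1\big) - \Phi\big(x_i'\hat\beta \mid d_i = 0\big)\Big]
```

Multiplying an AME by 100 gives the effect in percentage points.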
Addressing Endogeneity and Omitted Variable Bias
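Cross-sectional estimates can suffer from omitted variable bias. For instance, more productive firms may both seek credit and have higher sales, so a regression of sales on credit would attribute part of the productivity effect to credit. A single cross-section gives you no way to control for this kind of unobserved heterogeneity. Panel data help here: fixed effects remove all time-invariant firm characteristics, although they do not address unobserved factors that change over time.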
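The productivity example can be made precise with the textbook omitted-variable-bias formula. In the sketch below, a_i stands for unobserved productivity and is a hypothetical label. The result assumes that u_i is uncorrelated with both creditdum_i and a_i.

```latex
% True model: a_i is unobserved productivity
\ln(\text{sales}_i) = \beta_0 + \beta_1\,\text{creditdum}_i + \gamma\,a_i + u_i

% OLS of ln(sales) on creditdum alone (omitting a_i) converges to
\operatorname{plim}\hat\beta_1 = \beta_1 + \gamma\,\delta,
\qquad \delta = \frac{\operatorname{Cov}(\text{creditdum}_i, a_i)}{\operatorname{Var}(\text{creditdum}_i)}
```

If productivity raises sales (gamma > 0) and productive firms are more likely to hold credit (delta > 0), the cross-sectional estimate overstates the effect of credit.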
Part C: Impact of Credit on Firm Performance (Panel Data Analysis)
Fixed Effects vs. Random Effects
With panel data, you can estimate the effect of gaining credit on sales while controlling for unobserved firm-specific factors. The fixed effects (FE) model removes unobserved factors that do not change over time (e.g., managerial ability), even when they are correlated with credit access. The random effects (RE) model assumes these factors are uncorrelated with the regressors; when that assumption holds, RE is more efficient. Use the Hausman test to decide.
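The FE logic and the Hausman comparison can be written as follows. a_i is the firm effect, lambda_t the year effects, and the bars denote averages over the waves in which firm i is observed. Subtracting the firm average removes a_i because it is constant within the firm. The Hausman statistic is shown in its standard form over the k coefficients common to both models, and its chi-squared distribution holds asymptotically under the null that RE is consistent.

```latex
% Model with firm effect a_i and year effects \lambda_t
\ln(\text{sales}_{it}) = \beta\,\text{creditdum}_{it} + \lambda_t + a_i + u_{it}

% Within transformation: subtract each firm's time average; a_i drops out
\ln(\text{sales}_{it}) - \overline{\ln(\text{sales})}_i
  = \beta\,(\text{creditdum}_{it} - \overline{\text{creditdum}}_i)
  + (\lambda_t - \bar\lambda_i) + (u_{it} - \bar u_i)

% Hausman statistic
H = (\hat\beta_{FE} - \hat\beta_{RE})'\,\big[\widehat{V}_{FE} - \widehat{V}_{RE}\big]^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \;\overset{a}{\sim}\; \chi^2_k
```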
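* firmid must be numeric; for a string ID use: egen firmnum = group(firmid)
xtset firmid year
* Fixed effects: identified from within-firm changes over time
xtreg ln_sales creditdum i.year, fe
estimates store fe
* Random effects
xtreg ln_sales creditdum i.year, re
estimates store re
* H0: firm effects uncorrelated with regressors (RE consistent and efficient)
hausman fe re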
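If the Hausman test rejects the null hypothesis, the RE estimates are inconsistent and FE is preferred. The coefficient on creditdum is the within-firm effect of gaining credit, identified from firms whose credit status changes between waves. Because the outcome is in logs, a coefficient b means that firms obtaining a loan increase sales by roughly 100·b percent on average (exactly 100·(exp(b) - 1) percent).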
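To see why FE matters here, the sketch below simulates a two-wave panel in which a hypothetical unobserved variable, ability, raises both the chance of holding credit and sales. The true credit effect is set to 0.10 log points. All numbers are made up for illustration. The block ends by converting the FE coefficient into an exact percentage with nlcom.

```stata
* Simulated two-wave panel (hypothetical numbers): unobserved ability raises
* both the chance of holding credit and sales; the true credit effect is 0.10
clear
set seed 2026
set obs 500
gen firmid = _n
gen ability = rnormal()
expand 2
bysort firmid: gen year = cond(_n == 1, 2013, 2018)
gen creditdum = (ability + rnormal() > 0.5)
gen ln_sales = 12 + 0.10*creditdum + 0.8*ability + 0.05*(year == 2018) + rnormal(0, 0.3)

* Pooled OLS omits ability, so the credit coefficient is biased upward
regress ln_sales creditdum i.year, vce(cluster firmid)

* FE sweeps out ability, which is constant within each firm
xtset firmid year
xtreg ln_sales creditdum i.year, fe vce(cluster firmid)

* Exact percentage change implied by the log-point coefficient
nlcom 100*(exp(_b[creditdum]) - 1)
```

In this setup the pooled OLS coefficient should be far above 0.10, while the FE estimate should be close to it. That gap is the omitted-variable bias that the within transformation removes.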
Dynamic Panel Models and Difference-in-Differences
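For robustness, consider a difference-in-differences (DiD) approach: compare firms that gained credit between waves (treatment group) with firms that never had credit (control group), before and after the credit event. This requires defining a treatment group and a post-treatment period. For example, a firm with no loan in 2013 but a loan in 2018 belongs to the treatment group, and 2018 is its post period. The estimate is credible only if both groups' sales would have followed parallel trends in the absence of credit, and with only two waves this assumption cannot be tested directly. Use the community-contributed diff command (install with ssc install diff) or create the treatment-by-period interaction manually.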
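bysort firmid (year): keep if _N == 2               // firms observed in both waves
by firmid: gen byte treat = (creditdum[1] == 0 & creditdum[2] == 1)
by firmid: keep if creditdum[1] == 0                // drop firms with credit at baseline
by firmid: gen byte post = (_n == 2)
* diff is community-contributed (ssc install diff); ln_age must already exist
diff ln_sales, treated(treat) period(post) cov(ln_age Femaledum)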
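The manual route can be sketched on simulated data. The regression uses a full treat-by-post interaction, and with no covariates its interaction coefficient equals the DiD computed from the four group means. The group sizes, the 0.3 group gap, the 0.05 common trend, and the 0.10 effect are made-up assumptions. Adding covariates, as in the diff call with cov(), breaks this exact equality with the raw means.

```stata
* Simulated two-wave panel (hypothetical numbers): 200 treated firms, 400 controls
clear
set seed 42
set obs 600
gen firmid = _n
gen byte treat = (_n <= 200)
expand 2
bysort firmid: gen byte post = (_n == 2)

* Assumed outcome: group gap 0.3, common trend 0.05, credit effect 0.10
gen ln_sales = 12 + 0.3*treat + 0.05*post + 0.10*treat*post + rnormal(0, 0.3)

* Manual DiD: the coefficient on 1.treat#1.post is the DiD estimate
regress ln_sales i.treat##i.post, vce(cluster firmid)

* The same number from the four group means
foreach t in 0 1 {
    foreach p in 0 1 {
        quietly summarize ln_sales if treat == `t' & post == `p'
        scalar m`t'`p' = r(mean)
    }
}
display "DiD from cell means: " (scalar(m11) - scalar(m10)) - (scalar(m01) - scalar(m00))
```

Clustering by firmid accounts for each firm appearing in both periods.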
Common Pitfalls and How to Avoid Them
- Data cleaning errors: Ensure consistent panel identifiers (firmid) and year variables. Use isid firmid year to verify uniqueness.
- Misinterpreting coefficients: In probit, report marginal effects, not raw coefficients. In FE, remember that coefficients are identified by within-firm variation.
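- Ignoring attrition bias: If firms drop out of the survey, consider whether attrition is random. Check which firms appear in every wave (e.g., with xtdescribe) and compare the baseline characteristics of firms that stay with those of firms that drop out.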
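- Overlooking heteroskedasticity and within-firm correlation: Use robust standard errors (vce(robust)) in cross-sectional models and cluster by firm (vce(cluster firmid)) in panel models. The standard hausman test is the exception, because it requires estimates fitted without robust or clustered standard errors.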
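An attrition check can be sketched on simulated data in which, by assumption, firms with higher baseline sales are more likely to be re-surveyed. The variables stayer, dup, and nwaves and the attrition rule itself are hypothetical. The block counts waves per firm and tests whether firms that stay differ from firms that leave at baseline.

```stata
* Simulated panel with non-random attrition (hypothetical numbers):
* firms with higher baseline sales are more likely to be re-surveyed
clear
set seed 7
set obs 300
gen firmid = _n
gen ln_sales = rnormal(12, 1)
gen byte stayer = runiform() < invlogit(ln_sales - 12)
expand 2 if stayer, generate(dup)
gen year = cond(dup, 2018, 2013)

* Participation patterns and number of waves per firm
xtset firmid year
xtdescribe
bysort firmid: gen byte nwaves = _N
tabulate nwaves if year == 2013

* Do firms that stay differ from firms that leave at baseline?
ttest ln_sales if year == 2013, by(nwaves)
```

A clear baseline difference, as this setup is built to produce, suggests that estimates from the balanced sample may not generalize to all surveyed firms.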
Timely Example: Fintech and Financial Inclusion in 2026
Imagine a fintech startup in Nigeria that uses AI to approve small business loans. The startup wants to know if its loans actually increase sales. Using panel data from before and after loan disbursement, they can run a fixed effects regression similar to what you'll do in ECON3173. This real-world application shows the power of econometrics in evaluating policy and business interventions.
Conclusion: From Assignment to Real-World Skills
By mastering cross-sectional and panel data analysis, you're not just completing an assignment—you're building skills used by economists at the World Bank, investment banks, and tech companies. The techniques you learn in ECON3173—probit models, fixed effects, and causal inference—are directly applicable to analyzing any longitudinal data, from customer churn to employee productivity. As you work through the project, remember to check your Stata do files for errors, use the append command carefully, and always interpret results in the context of your research question.
References
- World Bank Enterprise Surveys. (2025). Enterprise Surveys Data. https://www.enterprisesurveys.org
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.