Optimizing a Five-Stock Portfolio with CAPM and Efficient Frontier in Stata: A Step-by-Step Guide

Learn how to apply CAPM and portfolio optimization to a five-stock portfolio using Stata. This guide covers data download, return calculations, descriptive statistics, correlation, CAPM estimation, and efficient frontier construction with a long-only constraint.

Introduction

In the LM Economics of Financial Markets and Institutions project, you are tasked with constructing an optimal portfolio of five stocks from different industries. This guide walks you through the entire process using Stata—from downloading data to plotting the efficient frontier. By the end, you'll understand how to apply the Capital Asset Pricing Model (CAPM) and modern portfolio theory to real-world data, much like how portfolio managers at firms like BlackRock or Vanguard optimize their clients' investments. For instance, during the 2024-2025 AI boom, many investors adjusted their portfolios to include tech giants like NVIDIA and Microsoft, balancing risk and return.

Step 1: Choosing Stocks and Downloading Data

Select five companies from different industries to ensure diversification. For example, you might pick Apple (AAPL) from technology, Johnson & Johnson (JNJ) from healthcare, Exxon Mobil (XOM) from energy, Visa (V) from financials, and Walmart (WMT) from consumer staples. Download monthly adjusted close prices from Yahoo Finance for January 2014 to December 2023, which gives 120 monthly prices per series. Save each series as a CSV file and load it with the import delimited command. Alternatively, a user-written command such as getsymbols or fetchyahooquotes (both on SSC; whether they still work depends on Yahoo's current interface) can pull the data directly. Download the market proxy (e.g., the S&P 500, ticker ^GSPC) for the same period.
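One way to assemble the six price series into a single monthly dataset is sketched below. It assumes each ticker was saved as a lowercase CSV file (aapl.csv, ..., gspc.csv, all hypothetical names) in the working directory, with Yahoo-style "Date" and "Adj Close" columns that import delimited would typically read as date and adjclose. Adjust the names if your files differ.

```stata
* Assumed inputs: aapl.csv, jnj.csv, xom.csv, v.csv, wmt.csv, gspc.csv
* (hypothetical file names), each with columns "Date" and "Adj Close".
local tickers aapl jnj xom v wmt gspc
foreach t of local tickers {
    import delimited "`t'.csv", varnames(1) clear
    gen mdate = mofd(date(date, "YMD"))      // "2014-01-01" -> monthly date
    format mdate %tm
    rename adjclose price_`t'
    keep mdate price_`t'
    tempfile f_`t'
    save "`f_`t''"
}

* Merge the six files on the monthly date
use "`f_aapl'", clear
foreach t in jnj xom v wmt gspc {
    merge 1:1 mdate using "`f_`t''", nogenerate
}
rename price_gspc price_market
tsset mdate                                  // declare the time variable
```

The result is one row per month with the variables price_aapl through price_wmt and price_market, matching the names used in the rest of this guide.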

Step 2: Graph Time Series of Prices

Plotting the price series helps visualize trends and volatility. Declare the monthly date variable with tsset first, then use the tsline command. For example:

tsline price_aapl price_jnj price_xom price_v price_wmt, legend(label(1 "Apple") ...)

This graph reveals how each stock performed during events like the COVID-19 crash in 2020 or the 2022 inflation surge. Notice how tech stocks rebounded faster, while energy stocks benefited from rising oil prices.

Step 3: Compute Returns

Calculate monthly returns as the percentage change in adjusted close prices: return = (price_t - price_{t-1}) / price_{t-1}. In Stata:

gen return_aapl = (price_aapl - price_aapl[_n-1]) / price_aapl[_n-1]

Repeat for all stocks and the market index, making sure the data are sorted by date so that [_n-1] refers to the previous month. Drop the first observation, whose return is missing.
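As an alternative to repeating the gen line by hand, the same returns can be built in a loop with Stata's lag operator. This is a minimal sketch that assumes the data are tsset on a monthly date variable (called mdate here, a hypothetical name) and that the market index is stored as price_market.

```stata
* Assumes variables price_aapl ... price_wmt and price_market exist and that
* mdate (hypothetical name) is a monthly date variable.
tsset mdate
foreach s in aapl jnj xom v wmt market {
    gen return_`s' = price_`s' / L.price_`s' - 1   // simple monthly return
}
drop if missing(return_market)                     // first month has no return
```

The L. operator uses the previous month according to tsset rather than the previous row, so it stays correct even if the data are not sorted or a month is missing.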

Step 4: Descriptive Statistics

Compute mean, standard deviation, maximum, and minimum for each return series using tabstat:

tabstat return_aapl return_jnj return_xom return_v return_wmt, statistics(mean sd min max) columns(statistics)

Present these in a table. For example, Apple might have a higher mean return but also higher standard deviation (risk) compared to Walmart. This trade-off is central to portfolio optimization.

Step 5: Correlation Matrix

Calculate the correlation matrix with significance levels using pwcorr:

pwcorr return_aapl return_jnj return_xom return_v return_wmt, sig star(0.05)

Report the matrix. Low correlations between stocks (e.g., Apple vs. Exxon) indicate diversification benefits. In contrast, high correlations reduce the benefit.
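To see why correlation matters, recall that a two-stock portfolio has variance w1^2 σ1^2 + w2^2 σ2^2 + 2 w1 w2 ρ σ1 σ2. The sketch below evaluates it for two illustrative cases. All numbers are hypothetical and not estimated from the five stocks.

```stata
* Hypothetical inputs: two stocks, each with a 6% monthly standard deviation,
* held 50/50. Only the correlation changes between the two cases.
local w  = 0.5
local sd = 0.06
foreach rho in 0.2 0.9 {
    local var = 2*(`w'^2)*(`sd'^2) + 2*`w'*`w'*`rho'*`sd'*`sd'
    display "rho = `rho': portfolio sd = " %6.4f sqrt(`var')
}
```

With a correlation of 0.2 the portfolio standard deviation is roughly 4.6%, well below the 6% of either stock; with a correlation of 0.9 it is roughly 5.8%, so most of the diversification benefit disappears.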

Step 6: Frequency Histograms

Histograms show the distribution of returns. Use histogram with normal density overlay:

histogram return_aapl, frequency normal

Look for skewness (an asymmetric distribution) and excess kurtosis (fat tails). For instance, tech stocks may exhibit fat tails, meaning extreme returns occur more often than a normal distribution predicts, as the 2021 meme stock frenzy illustrated.
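The visual check can be backed up with numbers. The following sketch draws one histogram per stock, prints skewness and kurtosis from summarize, detail, and runs Stata's sktest for normality. The graph names hist_aapl and so on are arbitrary choices.

```stata
foreach s in aapl jnj xom v wmt {
    histogram return_`s', frequency normal name(hist_`s', replace)
    quietly summarize return_`s', detail
    display "`s': skewness = " %6.3f r(skewness) "   kurtosis = " %6.3f r(kurtosis)
}
* Joint skewness/kurtosis test of normality for each series
sktest return_aapl return_jnj return_xom return_v return_wmt
```

A normal distribution has skewness 0 and kurtosis 3, so kurtosis well above 3 is the numerical counterpart of the fat tails visible in the histogram.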

Step 7: Linear Relationship with Market

Estimate the market model for each stock: regress return_stock return_market. Plot the fitted line using twoway scatter and lfit. For example:

twoway (scatter return_aapl return_market) (lfit return_aapl return_market)

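This visualizes the stock's beta, which is the slope of the fitted line. A beta above 1 (often the case for a stock like Apple) means the stock tends to amplify market movements; a beta below 1 (often the case for a stock like Walmart) means it is less sensitive to them. Beta measures systematic risk only, so a low-beta stock can still have high total volatility.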
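To collect the slopes for all five stocks at once, the regression can be wrapped in a loop. This is a sketch that assumes the return variables are named as in Step 3.

```stata
foreach s in aapl jnj xom v wmt {
    quietly regress return_`s' return_market
    display "`s': beta = " %6.3f _b[return_market] "   R-squared = " %5.3f e(r2)
}
```

The R-squared shows how much of each stock's return variation the market explains; the remainder is firm-specific risk that diversification can reduce.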

Step 8: Estimate CAPM

The CAPM equation: E(R_i) - R_f = β_i * (E(R_m) - R_f). With an annual risk-free rate of 2.4%, convert to monthly: R_f_monthly = (1.024)^(1/12) - 1 ≈ 0.00198. Then run regressions with excess returns:

gen excess_aapl = return_aapl - 0.00198
gen excess_market = return_market - 0.00198
regress excess_aapl excess_market

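Report alpha (the intercept) and beta (the slope) with their standard errors. Under CAPM alpha should be zero, so a statistically significant alpha (p < 0.05) indicates an average return that the model does not explain over the sample. For instance, a significantly positive alpha for Apple means it outperformed the CAPM prediction.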
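The same regression can be run for every stock in a loop. The sketch below uses the prefix xr_ for excess returns (a hypothetical naming choice, made so that it does not clash with the excess_ variables created above) and recomputes the two-sided p-value for alpha from the stored coefficient and standard error.

```stata
scalar rf_m = 1.024^(1/12) - 1                 // monthly risk-free rate, about 0.00198
gen xr_market = return_market - scalar(rf_m)
foreach s in aapl jnj xom v wmt {
    gen xr_`s' = return_`s' - scalar(rf_m)
    quietly regress xr_`s' xr_market
    * two-sided p-value for H0: alpha = 0
    scalar p_alpha = 2*ttail(e(df_r), abs(_b[_cons]/_se[_cons]))
    display "`s': alpha = " %8.5f _b[_cons] "   p(alpha) = " %5.3f p_alpha "   beta = " %6.3f _b[xr_market]
}
```

Because the same constant is subtracted from both sides, the betas here match those of the market model in Step 7; only the intercept changes.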

Step 9: Portfolio Optimization

You need to compute the Global Minimum Variance Portfolio (GMVP), four additional portfolios with increasing target returns, and the optimal risky portfolio (tangency portfolio). Use the user-written mvport package (installable from SSC) or manual matrix programming. Impose a long-only constraint (no short selling), which means every weight must be non-negative.

9.1 Global Minimum Variance Portfolio (GMVP)

The GMVP minimizes risk. In Stata, you can compute it using the covariance matrix of returns:

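quietly correlate return_aapl return_jnj return_xom return_v return_wmt, covariance
mat cov = r(C)
mat ones = J(5,1,1)
mat gmvp_weights = (invsym(cov) * ones) / (ones' * invsym(cov) * ones)
mat list gmvp_weights

These weights sum to 1. The closed-form solution ignores the long-only constraint, so check that no weight is negative; if one is, the GMVP has to be found by constrained optimization instead. Compute the GMVP expected return and standard deviation.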
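The GMVP return and standard deviation follow from the weights as w'μ and sqrt(w'Σw). A minimal sketch is shown below; it rebuilds the covariance matrix and mean vector from the return variables so that it runs on its own, and the matrix names mu, denom, gm_ret, and gm_var are hypothetical.

```stata
local stocks return_aapl return_jnj return_xom return_v return_wmt
quietly correlate `stocks', covariance
matrix cov = r(C)                              // 5 x 5 covariance matrix
quietly tabstat `stocks', statistics(mean) save
matrix mu = r(StatTotal)'                      // 5 x 1 vector of mean returns

matrix ones  = J(5, 1, 1)
matrix denom = ones' * invsym(cov) * ones
matrix gmvp_weights = invsym(cov) * ones / denom[1,1]
matrix gm_ret = gmvp_weights' * mu
matrix gm_var = gmvp_weights' * cov * gmvp_weights
matrix list gmvp_weights
display "GMVP monthly return = " %7.5f gm_ret[1,1] "   sd = " %7.5f sqrt(gm_var[1,1])
```

Both figures are monthly because they are computed from monthly returns.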

9.2 Four Additional Portfolios

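Choose four target returns above the GMVP return. Under the long-only constraint, the highest attainable return is the mean return of the best-performing stock (a portfolio invested 100% in it), so every target must lie between the GMVP return and that maximum. For example, space three targets evenly inside that range and use the maximum itself as the fourth. Fixed steps such as GMVP return + 0.5%, +1%, and +1.5% per month can overshoot the maximum and leave the problem infeasible. For each target, minimize the portfolio variance subject to the weights summing to 1 and being non-negative.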
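Written out, the problem solved for each target return is the following quadratic program. The notation is an assumption made here for illustration: w is the vector of five weights, Σ the covariance matrix of returns, μ the vector of mean returns, and μ* the chosen target.

```latex
\begin{aligned}
\min_{w}\quad & w^{\top} \Sigma\, w \\
\text{subject to}\quad & w^{\top} \mu = \mu^{*}, \\
& \mathbf{1}^{\top} w = 1, \\
& w_i \ge 0, \qquad i = 1, \dots, 5 .
\end{aligned}
```

Dropping the first constraint gives the long-only GMVP, and dropping the last one gives the unconstrained problem that has a closed-form solution.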

9.3 Optimal Risky Portfolio (Tangency Portfolio)

This portfolio maximizes the Sharpe ratio: (E(R_p) - R_f) / σ_p. Solve using:

mat mu = (mean_return_aapl, ... )'
mat Rf = J(5,1,0.00198)
mat excess = mu - Rf
mat tang_weights = (invsym(cov) * excess) / (ones' * invsym(cov) * excess)
mat list tang_weights

Ensure all weights are non-negative; if some are negative, you must impose long-only constraints (use quadratic programming).
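A self-contained sketch of the unconstrained tangency calculation, including the check for negative weights, might look like this. It builds the mean vector with tabstat rather than typing the means by hand; the matrix names z, zsum, w_tan, t_ret, and t_var are hypothetical.

```stata
local stocks return_aapl return_jnj return_xom return_v return_wmt
quietly correlate `stocks', covariance
matrix cov = r(C)
quietly tabstat `stocks', statistics(mean) save
matrix mu = r(StatTotal)'                      // 5 x 1 vector of mean returns

scalar rf_m = 1.024^(1/12) - 1                 // monthly risk-free rate
matrix excess = mu - J(5, 1, rf_m)
matrix z      = invsym(cov) * excess
matrix zsum   = J(1, 5, 1) * z
matrix w_tan  = z / zsum[1,1]                  // weights sum to 1
matrix list w_tan

* Flag any weight that violates the long-only constraint
forvalues i = 1/5 {
    if w_tan[`i', 1] < 0 {
        display "weight `i' is negative: solve the long-only problem instead"
    }
}

matrix t_ret = w_tan' * mu
matrix t_var = w_tan' * cov * w_tan
display "Sharpe ratio = " %6.3f (t_ret[1,1] - rf_m) / sqrt(t_var[1,1])
```

If no message is printed, the unconstrained tangency portfolio is already long-only and can be used as it is.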

Step 10: Plot Efficient Frontier (Long-Only)

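Generate a grid of target returns between the GMVP return and the maximum return. For each target, minimize the variance subject to the long-only constraints. Stata has no built-in quadprog command, so use a user-written routine (for example, the mvport package) or code the constrained minimization yourself in Mata. Plot the frontier with return on the y-axis and standard deviation on the x-axis: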

twoway (scatter return sd, ...) (line return sd, sort)

Mark the GMVP and the tangency portfolio. The efficient frontier shows the best possible return for each risk level.
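If you prefer not to depend on a user-written package, the long-only frontier can be traced with a small Mata routine. The sketch below is one possible approach and not a general-purpose solver: it enumerates every subset of the five stocks, solves the equality-constrained problem on each subset through its KKT linear system, and keeps the feasible solution with the lowest variance. This is workable for five assets (31 subsets) but grows exponentially with the number of assets. The function name longonly_mv, the 40-point grid, and the variable names fr_sd and fr_ret are all hypothetical choices.

```stata
local stocks return_aapl return_jnj return_xom return_v return_wmt
quietly correlate `stocks', covariance
matrix cov = r(C)
quietly tabstat `stocks', statistics(mean) save
matrix mu = r(StatTotal)'
capture mata: mata drop longonly_mv()          // allow the do-file to be rerun
capture drop fr_sd fr_ret

mata:
// Long-only minimum-variance weights for one target return (hypothetical helper)
real colvector longonly_mv(real matrix V, real colvector mu, real scalar target)
{
    real scalar n, k, m, best, v
    real colvector idx, w, wbest, sol, rhs
    real matrix K

    n = rows(mu)
    best = .
    wbest = J(n, 1, .)
    for (k = 1; k < 2^n; k++) {
        // stocks in subset k: those whose bit is set in k
        idx = selectindex(mod(floor(k :/ (2 :^ (0::(n-1)))), 2))
        m = rows(idx)
        w = J(n, 1, 0)
        if (m == 1) {
            // a single stock qualifies only if its mean equals the target
            if (abs(mu[idx[1]] - target) > 1e-10) continue
            w[idx[1]] = 1
        }
        else {
            // KKT system for: min w'Vw  s.t.  sum(w) = 1, w'mu = target
            K = (V[idx, idx], J(m, 1, 1), mu[idx]) \ (J(1, m, 1), 0, 0) \ (mu[idx]', 0, 0)
            rhs = J(m, 1, 0) \ 1 \ target
            sol = lusolve(K, rhs)
            if (hasmissing(sol)) continue              // singular system
            if (min(sol[(1::m)]) < -1e-10) continue    // violates w >= 0
            w[idx] = sol[(1::m)]
        }
        v = w' * V * w
        if (v < best) {
            best = v
            wbest = w
        }
    }
    return(wbest)
}

V  = st_matrix("cov")
mu = st_matrix("mu")
targets = rangen(min(mu), max(mu), 40)         // 40 grid points (arbitrary)
F = J(40, 2, .)
for (i = 1; i <= 40; i++) {
    w = longonly_mv(V, mu, targets[i])
    F[i, .] = (sqrt(w' * V * w), targets[i])
}
st_matrix("frontier", F)
end

matrix colnames frontier = fr_sd fr_ret
svmat double frontier, names(col)
quietly summarize fr_sd
scalar sd_min = r(min)                         // grid approximation of the GMVP
quietly summarize fr_ret if fr_sd == scalar(sd_min)
scalar gm_ret = r(mean)
twoway (line fr_ret fr_sd if fr_ret >= scalar(gm_ret) & !missing(fr_ret)), ///
    ytitle("Expected monthly return") xtitle("Standard deviation")
```

The grid runs from the lowest to the highest mean return, so it also covers the inefficient lower branch; the if condition keeps only points at or above the grid point with the smallest standard deviation, which approximates the long-only GMVP. The grid point with the highest (fr_ret - R_f) / fr_sd likewise approximates the long-only tangency portfolio.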

Step 11: Plot Capital Market Line (CML)

The CML starts at the risk-free rate and is tangent to the efficient frontier at the optimal risky portfolio. Plot the CML as a line: E(R_p) = R_f + (E(R_tang) - R_f)/σ_tang * σ_p. Overlay this on the efficient frontier graph. The tangency portfolio is where the CML touches the frontier.
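The CML itself can be drawn with twoway function. In the sketch below the tangency return and standard deviation are placeholder values chosen only so the code runs; replace them with the figures from your own optimization.

```stata
* Hypothetical tangency-portfolio values; replace with your own estimates.
local rf      = 1.024^(1/12) - 1
local tan_ret = 0.015
local tan_sd  = 0.045
local slope   = (`tan_ret' - `rf') / `tan_sd'   // Sharpe ratio of the tangency portfolio

twoway (function y = `rf' + `slope'*x, range(0 0.08))           ///
       (scatteri `tan_ret' `tan_sd' "Tangency", msymbol(D)),     ///
       ytitle("Expected monthly return") xtitle("Standard deviation") ///
       legend(order(1 "Capital market line"))
```

To overlay the line on the efficient frontier, add the frontier's own line plot as a further layer inside the same twoway command.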

Conclusion

By following these steps, you'll produce a comprehensive portfolio analysis. Remember to interpret your results: discuss why certain stocks have higher betas, how diversification reduces risk, and whether CAPM holds for your chosen stocks. For example, if your tangency portfolio heavily weights Apple, it reflects its high historical return relative to risk. Just as investors in 2025 are rebalancing towards AI and clean energy stocks, your analysis provides a data-driven recommendation. Good luck with your project!