Introduction¶
This report presents modeling and optimization strategies for determining the optimal asset allocation of a portfolio. We use daily returns of seven assets (AAPL, NVDA, TSLA, XOM, REGN, LLY, JPM) from January 1, 2023 to June 30, 2025.
First, we provide descriptive statistics for the selected assets, then determine the optimal portfolio weights with and without short selling, and with the restriction that no single asset represents more than 20% of the portfolio.
In Part 2 we test whether the 1/N portfolio is the better option. To do this we select three years of data, split into two parts: two years for parameter estimation and one year for testing the portfolio. We then compute (1) the mean-variance optimal portfolio without short selling and (2) the 1/N portfolio, and compare the returns each earns over the test period.
We repeat the process using Monte Carlo simulation with 5,000 iterations, and plot the resulting return distributions together with summary statistics.
In Parts 3 and 4 we compute weights using the Black-Litterman model and interpret the results under the half-Kelly and double-Kelly strategies, respectively.
Part 1¶
To solve this question, we annualized the daily returns assuming 252 trading days per year in order to calculate the expected return, volatility, skewness ($S$) and kurtosis ($K$), using the well-known formulas:
$$S = \frac{E[(R - \mu)^3]}{\sigma^3}, \quad K = \frac{E[(R - \mu)^4]}{\sigma^4}$$
where $R$ is the vector of asset returns, $\mu$ the expected return and $\sigma$ the standard deviation.
The analysis of these distribution moments is essential because financial time series rarely follow a Gaussian distribution (Cont, 224).
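As a quick sanity check, these moment formulas can be verified directly against `scipy.stats` on a simulated heavy-tailed sample (the Student-t data below are illustrative, not the asset returns). Note that scipy's `kurtosis` reports excess kurtosis, $K - 3$, which is the convention shown in Table 1.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Simulated heavy-tailed "returns" (Student-t with 5 df), purely illustrative
rng = np.random.default_rng(42)
r = rng.standard_t(df=5, size=10_000)

mu, sigma = r.mean(), r.std()
S = np.mean((r - mu) ** 3) / sigma ** 3   # skewness: E[(R - mu)^3] / sigma^3
K = np.mean((r - mu) ** 4) / sigma ** 4   # raw kurtosis: E[(R - mu)^4] / sigma^4

# The plain-formula values match scipy's estimators
assert np.isclose(S, skew(r))
assert np.isclose(K - 3, kurtosis(r))     # scipy returns excess kurtosis by default
print(round(S, 3), round(K, 3))
```

A symmetric $t(5)$ sample gives $S$ near 0 and $K$ well above the Gaussian value of 3, the same heavy-tail signature visible in Table 1.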
%pip install yfinance pandas numpy scipy matplotlib seaborn PyPortfolioOpt --quiet
import yfinance as yf
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import skew, kurtosis, norm
import scipy.stats as stats
sns.set_style('whitegrid')
np.random.seed(42)
assets = ['AAPL', 'NVDA', 'TSLA', 'XOM', 'REGN', 'LLY', 'JPM']
start_date = '2023-01-01'
end_date = '2025-06-30'
data = yf.download(assets, start=start_date, end=end_date)['Close']
returns = data.pct_change().dropna()
summary_stats = pd.DataFrame({
    'Annualized Mean (%)': returns.mean() * 252 * 100,
    'Annualized Volatility (%)': returns.std() * np.sqrt(252) * 100,
    'Skewness': returns.apply(skew),
    'Kurtosis': returns.apply(kurtosis)
}).T
print("\n Table 1: Asset Summary Statistics")
display(summary_stats.round(2))
plt.figure(figsize=(10, 8))
corr_matrix = returns.corr()
sns.heatmap(corr_matrix, annot=True, cmap='RdYlGn', center=0,
            fmt=".2f", linewidths=0.5, cbar_kws={"shrink": .8})
plt.title('Figure 1: Asset Correlation Heatmap (Daily Returns)', fontsize=14, pad=15)
plt.tight_layout()
plt.show()
Table 1: Asset Summary Statistics
| Ticker | AAPL | JPM | LLY | NVDA | REGN | TSLA | XOM |
|---|---|---|---|---|---|---|---|
| Annualized Mean (%) | 23.21 | 35.96 | 36.70 | 111.31 | -8.53 | 63.70 | 7.26 |
| Annualized Volatility (%) | 26.38 | 23.71 | 32.85 | 53.16 | 29.52 | 62.49 | 23.38 |
| Skewness | 0.84 | 0.21 | 0.74 | 0.79 | -2.10 | 0.59 | -0.23 |
| Kurtosis | 13.92 | 10.02 | 9.65 | 7.39 | 20.61 | 4.01 | 1.76 |
Step 1: Mean-variance optimization¶
To determine the optimal weights without allowing short selling, we used quadratic programming to find the tangency portfolio that maximizes the Sharpe ratio:
$$\max_{w} \quad \text{Sharpe} = \frac{w^T \mu - R_f}{\sqrt{w^T \Sigma w}}$$
subject to:
$$\sum_{i=1}^{N} w_i = 1 \quad \text{and} \quad w_i \ge 0 \quad \forall i$$
where $w$ is the vector of portfolio weights; $\mu$ the vector of expected returns; $\Sigma$ the variance-covariance matrix of asset returns; $R_f$ the risk-free rate; and $w_i \ge 0$ rules out short selling. Maximizing the Sharpe ratio yields the portfolio with the highest excess return per unit of risk (Markowitz, 79).
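For intuition, the unconstrained version of this problem has a closed-form solution, $w^* \propto \Sigma^{-1}(\mu - R_f\mathbf{1})$, rescaled so the weights sum to one. The sketch below evaluates it on made-up inputs (the $\mu$, $\Sigma$ and $R_f$ values are hypothetical, not estimated from the data); once the long-only bound binds, a numerical optimizer's answer can differ from this formula.

```python
import numpy as np

# Hypothetical inputs -- not estimated from the report's data
mu = np.array([0.12, 0.10, 0.08])          # expected annual returns
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.030, 0.005],
                  [0.004, 0.005, 0.020]])  # covariance matrix
rf = 0.02                                  # risk-free rate

# Unconstrained tangency portfolio: w* proportional to inv(Sigma) @ (mu - rf)
raw = np.linalg.solve(Sigma, mu - rf)
w_tan = raw / raw.sum()                    # rescale so sum(w) = 1

def sharpe(w):
    return (w @ mu - rf) / np.sqrt(w @ Sigma @ w)

print(w_tan.round(3), round(sharpe(w_tan), 3))
```

By construction, no fully invested portfolio on these inputs (equal weight, single asset, etc.) can achieve a higher Sharpe ratio than `w_tan`.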
from pypfopt import EfficientFrontier, risk_models, expected_returns
mu = expected_returns.mean_historical_return(data)
S = risk_models.sample_cov(data)
ef1 = EfficientFrontier(mu, S, weight_bounds=(0, 1))
ef1.max_sharpe()
weights_step1 = ef1.clean_weights()
weights_step1
OrderedDict([('AAPL', 0.0),
('JPM', 0.34879),
('LLY', 0.14029),
('NVDA', 0.51091),
('REGN', 0.0),
('TSLA', 0.0),
('XOM', 0.0)])
Step 2: Optimization with a maximum-weight diversification constraint¶
To solve this question we used the same objective function described above, maximizing the Sharpe ratio but imposing the following restriction:
$$0 \le w_i \le 0.20 \quad \forall i$$
because unrestricted mean-variance optimization produces corner portfolios, i.e., portfolios that concentrate capital in a small number of assets with marginally higher historical returns. Imposing an upper bound helps mitigate estimation error in the expected returns and the variance-covariance matrix, enforcing structural diversification and improving out-of-sample robustness (Jagannathan and Ma, 1653).
ef2 = EfficientFrontier(mu, S, weight_bounds=(0, 0.20))
ef2.max_sharpe()
weights_step2 = ef2.clean_weights()
weights_step2
OrderedDict([('AAPL', 0.19021),
('JPM', 0.2),
('LLY', 0.2),
('NVDA', 0.2),
('REGN', 0.0),
('TSLA', 0.00979),
('XOM', 0.2)])
Step 3: Optimization allowing short selling¶
Here we used the following restriction:
$$-1 \le w_i \le 1 \quad \forall i$$
where a negative weight ($w_i < 0$) means short selling, with the proceeds used to finance long positions in other assets. Note that the weights still sum to 1 ($\sum w_i = 1$) (Jacobs and Levy, 53).
To assess goodness of fit we used a detrended Q-Q plot, known as a worm plot. First we standardized the empirical returns using the formula:
$$Z_{empirical} = \frac{R - \mu}{\sigma}$$
followed by calculating the theoretical quantiles of the standard normal distribution ($Z_{theoretical}$) and the deviations:
$$\text{Deviation} = Z_{empirical} - Z_{theoretical}$$
The worm plot removes the linear trend present in a Q-Q plot, making deviations from the normal distribution easier to detect (van Buuren and Fredriks, 1261).
ef3 = EfficientFrontier(mu, S, weight_bounds=(-1, 1))
ef3.max_sharpe()
weights_step3 = ef3.clean_weights()
weights_df = pd.DataFrame({
    'Step 1: Long Only (%)': pd.Series(weights_step1) * 100,
    'Step 2: Diversified Max 20% (%)': pd.Series(weights_step2) * 100,
    'Step 3: Aggressive w/ Shorting (%)': pd.Series(weights_step3) * 100
}).fillna(0)
display(weights_df.round(2))
weights_df.plot(kind='bar', figsize=(14, 7), colormap='viridis', width=0.8, edgecolor='black')
plt.title('Figure 2: Optimal portfolio allocations across methodologies', fontsize=16, pad=15)
plt.ylabel('Allocation Weight (%)', fontsize=12)
plt.xlabel('Assets', fontsize=12)
plt.axhline(0, color='black', linewidth=1)
plt.axhline(20, color='red', linestyle='--', alpha=0.5, label='20% Constraint (Step 2)')
plt.legend(fontsize=10)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
test_asset = 'NVDA'
actual_returns = returns[test_asset]
mu_daily = actual_returns.mean()
sigma_daily = actual_returns.std()
z_empirical = np.sort((actual_returns - mu_daily) / sigma_daily)
n = len(z_empirical)
p_i = (np.arange(1, n + 1) - 0.5) / n
z_theoretical = stats.norm.ppf(p_i)
deviations = z_empirical - z_theoretical
se = (1 / stats.norm.pdf(z_theoretical)) * np.sqrt(p_i * (1 - p_i) / n)
ci_upper = 1.96 * se
ci_lower = -1.96 * se
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
sns.histplot(actual_returns, kde=True, stat="density", bins=50, ax=ax[0], color='skyblue', label='Empirical Returns')
x_axis = np.linspace(actual_returns.min(), actual_returns.max(), 100)
ax[0].plot(x_axis, norm.pdf(x_axis, mu_daily, sigma_daily), color='red', linestyle='dashed', linewidth=2, label='Normal Fit')
ax[0].set_title(f'Figure 3: {test_asset} return distribution vs. Normal fit', fontsize=14)
ax[0].set_xlabel('Daily Returns')
ax[0].set_ylabel('Density')
ax[0].legend()
ax[1].plot(z_theoretical, deviations, 'o', markerfacecolor='skyblue', markeredgecolor='black', alpha=0.7, label='Deviations (Worm)')
ax[1].plot(z_theoretical, np.zeros_like(z_theoretical), 'r--', linewidth=2, label='Baseline (Perfect Normal)')
ax[1].plot(z_theoretical, ci_upper, 'r:', linewidth=1.5, label='95% CI')
ax[1].plot(z_theoretical, ci_lower, 'r:', linewidth=1.5)
ax[1].set_title(f'Figure 4: Worm Plot (Detrended Q-Q) for {test_asset}', fontsize=14)
ax[1].set_xlabel('Theoretical Normal Quantiles')
ax[1].set_ylabel('Deviation (Empirical - Theoretical)')
ax[1].legend()
plt.tight_layout()
plt.show()
Table 2: Optimal portfolio weights across the three scenarios

| Ticker | Step 1: Long Only (%) | Step 2: Diversified Max 20% (%) | Step 3: Aggressive w/ Shorting (%) |
|---|---|---|---|
| AAPL | 0.00 | 19.02 | -26.61 |
| JPM | 34.88 | 20.00 | 73.08 |
| LLY | 14.03 | 20.00 | 34.32 |
| NVDA | 51.09 | 20.00 | 79.29 |
| REGN | 0.00 | 0.00 | -50.38 |
| TSLA | 0.00 | 0.98 | -9.88 |
| XOM | 0.00 | 20.00 | 0.19 |
Step 4: Results¶
The descriptive statistics (Table 1) show high dispersion in risk and return over the analyzed period. For example, NVDA and LLY posted strong returns accompanied by high volatility and excess kurtosis, which produces heavy tails (Figures 3 and 4) and a higher probability of extreme events. The correlation matrix (Figure 1) reflects these dynamics: assets such as XOM and JPM are weakly correlated with the rest, which is beneficial for portfolio diversification.
In the mean-variance optimization (Table 2), the unrestricted weights maximize the Sharpe ratio but have the disadvantage of producing concentrated portfolios, allocating capital to a small number of assets with high historical performance. To mitigate this fragility, the second scenario imposes an upper limit of 20% per asset, ensuring diversification and protecting the investor against idiosyncratic shocks, at the cost of lower absolute returns (Figure 2).
The third scenario, which allows short selling, achieved the highest risk-adjusted efficiency, but it exposes the investor's capital to unlimited losses and is highly sensitive to small errors in the estimation of the variance-covariance matrix.
Part 2¶
To test the hypothesis that the 1/N portfolio is superior, we used Monte Carlo simulation with out-of-sample validation.
First we selected 20 highly liquid assets and three years of data, segmented as follows:
- In-sample (2 years): used to estimate the statistical parameters, such as the expected returns and the variance-covariance matrix.
- Out-of-sample (1 year): used to test the realized performance of the constructed portfolios.

This strategy helps mitigate overfitting: the financial literature holds that the true test of an optimization is its out-of-sample performance (DeMiguel et al., 1916).
%pip install yfinance pandas numpy scipy matplotlib seaborn PyPortfolioOpt --quiet
import yfinance as yf
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
from pypfopt import EfficientFrontier, risk_models, expected_returns, objective_functions
import warnings
warnings.filterwarnings("ignore")
np.random.seed(42)
universe = [
    'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'BRK-B', 'JNJ', 'JPM', 'V',
    'PG', 'UNH', 'HD', 'MA', 'BAC', 'DIS', 'CVX', 'XOM', 'KO', 'PEP'
]
train_start = '2021-01-01'
train_end = '2022-12-31'
test_start = '2023-01-01'
test_end = '2023-12-31'
print(f"Fetching data for {len(universe)} assets...")
data = yf.download(universe, start=train_start, end=test_end)['Close']
train_data = data.loc[train_start:train_end]
test_data = data.loc[test_start:test_end]
oos_asset_returns = (test_data.iloc[-1] / test_data.iloc[0]) - 1
Mean-variance optimization¶
For each simulation, 5 assets were selected randomly without replacement. We used the in-sample data to estimate the expected returns ($\mu$) and the sample variance-covariance matrix ($\Sigma$), then solved for the weights that maximize the Sharpe ratio subject to no short selling. The quadratic optimization is formulated as follows:
$$\max_{w} \quad \frac{w^T \mu - R_f}{\sqrt{w^T \Sigma w}}$$
subject to:
$$\sum_{i=1}^{N} w_i = 1 \quad \text{and} \quad w_i \ge 0 \quad \forall i$$
where $w$ is the $N \times 1$ vector of asset weights; $\mu$ the $N \times 1$ vector of historical expected returns; $\Sigma$ the $N \times N$ variance-covariance matrix of asset returns; $R_f$ the risk-free rate; and $w_i \ge 0$ the long-only restriction.
This structure follows modern portfolio theory, which states that a rational investor seeks the tangency portfolio offering the highest expected return per unit of standard deviation (Markowitz, 79).
1/N portfolio¶
Using the same 5 assets, we simultaneously computed the 1/N portfolio, in which capital is allocated in equal proportions, ignoring historical data. The weights are:
$$w_i = \frac{1}{N} \quad \forall i=1, \dots, N$$
where $N = 5$, giving a constant weight of $0.20$ (20%) per asset. The fixed weights serve as a benchmark: because no historical data is used, the 1/N portfolio carries zero estimation risk, which makes it hard for mean-variance optimization to beat in scenarios where the variance-covariance matrix is unstable or noisy (DeMiguel et al., 1920).
We then ran 5,000 Monte Carlo iterations. In each iteration $k$ we selected 5 assets from the set of 20 and computed the weight vectors $w_{MVO, k}$ and $w_{1/N, k}$ from the in-sample data. We then multiplied each weight vector by the one-year cumulative out-of-sample asset returns ($R_{test, k}$) to obtain the final portfolio return:
$$R_{portfolio, k} = \sum_{i=1}^{N} w_{i, k} \cdot R_{i, test, k}$$
N_SIMULATIONS = 5000
N_ASSETS_PER_PORTFOLIO = 5
np.random.seed(12)
mvo_outcomes = []
ew_outcomes = []
for i in range(N_SIMULATIONS):
    selected_assets = np.random.choice(universe, N_ASSETS_PER_PORTFOLIO, replace=False)
    train_subset = train_data[selected_assets]
    mu = expected_returns.mean_historical_return(train_subset)
    S = risk_models.sample_cov(train_subset)
    try:
        ef = EfficientFrontier(mu, S, weight_bounds=(0, 1))
        ef.add_objective(objective_functions.L2_reg, gamma=0.1)
        ef.max_sharpe()
        w_mvo_dict = ef.clean_weights()
        w_mvo = np.array([w_mvo_dict[ticker] for ticker in selected_assets])
    except Exception:
        try:
            # Fall back to the minimum-volatility portfolio if max_sharpe fails
            ef = EfficientFrontier(mu, S, weight_bounds=(0, 1))
            ef.min_volatility()
            w_mvo_dict = ef.clean_weights()
            w_mvo = np.array([w_mvo_dict[ticker] for ticker in selected_assets])
        except Exception:
            # Last resort: equal weights
            w_mvo = np.repeat(1/N_ASSETS_PER_PORTFOLIO, N_ASSETS_PER_PORTFOLIO)
    w_ew = np.repeat(1/N_ASSETS_PER_PORTFOLIO, N_ASSETS_PER_PORTFOLIO)
    asset_test_returns = oos_asset_returns[selected_assets].values
    mvo_oos_ret = np.dot(w_mvo, asset_test_returns)
    ew_oos_ret = np.dot(w_ew, asset_test_returns)
    mvo_outcomes.append(mvo_oos_ret)
    ew_outcomes.append(ew_oos_ret)
mvo_outcomes = np.array(mvo_outcomes)
ew_outcomes = np.array(ew_outcomes)
results_df = pd.DataFrame({
    'Metric': [
        'Mean 1-Year OOS Return',
        'Median 1-Year OOS Return',
        'Standard Deviation (Volatility of Outcomes)',
        'Max Return Achieved',
        'Min Return Achieved',
        'Skewness',
        'Kurtosis'
    ],
    'MVO Portfolio': [
        np.mean(mvo_outcomes),
        np.median(mvo_outcomes),
        np.std(mvo_outcomes),
        np.max(mvo_outcomes),
        np.min(mvo_outcomes),
        stats.skew(mvo_outcomes),
        stats.kurtosis(mvo_outcomes)
    ],
    '1/N (Equal Weight) Portfolio': [
        np.mean(ew_outcomes),
        np.median(ew_outcomes),
        np.std(ew_outcomes),
        np.max(ew_outcomes),
        np.min(ew_outcomes),
        stats.skew(ew_outcomes),
        stats.kurtosis(ew_outcomes)
    ]
})
mvo_win_rate = np.mean(mvo_outcomes > ew_outcomes) * 100
print("\n Table 3: Statistical Comparison")
display(results_df.round(4))
print(f"\n Mena-variance optimization outperformance frequency (Win Rate): {mvo_win_rate:.2f}%")
Table 3: Statistical Comparison
| Metric | MVO Portfolio | 1/N (Equal Weight) Portfolio |
|---|---|---|
| Mean 1-Year OOS Return | 0.0577 | 0.3238 |
| Median 1-Year OOS Return | 0.0300 | 0.2998 |
| Standard Deviation (Volatility of Outcomes) | 0.0923 | 0.1943 |
| Max Return Achieved | 0.5834 | 1.0115 |
| Min Return Achieved | -0.0785 | -0.0466 |
| Skewness | 1.4643 | 0.4960 |
| Kurtosis | 2.3455 | -0.2717 |
Mean-variance optimization outperformance frequency (Win Rate): 2.18%
fig = plt.figure(figsize=(18, 16))
grid = plt.GridSpec(3, 2, figure=fig, hspace=0.4)
ax1 = fig.add_subplot(grid[0, 0])
sns.kdeplot(mvo_outcomes, fill=True, color='blue', label='MVO Portfolio', ax=ax1, alpha=0.4)
sns.kdeplot(ew_outcomes, fill=True, color='orange', label='1/N Portfolio', ax=ax1, alpha=0.4)
ax1.axvline(np.mean(mvo_outcomes), color='blue', linestyle='--', label='MVO Mean')
ax1.axvline(np.mean(ew_outcomes), color='orange', linestyle='--', label='1/N Mean')
ax1.set_title('Figure 4: 1-Year Out-of-Sample Return Distributions (5,000 Sims)', fontsize=14)
ax1.set_xlabel('Cumulative Return (1 Year)')
ax1.set_ylabel('Density')
ax1.legend()
ax2 = fig.add_subplot(grid[0, 1])
sns.violinplot(data=[mvo_outcomes, ew_outcomes], palette=['blue', 'orange'], ax=ax2)
ax2.set_xticks([0, 1])
ax2.set_xticklabels(['MVO Portfolio', '1/N Portfolio'])
ax2.set_title('Figure 5: Return Dispersion and Quartiles', fontsize=14)
ax2.set_ylabel('1-Year Return')
ax3 = fig.add_subplot(grid[1, 0])
sns.ecdfplot(mvo_outcomes, color='blue', label='MVO Portfolio', ax=ax3, linewidth=2)
sns.ecdfplot(ew_outcomes, color='orange', label='1/N Portfolio', ax=ax3, linewidth=2)
ax3.set_title('Figure 6: Empirical CDF (Stochastic Dominance Analysis)', fontsize=14)
ax3.set_xlabel('Cumulative Return (1 Year)')
ax3.set_ylabel('Cumulative Probability')
ax3.legend()
ax4 = fig.add_subplot(grid[1, 1])
stats.probplot(mvo_outcomes, dist="norm", plot=ax4)
ax4.get_lines()[0].set_markerfacecolor('blue')
ax4.get_lines()[0].set_markeredgecolor('black')
ax4.get_lines()[0].set_alpha(0.3)
ax4.get_lines()[1].set_color('red')
ax4.set_title('Figure 7: Goodness of Fit (Q-Q Plot): MVO Results', fontsize=14)
mvo_sorted = np.sort(mvo_outcomes)
mu_mvo = np.mean(mvo_sorted)
std_mvo = np.std(mvo_sorted)
z_empirical = (mvo_sorted - mu_mvo) / std_mvo
n = len(z_empirical)
p_i = (np.arange(1, n + 1) - 0.5) / n
z_theoretical = stats.norm.ppf(p_i)
deviations = z_empirical - z_theoretical
se = (1 / stats.norm.pdf(z_theoretical)) * np.sqrt(p_i * (1 - p_i) / n)
ci_upper = 1.96 * se
ci_lower = -1.96 * se
ax5 = fig.add_subplot(grid[2, :])
ax5.plot(z_theoretical, deviations, 'o', markerfacecolor='blue', markeredgecolor='black', alpha=0.3, label='MVO Deviations (Worm)')
ax5.plot(z_theoretical, np.zeros_like(z_theoretical), 'r--', linewidth=2, label='Baseline (Perfect Normality)')
ax5.plot(z_theoretical, ci_upper, 'r:', linewidth=1.5, label='95% Confidence Interval')
ax5.plot(z_theoretical, ci_lower, 'r:', linewidth=1.5)
ax5.set_title('Figure 8: Worm Plot (Detrended Q-Q): Simulated MVO Portfolio Returns', fontsize=14)
ax5.set_xlabel('Theoretical Normal Quantiles')
ax5.set_ylabel('Deviation (Empirical - Theoretical)')
ax5.set_ylim(min(deviations)*1.2, max(deviations)*1.2)
ax5.legend()
plt.show()
The results (Table 3) show that the 1/N portfolio, which benefits from the absence of parameter estimation error, outperformed the long-only mean-variance portfolio in the vast majority of simulations: the MVO win rate was only 2.18%.
The one-year cumulative return distributions (Figure 4) show that the 1/N portfolio centers its returns around the market mean, since it allocates 20% of capital to each asset regardless of how volatile any asset is relative to the others.
The mean-variance optimization (MVO) distribution, on the other hand, shows positive skewness: most simulations produce modest returns, with a small probability of much higher ones. This reflects its tendency to penalize assets with disproportionate variance and to concentrate weight in the assets that looked most efficient in-sample.
Regarding the goodness of fit of the simulated results, the worm plot and Q-Q plot (Figures 7 and 8) show heavy tails, meaning that market shocks occur more frequently than the normal distribution predicts. Ignoring historical data may lead to inaccurate return predictions; however, this does not imply that the 1/N portfolio is inferior. Rather, it highlights the common trade-off between robustness to estimation error and sensitivity to distributional assumptions.
Part 3 and 4¶
We continued with the same assets described above; the implied equilibrium returns were obtained using the following formula:
$$\Pi = \delta \Sigma w_{mkt}$$
where $\Pi$ is the $N \times 1$ vector of equilibrium returns; $\delta$ the market risk-aversion coefficient; $\Sigma$ the $N \times N$ variance-covariance matrix of returns; and $w_{mkt}$ the vector of market-capitalization weights.
The posterior expected return ($E[R]$) was found using:
$$E[R] = [(\tau \Sigma)^{-1} + P^T \Omega^{-1} P]^{-1} [(\tau \Sigma)^{-1} \Pi + P^T \Omega^{-1} Q]$$
where $\tau$ is a scalar representing the uncertainty of the market prior distribution; $P$ the $K \times N$ matrix identifying the $K$ analyst views on the $N$ assets; $Q$ the $K \times 1$ column vector of expected returns associated with the views; and $\Omega$ the $K \times K$ diagonal matrix of view error variances (Black and Litterman, 30).
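To make the posterior formula concrete, the sketch below evaluates $E[R]$ directly in NumPy on a two-asset, one-view toy example (all numbers are hypothetical, chosen only for illustration). With a confident view, the posterior spread between the two assets moves from the prior value toward the view $Q$.

```python
import numpy as np

tau = 0.05
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])   # toy covariance matrix
Pi = np.array([0.06, 0.08])        # equilibrium (prior) returns
P = np.array([[1.0, -1.0]])        # one relative view: asset 1 minus asset 2
Q = np.array([0.02])               # view: asset 1 outperforms asset 2 by 2%
Omega = np.array([[0.001]])        # small variance = confident view

# E[R] = [(tau*Sigma)^-1 + P' Omega^-1 P]^-1 [(tau*Sigma)^-1 Pi + P' Omega^-1 Q]
tS_inv = np.linalg.inv(tau * Sigma)
Om_inv = np.linalg.inv(Omega)
A = tS_inv + P.T @ Om_inv @ P
b = tS_inv @ Pi + P.T @ Om_inv @ Q
posterior = np.linalg.solve(A, b)
print(posterior.round(4))
```

The prior spread $\Pi_1 - \Pi_2 = -2\%$ is pulled toward the $+2\%$ view without overshooting it: exactly the blending behavior the PyPortfolioOpt call produces for the seven-asset case.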
%pip install yfinance pandas numpy scipy matplotlib seaborn PyPortfolioOpt --quiet
import yfinance as yf
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from pypfopt import black_litterman, risk_models, EfficientFrontier
from pypfopt.black_litterman import BlackLittermanModel
assets = ['AAPL', 'NVDA', 'TSLA', 'XOM', 'REGN', 'LLY', 'JPM']
start_date = '2023-01-01'
end_date = '2025-06-30'
data = yf.download(assets, start=start_date, end=end_date)['Close']
returns = data.pct_change().dropna()
mcaps = {'AAPL': 2800, 'NVDA': 2200, 'TSLA': 550, 'XOM': 450, 'REGN': 100, 'LLY': 700, 'JPM': 500}
S = risk_models.sample_cov(data)
delta = 2.5
market_prior = black_litterman.market_implied_prior_returns(mcaps, delta, S)
asset_cols = list(data.columns)
P = np.zeros((4, len(assets)))
Q = np.array([0.25, 0.05, 0.02, 0.04])
nvda_idx = asset_cols.index('NVDA')
lly_idx = asset_cols.index('LLY')
regn_idx = asset_cols.index('REGN')
aapl_idx = asset_cols.index('AAPL')
jpm_idx = asset_cols.index('JPM')
xom_idx = asset_cols.index('XOM')
tsla_idx = asset_cols.index('TSLA')
P[0, nvda_idx] = 1.0
P[1, lly_idx] = 1.0; P[1, regn_idx] = -1.0
P[2, aapl_idx] = 1.0; P[2, jpm_idx] = -1.0
P[3, xom_idx] = 1.0; P[3, tsla_idx] = -1.0
bl = BlackLittermanModel(S, pi=market_prior, P=P, Q=Q, omega="idzorek", view_confidences=[0.8, 0.6, 0.4, 0.6])
posterior_rets = bl.bl_returns()
posterior_cov = bl.bl_cov()
ef_bl = EfficientFrontier(posterior_rets, posterior_cov, weight_bounds=(0, 1))
ef_bl.max_sharpe()
bl_weights = ef_bl.clean_weights()
total_mcap = sum(mcaps.values())
market_weights = {ticker: cap/total_mcap for ticker, cap in mcaps.items()}
comparison_df = pd.DataFrame({
'Market Cap Weights (Prior)': pd.Series(market_weights),
'Black-Litterman (Posterior)': pd.Series(bl_weights)
}).fillna(0) * 100
comparison_df.plot(kind='bar', figsize=(12, 6), color=['#A9A9A9', '#1F77B4'], edgecolor='black')
plt.title('Figure 9: Black-Litterman Portfolio Optimization: Prior vs. Posterior Allocations', fontsize=14)
plt.ylabel('Allocation (%)')
plt.axhline(0, color='black', linewidth=1)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
print("\n Table 4: Black-Litterman Optimal Weights")
display(comparison_df.round(2))
Table 4: Black-Litterman Optimal Weights
| Ticker | Market Cap Weights (Prior) | Black-Litterman (Posterior) |
|---|---|---|
| AAPL | 38.36 | 26.50 |
| JPM | 6.85 | 7.44 |
| LLY | 9.59 | 14.42 |
| NVDA | 30.14 | 29.35 |
| REGN | 1.37 | 0.00 |
| TSLA | 7.53 | 0.00 |
| XOM | 6.16 | 22.31 |
Assets on which positive views were formulated, such as LLY and XOM, were relatively underweighted in the market equilibrium portfolio; after the analyst views were incorporated, their weights increased accordingly. Conversely, assets on which negative views were formulated, such as TSLA and REGN, were overweighted relative to the expressed expectations and had their allocations reduced to zero to accommodate the views. This behavior shows that the final result recalibrates the market equilibrium, incorporating the information provided by specialized investors and generating a portfolio that reflects the analysts' views adjusted for the uncertainty of each prediction.
Step 4: Optimal growth and risk management using the multivariate Kelly criterion¶
Here we used fractional leverage, i.e., weight vectors scaled to optimize the long-run growth rate, using the formula:
$$f^* = \Sigma^{-1} (\mu - r_f \mathbf{1})$$
See Merton (399) and MacLean, Thorp, and Ziemba (685) for further explanation.
import scipy.stats as stats
r_f = 0.04
annualized_returns = returns.mean() * 252
annualized_cov = returns.cov() * 252
excess_returns = annualized_returns - r_f
inv_cov = np.linalg.inv(annualized_cov.values)
full_kelly_weights = inv_cov.dot(excess_returns.values)
half_kelly_weights = 0.5 * full_kelly_weights
double_kelly_weights = 2.0 * full_kelly_weights
def calculate_daily_kelly_path(weights, daily_returns, rf_annual=0.04):
    rf_daily = rf_annual / 252
    weight_sum = np.sum(weights)
    cash_weight = 1.0 - weight_sum   # residual position earns the risk-free rate
    port_returns = daily_returns.dot(weights) + (cash_weight * rf_daily)
    return port_returns
ret_full = calculate_daily_kelly_path(full_kelly_weights, returns)
ret_half = calculate_daily_kelly_path(half_kelly_weights, returns)
ret_double = calculate_daily_kelly_path(double_kelly_weights, returns)
wealth_full = (1 + ret_full).cumprod()
wealth_half = (1 + ret_half).cumprod()
wealth_double = (1 + ret_double).cumprod()
fig = plt.figure(figsize=(18, 12))
grid = plt.GridSpec(2, 2, figure=fig, hspace=0.3)
ax1 = fig.add_subplot(grid[0, 0])
ax1.plot(wealth_half.index, wealth_half, label='Half Kelly (0.5f*)', color='green', linewidth=2)
ax1.plot(wealth_full.index, wealth_full, label='Full Kelly (1.0f*)', color='blue', linewidth=2)
ax1.plot(wealth_double.index, wealth_double, label='Double Kelly (2.0f*)', color='red', linewidth=2)
ax1.set_yscale('log')
ax1.set_title('Figure 10: Kelly Strategies Cumulative Wealth (Log Scale)', fontsize=14)
ax1.set_ylabel('Portfolio Value (Log $)')
ax1.legend()
ax1.grid(True, which="both", ls="--", alpha=0.5)
def get_drawdown(wealth_path):
    peak = wealth_path.cummax()
    return (wealth_path - peak) / peak
ax2 = fig.add_subplot(grid[0, 1])
ax2.fill_between(wealth_half.index, get_drawdown(wealth_half), color='green', alpha=0.3, label='Half Kelly')
ax2.fill_between(wealth_full.index, get_drawdown(wealth_full), color='blue', alpha=0.3, label='Full Kelly')
ax2.fill_between(wealth_double.index, get_drawdown(wealth_double), color='red', alpha=0.3, label='Double Kelly')
ax2.set_title('Figure 11: Strategy Drawdowns', fontsize=14)
ax2.set_ylabel('Drawdown (%)')
ax2.legend()
mvo_sorted = np.sort(ret_full)
mu_mvo = np.mean(mvo_sorted)
std_mvo = np.std(mvo_sorted)
z_empirical = (mvo_sorted - mu_mvo) / std_mvo
n = len(z_empirical)
p_i = (np.arange(1, n + 1) - 0.5) / n
z_theoretical = stats.norm.ppf(p_i)
deviations = z_empirical - z_theoretical
se = (1 / stats.norm.pdf(z_theoretical)) * np.sqrt(p_i * (1 - p_i) / n)
ci_upper = 1.96 * se
ci_lower = -1.96 * se
ax3 = fig.add_subplot(grid[1, 0])
stats.probplot(ret_full, dist="norm", plot=ax3)
ax3.get_lines()[0].set_markerfacecolor('blue')
ax3.get_lines()[0].set_markeredgecolor('black')
ax3.get_lines()[0].set_alpha(0.5)
ax3.get_lines()[1].set_color('red')
ax3.set_title('Figure 12: Goodness of Fit (Q-Q): Full Kelly Returns', fontsize=14)
ax4 = fig.add_subplot(grid[1, 1])
ax4.plot(z_theoretical, deviations, 'o', markerfacecolor='blue', markeredgecolor='black', alpha=0.4, label='Deviations (Worm)')
ax4.plot(z_theoretical, np.zeros_like(z_theoretical), 'r--', linewidth=2, label='Perfect Normal Baseline')
ax4.plot(z_theoretical, ci_upper, 'r:', linewidth=1.5, label='95% Confidence Interval')
ax4.plot(z_theoretical, ci_lower, 'r:', linewidth=1.5)
ax4.set_title('Figure 13: Worm Plot (Detrended Q-Q): Diagnostic of Kelly Tail Risks', fontsize=14)
ax4.set_xlabel('Theoretical Normal Quantiles')
ax4.set_ylabel('Deviation (Empirical - Theoretical)')
ax4.legend()
plt.show()
print("\n Table 5: Kelly Leverage Weights (Fraction of Equity)")
kelly_df = pd.DataFrame({
    'Half Kelly': half_kelly_weights,
    'Full Kelly': full_kelly_weights,
    'Double Kelly': double_kelly_weights
}, index=returns.columns).round(3)
display(kelly_df)
Table 5: Kelly Leverage Weights (Fraction of Equity)
| Ticker | Half Kelly | Full Kelly | Double Kelly |
|---|---|---|---|
| AAPL | -0.503 | -1.005 | -2.010 |
| JPM | 2.235 | 4.471 | 8.941 |
| LLY | 1.323 | 2.647 | 5.294 |
| NVDA | 1.661 | 3.322 | 6.645 |
| REGN | -1.763 | -3.525 | -7.050 |
| TSLA | 0.061 | 0.122 | 0.244 |
| XOM | -0.157 | -0.314 | -0.629 |
Evaluating cumulative wealth over time on a logarithmic scale (Figure 10), we see that Double Kelly showed a strong initial rise and outperformed the other strategies in several high-volatility periods; however, this leverage led to a catastrophic collapse at the beginning of 2025, losing practically all capital.
Full Kelly, in turn, delivered the best cumulative returns for much of the period, up to the end of 2024, but with large fluctuations. The impact of these fluctuations is shown in Figure 11, where Double Kelly reaches a drawdown of -100% while Full Kelly suffers drawdowns of up to 80%.
The Half Kelly strategy proved the most resilient, even though it never reached the peaks achieved by Full Kelly. Figures 12 and 13 show that the Full Kelly returns exhibit heavy tails and a poor fit to the normal distribution.
Conclusion¶
This project illustrated the evolution and complexity of quantitative asset-allocation strategies, highlighting the trade-off between maximizing returns and managing risk. Traditional mean-variance optimization proved fragile when used to maximize the Sharpe ratio alone, making weight restrictions essential for a diversified portfolio.
The Monte Carlo simulations showed that, out of sample, the naive 1/N portfolio outperformed the restricted optimization in the vast majority of draws, underscoring the cost of estimation error. The Black-Litterman model proved effective for recalibrating the market-equilibrium weights with analyst views.
Finally, the Kelly analysis confirmed that the fractional (half) Kelly strategy offers the best balance between long-run growth and drawdown risk.
References¶
Jacobs, Bruce I., and Kenneth N. Levy. "Long/Short Equity Investing." The Journal of Portfolio Management, vol. 20, no. 1, 1993, pp. 52-63.
Jagannathan, Ravi, and Tongshu Ma. "Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps." The Journal of Finance, vol. 58, no. 4, 2003, pp. 1651-1683.
Markowitz, Harry. "Portfolio Selection." The Journal of Finance, vol. 7, no. 1, 1952, pp. 77-91.
van Buuren, Stef, and Miranda Fredriks. "Worm plot: a simple diagnostic device for modelling growth reference curves." Statistics in Medicine, vol. 20, no. 8, 2001, pp. 1259-1277.
Cont, Rama. "Empirical properties of asset returns: stylized facts and statistical issues." Quantitative Finance, vol. 1, no. 2, 2001, pp. 223-236.
DeMiguel, Victor, Lorenzo Garlappi, and Raman Uppal. "Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?." The Review of Financial Studies, vol. 22, no. 5, 2009, pp. 1915-1953.
Michaud, Richard O. "The Markowitz Optimization Enigma: Is 'Optimized' Optimal?" Financial Analysts Journal, vol. 45, no. 1, 1989, pp. 31-42.
Black, Fischer, and Robert Litterman. "Global Portfolio Optimization." Financial Analysts Journal, vol. 48, no. 5, 1992, pp. 28-43.
MacLean, Leonard C., Edward O. Thorp, and William T. Ziemba. "Long-term capital growth: the good and bad properties of the Kelly and fractional Kelly capital growth criteria." Quantitative Finance, vol. 4, no. 6, 2004, pp. 681-690.
Merton, Robert C. "Optimum consumption and portfolio rules in a continuous-time model." Journal of Economic Theory, vol. 3, no. 4, 1971, pp. 373-413.