Introduction¶
In the realm of financial data analysis, ensuring the quality and reliability of data is paramount. The significance of good quality data becomes evident when we examine its implications in financial modeling and decision-making processes. This assignment is structured to provide a comprehensive understanding of data quality, yield curve modeling, the exploitation of correlation, and empirical analysis of Exchange-Traded Funds (ETFs), all within the broader context of financial data analysis.
Firstly (Question 1), we will delve into the concept of data quality by providing examples of poor-quality structured and unstructured data, along with methods to identify their shortcomings. This foundational knowledge is crucial for recognizing the importance of maintaining high standards in financial data.
Next (Question 2), we will explore yield curve modeling by fitting Nelson-Siegel and Cubic-Spline models to government securities from a European Union country (Germany), with maturities ranging from short-term to long-term. Section 2 will present a comparative analysis of these models in terms of fit and interpretation, and address the ethical considerations of data smoothing.
Furthermore, in Question 3 we will investigate the role of correlation and principal components in financial data analysis by generating uncorrelated Gaussian random variables and running Principal Component Analysis (PCA). The analysis will be extended to real data, collecting daily closing yields for government securities and comparing the results through screeplots.
Lastly, in Question 4, the empirical analysis of ETFs will involve computing daily returns, covariance matrices, PCA, and Singular Value Decomposition (SVD) for a sector ETF, providing a thorough explanation of each transformation and its implications.
Installing Packages¶
! pip install kaggle --quiet
! pip install pandas --quiet
! pip install numpy --quiet
! pip install matplotlib --quiet
! pip install seaborn --quiet
! pip install pingouin --quiet
! pip install factor_analyzer --quiet
! pip install gdown --quiet
! pip install requests --quiet
! pip install yfinance --quiet
! pip install pandas-datareader --quiet
Importing packages¶
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pingouin as pg
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo
from numpy import linalg as LA
from scipy import stats
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize
from datetime import datetime, timedelta
import time
import math
import re
import gdown
import requests
import warnings
import yfinance as yfin
warnings.filterwarnings('ignore')
Question 1: Data Quality¶
Data quality is defined by its ability to meet the expectations of data consumers and accurately represent the objects, events, and concepts it is intended to depict (Sebastian xxx). Data can be classified into structured, unstructured, and semi-structured data (Praveen and Umesh 67). Beyond this classification, data possesses dimensions that contribute to its quality, including accuracy, objectivity, believability, reputation, accessibility, access security, relevancy, value-added, timeliness, completeness, amount of data, interpretability, ease of understanding, concise representation, and currency (Huh et al. 560; Strong et al. 104). When even one of these dimensions is absent, the data can justifiably be labeled poor quality. Possible causes of poor data quality include missing values, data errors, discrepancies, biases, inconsistencies, static header data, changes in historical data, lack of transparency, reporting time issues, and misuse of data (Liu 34). Other causes include multiple names for the same entity, incorrect values, and duplicate records for the same customer (Wang and Diane 7; Nadinić and Kalpić 328). Thus, to answer the questions of this exercise, we will base ourselves on these aspects.
a. Provide an example of poor quality structured data¶
Answer
file_id = "1Rp1Ubva5Q7ReW_zs7EowH3yge8rq-i5w"
url = f"https://drive.google.com/uc?id={file_id}"
output_filename = "financials.zip"
gdown.download(url, output_filename, quiet=False)
import os
if os.path.exists(output_filename):
print(f"File '{output_filename}' downloaded successfully!")
else:
print("File download failed.")
try:
import pandas as pd
datafin = pd.read_csv(output_filename)
# print(datafin.head())
except Exception as e:
print(f"Unable to read the file: {e}")
datafin.head()
Downloading... From: https://drive.google.com/uc?id=1Rp1Ubva5Q7ReW_zs7EowH3yge8rq-i5w To: /content/financials.zip 100%|██████████| 22.0k/22.0k [00:00<00:00, 33.8MB/s]
File 'financials.zip' downloaded successfully!
| | Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts | Sales | COGS | Profit | Date | Month Number | Month Name | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Government | Canada | Carretera | None | $1,618.50 | $3.00 | $20.00 | $32,370.00 | $- | $32,370.00 | $16,185.00 | $16,185.00 | 01/01/2014 | 1 | January | 2014 |
| 1 | Government | Germany | Carretera | None | $1,321.00 | $3.00 | $20.00 | $26,420.00 | $- | $26,420.00 | $13,210.00 | $13,210.00 | 01/01/2014 | 1 | January | 2014 |
| 2 | Midmarket | France | Carretera | None | $2,178.00 | $3.00 | $15.00 | $32,670.00 | $- | $32,670.00 | $21,780.00 | $10,890.00 | 01/06/2014 | 6 | June | 2014 |
| 3 | Midmarket | Germany | Carretera | None | $888.00 | $3.00 | $15.00 | $13,320.00 | $- | $13,320.00 | $8,880.00 | $4,440.00 | 01/06/2014 | 6 | June | 2014 |
| 4 | Midmarket | Mexico | Carretera | None | $2,470.00 | $3.00 | $15.00 | $37,050.00 | $- | $37,050.00 | $24,700.00 | $12,350.00 | 01/06/2014 | 6 | June | 2014 |
datafin.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 700 entries, 0 to 699
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Segment              700 non-null    object
 1   Country              700 non-null    object
 2   Product              700 non-null    object
 3   Discount Band        700 non-null    object
 4   Units Sold           700 non-null    object
 5   Manufacturing Price  700 non-null    object
 6   Sale Price           700 non-null    object
 7   Gross Sales          700 non-null    object
 8   Discounts            700 non-null    object
 9   Sales                700 non-null    object
 10  COGS                 700 non-null    object
 11  Profit               700 non-null    object
 12  Date                 700 non-null    object
 13  Month Number         700 non-null    int64
 14  Month Name           700 non-null    object
 15  Year                 700 non-null    int64
dtypes: int64(2), object(14)
memory usage: 87.6+ KB
Data description: This dataset, sourced from Atharva Arya, provides detailed sales and profit information across various market segments and countries, including key metrics like units sold, sale price, gross sales, discounts, and profit. It enables in-depth analysis of sales performance, profitability, and time-based trends to support business insights and decision-making.
# Checking if there are any NA values
datafin.isna().any().any()
False
# Checking if there are duplicated data
datafin.duplicated().sum()
0
datafin[' Discount Band '].unique()
array([' None ', ' Low ', ' Medium ', ' High '], dtype=object)
datafin.columns
Index(['Segment', 'Country', ' Product ', ' Discount Band ', ' Units Sold ',
' Manufacturing Price ', ' Sale Price ', ' Gross Sales ', ' Discounts ',
' Sales ', ' COGS ', ' Profit ', 'Date', 'Month Number',
' Month Name ', 'Year'],
dtype='object')
b. How would you recognize this poor quality?¶
Write 3 - 4 sentences that show how the data fails to include properties of good quality data.
Answer
Although the data has no missing or duplicated values, it embeds currency symbols ($) and thousands separators in numeric fields, uses the placeholder "$-" instead of a proper zero or null, and carries extra spaces before and after several column names as well as their observations. Additionally, the quantitative variables are stored as objects rather than as integers (int) or floats, which makes it difficult to perform numerical operations or analysis.
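A minimal cleaning sketch for these issues follows (assumptions: datafin is loaded as above, the monetary columns are those visible in the preview, and the day-first date format is inferred from rows where 01/06/2014 is labeled June):

datafin.columns = datafin.columns.str.strip()  # remove stray spaces around column names
money_cols = ['Units Sold', 'Manufacturing Price', 'Sale Price',
              'Gross Sales', 'Discounts', 'Sales', 'COGS', 'Profit']
for col in money_cols:
    cleaned = (datafin[col].astype(str).str.strip()
               .str.replace('$', '', regex=False)   # drop currency symbols
               .str.replace(',', '', regex=False)   # drop thousands separators
               .replace('-', '0'))                  # treat the "$-" placeholder as zero
    datafin[col] = pd.to_numeric(cleaned, errors='coerce')
datafin['Date'] = pd.to_datetime(datafin['Date'], format='%d/%m/%Y')
datafin.dtypes  # the monetary columns should now be numeric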
c. Provide an example of poor quality unstructured data
file_id = "1NIakNIwqoDgw9CvFWrU-s-uuJHdu6KQu"
url = f"https://drive.google.com/uc?id={file_id}"
output_filename = "Airplane_Crashes_and_Fatalities_Since_1908.csv.zip"
gdown.download(url, output_filename, quiet=False)
import os
if os.path.exists(output_filename):
print(f"File '{output_filename}' downloaded successfully!")
else:
print("File download failed.")
try:
import pandas as pd
data = pd.read_csv(output_filename)
# print(data.head())
except Exception as e:
print(f"Unable to read the file: {e}")
data.head()
Downloading... From: https://drive.google.com/uc?id=1NIakNIwqoDgw9CvFWrU-s-uuJHdu6KQu To: /content/Airplane_Crashes_and_Fatalities_Since_1908.csv.zip 100%|██████████| 596k/596k [00:00<00:00, 103MB/s]
File 'Airplane_Crashes_and_Fatalities_Since_1908.csv.zip' downloaded successfully!
| | index | Date | Time | Location | Operator | Flight # | Route | Type | Registration | cn/In | Aboard | Fatalities | Ground | Summary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 09/17/1908 | 17:18 | Fort Myer, Virginia | Military - U.S. Army | NaN | Demonstration | Wright Flyer III | NaN | 1 | 2.0 | 1.0 | 0.0 | During a demonstration flight, a U.S. Army fly... |
| 1 | 1 | 07/12/1912 | 06:30 | AtlantiCity, New Jersey | Military - U.S. Navy | NaN | Test flight | Dirigible | NaN | NaN | 5.0 | 5.0 | 0.0 | First U.S. dirigible Akron exploded just offsh... |
| 2 | 2 | 08/06/1913 | NaN | Victoria, British Columbia, Canada | Private | - | NaN | Curtiss seaplane | NaN | NaN | 1.0 | 1.0 | 0.0 | The first fatal airplane accident in Canada oc... |
| 3 | 3 | 09/09/1913 | 18:30 | Over the North Sea | Military - German Navy | NaN | NaN | Zeppelin L-1 (airship) | NaN | NaN | 20.0 | 14.0 | 0.0 | The airship flew into a thunderstorm and encou... |
| 4 | 4 | 10/17/1913 | 10:30 | Near Johannisthal, Germany | Military - German Navy | NaN | NaN | Zeppelin L-2 (airship) | NaN | NaN | 30.0 | 30.0 | 0.0 | Hydrogen gas which was being vented was sucked... |
Data description: This dataset, sourced from Data Society, compiles information on airplane crashes and fatalities dating back to 1908. It includes various details such as the incident's date and time, location, and the operator of the aircraft. Additional data includes the flight number, the route taken, and the aircraft's type and registration number. Also recorded are the construction (serial) number, the total number of people aboard, fatalities, fatalities on the ground, and a summary of each incident.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5268 entries, 0 to 5267
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   index         5268 non-null   int64
 1   Date          5268 non-null   object
 2   Time          3049 non-null   object
 3   Location      5248 non-null   object
 4   Operator      5250 non-null   object
 5   Flight #      1069 non-null   object
 6   Route         3561 non-null   object
 7   Type          5241 non-null   object
 8   Registration  4933 non-null   object
 9   cn/In         4040 non-null   object
 10  Aboard        5246 non-null   float64
 11  Fatalities    5256 non-null   float64
 12  Ground        5246 non-null   float64
 13  Summary       4878 non-null   object
dtypes: float64(3), int64(1), object(10)
memory usage: 576.3+ KB
data.columns
Index(['index', 'Date', 'Time', 'Location', 'Operator', 'Flight #', 'Route',
'Type', 'Registration', 'cn/In', 'Aboard', 'Fatalities', 'Ground',
'Summary'],
dtype='object')
data.isna().sum()
| | 0 |
|---|---|
| index | 0 |
| Date | 0 |
| Time | 2219 |
| Location | 20 |
| Operator | 18 |
| Flight # | 4199 |
| Route | 1707 |
| Type | 27 |
| Registration | 335 |
| cn/In | 1228 |
| Aboard | 22 |
| Fatalities | 12 |
| Ground | 22 |
| Summary | 390 |
data.duplicated().sum()
0
groupdata = data.groupby(['Date', 'Time', 'Location', 'Flight #'])['Fatalities'].sum()
groupdata
| Date | Time | Location | Flight # | Fatalities |
|---|---|---|---|---|
| 01/01/1976 | 05:30 | Near Al Qaysumah, Saudi Arabia | 438 | 82.0 |
| 01/01/1978 | 20:15 | Off Bandra, Maharashtra, India | 855 | 213.0 |
| 01/01/1985 | 20:30 | Near La Paz, Bolivia | 980 | 29.0 |
| 01/01/2002 | 18:02 | Hollywood, Florida | - | 5.0 |
| 01/01/2007 | 14:07 | Off Parepare, Indonesia | 574 | 102.0 |
| ... | ... | ... | ... | ... |
| 12/29/1991 | 15:05 | Wanli, Taiwan | 358 | 5.0 |
| 12/29/1994 | 15:30 | Near Van, Turkey | 278 | 57.0 |
| 12/30/1945 | 21:13 | New York City, New York | 14 | 1.0 |
| 12/30/1951 | 22:07 | Near Fairbanks, Alaska | 501 | 4.0 |
| 12/31/1968 | 08:30 | Port Hedland, WA, Australia | 1750 | 26.0 |
996 rows × 1 columns
filtered_data=data[(data['Time'].isna() & data['Flight #'].isna()) & (data['Fatalities'] >= 10)]
filtered_data.head()
| | index | Date | Time | Location | Operator | Flight # | Route | Type | Registration | cn/In | Aboard | Fatalities | Ground | Summary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 7 | 07/28/1916 | NaN | Near Jambol, Bulgeria | Military - German Army | NaN | NaN | Schutte-Lanz S-L-10 (airship) | NaN | NaN | 20.0 | 20.0 | 0.0 | Crashed near the Black Sea, cause unknown. |
| 10 | 10 | 11/21/1916 | NaN | Mainz, Germany | Military - German Army | NaN | NaN | Super Zeppelin (airship) | NaN | NaN | 28.0 | 27.0 | 0.0 | Crashed in a storm. |
| 12 | 12 | 03/04/1917 | NaN | Near Gent, Belgium | Military - German Army | NaN | NaN | Airship | NaN | NaN | 20.0 | 20.0 | 0.0 | Caught fire and crashed. |
| 13 | 13 | 03/30/1917 | NaN | Off Northern Germany | Military - German Navy | NaN | NaN | Schutte-Lanz S-L-9 (airship) | NaN | NaN | 23.0 | 23.0 | 0.0 | Struck by lightning and crashed into the Balti... |
| 19 | 19 | 05/10/1918 | NaN | Off Helgoland Island, Germany | Military - German Navy | NaN | NaN | Zeppelin L-70 (airship) | NaN | NaN | 22.0 | 22.0 | 0.0 | Shot down by British aircraft crashing from a ... |
filtered_data.shape
(986, 14)
d. Unstructured data can be more difficult to assess than structured data. Just as you did in part b, write 3 - 4 sentences that show how this unstructured data fails to check requirements of good quality data.
Answer
Although there are no duplicate records, there is an excess of missing values (NaN), even in situations where fatalities are recorded. In other words, there are missing values in columns like "Flight #" and "cn/In," which makes it difficult to fully assess incident details and compare across records. For instance, filtering for rows with missing values in the "Time" and "Flight #" variables but with fatalities above 10 returns 986 rows, which underscores the poor quality. Additionally, variables like "Date" are not properly typed as datetimes, and there is an "index" column that contributes nothing to the analysis, further indicating that the data is not structured properly (unstructured).
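A short repair sketch for the issues just described (assumptions: data is loaded as above, and the month-first date format is inferred from rows such as 09/17/1908):

data = data.drop(columns=['index'])  # redundant copy of the row index
data['Date'] = pd.to_datetime(data['Date'], format='%m/%d/%Y')  # give dates a proper dtype
# Re-create the earlier filter: fatal incidents missing both a time and a flight number
suspect = data[data['Time'].isna() & data['Flight #'].isna() & (data['Fatalities'] >= 10)]
print(f"{len(suspect)} records have 10+ fatalities but no recorded time or flight number")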
Question 2: Yield Curve Modeling¶
a. Pick government securities from a country. The country selected should be one of the countries from your group so that you can fit a Nelson-Siegel model.
Answer
Fitting a Nelson-Siegel model on EU government securities
base_url = 'https://sdw-wsrest.ecb.europa.eu/service/'
params = {
    'startPeriod': '2020-01-01',
    'endPeriod': '2024-12-31',
    'format': 'jsondata'
}
maturities = ['SR_3M', 'SR_1Y', 'SR_2Y', 'SR_5Y', 'SR_10Y', 'SR_30Y']
data_dict = {}

for maturity in maturities:
    url = f"{base_url}data/YC/B.U2.EUR.4F.G_N_A.SV_C_YM.{maturity}"
    response = requests.get(url, params=params)
    data = response.json()
    # Extract values (first element of each observation array)
    values = [obs[0] for obs in data['dataSets'][0]['series']['0:0:0:0:0:0:0']['observations'].values()]
    # Get dates from structure
    dates = [date['id'] for date in data['structure']['dimensions']['observation'][0]['values']]
    data_dict[maturity] = pd.Series(values, index=pd.to_datetime(dates))

df = pd.DataFrame(data_dict)
df.columns = ['3M', '1Y', '2Y', '5Y', '10Y', '30Y']
b. Be sure to pick maturities ranging from short-term to long-term (e.g. 6 month maturity to 20 or 30 year maturities).
Answer
df
| | 3M | 1Y | 2Y | 5Y | 10Y | 30Y |
|---|---|---|---|---|---|---|
| 2020-01-02 | -0.623324 | -0.634844 | -0.622180 | -0.478606 | -0.173995 | 0.358465 |
| 2020-01-03 | -0.595449 | -0.630324 | -0.638365 | -0.520394 | -0.222926 | 0.306064 |
| 2020-01-06 | -0.597296 | -0.637580 | -0.651086 | -0.541757 | -0.247426 | 0.283818 |
| 2020-01-07 | -0.605300 | -0.641443 | -0.650553 | -0.532864 | -0.232130 | 0.304965 |
| 2020-01-08 | -0.597340 | -0.633612 | -0.641188 | -0.515378 | -0.204580 | 0.341047 |
| ... | ... | ... | ... | ... | ... | ... |
| 2024-12-20 | 2.616587 | 2.172129 | 1.960729 | 2.049615 | 2.374684 | 2.445430 |
| 2024-12-23 | 2.632250 | 2.201549 | 1.997799 | 2.084822 | 2.402344 | 2.456214 |
| 2024-12-24 | 2.661780 | 2.211694 | 2.003148 | 2.090155 | 2.405459 | 2.455847 |
| 2024-12-27 | 2.653268 | 2.188540 | 2.007697 | 2.126310 | 2.431800 | 2.494656 |
| 2024-12-30 | 2.575177 | 2.178646 | 2.011151 | 2.130018 | 2.447304 | 2.513773 |
1278 rows × 6 columns
df.to_csv('yield_curves.csv')
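For reference, the functional form fitted below is the Nelson-Siegel curve, where $y(t)$ denotes the yield at maturity $t$ (in years):

$$y(t) = \beta_0 + \beta_1 \frac{1 - e^{-t/\tau}}{t/\tau} + \beta_2 \left( \frac{1 - e^{-t/\tau}}{t/\tau} - e^{-t/\tau} \right)$$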
def nelson_siegel(t, params):
    beta0, beta1, beta2, tau = params
    exp_term = np.exp(-t/tau)
    term = (1 - exp_term)/(t/tau)  # assumes t > 0; a zero maturity is never passed in
    return beta0 + beta1 * term + beta2 * (term - exp_term)

def objective(params, t, y):
    return np.sum((nelson_siegel(t, params) - y)**2)
# Convert maturities to years
maturities = np.array([0.25, 1, 2, 5, 10, 30])
# Fit for each date
results = []
for date in df.index:
    yields = df.loc[date].values
    # Initial parameters [β₀, β₁, β₂, τ]
    initial_guess = [yields[-1], yields[0]-yields[-1], 0, 2]
    # Bounds for parameters
    bounds = [(0, 15), (-15, 15), (-15, 15), (0.1, 10)]
    res = minimize(objective, initial_guess, args=(maturities, yields),
                   bounds=bounds, method='L-BFGS-B')
    # Store parameters and fitted yields
    params = res.x
    fitted_yields = nelson_siegel(maturities, params)
    rmse = np.sqrt(np.mean((yields - fitted_yields)**2))
    results.append({
        'date': date,
        'beta0': params[0],  # Level
        'beta1': params[1],  # Slope
        'beta2': params[2],  # Curvature
        'tau': params[3],    # Decay
        'rmse': rmse
    })
# Create DataFrame of results
results_df = pd.DataFrame(results).set_index('date')
results_df
| date | beta0 | beta1 | beta2 | tau | rmse |
|---|---|---|---|---|---|
| 2020-01-02 | 0.698999 | -1.310728 | -1.646603 | 3.455619 | 0.001258 |
| 2020-01-03 | 0.640179 | -1.215057 | -1.799441 | 3.325815 | 0.000900 |
| 2020-01-06 | 0.620464 | -1.195045 | -1.839001 | 3.329411 | 0.000919 |
| 2020-01-07 | 0.644782 | -1.228894 | -1.830838 | 3.332548 | 0.000943 |
| 2020-01-08 | 0.683263 | -1.259286 | -1.862525 | 3.289445 | 0.000826 |
| ... | ... | ... | ... | ... | ... |
| 2024-12-20 | 2.569276 | 0.315362 | -2.540065 | 1.188862 | 0.035549 |
| 2024-12-23 | 2.576592 | 0.318141 | -2.435052 | 1.169326 | 0.037266 |
| 2024-12-24 | 2.575070 | 0.360573 | -2.474106 | 1.149855 | 0.037276 |
| 2024-12-27 | 2.607960 | 0.329663 | -2.519298 | 1.095611 | 0.032313 |
| 2024-12-30 | 2.632903 | 0.187122 | -2.373655 | 1.157329 | 0.034248 |
1278 rows × 5 columns
c. Fit a Nelson-Siegel model.
Answer
sns.set()
# Read data
df = pd.read_csv('yield_curves.csv', index_col=0)
df.index = pd.to_datetime(df.index)
# Nelson-Siegel function
def nelson_siegel(t, params):
    beta0, beta1, beta2, tau = params
    exp_term = np.exp(-t/tau)
    return beta0 + beta1 * ((1 - exp_term)/(t/tau)) + beta2 * ((1 - exp_term)/(t/tau) - exp_term)

# Objective function
def objective(params, maturities, rates):
    return np.sum((nelson_siegel(maturities, params) - rates)**2)
# Convert maturities to years
maturities = np.array([0.25, 1, 2, 5, 10, 30])
latest_rates = df.iloc[-1].values
# Fit Nelson-Siegel model
initial_guess = [3, -2, 2, 1] # [beta0, beta1, beta2, tau]
result = minimize(objective, initial_guess, args=(maturities, latest_rates))
fitted_params = result.x
# Generate smooth curve
t_points = np.linspace(0.1, 30, 100)
fitted_curve = nelson_siegel(t_points, fitted_params)
# Plot
plt.figure(figsize=(10, 6))
sns.set_style('whitegrid')
sns.scatterplot(x=maturities, y=latest_rates, color='blue', s=100, label='Actual Yields')
plt.plot(t_points, fitted_curve, 'r-', label='Nelson-Siegel Fit')
plt.xlabel('Maturity (years)')
plt.ylabel('Yield (%)')
plt.title(f'Figure 2.1: Nelson-Siegel Yield Curve Fit ({df.index[-1].strftime("%Y-%m-%d")})')
plt.legend()
plt.grid(True)
print("Fitted Parameters:")
print(f"β0 (level): {fitted_params[0]:.4f}")
print(f"β1 (slope): {fitted_params[1]:.4f}")
print(f"β2 (curvature): {fitted_params[2]:.4f}")
print(f"τ (decay): {fitted_params[3]:.4f}")
plt.show()
Fitted Parameters:
β0 (level): 2.0494
β1 (slope): -0.3579
β2 (curvature): 1.3981
τ (decay): 0.2822
d. Fit a Cubic-Spline model.
Answer
cs = CubicSpline(maturities, latest_rates)
# Generate smooth curve
t_points = np.linspace(0.1, 30, 100)
fitted_curve = cs(t_points)
# Plot
plt.figure(figsize=(10, 6))
sns.set_style('whitegrid')
sns.scatterplot(x=maturities, y=latest_rates, color='blue', s=100, label='Actual Yields')
plt.plot(t_points, fitted_curve, 'r-', label='Cubic Spline Fit')
plt.xlabel('Maturity (years)')
plt.ylabel('Yield (%)')
plt.title(f'Figure 2.2: Cubic Spline Yield Curve Fit ({df.index[-1].strftime("%Y-%m-%d")})')
plt.legend()
plt.grid(True)
plt.show()
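For context, the spline fitted above is piecewise cubic: between consecutive knots (here, the observed maturities $t_i$) it takes the form

$$S_i(t) = a_i + b_i (t - t_i) + c_i (t - t_i)^2 + d_i (t - t_i)^3, \qquad t \in [t_i, t_{i+1}],$$

with the coefficients chosen so that $S$, $S'$, and $S''$ are continuous at every interior knot. This is why the fitted curve passes exactly through each observed yield.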
e. Compare the models in terms of 1) fit and 2) interpretation
Answer
In analyzing the Nelson-Siegel (Figure 2.1) and Cubic Spline (Figure 2.2) models for the interest rate term structure, both offer distinct advantages. The Nelson-Siegel model shapes the entire yield curve with just three coefficients plus a decay parameter, making it flexible and straightforward to interpret (Wahlstrøm et al. 971). This model excels in its simplicity and ability to clearly represent economic factors like long-term rates, curve steepness, and mid-maturity humps (Akinyemi et al. 14). The Nelson-Siegel model produces a relatively smooth curve that captures the general trend but misses some key data points. The fitted parameters ($\beta_{0}$=2.0494, $\beta_{1}$=-0.3579, $\beta_{2}$=1.3981, $\tau$=0.2822) generate a curve that notably underestimates yields at both very short and long maturities. The misfit is particularly visible at the 30-year point, where the actual yield is around 2.5% but the fitted curve predicts closer to 2.1%.
Conversely, the Cubic Spline model employs piecewise third-degree polynomials to capture complex, non-linear data trends. Although it provides a more precise fit for intricate patterns, interpreting its coefficients demands careful analysis. Overall, the Nelson-Siegel model is favored for its ease of use and clarity, while the Cubic Spline model is better suited for detailed data fitting but requires deeper examination of its terms (Nymand 43). The Cubic Spline model produces a more flexible fit that passes much closer to the observed data points. It captures both the short-term dynamics and the long-term behavior more accurately. The curve shows more local variation, particularly evident in how it models the "belly" of the yield curve between 5 and 15 years.
f. Be sure to specify at the levels of model parameters (ex. Alpha1).
Answer
The Nelson-Siegel model offers a clear economic interpretation of its parameters. $\beta_{0}$ (2.0494) represents the long-term interest rate level, $\beta_{1}$ (-0.3579) controls the slope of the yield curve, and $\beta_{2}$ (1.3981) determines the curvature. The decay parameter $\tau$ (0.2822) governs how quickly the short-term components decay. These parameters have direct economic meaning and can be used to understand market expectations about future interest rates. The Cubic Spline model, while providing a better fit, lacks such clear economic interpretation. Its parameters are purely mathematical, representing the coefficients of cubic polynomials between knot points. This makes it harder to extract meaningful economic insights about market expectations or risk premiums from the model parameters.
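Concretely, the level and slope interpretations follow from the limits of the Nelson-Siegel function:

$$\lim_{t \to \infty} y(t) = \beta_0, \qquad \lim_{t \to 0^{+}} y(t) = \beta_0 + \beta_1,$$

so $\beta_0$ is the long-run yield and $-\beta_1$ is the spread between the long and short ends of the curve.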
g. In Module 2 Lesson 4 (‘Smoothing Data’), we said smoothing data can be unethical. If Nelson-Siegel is smoothing the yield curve, is this considered unethical? Why or why not?
Answer
Using smoothing techniques like the Nelson-Siegel method to estimate yield curves is typically ethical when the goal is to improve the accuracy of models and better understand market trends. These methods help to remove noise and clarify long-term data patterns. However, it crosses into unethical territory if used to mislead stakeholders or distort data, such as by inflating asset values or hiding volatility. The key to ethical use lies in being transparent about the methodology and intentions. Problems arise when the smoothing process is used to misrepresent the true nature of the data (Copeland 103).
3. Exploiting Correlation¶
Financial data analysis is meant not only to process data but also to understand how meaningful factors can be used to summarize or represent the data. Let's understand the role that correlation and principal components play.
a) Generate 5 uncorrelated Gaussian random variables that simulate yield changes (they can be positive or negative with a mean close to 0 and a standard deviation that is small).
Answer¶
Table of first five values generated
np.random.seed(432)
num_samples = 100
num_variables = 5
mean=0
std_deviation=0.01
data=np.random.normal(loc=mean,scale=std_deviation, size=(num_samples,num_variables))
dataF=pd.DataFrame(data)
dataF.columns=["Variable 1","Variable 2","Variable 3","Variable 4","Variable 5"]
dataF.head()
| | Variable 1 | Variable 2 | Variable 3 | Variable 4 | Variable 5 |
|---|---|---|---|---|---|
| 0 | 0.001869 | 0.000345 | -0.003848 | 0.004646 | 0.006009 |
| 1 | -0.017359 | -0.003120 | 0.003078 | -0.002261 | -0.000731 |
| 2 | 0.000010 | 0.003327 | 0.019509 | -0.007947 | 0.006518 |
| 3 | -0.000422 | -0.003114 | 0.000570 | 0.003264 | -0.003010 |
| 4 | 0.002769 | 0.005642 | 0.005877 | 0.016587 | -0.014441 |
Correlation matrix
matrix_corr = dataF.corr()
print(matrix_corr)
            Variable 1  Variable 2  Variable 3  Variable 4  Variable 5
Variable 1    1.000000   -0.008919    0.039846    0.013316    0.202243
Variable 2   -0.008919    1.000000    0.121684    0.155779   -0.133540
Variable 3    0.039846    0.121684    1.000000   -0.130289   -0.111868
Variable 4    0.013316    0.155779   -0.130289    1.000000   -0.092115
Variable 5    0.202243   -0.133540   -0.111868   -0.092115    1.000000
Correloplot
corr = dataF.corr()
f, ax =plt.subplots(figsize=(9, 6))
mask=np.triu(np.ones_like(corr, dtype=bool))
cmap=sns.diverging_palette(230, 20, n=256, as_cmap=True)
sns.heatmap(dataF.corr(),
mask=mask,
cmap=cmap,
vmax=1,
vmin = -.25,
center=0,
square=True,
linewidths=.5,
annot = True,
fmt='.3f',
annot_kws={'size': 16},
cbar_kws={"shrink": .75})
plt.title('Figure 3.1: Correloplot')
plt.tight_layout()
ax.tick_params(axis = 'x', labelsize = 14)
ax.tick_params(axis = 'y', labelsize = 14)
ax.set_ylim(len(corr))
plt.show()
As we see in the correloplot (Figure 3.1), all correlation values are near zero, which aligns with what the exercise requires: 5 uncorrelated variables.
b) Run a Principal Components using EITHER the correlation OR covariance matrix.
Answer
Eigenvalues
# Calculating eigenvectors and eigenvalues of the covariance matrix of simulated dataset
dataF_cov = dataF.cov()
eigenvalues, eigenvectors = LA.eig(dataF_cov)
sorted_indices=np.argsort(eigenvalues)[::-1]
eigenvectors=eigenvectors[:,sorted_indices]
eigenvalues=eigenvalues[sorted_indices]
eigenvalues
array([1.27782260e-04, 1.14523588e-04, 9.47725977e-05, 7.27875167e-05,
6.21821048e-05])
Eigenvectors
eigenvectors
array([[ 0.1038149 , -0.21881596, -0.68515343, -0.26798617, -0.63252686],
[-0.49630891, 0.41260863, -0.40928584, 0.64272115, -0.053162 ],
[-0.74956086, -0.50816345, -0.1023671 , -0.29393446, 0.28818733],
[-0.01974409, 0.64348683, -0.31144041, -0.59654411, 0.3642456 ],
[ 0.42505132, -0.33101102, -0.50554497, 0.26987143, 0.61754091]])
Principal components
principal_components =dataF.dot(eigenvectors)
principal_components.columns = ["PC_1","PC_2","PC_3","PC_4","PC_5"]
principal_components.head()
| | PC_1 | PC_2 | PC_3 | PC_4 | PC_5 |
|---|---|---|---|---|---|
| 0 | 0.005369 | 0.002689 | -0.005512 | -0.000298 | 0.003094 |
| 1 | -0.002826 | -0.000266 | 0.013929 | 0.002893 | 0.010758 |
| 2 | -0.013346 | -0.015815 | -0.004186 | 0.002901 | 0.006569 |
| 3 | -0.000269 | 0.001614 | 0.002010 | -0.004816 | -0.000073 |
| 4 | -0.013384 | 0.014189 | -0.002674 | -0.012635 | -0.003234 |
c) Write a paragraph explaining how the variances of each component compare with each other. In this paragraph, you will address the following question: how much variance is explained by Component 1, Component 2, Component 3?
Answer
# Put data into a DataFrame
df_eigval = pd.DataFrame({"Eigenvalues":eigenvalues}, index=range(1,6))
# Work out explained proportion
df_eigval["Explained variance proportion"] = df_eigval["Eigenvalues"] / np.sum(df_eigval["Eigenvalues"])
df_eigval["Cumulative variance proportion"] = df_eigval["Explained variance proportion"].cumsum()
#Format as percentage
df_eigval.style.format({"Explained variance proportion": "{:.2%}",
"Cumulative variance proportion": "{:.2%}"})
| | Eigenvalues | Explained variance proportion | Cumulative variance proportion |
|---|---|---|---|
| 1 | 0.000128 | 27.07% | 27.07% |
| 2 | 0.000115 | 24.26% | 51.33% |
| 3 | 0.000095 | 20.08% | 71.41% |
| 4 | 0.000073 | 15.42% | 86.83% |
| 5 | 0.000062 | 13.17% | 100.00% |
Components 1, 2, and 3 explain 27.07%, 24.26%, and 20.08% of the total variability of our dataset, respectively. Cumulatively, the first three components explain 71.41% of the total data variability.
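As a quick cross-check of these proportions, scikit-learn's PCA should reproduce the same ratios (an assumption: scikit-learn is available in the environment, since it is not imported above):

from sklearn.decomposition import PCA
pca = PCA(n_components=5)
pca.fit(dataF)  # centers the data and decomposes its covariance internally
print(pca.explained_variance_ratio_)  # expected to match the proportions above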
d. Produce a screeplot (see https://en.wikipedia.org/wiki/Scree_plot) of the variance explained for each component.
Answer
plt.figure(figsize=(8, 5))
plt.plot(range(1,len(eigenvalues)+1), eigenvalues, marker="o",linestyle="--")
plt.title("Figure 3.2: Scree Plot")
plt.xlabel("Principal Components")
plt.ylabel("Eigenvalue")
plt.xticks(range(1,len(eigenvalues)+1))
plt.grid(True)
plt.show()
e. Collect the daily closing yields for 5 government securities, say over 6 months.
Answer
api_key = "ec67a401ae513935b544843d8c308536" # Alex API
series_ids = [
'DGS1',
'DGS2',
'DGS5',
'DGS10',
'DGS30'
]
base_url = "https://api.stlouisfed.org/fred/series/observations"
# Date range for the past 7 months
end_date = datetime.now().date()
start_date = end_date - timedelta(days=7*30) # 7 months ago
# Function to fetch yield data
def get_yield_data(series_id, start_date, end_date):
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": str(start_date),
        "observation_end": str(end_date),
    }
    response = requests.get(base_url, params=params)
    if response.status_code == 200:
        res_data = response.json()
        if 'observations' in res_data:
            obs_data = pd.DataFrame(res_data['observations'])
            obs_data['date'] = pd.to_datetime(obs_data['date'])
            obs_data.set_index('date', inplace=True)
            obs_data['value'] = pd.to_numeric(obs_data['value'], errors='coerce')
            return obs_data
    return None

# Collect yield data for each series
yields_dict = {}
for series_id in series_ids:
    yields_dict[series_id] = get_yield_data(series_id, start_date, end_date)
# Combine data into a single DataFrame
yields_df = pd.DataFrame({series_id: data['value'] for series_id, data in yields_dict.items() if data is not None})
yields_df.columns = ['1 Year', '2 Year', '5 Year', '10 Year', '30 Year']
# Compute daily yield changes using .diff()
daily_yield_changes = yields_df.diff()
daily_yield_changes.dropna(inplace=True)
print(daily_yield_changes)
            1 Year  2 Year  5 Year  10 Year  30 Year
date
2024-06-25    0.00   -0.06   -0.02    -0.02    -0.02
2024-06-26    0.03    0.06    0.07     0.09     0.09
2024-06-27   -0.03   -0.01   -0.03    -0.03    -0.02
2024-06-28   -0.01    0.01    0.04     0.07     0.08
2024-07-01    0.01    0.06    0.11     0.12     0.13
...            ...     ...     ...      ...      ...
2025-01-10    0.09    0.13    0.13     0.09     0.04
2025-01-13   -0.01    0.00    0.02     0.02     0.01
2025-01-14   -0.02   -0.03   -0.02    -0.01     0.01
2025-01-15   -0.03   -0.10   -0.14    -0.12    -0.10
2025-01-16   -0.01   -0.04   -0.06    -0.05    -0.04

[134 rows x 5 columns]
f. Be sure to compute the daily yield changes!
plt.figure(figsize=(12, 6))
for column in daily_yield_changes.columns:
    plt.plot(daily_yield_changes.index, daily_yield_changes[column], label=column)
plt.title('Figure 3.3: Daily Yield Changes for U.S. Treasury Securities')
plt.xlabel('Date')
plt.ylabel('Daily Yield Change (%)')
plt.legend(title="Maturities")
plt.grid(True)
plt.show()
g. Re-run the Principal Components using EITHER the correlation or covariance matrix.
Answer of question g
Eigenvalues
yields_df_cov = daily_yield_changes.cov()
eigenvalues, eigenvectors = LA.eig(yields_df_cov)
sorted_indices=np.argsort(eigenvalues)[::-1]
eigenvectors=eigenvectors[:,sorted_indices]
eigenvalues=eigenvalues[sorted_indices]
eigenvalues
array([1.27432534e-02, 1.86871188e-03, 2.76677987e-04, 1.32902058e-04,
3.25563762e-05])
Eigenvectors
eigenvectors
array([[-0.366009 , -0.535143 , -0.73188899, 0.20908883, -0.01672587],
[-0.49679164, -0.45651126, 0.40733949, -0.61471393, -0.03157313],
[-0.50391522, 0.02895065, 0.40052634, 0.62880218, 0.43522153],
[-0.45748966, 0.36706634, 0.02217696, 0.15266311, -0.79508973],
[-0.39497866, 0.60798323, -0.37080976, -0.39963788, 0.42087758]])
Principal components
# Project the daily yield changes (the data whose covariance was decomposed)
# onto the sorted eigenvectors to obtain the component scores
principal_components = daily_yield_changes.dot(eigenvectors)
principal_components.columns = ["PC_1","PC_2","PC_3","PC_4","PC_5"]
principal_components.head()
| date | PC_1 | PC_2 | PC_3 | PC_4 | PC_5 |
|---|---|---|---|---|---|
| 2024-06-24 | -10.032590 | -0.532780 | -1.633712 | -0.245560 | 0.088697 |
| 2024-06-25 | -9.975655 | -0.525469 | -1.659190 | -0.216314 | 0.089371 |
| 2024-06-26 | -10.128439 | -0.479133 | -1.660047 | -0.225135 | 0.083761 |
| 2024-06-27 | -10.075749 | -0.482554 | -1.647428 | -0.240712 | 0.086957 |
| 2024-06-28 | -10.160836 | -0.406276 | -1.648127 | -0.245083 | 0.082232 |
h. How do the variances of each component compare? In other words, how much variance is explained by Component 1, Component 2, Component 3, etc.?
# Put data into a DataFrame
df_eigval = pd.DataFrame({"Eigenvalues":eigenvalues}, index=range(1,6))
# Work out explained proportion
df_eigval["Explained variance proportion"] = df_eigval["Eigenvalues"] / np.sum(df_eigval["Eigenvalues"])
df_eigval["Cumulative variance proportion"] = df_eigval["Explained variance proportion"].cumsum()
#Format as percentage
df_eigval.style.format({"Explained variance proportion": "{:.2%}",
"Cumulative variance proportion": "{:.2%}"})
| | Eigenvalues | Explained variance proportion | Cumulative variance proportion |
|---|---|---|---|
| 1 | 0.012743 | 84.65% | 84.65% |
| 2 | 0.001869 | 12.41% | 97.06% |
| 3 | 0.000277 | 1.84% | 98.90% |
| 4 | 0.000133 | 0.88% | 99.78% |
| 5 | 0.000033 | 0.22% | 100.00% |
Components 1, 2, 3, 4, and 5 explain 84.65%, 12.41%, 1.84%, 0.88%, and 0.22%, respectively, of the total variability of our dataset. Cumulatively, the first two components explain 97.06% of the total data variability.
i. Produce a screeplot of the variance explained for each component.
plt.figure(figsize=(8,5))
plt.plot(range(1,len(eigenvalues)+1), eigenvalues,marker="o",linestyle="--")
plt.title("Figure 3.4: Scree Plot")
plt.xlabel("Principal Components")
plt.ylabel("Eigenvalue")
plt.xticks(range(1,len(eigenvalues)+1))
plt.grid(True)
plt.show()
j. How does the screeplot from the uncorrelated data compare with the screeplot from the government data?
Answer
In the screeplot of uncorrelated data (Figure 3.2), we observe a gradual decline in the dashed line, indicating the need for many principal components to achieve a high proportion of explained variance. Conversely, in the screeplot of government data (Figure 3.4), the dashed line shows a more pronounced decline. This suggests that in correlated data, the first few principal components explain a larger proportion of variability compared to the later components.
4. Empirical Analysis of ETFs¶
Pick a sector ETF (in the US, for example, XLRE).
a. Find the 30 largest holdings.
Answer
The 30 largest holdings of XLI (the Industrial Select Sector SPDR Fund) are:
keys = ['XLI']
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0"}

def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        for key in keys:
            r = req.get(url.format(key))
            # print(f"Extracting: {r.url}")
            goal = re.findall(r'etf\\\/(.*?)\\', r.text)
            print(goal[:30])

main("https://www.zacks.com/funds/etf/{}/holding")
['GE', 'CAT', 'RTX', 'HON', 'UNP', 'BA', 'ETN', 'UBER', 'ADP', 'DE', 'LMT', 'UPS', 'GEV', 'TT', 'PH', 'WM', 'TDG', 'EMR', 'MMM', 'ITW', 'GD', 'NOC', 'FDX', 'CTAS', 'CSX', 'CARR', 'PCAR', 'JCI', 'NSC', 'CPRT']
b. Get at least 6 months of data (~ 120 data points).
Answer
130 recent data points of XLI are
xlf = yfin.download(['GE', 'CAT', 'RTX', 'UBER', 'ETN', 'HON', 'UNP', 'ADP', 'DE', 'BA', 'LMT', 'GEV', 'UPS', 'TT', 'PH', 'WM', 'EMR', 'CTAS', 'ITW', 'TDG', 'MMM', 'GD', 'CSX', 'NOC', 'FDX', 'CARR', 'PCAR', 'NSC', 'URI', 'JCI'])['Close']
xlf.index = pd.to_datetime(xlf.index).strftime("%Y-%m-%d")
xlf=xlf.dropna().iloc[-130:]
xlf
[*********************100%***********************] 30 of 30 completed
| Ticker | ADP | BA | CARR | CAT | CSX | CTAS | DE | EMR | ETN | FDX | ... | PCAR | PH | RTX | TDG | TT | UBER | UNP | UPS | URI | WM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2024-07-15 | 240.175369 | 179.110001 | 66.957771 | 343.134705 | 34.364563 | 178.026718 | 368.950012 | 115.904816 | 325.018585 | 302.040466 | ... | 101.046936 | 541.951538 | 100.543625 | 1176.302490 | 340.164673 | 72.290001 | 233.394363 | 140.607178 | 704.445190 | 214.500183 |
| 2024-07-16 | 243.323074 | 186.050003 | 69.462715 | 357.831818 | 35.089600 | 180.666565 | 374.747284 | 117.787514 | 330.415192 | 310.457428 | ... | 105.241524 | 565.768982 | 102.581795 | 1212.824585 | 345.471649 | 74.300003 | 240.386597 | 143.709030 | 739.896057 | 219.356216 |
| 2024-07-17 | 246.718231 | 184.839996 | 67.176453 | 355.072998 | 35.208782 | 179.254501 | 382.311493 | 117.173164 | 309.584106 | 307.397614 | ... | 105.550797 | 555.083984 | 103.877922 | 1175.857300 | 334.618286 | 68.620003 | 241.207458 | 143.757812 | 720.557434 | 221.769363 |
| 2024-07-18 | 244.887009 | 180.229996 | 66.649628 | 353.316528 | 34.781704 | 189.014389 | 377.854370 | 115.478729 | 307.169037 | 303.624847 | ... | 106.294998 | 544.866577 | 102.740097 | 1180.072144 | 329.560699 | 66.260002 | 240.337143 | 141.202194 | 725.397095 | 221.550888 |
| 2024-07-19 | 244.461380 | 179.669998 | 65.844460 | 344.980530 | 34.453949 | 188.912292 | 375.293243 | 114.626556 | 309.971710 | 303.317871 | ... | 104.197701 | 541.862000 | 101.721008 | 1172.636963 | 329.331299 | 67.309998 | 239.654755 | 141.611862 | 715.628174 | 221.252960 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-01-13 | 290.200012 | 170.570007 | 67.110001 | 362.500000 | 31.889999 | 190.600006 | 429.910004 | 118.470001 | 340.880005 | 274.589996 | ... | 108.160004 | 637.989990 | 117.739998 | 1278.459961 | 376.980011 | 65.699997 | 226.759995 | 125.010002 | 689.150024 | 207.529999 |
| 2025-01-14 | 291.690002 | 167.020004 | 68.400002 | 371.570007 | 32.139999 | 192.279999 | 432.309998 | 119.790001 | 340.140015 | 277.619995 | ... | 110.199997 | 647.010010 | 119.470001 | 1300.729980 | 382.709991 | 64.839996 | 229.380005 | 125.839996 | 729.859985 | 207.169998 |
| 2025-01-15 | 293.369995 | 166.199997 | 68.800003 | 374.890015 | 32.459999 | 195.699997 | 428.880005 | 120.720001 | 342.579987 | 276.589996 | ... | 109.629997 | 656.320007 | 119.089996 | 1293.530029 | 385.549988 | 67.019997 | 230.860001 | 127.470001 | 741.489990 | 208.000000 |
| 2025-01-16 | 296.230011 | 168.929993 | 69.739998 | 380.549988 | 32.970001 | 198.050003 | 439.109985 | 123.290001 | 345.190002 | 277.369995 | ... | 109.480003 | 658.659973 | 120.459999 | 1322.020020 | 388.859985 | 68.580002 | 234.889999 | 129.940002 | 757.419983 | 211.009995 |
| 2025-01-17 | 296.179993 | 171.089996 | 69.660004 | 386.019989 | 32.730000 | 198.309998 | 455.440002 | 124.529999 | 346.279999 | 275.100006 | ... | 110.330002 | 669.460022 | 121.110001 | 1340.020020 | 389.190002 | 67.339996 | 234.259995 | 129.729996 | 771.000000 | 211.710007 |
130 rows × 30 columns
c. Compute the daily returns.
Answer
The daily returns are
xlf_returns=xlf.dropna().pct_change().dropna()
xlf_returns
| Ticker | ADP | BA | CARR | CAT | CSX | CTAS | DE | EMR | ETN | FDX | ... | PCAR | PH | RTX | TDG | TT | UBER | UNP | UPS | URI | WM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2024-07-16 | 0.013106 | 0.038747 | 0.037411 | 0.042832 | 0.021098 | 0.014828 | 0.015713 | 0.016243 | 0.016604 | 0.027867 | ... | 0.041511 | 0.043948 | 0.020271 | 0.031048 | 0.015601 | 0.027805 | 0.029959 | 0.022060 | 0.050325 | 0.022639 |
| 2024-07-17 | 0.013953 | -0.006504 | -0.032914 | -0.007710 | 0.003397 | -0.007816 | 0.020185 | -0.005216 | -0.063045 | -0.009856 | ... | 0.002939 | -0.018886 | 0.012635 | -0.030480 | -0.031416 | -0.076447 | 0.003415 | 0.000339 | -0.026137 | 0.011001 |
| 2024-07-18 | -0.007422 | -0.024940 | -0.007842 | -0.004947 | -0.012130 | 0.054447 | -0.011658 | -0.014461 | -0.007801 | -0.012273 | ... | 0.007051 | -0.018407 | -0.010953 | 0.003584 | -0.015114 | -0.034392 | -0.003608 | -0.017777 | 0.006717 | -0.000985 |
| 2024-07-19 | -0.001738 | -0.003107 | -0.012081 | -0.023594 | -0.009423 | -0.000540 | -0.006778 | -0.007379 | 0.009124 | -0.001011 | ... | -0.019731 | -0.005514 | -0.009919 | -0.006301 | -0.000696 | 0.015847 | -0.002839 | 0.002901 | -0.013467 | -0.001345 |
| 2024-07-22 | 0.004130 | -0.004286 | 0.026570 | 0.004766 | -0.005189 | 0.017784 | 0.003148 | 0.018586 | 0.027029 | 0.003656 | ... | 0.011595 | 0.013715 | 0.009338 | 0.014361 | 0.021082 | 0.006240 | 0.003920 | 0.000000 | 0.027427 | 0.002917 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-01-13 | 0.011679 | -0.008314 | 0.009021 | 0.032764 | 0.003461 | 0.006867 | 0.052231 | 0.004494 | -0.001669 | 0.014258 | ... | 0.014825 | 0.012618 | 0.022848 | 0.012954 | -0.000371 | -0.004093 | 0.006882 | 0.014033 | 0.027141 | 0.001351 |
| 2025-01-14 | 0.005134 | -0.020813 | 0.019222 | 0.025021 | 0.007839 | 0.008814 | 0.005583 | 0.011142 | -0.002171 | 0.011035 | ... | 0.018861 | 0.014138 | 0.014693 | 0.017419 | 0.015200 | -0.013090 | 0.011554 | 0.006639 | 0.059073 | -0.001735 |
| 2025-01-15 | 0.005760 | -0.004910 | 0.005848 | 0.008935 | 0.009956 | 0.017787 | -0.007934 | 0.007764 | 0.007173 | -0.003710 | ... | -0.005172 | 0.014389 | -0.003181 | -0.005535 | 0.007421 | 0.033621 | 0.006452 | 0.012953 | 0.015935 | 0.004006 |
| 2025-01-16 | 0.009749 | 0.016426 | 0.013663 | 0.015098 | 0.015712 | 0.012008 | 0.023853 | 0.021289 | 0.007619 | 0.002820 | ... | -0.001368 | 0.003565 | 0.011504 | 0.022025 | 0.008585 | 0.023277 | 0.017456 | 0.019377 | 0.021484 | 0.014471 |
| 2025-01-17 | -0.000169 | 0.012786 | -0.001147 | 0.014374 | -0.007279 | 0.001313 | 0.037189 | 0.010058 | 0.003158 | -0.008184 | ... | 0.007764 | 0.016397 | 0.005396 | 0.013616 | 0.000849 | -0.018081 | -0.002682 | -0.001616 | 0.017929 | 0.003317 |
129 rows × 30 columns
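For reference, pct_change() implements the simple (arithmetic) daily return from consecutive closing prices $P_t$:

$$r_t = \frac{P_t - P_{t-1}}{P_{t-1}}$$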
d. Compute the covariance matrix.
Answer
The covariance matrix is
xlf_cov_matrix=xlf_returns.cov()
xlf_cov_matrix
| Ticker | ADP | BA | CARR | CAT | CSX | CTAS | DE | EMR | ETN | FDX | ... | PCAR | PH | RTX | TDG | TT | UBER | UNP | UPS | URI | WM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ticker | |||||||||||||||||||||
| ADP | 0.000095 | 0.000023 | 0.000061 | 0.000074 | 0.000085 | 0.000066 | 0.000048 | 0.000079 | 0.000073 | 0.000050 | ... | 0.000078 | 0.000089 | 0.000061 | 0.000069 | 0.000040 | 0.000019 | 0.000070 | 0.000030 | 0.000098 | 0.000042 |
| BA | 0.000023 | 0.000459 | 0.000100 | 0.000094 | 0.000006 | -0.000025 | 0.000086 | 0.000104 | 0.000119 | 0.000056 | ... | -0.000009 | 0.000104 | 0.000065 | 0.000122 | 0.000053 | 0.000130 | 0.000028 | 0.000026 | 0.000106 | -0.000019 |
| CARR | 0.000061 | 0.000100 | 0.000312 | 0.000167 | 0.000109 | 0.000096 | 0.000098 | 0.000176 | 0.000216 | 0.000095 | ... | 0.000132 | 0.000184 | 0.000062 | 0.000154 | 0.000172 | 0.000160 | 0.000130 | 0.000046 | 0.000231 | 0.000062 |
| CAT | 0.000074 | 0.000094 | 0.000167 | 0.000325 | 0.000150 | 0.000077 | 0.000179 | 0.000226 | 0.000211 | 0.000124 | ... | 0.000185 | 0.000245 | 0.000078 | 0.000126 | 0.000098 | 0.000172 | 0.000125 | 0.000096 | 0.000303 | 0.000014 |
| CSX | 0.000085 | 0.000006 | 0.000109 | 0.000150 | 0.000253 | 0.000088 | 0.000093 | 0.000145 | 0.000103 | 0.000064 | ... | 0.000163 | 0.000167 | 0.000068 | 0.000092 | 0.000047 | 0.000112 | 0.000158 | 0.000081 | 0.000186 | 0.000045 |
| CTAS | 0.000066 | -0.000025 | 0.000096 | 0.000077 | 0.000088 | 0.000269 | 0.000020 | 0.000083 | 0.000109 | 0.000035 | ... | 0.000106 | 0.000100 | 0.000048 | 0.000082 | 0.000087 | 0.000050 | 0.000069 | 0.000049 | 0.000142 | 0.000070 |
| DE | 0.000048 | 0.000086 | 0.000098 | 0.000179 | 0.000093 | 0.000020 | 0.000287 | 0.000106 | 0.000079 | 0.000100 | ... | 0.000123 | 0.000113 | 0.000059 | 0.000020 | 0.000016 | 0.000072 | 0.000077 | 0.000102 | 0.000169 | -0.000004 |
| EMR | 0.000079 | 0.000104 | 0.000176 | 0.000226 | 0.000145 | 0.000083 | 0.000106 | 0.000327 | 0.000215 | 0.000121 | ... | 0.000162 | 0.000239 | 0.000062 | 0.000152 | 0.000126 | 0.000108 | 0.000112 | 0.000087 | 0.000259 | 0.000032 |
| ETN | 0.000073 | 0.000119 | 0.000216 | 0.000211 | 0.000103 | 0.000109 | 0.000079 | 0.000215 | 0.000381 | 0.000089 | ... | 0.000154 | 0.000243 | 0.000044 | 0.000186 | 0.000226 | 0.000220 | 0.000082 | 0.000052 | 0.000269 | 0.000066 |
| FDX | 0.000050 | 0.000056 | 0.000095 | 0.000124 | 0.000064 | 0.000035 | 0.000100 | 0.000121 | 0.000089 | 0.000344 | ... | 0.000104 | 0.000097 | 0.000044 | 0.000035 | 0.000053 | 0.000077 | 0.000093 | 0.000143 | 0.000135 | 0.000038 |
| GD | 0.000071 | 0.000066 | 0.000091 | 0.000099 | 0.000088 | 0.000083 | 0.000047 | 0.000091 | 0.000109 | 0.000030 | ... | 0.000076 | 0.000114 | 0.000104 | 0.000134 | 0.000062 | 0.000066 | 0.000076 | 0.000019 | 0.000138 | 0.000035 |
| GE | 0.000068 | 0.000106 | 0.000183 | 0.000149 | 0.000088 | 0.000078 | 0.000053 | 0.000149 | 0.000246 | 0.000051 | ... | 0.000084 | 0.000186 | 0.000098 | 0.000225 | 0.000193 | 0.000193 | 0.000075 | -0.000014 | 0.000228 | 0.000036 |
| GEV | 0.000076 | 0.000127 | 0.000183 | 0.000227 | 0.000103 | 0.000065 | 0.000068 | 0.000247 | 0.000412 | 0.000099 | ... | 0.000167 | 0.000299 | 0.000064 | 0.000241 | 0.000271 | 0.000216 | 0.000079 | 0.000073 | 0.000285 | 0.000063 |
| HON | 0.000053 | 0.000051 | 0.000104 | 0.000068 | 0.000074 | 0.000056 | 0.000025 | 0.000084 | 0.000095 | 0.000052 | ... | 0.000067 | 0.000075 | 0.000020 | 0.000090 | 0.000072 | 0.000055 | 0.000078 | 0.000026 | 0.000077 | 0.000081 |
| ITW | 0.000059 | 0.000044 | 0.000095 | 0.000121 | 0.000089 | 0.000046 | 0.000088 | 0.000107 | 0.000076 | 0.000081 | ... | 0.000090 | 0.000113 | 0.000054 | 0.000060 | 0.000037 | 0.000061 | 0.000075 | 0.000062 | 0.000135 | 0.000015 |
| JCI | 0.000086 | 0.000056 | 0.000179 | 0.000190 | 0.000125 | 0.000110 | 0.000083 | 0.000191 | 0.000237 | 0.000111 | ... | 0.000163 | 0.000211 | 0.000063 | 0.000158 | 0.000151 | 0.000127 | 0.000110 | 0.000085 | 0.000248 | 0.000063 |
| LMT | 0.000045 | 0.000023 | 0.000026 | 0.000010 | 0.000025 | 0.000030 | 0.000017 | 0.000008 | 0.000003 | -0.000017 | ... | 0.000007 | 0.000031 | 0.000090 | 0.000066 | 0.000035 | -0.000034 | 0.000024 | -0.000047 | 0.000028 | 0.000013 |
| MMM | 0.000070 | 0.000057 | 0.000151 | 0.000161 | 0.000125 | 0.000073 | 0.000082 | 0.000153 | 0.000119 | 0.000067 | ... | 0.000120 | 0.000155 | 0.000059 | 0.000086 | 0.000101 | -0.000022 | 0.000120 | 0.000067 | 0.000155 | 0.000010 |
| NOC | 0.000048 | 0.000017 | 0.000009 | 0.000042 | 0.000054 | 0.000031 | 0.000046 | 0.000034 | 0.000007 | 0.000012 | ... | 0.000028 | 0.000051 | 0.000112 | 0.000057 | 0.000013 | -0.000043 | 0.000031 | 0.000009 | 0.000068 | -0.000019 |
| NSC | 0.000096 | 0.000042 | 0.000130 | 0.000174 | 0.000223 | 0.000112 | 0.000094 | 0.000159 | 0.000124 | 0.000100 | ... | 0.000146 | 0.000161 | 0.000076 | 0.000086 | 0.000061 | 0.000089 | 0.000176 | 0.000099 | 0.000201 | 0.000038 |
| PCAR | 0.000078 | -0.000009 | 0.000132 | 0.000185 | 0.000163 | 0.000106 | 0.000123 | 0.000162 | 0.000154 | 0.000104 | ... | 0.000350 | 0.000170 | 0.000052 | 0.000090 | 0.000082 | 0.000086 | 0.000121 | 0.000168 | 0.000191 | 0.000074 |
| PH | 0.000089 | 0.000104 | 0.000184 | 0.000245 | 0.000167 | 0.000100 | 0.000113 | 0.000239 | 0.000243 | 0.000097 | ... | 0.000170 | 0.000334 | 0.000093 | 0.000159 | 0.000133 | 0.000162 | 0.000126 | 0.000070 | 0.000305 | 0.000029 |
| RTX | 0.000061 | 0.000065 | 0.000062 | 0.000078 | 0.000068 | 0.000048 | 0.000059 | 0.000062 | 0.000044 | 0.000044 | ... | 0.000052 | 0.000093 | 0.000168 | 0.000088 | 0.000038 | -0.000014 | 0.000056 | 0.000010 | 0.000131 | -0.000015 |
| TDG | 0.000069 | 0.000122 | 0.000154 | 0.000126 | 0.000092 | 0.000082 | 0.000020 | 0.000152 | 0.000186 | 0.000035 | ... | 0.000090 | 0.000159 | 0.000088 | 0.000299 | 0.000151 | 0.000160 | 0.000089 | 0.000029 | 0.000186 | 0.000050 |
| TT | 0.000040 | 0.000053 | 0.000172 | 0.000098 | 0.000047 | 0.000087 | 0.000016 | 0.000126 | 0.000226 | 0.000053 | ... | 0.000082 | 0.000133 | 0.000038 | 0.000151 | 0.000221 | 0.000132 | 0.000050 | 0.000018 | 0.000142 | 0.000069 |
| UBER | 0.000019 | 0.000130 | 0.000160 | 0.000172 | 0.000112 | 0.000050 | 0.000072 | 0.000108 | 0.000220 | 0.000077 | ... | 0.000086 | 0.000162 | -0.000014 | 0.000160 | 0.000132 | 0.000739 | 0.000087 | 0.000038 | 0.000187 | 0.000033 |
| UNP | 0.000070 | 0.000028 | 0.000130 | 0.000125 | 0.000158 | 0.000069 | 0.000077 | 0.000112 | 0.000082 | 0.000093 | ... | 0.000121 | 0.000126 | 0.000056 | 0.000089 | 0.000050 | 0.000087 | 0.000172 | 0.000078 | 0.000142 | 0.000051 |
| UPS | 0.000030 | 0.000026 | 0.000046 | 0.000096 | 0.000081 | 0.000049 | 0.000102 | 0.000087 | 0.000052 | 0.000143 | ... | 0.000168 | 0.000070 | 0.000010 | 0.000029 | 0.000018 | 0.000038 | 0.000078 | 0.000286 | 0.000062 | 0.000038 |
| URI | 0.000098 | 0.000106 | 0.000231 | 0.000303 | 0.000186 | 0.000142 | 0.000169 | 0.000259 | 0.000269 | 0.000135 | ... | 0.000191 | 0.000305 | 0.000131 | 0.000186 | 0.000142 | 0.000187 | 0.000142 | 0.000062 | 0.000494 | 0.000018 |
| WM | 0.000042 | -0.000019 | 0.000062 | 0.000014 | 0.000045 | 0.000070 | -0.000004 | 0.000032 | 0.000066 | 0.000038 | ... | 0.000074 | 0.000029 | -0.000015 | 0.000050 | 0.000069 | 0.000033 | 0.000051 | 0.000038 | 0.000018 | 0.000162 |
30 rows × 30 columns
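Each entry of this matrix is the sample covariance between the daily returns of two holdings, which pandas computes with the unbiased $1/(n-1)$ normalization:

$$\widehat{\Sigma}_{ij} = \frac{1}{n-1} \sum_{t=1}^{n} \left( r_{t,i} - \bar{r}_i \right) \left( r_{t,j} - \bar{r}_j \right)$$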
e. Compute the PCA.
Answer
The principal component analysis(PCA) is
xlf_returns_mean = xlf_returns.mean()
xlf_returns_std = xlf_returns.std()
xlf_standardized_data = (xlf_returns - xlf_returns_mean) / xlf_returns_std
xlf_standardized_cov = xlf_standardized_data.cov()

eigenvalues, eigenvectors = LA.eig(xlf_standardized_cov)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

xlf_principal_components = xlf_standardized_data.dot(eigenvectors)
xlf_principal_components.columns = [f"PC_{i}" for i in range(1, 31)]
xlf_principal_components
| | PC_1 | PC_2 | PC_3 | PC_4 | PC_5 | PC_6 | PC_7 | PC_8 | PC_9 | PC_10 | ... | PC_21 | PC_22 | PC_23 | PC_24 | PC_25 | PC_26 | PC_27 | PC_28 | PC_29 | PC_30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | |||||||||||||||||||||
| 2024-07-16 | 7.924030 | 0.137422 | 0.886763 | -0.059148 | 1.805747 | 0.023687 | 0.014664 | 0.273473 | -0.186904 | -0.061284 | ... | 0.500468 | -0.562224 | 0.032079 | -0.623973 | 0.031814 | -0.131576 | -0.058823 | -0.023292 | -0.245373 | -0.536470 |
| 2024-07-17 | -3.543482 | 4.703898 | 3.583108 | -1.650724 | 1.032094 | 0.083364 | 1.125461 | -1.607877 | 0.442119 | 1.205232 | ... | 0.174183 | 0.885378 | -0.356334 | -0.211199 | 0.133578 | -0.097412 | 0.437272 | -0.542359 | 0.474256 | -0.061649 |
| 2024-07-18 | -2.319138 | 0.407285 | -0.737715 | -1.692023 | -0.957578 | -0.698202 | -2.062343 | -0.898443 | -1.145080 | -1.306339 | ... | -0.475764 | -0.436102 | 0.476509 | 0.148484 | 0.000695 | -0.065869 | -0.973560 | 0.277110 | 0.003407 | 0.084452 |
| 2024-07-19 | -2.762167 | -1.247847 | -0.884262 | -0.052278 | -0.624147 | -0.217254 | 0.109672 | 1.256096 | 0.010304 | -0.435093 | ... | -0.365277 | 1.091306 | 0.653740 | 0.187338 | 0.090358 | 0.539398 | -0.171228 | -0.061050 | -0.093868 | 0.545500 |
| 2024-07-22 | 3.317883 | -0.784245 | -1.181744 | -0.277742 | 0.015225 | -0.492282 | -0.304679 | -0.970278 | -0.283205 | -0.771144 | ... | 0.144122 | -0.096397 | -0.115491 | 0.320641 | -0.279635 | 0.171495 | -0.064578 | -0.400981 | -0.228847 | 0.194881 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-01-13 | 3.924230 | 3.535143 | 1.238639 | 0.974713 | 0.713740 | -1.254833 | 0.087407 | -0.531806 | 1.612548 | -0.358643 | ... | -0.471550 | -0.189333 | -0.098235 | -0.333291 | 0.211624 | 0.395580 | 0.303054 | 0.054771 | -0.147809 | 0.153371 |
| 2025-01-14 | 3.769834 | 0.396798 | -0.001358 | 0.626505 | -1.095203 | -0.868700 | -0.728517 | -0.375547 | 0.356758 | -0.806851 | ... | 0.534097 | -0.694173 | -0.401121 | -0.351800 | -0.730760 | 0.445435 | 0.047243 | -0.352723 | 0.636313 | -0.337180 |
| 2025-01-15 | 1.327423 | -0.907914 | 0.023650 | -0.546966 | 0.050550 | 0.662722 | -0.957914 | 0.446743 | -0.229821 | -0.579190 | ... | 0.876482 | 0.589169 | -0.393185 | -0.431120 | -0.698458 | 0.675571 | 0.231880 | -0.102541 | 0.530894 | 0.146579 |
| 2025-01-16 | 3.884179 | 0.916024 | 0.753182 | -0.524593 | 1.023519 | 0.528138 | 0.448983 | 0.508906 | -0.124216 | 0.053304 | ... | -0.302446 | 0.109677 | 0.272910 | 0.005086 | -0.397692 | 0.076441 | -0.242917 | -0.133467 | -0.007336 | 0.064547 |
| 2025-01-17 | 1.386382 | 0.169151 | -0.446350 | 1.317249 | -0.021273 | -0.813384 | 0.248580 | -0.869154 | 0.785681 | 0.866992 | ... | 0.084466 | -0.051597 | -0.383539 | 0.269699 | 0.253224 | -0.045663 | 0.066251 | -0.244582 | -0.413982 | -0.034901 |
129 rows × 30 columns
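Note that because each return series was standardized (demeaned and divided by its standard deviation) before taking the covariance, this PCA is equivalent to a PCA of the correlation matrix of the raw returns, since for standardized variables $z_i$:

$$\operatorname{Cov}(z_i, z_j) = \frac{\operatorname{Cov}(r_i, r_j)}{\sigma_i \sigma_j} = \operatorname{Corr}(r_i, r_j)$$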
f. Compute the SVD.
Answer
The singular value decomposition factorizes a matrix X as X = U S V^T, where U and V have orthonormal columns and S holds the singular values. Scaling the standardized returns by 1/sqrt(n-1) first makes the squared singular values coincide with the eigenvalues of the sample covariance matrix.
# math is not imported above, so np.sqrt supplies the scaling factor
standardized_returns_dvd_sqrt_n = xlf_standardized_data / np.sqrt(len(xlf_standardized_data) - 1)
U_st_return, s_st_return, VT_st_return = np.linalg.svd(standardized_returns_dvd_sqrt_n)
Matrix of left singular vectors
U_st_return
array([[-0.20930198, 0.0074209 , -0.04893938, ..., -0.01761956,
-0.09612688, -0.01893472],
[ 0.09359602, 0.25401361, -0.19774751, ..., -0.04017135,
0.05278976, 0.07862361],
[ 0.06125673, 0.02199364, 0.04071362, ..., 0.05197161,
0.06485387, -0.00574327],
...,
[-0.035062 , -0.04902796, -0.00130521, ..., 0.83354267,
-0.01081503, 0.01220279],
[-0.10259506, 0.04946591, -0.04156722, ..., 0.00562025,
0.86803635, -0.00854122],
[-0.03661932, 0.00913428, 0.02463354, ..., 0.05158599,
-0.01561285, 0.8748643 ]])
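The shapes of the three factors make the decomposition explicit: with numpy's default full_matrices=True, U is square (one left singular vector per trading day), while there are only 30 singular values and right singular vectors.
print(U_st_return.shape, s_st_return.shape, VT_st_return.shape)
# expected: (129, 129) (30,) (30, 30)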
Singular values
s_st_return
array([3.34632245, 1.63680109, 1.60156236, 1.36331236, 1.12608502,
1.06081308, 1.02599675, 0.90912456, 0.86740454, 0.85508141,
0.82863216, 0.76399463, 0.68735184, 0.67355388, 0.63943669,
0.61713275, 0.5982534 , 0.58915346, 0.54750511, 0.52876127,
0.52132406, 0.47621634, 0.47233618, 0.44350506, 0.42235973,
0.38939778, 0.38502787, 0.3386785 , 0.31066052, 0.25473011])
Eigenvalues via the SVD method (squared singular values)
s_st_return**2
array([11.19787397, 2.67911782, 2.565002 , 1.85862058, 1.26806748,
1.1253244 , 1.05266932, 0.82650747, 0.75239063, 0.73116421,
0.68663126, 0.58368779, 0.47245256, 0.45367483, 0.40887927,
0.38085283, 0.35790713, 0.3471018 , 0.29976185, 0.27958848,
0.27177878, 0.226782 , 0.22310147, 0.19669673, 0.17838774,
0.15163063, 0.14824646, 0.11470312, 0.09650996, 0.06488743])
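As the heading above states, squaring the singular values of the scaled matrix recovers the covariance eigenvalues. A short check, reusing only the arrays already computed, confirms the equivalence of the two routes; the right singular vectors likewise match the eigenvectors up to a sign per column (the eigenvalues here are all distinct, so each vector is unique up to sign).
print(np.allclose(s_st_return**2, eigenvalues))                   # expected: True
print(np.allclose(np.abs(VT_st_return.T), np.abs(eigenvectors)))  # expected: True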
Matrix of the right singular vectors
VT_st_return.T
array([[-2.04561657e-01, 1.98823851e-01, -5.55835898e-02,
1.95703312e-01, -4.11442942e-02, 3.09622025e-02,
-1.13914459e-02, 7.67302825e-02, -6.84252241e-02,
-1.49265844e-01, -1.89567279e-01, 4.04895427e-01,
-2.48803611e-01, -1.27740381e-01, 5.60824078e-02,
1.61507979e-02, 3.11498403e-02, 5.65149820e-01,
-3.30417974e-02, -1.78431265e-01, -1.79512862e-01,
-2.65947621e-01, -5.14301642e-02, 1.66646486e-01,
3.58342229e-02, 1.64777695e-01, 2.12864359e-02,
-4.39880829e-02, -2.00605095e-01, 4.86899024e-02],
[-8.75596753e-02, -1.62552367e-02, 1.33533526e-01,
-3.02663392e-01, -3.66292488e-01, -2.22945878e-01,
-4.22875960e-01, -5.18377077e-02, -3.43113454e-01,
-1.59571956e-01, 3.27669925e-01, 2.00612922e-01,
1.88456955e-01, -1.53049320e-01, 2.48060692e-01,
1.75296532e-02, 1.50254105e-01, 5.16648371e-02,
-7.62257950e-03, 8.66222897e-03, 9.47027828e-02,
1.27065675e-01, -1.26985433e-01, 2.71658549e-02,
-2.13191620e-02, -1.28303416e-01, -8.58528564e-02,
3.54596755e-02, -9.59063920e-02, 5.96915933e-02],
[-2.19220724e-01, -1.31178853e-01, 7.43344686e-02,
2.67186836e-02, -7.77282159e-02, -1.35446673e-01,
-1.00326483e-01, 2.47455411e-01, 1.76274533e-01,
3.06843367e-01, 1.76589638e-01, -3.51063871e-01,
-9.57551770e-02, -2.63589136e-01, 6.32013190e-02,
2.91386908e-01, 8.88583531e-04, 5.95289127e-02,
-1.43869588e-01, -2.21495896e-03, -9.82678472e-02,
-6.17294744e-02, -6.88371457e-02, -1.16340932e-01,
1.28363565e-01, 2.22119515e-01, -1.47011841e-01,
-4.76848980e-01, -3.48505784e-02, 1.16569752e-01],
[-2.36482430e-01, -5.79039101e-02, -1.17079216e-01,
-2.37636866e-01, 3.07116109e-03, -1.44043290e-02,
1.29793623e-01, 1.53783590e-01, 9.65123199e-02,
-9.21236370e-02, 1.76502278e-02, 2.88928546e-02,
1.67104147e-02, 1.91141091e-01, -6.75668859e-02,
-2.03189352e-01, 1.09802083e-01, -2.56474450e-01,
-1.47451140e-02, 2.46539573e-02, -9.82361533e-02,
-1.04639455e-01, 1.00802558e-01, 6.10823840e-01,
1.06869685e-01, -2.69474630e-01, -7.91429705e-02,
-3.28869050e-01, -2.05899971e-01, 6.95707534e-02],
[-2.06471728e-01, 8.98749208e-02, -2.25960499e-01,
1.19540410e-01, 8.61513013e-02, -2.75552939e-01,
1.94902731e-01, -1.98725092e-01, 8.21253027e-02,
-1.65166193e-01, -1.08920352e-01, -5.91850276e-02,
2.34985525e-01, -1.13265797e-01, 1.15016596e-01,
4.94218844e-02, -1.94032047e-01, 7.41560734e-02,
1.90880433e-01, 2.57685369e-03, 2.31067119e-01,
2.03624124e-01, -8.27938976e-02, -1.82310381e-01,
-1.97902441e-01, -1.77810477e-01, 1.95232447e-01,
-3.26350305e-01, -3.64378868e-01, -1.65207948e-01],
[-1.42384977e-01, 2.91392582e-02, -2.03576516e-03,
3.14924169e-01, 9.27019852e-02, 1.19527056e-01,
3.12481738e-01, 1.44507587e-01, -3.98359288e-01,
3.92471001e-01, 3.98005685e-01, 2.15862314e-01,
1.77596493e-01, 1.63191377e-01, 1.51629732e-01,
-1.18180403e-01, -9.74547679e-02, 9.92343287e-02,
-6.06864801e-02, 1.29964814e-01, -2.54336333e-02,
1.04086912e-01, 2.13233570e-01, -5.64591867e-02,
-9.62924340e-03, -8.30309351e-02, -1.19439927e-01,
-5.61997351e-02, -4.73067280e-03, -3.06645544e-02],
[-1.39310180e-01, 5.78051879e-02, -2.37619548e-01,
-2.93482523e-01, -1.98464529e-01, 1.10003009e-01,
-7.87432864e-04, 1.73208382e-01, 4.38547652e-01,
-3.55654308e-02, 3.69531202e-01, 2.27597619e-01,
8.98075476e-02, -4.71468788e-02, -2.21171049e-01,
-1.42186259e-01, -3.48309398e-01, 7.98202468e-02,
5.28853171e-02, -6.32855403e-02, -1.79673474e-01,
-2.21019570e-02, 1.00818655e-01, -2.23210552e-01,
-1.95059142e-01, 1.27706549e-02, 7.63411448e-02,
9.05747243e-02, 1.05013195e-01, 3.57654688e-02],
[-2.28678290e-01, -9.55947369e-02, -4.13407143e-02,
-1.21923614e-01, 4.74387979e-02, 9.94365524e-03,
-2.40600513e-02, 2.15640296e-01, -1.74596705e-01,
-1.94622359e-01, -1.65686339e-01, -1.64526645e-01,
-1.82725760e-02, 2.07692421e-01, 4.59511234e-01,
-4.36287710e-02, -2.62839700e-01, -1.24797945e-01,
3.03612490e-01, -1.23968657e-01, -2.42453296e-01,
-2.16491774e-01, 2.74404231e-02, -2.88219371e-01,
3.04894510e-01, -4.75793005e-02, -1.56389386e-02,
1.06891019e-01, 5.80275717e-02, -3.48138577e-02],
[-2.25824364e-01, -2.43276323e-01, 2.00042897e-01,
-5.02288750e-02, 5.84127444e-02, 9.49781275e-02,
4.93019371e-02, 4.78210633e-02, -4.02815731e-02,
-6.64961218e-02, 9.50484366e-03, 1.45041378e-01,
5.16899645e-02, -4.74051485e-02, -1.66981378e-01,
2.23871948e-01, -2.92316290e-02, 5.18354101e-02,
2.78461881e-01, 1.82147244e-01, -6.69450572e-02,
1.71143528e-02, -2.62634108e-01, 1.86841331e-01,
-8.89276702e-02, 8.58434536e-02, -9.22271546e-02,
-5.66547199e-02, 2.86782828e-01, -6.22799394e-01],
[-1.26969087e-01, -8.45754237e-02, -2.21994626e-01,
-7.33375274e-02, -2.49452427e-01, 2.90954779e-01,
-2.81217879e-01, -6.79256050e-02, -3.78913285e-02,
4.85740089e-01, -4.92283983e-01, 9.44215378e-02,
1.47063602e-01, 2.70014595e-02, 6.76940539e-02,
-1.77835000e-01, 1.34616097e-01, -5.48952633e-02,
1.31945234e-01, 6.37828447e-02, -1.20019250e-01,
1.14673736e-01, -1.30201705e-02, -6.94517217e-02,
-2.23183886e-01, 6.64095010e-02, 3.66102472e-04,
-1.16514820e-01, -3.63321846e-02, 2.62837496e-03],
[-1.99076713e-01, 2.46626307e-01, 1.34481924e-01,
7.84470318e-02, -1.81681359e-01, -3.77343810e-02,
1.33954024e-01, -6.32259648e-02, -2.13025773e-01,
-5.26149716e-02, 5.94463994e-02, -1.92305546e-03,
-2.83401409e-01, -1.07375499e-01, -3.15138549e-01,
-7.26859275e-02, 6.64558256e-02, -3.46076373e-01,
2.42921779e-01, 4.23609675e-01, -1.90962208e-01,
-9.39287059e-02, -1.38687182e-01, -1.99078367e-01,
-2.09614633e-02, 1.12623921e-01, 1.00281967e-01,
4.55634483e-02, -2.20208388e-01, 1.73952476e-01],
[-1.88988595e-01, -9.78581444e-03, 3.39915610e-01,
-6.39030441e-02, 3.83528771e-02, 1.52587557e-02,
1.35043256e-02, -1.48484245e-01, 2.09773492e-01,
1.42582693e-01, -1.78567540e-02, -6.80169563e-02,
-8.46550924e-02, -9.84236016e-02, 3.84921845e-02,
-4.76108269e-01, -6.01006316e-02, 2.90070340e-01,
2.91297449e-01, 1.39686221e-01, 4.41018991e-01,
-8.65904310e-02, -1.42087023e-02, 1.13984367e-02,
1.83432368e-01, -8.50664558e-02, -1.11475338e-01,
-1.22588454e-02, 1.69850399e-01, 1.71978376e-01],
[-1.71947105e-01, -1.90453198e-01, 2.11900954e-01,
-1.18373171e-01, 2.82492470e-01, 2.15197454e-01,
-1.06383605e-01, -2.35931812e-01, 3.26388468e-02,
-2.25839050e-01, -1.43098139e-01, 2.66711999e-01,
9.32397601e-02, -1.43139500e-01, -1.02925290e-01,
9.58393096e-02, -1.01494275e-01, -5.84955567e-02,
-1.97560690e-01, 1.49531119e-01, -1.06840680e-01,
2.80263242e-01, 3.91081117e-01, -1.73889019e-01,
2.59542948e-01, 1.06566458e-01, -1.37641990e-01,
-1.04624585e-01, -1.44886175e-01, 1.07947210e-01],
[-1.53083550e-01, -5.91480454e-02, 2.67892625e-02,
3.41798851e-01, -2.82166795e-01, -1.82789851e-01,
-2.60406976e-01, 2.38619386e-01, 4.81488741e-02,
-1.92080116e-01, -7.60675279e-02, -1.14310072e-01,
2.04852437e-01, 4.69814662e-01, -3.36766748e-01,
5.91316366e-02, 7.41217806e-02, 1.43896736e-01,
9.21074234e-02, 7.56327784e-02, 1.79069605e-01,
6.61167749e-02, 2.75134538e-01, -3.66322005e-02,
8.15919260e-02, 1.33963865e-01, -1.29388103e-02,
3.35060918e-02, -1.10076451e-02, 1.33872909e-02],
[-2.14899992e-01, 9.59846529e-02, -1.89326552e-01,
-1.03093220e-01, -1.22598498e-01, -2.31204750e-02,
1.49973294e-02, 1.58171315e-01, 5.70577855e-04,
7.75680110e-02, -8.06960615e-02, 5.93772075e-02,
-6.29481275e-01, 1.96003027e-02, 1.08869079e-01,
2.18228329e-01, -5.72110147e-02, -4.45417563e-02,
-5.61937150e-02, 4.58434630e-02, 2.88618918e-01,
4.34485704e-01, 1.81457499e-01, 4.16830260e-02,
2.68495629e-02, -1.59884689e-01, 6.24146292e-04,
1.41927066e-01, 1.08167755e-01, -9.88696472e-02],
[-2.52539697e-01, -1.20180303e-01, 4.04953794e-02,
3.75220846e-02, 1.11412102e-01, 1.27979092e-01,
-3.04375322e-02, 8.15217111e-02, -4.08375030e-02,
-3.84248627e-02, -8.95149263e-02, 1.63217436e-02,
2.79134445e-03, 1.58180058e-01, -2.31349066e-01,
4.53742373e-02, 1.17370360e-01, 1.17099833e-01,
-3.51508959e-01, -1.32644395e-02, -2.67992486e-02,
-2.70442033e-02, -4.33076122e-01, -2.92528925e-01,
4.09622709e-03, -5.58483663e-01, 8.52581741e-02,
-1.37985865e-02, 1.29141766e-01, 1.68752958e-01],
[-7.41435166e-02, 4.19799116e-01, 2.47881319e-01,
1.00055992e-01, -4.11912379e-02, 9.72157218e-02,
-9.67990394e-02, -2.19995575e-02, 3.60837736e-01,
-2.67984532e-03, 4.75541414e-02, 4.84810656e-02,
1.23377815e-01, 2.06365299e-01, 3.83815279e-01,
4.95039625e-03, 7.22957868e-02, -5.51080892e-02,
-1.98934424e-01, 2.64510159e-01, -1.50739016e-01,
1.53755809e-01, -1.28158206e-01, 9.15382727e-02,
1.52251119e-01, 7.18597013e-02, 3.78841262e-01,
-3.96300997e-02, 9.57860900e-02, -1.19422561e-01],
[-1.44014401e-01, 4.53247089e-02, -8.54670595e-02,
-2.25795894e-02, 5.25781307e-01, -1.47671044e-01,
-4.52492559e-01, -2.77384163e-02, 6.48226633e-02,
1.38899559e-01, 2.28500988e-01, 4.45788341e-02,
-2.15804413e-01, 1.60998289e-01, -3.00874994e-02,
-1.88083103e-01, 2.24669194e-01, -9.39300541e-02,
-8.11214302e-03, -5.24787001e-02, 5.34114916e-02,
-1.79716108e-01, 5.81899584e-02, -1.54881731e-01,
-1.30280403e-01, 7.02563208e-02, 1.43764132e-02,
4.71480711e-02, -2.12021747e-01, -2.91917932e-01],
[-1.04377302e-01, 4.88605670e-01, 7.35385020e-02,
-6.95850729e-02, 5.17880173e-02, 1.80559677e-01,
-1.73245015e-02, -1.38533077e-01, 4.90661297e-02,
9.92291113e-04, -1.44277837e-02, -1.23916460e-02,
8.52186534e-02, 2.73273008e-01, -2.65472092e-02,
2.37178871e-01, -1.47784639e-01, -1.11324486e-01,
4.99609660e-02, -2.03637261e-01, 1.25340358e-01,
9.78733850e-03, -2.10568423e-01, 2.49502847e-03,
-1.22056071e-01, 8.04173808e-02, -6.08779037e-01,
-1.57635096e-02, -5.44018111e-02, 1.00706845e-01],
[-1.97547406e-01, 7.66694401e-02, -2.48797277e-01,
8.50090798e-02, 2.78820605e-01, -3.20073310e-01,
-1.24487941e-01, -1.71622083e-01, -1.08740575e-01,
8.95255739e-02, 1.96592299e-02, 1.57653274e-01,
1.06050841e-01, -2.86470060e-02, -9.66582175e-02,
9.87794614e-02, -4.89274276e-02, -1.06394686e-01,
1.95342805e-01, -1.14505050e-01, -9.24022821e-02,
1.40164094e-01, -8.63018745e-02, 1.89813601e-01,
1.11102861e-01, 1.36927034e-01, 1.74305806e-01,
-3.93511261e-02, 4.92054802e-01, 3.77883342e-01],
[-1.87807573e-01, -6.30089153e-02, -2.19908875e-01,
9.42306866e-02, 3.00184652e-02, 2.84970455e-01,
1.72337690e-01, -1.12378295e-01, 1.34430974e-01,
-2.61526732e-01, 2.23109605e-01, -2.43286778e-01,
1.25931183e-02, -7.42338173e-02, 1.53056298e-01,
-9.93023917e-03, 6.06212538e-01, 1.33570539e-01,
2.14194559e-01, -6.06081430e-02, -1.38494873e-01,
2.07939176e-01, 5.25760945e-02, -5.66008515e-02,
-7.68021967e-02, 4.47309493e-02, -1.29499547e-01,
1.09623798e-01, 7.28184989e-02, 9.58528698e-02],
[-2.48162724e-01, -4.81933495e-02, 1.54687355e-02,
-1.46342267e-01, 8.11939337e-02, -1.99169501e-02,
1.43751950e-01, 1.15548232e-01, -7.66315458e-02,
-1.72406066e-01, -1.31704424e-01, 9.88241609e-03,
9.40454962e-02, -4.50238531e-03, 2.13941758e-01,
8.00079853e-02, 5.80141981e-02, -8.89810639e-02,
-2.75597013e-01, 2.43131644e-01, 2.92855654e-01,
-3.42618628e-01, 1.56607537e-01, -8.46591721e-03,
-4.98031883e-01, 2.42695780e-01, 6.20428796e-02,
3.00863997e-02, 2.10248927e-01, 1.71345750e-01],
[-1.51700649e-01, 4.09977378e-01, 9.26018987e-02,
-1.20875603e-01, -6.94984520e-02, 1.09243036e-01,
1.26381205e-02, -3.21615901e-02, -1.33923856e-01,
1.18809677e-01, -5.17951398e-02, -1.13881591e-01,
1.91590783e-01, -2.69077039e-01, -1.96730677e-01,
1.73915680e-01, 1.97310212e-01, -4.28632156e-02,
3.74327449e-02, -2.57801318e-01, 7.66605349e-02,
-2.56724090e-01, 4.01848841e-01, -6.65235806e-02,
1.53877023e-01, -2.41432706e-01, 1.97714295e-01,
-2.97583774e-02, 1.53415416e-01, -2.24042663e-01],
[-1.99883241e-01, 7.36309967e-03, 2.81106297e-01,
4.88706935e-02, -1.15106811e-01, -6.23985703e-02,
-1.50694099e-02, -2.37617785e-01, -1.90420829e-01,
-1.09737823e-01, 4.80395204e-03, -3.15340722e-01,
-1.79397499e-01, 7.16075222e-02, -5.97017199e-02,
-3.44650969e-01, -1.94036906e-01, 2.91393848e-02,
-2.24606200e-01, -3.39942144e-01, -2.70092565e-01,
2.13763496e-01, 9.88731207e-02, 7.85880195e-02,
-3.20811708e-01, 7.89066180e-02, 5.88012912e-02,
-1.13883750e-01, 1.45039069e-01, -1.07755461e-01],
[-1.89726874e-01, -2.00645896e-01, 3.06362260e-01,
1.03664238e-01, 9.08834969e-02, 1.50654024e-01,
-9.34164118e-02, 2.21001982e-03, 1.06786888e-01,
1.95216332e-01, 7.53816913e-02, -9.96635871e-02,
4.66076155e-02, -2.86990250e-02, 4.78487007e-02,
3.24094821e-01, -1.56560084e-01, -1.61245557e-02,
2.10641626e-01, -1.45865227e-01, -3.18253309e-02,
-2.28120909e-03, 4.75283305e-02, 2.67317414e-01,
-2.25837964e-01, -1.38059827e-01, 2.48087043e-01,
3.85639028e-01, -3.20463040e-01, 2.36271481e-01],
[-1.10655848e-01, -2.35324818e-01, 8.70816049e-02,
-1.07464259e-01, -2.58547361e-01, -3.11510401e-01,
3.14344926e-01, -4.66469923e-01, 1.89002448e-01,
2.15889006e-01, 2.11074606e-02, 2.37100185e-01,
-8.66949152e-02, 3.20811798e-01, 6.43791899e-02,
1.73642049e-01, 1.90601712e-01, -3.62391114e-02,
-5.31077177e-02, -1.55352629e-01, -5.76460876e-02,
-1.91408259e-01, 6.25639883e-02, -1.56064515e-01,
9.42695254e-02, 3.73819277e-02, -9.25441486e-03,
5.45048695e-02, -1.61140961e-02, -5.29235942e-02],
[-2.13704817e-01, 5.47278840e-02, -2.17551050e-01,
1.63965178e-01, -3.97549664e-02, -2.74472181e-01,
-5.10365649e-02, -1.49643745e-01, 1.17554785e-01,
7.64481701e-02, -1.09275170e-01, -1.94592735e-01,
1.55697848e-01, -2.53974164e-01, 1.94547387e-02,
-7.05850535e-02, -1.20606388e-01, 3.68064253e-02,
-2.60894214e-01, 2.83672951e-01, -2.07850254e-01,
-9.13229681e-02, 3.42378824e-02, 1.44770270e-01,
7.76689692e-02, -1.54100635e-01, -3.39602982e-01,
4.53464400e-01, -4.87552182e-03, -1.50864039e-01],
[-1.04445510e-01, -1.03930485e-01, -3.51353987e-01,
-4.65297430e-03, -1.26362081e-01, 3.97891284e-01,
-1.32714686e-01, -4.04996467e-01, -1.76930155e-01,
-4.17530796e-02, 2.32489559e-01, -2.11332509e-01,
-6.18631772e-02, 1.19987288e-01, -4.77552393e-02,
8.02145893e-02, -2.33852444e-01, 6.69549974e-02,
-1.40461565e-01, 1.06560115e-01, 2.27571947e-01,
-2.08869624e-01, -9.97974603e-02, 1.32310066e-01,
2.32574955e-01, 1.54477416e-01, 2.05018180e-01,
-4.10627189e-02, -4.96404985e-02, -5.40940323e-02],
[-2.38927801e-01, -1.18252394e-02, -1.69451130e-03,
-1.85613351e-01, 3.34573957e-02, -2.30558467e-02,
2.44002789e-01, 2.34080103e-01, -6.64081409e-02,
9.39645831e-02, -6.60376242e-02, -1.73742482e-02,
2.05033545e-01, -7.37659864e-02, -1.26079792e-01,
-1.95043407e-01, 9.41344431e-02, -8.96160809e-02,
-1.93669312e-01, -2.77182941e-01, 2.03192740e-01,
2.13718784e-01, -2.73290112e-01, 2.93250785e-03,
2.54605909e-01, 4.05348423e-01, 1.20116917e-01,
2.95004917e-01, -2.07038143e-01, -8.16965520e-02],
[-9.26843088e-02, -1.43711414e-01, 1.02962326e-02,
5.32817020e-01, -1.88515754e-01, 1.36900630e-01,
-1.00193790e-01, 1.64064697e-02, 1.67420857e-01,
-1.32579642e-01, 4.96911897e-02, 2.64892788e-01,
-3.58782210e-02, -2.51912216e-01, 1.14511692e-01,
-1.30576792e-01, -2.61789839e-02, -4.92368779e-01,
-5.76320838e-02, -2.80122198e-01, 2.06965217e-01,
-1.19696792e-01, -5.69479674e-02, -2.40639658e-02,
2.77121618e-02, -4.37756691e-02, -8.46245379e-02,
-4.64828950e-02, 8.81483650e-02, -5.78901592e-02]])
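To verify the factorization itself, a minimal reconstruction sketch multiplies the three factors back together; because the full matrices were returned, the singular values must first be embedded in a 129 × 30 rectangular diagonal matrix.
# Embed the singular values in a rectangular diagonal matrix, then rebuild X
Sigma = np.zeros((U_st_return.shape[0], VT_st_return.shape[0]))
np.fill_diagonal(Sigma, s_st_return)
reconstructed = U_st_return @ Sigma @ VT_st_return
print(np.allclose(reconstructed, standardized_returns_dvd_sqrt_n))  # expected: True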
The percentage of the total variance captured by each principal component:
df_eigval = pd.DataFrame({"Eigenvalues":eigenvalues}, index=range(1,31))
df_eigval["Explained proportion"] = df_eigval["Eigenvalues"] / np.sum(df_eigval["Eigenvalues"])
df_eigval["Explained proportion"]=df_eigval["Explained proportion"].map('{:.2%}'.format)
df_eigval
| Component | Eigenvalues | Explained proportion |
|---|---|---|
| 1 | 11.197874 | 37.33% |
| 2 | 2.679118 | 8.93% |
| 3 | 2.565002 | 8.55% |
| 4 | 1.858621 | 6.20% |
| 5 | 1.268067 | 4.23% |
| 6 | 1.125324 | 3.75% |
| 7 | 1.052669 | 3.51% |
| 8 | 0.826507 | 2.76% |
| 9 | 0.752391 | 2.51% |
| 10 | 0.731164 | 2.44% |
| 11 | 0.686631 | 2.29% |
| 12 | 0.583688 | 1.95% |
| 13 | 0.472453 | 1.57% |
| 14 | 0.453675 | 1.51% |
| 15 | 0.408879 | 1.36% |
| 16 | 0.380853 | 1.27% |
| 17 | 0.357907 | 1.19% |
| 18 | 0.347102 | 1.16% |
| 19 | 0.299762 | 1.00% |
| 20 | 0.279588 | 0.93% |
| 21 | 0.271779 | 0.91% |
| 22 | 0.226782 | 0.76% |
| 23 | 0.223101 | 0.74% |
| 24 | 0.196697 | 0.66% |
| 25 | 0.178388 | 0.59% |
| 26 | 0.151631 | 0.51% |
| 27 | 0.148246 | 0.49% |
| 28 | 0.114703 | 0.38% |
| 29 | 0.096510 | 0.32% |
| 30 | 0.064887 | 0.22% |
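A cumulative view of the same table (a small sketch reusing the eigenvalues computed above) shows how quickly the explained variance accumulates, for example how many components are needed to reach 80% of the total.
explained_proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_proportion)
# Index of the first component at which the cumulative proportion reaches 80%
n_components_80 = int(np.searchsorted(cumulative, 0.80)) + 1
print(f"{n_components_80} components explain at least 80% of the total variance")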
Now that you have calculated, presented and plotted tasks from c to f, you must explain each transformation thoroughly. Write a paragraph of 500 words at minimum that explains why returns are important, compare and contrast PCA and SVD, explain what the eigenvectors, eigenvalues, singular values etc show us for the specific data, etc.
Answer
Returns measure the performance of an investment and make performances of different investments directly comparable. PCA is primarily used for dimensionality reduction, while SVD has broader applications, including matrix approximation as well as dimensionality reduction (Hair 92). Both PCA and SVD produce orthogonal vectors and yield the same eigenvalues. PCA computes the eigenvectors and eigenvalues of the covariance matrix, whereas SVD works directly with the singular values of the scaled data matrix; squaring the singular values obtained from the SVD gives the variance of each principal component (Schwarz et al 2256). Each eigenvector maps the standardized dataset onto a principal component, which explains a portion of the dataset's variance. The eigenvalue corresponding to a principal component is that component's variance, i.e. the share of the data's total variance it explains, and the sum of all eigenvalues equals the total variance of the data.
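A final one-line sanity check (using the objects computed above) ties this together: the sample variance of each principal-component score equals its eigenvalue.
# pandas .var() uses the same n-1 denominator as .cov(), so this holds exactly
print(np.allclose(xlf_principal_components.var().values, eigenvalues))  # expected: True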
Conclusion¶
Ensuring data quality in financial analysis is of paramount importance. Poor-quality data, whether structured or unstructured, hampers accurate modeling and reliable decision-making. For instance, in this assignment the structured data contained currency symbols, placeholders, and improper classifications that hindered numerical operations, while the unstructured data exhibited missing values and formatting issues that complicated assessment.
The comparative analysis of the Nelson-Siegel and Cubic-Spline models highlighted their respective strengths and weaknesses. The Nelson-Siegel model, while easier to interpret, struggled with certain maturities, whereas the Cubic-Spline model offered a more precise fit but lacked clear economic interpretation. Ethical considerations in data smoothing were addressed, emphasizing transparency and accuracy over potential misrepresentation.
Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) both proved essential for reducing dimensionality and understanding underlying data structures, offering insights into returns and variability in investment performance. The screeplots revealed the differences in explained variance between the simulated uncorrelated data and the real government-securities yields, emphasizing the significance of correlation in principal components.
References¶
Akinyemi, Kemi, et al. "Yield Curve Extrapolation Methods: Methodologies for Valuing Cash Flows That Extend beyond the Maximum Yield Curve." Society of Actuaries, March 2019, https://www.soa.org/resources/research-reports/2019/yield-curve-report/
Copeland, Ronald M. "Income Smoothing." Journal of Accounting Research (1968): 101-116, https://doi.org/10.2307/2490073
Hair, J. F., et al. Multivariate Data Analysis. 7th ed., Prentice Hall, 2009.
Huh, Y. U., et al. "Data Quality." Information and Software Technology 32.8 (1990): 559-565, https://www.sciencedirect.com/science/article/pii/095058499090146I
Liu, Grace. "Data Quality Problems Troubling Business and Financial Researchers: A Literature Review and Synthetic Analysis." Journal of Business & Finance Librarianship 25.3-4 (2020): 315-371, https://doi.org/10.1080/08963568.2020.1847555
Nadinić, Berislav, and Damir Kalpić. "Data Quality in Finances and Its Impact on Credit Risk Management and CRM Integration." International Conference on Software and Data Technologies, Special Session on Applications in Banking and Financing, 2008, https://doi.org/10.5220/0001879103270331
Nymand-Andersen, Per. "Yield Curve Modelling and a Conceptual Framework for Estimating Yield Curves: Evidence from the European Central Bank's Yield Curves." ECB Statistics Paper 27, 2018, https://doi.org/10.2866/892636
Praveen, Shagufta, and Umesh Chandra. "Influence of Structured, Semi-Structured, Unstructured Data on Various Data Models." International Journal of Scientific & Engineering Research 8.12 (2017): 67-69, https://www.researchgate.net/publication/344363081
Schwarz, Christian, et al. "Principal Component Analysis and Singular Value Decomposition Used for a Numerical Sensitivity Analysis of a Complex Drawn Part." The International Journal of Advanced Manufacturing Technology 94 (2018): 2255-2265, https://link.springer.com/article/10.1007/s00170-017-0980-z
Sebastian-Coleman, Laura. Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework. Newnes, 2012, https://www.sciencedirect.com/book/9780123970336/measuring-data-quality-for-ongoing-improvement
Strong, Diane M., et al. "Data Quality in Context." Communications of the ACM 40.5 (1997): 103-110, https://doi.org/10.1145/253769.253804
Wahlstrøm, Ranik Raaen, et al. "A Comparative Analysis of Parsimonious Yield Curve Models with Focus on the Nelson-Siegel, Svensson and Bliss Versions." Computational Economics (2022): 1-38, https://doi.org/10.1007/s10614-021-10113-w
Wang, Richard Y., and Diane M. Strong. "Beyond Accuracy: What Data Quality Means to Data Consumers." Journal of Management Information Systems 12.4 (1996): 5-33, https://doi.org/10.1080/07421222.1996.11518099
%%capture
!wget -nc https://raw.githubusercontent.com/brpy/colab-pdf/master/colab_pdf.py
from colab_pdf import colab_pdf
colab_pdf('Group_Work_Project 1, group 8072.ipynb')