{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: PCA 2 - Implementing PCA from scratch (Numpy)\n",
"\n",
"## Description :\n",
"To produce a plot that roughly looks like following: \n",
"\n",
"
\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"\n",
"### Part - I\n",
"\n",
"In this question, you have to implement the PCA technique using **numpy**. \n",
"\n",
"The idea is to maximize the variance along axes by rotating the points. Given the rotation matrix:\n",
"\n",
"$$R\\ =\\ \\left[\\begin{matrix}\\cos\\theta&-\\sin\\theta\\\\\\sin\\theta&\\cos\\theta\\end{matrix}\\right]$$\n",
"\n",
"the rotation of a matrix X is given by \n",
"\n",
"$$X_{R\\ }=\\ X\\cdot R$$\n",
"\n",
"where X_R is the rotated matrix and ยท symbol is the dot product operator.\n",
"\n",
"Once you have the `transform_pca` function, you have to evaluate and save the variance for each angle in thetas list.\n",
"\n",
"The best angle will be:\n",
"\n",
"For every $\\theta$ we find $var(X_p)$ \n",
" - Best angle is $\\theta$ corresponding to $max(var(X_p))$\n",
"\n",
"Note that we are only using variance for first predictor. \n",
"\n",
"Notice that the angle is in radians then:\n",
"\n",
"$$\\theta_{\\deg ree}\\ =\\ \\theta_{\\left\\{radians\\right\\}}\\cdot\\frac{180}{pi}$$\n",
"\n",
"Finally, you have to visualize the rotation given the best angle.\n",
"\n",
"### Part - II\n",
"\n",
"On this part, the idea is to compare the results with **scikit-learn**\n",
"\n",
"First, fit the PCA `model using PCA(n_components = 2) as you` did before in Q1. Given the components matrix C, the rotation angle is defined as follow:\n",
"\n",
"$$\\theta\\ =\\ \\arctan\\left(\\frac{component\\left(0,0\\right)}{component\\left(0,1\\right)}\\right)$$\n",
"\n",
"## Hints: \n",
"\n",
"`test_transform_pca` - You may use np.dot to calculate dot product. `transform_pca()` returns $X_p$\n",
"\n",
"`test_variances` - You may use np.var to calculate variances, using the correct `axis` parameter. \n",
"\n",
"`test_angle` - See np.argmax to find index of the maximum value in an array, use np.pi for $\\pi$\n",
"\n",
"`test_PCA_fit` - PCA.fit() - Note that this is unsupervised learning method. (Similar to Q1) \n",
"\n",
"`test_angle_sklearn` - You may use np.arctan with component(0,0) and component(0,1) as mentioned above.\n",
"\n",
"All the blanks are vectorized code (one liners, no looping constructs required).\n",
"\n",
"**Note:** You do not need to standardize for this particular exercise. \n",
"This exercise is auto-graded and you can try multiple attempts."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PCA 2"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pylab as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"X = np.loadtxt('PCA.csv')\n",
"print(X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Numpy PCA"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_transform_pca) ###\n",
"def transform_pca(X, theta):\n",
" \"\"\"\n",
" Make linear transformation given particular angle\n",
" \n",
" Parameters:\n",
" X (np.array) : Input matrix\n",
" theta (float) : Radians angle\n",
" \n",
" Returns: Transformed input matrix\n",
" Xp (np.array)\n",
" \"\"\"\n",
" R = np.array( [[np.cos(theta), -np.sin(theta)],[np.sin(theta), np.cos(theta)]])\n",
" Xp = ___ \n",
" return Xp"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting the best angle \n",
"\n",
"For every $\\theta$ we find $var(X_p)$ \n",
" - Best angle is $\\theta$ corresponding to $max(var(X_p))$ where $X_p = X \\cdot R$\n",
"\n",
"Note that we are only using variance for first predictor. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_variances) ###\n",
"thetas = np.arange(0, np.pi/2, 0.01) # Angles for rotation\n",
"var_a1 = [] # First component variances\n",
"\n",
"for theta in thetas: \n",
" Xp = ___ \n",
" var = ___ \n",
" var_a1.append(var[0])\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_angle) ###\n",
"\n",
"#We have an array of theta values (thetas). Here we want to pick the \n",
"# value of theta that corresponds to maximum variance for first component. \n",
"angle_numpy = ___ \n",
"angle_np_degree = angle_numpy*180/np.pi # converting to degrees\n",
"\n",
"print('Best angle: {:.2f}'.format(angle_np_degree))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_linear_transformation) ###\n",
"Xp = transform_pca(X, angle_numpy) # Linear transformation of the input"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(6, 5))\n",
"plt.plot(X[:,0], X[:,1], \"+\", alpha=0.8, label='Original')\n",
"plt.plot(Xp[:,0], Xp[:,1], \"+\", alpha=0.8, label='Transformed')\n",
"plt.xticks(fontsize=15)\n",
"plt.yticks(fontsize=15)\n",
"plt.xlabel('X1', fontsize=15)\n",
"plt.ylabel('X2', fontsize=15)\n",
"plt.legend(fontsize=12)\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparing with Scikit-learn PCA"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_PCA_fit) ###\n",
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(___).fit(__) \n",
"pca_x = pca.transform(__) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting angle from components"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_angle_sklearn) ###\n",
"components = pca.components_\n",
"angle_sklearn = ___ \n",
"angle_sklearn_degrees = angle_sklearn*180/np.pi\n",
"print('Best angle: {:.2f}'.format(angle_sklearn_degrees))"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"plt.figure()\n",
"plt.plot(X[:,0], X[:,1], '+', label='original', alpha=0.6)\n",
"plt.plot(Xp[:,0], Xp[:,1], '+', label='numpy', alpha=0.6)\n",
"plt.plot(pca_x[:,0], pca_x[:,1], '+', label='sklearn', alpha=0.6)\n",
"plt.legend(fontsize=12)\n",
"plt.xticks(fontsize=15)\n",
"plt.yticks(fontsize=15)\n",
"plt.xlabel('X1', fontsize=15)\n",
"plt.ylabel('X2', fontsize=15)\n",
"plt.title(r'$\\theta$ numpy: {:.2f} - $\\theta$ sklearn: {:.2f}'.format(angle_np_degree, angle_sklearn_degrees),\n",
" fontsize=18)\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}