{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Title :\n", "Exercise: PCA 2 - Implementing PCA from scratch (Numpy)\n", "\n", "## Description :\n", "The goal is to produce a plot comparing the original data with the data rotated by NumPy PCA and by scikit-learn PCA.\n", "\n", "## Data Description:\n", "\n", "## Instructions:\n", "\n", "### Part - I\n", "\n", "In this question, you will implement PCA using **numpy**.\n", "\n", "The idea is to rotate the points so that the variance along the first axis is maximized. Given the rotation matrix\n", "\n", "$$R\\ =\\ \\left[\\begin{matrix}\\cos\\theta&-\\sin\\theta\\\\\\sin\\theta&\\cos\\theta\\end{matrix}\\right]$$\n", "\n", "the rotation of a data matrix $X$ is given by\n", "\n", "$$X_R = X\\cdot R$$\n", "\n", "where $X_R$ (called $X_p$ in the code) is the rotated matrix and $\\cdot$ denotes the dot (matrix) product.\n", "\n", "Once you have implemented the `transform_pca` function, compute and store the variance of the rotated data for each angle in the `thetas` list.\n", "\n", "To find the best angle:\n", "\n", " - For every $\\theta$, compute $\\mathrm{var}(X_p)$.\n", " - The best angle is the $\\theta$ that maximizes $\\mathrm{var}(X_p)$.\n", "\n", "Note that only the variance of the first column of $X_p$ (the first rotated predictor) is used.\n", "\n", "The angles are in radians; to convert them to degrees:\n", "\n", "$$\\theta_{\\text{degrees}} = \\theta_{\\text{radians}}\\cdot\\frac{180}{\\pi}$$\n", "\n", "Finally, visualize the rotation using the best angle.\n", "\n", "### Part - II\n", "\n", "In this part, you will compare the results with **scikit-learn**.\n", "\n", "First, fit a PCA model using `PCA(n_components=2)`, as you did in Q1. Given the components matrix C (`pca.components_`), the rotation angle is defined as follows:\n", "\n", "$$\\theta\\ =\\ \\arctan\\left(\\frac{component\\left(0,0\\right)}{component\\left(0,1\\right)}\\right)$$\n", "\n", "## Hints: \n", "\n", "`test_transform_pca` - You may use `np.dot` to compute the dot product. `transform_pca()` returns $X_p$.\n", "\n", "`test_variances` - You may use `np.var` to compute the variances; make sure to pass the correct `axis` parameter.\n", "\n", "`test_angle` - Use `np.argmax` to find the index of the maximum value in an array, and `np.pi` for $\\pi$.\n", "\n", "`test_PCA_fit` - Use `PCA.fit()`. PCA is an unsupervised learning method, so no response variable is needed (similar to Q1).\n", "\n", "`test_angle_sklearn` - You may use `np.arctan` with component(0,0) and component(0,1), as described above.\n", "\n", "Every blank can be filled with a single line of vectorized code; no loops are required.\n", "\n", "**Note:** You do not need to standardize the data for this exercise.\n", "This exercise is auto-graded, and you can make multiple attempts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PCA 2" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "X = np.loadtxt('PCA.csv')\n", "print(X.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numpy PCA" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "### edTest(test_transform_pca) ###\n", "def transform_pca(X, theta):\n", " \"\"\"\n", " Rotate the input matrix by a given angle\n", " \n", " Parameters:\n", " X (np.array) : Input matrix of shape (n_samples, 2)\n", " theta (float) : Rotation angle in radians\n", " \n", " Returns: Rotated input matrix\n", " Xp (np.array)\n", " \"\"\"\n", " R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])\n", " Xp = ___ \n", " return Xp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting the best angle\n", "\n", "For every $\\theta$, compute $\\mathrm{var}(X_p)$, where $X_p = X \\cdot R$.\n", " - The best angle is the $\\theta$ that maximizes $\\mathrm{var}(X_p)$.\n", "\n", "Note that only the variance of the first column of $X_p$ (the first rotated predictor) is used.\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "### edTest(test_variances) ###\n", "thetas = np.arange(0, np.pi/2, 0.01) # Candidate rotation angles (radians)\n", "var_a1 = [] # Variance of the first rotated column for each angle\n", "\n", "for theta in thetas:\n", " Xp = ___ \n", " var = ___ \n", " var_a1.append(var[0])\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "### edTest(test_angle) ###\n", "\n", "# `thetas` holds the candidate angles. Pick the value of theta that\n", "# corresponds to the maximum variance of the first component.\n", "angle_numpy = ___ \n", "angle_np_degree = angle_numpy*180/np.pi # converting to degrees\n", "\n", "print('Best angle: {:.2f}'.format(angle_np_degree))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "### edTest(test_linear_transformation) ###\n", "Xp = transform_pca(X, angle_numpy) # Rotate the input by the best angle" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(6, 5))\n", "plt.plot(X[:,0], X[:,1], \"+\", alpha=0.8, label='Original')\n", "plt.plot(Xp[:,0], Xp[:,1], \"+\", alpha=0.8, label='Transformed')\n", "plt.xticks(fontsize=15)\n", "plt.yticks(fontsize=15)\n", "plt.xlabel('X1', fontsize=15)\n", "plt.ylabel('X2', fontsize=15)\n", "plt.legend(fontsize=12)\n", "plt.grid()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing with Scikit-learn PCA" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "### edTest(test_PCA_fit) ###\n", "from sklearn.decomposition import PCA\n", "\n", "pca = PCA(___).fit(__) \n", "pca_x = pca.transform(__) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting angle from components" ] }, { "cell_type": "code", 
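"execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch (not graded): what `pca.components_` contains for 2-D data.\n", "# Assumption: it uses a hypothetical synthetic dataset `X_demo` instead of PCA.csv,\n", "# so it runs on its own and does not overwrite `X`, `Xp`, `pca` or `pca_x`.\n", "# It checks three properties: the rows of components_ are orthonormal, transform()\n", "# projects the centered data onto those rows, and none of a grid of unit directions\n", "# (cos t, sin t) gives a larger variance than the first row, which is the quantity\n", "# the NumPy search over thetas maximizes.\n", "import numpy as np\n", "from sklearn.decomposition import PCA\n", "\n", "rng = np.random.default_rng(0)\n", "# Correlated 2-D Gaussian cloud (hypothetical data)\n", "X_demo = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)\n", "\n", "pca_demo = PCA(n_components=2).fit(X_demo)\n", "C_demo = pca_demo.components_  # shape (2, 2): one principal axis per row\n", "\n", "# Rows are orthonormal unit vectors, so C_demo @ C_demo.T is the identity\n", "print(np.allclose(C_demo @ C_demo.T, np.eye(2)))\n", "\n", "# transform() subtracts the mean and projects onto each row of C_demo\n", "proj0 = (X_demo - pca_demo.mean_) @ C_demo[0]\n", "print(np.allclose(proj0, pca_demo.transform(X_demo)[:, 0]))\n", "\n", "# Variance along unit directions (cos t, sin t), i.e. the first column of R in transform_pca\n", "ts = np.arange(0, np.pi, 0.01)\n", "dirs = np.stack([np.cos(ts), np.sin(ts)], axis=1)  # shape (len(ts), 2)\n", "grid_vars = np.var(X_demo @ dirs.T, axis=0)  # one variance per direction\n", "print(grid_vars.max() <= np.var(proj0) + 1e-9)" ] }, { "cell_type": "code", 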
"execution_count": 0, "metadata": {}, "outputs": [], "source": [ "### edTest(test_angle_sklearn) ###\n", "components = pca.components_\n", "angle_sklearn = ___ \n", "angle_sklearn_degrees = angle_sklearn*180/np.pi\n", "print('Best angle: {:.2f}'.format(angle_sklearn_degrees))" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "plt.plot(X[:,0], X[:,1], '+', label='original', alpha=0.6)\n", "plt.plot(Xp[:,0], Xp[:,1], '+', label='numpy', alpha=0.6)\n", "plt.plot(pca_x[:,0], pca_x[:,1], '+', label='sklearn', alpha=0.6)\n", "plt.legend(fontsize=12)\n", "plt.xticks(fontsize=15)\n", "plt.yticks(fontsize=15)\n", "plt.xlabel('X1', fontsize=15)\n", "plt.ylabel('X2', fontsize=15)\n", "plt.title(r'$\\theta$ numpy: {:.2f} - $\\theta$ sklearn: {:.2f}'.format(angle_np_degree, angle_sklearn_degrees),\n", " fontsize=18)\n", "plt.grid()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 0, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }