{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CS109A Introduction to Data Science \n", "\n", "## Lab 2: Pandas and Web Scraping with Beautiful Soup\n", "\n", "**Harvard University**
\n", "**Fall 2019**
\n", "**Instructors:** Pavlos Protopapas, Kevin Rader, and Chris Tanner
\n", "**Lab Instructors:** Chris Tanner and Eleni Kaxiras
\n", "**Authors:** Rahul Dave, David Sondak, Will Claybaugh, Pavlos Protopapas, Chris Tanner, Eleni Kaxiras\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## RUN THIS CELL TO GET THE RIGHT FORMATTING \n", "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"../../styles/cs109.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Table of Contents \n", "
\n", "1. Learning Goals\n", "2. Loading and Cleaning with Pandas\n", "3. Parsing and Completing the Dataframe\n", "4. Grouping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Goals\n", "\n", "About 6,000 \"best books\" were fetched and parsed from [Goodreads](https://www.goodreads.com). The \"bestness\" of these books came from a proprietary formula used by Goodreads and published as a list on their website.\n", "\n", "We parsed the page for each book and saved data from all these pages in a tabular format as a CSV file. In this lab we'll clean and further parse the data. We'll then do some exploratory data analysis to answer questions about these best books and popular genres. \n", "\n", "\n", "By the end of this lab, you should be able to:\n", "\n", "- Load and systematically address missing values, encoded as `NaN` values in our data set, for example, by removing observations associated with these values.\n", "- Parse columns in the dataframe to create new dataframe columns.\n", "- Use groupby to aggregate data on a particular feature column, such as author.\n", "\n", "*This lab corresponds to lectures #1, #2, and #3 and maps onto homework #1 and further.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic EDA workflow\n", "\n", "(From the lecture, repeated here for convenience).\n", "\n", "The basic workflow is as follows:\n", "\n", "1. **Build** a DataFrame from the data (ideally, put all data in this object)\n", "2. **Clean** the DataFrame. It should have the following properties:\n", " - Each row describes a single object\n", " - Each column describes a property of that object\n", " - Columns are numeric whenever appropriate\n", " - Columns contain atomic properties that cannot be further decomposed\n", "3. Explore **global properties**. Use histograms, scatter plots, and aggregation functions to summarize the data.\n", "4. Explore **group properties**. 
Use groupby and small multiples to compare subsets of the data.\n", "\n", "This process transforms your data into a format which is easier to work with, gives you a basic overview of the data's properties, and likely generates several questions for you to follow up on in subsequent analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Loading and Cleaning with Pandas \n", "Read in the `goodreads.csv` file, examine the data, and do any necessary data cleaning. \n", "\n", "Here is a description of the columns (in order) present in this CSV file:\n", "\n", "```\n", "rating: the average rating on a 1-5 scale achieved by the book\n", "review_count: the number of Goodreads users who reviewed this book\n", "isbn: the ISBN code for the book\n", "booktype: an internal Goodreads identifier for the book\n", "author_url: the Goodreads URL for the author of the book\n", "year: the year the book was published\n", "genre_urls: a string with '|' separated relative URLs of Goodreads genre pages\n", "dir: a directory identifier internal to the scraping code\n", "rating_count: the number of ratings for this book (this is different from the number of reviews)\n", "name: the name of the book\n", "```\n", "\n", "Let us see what issues we find with the data and resolve them. \n", "\n", "\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "First, we load the appropriate libraries.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import seaborn as sns\n", "pd.set_option('display.width', 500)\n", "pd.set_option('display.max_columns', 100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cleaning: Reading in the data\n", "We read in and clean the data from `goodreads.csv`." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
4.40  136455  0439023483  good_reads:book  https://www.goodreads.com/author/show/153394.Suzanne_Collins  2008  /genres/young-adult|/genres/science-fiction|/genres/dystopia|/genres/fantasy|/genres/science-fiction|/genres/romance|/genres/adventure|/genres/book-club|/genres/young-adult|/genres/teen|/genres/apocalyptic|/genres/post-apocalyptic|/genres/action  dir01/2767052-the-hunger-games.html  2958974  The Hunger Games (The Hunger Games, #1)\n",
"0  4.41  16648  0439358078  good_reads:book  https://www.goodreads.com/author/show/1077326....  2003.0  /genres/fantasy|/genres/young-adult|/genres/fi...  dir01/2.Harry_Potter_and_the_Order_of_the_Phoe...  1284478  Harry Potter and the Order of the Phoenix (Har...\n",
"1  3.56  85746  0316015849  good_reads:book  https://www.goodreads.com/author/show/941441.S...  2005.0  /genres/young-adult|/genres/fantasy|/genres/ro...  dir01/41865.Twilight.html  2579564  Twilight (Twilight, #1)\n",
"2  4.23  47906  0061120081  good_reads:book  https://www.goodreads.com/author/show/1825.Har...  1960.0  /genres/classics|/genres/fiction|/genres/histo...  dir01/2657.To_Kill_a_Mockingbird.html  2078123  To Kill a Mockingbird\n",
"3  4.23  34772  0679783261  good_reads:book  https://www.goodreads.com/author/show/1265.Jan...  1813.0  /genres/classics|/genres/fiction|/genres/roman...  dir01/1885.Pride_and_Prejudice.html  1388992  Pride and Prejudice\n",
"4  4.25  12363  0446675539  good_reads:book  https://www.goodreads.com/author/show/11081.Ma...  1936.0  /genres/classics|/genres/historical-fiction|/g...  dir01/18405.Gone_with_the_Wind.html  645470  Gone with the Wind\n",
"5  4.22  7205  0066238501  good_reads:book  https://www.goodreads.com/author/show/1069006....  1949.0  /genres/classics|/genres/young-adult|/genres/c...  dir01/11127.The_Chronicles_of_Narnia.html  286677  The Chronicles of Narnia (Chronicles of Narnia...\n",
"6  4.38  10902  0060256656  good_reads:book  https://www.goodreads.com/author/show/435477.S...  1964.0  /genres/childrens|/genres/young-adult|/genres/...  dir01/370493.The_Giving_Tree.html  502891  The Giving Tree\n",
"7  3.79  20670  0452284244  good_reads:book  https://www.goodreads.com/author/show/3706.Geo...  1945.0  /genres/classics|/genres/fiction|/genres/scien...  dir01/7613.Animal_Farm.html  1364879  Animal Farm\n",
"8  4.18  12302  0345391802  good_reads:book  https://www.goodreads.com/author/show/4.Dougla...  1979.0  /genres/science-fiction|/genres/humor|/genres/...  dir01/11.The_Hitchhiker_s_Guide_to_the_Galaxy....  724713  The Hitchhiker's Guide to the Galaxy (Hitchhik...\n",
"9  4.03  20937  0739326228  good_reads:book  https://www.goodreads.com/author/show/614.Arth...  1997.0  /genres/fiction|/genres/historical-fiction|/ge...  dir01/930.Memoirs_of_a_Geisha.html  1042679  Memoirs of a Geisha\n",
"10  3.72  34959  0307277674  good_reads:book  https://www.goodreads.com/author/show/630.Dan_...  2003.0  /genres/mystery|/genres/thriller|/genres/suspe...  dir01/968.The_Da_Vinci_Code.html  1220657  The Da Vinci Code (Robert Langdon, #2)\n",
"11  4.36  69524  0375831002  good_reads:book  https://www.goodreads.com/author/show/11466.Ma...  2005.0  /genres/historical-fiction|/genres/young-adult...  dir01/19063.The_Book_Thief.html  675431  The Book Thief\n",
"12  4.05  5516  0451527747  good_reads:book  https://www.goodreads.com/author/show/8164.Lew...  1865.0  /genres/classics|/genres/childrens|/genres/you...  dir01/24213.Alice_s_Adventures_in_Wonderland_T...  301702  Alice's Adventures in Wonderland & Through the...\n",
"13  3.72  10156  0743477111  good_reads:book  https://www.goodreads.com/author/show/947.Will...  1597.0  /genres/classics|/genres/plays|/genres/fiction...  dir01/18135.Romeo_and_Juliet.html  1211146  Romeo and Juliet\n",
"14  4.09  10082  0451525264  good_reads:book  https://www.goodreads.com/author/show/13661.Vi...  1862.0  /genres/classics|/genres/historical-fiction|/g...  dir01/24280.Les_Mis_rables.html  418004  Les Misérables\n",
"15  3.92  38061  NaN  good_reads:book  https://www.goodreads.com/author/show/498072.A...  2003.0  /genres/fiction|/genres/romance|/genres/fantas...  dir01/18619684-the-time-traveler-s-wife.html  927254  The Time Traveler's Wife\n",
"16  4.58  1314  0345538374  good_reads:book  https://www.goodreads.com/author/show/656983.J...  1973.0  /genres/fantasy|/genres/classics|/genres/scien...  dir01/30.J_R_R_Tolkien_4_Book_Boxed_Set.html  68495  J.R.R. Tolkien 4-Book Boxed Set\n",
"17  3.60  18039  0140283331  good_reads:book  https://www.goodreads.com/author/show/306.Will...  1954.0  /genres/classics|/genres/academic|/genres/scho...  dir01/7624.Lord_of_the_Flies.html  1232126  Lord of the Flies\n",
"18  4.28  30815  0812550706  good_reads:book  https://www.goodreads.com/author/show/589.Orso...  1985.0  /genres/science-fiction|/genres/young-adult|/g...  dir01/375802.Ender_s_Game.html  624730  Ender's Game (The Ender Quintet, #1)\n",
"19  4.02  11942  0375751513  good_reads:book  https://www.goodreads.com/author/show/3565.Osc...  1890.0  /genres/classics|/genres/fiction|/genres/horro...  dir01/5297.The_Picture_of_Dorian_Gray.html  409478  The Picture of Dorian Gray\n",
"20  4.14  8681  0143058142  good_reads:book  https://www.goodreads.com/author/show/3137322....  1866.0  /genres/classics|/genres/cultural|/genres/russ...  dir01/7144.Crime_and_Punishment.html  294297  Crime and Punishment\n",
"21  4.11  8897  0064410935  good_reads:book  https://www.goodreads.com/author/show/988142.E...  1952.0  /genres/childrens|/genres/fiction|/genres/clas...  dir01/24178.Charlotte_s_Web.html  662707  Charlotte's Web\n",
"22  4.20  8678  0451528824  good_reads:book  https://www.goodreads.com/author/show/5350.L_M...  1908.0  /genres/fiction|/genres/young-adult|/genres/cl...  dir01/8127.Anne_of_Green_Gables.html  393594  Anne of Green Gables (Anne of Green Gables, #1)\n",
"23  3.75  36955  0061122416  good_reads:book  https://www.goodreads.com/author/show/566.Paul...  1988.0  /genres/fiction|/genres/classics|/genres/fanta...  dir01/865.The_Alchemist.html  876518  The Alchemist\n",
"24  3.94  18581  0007491565  good_reads:book  https://www.goodreads.com/author/show/1630.Ray...  1953.0  /genres/classics|/genres/fiction|/genres/scien...  dir01/17470674-fahrenheit-451.html  783133  Fahrenheit 451\n",
"25  4.43  112279  0525478817  good_reads:book  https://www.goodreads.com/author/show/1406384....  2012.0  /genres/young-adult|/genres/book-club|/genres/...  dir01/11870085-the-fault-in-our-stars.html  1150626  The Fault in Our Stars\n",
"26  3.79  15833  0142000671  good_reads:book  https://www.goodreads.com/author/show/585.John...  1937.0  /genres/fiction|/genres/classics|/genres/acade...  dir01/890.Of_Mice_and_Men.html  1070755  Of Mice and Men\n",
"27  4.04  13214  0440498058  good_reads:book  https://www.goodreads.com/author/show/106.Made...  1962.0  /genres/fantasy|/genres/young-adult|/genres/cl...  dir01/18131.A_Wrinkle_in_Time.html  420001  A Wrinkle in Time (A Wrinkle in Time Quintet, #1)\n",
"28  3.94  11736  0393970124  good_reads:book  https://www.goodreads.com/author/show/6988.Bra...  1897.0  /genres/classics|/genres/horror|/genres/fictio...  dir01/17245.Dracula.html  429079  Dracula\n",
"29  4.24  10614  0345418263  good_reads:book  https://www.goodreads.com/author/show/12521.Wi...  1973.0  /genres/fantasy|/genres/classics|/genres/roman...  dir01/21787.The_Princess_Bride.html  457219  The Princess Bride\n",
"...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...\n",
"5969  3.97  182  0399151311  good_reads:book  https://www.goodreads.com/author/show/33987.La...  1985.0  /genres/romance|/genres/romance|/genres/contem...  dir60/572626.Separate_Beds.html  3544  Separate Beds\n",
"5970  4.24  72  0413748308  good_reads:book  https://www.goodreads.com/author/show/29185.Sa...  2000.0  /genres/plays|/genres/drama|/genres/plays|/gen...  dir60/146548.4_48_Psychosis.html  1016  4.48 Psychosis\n",
"5971  4.19  1670  NaN  good_reads:book  https://www.goodreads.com/author/show/4586597....  2011.0  /genres/romance|/genres/romance|/genres/contem...  dir60/12351649-perfection.html  35197  Perfection (Neighbor from Hell, #2)\n",
"5972  4.17  789  1401324290  good_reads:book  https://www.goodreads.com/author/show/4627059....  2011.0  /genres/biography|/genres/animals|/genres/auto...  dir60/10393675-until-tuesday.html  4685  Until Tuesday\n",
"5973  3.99  2944  0425267040  good_reads:book  https://www.goodreads.com/author/show/24978.Ma...  2013.0  /genres/romance|/genres/adult-fiction|/genres/...  dir60/16033902-rush.html  41287  Rush (Breathless, #1)\n",
"5974  4.07  10585  0061950726  good_reads:book  https://www.goodreads.com/author/show/157146.C...  2013.0  /genres/historical-fiction|/genres/book-club|/...  dir60/15818107-orphan-train.html  76606  Orphan Train\n",
"5975  4.23  1185  NaN  good_reads:book  https://www.goodreads.com/author/show/5160667....  2014.0  /genres/romance|/genres/science-fiction|/genre...  dir60/20504754-transcendence.html  4942  Transcendence\n",
"5976  4.03  218  NaN  good_reads:book  https://www.goodreads.com/author/show/5769580....  1987.0  /genres/fiction|/genres/novels|/genres/literat...  dir60/5948927.html  1607  التيه\n",
"5977  3.99  27  1853408360  good_reads:book  https://www.goodreads.com/author/show/851161.K...  2005.0  /genres/young-adult|/genres/romance|/genres/co...  dir60/2274992.Tessa_in_Love.html  294  Tessa in Love\n",
"5978  2.77  800  0060988649  good_reads:book  https://www.goodreads.com/author/show/7025.Gre...  2001.0  /genres/fantasy|/genres/fiction|/genres/myster...  dir60/24929.Lost.html  11128  Lost\n",
"5979  3.84  165  0571207995  good_reads:book  https://www.goodreads.com/author/show/16865.Ti...  1977.0  /genres/fiction|/genres/cultural|/genres/canad...  dir60/29898.The_Wars.html  4160  The Wars\n",
"5980  3.36  1693  0312424442  good_reads:book  https://www.goodreads.com/author/show/3083854....  2003.0  /genres/fiction|/genres/novels|/genres/contemp...  dir60/231.I_am_Charlotte_Simmons.html  17743  I am Charlotte Simmons\n",
"5981  4.09  362  1407103946  good_reads:book  https://www.goodreads.com/author/show/81096.Ch...  2009.0  /genres/fantasy|/genres/horror|/genres/young-a...  dir60/6364017-malice.html  2013  Malice (Malice, #1)\n",
"5982  4.23  137  1582430438  good_reads:book  https://www.goodreads.com/author/show/8567.Wen...  1974.0  /genres/fiction|/genres/novels|/genres/literat...  dir60/227274.The_Memory_of_Old_Jack.html  1085  The Memory of Old Jack\n",
"5983  4.02  531  0575085150  good_reads:book  https://www.goodreads.com/author/show/81096.Ch...  2009.0  /genres/science-fiction|/genres/steampunk|/gen...  dir60/6285903-retribution-falls.html  3878  Retribution Falls (Tales of the Ketty Jay, #1)\n",
"5984  3.61  109  1401360106  good_reads:book  https://www.goodreads.com/author/show/183537.K...  2005.0  /genres/fiction|/genres/young-adult|/genres/bo...  dir60/319403.Pigtopia.html  529  Pigtopia\n",
"5985  4.06  954  1606840584  good_reads:book  https://www.goodreads.com/author/show/2891503....  2010.0  /genres/young-adult|/genres/fantasy|/genres/pa...  dir60/7831742-the-lost-saint.html  12690  The Lost Saint (The Dark Divine, #2)\n",
"5986  4.26  477  0517548233  good_reads:book  https://www.goodreads.com/author/show/2062.Hen...  1946.0  /genres/economics|/genres/non-fiction|/genres/...  dir60/3028.Economics_in_One_Lesson.html  5767  Economics in One Lesson\n",
"5987  4.34  93  0575070706  good_reads:book  https://www.goodreads.com/author/show/58.Frank...  1977.0  /genres/science-fiction|/genres/fantasy|/genre...  dir60/53764.The_Great_Dune_Trilogy.html  41378  The Great Dune Trilogy\n",
"5988  3.36  192  842534607X  good_reads:book  https://www.goodreads.com/author/show/3493970....  2011.0  /genres/european-literature|/genres/spanish-li...  dir60/10832326-si-t-me-dices-ven-lo-dejo-todo-...  1914  Si tú me dices ven lo dejo todo... pero dime ven\n",
"5989  4.12  1150  0140143459  good_reads:book  https://www.goodreads.com/author/show/776.Mich...  1989.0  /genres/non-fiction|/genres/economics|/genres/...  dir60/1171.Liar_s_Poker.html  32637  Liar's Poker\n",
"5990  4.20  650  NaN  good_reads:book  https://www.goodreads.com/author/show/1112683._  2009.0  /genres/novels|/genres/fiction|/genres/religio...  dir60/6976667.html  2899  ألواح ودسر\n",
"5991  3.89  132  1400303400  good_reads:book  https://www.goodreads.com/author/show/5544.Fra...  2002.0  /genres/christian-fiction|/genres/christian|/g...  dir60/65686.Nightmare_Academy.html  3531  Nightmare Academy (Veritas Project, #2)\n",
"5992  4.09  1256  0345515501  good_reads:book  https://www.goodreads.com/author/show/18149.Te...  2011.0  /genres/mystery|/genres/mystery|/genres/crime|...  dir60/9578677-the-silent-girl.html  16312  The Silent Girl (Rizzoli & Isles, #9)\n",
"5993  4.37  28  0393062260  good_reads:book  https://www.goodreads.com/author/show/62157.Ro...  2007.0  /genres/poetry|/genres/religion|/genres/christ...  dir60/1251125.The_Book_of_Psalms.html  242  The Book of Psalms\n",
"5994  4.17  2226  0767913736  good_reads:book  https://www.goodreads.com/author/show/44565.Ca...  2005.0  /genres/history|/genres/non-fiction|/genres/bi...  dir60/78508.The_River_of_Doubt.html  16618  The River of Doubt\n",
"5995  3.99  775  1416909427  good_reads:book  https://www.goodreads.com/author/show/151371.J...  2006.0  /genres/young-adult|/genres/realistic-fiction|...  dir60/259068.Shug.html  6179  Shug\n",
"5996  3.78  540  1620612321  good_reads:book  https://www.goodreads.com/author/show/5761314....  2012.0  /genres/contemporary|/genres/romance|/genres/y...  dir60/13503247-flawed.html  2971  Flawed\n",
"5997  3.91  281  NaN  good_reads:book  https://www.goodreads.com/author/show/1201952....  2006.0  /genres/religion|/genres/islam|/genres/religio...  dir60/2750008.html  3083  أسعد امرأة في العالم\n",
"5998  4.35  61  0786929081  good_reads:book  https://www.goodreads.com/author/show/1023510....  2001.0  /genres/fiction|/genres/fantasy|/genres/magic|...  dir60/66677.Legacy_of_the_Drow_Collector_s_Edi...  3982  Legacy of the Drow Collector's Edition (Legacy...\n",
"\n",
"5999 rows × 10 columns
" ], "text/plain": [ " 4.40 136455 0439023483 good_reads:book https://www.goodreads.com/author/show/153394.Suzanne_Collins 2008 /genres/young-adult|/genres/science-fiction|/genres/dystopia|/genres/fantasy|/genres/science-fiction|/genres/romance|/genres/adventure|/genres/book-club|/genres/young-adult|/genres/teen|/genres/apocalyptic|/genres/post-apocalyptic|/genres/action dir01/2767052-the-hunger-games.html 2958974 The Hunger Games (The Hunger Games, #1)\n", "0 4.41 16648 0439358078 good_reads:book https://www.goodreads.com/author/show/1077326.... 2003.0 /genres/fantasy|/genres/young-adult|/genres/fi... dir01/2.Harry_Potter_and_the_Order_of_the_Phoe... 1284478 Harry Potter and the Order of the Phoenix (Har...\n", "1 3.56 85746 0316015849 good_reads:book https://www.goodreads.com/author/show/941441.S... 2005.0 /genres/young-adult|/genres/fantasy|/genres/ro... dir01/41865.Twilight.html 2579564 Twilight (Twilight, #1)\n", "2 4.23 47906 0061120081 good_reads:book https://www.goodreads.com/author/show/1825.Har... 1960.0 /genres/classics|/genres/fiction|/genres/histo... dir01/2657.To_Kill_a_Mockingbird.html 2078123 To Kill a Mockingbird\n", "3 4.23 34772 0679783261 good_reads:book https://www.goodreads.com/author/show/1265.Jan... 1813.0 /genres/classics|/genres/fiction|/genres/roman... dir01/1885.Pride_and_Prejudice.html 1388992 Pride and Prejudice\n", "4 4.25 12363 0446675539 good_reads:book https://www.goodreads.com/author/show/11081.Ma... 1936.0 /genres/classics|/genres/historical-fiction|/g... dir01/18405.Gone_with_the_Wind.html 645470 Gone with the Wind\n", "... ... ... ... ... ... ... ... ... ... ...\n", "5994 4.17 2226 0767913736 good_reads:book https://www.goodreads.com/author/show/44565.Ca... 2005.0 /genres/history|/genres/non-fiction|/genres/bi... dir60/78508.The_River_of_Doubt.html 16618 The River of Doubt\n", "5995 3.99 775 1416909427 good_reads:book https://www.goodreads.com/author/show/151371.J... 2006.0 /genres/young-adult|/genres/realistic-fiction|... 
dir60/259068.Shug.html 6179 Shug\n", "5996 3.78 540 1620612321 good_reads:book https://www.goodreads.com/author/show/5761314.... 2012.0 /genres/contemporary|/genres/romance|/genres/y... dir60/13503247-flawed.html 2971 Flawed\n", "5997 3.91 281 NaN good_reads:book https://www.goodreads.com/author/show/1201952.... 2006.0 /genres/religion|/genres/islam|/genres/religio... dir60/2750008.html 3083 أسعد امرأة في العالم\n", "5998 4.35 61 0786929081 good_reads:book https://www.goodreads.com/author/show/1023510.... 2001.0 /genres/fiction|/genres/fantasy|/genres/magic|... dir60/66677.Legacy_of_the_Drow_Collector_s_Edi... 3982 Legacy of the Drow Collector's Edition (Legacy...\n", "\n", "[5999 rows x 10 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Read the data into a dataframe\n", "df = pd.read_csv(\"data/goodreads.csv\", encoding='utf-8')\n", "\n", "#Examine the first few rows of the dataframe\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oh dear. That does not quite seem to be right. We are missing the column names. We need to add these in! But what are they?\n", "\n", "Here is a list of them in order:\n", "\n", "`[\"rating\", 'review_count', 'isbn', 'booktype','author_url', 'year', 'genre_urls', 'dir','rating_count', 'name']`\n", "\n", "
Exercise
\n", "Use these to load the dataframe properly! And then \"head\" the dataframe... (you will need to look at the read_csv docs)\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
  rating  review_count  isbn  booktype  author_url  year  genre_urls  dir  rating_count  name\n",
"0  4.40  136455  0439023483  good_reads:book  https://www.goodreads.com/author/show/153394.S...  2008.0  /genres/young-adult|/genres/science-fiction|/g...  dir01/2767052-the-hunger-games.html  2958974  The Hunger Games (The Hunger Games, #1)\n",
"1  4.41  16648  0439358078  good_reads:book  https://www.goodreads.com/author/show/1077326....  2003.0  /genres/fantasy|/genres/young-adult|/genres/fi...  dir01/2.Harry_Potter_and_the_Order_of_the_Phoe...  1284478  Harry Potter and the Order of the Phoenix (Har...\n",
"2  3.56  85746  0316015849  good_reads:book  https://www.goodreads.com/author/show/941441.S...  2005.0  /genres/young-adult|/genres/fantasy|/genres/ro...  dir01/41865.Twilight.html  2579564  Twilight (Twilight, #1)\n",
"3  4.23  47906  0061120081  good_reads:book  https://www.goodreads.com/author/show/1825.Har...  1960.0  /genres/classics|/genres/fiction|/genres/histo...  dir01/2657.To_Kill_a_Mockingbird.html  2078123  To Kill a Mockingbird\n",
"4  4.23  34772  0679783261  good_reads:book  https://www.goodreads.com/author/show/1265.Jan...  1813.0  /genres/classics|/genres/fiction|/genres/roman...  dir01/1885.Pride_and_Prejudice.html  1388992  Pride and Prejudice" ], "text/plain": [ " rating review_count isbn booktype author_url year genre_urls dir rating_count name\n", "0 4.40 136455 0439023483 good_reads:book https://www.goodreads.com/author/show/153394.S... 2008.0 /genres/young-adult|/genres/science-fiction|/g... dir01/2767052-the-hunger-games.html 2958974 The Hunger Games (The Hunger Games, #1)\n", "1 4.41 16648 0439358078 good_reads:book https://www.goodreads.com/author/show/1077326.... 2003.0 /genres/fantasy|/genres/young-adult|/genres/fi... dir01/2.Harry_Potter_and_the_Order_of_the_Phoe... 1284478 Harry Potter and the Order of the Phoenix (Har...\n", "2 3.56 85746 0316015849 good_reads:book https://www.goodreads.com/author/show/941441.S... 2005.0 /genres/young-adult|/genres/fantasy|/genres/ro... dir01/41865.Twilight.html 2579564 Twilight (Twilight, #1)\n", "3 4.23 47906 0061120081 good_reads:book https://www.goodreads.com/author/show/1825.Har... 1960.0 /genres/classics|/genres/fiction|/genres/histo... dir01/2657.To_Kill_a_Mockingbird.html 2078123 To Kill a Mockingbird\n", "4 4.23 34772 0679783261 good_reads:book https://www.goodreads.com/author/show/1265.Jan... 1813.0 /genres/classics|/genres/fiction|/genres/roman... dir01/1885.Pride_and_Prejudice.html 1388992 Pride and Prejudice" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# your code here\n", "df=pd.read_csv(\"data/goodreads.csv\", header=None,\n", " names=[\"rating\", 'review_count', 'isbn', 'booktype','author_url', 'year', 'genre_urls', 'dir','rating_count', 'name'],\n", ")\n", "\n", "#Examine the first few rows of the dataframe\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cleaning: Examining the dataframe - quick checks\n", "\n", "We should examine the dataframe to get an overall sense of the content. \n", "\n", "
Exercise
\n", "Let's check the types of the columns. What do you find?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rating float64\n", "review_count object\n", "isbn object\n", "booktype object\n", "author_url object\n", "year float64\n", "genre_urls object\n", "dir object\n", "rating_count object\n", "name object\n", "dtype: object" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# your code here\n", "####### \n", "df.dtypes\n", "####### \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*your answer here*\n", "\n", "Notice that `review_count` and `rating_count` are objects instead of ints, and `year` is a float!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a couple more quick sanity checks to perform on the dataframe. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(6000, 10)\n" ] }, { "data": { "text/plain": [ "Index(['rating', 'review_count', 'isbn', 'booktype', 'author_url', 'year', 'genre_urls', 'dir', 'rating_count', 'name'], dtype='object')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(df.shape)\n", "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cleaning: Examining the dataframe - a deeper look" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Beyond checking some quick general properties of the data frame and looking at the first $n$ rows, we can dig a bit deeper into the values being stored. If you haven't already, check to see if there are any missing values in the data frame.\n", "\n", "Let's count the missing values in every column, including the ones that seemed OK to us." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "0\n", "475\n", "0\n", "0\n", "7\n", "62\n", "0\n", "0\n", "0\n" ] } ], "source": [ "#Get a sense of how many missing values there are in the dataframe.\n", "print(np.sum([df.rating.isnull()]))\n", "print(np.sum([df.review_count.isnull()]))\n", "print(np.sum([df.isbn.isnull()]))\n", "print(np.sum([df.booktype.isnull()]))\n", "print(np.sum([df.author_url.isnull()]))\n", "print(np.sum([df.year.isnull()]))\n", "print(np.sum([df.genre_urls.isnull()]))\n", "print(np.sum([df.dir.isnull()]))\n", "print(np.sum([df.rating_count.isnull()]))\n", "print(np.sum([df.name.isnull()]))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " rating review_count isbn booktype author_url year genre_urls dir rating_count name\n", "3643 NaN None None None None NaN NaN dir37/9658936-harry-potter.html None None\n", "5282 NaN None None None None NaN NaN dir53/113138.The_Winner.html None None" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Try to locate where the missing values occur\n", "df[df.rating.isnull()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How does `pandas` or `numpy` handle missing values when we try to compute with data sets that include them?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll now check if any of the other suspicious columns have missing values. Let's look at `year` and `review_count` first.\n", "\n", "One thing you can do is to try and convert to the type you expect the column to be. If something goes wrong, it likely means your data are bad." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets test for missing data:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(6000, 10)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.year.isnull()]\n", "\n", "df.year.isnull()\n", "df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cleaning: Dealing with Missing Values\n", "How should we interpret 'missing' or 'invalid' values in the data (hint: look at where these values occur)? One approach is to simply exclude them from the dataframe. Is this appropriate for all 'missing' or 'invalid' values? " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "#Treat the missing or invalid values in your dataframe\n", "####### \n", "\n", "df = df[df.year.notnull()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok so we have done some cleaning. What do things look like now? 
Notice that `year` is still a float: dropping rows does not change a column's dtype." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rating float64\n", "review_count object\n", "isbn object\n", "booktype object\n", "author_url object\n", "year float64\n", "genre_urls object\n", "dir object\n", "rating_count object\n", "name object\n", "dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n" ] }, { "data": { "text/plain": [ "(5993, 10)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(np.sum(df.year.isnull()))\n", "df.shape # We removed seven rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
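The dtypes above show that dropping the rows with a missing `year` did not change any column types. As a minimal sketch of why the cleaning had to come first, the made-up frame below shows that casting a float column that still contains `NaN` to `int` raises an error, while the same cast succeeds once those rows are removed. The names `toy`, `clean`, and `cast_failed` are assumptions for illustration only, and the string 'None' is a guess at how an unparsed row might look.

```python
import numpy as np
import pandas as pd

# Made-up frame: the middle row mimics a record that failed to parse.
toy = pd.DataFrame({
    'year': [2008.0, np.nan, 1960.0],
    'review_count': ['136455', 'None', '47906'],
})

# NaN has no integer representation, so this cast raises a ValueError.
try:
    toy['year'].astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Keep only rows with a known year, then convert both columns to int.
clean = toy[toy['year'].notnull()].copy()
clean['year'] = clean['year'].astype(int)
clean['review_count'] = clean['review_count'].astype(int)
print(cast_failed)
print(clean.dtypes)
```

Taking a `.copy()` of the filtered frame before assigning to its columns is one way to avoid the chained-assignment warnings pandas can emit when a filtered frame is modified.
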
Exercise
\n", "\n", "OK, so let's fix those types by converting the columns to ints. If a type conversion fails, we know we have further problems." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# your code here\n", "df.rating_count=df.rating_count.astype(int)\n", "df.review_count=df.review_count.astype(int)\n", "df.year=df.year.astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you do this, these columns seem fine (no errors in conversion). Let's look:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rating float64\n", "review_count int64\n", "isbn object\n", "booktype object\n", "author_url object\n", "year int64\n", "genre_urls object\n", "dir object\n", "rating_count int64\n", "name object\n", "dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sweet!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some of the other columns that should be strings contain NaN. We replace those values with empty strings." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "df.loc[df.genre_urls.isnull(), 'genre_urls']=\"\"\n", "df.loc[df.isbn.isnull(), 'isbn']=\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Parsing and Completing the Data Frame \n", "\n", "We will parse the `author` column from `author_url` and the `genres` column from `genre_urls`. Keep the `genres` column as a string separated by '|'.\n", "\n", "We will use pandas' `map` to assign new columns to the dataframe. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine an example `author_url` and reason about which sequence of string operations must be performed in order to isolate the author's name."
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://www.goodreads.com/author/show/153394.Suzanne_Collins'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Get the first author_url\n", "test_string = df.author_url[0]\n", "test_string" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Suzanne_Collins'" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Test out some string operations to isolate the author name\n", "\n", "test_string.split('/')[-1].split('.')[1:][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
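To make the chain of string operations above easier to follow, here is the same expression split into named steps, using the `author_url` printed above. The variable names are assumptions, and the segment `dotted` is a hypothetical value (not taken from the data) that shows what happens when the name part itself contains a dot.

```python
# author_url of the first row, as printed above
url = 'https://www.goodreads.com/author/show/153394.Suzanne_Collins'

last_segment = url.split('/')[-1]   # '153394.Suzanne_Collins'
parts = last_segment.split('.')     # ['153394', 'Suzanne_Collins']
author = parts[1:][0]               # 'Suzanne_Collins' (same as parts[1])

# Hypothetical segment whose name part contains a dot.
dotted = '7.A.B_Example'
first_piece = dotted.split('.')[1:][0]   # 'A': only the text up to the next dot
rest_of_name = dotted.split('.', 1)[1]   # 'A.B_Example': everything after the id

print(author, first_piece, rest_of_name)
```

If names with dots could occur, splitting only once with `split('.', 1)` would keep the whole name; for the URLs shown in this lab both forms give the same result.
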
Exercise
\n", "\n", "Let's wrap the above code into a function that we can then apply to the whole column." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Write a function that accepts an author url and returns the author's name based on your experimentation above\n", "def get_author(url):\n", "    # your code here\n", "    name = url.split('/')[-1].split('.')[1:][0]\n", "    ####### \n", "    return name" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Suzanne_Collins\n", "1 J_K_Rowling\n", "2 Stephenie_Meyer\n", "3 Harper_Lee\n", "4 Jane_Austen\n", "Name: author, dtype: object" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Apply the get_author function to the 'author_url' column using '.map' \n", "#and add a new column 'author' to store the names\n", "df['author'] = df.author_url.map(get_author)\n", "df.author[0:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
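In the cell above, `map` calls `get_author` once per row. As a hedged aside, pandas also offers vectorized string methods through the `.str` accessor that can express the same steps without a Python function. The sketch below compares both on a tiny Series; the name `urls` and the second URL are made up for illustration.

```python
import pandas as pd

urls = pd.Series([
    'https://www.goodreads.com/author/show/153394.Suzanne_Collins',
    'https://www.goodreads.com/author/show/1.Example_Author',  # made-up URL
])

def get_author(url):
    # Last path segment, then the piece after the numeric id.
    return url.split('/')[-1].split('.')[1:][0]

# Element-wise application of a Python function.
by_map = urls.map(get_author)

# The same steps written with the .str accessor.
by_str = urls.str.split('/').str[-1].str.split('.').str[1]

print(by_map.tolist())
print(by_str.tolist())
```

Both forms give the same names here. The `map` version is arguably easier to read and to test on a single string, which is why it fits this lab well.
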
Exercise
\n", "\n", "Now parse out the genres from `genre_urls`. \n", "\n", "This is a little more complicated because there may be more than one genre.\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 /genres/young-adult|/genres/science-fiction|/g...\n", "1 /genres/fantasy|/genres/young-adult|/genres/fi...\n", "2 /genres/young-adult|/genres/fantasy|/genres/ro...\n", "3 /genres/classics|/genres/fiction|/genres/histo...\n", "4 /genres/classics|/genres/fiction|/genres/roman...\n", "Name: genre_urls, dtype: object" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "df.genre_urls.head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "young-adult\n", "science-fiction\n", "dystopia\n", "fantasy\n", "science-fiction\n", "romance\n", "adventure\n", "book-club\n", "young-adult\n", "teen\n", "apocalyptic\n", "post-apocalyptic\n", "action\n" ] } ], "source": [ "# your code here\n", "#Examine some examples of genre_urls\n", "\n", "#Test out some string operations to isolate the genre name\n", "test_genre_string=df.genre_urls[0]\n", "genres=test_genre_string.strip().split('|')\n", "for e in genres:\n", "    print(e.split('/')[-1])\n", "    \"|\".join(genres)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Exercise
\n", "\n", "Write a function that accepts a `genre_urls` string and returns the genre names joined by '|', based on your experimentation above.\n", "\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def split_and_join_genres(url):\n", "    # your code here\n", "    genres=url.strip().split('|')\n", "    genres=[e.split('/')[-1] for e in genres]\n", "    return \"|\".join(genres)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test your function" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'young-adult|science-fiction'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "split_and_join_genres(\"/genres/young-adult|/genres/science-fiction\")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "''" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "split_and_join_genres(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Exercise
\n", "\n", "Use map again to create a new \"genres\" column" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " rating review_count isbn booktype author_url year genre_urls dir rating_count name author genres\n", "0 4.40 136455 0439023483 good_reads:book https://www.goodreads.com/author/show/153394.S... 2008 /genres/young-adult|/genres/science-fiction|/g... dir01/2767052-the-hunger-games.html 2958974 The Hunger Games (The Hunger Games, #1) Suzanne_Collins young-adult|science-fiction|dystopia|fantasy|s...\n", "1 4.41 16648 0439358078 good_reads:book https://www.goodreads.com/author/show/1077326.... 2003 /genres/fantasy|/genres/young-adult|/genres/fi... dir01/2.Harry_Potter_and_the_Order_of_the_Phoe... 1284478 Harry Potter and the Order of the Phoenix (Har... J_K_Rowling fantasy|young-adult|fiction|fantasy|magic|chil...\n", "2 3.56 85746 0316015849 good_reads:book https://www.goodreads.com/author/show/941441.S... 2005 /genres/young-adult|/genres/fantasy|/genres/ro... dir01/41865.Twilight.html 2579564 Twilight (Twilight, #1) Stephenie_Meyer young-adult|fantasy|romance|paranormal|vampire...\n", "3 4.23 47906 0061120081 good_reads:book https://www.goodreads.com/author/show/1825.Har... 1960 /genres/classics|/genres/fiction|/genres/histo... dir01/2657.To_Kill_a_Mockingbird.html 2078123 To Kill a Mockingbird Harper_Lee classics|fiction|historical-fiction|academic|s...\n", "4 4.23 34772 0679783261 good_reads:book https://www.goodreads.com/author/show/1265.Jan... 1813 /genres/classics|/genres/fiction|/genres/roman... dir01/1885.Pride_and_Prejudice.html 1388992 Pride and Prejudice Jane_Austen classics|fiction|romance|historical-fiction|li..." ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "df['genres']=df.genre_urls.map(split_and_join_genres)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's pick an author at random so we can see the results of the transformations. Scroll to see the `author` and `genre` columns that we added to the dataframe." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " rating review_count isbn booktype author_url year genre_urls dir rating_count name author genres\n", "1014 4.23 483 0374529264 good_reads:book https://www.goodreads.com/author/show/7732.Mar... 1951 /genres/historical-fiction|/genres/fiction|/ge... dir11/12172.Memoirs_of_Hadrian.html 6258 Memoirs of Hadrian Marguerite_Yourcenar historical-fiction|fiction|cultural|france|cla...\n", "5620 4.11 74 2070367983 good_reads:book https://www.goodreads.com/author/show/7732.Mar... 1968 /genres/fiction|/genres/historical-fiction|/ge... dir57/953435.L_uvre_au_noir.html 1601 L'Œuvre au noir Marguerite_Yourcenar fiction|historical-fiction|cultural|france|eur..." ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.author == \"Marguerite_Yourcenar\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us delete the `genre_urls` column." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "del df['genre_urls']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then save the dataframe out!" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"data/cleaned-goodreads.csv\", index=False, header=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Grouping " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It appears that some books were written in negative years! Print out the observations that correspond to negative years. What do you notice about these books? 
" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "47 The Odyssey\n", "246 The Iliad/The Odyssey\n", "455 The Republic\n", "596 The Aeneid\n", "629 Oedipus Rex\n", "674 The Art of War\n", "746 The Bhagavad Gita\n", "777 Antigone\n", "1233 The Oedipus Cycle\n", "1397 Aesop's Fables\n", "1398 The Epic of Gilgamesh\n", "1428 Medea\n", "1815 The Oresteia\n", "1882 The Trial and Death of Socrates\n", "2078 The History of the Peloponnesian War\n", "2527 The Histories\n", "3133 Complete Works\n", "3274 The Nicomachean Ethics\n", "3757 Lysistrata\n", "4402 The Symposium\n", "4475 Apology\n", "5367 Five Dialogues\n", "Name: name, dtype: object" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# your code here\n", "df[df.year < 0].name\n", "#These are books written before the Common Era (BCE, equivalent to BC)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can determine the \"best book\" by year! For this we use Panda's `groupby()`. `Groupby()` allows grouping a dataframe by any (usually categorical) variable. Would it make sense to ever groupby integer variables? Floating point variables?" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.groupby.generic.DataFrameGroupBy" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfgb_author = df.groupby('author')\n", "type(dfgb_author)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perhaps we want the number of books each author wrote" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " rating review_count isbn booktype author_url year dir rating_count name genres\n", "author \n", "A_A_Milne 6 6 6 6 6 6 6 6 6 6\n", "A_G_Howard 1 1 1 1 1 1 1 1 1 1\n", "A_J_Cronin 1 1 1 1 1 1 1 1 1 1\n", "A_J_Jacobs 1 1 1 1 1 1 1 1 1 1\n", "A_J_Salt 1 1 1 1 1 1 1 1 1 1\n", "A_Meredith_Walters 2 2 2 2 2 2 2 2 2 2\n", "A_N_Roquelaure 2 2 2 2 2 2 2 2 2 2\n", "A_S_Byatt 1 1 1 1 1 1 1 1 1 1\n", "A_S_King 1 1 1 1 1 1 1 1 1 1\n", "A_id_al_Qarni 2 2 2 2 2 2 2 2 2 2\n", "Abbi_Glines 14 14 14 14 14 14 14 14 14 14\n", "Abdul_Rahman_Munif 1 1 1 1 1 1 1 1 1 1\n", "Abigail_Gibbs 1 1 1 1 1 1 1 1 1 1\n", "Abigail_Roux 4 4 4 4 4 4 4 4 4 4\n", "Abigail_Thomas 1 1 1 1 1 1 1 1 1 1\n", "Abolqasem_Ferdowsi 1 1 1 1 1 1 1 1 1 1\n", "Abraham_Verghese 1 1 1 1 1 1 1 1 1 1\n", "Abul_Hasan_Ali_Nadwi 1 1 1 1 1 1 1 1 1 1\n", "Adam_Hochschild 1 1 1 1 1 1 1 1 1 1\n", "Adam_Johnson 1 1 1 1 1 1 1 1 1 1\n", "Adam_Levin 1 1 1 1 1 1 1 1 1 1\n", "Adam_Rex 1 1 1 1 1 1 1 1 1 1\n", "Adam_Smith 1 1 1 1 1 1 1 1 1 1\n", "Addison_Moore 1 1 1 1 1 1 1 1 1 1\n", "Adeline_Yen_Mah 1 1 1 1 1 1 1 1 1 1\n", "Adolf_Hitler 1 1 1 1 1 1 1 1 1 1\n", "Adolfo_Bioy_Casares 1 1 1 1 1 1 1 1 1 1\n", "Aeschylus 1 1 1 1 1 1 1 1 1 1\n", "Aesop 1 1 1 1 1 1 1 1 1 1\n", "Agatha_Christie 11 11 11 11 11 11 11 11 11 11\n", "... ... ... ... ... ... ... ... ... ... 
...\n", "William_Strunk_Jr_ 1 1 1 1 1 1 1 1 1 1\n", "William_Styron 3 3 3 3 3 3 3 3 3 3\n", "William_Wharton 2 2 2 2 2 2 2 2 2 2\n", "William_Wordsworth 1 1 1 1 1 1 1 1 1 1\n", "Willow_Aster 1 1 1 1 1 1 1 1 1 1\n", "Wilson_Rawls 2 2 2 2 2 2 2 2 2 2\n", "Winston_Groom 1 1 1 1 1 1 1 1 1 1\n", "Witold_Gombrowicz 2 2 2 2 2 2 2 2 2 2\n", "Wm_Paul_Young 1 1 1 1 1 1 1 1 1 1\n", "Woody_Allen 1 1 1 1 1 1 1 1 1 1\n", "Wu_Cheng_en 1 1 1 1 1 1 1 1 1 1\n", "Yamamoto_Tsunetomo 1 1 1 1 1 1 1 1 1 1\n", "Yana_Toboso 1 1 1 1 1 1 1 1 1 1\n", "Yann_Martel 2 2 2 2 2 2 2 2 2 2\n", "Yasunari_Kawabata 1 1 1 1 1 1 1 1 1 1\n", "Yevgeny_Zamyatin 1 1 1 1 1 1 1 1 1 1\n", "Young_Kim 1 1 1 1 1 1 1 1 1 1\n", "Yuehai_Xiao 1 1 1 1 1 1 1 1 1 1\n", "Yukio_Mishima 2 2 2 2 2 2 2 2 2 2\n", "Yukito_Kishiro 1 1 1 1 1 1 1 1 1 1\n", "Yvonne_Woon 1 1 1 1 1 1 1 1 1 1\n", "Zack_Love 1 1 1 1 1 1 1 1 1 1\n", "Zadie_Smith 2 2 2 2 2 2 2 2 2 2\n", "Zilpha_Keatley_Snyder 1 1 1 1 1 1 1 1 1 1\n", "Zora_Neale_Hurston 1 1 1 1 1 1 1 1 1 1\n", "_ 42 42 42 42 42 42 42 42 42 42\n", "_gota_Krist_f 1 1 1 1 1 1 1 1 1 1\n", "_mile_Zola 4 4 4 4 4 4 4 4 4 4\n", "_ric_Emmanuel_Schmitt 1 1 1 1 1 1 1 1 1 1\n", "_sne_Seierstad 1 1 1 1 1 1 1 1 1 1\n", "\n", "[2645 rows x 10 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfgb_author.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lots of useless info there. One column should suffice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise:\n", "\n", "- Group the dataframe by `author`. Include the following columns: `rating`, `name`, `author`. For the aggregation of the `name` column which includes the names of the books create a list with the strings containing the name of each book. Make sure that the way you aggregate the rest of the columns make sense! \n", "\n", "- Create a new column with number of books for each author and find the most prolific author!" 
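, "\n", "\n", "
One possible way to approach this exercise, sketched on a made-up frame rather than the Goodreads data: named aggregation lets each output column state its source column and its aggregation function, and aggregating `name` with `list` collects the titles directly. The names `toy`, `n_books`, and `most_prolific` are assumptions for illustration.

```python
import pandas as pd

# Made-up frame with the three columns the exercise asks for.
toy = pd.DataFrame({
    'author': ['A', 'A', 'B'],
    'rating': [4.0, 5.0, 3.0],
    'name': ['Book 1', 'Book 2', 'Book 3'],
})

authors = (
    toy.groupby('author')
       .agg(rating=('rating', 'mean'),    # average rating per author
            name=('name', list),          # list of that author's titles
            n_books=('name', 'count'))    # number of titles per author
       .reset_index()
)

# Row label of the largest count, then the author stored in that row.
most_prolific = authors.loc[authors['n_books'].idxmax(), 'author']
print(authors)
print(most_prolific)
```

Collecting the titles with `list` sidesteps joining on '|' and splitting again, which could split a title that happens to contain that character. Whether the mean is the right summary for `rating` is a judgment call the exercise leaves open.
"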
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " rating review_count isbn booktype author_url year dir rating_count name author genres\n", "2213 4.19 1169 good_reads:book https://www.goodreads.com/author/show/1201952.... 2003 dir23/2750180.html 15781 لا تحزن A_id_al_Qarni religion|religion|islam|self-help|non-fiction|...\n", "5998 3.91 281 good_reads:book https://www.goodreads.com/author/show/1201952.... 2006 dir60/2750008.html 3083 أسعد اÙ\n", "رأة في العالÙ\n", " A_id_al_Qarni religion|islam|religion|self-help|spirituality..." ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "###### Before we start : what do we do about these titles where 'name' is unreadable? Try different encodings?\n", "auth_name = 'A_id_al_Qarni'\n", "df[df.author == auth_name].head()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'\\xff\\xfe\\xd9\\x00\\x84\\x00\\xd8\\x00\\xa7\\x00 \\x00\\xd8\\x00\\xaa\\x00\\xd8\\x00\\xad\\x00\\xd8\\x00\\xb2\\x00\\xd9\\x00\\x86\\x00'" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.author == auth_name].iat[0,8].encode('UTF-16')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['rating', 'review_count', 'isbn', 'booktype', 'author_url', 'year', 'dir', 'rating_count', 'name', 'author', 'genres'], dtype='object')" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# let's examine the columns we have\n", "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the GroupBy table" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "authors = df.copy()\n", "authors = authors[['rating','name','author']].groupby('author').agg({'rating' : np.mean,\n", " 'name' : '|'.join})" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { 
"text/html": [ "
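<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>author</th>\n", "      <th>rating</th>\n", "      <th>name</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A_A_Milne</td>\n", "      <td>4.365</td>\n", "      <td>Winnie-the-Pooh|The House at Pooh Corner|The H...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A_G_Howard</td>\n", "      <td>4.020</td>\n", "      <td>Splintered (Splintered, #1)</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A_J_Cronin</td>\n", "      <td>4.220</td>\n", "      <td>The Keys of the Kingdom</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A_J_Jacobs</td>\n", "      <td>3.750</td>\n", "      <td>The Year of Living Biblically</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A_J_Salt</td>\n", "      <td>4.940</td>\n", "      <td>Nik Nassa &amp; the Mark of Destiny</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>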
" ], "text/plain": [ " author rating name\n", "0 A_A_Milne 4.365 Winnie-the-Pooh|The House at Pooh Corner|The H...\n", "1 A_G_Howard 4.020 Splintered (Splintered, #1)\n", "2 A_J_Cronin 4.220 The Keys of the Kingdom\n", "3 A_J_Jacobs 3.750 The Year of Living Biblically\n", "4 A_J_Salt 4.940 Nik Nassa & the Mark of Destiny" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "authors = authors.reset_index()\n", "authors.head()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
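<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>author</th>\n", "      <th>rating</th>\n", "      <th>name</th>\n", "    </tr>\n", "  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>A_A_Milne</td>\n", "      <td>4.365</td>\n", "      <td>[Winnie-the-Pooh, The House at Pooh Corner, Th...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>A_G_Howard</td>\n", "      <td>4.020</td>\n", "      <td>[Splintered (Splintered, #1)]</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>A_J_Cronin</td>\n", "      <td>4.220</td>\n", "      <td>[The Keys of the Kingdom]</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>A_J_Jacobs</td>\n", "      <td>3.750</td>\n", "      <td>[The Year of Living Biblically]</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>A_J_Salt</td>\n", "      <td>4.940</td>\n", "      <td>[Nik Nassa &amp; the Mark of Destiny]</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>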
" ], "text/plain": [ " author rating name\n", "0 A_A_Milne 4.365 [Winnie-the-Pooh, The House at Pooh Corner, Th...\n", "1 A_G_Howard 4.020 [Splintered (Splintered, #1)]\n", "2 A_J_Cronin 4.220 [The Keys of the Kingdom]\n", "3 A_J_Jacobs 3.750 [The Year of Living Biblically]\n", "4 A_J_Salt 4.940 [Nik Nassa & the Mark of Destiny]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# split the column string and make a list of string book names\n", "authors['name'] = authors.name.str.split('|')\n", "authors.head()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the books - create new column\n", "len(authors.name[0])" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " author rating name num_books\n", "0 A_A_Milne 4.365000 [Winnie-the-Pooh, The House at Pooh Corner, Th... 6\n", "1 A_G_Howard 4.020000 [Splintered (Splintered, #1)] 1\n", "2 A_J_Cronin 4.220000 [The Keys of the Kingdom] 1\n", "3 A_J_Jacobs 3.750000 [The Year of Living Biblically] 1\n", "4 A_J_Salt 4.940000 [Nik Nassa & the Mark of Destiny] 1\n", "5 A_Meredith_Walters 4.150000 [Find You in the Dark (Find You in the Dark, #... 2\n", "6 A_N_Roquelaure 3.450000 [Beauty's Punishment (Sleeping Beauty, #2), Th... 2\n", "7 A_S_Byatt 3.860000 [Possession] 1\n", "8 A_S_King 3.930000 [Please Ignore Vera Dietz] 1\n", "9 A_id_al_Qarni 4.050000 [لا تحزن, أسعد اÙ\n", "رأة في الØ... 2\n", "10 Abbi_Glines 4.179286 [Fallen Too Far (Too Far, #1), Existence (Exis... 14\n", "11 Abdul_Rahman_Munif 4.030000 [التيه] 1\n", "12 Abigail_Gibbs 3.820000 [Dinner With a Vampire (Dark Heroine, #1)] 1\n", "13 Abigail_Roux 4.470000 [Cut & Run (Cut & Run, #1), Fish & Chips (Cut ... 4\n", "14 Abigail_Thomas 3.680000 [A Three Dog Life] 1\n", "15 Abolqasem_Ferdowsi 4.520000 [Shahnameh] 1\n", "16 Abraham_Verghese 4.260000 [Cutting for Stone] 1\n", "17 Abul_Hasan_Ali_Nadwi 4.150000 [Ù\n", "اذا خسر العالÙ\n", " بانحطاط Ø... 1\n", "18 Adam_Hochschild 4.140000 [King Leopold's Ghost] 1\n", "19 Adam_Johnson 4.030000 [The Orphan Master's Son] 1\n", "20 Adam_Levin 4.040000 [The Instructions] 1\n", "21 Adam_Rex 4.140000 [The True Meaning of Smekday] 1\n", "22 Adam_Smith 3.820000 [The Wealth of Nations] 1\n", "23 Addison_Moore 3.780000 [Ethereal (Celestra, #1)] 1\n", "24 Adeline_Yen_Mah 4.010000 [Chinese Cinderella] 1\n", "25 Adolf_Hitler 2.970000 [Mein Kampf] 1\n", "26 Adolfo_Bioy_Casares 4.060000 [The Invention of Morel] 1\n", "27 Aeschylus 3.960000 [The Oresteia] 1\n", "28 Aesop 4.030000 [Aesop's Fables] 1\n", "29 Agatha_Christie 3.977273 [And Then There Were None, Murder on the Orien... 11\n", "... ... ... ... 
...\n", "2615 William_Strunk_Jr_ 4.170000 [The Elements of Style] 1\n", "2616 William_Styron 4.043333 [Sophie's Choice, The Confessions of Nat Turne... 3\n", "2617 William_Wharton 4.090000 [Birdy, A Midnight Clear] 2\n", "2618 William_Wordsworth 3.920000 [Lyrical Ballads] 1\n", "2619 Willow_Aster 4.140000 [True Love Story] 1\n", "2620 Wilson_Rawls 3.990000 [Where the Red Fern Grows, Summer of the Monkeys] 2\n", "2621 Winston_Groom 3.970000 [Forrest Gump (Forrest Gump, #1)] 1\n", "2622 Witold_Gombrowicz 3.975000 [Ferdydurke, Cosmos] 2\n", "2623 Wm_Paul_Young 3.660000 [The Shack] 1\n", "2624 Woody_Allen 4.030000 [Without Feathers] 1\n", "2625 Wu_Cheng_en 4.040000 [Monkey] 1\n", "2626 Yamamoto_Tsunetomo 4.090000 [Hagakure] 1\n", "2627 Yana_Toboso 4.340000 [Black Butler, Vol. 01 (Black Butler, #1)] 1\n", "2628 Yann_Martel 3.470000 [Life of Pi, Beatrice and Virgil] 2\n", "2629 Yasunari_Kawabata 3.740000 [Snow Country] 1\n", "2630 Yevgeny_Zamyatin 3.970000 [We] 1\n", "2631 Young_Kim 3.660000 [Twilight (Twilight: The Graphic Novel, #1)] 1\n", "2632 Yuehai_Xiao 4.560000 [Crossing the Seas] 1\n", "2633 Yukio_Mishima 4.030000 [The Sailor Who Fell from Grace with the Sea, ... 2\n", "2634 Yukito_Kishiro 4.150000 [Battle Angel Alita, Volume 01] 1\n", "2635 Yvonne_Woon 3.950000 [Dead Beautiful (Dead Beautiful, #1)] 1\n", "2636 Zack_Love 3.550000 [Sex in the Title] 1\n", "2637 Zadie_Smith 3.655000 [White Teeth, On Beauty] 2\n", "2638 Zilpha_Keatley_Snyder 4.110000 [The Changeling] 1\n", "2639 Zora_Neale_Hurston 3.820000 [Their Eyes Were Watching God] 1\n", "2640 _ 3.988095 [عزازيل, ثلاثية غرناطة, تر... 42\n", "2641 _gota_Krist_f 4.340000 [The Notebook, The Proof, The Third Lie] 1\n", "2642 _mile_Zola 3.990000 [Germinal (Les Rougon-Macquart, #13), L'Assomm... 
4\n", "2643 _ric_Emmanuel_Schmitt 4.160000 [Oscar et la dame rose] 1\n", "2644 _sne_Seierstad 3.740000 [The Bookseller of Kabul] 1\n", "\n", "[2645 rows x 4 columns]" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "authors['num_books'] = authors['name'].str.len()\n", "authors" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "author Stephen_King\n", "rating 3.91875\n", "name [The Stand, The Shining (The Shining #1), It, ...\n", "num_books 56\n", "Name: 2349, dtype: object" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort by number of books to find the most prolific author\n", "authors.sort_values(by='num_books', ascending=False).iloc[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The winner is Stephen King, with 56 books!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more detailed per-author summary statistics, call `describe()` on the `GroupBy` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# group the books by author (re-created here in case it is not already defined)\n", "dfgb_author = df.groupby('author')\n", "dfgb_author[['rating', 'rating_count', 'review_count', 'year']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also iterate over a `GroupBy` object dictionary-style: each iteration yields a group key and the sub-dataframe for that group." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# map each author to the (mean, standard deviation) of their book ratings\n", "ratingdict = {}\n", "for author, subset in dfgb_author:\n", "    ratingdict[author] = (subset['rating'].mean(), subset['rating'].std())\n", "ratingdict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Exercise
\n", "\n", "Let's get the best-rated book(s) for every year in our dataframe." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using .groupby, we can divide the dataframe into subsets by the values of 'year'.\n", "# We can then iterate over these subsets.\n", "# your code here\n", "for year, subset in df.groupby('year'):\n", "    # Find the best-rated book(s) of the year; ties give more than one row\n", "\n", "    bestbook = subset[subset.rating == subset.rating.max()]\n", "    if bestbook.shape[0] > 1:\n", "        print(year, bestbook.name.values, bestbook.rating.values)\n", "    else:\n", "        print(year, bestbook.name.values[0], bestbook.rating.values[0])" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }