CS109A Introduction to Data Science

Lab 2 Scraping

Harvard University
Summer 2018
Instructors: Pavlos Protopapas and Kevin Rader
Lab Instructors: Rahul Dave
Authors: Rahul Dave, David Sondak, Will Claybaugh and Pavlos Protopapas


In [1]:
## RUN THIS CELL TO GET THE RIGHT FORMATTING 
from IPython.display import HTML
def css_styling():
    # the with-block closes the stylesheet file once it has been read
    with open("../../styles/cs109.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()
Out[1]:
In [2]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns  # seaborn.apionly is deprecated; importing seaborn itself no longer sets a default style
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
In [3]:
import time, requests

In this lab, we'll scrape Goodreads' Best Books list:

https://www.goodreads.com/list/show/1.Best_Books_Ever?page=1

We'll walk through scraping the list pages for the book names and URLs, and then parsing an individual book page.

Table of Contents

  1. Learning Goals
  2. Exploring the Web pages and downloading them
  3. Parse the page, extract book urls
  4. Parse a book page, extract book properties
  5. Set up a pipeline for fetching and parsing

Learning Goals

Understand the structure of a web page. Use Beautiful Soup to scrape content from these web pages.

This lab corresponds to lectures 2, 3, and 4 and maps onto homework 1 and beyond.
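The core Beautiful Soup move in this lab is selecting elements by CSS class and reading their attributes. As a minimal preview, the sketch below runs on a tiny hand-written HTML snippet. The markup is an assumption loosely modeled on a list page and is not copied from Goodreads. The sketch also assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for a list page (an assumption, NOT real Goodreads markup).
html = """
<table>
  <tr><td><a class="bookTitle" href="/book/show/1.Example_One"><span>Example One</span></a></td></tr>
  <tr><td><a class="bookTitle" href="/book/show/2.Example_Two"><span>Example Two</span></a></td></tr>
  <tr><td><a class="authorName" href="/author/show/3.Someone"><span>Someone</span></a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# ".bookTitle" is a CSS class selector: it matches only the two book links, not the author link
links = [a["href"] for a in soup.select(".bookTitle")]
# get_text(strip=True) collects the text inside each link, including nested tags
titles = [a.get_text(strip=True) for a in soup.select("a.bookTitle")]
print(links)
print(titles)
```

The same select/attribute pattern is what the list-page parsing in section 2 relies on.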

1. Exploring the web pages and downloading them

We're going to explore the structure of Goodreads' best books list using the browser's developer tools. We'll use Chrome's, but Safari and Firefox have similar tools available.

To fetch this page we use the requests module. But are we allowed to do this? Let's check:

https://www.goodreads.com/robots.txt

Yes, we are.
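Checking robots.txt by eye works for one site. Python's standard library can also do the check programmatically with urllib.robotparser. To keep it runnable offline, the sketch below parses a small, made-up rule set rather than the real Goodreads file. Against a live site you would call set_url() and read() instead of parse(). The helper name is_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Goodreads' actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /search
"""

def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

list_url = "https://www.goodreads.com/list/show/1.Best_Books_Ever?page=1"
search_url = "https://www.goodreads.com/search?q=1984"
print(is_allowed(SAMPLE_ROBOTS_TXT, list_url))    # allowed under these sample rules
print(is_allowed(SAMPLE_ROBOTS_TXT, search_url))  # disallowed under these sample rules
```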

In [4]:
URLSTART="https://www.goodreads.com"
BESTBOOKS="/list/show/1.Best_Books_Ever?page="
url = URLSTART+BESTBOOKS+'1'
print(url)
page = requests.get(url)
https://www.goodreads.com/list/show/1.Best_Books_Ever?page=1

We can inspect properties of the response. The most relevant are status_code and text. The former tells us whether the page was found and the request succeeded (200 means OK). The latter holds the page's HTML as a string. (See lecture notes.)
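A status code is easier to read alongside its standard reason phrase. The small helper below uses the standard library's http.HTTPStatus enum to label a code, and it treats any 2xx code as success; describe_status is a hypothetical name. For a requests response you can also use the built-in page.ok, which is True for any status below 400. Another option is page.raise_for_status(), which raises an exception for 4xx and 5xx codes.

```python
from http import HTTPStatus

def describe_status(code):
    """Return (label, succeeded) for an HTTP status code; 2xx counts as success."""
    label = f"{code} {HTTPStatus(code).phrase}"  # raises ValueError for codes HTTPStatus doesn't know
    return label, 200 <= code < 300

for code in (200, 404, 503):
    print(describe_status(code))
```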

In [5]:
page.status_code # 200 is good
Out[5]:
200
In [6]:
page.text[:5000]
Out[6]:
'\n\n\n\n\n  \nBest Books Ever (54400 books)\n\n\n\n    <script type="text/javascript"> var ue_t0=window.ue_t0||+new Date();\n </script>\n  <script type="text/javascript">\n    (function(e){var c=e;var a=c.ue||{};a.main_scope="mainscopecsm";a.q=[];a.t0=c.ue_t0||+new Date();a.d=g;function g(h){return +new Date()-(h?0:a.t0)}function d(h){return function(){a.q.push({n:h,a:arguments,t:a.d()})}}function b(m,l,h,j,i){var k={m:m,f:l,l:h,c:""+j,err:i,fromOnError:1,args:arguments};c.ueLogError(k);return false}b.skipTrace=1;e.onerror=b;function f(){c.uex("ld")}if(e.addEventListener){e.addEventListener("load",f,false)}else{if(e.attachEvent){e.attachEvent("onload",f)}}a.tag=d("tag");a.log=d("log");a.reset=d("rst");c.ue_csm=c;c.ue=a;c.ueLogError=d("err");c.ues=d("ues");c.uet=d("uet");c.uex=d("uex");c.uet("ue")})(window);(function(e,d){var a=e.ue||{};function c(g){if(!g){return}var f=d.head||d.getElementsByTagName("head")[0]||d.documentElement,h=d.createElement("script");h.async="async";h.src=g;f.insertBefore(h,f.firstChild)}function b(){var k=e.ue_cdn||"z-ecx.images-amazon.com",g=e.ue_cdns||"images-na.ssl-images-amazon.com",j="/images/G/01/csminstrumentation/",h=e.ue_file||"ue-full-11e51f253e8ad9d145f4ed644b40f692._V1_.js",f,i;if(h.indexOf("NSTRUMENTATION_FIL")>=0){return}if("ue_https" in e){f=e.ue_https}else{f=e.location&&e.location.protocol=="https:"?1:0}i=f?"https://":"http://";i+=f?g:k;i+=j;i+=h;c(i)}if(!e.ue_inline){if(a.loadUEFull){a.loadUEFull()}else{b()}}a.uels=c;e.ue=a})(window,document);\n\n    if (window.ue && window.ue.tag) { window.ue.tag(\'list:show:signed_out\', ue.main_scope);window.ue.tag(\'list:show:signed_out:desktop\', ue.main_scope); }\n  </script>\n\n\n\n          <script type="text/javascript">\n        if (window.Mobvious === undefined) {\n          window.Mobvious = {};\n        }\n        window.Mobvious.device_type = \'desktop\';\n        </script>\n\n\n  \n<script 
src="https://s.gr-assets.com/assets/webfontloader-d12a76562d1fabc96ea2ca165a5f46ae.js"></script>\n<script>\n//<![CDATA[\n\n  WebFont.load({\n    classes: false,\n    custom: {\n      families: ["Lato:n4,n7,i4", "Merriweather:n4,n7,i4"],\n      urls: ["https://s.gr-assets.com/assets/gr/fonts-e256f84093cc13b27f5b82343398031a.css"]\n    }\n  });\n\n//]]>\n</script>\n\n  <link rel="stylesheet" media="all" href="https://s.gr-assets.com/assets/goodreads-bad0777cf7270fb59d36c8f13c6d96b9.css" />\n<!--[if lte IE 9]>\n<link rel="stylesheet" media="all" href="https://s.gr-assets.com/assets/goodreads_split2-48a5db1c55dcda6ab475ad6619dd5a52.css" />\n<![endif]-->\n    <style type="text/css" media="screen">\n    .bigTabs {\n      margin-bottom: 10px;\n    }\n\n    .list_read{\n      background-color: #D7D2C4;\n      float: left;\n    }\n  </style>\n\n\n  <!--[if lt IE 9]>\n    <link rel="stylesheet" media="screen" href="https://s.gr-assets.com/assets/ie-243e73b5c2539766a7d02b37fdf277f8.css" />\n  <![endif]-->\n\n  <!--[if lt IE 8]>\n    <link rel="stylesheet" media="screen" href="https://s.gr-assets.com/assets/common_images_ie-5b0cc205c33756d8e7e84ce5b8be28e3.css" />\n  <![endif]-->\n\n  <!--[if gte IE 8]><!-->\n  <link rel="stylesheet" media="screen" href="https://s.gr-assets.com/assets/common_images-67dd60a681af04727cf7d18f5c5ba498.css" />\n  <!--<![endif]-->\n\n  <script src="https://s.gr-assets.com/assets/desktop/libraries-1c2883cf9caed935dcb88219fd720a9c.js"></script>\n  <script src="https://s.gr-assets.com/assets/application-a59f4ee39388be54746bddc73dd37c81.js"></script>\n\n    <script>\n  //<![CDATA[\n    var gptAdSlots = gptAdSlots || [];\n    var googletag = googletag || {};\n    googletag.cmd = googletag.cmd || [];\n    (function() {\n      var gads = document.createElement("script");\n      gads.async = true;\n      gads.type = "text/javascript";\n      var useSSL = "https:" == document.location.protocol;\n      gads.src = (useSSL ? 
"https:" : "http:") +\n      "//www.googletagservices.com/tag/js/gpt.js";\n      var node = document.getElementsByTagName("script")[0];\n      node.parentNode.insertBefore(gads, node);\n    })();\n    // page settings\n  //]]>\n</script>\n<script>\n  //<![CDATA[\n    googletag.cmd.push(function() {\n      googletag.pubads().setTargeting("sid", "osid.77da3382ab59401c0cd7edf0f2809352");\n    googletag.pubads().setTargeting("grsession", "osid.77da3382ab59401c0cd7edf0f2809352");\n    googletag.pubads().setTargeting("surface", "desktop");\n    googletag.pubads().setTargeting("signedin", "false");\n    googletag.pubads().setTargeting("gr_author", "false");\n    googletag.pubads().setTargeting("author", []);\n    googletag.pubads().setTargeting("shelf", ["best","earliestlist","favourites","fiction","writer"]);\n    googletag.pubads().setTargeting("gtargeting", "1z141z5");\n    googletag.pubads().setTargeting("resource", "List_1");\n      googletag.pubads().enableAsyncRendering();\n      googletag.pubads().enableSingleRequest();\n      googletag.pubads().collapseEmptyDivs(true);\n      googletag.pubads().disableInitialLoad();\n      googletag.enableServices();\n    });\n  /'</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
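<p>Let us write a loop to fetch two pages of "best books" from Goodreads and save each one to a local file (the <code>files/</code> directory must already exist). Notice the format string <code>'%02d' % i</code>, which zero-pads the page number to two digits (<code>page01.html</code>, <code>page02.html</code>). This is an example of old-style Python <code>%</code> formatting. The <code>time.sleep(2)</code> call pauses between requests so that we do not hammer the server.</p>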
</div>
</div>
</div>
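<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For comparison, here is the same zero-padded file name built three ways: with the old-style <code>%</code> operator used in the loop, with <code>str.format</code>, and with an f-string (Python 3.6+). Because <code>%</code> binds more tightly than <code>+</code>, the loop's expression needs no parentheses. The page number 7 is just an illustrative value.</p>
<pre>
```python
i = 7
old_style = "files/page" + "%02d" % i + ".html"    # %-formatting; % is applied before +
format_method = "files/page{:02d}.html".format(i)  # str.format
f_string = f"files/page{i:02d}.html"               # f-string (Python 3.6+)
print(old_style, format_method, f_string)
```
</pre>
</div>
</div>
</div>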
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [7]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="n">URLSTART</span><span class="o">=</span><span class="s2">"https://www.goodreads.com"</span>
<span class="n">BESTBOOKS</span><span class="o">=</span><span class="s2">"/list/show/1.Best_Books_Ever?page="</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">3</span><span class="p">):</span>
    <span class="n">bookpage</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="n">stuff</span><span class="o">=</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">URLSTART</span><span class="o">+</span><span class="n">BESTBOOKS</span><span class="o">+</span><span class="n">bookpage</span><span class="p">)</span>
    <span class="n">filetowrite</span><span class="o">=</span><span class="s2">"files/page"</span><span class="o">+</span> <span class="s1">'</span><span class="si">%02d</span><span class="s1">'</span> <span class="o">%</span> <span class="n">i</span> <span class="o">+</span> <span class="s2">".html"</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"FTW"</span><span class="p">,</span> <span class="n">filetowrite</span><span class="p">)</span>
    <span class="n">fd</span><span class="o">=</span><span class="nb">open</span><span class="p">(</span><span class="n">filetowrite</span><span class="p">,</span><span class="s2">"w"</span><span class="p">)</span>
    <span class="n">fd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">stuff</span><span class="o">.</span><span class="n">text</span><span class="p">)</span>
    <span class="n">fd</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="prompt"></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>FTW files/page01.html
FTW files/page02.html
</pre>
</div>
</div>
</div>
</div>
</div>
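<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The fetch-and-pause pattern in this loop can be packaged as a reusable function. The sketch below is not the lab's code. It defines <code>fetch_pages</code>, a hypothetical helper that keeps only responses with status 200. The fetcher and the sleep function are passed in as parameters. This design choice lets you pass <code>requests.get</code> in practice while testing offline with stand-ins, as the demo does.</p>
<pre>
```python
import time
from types import SimpleNamespace

def fetch_pages(urls, fetch, delay=2.0, sleep=time.sleep):
    """Fetch each URL in turn with fetch(url), pausing `delay` seconds between requests.

    Returns {url: text} for responses whose status_code is 200; other URLs are skipped.
    """
    pages = {}
    for n, url in enumerate(urls):
        if n:  # no pause is needed before the first request
            sleep(delay)
        response = fetch(url)
        if response.status_code == 200:
            pages[url] = response.text
    return pages

# Offline demo: a fake fetcher and a recording "sleep" (hypothetical stand-ins).
def fake_fetch(url):
    if url.endswith("page=1"):
        return SimpleNamespace(status_code=200, text="list page 1")
    return SimpleNamespace(status_code=404, text="")

pauses = []
result = fetch_pages(
    ["https://www.goodreads.com/list/show/1.Best_Books_Ever?page=1",
     "https://www.goodreads.com/list/show/1.Best_Books_Ever?page=2"],
    fetch=fake_fetch, sleep=pauses.append)
print(result, pauses)
```
</pre>
<p>With the real library this would be called as <code>fetch_pages(urls, requests.get)</code>.</p>
</div>
</div>
</div>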
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="2.-Parse-the-page,-extract-book-urls">2. Parse the page, extract book urls<a class="anchor-link" href="#2.-Parse-the-page,-extract-book-urls">¶</a></h2><p>Notice how we do file input/output and use Beautiful Soup in the code below. The <code>with</code> construct ensures that the file being read is closed automatically. For the file being written, we call <code>close()</code> explicitly. We use the CSS selector <code>.bookTitle</code> to find the elements with class <code>bookTitle</code>, extract their <code>href</code> URLs, and write them to a file, one per line.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [8]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="kn">from</span> <span class="nn">bs4</span> <span class="k">import</span> <span class="n">BeautifulSoup</span>
</pre></div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [9]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="n">bookdict</span><span class="o">=</span><span class="p">{}</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">3</span><span class="p">):</span>
    <span class="n">books</span><span class="o">=</span><span class="p">[]</span>
    <span class="n">stri</span> <span class="o">=</span> <span class="s1">'</span><span class="si">%02d</span><span class="s1">'</span> <span class="o">%</span> <span class="n">i</span>
    <span class="n">filetoread</span><span class="o">=</span><span class="s2">"files/page"</span><span class="o">+</span> <span class="n">stri</span> <span class="o">+</span> <span class="s1">'.html'</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"FTW"</span><span class="p">,</span> <span class="n">filetoread</span><span class="p">)</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filetoread</span><span class="p">)</span> <span class="k">as</span> <span class="n">fdr</span><span class="p">:</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">fdr</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
    <span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s1">'html.parser'</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">soup</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">'.bookTitle'</span><span class="p">):</span>
        <span class="n">books</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'href'</span><span class="p">])</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">books</span><span class="p">[:</span><span class="mi">10</span><span class="p">])</span>
    <span class="n">bookdict</span><span class="p">[</span><span class="n">stri</span><span class="p">]</span><span class="o">=</span><span class="n">books</span>
    <span class="n">fd</span><span class="o">=</span><span class="nb">open</span><span class="p">(</span><span class="s2">"files/list"</span><span class="o">+</span><span class="n">stri</span><span class="o">+</span><span class="s2">".txt"</span><span class="p">,</span><span class="s2">"w"</span><span class="p">)</span>
    <span class="n">fd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">books</span><span class="p">))</span>
    <span class="n">fd</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="prompt"></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>FTW files/page01.html
['/book/show/2767052-the-hunger-games', '/book/show/2.Harry_Potter_and_the_Order_of_the_Phoenix', '/book/show/2657.To_Kill_a_Mockingbird', '/book/show/1885.Pride_and_Prejudice', '/book/show/41865.Twilight', '/book/show/19063.The_Book_Thief', '/book/show/11127.The_Chronicles_of_Narnia', '/book/show/7613.Animal_Farm', '/book/show/18405.Gone_with_the_Wind', '/book/show/30.J_R_R_Tolkien_4_Book_Boxed_Set']
FTW files/page02.html
['/book/show/5470.1984', '/book/show/4989.The_Red_Tent', '/book/show/37435.The_Secret_Life_of_Bees', '/book/show/5.Harry_Potter_and_the_Prisoner_of_Azkaban', '/book/show/7171637-clockwork-angel', '/book/show/2187.Middlesex', '/book/show/2623.Great_Expectations', '/book/show/24583.The_Adventures_of_Tom_Sawyer', '/book/show/49552.The_Stranger', '/book/show/16299.And_Then_There_Were_None']
</pre>
</div>
</div>
</div>
</div>
</div>
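<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The cell above closes the file it writes by hand with <code>fd.close()</code>. As an alternative sketch, a <code>with</code> block can manage the written file too, which also guarantees it is closed if an exception occurs mid-write. The helper names <code>write_url_list</code> and <code>read_url_list</code> are hypothetical. Passing <code>encoding="utf-8"</code> explicitly is an assumption on our part, since the default encoding varies by platform. The demo uses a temporary directory so it does not touch <code>files/</code>.</p>
<pre>
```python
import os
import tempfile

def write_url_list(urls, path):
    """Write one URL per line; the with-block closes the file even if write() raises."""
    with open(path, "w", encoding="utf-8") as fd:
        fd.write("\n".join(urls))

def read_url_list(path):
    """Read the URLs back as a list, skipping blank lines."""
    with open(path, encoding="utf-8") as fdr:
        return [line for line in fdr.read().splitlines() if line]

# Round-trip two URLs from page 2 through a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "list02.txt")
    write_url_list(["/book/show/5470.1984", "/book/show/4989.The_Red_Tent"], path)
    urls_back = read_url_list(path)

print(urls_back)
```
</pre>
</div>
</div>
</div>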
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Here is the relative URL of George Orwell's <em>1984</em>, the first book on page 2:</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [10]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="n">bookdict</span><span class="p">[</span><span class="s1">'02'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="prompt output_prompt">Out[10]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>'/book/show/5470.1984'</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Let's look at the first URLs on both pages.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><img alt="" src="images/goodreads2.png"></p>
</div>
</div>
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="3.-Parse-a-book-page,-extract-book-properties">3. Parse a book page, extract book properties<a class="anchor-link" href="#3.-Parse-a-book-page,-extract-book-properties">¶</a></h2><p>Now let's take one of these book URLs, fetch the page, and parse it.</p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [11]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="n">furl</span><span class="o">=</span><span class="n">URLSTART</span><span class="o">+</span><span class="n">bookdict</span><span class="p">[</span><span class="s1">'02'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="n">furl</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="prompt output_prompt">Out[11]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>'https://www.goodreads.com/book/show/5470.1984'</pre>
</div>
</div>
</div>
</div>
</div>
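<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The cell above builds the absolute URL by string concatenation. This works because the hrefs we collected all start with a slash. The standard library's <code>urllib.parse.urljoin</code> does the same job and also handles hrefs that are already absolute or that are relative to the current page. The sketch below checks that both approaches agree here.</p>
<pre>
```python
from urllib.parse import urljoin

URLSTART = "https://www.goodreads.com"
relative = "/book/show/5470.1984"  # the first entry of bookdict['02'] shown above

concatenated = URLSTART + relative    # what the cell above does
joined = urljoin(URLSTART, relative)  # standard-library URL resolution
print(concatenated == joined, joined)

# urljoin also copes with hrefs that are already absolute:
other = urljoin(URLSTART, "https://example.com/other")
print(other)
```
</pre>
</div>
</div>
</div>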
<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><img alt="" src="images/goodreads3.png"></p>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [12]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="n">fstuff</span><span class="o">=</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">furl</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">fstuff</span><span class="o">.</span><span class="n">status_code</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="prompt"></div>
<div class="output_subarea output_stream output_stdout output_text">
<pre>200
</pre>
</div>
</div>
</div>
</div>
</div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [13]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="n">d</span><span class="o">=</span><span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">fstuff</span><span class="o">.</span><span class="n">text</span><span class="p">,</span> <span class="s1">'html.parser'</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
</div>
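<div class="cell border-box-sizing text_cell rendered"><div class="prompt input_prompt">
</div><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>With the page parsed, individual properties can be pulled out. One hedged approach for the rating figures (4.16 average, 2,416,552 ratings, 53,688 reviews in the output below) is a regular expression over the page's visible text, for example <code>d.get_text()</code>. This sketch assumes that those three numbers appear in that order, with only non-digit text between them. Targeting the specific HTML elements with <code>d.select(...)</code> is usually more robust, but their class names would need to be checked in the developer tools first. <code>parse_rating_stats</code> is a hypothetical helper.</p>
<pre>
```python
import re

# Assumption: the visible page text contains the average rating, then the number of
# ratings, then the number of reviews, separated only by non-digit text.
RATING_STATS = re.compile(r"(\d+\.\d+)\D+?([\d,]+)\s+Ratings\D+?([\d,]+)\s+Reviews")

def parse_rating_stats(text):
    """Return a dict with rating, num_ratings and num_reviews, or None if not found."""
    m = RATING_STATS.search(text)
    if m is None:
        return None
    avg, ratings, reviews = m.groups()
    return {
        "rating": float(avg),
        "num_ratings": int(ratings.replace(",", "")),  # drop thousands separators
        "num_reviews": int(reviews.replace(",", "")),
    }

# Whitespace-heavy sample mimicking the page text (hand-written, not fetched).
sample = "4.16\n \n · \n Rating details \n · \n 2,416,552\n Ratings\n · \n 53,688\n Reviews"
print(parse_rating_stats(sample))
```
</pre>
</div>
</div>
</div>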
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="prompt input_prompt">In [15]:</div>
<div class="inner_cell">
<div class="input_area">
<div class="highlight hl-ipython3"><pre><span></span><span class="n">d</span><span class="o">.</span><span class="n">prettify</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="output_area">
<div class="prompt output_prompt">Out[15]:</div>
<div class="output_text output_subarea output_execute_result">
<pre>'&lt;!DOCTYPE html&gt;\n&lt;html class="desktop "&gt;\n &lt;head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# good_reads: http://ogp.me/ns/fb/good_reads#"&gt;\n  &lt;title&gt;\n   1984 by George Orwell\n
1984\n
by\n
4.16 · 2,416,552 Ratings · 53,688 Reviews\n
Among the seminal texts of the 20th century, Nineteen Eighty-Four is a rare work that grows more haunting as its futuristic purgatory becomes more real. Published in 1949, the book offers political satirist George Orwell\'s nightmare vision of a totalitarian, bureaucratic world and one poor stiff\'s attempt to find individuality. The brilliance of the novel is Orwell\'s presc...\n
Mass Market Paperback, Signet Classics, 328 pages\n
Published July 1st 1950 by New American Library (first published June 8th 1949)\n
Reader Q&amp;A: Popular Answered Questions\n
Bigsoph: Maybe you are expecting a happy ending, where love triumphs over all and human spirit is set free?\n
If this was made into a movie now, they would likely cast Will Smith as Winston and have him overthrow the government and rescue Julia\n
Niklas: Big Brother is the personification of the state of Oceania. The book explains that the existence of Big Brother is necessary because it is easier to love a person than an organisation, and that the name "Big Brother" was selected because it plays on family loyalty.\n
Books That Everyone Should Read At Least Once: 19,317 books, 90,155 voters\n
Best Books of the 20th Century: 7,516 books, 47,874 voters\n
Community Reviews\n
Bill Kerwin · May 28, 2007 · rated it it was amazing\n
This book is far from perfect. Its characters lack depth, its rhetoric is sometimes didactic, its plot (well, half of it anyway) was lifted from Zumyatin’s We, and the lengthy Goldstein treatise shoved into the middle is a flaw which alters the structure of the novel like a scar disfigures a face.\n
But in the long run, all that does not matter, because George Orwell got it right.\n
Orwell, a socialist who fought against Franco, watched appalled as the great Soviet experiment was reduced to a totalita...\n
Silvana · Dec 07, 2007 · rated it it was amazing · review of another edition · Recommends it for: everyone\n
WAR IS PEACE.\n
FREEDOM IS SLAVERY.\n
IGNORANCE IS STRENGTH.\n
Those words keep sounding in my head since I read this book. Gosh, probably the most haunting not to mention frightening book I\'ve ever read. 1984 should also be included in the horror genre.\n
1984 describes a Utopia. Not Thomas More\'s version of Utopia, but this is one is the antithesis, i.e. Dystopia. Imagine living in a country, whose leaders apply a totalitarian system in regulating their citizen, in the most extreme ways, which make Hitler...\n
Dave · Jul 28, 2007 · rated it it was amazing\n
In George Orwell\'s 1984, Winston Smith is an open source developer who writes his code offline because his ISP has installed packet sniffers that are regulated by the government under the Patriot Act. It\'s really for his own protection, though. From, like, terrorists and DVD pirates and stuff. Like every good American, he drinks Coca-Cola and his processed food has desensitized his palate to all but four flavors: sweet, salty-so-that-you-will-drink-more-coca-cola, sweet, and Cooler Ranch!(tm). H...\n
John Wiswell · Aug 12, 2007 · rated it it was ok · Recommends it for: Classics readers, political readers concerned with overreaches\n
1984 is not a particularly good novel, but it is a very good essay. On the novel front, the characters are bland and you only care about them because of the awful things they live through. As a novel all the political exposition is heavyhanded, and the message completely overrides any sense of storytelling. As an essay, the points it makes can be earthshaking. It seems everyone who has so much as gotten a parking ticket thinks he lives in a 1984-dystopia. Every administration that reaches for po...\n
Mohammed Arabey · Feb 04, 2013 · rated it it was amazing · review of another edition\n
It\'s written 1948? Clearly History has its twisted ways to repeat itself..\n
A Note that MUST be written in the cover of every edition..\n
لم اتوقع أن هذا التحذير "إن هذه الرواية تحذير وليست بدليل" بهذه الواقعية، مازالت الحكومات العربية تراقب الجميع لحماية أمن الحكام..بينما مازال أمن الأفراد هزيلا..منعدما\n
هي الرواية التي كتبت في 1948 بعبقرية، أرسي بها جورج أورويل قواعد روايات الديستوبيا بحق\n
وإن كانت مستوحاه من واقع محيط به ولكن التاريخ دائما يجد وسيلة ليتكرر ويزيد وينتشر ويتوغل\n
هي رواية مازالت صداها في...\n
Huda Yahya · Apr 19, 2012 · rated it it was amazing · review of another edition · Shelves: dystopia, history, favourites\n
حدثني عن القهر\n
عن الاستعباد\n
عن الذل\n
ثم حدثني بأدق التفاصيل\n
عن مراحل تقويض الكائن الانساني\n
حدثني كثيرا وطويلا كي أعي هذا الدمار\n
كي أتشربه\n
كي أدميه في لحمي وأعصابي نصلا حادا طويلا\n
كي أنزف روحي بكاء\n
كي أتعلم شيئا نافعا قبل أن أغادر هذا العالم البائس\n
حدثني يا أورويل فما أشهى وجع حديثك\n
وما أشهى ألم المعرفة النازف\n
الحرب هي السلام\n
ما المفترض علي فعله\n
الكتابة عن نفسي أم عن وطني أم عن وينستون\n
بمن أبدأ\n
ولكن مهلا\n
لما التفرقة..؟\n
كلنا واحد\n
أنا.. وينستون..جوليا..أنت..\n
بقعة الأرض التي تنتمي إليها روحك\n
وتدعوها وطن\n
الحريّة هي العبود...\n
Obaydah Amer · Jun 03, 2011 · rated it really liked it · review of another edition\n
هذه الرواية لا يجب أن تصنف أدبا ورواية وفنا وحسب ..\n
كلا .. هي أكبر من ذلك,\n
يجب أن تدرس كحالة مستعصية في علم الاجتماع وعلم السياسة .. !\n
عندما كتب جورج أورويل في الـ1949 كان متشائما لدرجة بعيدة لما سيؤول له العالم بعد 1984\n
نعم، ربما لم يتحقق ما قاله أورويل تماما، لكنه قد حصل بصورة موازية وشبيهة في كثير من المجتمعات الشمولية\n
لنتحدث عن سوريا مثلا :\n
الأخ الكبير : القائد الملهم والرئيس الشاب الدكتور بشار الأسد ..\n
أعضاء الحزب الداخلي: بعض المكونات الرئيسية للنظام\n
أعضاء الحزب الخارجي: حزب البعث\n
العامة: 19 مليون...\n
Stephen · Aug 28, 2008 · rated it it was amazing\n
I am a big fan of speculative fiction and in my literary travels I have encountered a myriad of dystopias, anti-utopias and places and societies that make one want to scream and.....\n
... (with or without contemporaneous loss of bladder and other bodily functions) ....\n
Simply put, George Orwell\'s 1984 is unquestionably the most memorable and MOST DISTURBING vision of a world gone mad utterly bat-shit psycho that I have ever experienced. Ever!!! Despite being published back in 1948, I have yet to f...\n
Maria · Feb 19, 2018 · rated it it was amazing · review of another edition · Shelves: favorites\n
I\'m gonna ask myself a mandatory question and say nothing more.\n
Why the fuck had I not read this book before?\n
Lyndsey\n
YOU. ARE. THE. DEAD. Oh my God. I got the chills so many times toward the end of this book. It completely blew my mind. It managed to surpass my high expectations AND be nothing at all like I expected. Or in Newspeak "Double Plus Good."\n
Let me preface this with an apology. If I sound stunningly inarticulate at times in this review, I can\'t help it. My mind is completely fried.\n
This book is like the dystopian Lord of the Rings, with its richly developed culture and economics, not to mention a fully...\n
Amanda · Jun 06, 2008 · rated it it was amazing · review of another edition\n
I\'ve put off writing a review for 1984 because it\'s simply too daunting to do so. I liked 1984 even better after a second reading (bumping it up from a 4 star to a 5 star) because I think that, given the complexity of the future created by Orwell, multiple readings may be needed to take it all in. I thought it was genius the first time and appreciated that genius even more the second time.\n
Orwell had a daunting task: creating a future nearly half a century away from the time period in which he w...\n
Leo . · Nov 09, 2017 · rated it it was amazing\n
Is Orwell turning in his grave? Does his epitaph read. "I fucking warned you! Don\'t say I never told you so! "\n
Did he have a crystal fucking ball?\n
***\n
If you want truth, go out and see\n
Not like in 1984, Richard Burton on TV\n
Orwell must have been psychic, or was he in the know\n
Cos\' what\'s going on in the world clearly shows\n
That humanity is programmed through a TV screen\n
Since its conception, its all its ever been\n
News, films, dramas, sports, soaps and cartoons\n
Leaving the masses wide eyed, like Buffoons\n
Al...\n
Starjustin · Mar 27, 2017 · rated it it was amazing\n
\n
\n \n \n What can I possibly say about this amazing novel, 1984 by George Orwell, that hasn\'t been already said by many who have read the book for over half a century. When it is said that the book is \'haunting\', \'nightmarish\', and \'startling\' any reader would have to agree! This well known novel grips the reader from the beginning and does not even let go of the grip at the finished reading. A classic you won\'t want to miss if you haven\'t taken the time to read it yet.\n
\n I actually listened to this novel\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n فهد الفهد\n \n
\n
\n \n Jun 25, 2011\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n \n
\n
\n \n \n 1984\n
\n
\n نشر أورويل هذه الرواية بعد أربع سنوات من نشر روايته (مزرعة الحيوان)، كانت مزرعة الحيوان عن انهيار الحلم الشيوعي، وتبعثره على يد الستالينيين، وكيف أن المجتمع الشيوعي أصبح أسوأ بكثير من المجتمع الرأسمالي، وأنه ليست أرزاق العمال والفلاحين المهددة الآن تحت ظل الشيوعية، وإنما حرياتهم بل وحياتهم ذاتها.\n
\n
\n احتاج النظام السوفييتي إلى عشرين عام فقط بعد الثورة (1917 – 1937 م)، ليظهر أقبح وجوهه في مهزلة محاكمات موسكو، وحملات التطهير التي قتل فيها ما يقترب من المليوني شخص في الاتحاد السوفييتي، فلذا عندما\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Joe\n \n
\n
\n \n Feb 08, 2017\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n
\n
\n \n \n Social media is a cage full of starved rats and all of us have our heads stuck in there now, like it or not.\n \n \n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n أحمد أبازيد Ahmad Abazed\n \n
\n
\n \n Jan 29, 2011\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n
\n Shelves:\n \n favorites\n \n
\n
\n
\n \n \n " انتبه : الأخ الكبير يراقبك ! "\n
\n رواية صادمة تمثّل زلزالا للوعي و اللاوعي لدى كلّ من يقرؤها , هذه الرواية التي شكّلت ذروة عبقريّة جورج أورويل و تأثّره و تأثيره في الوعي العالمي و التاريخ الحديث .\n
\n الأخ الكبير عند أورويل له مرادفات في ثقافتنا العربية الراهنة ( مثلا : الأخ العقيد , القائد الخالد , القائد الضرورة ... أو الرئيس الشاب ! ) هذا الاختلاف في التسمية هو توافق عبقريّ في مأساويّته لدولة " أوقيانيا " مع دول مثل : سورية و العراق و ليبيا و تونس و مصر ... و باقي أنظمتنا الثوريّة التي تأكلنا و تفتّ\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Ahmad Sharabiani\n \n
\n
\n \n Oct 11, 2008\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n \n
\n
\n \n \n 547. Nineteen Eighty-Four, George Orwell\n
\n Nineteen Eighty-Four, often published as 1984, is a dystopian novel published in 1949 by English author George Orwell. The novel is set in Airstrip One, formerly Great Britain, a province of the superstate Oceania, whose residents are victims of perpetual war, omnipresent government surveillance and public manipulation. Oceania\'s political ideology, euphemistically named English Socialism (shortened to "Ingsoc" in Newspeak, the government\'s invented langua\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Ahmed\n \n
\n
\n \n Apr 25, 2014\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n
\n Shelves:\n \n إنجليزى\n \n
\n
\n
\n \n \n \n
\n كل ما تشوف صور الرئيس السيسي حفظه الله في الشوارع، اِفتكر الأخ الكبير، وكل ما تشوف الإعلام المصري، افتكر الرواية دي، وغسيل الدماغ اللي بيُمارس علينا.\n
\n
\n المؤسسات المصرية أصبحت بتمنح مميزات لمن يبلغ عن (إرهابي) , من فرص\n
\n عمل وخلافه , يعني بورقة وسخة ممكن تأذي جارك اللي بينك وبينه مشاكل أو تسجن زميلك اللي بينافسك في الشغل , يعني المفروض نعمل لجورج أورويل مقام ونطوف حواليه.\n
\n
\n الرواية التى سنسجن بسببها جميعا , ونوضع فى الأغلال ويركبونا على حمير بالشقلوب ويلفوا بينا الأسواق وهاتف يهتف : انهم يدعون إلى دين جد\n
\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Bookdragon Sean \n \n
\n
\n \n Jun 09, 2018\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n
\n \n Recommends it for:\n \n \n fahrenheit 451 fans\n \n
\n
\n Shelves:\n \n 4-star-reads\n \n
\n
\n
\n \n \n \n \n \n “The best books... are those that tell you what you know already.”\n \n \n \n
\n
\n Just about everything Orwell says in\n \n 1984\n \n is a maniacal truism. In some twisted form, everything reflects the truth of reality.\n
\n
\n Of course there are exaggerations, though nothing is far from plausibility. We are controlled by our governments, and often in ways we are not consciously aware of. Advertisements, marketing campaigns and political events are all designed for us to elicit a certain response and think in a desired way\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Lisa\n \n
\n
\n \n Jun 25, 2014\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n \n
\n
\n \n \n Doubleplusgood Maxitruth in Oldspeak on Doublethink and Crimestop!\n
\n
\n (Translation from Newspeak: Excellent, accurate analysis of oppressive, selective society in well-written Standard English reflecting on the the capacity to hold two contradictory opinions for truth at the same time and on the effectiveness of protective stupidity as a means to keep a power structure stable.)\n
\n
\n There is not much left to say about this prophetic novel by Orwell which has not been said over and over again since its pu\n
\n
\n
\n
\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n James Trevino\n \n
\n
\n \n Sep 21, 2017\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n
\n
\n \n \n I reread this recently, knowing my mind from a few years ago is different from my mind now. But it was surprisingly just as scary! Maybe even more so, if that is possible!!\n
\n
\n I wonder if there is someone who has read 1984 and has not felt angry and helpless. It is a good book. It is so good that it made my want to throw away my Kindle. And that is a lot, considering the last time that happened was when I read about The Red Wedding in George R.R. Martin\'s series.\n
\n
\n I also wonder if this world Orwell d\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Emily May\n \n
\n
\n \n Dec 05, 2010\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n \n
\n
\n \n \n This was the book that started my love affair with the dystopian genre. And maybe indirectly influenced my decision to do a politics degree. I was only 12 years old when I first read it but I suddenly saw how politics could be taken and manipulated to tell one hell of a scary and convincing story. I\'m a lot more well-read now but, back then, this was a game-changer. I started to think about things differently. I started to think about 2 + 2 = 5 and I wanted to read more books that explored the i\n \n \n \n ...more\n \n \n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Joey Woolfardis\n \n
\n
\n \n Jan 12, 2016\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n
\n Shelves:\n \n 2016\n \n ,\n \n bookshelf\n \n ,\n \n ce20\n \n ,\n \n champion\n \n ,\n \n masculine\n \n ,\n \n sterling\n \n
\n
\n
\n \n \n Read as part of\n \n The Infinite Variety Reading Challenge\n \n , based on the BBC\'s Big Read Poll of 2003.\n
\n
\n \n \n "For, if leisure and security were enjoyed by all alike, the great mass of human beings who are normally stupefied by poverty would become literate and would learn to think for themselves; and when once they had done this, they would sooner or later realise that the privileged minority had no function, and they would sweep it away."\n \n \n
\n
\n \n Nineteen Eighty-Four\n \n is an insanely relevant novel in this day and\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Cecily\n \n
\n
\n \n May 30, 2008\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n \n
\n
\n \n \n \n Revised for 2017, with added "Alternative Facts"\n \n
\n
\n \n
\n \n Ten Shades of Grey?\n \n
\n
\n The colour of this book is grey, relentless grey: of skin, sky, food, floor, walls, mind, life itself. Added piquancy comes from general decay, drudgery, exploitation, chronic sickness, and malaise.\n
\n
\n There is also sex and (non-sexual) bondage, domination, and torture.\n
\n
\n I don’t expect a dystopian book to be happy reading, but this reread was far grimmer than I remembered it, partly because I read it immediately after the lyrical beaut\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Amira Mahmoud\n \n
\n
\n \n Mar 02, 2014\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n \n
\n
\n \n \n 2+2=5\n
\n وعليك اعتباراها حقيقة مُسلم بها\n
\n كيف؟\n
\n والمنطق والفكر والمعادلات والدلائل والتاريخ ...إلخ، ليس لها قيمة إذن\n
\n \n فالجهل هو القوة\n \n
\n وأين حريتك في التفكير والرأي!\n
\n \n الحرية هي العبودية\n \n
\n أسوء أنواع القمع، هي تلك التي تُمارس على العقل والتفكير\n
\n \n أنت لا تملك سوى تلك السنتيميترات المربعة في جمجمتك\n \n
\n لكن حتى تلك، تنوء بِحملها\n
\n وملكيتها تُشكل لك خوف ورهبة من أن يظهر ما تُفكر فيه في انفعالاتك\n
\n أو على صفحة وجهك أو لغة جسدك\n
\n أو حتى أن ينطبع لديك في اللاوعي\n
\n فيُصبح أخشى ما تخشاه أن تهلوس به أثناء نومك\n
\n قمع فكري، وعملية غسيل للمخ و\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Zoë\n \n
\n
\n \n May 02, 2015\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n
\n
\n \n \n Goodness gracious this was very unsettling. I\'m already a pretty paranoid person, so the idea of Big Brother was both very intriguing but also extremely frightening.\n
\n I really enjoyed reading this, but there were moments when I wasn\'t invested in the story and wanted to take a break from it, mostly in the last half of the book. Still DEFINITELY worth the read, though!\n
\n
\n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Lyn\n \n
\n
\n \n Aug 02, 2011\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n
\n
\n \n \n “It was a bright cold day in April, and the clocks were striking thirteen.”\n
\n
\n This changed the way that I looked at ideologies and changed the way I looked at leadership. Cynical, scathing, and not without its flaws, this is still a stark, haunting glimpse at what could be.\n
\n
\n “War is peace.\n
\n Freedom is slavery.\n
\n Ignorance is strength.”\n
\n
\n Chilling.\n
\n
\n The closing lines still come to me sometimes and remind me of depths that I can only imagine.\n
\n
\n “He gazed up at the enormous face. Forty years it had taken him to\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Joe Valdez\n \n
\n
\n \n Jan 13, 2017\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n
\n Shelves:\n \n sci-fi-general\n \n
\n
\n
\n \n \n My preparedness for the regime change taking place in the United States--with elements of the Electoral College, the Kremlin and the FBI helping to install a failed business promoter who the majority of American voters did not support in the election--begins with\n \n 1984\n \n by George Orwell. Like many, this 1949 novel was assigned reading for me in high school. What stood out to me then was that I needed to finish it because there would be a test. Studying how civics is supposed to work in 3rd period\n \n \n \n ...more\n \n \n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Nawal Al-Qussyer\n \n
\n
\n \n May 27, 2010\n \n \n rated it\n \n \n it was amazing\n \n \n \n \n \n \n \n \n \n \n \n
\n
\n \n \n أضفت مراجعة الرواية في مدونتي ضمن 50 كتاب غيرني\n
\n \n http://www.nawalsaad.com/?p=4108\n \n
\n وأضفت رابط لتقرير رحالة قام بزيارة كوريا الشمالية، وكان صادمًا تطابق استراتيجات الرواية مع ما يفعل في كوريا الشمالية.\n
\n
\n المجد للكتب التي تحدث ضجة داخل عقلك.. المجد للكتب التي تحفز اليقطة الذهنية فيك .. المجد للكتاب الذي يصفعك ويجعلك تعيد النظر في أشياء كثيرة .. هذا الكتاب من الكتب النادرة التي توازي شهرتها وضجتها قيمتها الفكرية والروائية ..\n
\n
\n قرأت الكتاب في الطائرة رحلة عودة من امستردام إلى الرياض .. استغرق معي الكتاب 6 ساع\n
\n
\n
\n
\n
\n
\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n Antonio\n \n
\n
\n \n Jul 05, 2014\n \n \n rated it\n \n \n really liked it\n \n \n \n \n \n \n \n \n \n \n \n ·\n \n \n review of another edition\n \n
\n \n Recommends it for:\n \n \n Todos\n \n
\n \n
\n
\n \n \n Este libro me sigue atormentando...\n
\n
\n Antes de hablar de la historia, primero quiero hablar un poco sobre géneros literarios, si yo se debería enfocarme en el libro pero denme un momento, nunca he sido fanático del género del terror, por dos razones: asustarse para divertirse no me parece muy lógico, y la otra razón es que siempre hay una vocecita en mi cabeza que, cuando se trata de monstruos, fantasmas, demonios, vampiros, posesiones, etc. me dice esto no es real, como te vas a asustar de lo que\n
\n \n \n ...more\n \n
\n
\n
\n \n \n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n « previous\n \n \n 1\n \n \n \n 3\n \n \n 4\n \n \n 5\n \n \n 6\n \n \n 7\n \n \n 8\n \n \n 9\n \n \n …\n \n \n next »\n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\n topics\n \n posts\n \n views\n \n last activity\n \n
\n \n \n Great American Re...:\n \n \n \n September 2018—1984 by George Orwell\n \n \n 40\n \n 20\n \n 5 hours, 55 min ago\n \n
\n \n \n UCAS English 11 R...:\n \n \n \n September Reading Assignment\n \n \n 1\n \n 5\n \n Sep 30, 2018 06:59PM\n \n
\n \n \n Fantasy Buddy Reads:\n \n \n \n 1984 [October 22, 2018]\n \n \n 5\n \n 21\n \n Sep 29, 2018 12:28PM\n \n
\n \n what would happen to you if you were in 101 room\n \n \n 34\n \n 350\n \n Sep 29, 2018 04:58AM\n \n
\n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n \n \n \n
\n
\n
\n
\n \n
\n
\n \n
\n
\n
\n
\n
    \n
  • \n \n The Dispossessed\n \n \n \n \n
  • \n
  • \n \n Waiting for Godot\n \n \n \n
  • \n
  • \n \n Notes from Underground, White Nights, The Dream of a Ridiculous Man, and Selections from The House of the Dead\n \n \n
  • \n
  • \n \n Welcome to the Monkey House\n \n \n
  • \n
  • \n \n A Modest Proposal\n \n \n
  • \n
  • \n \n Brideshead Revisited: The Sacred and Profane Memories of Captain Charles Ryder\n \n \n
  • \n
  • \n \n The Iron Heel\n \n \n
  • \n
  • \n \n Brave New World\n \n \n
  • \n
  • \n \n Coriolanus\n \n \n
  • \n
  • \n \n The Waves\n \n \n
  • \n
  • \n \n A Passage to India\n \n \n
  • \n
  • \n \n The Phantom of the Opera\n \n \n
  • \n
  • \n \n The Player of Games (Culture, #2)\n \n \n
  • \n
  • \n \n The Space Merchants (The Space Merchants, #1)\n \n \n
  • \n
  • \n \n Transmetropolitan, Vol. 2: Lust for Life (Transmetropolitan, #2)\n \n \n
  • \n
  • \n \n The Master and Margarita\n \n \n
  • \n
  • \n \n In the Country of Last Things\n \n \n
  • \n
  • \n \n The Tin Drum\n \n \n
  • \n
\n
\n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n
\n
\n
\n
\n \n 1984 trailer\n \n \n \n
\n
\n \n 5 comments\n \n
\n
\n
\n
\n
\n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n

\n \n Genres\n \n

\n
\n
\n
\n
\n \n \n
\n
\n
\n
\n \n \n
\n
\n
\n
\n \n \n
\n
\n
\n \n \n See top shelves…\n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n
\n
\n
\n
\n \n
\n
\n
\n
\n
\n \n
\n 26,745\nfollowers\n
\n
\n
\n
\n \n \n Eric Arthur Blair\n \n , better known by his pen name\n \n George Orwell\n \n , was an English author and journalist. His work is marked by keen intelligence and wit, a profound awareness of social injustice, an intense opposition to totalitarianism, a passion for clarity in language, and a belief in democratic socialism.\n
\n
\n In addition to his literary career Orwell served as a police officer with the Indian Imperial\n
\n
\n
\n \n \n ...more\n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n
\n
\n
\n \n Animal Farm\n \n
\n
\n \n Animal Farm / 1984\n \n
\n
\n \n Down and Out in Paris and London\n \n
\n
\n \n Homage to Catalonia\n \n
\n
\n \n Burmese Days\n \n
\n
\n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n
\n
\n 217 trivia questions\n
\n 9 quizzes\n
\n
\n \n More quizzes & trivia...\n \n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n
\n
\n \n “Perhaps one did not want to be loved so much as to be understood.”\n \n \n —\n \n 12451 likes\n \n \n
\n
\n
\n
\n \n “Who controls the past controls the future. Who controls the present controls the past.”\n \n \n —\n \n 8038 likes\n \n \n
\n
\n
\n \n More quotes…\n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n \n \n \n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n \n
\n
\n
\n \n
\n \n \n \n
\n
'
In [14]:
d.select("meta[property='og:title']")[0]['content']
Out[14]:
'1984'

Let's get everything we want:

In [15]:
d=BeautifulSoup(fstuff.text, 'html.parser')
print(
    "title", d.select_one("meta[property='og:title']")['content'],"\n",
    "isbn", d.select_one("meta[property='books:isbn']")['content'],"\n",
    "type", d.select_one("meta[property='og:type']")['content'],"\n",
    "author", d.select_one("meta[property='books:author']")['content'],"\n",
    "average rating", d.select_one("span.average").text,"\n",
    "ratingCount", d.select_one("meta[itemprop='ratingCount']")["content"],"\n",
    "reviewCount", d.select_one("span.count")["title"]
)
title 1984
 isbn 9780451524935
 type books.book
 author https://www.goodreads.com/author/show/3706.George_Orwell
 average rating 4.16
 ratingCount 2402209
 reviewCount 53335

OK, now that we know what to do, let's wrap our fetching into a proper script. So that we don't overwhelm their servers, we will fetch only 5 books from each list page, but you get the idea.

We'll digress a bit to explore new-style format strings. See https://pyformat.info for more info.

In [16]:
"list{:0>2}.txt".format(3)
Out[16]:
'list03.txt'
In [17]:
a = "4"
b = 4
class Four:
    def __str__(self):
        return "Fourteen"
c=Four()
In [18]:
"The hazy cat jumped over the {} and {} and {}".format(a, b, c)
Out[18]:
'The hazy cat jumped over the 4 and 4 and Fourteen'

Each {} is filled with the string form of its argument, so the string "4", the integer 4 and the Four instance (through its __str__ method) all render as text.

4. Set up a pipeline for fetching and parsing

OK, let's get back to the fetching...
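The fetching loop in the next cell builds each output path by string concatenation. As a small exercise in the format strings just introduced, here is a sketch of the same naming scheme written as two functions. The names `list_filename` and `book_filename` are hypothetical helpers, not part of the lab, and the example path `/book/show/5470.1984` is an assumed shape for a line in a list file, chosen to match the `files/2_5470.1984.html` output.

```python
def list_filename(page):
    """Path of the saved book-URL list for one list page (hypothetical helper)."""
    # "{:0>2}" right-aligns the page number in a field of width 2, padding with "0"
    return "files/list{:0>2}.txt".format(page)


def book_filename(page, bookurl):
    """Path for a fetched book page: files/<page>_<last URL component>.html (hypothetical helper)."""
    # strip the trailing newline read from the list file, then keep the last path component
    slug = bookurl.strip().split('/')[-1]
    return "files/{}_{}.html".format(page, slug)


print(list_filename(3))                          # files/list03.txt
print(book_filename(2, "/book/show/5470.1984"))  # files/2_5470.1984.html
```

The equivalent f-string for the first function would be `f"files/list{page:0>2}.txt"`; the format spec after the colon is the same.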
In [19]:
fetched=[]
for i in range(1,3):
    # each list file holds one book URL per line
    with open("files/list{:0>2}.txt".format(i), encoding='utf-8') as fd:
        counter=0
        for bookurl_line in fd:
            if counter > 4:  # fetch only the first 5 books from each list page
                break
            bookurl=bookurl_line.strip()
            stuff=requests.get(URLSTART+bookurl)
            filetowrite=bookurl.split('/')[-1]
            filetowrite="files/"+str(i)+"_"+filetowrite+".html"
            print("FTW", filetowrite)
            # use a separate handle so we don't shadow fd, the list file we are iterating over
            with open(filetowrite, "w", encoding='utf-8') as out:
                out.write(stuff.text)
            fetched.append(filetowrite)
            time.sleep(2)  # pause between requests to go easy on the server
            counter=counter+1
print(fetched)
FTW files/1_2767052-the-hunger-games.html
FTW files/1_2.Harry_Potter_and_the_Order_of_the_Phoenix.html
FTW files/1_2657.To_Kill_a_Mockingbird.html
FTW files/1_1885.Pride_and_Prejudice.html
FTW files/1_41865.Twilight.html
FTW files/2_5470.1984.html
FTW files/2_4989.The_Red_Tent.html
FTW files/2_37435.The_Secret_Life_of_Bees.html
FTW files/2_5.Harry_Potter_and_the_Prisoner_of_Azkaban.html
FTW files/2_7171637-clockwork-angel.html
['files/1_2767052-the-hunger-games.html', 'files/1_2.Harry_Potter_and_the_Order_of_the_Phoenix.html', 'files/1_2657.To_Kill_a_Mockingbird.html', 'files/1_1885.Pride_and_Prejudice.html', 'files/1_41865.Twilight.html', 'files/2_5470.1984.html', 'files/2_4989.The_Red_Tent.html', 'files/2_37435.The_Secret_Life_of_Bees.html', 'files/2_5.Harry_Potter_and_the_Prisoner_of_Azkaban.html', 'files/2_7171637-clockwork-angel.html']

OK, now we parse each of the HTML pages we fetched. We have provided the skeleton of the code, along with the code to parse the year, since that is a bit more complex (see the difference in the screenshots above).

In [20]:
import re
yearre = r'\d{4}'
def get_year(d):
    nobr = d.select_one("nobr.greyText")
    if nobr:
        # the text ends with the year followed by ")", e.g. "... 2008)": take the last token, drop the ")"
        return nobr.text.strip().split()[-1][:-1]
    else:
        # otherwise take the first four-digit number in the second details row
        thetext=d.select("div#details div.row")[1].text.strip()
        rowmatch=re.findall(yearre, thetext)
        if len(rowmatch) > 0:
            rowtext=rowmatch[0].strip()
        else:
            rowtext="NA"
        return rowtext

Exercise

Your job is to fill in the code to get the genres.
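Returning briefly to `get_year`: it handles two page layouts. To check its string handling without fetching anything, here is a stdlib-only sketch that applies the same two branches to plain strings. `year_from_texts` is a hypothetical name, and the sample strings are made up for illustration, not scraped from Goodreads.

```python
import re

YEAR_RE = r'\d{4}'  # any run of four digits, as in yearre


def year_from_texts(nobr_text=None, details_text=""):
    """Mirror get_year's two branches on plain strings (hypothetical helper)."""
    if nobr_text:
        # Branch 1: text ends with the year plus ")", e.g. "(first published 1949)"
        return nobr_text.strip().split()[-1][:-1]
    # Branch 2: first four-digit number anywhere in the details text, else "NA"
    matches = re.findall(YEAR_RE, details_text.strip())
    return matches[0] if matches else "NA"


print(year_from_texts("(first published 1949)"))                                  # 1949
print(year_from_texts(details_text="Published July 1st 1950 by Signet Classics"))  # 1950
print(year_from_texts(details_text="Paperback, 328 pages"))                       # NA
```

Because `re.findall` returns matches from left to right, a details row that mentions several years yields the first one, which may be an edition year rather than the original publication year.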
In [21]:
def get_genres(d):
    # your code here
    genres=d.select("div.elementList div.left a")
    glist=[]
    for g in genres:
        glist.append(g['href'])
    return glist
In [22]:
listofdicts=[]
for filetoread in fetched:
    print(filetoread)
    td={}
    # the pages were saved as UTF-8, so read them back the same way
    with open(filetoread, encoding='utf-8') as fd:
        datext = fd.read()
    d=BeautifulSoup(datext, 'html.parser')
    td['title']=d.select_one("meta[property='og:title']")['content']
    td['isbn']=d.select_one("meta[property='books:isbn']")['content']
    td['booktype']=d.select_one("meta[property='og:type']")['content']
    td['author']=d.select_one("meta[property='books:author']")['content']
    td['rating']=d.select_one("span.average").text
    td['ratingCount']=d.select_one("meta[itemprop='ratingCount']")["content"]
    td['reviewCount']=d.select_one("span.count")["title"]
    td['year'] = get_year(d)
    td['file']=filetoread
    glist = get_genres(d)
    td['genres']="|".join(glist)
    listofdicts.append(td)
files/1_2767052-the-hunger-games.html
files/1_2.Harry_Potter_and_the_Order_of_the_Phoenix.html
files/1_2657.To_Kill_a_Mockingbird.html
files/1_1885.Pride_and_Prejudice.html
files/1_41865.Twilight.html
files/2_5470.1984.html
files/2_4989.The_Red_Tent.html
files/2_37435.The_Secret_Life_of_Bees.html
files/2_5.Harry_Potter_and_the_Prisoner_of_Azkaban.html
files/2_7171637-clockwork-angel.html
In [23]:
listofdicts[0]
Out[23]:
{'title': 'The Hunger Games (The Hunger Games, #1)',
 'isbn': '9780439023481',
 'booktype': 'books.book',
 'author': 'https://www.goodreads.com/author/show/153394.Suzanne_Collins',
 'rating': '4.33',
 'ratingCount': '5491176',
 'reviewCount': '160373',
 'year': '2008',
 'file': 'files/1_2767052-the-hunger-games.html',
 'genres': '/genres/young-adult|/genres/fiction|/genres/science-fiction|/genres/dystopia|/genres/fantasy|/genres/science-fiction'}

Finally, let's write all of this into a CSV file, which we will use for analysis.
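The `genres` value for The Hunger Games above contains `/genres/science-fiction` twice, presumably because the selector `div.elementList div.left a` picks up every genre link in each row and the same genre can appear in more than one row. If repeated genres are unwanted, one option (not something the lab itself does) is an order-preserving de-duplication with `dict.fromkeys`; `dedupe_genres` is a hypothetical helper name.

```python
def dedupe_genres(glist):
    """Drop repeated genre links while keeping first-seen order (hypothetical helper)."""
    # dict keys are unique and, since Python 3.7, preserve insertion order
    return list(dict.fromkeys(glist))


genres = ('/genres/young-adult|/genres/fiction|/genres/science-fiction|'
          '/genres/dystopia|/genres/fantasy|/genres/science-fiction').split('|')
print('|'.join(dedupe_genres(genres)))
# /genres/young-adult|/genres/fiction|/genres/science-fiction|/genres/dystopia|/genres/fantasy
```

Inside the parsing loop this would amount to `td['genres'] = "|".join(dedupe_genres(glist))`.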
In [24]:
df = pd.DataFrame.from_records(listofdicts)
df.head()
Out[24]:
   author                                             booktype    file                                               genres                                             isbn           rating  ratingCount  reviewCount  title                                              year
0  https://www.goodreads.com/author/show/153394.S...  books.book  files/1_2767052-the-hunger-games.html              /genres/young-adult|/genres/fiction|/genres/sc...  9780439023481  4.33    5491176      160373       The Hunger Games (The Hunger Games, #1)            2008
1  https://www.goodreads.com/author/show/1077326....  books.book  files/1_2.Harry_Potter_and_the_Order_of_the_Ph...  /genres/fantasy|/genres/young-adult|/genres/fi...  9780439358071  4.48    2030257      33033        Harry Potter and the Order of the Phoenix (Har...  2003
2  https://www.goodreads.com/author/show/1825.Har...  books.book  files/1_2657.To_Kill_a_Mockingbird.html            /genres/classics|/genres/fiction|/genres/histo...  9780061120084  4.27    3722962      79058        To Kill a Mockingbird (To Kill a Mockingbird, #1)  1960
3  https://www.goodreads.com/author/show/1265.Jan...  books.book  files/1_1885.Pride_and_Prejudice.html              /genres/classics|/genres/fiction|/genres/romance   9780679783268  4.25    2438138      54013        Pride and Prejudice                                1813
4  https://www.goodreads.com/author/show/941441.S...  books.book  files/1_41865.Twilight.html                        /genres/young-adult|/genres/fantasy|/genres/ro...  9780316015844  3.58    4262416      97797        Twilight (Twilight, #1)                            2005
In [25]:
df.to_csv("files/meta.csv", index=False, header=True)
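Every value in `listofdicts`, and therefore in `meta.csv`, was scraped as a string (for example `'rating': '4.33'`). A minimal sketch of converting one record to typed values before analysis; `typed_record` is a hypothetical helper, and mapping the `"NA"` year that `get_year` can return to `None` is an assumption about how missing years should be treated.

```python
def typed_record(td):
    """Return a copy of one scraped record with numeric and list fields converted.

    Hypothetical helper: assumes the keys produced by the parsing loop above.
    """
    out = dict(td)  # shallow copy so the original string record is left untouched
    out['rating'] = float(td['rating'])
    out['ratingCount'] = int(td['ratingCount'])
    out['reviewCount'] = int(td['reviewCount'])
    # get_year returns the string "NA" when it finds no year; map that to None (assumption)
    out['year'] = None if td['year'] == "NA" else int(td['year'])
    # genres were stored as one "|"-joined string; split them back into a list
    out['genres'] = td['genres'].split("|") if td['genres'] else []
    return out


# the record shown in Out[23]
record = {
    'title': 'The Hunger Games (The Hunger Games, #1)',
    'isbn': '9780439023481',
    'booktype': 'books.book',
    'author': 'https://www.goodreads.com/author/show/153394.Suzanne_Collins',
    'rating': '4.33',
    'ratingCount': '5491176',
    'reviewCount': '160373',
    'year': '2008',
    'file': 'files/1_2767052-the-hunger-games.html',
    'genres': '/genres/young-adult|/genres/fiction|/genres/science-fiction|'
              '/genres/dystopia|/genres/fantasy|/genres/science-fiction',
}
typed = typed_record(record)
print(typed['rating'], typed['ratingCount'], typed['year'])  # 4.33 5491176 2008
```

Applied to every record, e.g. `[typed_record(td) for td in listofdicts]`, this gives numeric fields that can be sorted and plotted directly.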