Lecture 16

Data Structures I

Tuesday, October 29th 2019

Containers Re-cap

  • Last time, you learned about Docker containers
  • These were motivated through virtual environments and virtual machines
  • Containers provide more isolation than virtual environments
  • Containers provide less isolation than virtual machines, but are much more lightweight
  • In the lecture exercise, you created a Docker container
  • You also learned:
    • How to pull a Docker image from DockerHub
    • How to run that Docker image
  • Let's do a quick demo

Download and Run Docker Container

$ docker pull pavlosprotopapas/secret 
$ docker run -i -t pavlosprotopapas/secret
$ apt-get update
$ apt-get -y install vim

Edit File in Container

$ vim secret.sh
# Make Edits
$ chmod +x secret.sh
$ ./secret.sh
$ exit

Save Changes to Container

$ docker ps -a
$ docker commit <container_id> pavlosprotopapas/secret

Docker Container Workflow Summary

  • For developers, provide a Dockerfile in a GitHub repo
    • Developers clone the repo and pull the Docker image
    • They make changes in the repo
  • For clients (consumers), provide only the Docker image on Docker Hub
    • They just want to run the container, not modify it

Data Structures

  • Computer programs don't only perform calculations; they also store and retrieve information
  • Data structures and the algorithms that operate on them are at the core of computer science
  • Data structures are quite general
    • Any data representation and associated operations
    • e.g. integers, floats, arrays, classes, ...
  • Need to develop a "toolkit" of data structures and know when/how to use the right one for a given problem

Changing a data structure in a slow program can work the same way an organ transplant does in a sick patient. Important classes of abstract data types such as containers, dictionaries, and priority queues, have many different but functionally equivalent data structures that implement them.

Changing the data structure does not change the correctness of the program, since we presumably replace a correct implementation with a different correct implementation. However, the new implementation of the data type realizes different tradeoffs in the time to execute various operations, so the total performance can improve dramatically.

Like a patient in need of a transplant, only one part might need to be replaced in order to fix the problem.

-Steven S Skiena. The Algorithm Design Manual
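
As a small illustration of the point in the quotation, here is a sketch of two functionally equivalent priority queues. The names `ListPQ`, `HeapPQ`, and `drain` are made up for this example; only `heapq` comes from the standard library.

```python
import heapq


class ListPQ:
    """Priority queue backed by an unsorted list (hypothetical name)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)          # O(1)

    def pop_min(self):
        smallest = min(self._items)       # O(n) scan
        self._items.remove(smallest)      # O(n) removal
        return smallest

    def __len__(self):
        return len(self._items)


class HeapPQ:
    """Same interface, backed by a binary heap via heapq."""

    def __init__(self):
        self._heap = []

    def push(self, item):
        heapq.heappush(self._heap, item)  # O(log n)

    def pop_min(self):
        return heapq.heappop(self._heap)  # O(log n)

    def __len__(self):
        return len(self._heap)


def drain(pq, values):
    """Push all values, then pop them back in priority order."""
    for v in values:
        pq.push(v)
    return [pq.pop_min() for _ in range(len(pq))]


print(drain(ListPQ(), [5, 1, 4, 1, 3]))  # [1, 1, 3, 4, 5]
print(drain(HeapPQ(), [5, 1, 4, 1, 3]))  # [1, 1, 3, 4, 5]
```

Code that only calls `push` and `pop_min` cannot tell the two apart, so swapping one for the other changes the running time but not the result.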

Common data structures

  • Lists
  • Stacks/queues
  • Hashes
  • Heaps
  • Trees

We'll focus on lists today.

We'll tour some data structures in Python.

First up: sequences.

Sequences and their Abstractions

What is a sequence?

Consider the notion of Abstract Data Types.

The idea is that one data type might be implemented in terms of another, or in terms of underlying code that is not even written in Python.

As long as the interface and contract presented to the user are solid, we can change the implementation underneath.

Python's dunder (double-underscore) methods serve this purpose.

In Python a sequence is something that follows the "sequence protocol". An example of this is a Python list.

This entails defining the __len__ and __getitem__ methods, as we mentioned in previous lectures.

Example

In [1]:
alist = [1,2,3,4]
len(alist) # calls alist.__len__
Out[1]:
4
In [2]:
alist[2] # calls alist.__getitem__(2)
Out[2]:
3

Lists also support slicing

In [3]:
alist[2:4]
Out[3]:
[3, 4]

How does this work?

We will create a dummy sequence, which does not create any storage.

In [4]:
class DummySeq:
    # It just implements the protocol.
    def __len__(self):
        return 42
    
    def __getitem__(self, index):
        return index
In [5]:
d = DummySeq()
len(d)
Out[5]:
42
In [6]:
d[5]
Out[6]:
5
In [7]:
d[67:98]
Out[7]:
slice(67, 98, None)

The "slice object"

Slicing creates a slice object for us of the form slice(start, stop, step) and then Python calls seq.__getitem__(slice(start, stop, step)).
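
A class that manages its own data has to turn the slice object into concrete indices. The built-in `slice.indices(length)` method does this normalization. A minimal sketch, where `EvenNumbers` is a hypothetical class used only for illustration:

```python
class EvenNumbers:
    """Virtual sequence of the first n even numbers; nothing is stored."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            # indices() clips start/stop to the length and fills in defaults
            start, stop, step = index.indices(len(self))
            return [2 * i for i in range(start, stop, step)]
        if isinstance(index, int):
            if index < 0:
                index += len(self)        # support negative indices
            if not 0 <= index < len(self):
                raise IndexError('index out of range')
            return 2 * index
        raise TypeError('indices must be integers or slices')


evens = EvenNumbers(10)
print(evens[3])     # 6
print(evens[2:5])   # [4, 6, 8]
print(evens[-3:])   # [14, 16, 18]
print(evens[::-4])  # [18, 10, 2]
```

For a sequence of length 42, `slice(67, 98).indices(42)` returns `(42, 42, 1)`, which describes an empty range.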

Two-dimensional slicing is also possible: Python packs the comma-separated indices into a tuple and passes that tuple to __getitem__.

In [8]:
d[67:98:2,1]
Out[8]:
(slice(67, 98, 2), 1)
In [9]:
d[67:98:2,1:10]
Out[9]:
(slice(67, 98, 2), slice(1, 10, None))
In [10]:
# Adapted from Example 10-6 from Fluent Python
import numbers # See https://www.python.org/dev/peps/pep-3141/
import reprlib # like repr but w/ limits on sizes of returned strings

class NewSeq:
    def __init__(self, iterator):
        self._storage = list(iterator)
        
    def __repr__(self):
        components = reprlib.repr(self._storage)
        components = components[components.find('['):]
        return 'NewSeq({})'.format(components)

    def __len__(self):
        return len(self._storage)
     
    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, slice):
            return cls(self._storage[index])
        elif isinstance(index, numbers.Integral): 
            return self._storage[index]
        else:
            msg = '{cls.__name__} indices must be integers' 
            raise TypeError(msg.format(cls=cls))
In [11]:
d2 = NewSeq(range(10))
len(d2)
Out[11]:
10
In [12]:
repr(d2)
Out[12]:
'NewSeq([0, 1, 2, 3, 4, 5, ...])'
In [13]:
d2
Out[13]:
NewSeq([0, 1, 2, 3, 4, 5, ...])
In [14]:
d2[4]
Out[14]:
4
In [15]:
d2[2:4]
Out[15]:
NewSeq([2, 3])
In [16]:
d2[1,4]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 d2[1,4]

__getitem__(self, index)
     23         else:
     24             msg = '{cls.__name__} indices must be integers'
---> 25             raise TypeError(msg.format(cls=cls))

TypeError: NewSeq indices must be integers
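
One consequence of the sequence protocol is worth seeing directly: a class that defines `__len__` and `__getitem__`, and raises `IndexError` past the end, can be iterated, searched with `in`, and reversed without any further methods. The sketch below uses a hypothetical `Squares` class. (`DummySeq` never raises `IndexError`, so iterating over it would not terminate.)

```python
class Squares:
    """The squares 0, 1, 4, ... of the first n non-negative integers."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError('indices must be integers')
        if not 0 <= index < self._n:
            raise IndexError('index out of range')  # ends iteration
        return index * index


sq = Squares(5)
print(list(sq))            # [0, 1, 4, 9, 16]
print(9 in sq)             # True
print(list(reversed(sq)))  # [16, 9, 4, 1, 0]
```

Python falls back to calling `__getitem__` with 0, 1, 2, ... when no `__iter__` is defined, which is why the loop and the `in` test work here.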

Linked Lists

  • Remember, a name in Python points to its value.
  • We've seen lists whose last element is actually a pointer to another list.
  • This leads to the idea of a linked list, which we'll use to illustrate sequences.

Nested Pairs

See Berkeley CS61A, Nested Pairs, which draws these structures in box-and-pointer notation.

In Python:

In [17]:
pair = (1,2)

This representation lacks a certain power. A few generalizations:

  • pair = (1, (2, None))
  • linked_list = (1, (2, (3, (4, None))))

The second example leads to something like: Recursive Lists.

Here's what things look like in PythonTutor: PythonTutor Example.

Quick Linked List implementation

In [18]:
empty_ll = None

def make_ll(first, rest): # Make a linked list
    return (first, rest)

def first(ll): # Get the first entry of a linked list
    return ll[0]

def rest(ll): # Get everything after the first entry (a linked list or None)
    return ll[1]

ll_1 = make_ll(1, make_ll(2, make_ll(3, empty_ll))) # Recursively generate a linked list

my_ll = make_ll(10,ll_1) # Make another one
my_ll
Out[18]:
(10, (1, (2, (3, None))))
In [19]:
print(first(my_ll), "     ", rest(my_ll), "     ", first(rest(my_ll)))
10       (1, (2, (3, None)))       1
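
To connect this back to the sequence protocol, length and element access can be written as traversals of the nested pairs. This is a sketch; `len_ll` and `getitem_ll` are hypothetical helper names, not part of the implementation above.

```python
def len_ll(ll):
    """Count the entries by following the links until None."""
    count = 0
    while ll is not None:
        count += 1
        ll = ll[1]              # move on to the rest of the list
    return count


def getitem_ll(ll, i):
    """Return entry i by walking i links from the front."""
    if i < 0:
        raise IndexError('negative indices are not supported in this sketch')
    while ll is not None:
        if i == 0:
            return ll[0]
        ll, i = ll[1], i - 1
    raise IndexError('linked list index out of range')


my_ll = (10, (1, (2, (3, None))))
print(len_ll(my_ll))         # 4
print(getitem_ll(my_ll, 2))  # 2
```

Both helpers walk the list one link at a time, so reaching entry i takes i steps, unlike the constant-time indexing of a Python list.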

Some reasons for linked lists:

  • You allocate memory only when you want to use it.
  • Inserting a new element is cheaper than in a fixed-size array, where existing elements must be shifted or copied
  • Gateway to other pointer-like and hierarchical structures.
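
The insertion cost can be seen with the tuple representation used earlier. In this sketch, prepending builds exactly one new pair and reuses the existing list as the tail, whereas inserting at the front of a Python list (which is array-backed) shifts every existing element. `make_ll` is redefined so the snippet runs on its own.

```python
def make_ll(first, rest):
    return (first, rest)


tail = make_ll(2, make_ll(3, None))

# Prepending allocates one new pair; the old list is reused untouched.
a = make_ll(1, tail)
b = make_ll(99, tail)

print(a)             # (1, (2, (3, None)))
print(b)             # (99, (2, (3, None)))
print(a[1] is b[1])  # True: both lists share the same tail object

# The array-backed equivalent has to move the existing elements.
arr = [2, 3]
arr.insert(0, 1)     # O(n) in the length of arr
print(arr)           # [1, 2, 3]
```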

Comments about linked lists:

  • Not so useful in Python but can be useful in C/C++
  • There are singly-linked lists and doubly-linked lists
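  • Larger memory footprint than arrays (each node needs a reference to the next node)
  • No direct access to individual elements: reaching element i means following i links from the head
  • Linked lists lose memory locality, since nodes may be scattered in memory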
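
For comparison with the singly-linked tuples above, here is a minimal sketch of a doubly-linked node. The names `DNode`, `insert_after`, `forward`, and `backward` are assumptions made for this example. Each node carries two references, which shows the extra memory cost, and splicing in a node touches only its neighbours.

```python
class DNode:
    """Node of a doubly-linked list (hypothetical name)."""

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def insert_after(node, value):
    """Splice a new node in after `node` in O(1); return the new node."""
    new = DNode(value)
    new.prev, new.next = node, node.next
    if node.next is not None:
        node.next.prev = new
    node.next = new
    return new


def forward(node):
    """Yield values from `node` to the end of the list."""
    while node is not None:
        yield node.value
        node = node.next


def backward(node):
    """Yield values from `node` back to the start of the list."""
    while node is not None:
        yield node.value
        node = node.prev


head = DNode(1)
tail = insert_after(head, 3)
insert_after(head, 2)        # lands between 1 and 3
print(list(forward(head)))   # [1, 2, 3]
print(list(backward(tail)))  # [3, 2, 1]
```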