Key Word(s): Data structures, Sequences, Linked lists
Containers Recap
- Last time, you learned about Docker containers
- These were motivated by comparison with virtual environments and virtual machines
- Containers provide more isolation than virtual environments
- Containers provide less isolation than virtual machines, but are much more lightweight
- In the lecture exercise, you created a Docker container
- You also learned:
  - How to pull a Docker image from DockerHub
  - How to run that Docker image
- Let's do a quick demo
Download and Run Docker Container
$ docker pull pavlosprotopapas/secret
$ docker run -i -t pavlosprotopapas/secret
# The following commands are typed at the shell inside the container
$ apt-get update
$ apt-get -y install vim
Edit File in Container
$ vim secret.sh
# Make Edits
$ chmod +x secret.sh
$ ./secret.sh
$ exit
Save Changes to Container
$ docker ps -a    # find the CONTAINER ID of the container you just exited
$ docker commit <container_id> pavlosprotopapas/secret
Docker Container Workflow Summary
- For developers, keep the Dockerfile in a GitHub repository
  - Developers clone the repo and pull the Docker image
  - They make their changes in the repo
- For clients (consumers), provide only the Docker image on DockerHub
  - They just want to run the container, not modify it
Data Structures
- Computer programs don't only perform calculations; they also store and retrieve information
- Data structures and the algorithms that operate on them are at the core of computer science
- Data structures are quite general:
  - Any data representation and its associated operations
  - e.g. integers, floats, arrays, classes, ...
- We need to develop a "toolkit" of data structures and know when and how to use the right one for a given problem
Changing a data structure in a slow program can work the same way an organ transplant does in a sick patient. Important classes of abstract data types such as containers, dictionaries, and priority queues, have many different but functionally equivalent data structures that implement them.
Changing the data structure does not change the correctness of the program, since we presumably replace a correct implementation with a different correct implementation. However, the new implementation of the data type realizes different tradeoffs in the time to execute various operations, so the total performance can improve dramatically.
Like a patient in need of a transplant, only one part might need to be replaced in order to fix the problem.
-Steven S. Skiena, The Algorithm Design Manual
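The snippet below illustrates the transplant idea. It is our own sketch, not taken from Skiena, and the word list and query pattern are made-up data. It answers the same membership queries first with a list and then with a set. The answers are identical and only the cost differs: "in" on a list scans element by element, while a set looks the key up by hash. Timings depend on the machine, so no numbers are shown.

```python
import timeit

# Made-up data: 10,000 distinct strings and 1,000 queries, half of which are hits.
words = [f'word{i}' for i in range(10_000)]
queries = [f'word{i}' for i in range(0, 20_000, 20)]

as_list = words       # membership test scans the list: O(n) per lookup
as_set = set(words)   # membership test hashes the key: O(1) on average

def count_hits(container, queries):
    """Count how many queries appear in container (any object supporting `in`)."""
    return sum(1 for q in queries if q in container)

# Swapping the data structure leaves the answer unchanged...
assert count_hits(as_list, queries) == count_hits(as_set, queries)

# ...but changes how long it takes (exact numbers depend on the machine).
t_list = timeit.timeit(lambda: count_hits(as_list, queries), number=1)
t_set = timeit.timeit(lambda: count_hits(as_set, queries), number=1)
print(f'list: {t_list:.4f} s, set: {t_set:.4f} s')
```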
We'll tour some data structures in Python.
First up: sequences.
Sequences and their Abstractions
What is a sequence?
Consider the notion of Abstract Data Types.
The idea there is that one data type might be implemented in terms of another, or in terms of some underlying code that is not even written in Python.
As long as the interface and contract presented to the user are solid, we can change the implementation underneath.
The dunder methods in Python are used towards this purpose.
In Python, a sequence is something that follows the "sequence protocol". An example of this is a Python list.
This entails defining the __len__ and __getitem__ methods, as we mentioned in previous lectures.
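Here is a minimal sketch of the protocol. The Squares class is hypothetical and not from the lecture: it computes its items instead of storing them. Defining only __len__ and __getitem__ is enough for Python to iterate over it, test membership with "in", and call reversed(). Iteration depends on __getitem__ raising IndexError once it runs past the end.

```python
class Squares:
    """Hypothetical read-only sequence of the first n squares (illustration only)."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        # Integer indices only. Negative indices count from the end, like a list.
        if index < 0:
            index += self._n
        # Raising IndexError is how iteration learns that the sequence has ended.
        if not 0 <= index < self._n:
            raise IndexError('Squares index out of range')
        return index * index

sq = Squares(5)
print(len(sq))              # 5
print(sq[3])                # 9
print(list(sq))             # [0, 1, 4, 9, 16]
print(16 in sq)             # True
print(list(reversed(sq)))   # [16, 9, 4, 1, 0]
```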
Example
alist = [1,2,3,4]
len(alist) # calls alist.__len__
alist[2] # calls alist.__getitem__(2)
Lists also support slicing
alist[2:4]
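To see that indexing and slicing both go through __getitem__, we can call the dunder methods directly. This is for illustration only; ordinary code should use len() and []. A slice arrives at __getitem__ as a built-in slice object.

```python
alist = [1, 2, 3, 4]

# len() and the [] operator delegate to the dunder methods.
print(len(alist), alist.__len__())                  # 4 4
print(alist[2], alist.__getitem__(2))               # 3 3

# Slicing passes a slice object to the same __getitem__.
print(alist[2:4], alist.__getitem__(slice(2, 4)))   # [3, 4] [3, 4]
print(alist[::2] == alist.__getitem__(slice(None, None, 2)))  # True
```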
How does this work?
We will create a dummy sequence, which does not store any data.
class DummySeq:
    # It just implements the protocol.
    def __len__(self):
        return 42

    def __getitem__(self, index):
        return index
d = DummySeq()
len(d)       # 42
d[5]         # 5
d[67:98]     # slice(67, 98, None)
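The "slice object"
Slicing creates a slice object for us of the form slice(start, stop, step), and then Python calls seq.__getitem__(slice(start, stop, step)).
Any part omitted from the slice notation is filled in as None. For example, d[67:98] gives slice(67, 98, None).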
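A sequence that needs to act on a slice usually wants concrete positions. The slice object's indices(length) method provides them: it clips start and stop to the sequence length, fills in defaults, and returns a (start, stop, step) triple. IndexSeq is a hypothetical variant of DummySeq that uses this method to return the list of positions a slice selects.

```python
class IndexSeq:
    """Hypothetical variant of DummySeq that expands a slice into the positions it selects."""

    def __len__(self):
        return 42

    def __getitem__(self, index):
        if isinstance(index, slice):
            # indices() clips start/stop against len(self) and fills in defaults,
            # returning a (start, stop, step) triple that range() accepts directly.
            return list(range(*index.indices(len(self))))
        return index

s = IndexSeq()
print(s[5])                           # 5
print(s[38:50])                       # [38, 39, 40, 41]
print(s[::-10])                       # [41, 31, 21, 11, 1]
print(slice(67, 98, 2).indices(42))   # (42, 42, 2)
print(s[67:98])                       # []
```

Because the slice is clipped against __len__, out-of-range slices such as s[67:98] give an empty result rather than an error. Python lists behave the same way.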
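Two-dimensional slicing is also possible. The comma-separated indices are packed into a tuple, and that tuple is passed to __getitem__.
d[67:98:2,1]       # (slice(67, 98, 2), 1)
d[67:98:2,1:10]    # (slice(67, 98, 2), slice(1, 10, None))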
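d[67:98:2,1] hands __getitem__ the tuple (slice(67, 98, 2), 1), so a container that supports two indices has to unpack that tuple itself. Grid is a hypothetical sketch, not from the lecture, of a list-of-rows container that does this. It handles only the two-index case and the single row (or row slice) case. NumPy arrays receive the same kind of tuple, but they interpret it far more generally.

```python
class Grid:
    """Hypothetical 2-D container built on a list of rows (illustration only)."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        if isinstance(index, tuple):
            # g[i, j] arrives here as the tuple (i, j); either part may be a slice.
            row_index, col_index = index
            if isinstance(row_index, slice):
                # Several rows selected: apply the column index to each one.
                return [row[col_index] for row in self._rows[row_index]]
            return self._rows[row_index][col_index]
        # A single index (or slice) selects whole rows.
        return self._rows[index]

g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(g[1, 2])      # 6
print(g[0:2, 1])    # [2, 5]
print(g[::2, 1:])   # [[2, 3], [8, 9]]
print(g[2])         # [7, 8, 9]
```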
# Adapted from Example 10-6 from Fluent Python
import numbers  # See https://www.python.org/dev/peps/pep-3141/
import reprlib  # like repr, but with limits on the sizes of returned strings

class NewSeq:
    def __init__(self, iterable):
        self._storage = list(iterable)

    def __repr__(self):
        components = reprlib.repr(self._storage)
        components = components[components.find('['):]
        return 'NewSeq({})'.format(components)

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, slice):
            # A slice returns a new NewSeq rather than a plain list
            return cls(self._storage[index])
        elif isinstance(index, numbers.Integral):
            # A single integer index returns the stored element itself
            return self._storage[index]
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))
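d2 = NewSeq(range(10))
len(d2)      # 10
repr(d2)
d2
d2[4]        # 4
d2[2:4]      # NewSeq([2, 3]): slicing returns a new NewSeq
d2[1,4]      # raises TypeError: NewSeq indices must be integers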
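NewSeq implements the protocol by hand. The standard library's collections.abc.Sequence abstract base class asks for the same two methods, __len__ and __getitem__. In return it supplies __contains__, __iter__, __reversed__, index and count, all built on those two methods. ABCSeq is a hypothetical variant of NewSeq written that way. It uses f-strings and a plain repr instead of reprlib but otherwise indexes the same way.

```python
from collections.abc import Sequence
import numbers

class ABCSeq(Sequence):
    """Hypothetical variant of NewSeq that inherits from collections.abc.Sequence."""

    def __init__(self, iterable):
        self._storage = list(iterable)

    def __repr__(self):
        return f'ABCSeq({self._storage!r})'

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return type(self)(self._storage[index])
        if isinstance(index, numbers.Integral):
            return self._storage[index]
        raise TypeError(f'{type(self).__name__} indices must be integers')

s = ABCSeq('abcabc')
print(s[1:4])                    # ABCSeq(['b', 'c', 'a'])
print(s.index('c'))              # 2   (mixin method from Sequence)
print(s.count('a'))              # 2   (mixin method from Sequence)
print('b' in s)                  # True
print(list(reversed(s)))         # ['c', 'b', 'a', 'c', 'b', 'a']
print(isinstance(s, Sequence))   # True
```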
Linked Lists
- Remember, a name in Python points to its value.
- We've seen lists whose last element is actually a pointer to another list.
- This leads to the idea of a linked list, which we'll use to illustrate sequences.
pair = (1,2)
This representation lacks a certain power. A few generalizations:
pair = (1, (2, None))
linked_list = (1, (2, (3, (4, None))))
The second example leads to the idea of a recursive list: each pair holds one value plus a reference to the rest of the list, and None marks the end.
Here's what things look like in PythonTutor: PythonTutor Example.
Quick Linked List implementation
empty_ll = None

def make_ll(first, rest):  # Make a linked list
    return (first, rest)

def first(ll):  # Get the first entry of a linked list
    return ll[0]

def rest(ll):  # Get the rest of a linked list (everything after the first entry)
    return ll[1]

ll_1 = make_ll(1, make_ll(2, make_ll(3, empty_ll)))  # Recursively generate a linked list
my_ll = make_ll(10, ll_1)  # Make another one by prepending 10
my_ll
print(first(my_ll), " ", rest(my_ll), " ", first(rest(my_ll)))
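Linked lists are meant to illustrate sequences, so here is a sketch that gives the tuple-based list above the sequence protocol. It is our own illustration: len_ll, getitem_ll and the wrapper class LinkedSeq are hypothetical names. Both helpers follow rest pointers from the head, so __getitem__ costs time proportional to the index and accepts only non-negative integers. The optional __iter__ walks the list once, which keeps iteration linear instead of quadratic.

```python
empty_ll = None

def make_ll(first, rest):
    return (first, rest)

def first(ll):
    return ll[0]

def rest(ll):
    return ll[1]

def len_ll(ll):
    """Count the entries by following rest pointers until the empty list."""
    length = 0
    while ll is not empty_ll:
        length += 1
        ll = rest(ll)
    return length

def getitem_ll(ll, index):
    """Return the entry at a non-negative index by following index rest pointers."""
    while ll is not empty_ll:
        if index == 0:
            return first(ll)
        ll, index = rest(ll), index - 1
    raise IndexError('linked list index out of range')

class LinkedSeq:
    """Hypothetical wrapper giving a tuple-based linked list the sequence protocol."""

    def __init__(self, ll):
        self._ll = ll

    def __len__(self):
        return len_ll(self._ll)

    def __getitem__(self, index):
        return getitem_ll(self._ll, index)

    def __iter__(self):
        # Optional: one pass over the pointers instead of re-walking from the head per index.
        ll = self._ll
        while ll is not empty_ll:
            yield first(ll)
            ll = rest(ll)

my_ll = make_ll(10, make_ll(1, make_ll(2, make_ll(3, empty_ll))))
seq = LinkedSeq(my_ll)
print(len(seq))    # 4
print(seq[2])      # 2
print(list(seq))   # [10, 1, 2, 3]
print(3 in seq)    # True
```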
Some reasons for linked lists:
- You allocate memory only when you want to use it.
- Inserting a new element is cheaper than in a fixed-size array: once you hold a reference to the right node, no existing elements need to be shifted.
- Gateway to other pointer-like and hierarchical structures.
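The tuples above are immutable, so inserting in the middle would mean rebuilding the front of the list. A mutable node makes the cheap-insertion point concrete. Given a reference to a node, inserting after it takes two pointer updates however long the list is. By contrast, list.insert on a Python list has to shift every later element. Node, insert_after and to_list are hypothetical names for this sketch.

```python
class Node:
    """Hypothetical mutable node of a singly-linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

def insert_after(node, value):
    # Constant work: one new node and two pointer updates, however long the list is.
    node.next_node = Node(value, node.next_node)

def to_list(node):
    """Collect the values by walking the next_node pointers (for display only)."""
    values = []
    while node is not None:
        values.append(node.value)
        node = node.next_node
    return values

head = Node(1, Node(2, Node(4)))
insert_after(head.next_node, 3)   # insert 3 after the node holding 2
print(to_list(head))              # [1, 2, 3, 4]

head = Node(0, head)              # prepending is also constant work
print(to_list(head))              # [0, 1, 2, 3, 4]

# For comparison: a Python list (a dynamic array) shifts the later elements.
arr = [1, 2, 4]
arr.insert(2, 3)
print(arr)                        # [1, 2, 3, 4]
```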
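Comments about linked lists:
- Not so useful in Python, but can be useful in C/C++
- There are singly-linked lists and doubly-linked lists
- Larger memory footprint than arrays (each node also stores a reference to the next node)
- No constant-time access to individual elements: reaching the i-th element means following i links from the head
- Lose memory locality: nodes can be scattered across memory, unlike the contiguous storage of an array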
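A doubly-linked list differs from a singly-linked one by a single extra reference per node, pointing backwards. The DNode sketch below is hypothetical and shows what that extra reference buys. The list can be walked in both directions, and a node can be unlinked in constant time given only the node itself. For real code, Python's collections.deque already provides fast appends and pops at both ends. CPython implements it as a doubly-linked list of fixed-size blocks.

```python
class DNode:
    """Hypothetical doubly-linked node: references to both neighbours."""

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def link(a, b):
    """Make b follow a."""
    a.next, b.prev = b, a

def unlink(node):
    # Constant work given the node itself: its neighbours now point past it.
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None

def walk(node, forward=True):
    """Collect values following next (forward) or prev (backward) references."""
    values = []
    while node is not None:
        values.append(node.value)
        node = node.next if forward else node.prev
    return values

a, b, c = DNode('a'), DNode('b'), DNode('c')
link(a, b)
link(b, c)
print(walk(a))                 # ['a', 'b', 'c']
print(walk(c, forward=False))  # ['c', 'b', 'a']
unlink(b)
print(walk(a))                 # ['a', 'c']
```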