Key Word(s): Datastructures, Lists, Linked lists
Data Structures¶
- Computer programs don't only perform calculations; they also store and retrieve information
- Data structures and the algorithms that operate on them are at the core of computer science
- Data structures are quite general
- Any data representation and associated operations
- e.g. integers, floats, arrays, classes, ...
- Need to develop a "toolkit" of data structures and know when/how to use the right one for a given problem
Changing a data structure in a slow program can work the same way an organ transplant does in a sick patient. Important classes of abstract data types such as containers, dictionaries, and priority queues, have many different but functionally equivalent data structures that implement them.
Changing the data structure does not change the correctness of the program, since we presumably replace a correct implementation with a different correct implementation. However, the new implementation of the data type realizes different tradeoffs in the time to execute various operations, so the total performance can improve dramatically.
Like a patient in need of a transplant, only one part might need to be replaced in order to fix the problem.
-Steven S Skiena. The Algorithm Design Manual
We'll tour some data structures in Python.
First up: sequences.
Sequences and their Abstractions¶
What is a sequence?¶
Consider the notion of Abstract Data Types.
The idea there is that one data type might be implemented in terms of another, or some underlying code not even in Python.
As long as the interface and contract presented to the user is solid, we can change the implementation below.
The dunder methods in Python are used towards this purpose.
In Python, a sequence is something that follows the "sequence protocol". An example of this is a Python list.
This entails defining the __len__ and __getitem__ methods, as we mentioned in previous lectures.
Example¶
alist = [1,2,3,4]
len(alist) # calls alist.__len__
4
alist[2] # calls alist.__getitem__(2)
3
Lists also support slicing¶
alist[2:4]
[3, 4]
How does this work?¶
We will create a dummy sequence, which does not create any storage.
class DummySeq:
    # It just implements the protocol.
    def __len__(self):
        return 42
    def __getitem__(self, index):
        return index
d = DummySeq()
len(d)
42
d[5]
5
d[67:98]
slice(67, 98, None)
The "slice object"¶
Slicing creates a slice object for us of the form slice(start, stop, step) and then Python calls seq.__getitem__(slice(start, stop, step)).
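To make that translation concrete, here is a minimal sketch. The class name Echo is hypothetical and exists only for this illustration. It suggests that bracket syntax and an explicit __getitem__ call with a slice object hand the method the same argument.

```python
class Echo:
    """Hypothetical helper: returns whatever index object it receives."""

    def __getitem__(self, index):
        return index


e = Echo()

# Bracket syntax with colons builds a slice object behind the scenes.
from_brackets = e[1:10:2]

# Calling the dunder method directly with an explicit slice is equivalent.
from_call = e.__getitem__(slice(1, 10, 2))

print(from_brackets)               # slice(1, 10, 2)
print(from_brackets == from_call)  # True

# Omitted parts of the slice show up as None.
print(e[:5])                       # slice(None, 5, None)
```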
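Two-dimensional slicing is also possible: comma-separated indices are packed into a tuple, which is passed to __getitem__ as a single argument.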
d[67:98:2,1]
(slice(67, 98, 2), 1)
d[67:98:2,1:10]
(slice(67, 98, 2), slice(1, 10, None))
So what is slice() exactly?
dir(slice)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'indices', 'start', 'step', 'stop']
Interesting! There's an indices method!
help(slice.indices)
Help on method_descriptor:

indices(...)
    S.indices(len) -> (start, stop, stride)

    Assuming a sequence of length len, calculate the start and stop
    indices, and the stride length of the extended slice described by
    S. Out of bounds indices are clipped in a manner consistent with the
    handling of normal slices.
slice(1, 10, 2).indices(100)
(1, 10, 2)
Of course, you would only need to use this if you don't have access to an underlying sequence.
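As a sketch of that situation, consider a sequence that computes its items on demand and therefore has no stored list to delegate slicing to. The class Squares below is a hypothetical example, not part of the lecture code. It uses slice.indices to turn a slice into concrete start, stop and step values.

```python
class Squares:
    """Hypothetical lazy sequence: item i is i**2, nothing is stored."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            # indices() clips the slice to the length and fills in defaults.
            start, stop, step = index.indices(self._n)
            return [i * i for i in range(start, stop, step)]
        if isinstance(index, int):
            if index < 0:
                index += self._n  # support negative indices
            if not 0 <= index < self._n:
                raise IndexError('Squares index out of range')
            return index * index
        raise TypeError('Squares indices must be integers or slices')


sq = Squares(10)
print(sq[3])      # 9
print(sq[2:5])    # [4, 9, 16]
print(sq[7:100])  # [49, 64, 81]  (stop is clipped to the length)
print(sq[::-3])   # [81, 36, 9, 0]
```

Returning a plain list from a slice is a simplification chosen for brevity; a fuller version might return another lazy object instead.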
# Adapted from Example 10-6 from Fluent Python
import numbers # See https://www.python.org/dev/peps/pep-3141/
import reprlib # like repr but w/ limits on sizes of returned strings
class NewSeq:
    def __init__(self, iterator):
        self._storage = list(iterator)
    def __repr__(self):
        components = reprlib.repr(self._storage)
        return 'NewSeq({})'.format(components)
    def __len__(self):
        return len(self._storage)
    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, slice):
            return cls(self._storage[index])
        elif isinstance(index, numbers.Integral):
            return self._storage[index]
        else:
            msg = '{cls.__name__} indices must be integers'
            raise TypeError(msg.format(cls=cls))
d2 = NewSeq(range(10))
len(d2)
10
repr(d2)
'NewSeq([0, 1, 2, 3, 4, 5, ...])'
d2
NewSeq([0, 1, 2, 3, 4, 5, ...])
d2[4]
4
d2[2:4]
NewSeq([2, 3])
d2[1,4]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-86ebd9ff5e90> in <module>
----> 1 d2[1,4]

<ipython-input-15-b55bf2b4aeca> in __getitem__(self, index)
     22         else:
     23             msg = '{cls.__name__} indices must be integers'
---> 24             raise TypeError(msg.format(cls=cls))

TypeError: NewSeq indices must be integers
Breakout Room (15 mins)¶
- Determine who stayed up latest last night (their local time).
- Walk through the example and be ready to explain the following:
  - The implementation of __repr__.
  - The implementation of __getitem__.
    - What is numbers.Integral?
  - The choice of error message. Is it sensible? How can we know?
Note: An explanation could involve demo-ing the method in a notebook cell!
l = list(range(10))
reprlib.repr(l)
'[0, 1, 2, 3, 4, 5, ...]'
ind = slice(2,4,None)
isinstance(ind, slice)
True
ind = 4
isinstance(ind, slice)
False
isinstance(ind, numbers.Integral)
True
l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l[4]
4
l[4.0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-799917c621e3> in <module>
----> 1 l[4.0]

TypeError: list indices must be integers or slices, not float
l[4.0, 3.0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-9c08704c5d55> in <module>
----> 1 l[4.0, 3.0]

TypeError: list indices must be integers or slices, not tuple
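The built-in list names both the accepted index types and the type it actually received. One possible way to follow that lead is sketched below. NewSeq2 is a hypothetical variant written for this illustration (it is not the Fluent Python version), and its message wording is an assumption modeled on the list error shown above.

```python
import numbers


class NewSeq2:
    """Hypothetical variant of NewSeq with a list-style error message."""

    def __init__(self, iterable):
        self._storage = list(iterable)

    def __len__(self):
        return len(self._storage)

    def __getitem__(self, index):
        cls = type(self)
        if isinstance(index, slice):
            return cls(self._storage[index])
        if isinstance(index, numbers.Integral):
            return self._storage[index]
        # Name the accepted types and the type actually received.
        msg = '{} indices must be integers or slices, not {}'
        raise TypeError(msg.format(cls.__name__, type(index).__name__))


s = NewSeq2(range(10))
try:
    s[1, 4]
except TypeError as err:
    message = str(err)
    print(message)  # NewSeq2 indices must be integers or slices, not tuple
```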
Linked Lists¶
- Remember, a name in Python points to its value.
- We've seen lists whose last element is actually a pointer to another list.
- This leads to the idea of a linked list, which we'll use to illustrate sequences.
pair = (1,2)
This representation lacks a certain power. A generalization:
pair = (1, (-1, None))
Another generalization:
linked_list = (1, (2, (3, (4, None))))
The second example leads to something like: Recursive Lists.
Here's what things look like in PythonTutor: PythonTutor Example.
import IPython
IPython.display.IFrame('http://pythontutor.com/iframe-embed.html#code=ll%20%3D%20%281,%20%282,%20%283,%20%284,%20None%29%29%29%29&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=0&heapPrimitives=nevernest&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false',
                       width=900, height=300)
Quick Linked List implementation¶
empty_ll = None
def make_ll(first, rest): # Make a linked list
    return (first, rest)
def first(ll): # Get the first entry of a linked list
    return ll[0]
def rest(ll): # Get the rest of a linked list (everything after the first entry)
    return ll[1]
ll_1 = make_ll(1, make_ll(2, make_ll(3, empty_ll))) # Build a linked list from nested calls
ll_1
(1, (2, (3, None)))
my_ll = make_ll(10,ll_1) # Make another one
my_ll
(10, (1, (2, (3, None))))
print(first(my_ll), " ", rest(my_ll), " ", first(rest(my_ll)))
10 (1, (2, (3, None))) 1
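Since the linked list is meant to illustrate sequences, it may help to see the two sequence-protocol operations written for this tuple representation. The sketch below is only an illustration: the function names len_ll and getitem_ll are hypothetical, and make_ll and empty_ll are repeated so the block runs on its own.

```python
empty_ll = None


def make_ll(first, rest):
    return (first, rest)


def len_ll(ll):
    """Count the entries by following the chain to the empty list."""
    if ll is empty_ll:
        return 0
    return 1 + len_ll(ll[1])


def getitem_ll(ll, i):
    """Return entry i by walking i links from the front."""
    if ll is empty_ll:
        raise IndexError('linked list index out of range')
    if i == 0:
        return ll[0]
    return getitem_ll(ll[1], i - 1)


my_ll = make_ll(10, make_ll(1, make_ll(2, make_ll(3, empty_ll))))
print(len_ll(my_ll))         # 4
print(getitem_ll(my_ll, 2))  # 2
```

Both functions visit the entries one at a time, so their cost grows with the length of the list.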
Some reasons for linked lists:¶
- You allocate memory only when you want to use it.
- Inserting a new element is cheaper than in a fixed-size array
- Given a reference to the insertion point, it can be done with a constant number of operations!
- Gateway to other pointer-like and hierarchical structures.
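The constant-time insertion claim can be sketched with a mutable node, since the tuples used above cannot be modified in place. The Node class and the helpers insert_after and to_list are hypothetical names introduced for this illustration only.

```python
class Node:
    """Hypothetical mutable node: a value plus a reference to the next node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def insert_after(node, value):
    """Splice a new node in after `node`; no other nodes are touched."""
    node.next = Node(value, node.next)
    return node.next


def to_list(head):
    """Collect the values into a Python list (for display only)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


head = Node(1, Node(2, Node(4)))
insert_after(head.next, 3)  # insert 3 after the node holding 2
print(to_list(head))        # [1, 2, 3, 4]
```

The insertion itself rebinds a single reference, however long the list is. Finding the node to insert after still requires walking the chain.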
Comments about linked lists:¶
- Not so useful in Python but can be useful in C/C++
- There are singly-linked lists and doubly-linked lists
- Larger memory footprint than arrays (each node also stores a reference to the next node)
- Can't access individual elements directly; reaching element i means walking the chain from the front
- Lose memory locality with linked lists