Python Data Structures

Python Data Structures

In the last tutorial, we had a look at some common programming algorithms. A companion to these algorithms is a set of common data structures. Programming algorithms need to work with data and that data is often contained in specific formats or data structures. Now is a good time to learn more about these common data structures that are used when creating various algorithms. The purpose of data structures is to organize information in ways that make it easy to be operated on by algorithms. You might have a watchlist of stocks and you might want to be able to sort them by dividend yield or P/E Ratio. Another example would be a tree structure that represents a collection of folders and files where you want to find a specific file within all those folders. Each scenario has associated data with a different data structure. The most common data structures to be well versed in are arrays, linked lists, stacks, queues, trees, and hash tables. Different applications will need different types of data structures to hold the information that algorithms need to work on. In this tutorial, we’ll explore these topics further.


Arrays

An array is a group of items where the position of each item is identified by an index or a key value. A one-dimensional array is the most basic type of array, and the diagram below shows what this might look like.

One Dimensional Array

Element positions can be calculated using a mathematical expression which allows array elements to be accessed directly in a random access approach. What this means is that since the position of each element can be directly computed, there is no need to navigate or traverse the data structure in order to access an item. The first index element of an array is always at position 0. Here is an example of a simple one-dimensional array in Python.

3
6
9
12
15

Accessing every other element of the array

3
9
15

Accessing an element directly

15

Arrays can have multiple dimensions. To create a two-dimensional array, the first dimension can themselves contain arrays. Accessing an item in a two-dimensional array requires you to provide two indexes. Here is a diagram of a two-dimensional array with an index of 2,1 highlighted.

Multi Dimensional Array

In Python, you’ll likely more commonly use the List data structure which is an array-like data type. Both the List and the Array in Python behave in similar ways in that you can iterate over them and store items at a specific index. The difference between the two is in the functions that you can perform on them. It is more work to use true arrays in Python because you have to import the array module and declare an array. Lists are simply a part of Python’s syntax, so they are used much more often and cover most of the use cases you’ll need. True arrays will be better for math functions as well as for working with large amounts of data. Most times, you can simply go with Lists. Here are a few examples of some lists in Python.


Linked Lists

The linked list data structure is a linear collection of data elements that are often referred to as nodes. They are similar to arrays however each of the nodes has a field that points to the next element in the list, unlike an array. There are single linked lists and double linked lists. Here are a couple of diagrams that display this.


Single Linked List

The first item of a linked list is called the head. Each element contains a field that points to the next item in the list. The last item in a linked list points to null, which means that it is the end of the list.
single linked list


Double Linked List

In a double linked list, each data item has a reference to both the previous and next items.
double linked list

Linked List In Python (Single Linked)

Here is an implementation of a linked list in Python. It uses two classes. One to represent the nodes of the list, and one to represent the linked list itself. The Node class implements the node type that will be stored in the linked list. It has a single next field, indicating that this is a singly linked list. The LinkedList class has fields for the head as well as a count field that keeps track of how many nodes are in the list.

Initialize a Linked List and store some values

Node:  15
Node:  12
Node:  9
Node:  6
Node:  3

Printing the count of the linked list

Number of items in List:  5

Find two Node objects in the Linked List

Finding item:  <__main__.Node object at 0x03512FD0>
Finding item:  <__main__.Node object at 0x03538028>

Delete a node in a linked list

Number of items in List:  4
Finding item:  <__main__.Node object at 0x031A8058>
Node:  15
Node:  12
Node:  9
Node:  3

Stack Data Structure

The stack data structure is a collection of elements that has two basic operations, push and pop. Stacks are LIFO, or last-in-first-out, data structures. The last item pushed onto a stack is the first one popped. An example of a stack is when you’re using the back button in your browser. As you surf the internet, the browser adds each link to a stack in order to maintain the order in which they were visited. When you click on the back button, the most recently added URL is popped off the stack and then revisited.

Stack Data Structure In Python

You can get the characteristics of working with a stack data structure in Python by making use of a list.

Initialize a stack

Push (Append) Items on to the stack

Print out the stack

['Tom', 'Dick', 'Harry', 'Bosch']

Pop an item off the stack

Bosch
['Tom', 'Dick', 'Harry']

Stack as a Class

You might also do something like the following which uses a user-defined class to offer stack functionality. This is still just a wrapper around using the list type, but now you have an actual push() method you can use.

Initial stack: [0, 1, 2, 3, 4]
pop(): 4
After pop(), the stack is now: [0, 1, 2, 3]
After push(7), the stack is now: [0, 1, 2, 3, 7]
The size is: 5

Queue Data Structure

The queue data structure also supports adding and removing items, but it uses the FIFO method. FIFO is a first-in, first-out approach. An empty queue that gets an item added to it, would be the first item in the list. Queuing in more items simply grows the length of the list. Queues are very common in programming since they mimick so much of what happens in real life. Have you ever been to the department of motor vehicles? Then you know very well what a Queue is. You walk up to the end of the line(queue), wait for some large amount of time(queue processing), and then finally get service once all others in front of you have been served. In Software, message processing is a common use of a queue.

Queue Data Structure In Python

Initialize an empty queue

Add some items to the queue

Print out the queue

deque(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])

Pop item off the queue

Monday
deque(['Tuesday', 'Wednesday', 'Thursday', 'Friday'])

Hash Table Data Structure

A hash table is basically an associative array. Many other programming languages have associative arrays, and Python has its implementation of this data structure via dictionaries. This data structure maps keys to values, using a hash function. A hash function uses the key to calculate an index for the slots and maps the key to a value. The ability to uniquely map a given key to a specific value is a big benefit of hash tables. This makes working with counters and filters fast and easy. Hash tables are also quite fast, making them good for large datasets. Hash tables don’t order their items in any specific way, so you would need to add a sorting mechanism if this is required.

Hash Table Data Structure In Python

Initialize a new hash table

{'firstkey': 1, 'secondkey': 2, 'thirdkey': 'three'}

Create a second hash table with iteration

{'firstkey': 1, 'secondkey': 2, 'thirdkey': 3}

Replace an item in a hash table

{'firstkey': 1, 'secondkey': 'two', 'thirdkey': 3}

Iterate over the hash table to print key-value pairs

key:  firstkey  value:  1
key:  secondkey  value:  two
key:  thirdkey  value:  3

Learn More About Python Data Structures

Python Data Structures Summary

  • True Python Arrays are a wrapper on C arrays and are good for working with items of the same type. They are not as user-friendly as Lists.
  • Lists are a more flexible style of an array that can hold a combination of any type of data. If you need to shrink and grow your list without hassle, they are the better choice.
  • Linked Lists may be preferred over arrays since they are easier and faster to reorganize. This article explains why you would want to use a linked list.
  • Stacks grow to the right and shrink to the left and are good for Last In First Out operations.
  • Queues use the First In First Out approach and are good for messaging, logging, and other applications.
  • Hash Tables are implemented in Python using dictionaries and are a form of an associative array with distinct key-value pairs.