CMU 15-112: Fundamentals of Programming and Computer Science
Class Notes: Sets
Learning Goal: represent and modify data using sets. In particular:
- Create and use sets and their elements
- Recognize the difference between sets, lists, and tuples.
- Understand how hashing makes sets extremely efficient.
- Quick Example
- Creating Sets
- Using Sets
- Check 7.1
- Properties of Sets
- Check 7.2
- How Sets Work: Hashing
- Some Worked Examples Using Sets
- Quick Example
# A set is a data structure that can hold multiple elements in no particular order # We cannot index into it, but we can iterate over it. s = set([2,3,5]) print(3 in s) # prints True print(4 in s) # prints False for x in range(7): if (x not in s): print(x) # prints 0 1 4 6 - Creating Sets
- Create an empty set
s = set() print(s) # prints set(), the empty set
- Create a set from a list
s = set(["cat", "cow", "dog"]) print(s) # prints {'cow', 'dog', 'cat'}
- Create a statically-allocated set
s = { 2, 3, 5 } print(s) # prints { 2, 3, 5 }
- Caution: { } is not an empty set!
s = { } print(type(s) == set) # False! print(type(s)) # This is a dict (we'll learn about those soon)
- Create an empty set
- Using Sets
# Sets can do many of the same things as lists and tuples... s = set([1, 2, 3]) print(len(s)) # prints 3 print(2 in s) # prints True print(4 in s) # prints False print(4 not in s) # prints True print(2 not in s) # prints False s.add(7) # use add instead of append to add an element to a set s.remove(3) # removes 3 from the set; raises an error if 3 is not in s for item in s: print(item) # we can loop over the items in s - Check 7.1
- Properties of Sets
Though sets are very similar to lists and tuples, they have a few important differences...
- Sets are Unordered
# Elements may be arranged in a different order than they are provided, # and we cannot index into the set. s = set([2,4,8]) print(s) # prints {8, 2, 4} in standard Python (though not in brython) for element in s: # prints 8, 2, 4 print(element) - Elements are Unique
# Sets can also only hold one of each unique element. Duplicates are removed. s = set([2,2,2]) print(s) # prints {2} print(len(s)) # prints 1 - Elements Must Be Immutable
# Sets can only hold elements that are immutable (cannot be changed), # such as numbers, booleans, strings, and tuples a = ["lists", "are", "mutable"] s = set([a]) # TypeError: unhashable type: 'list' print(s)
Another example:s1 = set(["sets", "are", "mutable", "too"]) s2 = set([s1]) # TypeError: unhashable type: 'set' print(s)
- Sets are Unordered
- Check 7.2
- How Sets Work: Hashing
After seeing all the downsides of sets listed above, you might wonder why we'd ever use sets. The answer is simple: sets are MUCH more efficient than lists. It's possible to check whether an element appears in a set in constant time, while it takes more time to find an element in a list based on how large the list is. This is accomplished with an algorithmic approach called hashing.
A hash function takes a value as input and returns an integer as output. The function returns the same integer each time it's called on a given value, and should generally return different integers for different values, though that does not always need to be the case. We actually don't need to build the hash function ourselves; python has one already, a built-in function called hash.
Python stores items in a set by creating a list of some length n, then choosing indexes in the list where each element in the set can be stored when it is added. These indexes are calculated as hash(element) % n. Then, when we need to check whether an item exists in a set, we don't need to check every possible index; we only need to check the one computed by our formula! This trick is what makes it possible to do constant-time lookup.
A practical example of how sets are faster than lists is shown below:
# 0. Preliminaries import time n = 1000 # 1. Create a list [2,4,6,...,n] then check for membership # among [1,2,3,...,n] in that list. # don't count the list creation in the timing a = list(range(2,n+1,2)) print("Using a list... ", end="") start = time.time() count = 0 for x in range(n+1): if x in a: count += 1 end = time.time() elapsed1 = end - start print("count=", count," and time = %0.4f seconds" % elapsed1) # 2. Repeat, using a set print("Using a set.... ", end="") start = time.time() s = set(a) count = 0 for x in range(n+1): if x in s: count += 1 end = time.time() elapsed2 = end - start print("count=", count," and time = %0.4f seconds" % elapsed2) print("With n=%d, sets ran about %0.1f times faster than lists!" % (n, elapsed1/elapsed2)) print("Try a larger n to see an even greater savings!") - Some Worked Examples Using Sets
Note: we'll go over the following sections in class