CMU 15-112: Fundamentals of Programming and Computer Science
Class Notes: Recursive Applications
Learning Goal: use recursion in a variety of problem genres that cannot be solved with iteration. In particular:
- Navigate recursive data structures such as file systems.
- Create recursive fractals with Tkinter.
- Use recursive backtracking to find solutions to problems.
- File System Navigation
The file system used by your computer is composed of two different data type: folders/directories and files. A file is just a file, but a directory is an iterable structure that holds files and other directories.
In other words: file systems are naturally recursive! To navigate them, we need to use recursion as well.
To try out the following programs, you'll need to run them on your local IDE. You'll also need to download and expand this zip file in the same directory as the Python file you are running: sampleFiles.zip.
Note: a file system is actually a specialized version of a recursive data structure known as a tree. You'll learn more about this data structure in other CS courses.
- printFiles
import os def printFiles(path): # Base Case: a file. Print the path name directly. if os.path.isfile(path): print(path) else: # Recursive Case: a directory. Iterate through its files and directories. # Note that we have to update the path name to access the inner files! for filename in os.listdir(path): printFiles(path + "/" + filename) printFiles("sampleFiles") # Note: if you see .DS_Store files in the sampleFiles folders, or in the # output of your function (as often happens with Macs, in particular), # don't worry; this is just a metadata file and can be safely ignored. - listFiles
import os def listFiles(path): if os.path.isfile(path): # Base Case: return a list of just this file return [ path ] else: # Recursive Case: create a list of all the recursive results from the files in the folder files = [ ] for filename in os.listdir(path): files += listFiles(path + "/" + filename) return files print(listFiles("sampleFiles")) - removeTmpFiles
If your computer is automatically creating .DS_Store files and you want to remove them, run the following program.
Be careful when using os.remove() on non-temporary files- it's permanent and cannot be undone!
import os def removeTmpFiles(path): if path.split("/")[-1] == '.DS_Store': os.remove(path) elif os.path.isdir(path): for filename in os.listdir(path): removeTmpFiles(path + "/" + filename)
- printFiles
- Check 10.1
- Fractals
A fractal is a graphic that is created recursively. Usually it involves a pattern that repeats continuously at smaller and smaller scale. Unlike other recursive structures, fractals don't have a base case- they just keep repeating forever.
We can't draw infinitely-small graphics in Tkinter, so we'll draw graphics up to a specified level instead. Our base case will then be the point at which we reach that level. We'll use the following framework for most fractals:
from tkinter import * def init(data): data.level = 1 def drawFractal(canvas, level, otherParams): if level == 0: pass # base case else: pass # recursive case; call drawFractal as needed with level-1 def keyPressed(event, data): if event.keysym in ["Up", "Right"]: data.level += 1 elif (event.keysym in ["Down", "Left"]) and (data.level > 0): data.level -= 1 def redrawAll(canvas, data): margin = min(data.width, data.height)//10 otherParams = None drawFractal(canvas, data.level, otherParams) canvas.create_text(data.width/2, 0, text = "Level %d Fractal" % (data.level), font = "Arial " + str(int(margin/3)) + " bold", anchor="n") canvas.create_text(data.width/2, margin, text = "Use arrows to change level", font = "Arial " + str(int(margin/4)), anchor="s") def mousePressed(event, data): pass def timerFired(data): pass
- sierpinskiTriangle
def drawSierpinskiTriangle(canvas, x, y, size, level): # (x,y) is the lower-left corner of the triangle # size is the length of a side # Need a bit of trig to calculate the top point if level == 0: topY = y - (size**2 - (size/2)**2)**0.5 canvas.create_polygon(x, y, x + size, y, x + size/2, topY, fill="black") else: # Bottom-left triangle drawSierpinskiTriangle(canvas, x, y, size/2, level-1) # Bottom-right triangle drawSierpinskiTriangle(canvas, x + size/2, y, size/2, level-1) # Top triangle midY = y - ((size/2)**2 - (size/4)**2)**0.5 drawSierpinskiTriangle(canvas, x + size/4, midY, size/2, level-1) def redrawAll(canvas, data): margin = min(data.width, data.height)//10 x, y = margin, data.height - margin size = min(data.width, data.height) - 2*margin drawSierpinskiTriangle(canvas, x, y, size, data.level)
Result:
- kochSnowflake
# This example uses turtle graphics, not Tkinter import turtle def kochSide(length, n): if (n == 1): turtle.forward(length) else: kochSide(length/3.0, n-1) turtle.left(60) kochSide(length/3.0, n-1) turtle.right(120) kochSide(length/3.0, n-1) turtle.left(60) kochSide(length/3.0, n-1) def kochSnowflake(length, n): for step in range(3): kochSide(length, n) turtle.right(120) turtle.delay(20) turtle.speed(0) turtle.penup() turtle.goto(-300,100) turtle.pendown() turtle.pencolor("black") kochSide(300, 4) # same as k4(300) turtle.pencolor("blue") kochSnowflake(300, 4) turtle.penup() turtle.goto(-250,50) turtle.pendown() turtle.pencolor("red") kochSnowflake(200, 5) turtle.done()
Result:
- sierpinskiTriangle
- Check 10.2
- Backtracking
There are many times in problem solving when you need to construct a solution based on a set of constraints, like finding a solution to a Sudoku puzzle, or determining an optimal packing of items into containers. It's possible to solve these problems by iterating through every possible solution and checking whether each solution is correct (like trying every possible combination of numbers in the open spaces of a Sudoku board), but generating all possible solutions often grows in exponential time- this just isn't a reasonable approach for many problems.
A more efficient way to construct a solution in constraint-based situations like these is to use backtracking. In this approach, a problem is broken up into several parts (such as the individual cells of a Sudoku board). If you can find a way to check the validity of a partial solution, you can then build up solutions part-by-part, checking validity all the way. This makes it possible to backtrack as soon as a solution is illegal and try a new partial-solution, instead of waiting until the full solution is made.
In this course, backtracking will usually follow the following general template. For each backtracking problem you attempt, try to determine what the problem state, parts, moves, validity, add/remove move, and completion look like in that problem.
# General backtracking template def solveWithBacktracking(problemState): if isComplete(problemState): return problemState part = getNextPart(problemState) for move in getPossibleMoves(problemState, part): # Sometimes it's easier to make a move, then check if it's valid. # Sometimes it's easier to check if a move is valid first. # Just make sure that you always undo a move properly! if isValid(problemState, part, move): problemState = makeMove(problemState, part, move) tmpSolution = solveWithBacktracking(problemState) if tmpSolution != None: return tmpSolution problemState = undoMove(problemState, part, move) return None- Check 10.3
Note: we'll go over all the following examples in class. You do not need to review them in advance unless you want to.
- nQueens
Problem State: the board is stored as a 2D list of strings. "Q" for queen, " " for blank
Parts: each row must have one queen in a valid solution, so the rows are the parts.
Moves: in a row, each column might be able to hold a queen.
Validity: check whether there is another queen in the same row, column, or diagonal line.
Make/Undo Move: add "Q"/" " to the appropriate space.
Completion: we're done when we pass the last row.
def isLegal(board, queenRow, queenCol): # A board is legal if no two queens can attack each other # We only need to check the most recently placed queen for row in range(len(board)): for col in range(len(board[0])): if queenRow == row and queenCol == col: continue elif board[row][col] == "Q": if ((queenRow == row) or (queenCol == col) or (queenRow + queenCol == row + col) or (queenRow - queenCol == row - col)): return False return True def solve(board, row): if (row == len(board)): return board for col in range(len(board[row])): board[row][col] = "Q" if isLegal(board, row, col): solution = solve(board, row + 1) if solution != None: return solution board[row][col] = " " return None def printBoard(board): for row in range(len(board)): print("".join(board[row])) def nQueens(n): board = [ [" "] * n for row in range(n) ] solution = solve(board, 0) if solution != None: printBoard(solution) nQueens(4) - Subset Sum
Problem State: we start with a list of positive numbers (lst) and an empty result list (result). We want to put items from lst into result such that result sums to target, an int.
Parts: each number in lst is a potential part of the solution.
Moves: we can either include the next number, or not include it.
Validity: we can include a number if adding it to the result doesn't make the result sum greater than the target.
Make/Undo Move: if we don't change result directly, we don't need to undo the move!
Completion: we're done when the sum of the result is equal to the target. We know we're at a dead end if we run out of items as well.
# Assume that lst contains no negative numbers def subsetSum(lst, result, target): if sum(result) == target: return result if len(lst) == 0: return None for addedItem in [ [lst[0]], [] ]: if sum(result) + sum(addedItem) <= target: tmpResult = result + addedItem tmp = subsetSum(lst[1:], tmpResult, target) if tmp != None: return tmp return None print(subsetSum([3, 6, 5, 2, 11, 7, 3], [], 15)) - Maze Solving
Problem State: the maze is stored as a 2D list of islands. Each island keeps track of a dictionary mapping bridge directions to its connected islands. We'll construct a chain of connected islands starting at the top-left and ending at the bottom-right.
Parts: each island can be a part of the solution. We represent an island by its row and col.
Moves: from an island, we can move South, East, North, or West.
Validity: check whether that direction moves off-grid, and whether a bridge exists there. Also check if we've already seen the next island. If we've already checked in, there's no point in checking again.
Make/Undo Move: add/remove the next island to the solution chain.
Completion: we're done when we reach the lower-right island.
Python code: notes-recursion-maze-solver.py
Key excerpt:NORTH = (-1, 0) SOUTH = ( 1, 0) EAST = ( 0, 1) WEST = ( 0, -1) def isValid(data, row, col, direction): if not (0 <= row < len(data.maze) and 0 <= col < len(data.maze[0])): return False return direction in data.maze[row][col].bridges def solve(data, row, col, visited, alreadySeen): if row == len(data.maze)-1 and col == len(data.maze[0])-1: return visited for direction in [SOUTH, EAST, NORTH, WEST]: drow, dcol = direction if isValid(data, row, col, direction) and \ (row + drow, col + dcol) not in alreadySeen: visited.append((row + drow, col + dcol)) alreadySeen.add((row + drow, col + dcol)) tmpSolution = solve(data, row + drow, col + dcol, visited, alreadySeen) if tmpSolution != None: return tmpSolution visited.pop() # We won't undo alreadySeen, because we know we can't reach the end from there. return None def solveMaze(data): visited = [(0, 0)] alreadySeen = set() return solve(data, 0, 0, visited, alreadySeen)
Memoization is not a recursive application; instead, it's a way to make recursive applications more efficient.
Big idea: when a program does a lot of repetitive computation, store results as soon as they're computed, then look up results when available.
- The problem:
def fib(n): if (n < 2): return 1 else: return fib(n-1) + fib(n-2) import time def testFib(maxN=40): for n in range(maxN+1): start = time.time() fibOfN = fib(n) ms = 1000*(time.time() - start) print("fib(%2d) = %8d, time =%5dms" % (n, fibOfN, ms)) testFib() # gets really slow!
- A solution:
fibResults = dict() def fib(n): if (n in fibResults): return fibResults[n] if (n < 2): result = 1 else: result = fib(n-1) + fib(n-2) fibResults[n] = result return result import time def testFib(maxN=40): for n in range(maxN+1): start = time.time() fibOfN = fib(n) ms = 1000*(time.time() - start) print("fib(%2d) = %8d, time =%5dms" % (n, fibOfN, ms)) testFib() # ahhh, much better!
- A more elegant solution (which will make sense after you've read the Functions Redux notes):
def memoized(f): # You are not responsible for how this decorator works- you can just use it. import functools cachedResults = dict() @functools.wraps(f) def wrapper(*args): if args not in cachedResults: cachedResults[args] = f(*args) return cachedResults[args] return wrapper @memoized def fib(n): if (n < 2): return 1 else: return fib(n-1) + fib(n-2) import time def testFib(maxN=40): for n in range(maxN+1): start = time.time() fibOfN = fib(n) ms = 1000*(time.time() - start) print("fib(%2d) = %8d, time =%5dms" % (n, fibOfN, ms)) testFib() # ahhh, much better!