CMU 15-112: Fundamentals of Programming and Computer Science
Class Notes: Strings
This week we'll explore strings in much greater detail than we have so far. Strings have served us well as debugging tools and for basic text output, but now we will see how to manipulate and process text data in a variety of ways. Like numbers, strings are important in almost all aspects of programming. We can interact with strings in ways similar to how we've interacted with numbers (e.g., stepping through each digit of an integer, detecting whether a number meets certain criteria, or finding the first instance of a particular digit), but strings are represented very differently from numbers, so the syntax we use is also different.
Fortunately, Python (like many other languages) has built-in functions and methods to perform common string operations. We will explore the most useful of these and show how they can simplify the code we write.
Learning Goal: use strings to format, manipulate, and parse non-numeric data. In particular:
- Use string constants and operators to compare and create alphanumeric data
- Use loops and string slicing and indexing to isolate and operate on individual letters and substrings
- Use string builtins and methods to process string values
- Perform basic file IO for reading and writing file-based text data
- String Literals
- Check 3.1
- Some String Constants
- Some String Operators
- Check 3.2
- Looping over Strings
- Check 3.3
- Example: isPalindrome
- Strings are Immutable
- Some String-related Built-In Functions
- Some String Methods
- Check 3.4
- Check 3.5
- Check 3.6
- String Formatting
- Basic Console Input
- Basic File IO
- Check 3.6
- String Literals
- Four kinds of quotes
# Quotes enclose characters to tell Python "this is a string!"
# single-quoted or double-quoted strings are the most common
print('single-quotes')
print("double-quotes")
# triple-quoted strings are less common (though see next section for a typical use)
print('''triple single-quotes''')
print("""triple double-quotes""")
# The reason we have multiple kinds of quotes is partly so we can have strings like:
print('The professor said "No laptops in class!" I miss my laptop.')
# Note that everything inside the single quotes is a string, and this prints fine.
# If a certain kind of quote starts a string, the same kind ends it.
# This means if we only use one type of quote, lines like this one will break!
# It causes a syntax error! We don't even know how to color this.
print("The professor said "No laptops in class!" I miss my laptop.")
- Newlines in strings
# If you see something like \n in a string, that's probably an "escape sequence"
# Even though it looks like two characters, Python treats it as one special character
# Note that these two print statements do the same thing!
print("abc\ndef")  # \n is a single newline character.
print("""abc
def""")
print("""\
You can use a backslash at the end of a line in a string to exclude
the newline after it. This should almost never be used, but one good
use of it is in this example, at the start of a multi-line string, so
the whole string can be entered with the same indentation (none, that is).
""")
- More Escape Sequences
print("Double-quote: \"")
print("Backslash: \\")
print("Newline (in brackets): [\n]")
print("Tab (in brackets): [\t]")
print("These items are tab-delimited, 3-per-line:")
print("abc\tdef\tg\nhi\tj\\\tk\n---")
An escape sequence produces a single character:
s = "a\\b\"c\td"
print("s =", s)
print("len(s) =", len(s))
- repr() vs. print()
Sometimes it can be difficult to debug strings! Two strings that look identical when printed may not actually be the same. Have you ever had trouble distinguishing a tab from several spaces in a word processor? The repr() function is sort of like a 'formal' version of print(). Where print() is meant to produce output intended for the user, repr() shows us a representation of the data contained in the string. This can be really useful for debugging! Looking at an example will help you understand the difference:
print("These look the same when we print them!")
s1="abc\tdef"
s2="abc  def"  # spaces instead of a tab
print("print s1: ",s1)
print("print s2: ",s2)
print("\nThey aren't really though...")
print("s1==s2?", s1==s2)
print("\nLet's try repr instead")
print("repr s1: ",repr(s1))
print("repr s2: ",repr(s2))
print("\nHere's a sneaky one")
s1="abcdef"
s2="abcdef \t"
print("print s1: ",s1)
print("print s2: ",s2)
print("s1==s2?", s1==s2)
print("repr s1: ",repr(s1))
print("repr s2: ",repr(s2))
print("repr() lets you see the spaces^^^")
- String Literals as Multi-line Comments
"""
Python does not have multiline comments, but you can do something similar
by using a top-level multiline string, such as this. Technically, this is
not a comment, and Python will evaluate this string, but then ignore it
and garbage collect it!
"""
print("wow!")
- Check 3.1
- Some String Constants
# When we import string, we get some nifty pre-built strings!
# We can use these later when we want to check things like:
#   -Is a string all letters?
#   -Where are the punctuation marks in our string?
#   -Where is the next whitespace character?
import string
print(string.ascii_letters)   # abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.ascii_lowercase) # abcdefghijklmnopqrstuvwxyz
print("-----------")
print(string.ascii_uppercase) # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits)          # 0123456789
print("-----------")
print(string.punctuation)     # '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
print(string.printable)       # digits + letters + punctuation + whitespace
print("-----------")
print(string.whitespace)      # space + tab + linefeed + return + ...
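The comments above mention checks such as "Is a string all letters?". Here is a minimal sketch of how these constants might be used for that, combined with the in operator and a loop over the string (both covered later in these notes). The helper names isAllLetters and countPunctuation are made up for this example; they are not part of the string module.

```python
import string

def isAllLetters(s):
    # hypothetical helper: True if every character is an ASCII letter
    # (returns True for the empty string, since the loop never runs)
    for c in s:
        if c not in string.ascii_letters:
            return False
    return True

def countPunctuation(s):
    # hypothetical helper: counts the characters found in string.punctuation
    count = 0
    for c in s:
        if c in string.punctuation:
            count += 1
    return count

print(isAllLetters("Python"))     # True
print(isAllLetters("Python3"))    # False
print(countPunctuation("Wow!?"))  # 2
```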
- Some String Operators
- String + and *
print("abc" + "def")  # What do you think this should do?
print("abc" * 3)      # How many characters do you think this prints?
print("abc" + 3)      # ...will this give us an error? (Yes)
- The in operator
# The "in" operator is really really useful!
print("ring" in "strings")
print("wow" in "amazing!")
print("Yes" in "yes!")
print("" in "No way!")
- String indexing and slicing
- Indexing a single character
# Indexing lets us find a character at a specific location (the index)
s = "abcdefgh"
print(s)
print(s[0])
print(s[1])
print(s[2])
print("-----------")
print("Length of ",s,"is",len(s))
print("-----------")
print(s[len(s)-1])
print(s[len(s)])  # crashes (string index out of range)
- Negative indexes
s = "abcdefgh"
print(s)
print(s[-1])
print(s[-2])
- Slicing a range of characters
# Slicing is also super important!
# It's like indexing, but it lets us get more than 1 character.
# ...how is this kind of like range(a,b)?
s = "abcdefgh"
print(s)
print(s[0:3])
print(s[1:3])
print("-----------")
print(s[2:3])
print(s[3:3])
- Slicing with default parameters
s = "abcdefgh"
print(s)
print(s[3:])
print(s[:3])
print(s[:])
- Slicing with a step parameter
print("This is not as common, but perfectly ok.")
s = "abcdefgh"
print(s)
print(s[1:7:2])
print(s[1:7:3])
- Reversing a string
s = "abcdefgh"
print("This works, but is confusing:")
print(s[::-1])
print("This also works, but is still confusing:")
print("".join(reversed(s)))
print("Best way: write your own reverseString() function:")
def reverseString(s):
    return s[::-1]
print(reverseString(s))  # crystal clear!
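The slicing examples above ask how s[a:b] is like range(a,b). One way to see the connection is to rebuild a slice one character at a time. This is only a sketch: the name sliceByHand is hypothetical, and it assumes 0 <= a <= b <= len(s).

```python
def sliceByHand(s, a, b):
    # hypothetical helper; assumes 0 <= a <= b <= len(s)
    result = ""
    for i in range(a, b):  # a is included and b is excluded, just like s[a:b]
        result += s[i]
    return result

s = "abcdefgh"
print(sliceByHand(s, 1, 3), s[1:3])    # bc bc
print(sliceByHand(s, 3, 3) == s[3:3])  # True (both are the empty string)
```

In real code you would simply write s[a:b]; the loop is here only to show which indexes a slice visits.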
- Check 3.2
- Looping over Strings
We can loop over strings! There are a bunch of ways we can do this. Try some of the examples below to understand what each does.
- "for" loop with indexes
s = "abcd"
for i in range(len(s)):
    print(i, s[i])
- "for" loop without indexes
# ...how is this different? What is c?
s = "abcd"
for c in s:
    print(c)
- "for" loop with split
# By itself, names.split(",") produces something called a list.
# Don't worry about that now, and until we teach lists, you probably
# shouldn't use this outside the context of looping over strings.
names = "fred,wilma,betty,barney"
for name in names.split(","):
    print(name)
- "for" loop with splitlines
# .splitlines() also makes a list, but don't worry about that either!
# quotes from brainyquote.com
quotes = """\
Dijkstra: Simplicity is prerequisite for reliability.
Knuth: If you optimize everything, you will always be unhappy.
Dijkstra: Perfecting oneself is as much unlearning as it is learning.
Knuth: Beware of bugs in the above code; I have only proved it correct, not tried it.
Dijkstra: Computer science is no more about computers than astronomy is about telescopes.
"""
for line in quotes.splitlines():
    if (line.startswith("Knuth")):
        print(line)
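To compare the first two looping styles above on a small task, here is a sketch that counts vowels both ways. The function names countVowels and countVowelsWithIndexes are made up for this example, and "vowel" here just means one of a, e, i, o, u in either case.

```python
def countVowels(s):
    # hypothetical helper: loops over the characters directly
    count = 0
    for c in s:
        if c in "aeiouAEIOU":
            count += 1
    return count

def countVowelsWithIndexes(s):
    # same result, but looping over indexes instead of characters
    count = 0
    for i in range(len(s)):
        if s[i] in "aeiouAEIOU":
            count += 1
    return count

print(countVowels("Carnegie Mellon"))             # 6
print(countVowelsWithIndexes("Carnegie Mellon"))  # 6
```

When you only need the characters, the loop without indexes is usually easier to read; the index version is useful when you also need each character's position.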
- Check 3.3
- Example: isPalindrome
A string is a palindrome if it is exactly the same forwards and backwards. For example, "racecar" is a palindrome because if you reverse the order of the letters, the result still spells racecar. Likewise, "abccba" is a palindrome, and so is "abcba" despite only having one c. Can you think of some others?
# There are many ways to write isPalindrome(s)
# Here are several. Which way is best?
def reverseString(s):
    return s[::-1]
def isPalindrome1(s):
    return (s == reverseString(s))
def isPalindrome2(s):
    for i in range(len(s)):
        if (s[i] != s[len(s)-1-i]):
            return False
    return True
def isPalindrome3(s):
    for i in range(len(s)):
        if (s[i] != s[-1-i]):
            return False
    return True
def isPalindrome4(s):
    while (len(s) > 1):
        if (s[0] != s[-1]):
            return False
        s = s[1:-1]
    return True
print(isPalindrome1("abcba"), isPalindrome1("abca"))
print(isPalindrome2("abcba"), isPalindrome2("abca"))
print(isPalindrome3("abcba"), isPalindrome3("abca"))
print(isPalindrome4("abcba"), isPalindrome4("abca"))
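One more variation, offered as a sketch rather than as the best answer: the loops in isPalindrome2 and isPalindrome3 compare every pair of characters twice, once from each end. The hypothetical isPalindrome5 below stops at the middle of the string instead.

```python
def isPalindrome5(s):
    # hypothetical variant: only checks the first half against the second half
    for i in range(len(s) // 2):
        if s[i] != s[-1-i]:
            return False
    return True

print(isPalindrome5("abcba"), isPalindrome5("abca"))  # True False
print(isPalindrome5("abccba"), isPalindrome5(""))     # True True
```

For an odd-length string the middle character is never compared, which is fine because it always matches itself.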
- Strings are Immutable
- You cannot change strings! They are immutable. (If this section is confusing, we'll review it in lecture!)
s = "abcde"
s[2] = "z"  # Error! Cannot assign into s[i]
- Instead, you must create a new string
s = "abcde"
s = s[:2] + "z" + s[3:]
print(s)
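The "create a new string" pattern above can be wrapped in a small function. This is a sketch; the name replaceCharAt is hypothetical, and it assumes 0 <= i < len(s).

```python
def replaceCharAt(s, i, c):
    # hypothetical helper; assumes 0 <= i < len(s)
    # builds a new string from the part before i, the new text, and the part after i
    return s[:i] + c + s[i+1:]

s = "abcde"
t = replaceCharAt(s, 2, "z")
print(t)  # abzde
print(s)  # abcde (the original string is unchanged)
```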
- Some String-related Built-In Functions
Look how handy these are! Try some!
- str() and len()
name = input("Enter your name: ")
print("Hi, " + name + ". Your name has " + str(len(name)) + " letters!")
- chr() and ord()
print(ord("A"))  # 65
print(chr(65))   # "A"
print(chr(ord("A")+1))  # ?
- eval()
# eval() works but you should not use it!
s = "(3**2 + 4**2)**0.5"
print(eval(s))
# why not? Well...
s = "reformatMyHardDrive()"
print(eval(s))  # no such function! But what if there was?
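As an illustration of what chr() and ord() make possible, here is a minimal sketch that shifts an uppercase letter through the alphabet, wrapping around after "Z". The function name shiftUppercase is made up for this example, and it assumes its first argument is a single letter from A to Z.

```python
def shiftUppercase(c, shift):
    # hypothetical helper; assumes c is a single uppercase letter A-Z
    offset = ord(c) - ord("A")         # 0 for "A", 25 for "Z"
    newOffset = (offset + shift) % 26  # wrap around in either direction
    return chr(ord("A") + newOffset)

print(shiftUppercase("Y", 3))   # B
print(shiftUppercase("C", -3))  # Z
print(shiftUppercase("M", 0))   # M
```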
- Some String Methods
Methods are a special type of function that we call "on" a value, like a string. You can tell it's a method because the syntax is in the form of value.function(), like s.islower() in the code below.
- Character types: isalnum(), isalpha(), isdigit(), islower(), isspace(), isupper()
# Run this code to see a table of isX() behaviors
# (the strings are padded with spaces so the columns line up)
def p(test):
    print("True     " if test else "False    ", end="")
def printRow(s):
    print(" " + s + "  ", end="")
    p(s.isalnum())
    p(s.isalpha())
    p(s.isdigit())
    p(s.islower())
    p(s.isspace())
    p(s.isupper())
    print()
def printTable():
    print("  s    isalnum  isalpha  isdigit  islower  isspace  isupper")
    for s in "ABCD,ABcd,abcd,ab12,1234,    ,AB?!".split(","):
        printRow(s)
printTable()
- String edits: lower(), upper(), replace(), strip()
print("This is nice. Yes!".lower())
print("So is this? Sure!!".upper())
print(" Strip removes leading and trailing whitespace only ".strip())
print("This is nice. Really nice.".replace("nice", "sweet"))
print("This is nice. Really nice.".replace("nice", "sweet", 1))  # count = 1
print("----------------")
s = "This is so so fun!"
t = s.replace("so ", "")
print(t)
print(s)  # note that s is unmodified (strings are immutable!)
- Substring search: count(), startswith(), endswith(), find(), index()
print("This is a history test".count("is"))  # 3
print("This IS a history test".count("is"))  # 2
print("-------")
print("Dogs and cats!".startswith("Do"))     # True
print("Dogs and cats!".startswith("Don't"))  # False
print("-------")
print("Dogs and cats!".endswith("!"))        # True
print("Dogs and cats!".endswith("rats!"))    # False
print("-------")
print("Dogs and cats!".find("and"))          # 5
print("Dogs and cats!".find("or"))           # -1
print("-------")
print("Dogs and cats!".index("and"))         # 5
print("Dogs and cats!".index("or"))          # crash!
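find() only reports the first match, but it also accepts an optional second argument giving the index where the search should start. The sketch below uses that to locate every match. The name findAllIndexes is hypothetical, it assumes target is non-empty, and it returns the indexes as a comma-separated string because lists have not been covered yet.

```python
def findAllIndexes(s, target):
    # hypothetical helper: returns a comma-separated string of every index
    # where target starts in s (overlapping matches included), or "" if none
    # assumes target is a non-empty string
    result = ""
    start = s.find(target)
    while start != -1:
        if result != "":
            result += ","
        result += str(start)
        start = s.find(target, start + 1)  # resume just past the last match
    return result

print(findAllIndexes("This is a history test", "is"))  # 2,5,11
print(findAllIndexes("Dogs and cats!", "or") == "")     # True
```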
- String Formatting
Sometimes we want to easily insert a value into a string, especially when we're printing things. Here are some ways to do that.
format a string with %s
breed = "beagle"
print("Did you see a %s?" % breed)
format an integer with %d
dogs = 42
print("There are %d dogs." % dogs)
format a float with %f
grade = 87.385
print("Your current grade is %f!" % grade)
format a float with %.[precision]f
You can control how many fractional digits of a float are included in the string by changing the number after the decimal point in the format specifier (for example, the 1 in %0.1f).
grade = 87.385
print("Your current grade is %0.1f!" % grade)
print("Your current grade is %0.2f!" % grade)
print("Your current grade is %0.3f!" % grade)
print("Your current grade is %0.4f!" % grade)
format multiple values
dogs = 42
cats = 18
exclamation = "Wow"
print("There are %d dogs and %d cats. %s!!!" % (dogs, cats, exclamation))
format right-aligned with %[minWidth]
dogs = 42
cats = 3
print("%10s %10s" % ("dogs", "cats"))
print("%10d %10d" % (dogs, cats))
format left-aligned with %-[minWidth]
dogs = 42
cats = 3
print("%-10s %-10s" % ("dogs", "cats"))
print("%-10d %-10d" % (dogs, cats))
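The pieces above can be combined in one format string. The sketch below is an illustration only: the function name formatRow and the column widths are assumptions chosen for this example, and %8.2f combines a minimum width of 8 with 2 fractional digits.

```python
def formatRow(name, count, price):
    # hypothetical helper: name left-aligned in 10 columns, count right-aligned
    # in 5 columns, price right-aligned in 8 columns with 2 fractional digits
    return "%-10s%5d%8.2f" % (name, count, price)

print(formatRow("dogs", 42, 3.5))
print(formatRow("cats", 3, 12.25))
```

Because every row uses the same widths, the columns line up no matter how long each value is (as long as it fits in its minimum width).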
- Basic Console Input
Ever played Zork? Or any other text-based adventure?
It could be pretty easy to make your own! You can use input() to allow the user to type in data.
Note: This is possibly the first time we've shown how to make your programs interactive!
- Input a string
name = input("Enter your name: ")
print("Your name is:", name)
- Input a number (error!)
x = input("Enter a number: ")
print("One half of", x, "=", x/2)  # Error!
- Input a number with int()
x = int(input("Enter a number: "))
print("One half of", x, "=", x/2)
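Note that int() crashes if the user types something that is not a number. One possible way to guard against that, sketched here with a made-up helper named parseNonNegativeInt, is to check the text before converting it. The helper takes a string (such as the result of input()) so that it can be tried out without typing anything.

```python
import string

def parseNonNegativeInt(s):
    # hypothetical helper: returns the int value of s if, after stripping
    # surrounding whitespace, s is made up only of the digits 0-9;
    # otherwise returns None
    s = s.strip()
    if s == "":
        return None
    for c in s:
        if c not in string.digits:
            return None
    return int(s)

print(parseNonNegativeInt("42"))     # 42
print(parseNonNegativeInt(" 7 "))    # 7
print(parseNonNegativeInt("seven"))  # None
print(parseNonNegativeInt("-3"))     # None (the minus sign is not a digit)
```

A program could call parseNonNegativeInt(input("Enter a number: ")) and ask again whenever the result is None.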
- Basic File IO
Files on your computer can store text, music, pictures, or anything else.
Let's just start by reading and writing some basic text files.
# Note: As this requires read-write access to your hard drive,
# this will not run in the browser in Brython.
def readFile(path):
    with open(path, "rt") as f:
        return f.read()
def writeFile(path, contents):
    with open(path, "wt") as f:
        f.write(contents)
contentsToWrite = "This is a test!\nIt is only a test!"
writeFile("foo.txt", contentsToWrite)
contentsRead = readFile("foo.txt")
assert(contentsRead == contentsToWrite)
# "The files are in the computer!"
print("Open the file foo.txt and verify its contents.")
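Once a file's contents are in a string, all of the string tools above apply. As a sketch, the code below repeats readFile and writeFile so it can run on its own, then processes a file line by line with splitlines(). The helper countLinesInFile and the file name quotes.txt are made up for this example, and the file is written to a temporary folder (using Python's tempfile module) so nothing is left behind.

```python
import os
import tempfile

def readFile(path):
    with open(path, "rt") as f:
        return f.read()

def writeFile(path, contents):
    with open(path, "wt") as f:
        f.write(contents)

def countLinesInFile(path):
    # hypothetical helper: number of lines in a text file
    return len(readFile(path).splitlines())

# Write into a temporary folder that is deleted when the with block ends
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "quotes.txt")
    writeFile(path, "first line\nsecond line\nthird line\n")
    lineCount = countLinesInFile(path)
    for line in readFile(path).splitlines():
        print(line.upper())

print(lineCount)  # 3
```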