* Edit Jan 2021: I recently completed a YouTube video covering the topics in this post.
In the last lesson we learned about lists, Python's jack-of-all-trades sequence data type. In this lesson we'll take a look at two more Python sequences: tuples and strings.
Tuples
Tuples are an immutable sequence data type commonly used to hold short collections of related data. For instance, if you wanted to store latitude and longitude coordinates for cities, tuples might be a good choice, because the values are related and not likely to change. Like lists, tuples can store objects of different types.
Construct a tuple with a comma-separated sequence of objects within parentheses:
In [1]:
my_tuple = (1,3,5)
print(my_tuple)
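To make the latitude and longitude use case concrete, here is a minimal sketch. The city and its approximate coordinates are illustrative values chosen for this example, not data from the lesson. It also shows tuple unpacking and the trailing comma needed to build a one-element tuple.

```python
# Hypothetical example: approximate coordinates for Paris (illustrative only)
paris_coords = (48.86, 2.35)

# Unpack the tuple into separate variables
latitude, longitude = paris_coords
print(latitude, longitude)

# A one-element tuple needs a trailing comma; parentheses alone are not enough
single = (5,)
not_a_tuple = (5)
print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int
```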
Alternatively, you can construct a tuple by passing an iterable into the tuple() function:
In [2]:
my_list = [2,3,1,4]
another_tuple = tuple(my_list)
another_tuple
Out[2]:
Tuples generally support the same indexing and slicing operations as lists and they also support some of the same functions, with the caveat that tuples cannot be changed after they are created. This means we can do things like find the length, max or min of a tuple, but we can't append new values to them or remove values from them:
In [3]:
another_tuple[2] # You can index into tuples
Out[3]:
In [4]:
another_tuple[2:4] # You can slice tuples
Out[4]:
In [5]:
# You can use common sequence functions on tuples:
print( len(another_tuple))
print( min(another_tuple))
print( max(another_tuple))
print( sum(another_tuple))
In [6]:
another_tuple.append(1) # You can't append to a tuple
In [7]:
del another_tuple[1] # You can't delete from a tuple
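Both of the cells above raise exceptions rather than modifying the tuple. The sketch below rebuilds the same tuple so it can run on its own, then catches the errors to show which exception type each operation produces. The errors list is a helper introduced only for this example.

```python
another_tuple = (2, 3, 1, 4)

errors = []  # collects the names of the exceptions we hit

try:
    another_tuple.append(1)      # tuples have no append() method
except AttributeError as err:
    errors.append(type(err).__name__)

try:
    del another_tuple[1]         # tuples don't support item deletion
except TypeError as err:
    errors.append(type(err).__name__)

print(errors)          # ['AttributeError', 'TypeError']
print(another_tuple)   # (2, 3, 1, 4), unchanged
```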
You can sort the objects in a tuple using the sorted() function, but doing so creates a new list containing the result rather than sorting the tuple in place, as the list.sort() method does with lists:
In [8]:
sorted(another_tuple)
Out[8]:
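If you need the sorted result as a tuple rather than a list, one option is to pass the output of sorted() back into tuple(). A minimal sketch, using the same values as above:

```python
another_tuple = (2, 3, 1, 4)

sorted_list = sorted(another_tuple)          # sorted() always returns a list
sorted_tuple = tuple(sorted(another_tuple))  # convert the result back to a tuple

print(sorted_list)     # [1, 2, 3, 4]
print(sorted_tuple)    # (1, 2, 3, 4)
print(another_tuple)   # (2, 3, 1, 4), original order is untouched
```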
Although tuples are immutable themselves, they can contain mutable objects like lists. This means that the contents of a shallow copy of a tuple containing a list will change if the nested list changes:
In [9]:
list1 = [1,2,3]
tuple1 = ("Tuples are Immutable", list1)
tuple2 = tuple1[:] # Make a shallow copy
list1.append("But lists are mutable")
print( tuple2 ) # Print the copy
To avoid this behavior, make a deep copy using the copy module:
In [11]:
import copy
list1 = [1,2,3]
tuple1 = ("Tuples are Immutable", list1)
tuple2 = copy.deepcopy(tuple1) # Make a deep copy
list1.append("But lists are mutable")
print( tuple2 ) # Print the copy
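One way to see the difference between the two kinds of copy is to check whether the nested list inside each copy is the same object as the original list. This sketch repeats the setup above so it runs on its own. The names shallow and deep are introduced only for this example.

```python
import copy

list1 = [1, 2, 3]
tuple1 = ("Tuples are Immutable", list1)

shallow = tuple1[:]              # shares the nested list with tuple1
deep = copy.deepcopy(tuple1)     # gets its own copy of the nested list

list1.append("But lists are mutable")

print(shallow[1] is list1)   # True: same list object
print(deep[1] is list1)      # False: independent list
print(deep[1])               # [1, 2, 3]
```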
Strings
We already learned a little bit about strings in the lesson on basic data types, but strings are technically sequences: immutable sequences of text characters. As sequences, they support indexing operations where the first character of a string is index 0. This means we can get individual letters or slices of letters with indexing:
In [12]:
my_string = "Hello world"
In [13]:
my_string[3] # Get the character at index 3
Out[13]:
In [14]:
my_string[3:] # Slice from index 3 to the end
Out[14]:
In [15]:
my_string[::-1] # Reverse the string
Out[15]:
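Strings also accept negative indexes and a step value in slices. A short sketch using the same my_string value; the result variable names are made up for this example:

```python
my_string = "Hello world"

last_char = my_string[-1]      # negative indexes count from the end
last_five = my_string[-5:]     # the last five characters
first_five = my_string[:5]     # the first five characters
every_other = my_string[::2]   # every second character

print(last_char)     # d
print(last_five)     # world
print(first_five)    # Hello
print(every_other)   # Hlowrd
```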
In addition, certain sequence functions like len() and count() work on strings:
In [16]:
len(my_string)
Out[16]:
In [17]:
my_string.count("l") # Count the l's in the string
Out[17]:
Because strings are immutable, you can't change a string in place: every time you transform a string with a function, Python creates a new string object rather than altering the original string in your computer's memory.
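A minimal sketch of what immutability means in practice: assigning to a character position raises a TypeError, while building a new string from slices leaves the original alone. The variable names here are introduced only for this example.

```python
greeting = "Hello world"

try:
    greeting[0] = "J"            # strings don't support item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead of modifying the old one
changed = "J" + greeting[1:]

print(assignment_failed)  # True
print(changed)            # Jello world
print(greeting)           # Hello world (unchanged)
```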
Strings have many associated functions. Some basic string functions include:
In [18]:
# str.lower()
my_string.lower() # Make all characters lowercase
Out[18]:
In [19]:
# str.upper()
my_string.upper() # Make all characters uppercase
Out[19]:
In [20]:
# str.title()
my_string.title() # Make the first letter of each word uppercase
Out[20]:
Find the index of the first occurrence of a substring within a string using str.find(). If the substring does not appear, find() returns -1:
In [21]:
my_string.find("W")
Out[21]:
Notice that since strings are immutable, we never actually changed the original value of my_string with any of the code above, but instead generated new strings that were printed to the console. This means "W" does not exist in my_string even though our call to str.title() produced the output 'Hello World'. The original lowercase "w" still exists at index position 6:
In [22]:
my_string.find("w")
Out[22]:
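str.find() is not the only way to look for a substring. As a sketch of two related built-in tools: the in operator answers whether a substring is present at all, and str.index() behaves like find() but raises a ValueError instead of returning -1.

```python
my_string = "Hello world"

has_lower_w = "w" in my_string        # membership test
has_upper_w = "W" in my_string
position = my_string.index("w")       # same result as find("w") when present

try:
    my_string.index("W")              # substring not present
    index_raised = False
except ValueError:
    index_raised = True

print(has_lower_w)    # True
print(has_upper_w)    # False
print(position)       # 6
print(index_raised)   # True
```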
Find and replace a target substring within a string using str.replace():
In [23]:
my_string.replace("world", # Substring to replace
"friend") # New substring
Out[23]:
Split a string into a list of substrings based on a given separating character with str.split():
In [24]:
my_string.split() # str.split() splits on spaces by default
Out[24]:
In [25]:
my_string.split("l") # Supply a substring to split on other values
Out[25]:
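Two details of str.split() are easy to miss. Called with no argument, it treats any run of whitespace as a single separator, while passing " " explicitly splits on every individual space. The optional maxsplit argument limits the number of splits. The sample string below is made up for this sketch:

```python
# Build "a", two spaces, then "b c" (the doubled space is deliberate)
spaced = "a" + " " * 2 + "b c"

default_split = spaced.split()        # runs of whitespace count as one separator
explicit_split = spaced.split(" ")    # each space is its own separator
limited_split = "Hello world".split("l", maxsplit=1)  # stop after one split

print(default_split)    # ['a', 'b', 'c']
print(explicit_split)   # ['a', '', 'b', 'c']
print(limited_split)    # ['He', 'lo world']
```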
Split a multi-line string into a list of lines using str.splitlines():
In [26]:
multiline_string = """I am
a multiline
string!
"""
multiline_string.splitlines()
Out[26]:
Strip leading and trailing characters from a string with str.strip():
In [27]:
# str.strip() removes whitespace by default
" white space ".strip()
Out[27]:
Override the default by supplying a string containing all characters you'd like to strip as an argument to the function:
In [28]:
"xXxxBuyNOWxxXx".strip("xX")
Out[28]:
You can strip characters from the left or right sides only with str.lstrip() and str.rstrip() respectively:
In [29]:
" white space ".lstrip()
Out[29]:
In [30]:
" white space ".rstrip()
Out[30]:
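Note that the argument to strip(), lstrip() and rstrip() is treated as a set of characters to remove, not as a prefix or suffix to match. A short sketch; the file name is a made-up example:

```python
both = "xXxxBuyNOWxxXx".strip("xX")     # remove from both ends
left = "xXxxBuyNOWxxXx".lstrip("xX")    # remove from the left only
right = "xXxxBuyNOWxxXx".rstrip("xX")   # remove from the right only

# Every trailing character found in ".txt" is removed,
# including the final "t" of "report"
surprise = "report.txt".rstrip(".txt")

print(both)       # BuyNOW
print(left)       # BuyNOWxxXx
print(right)      # xXxxBuyNOW
print(surprise)   # repor
```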
You can join (concatenate) two strings with the plus (+) operator:
In [31]:
"Hello " + "World"
Out[31]:
Convert a list of strings into a single string separated by a given delimiter with str.join():
In [32]:
" ".join(["Hello", "World!", "Join", "Me!"])
Out[32]:
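str.join() expects every item in the iterable to be a string, so numbers need to be converted first. A minimal sketch; the values list is invented for illustration:

```python
values = [2021, 1, 15]

try:
    "-".join(values)               # fails: the items are ints, not strings
    join_failed = False
except TypeError:
    join_failed = True

# Convert each item to a string before joining
date_string = "-".join(str(v) for v in values)

print(join_failed)    # True
print(date_string)    # 2021-1-15
```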
Although the + operator works for string concatenation, things can get messy if you start trying to join more than a couple values together with pluses.
In [33]:
name = "Joe"
age = 10
city = "Paris"
"My name is " + name + " I am " + str(age) + " and I live in " + city
Out[33]:
For complex string operations of this sort, it is preferable to use the str.format() function. str.format() takes a template string with curly braces as placeholders for the values you pass to the function as arguments. The arguments are then filled into the corresponding placeholders in order:
In [34]:
template_string = "My name is {} I am {} and I live in {}"
template_string.format(name, age, city)
Out[34]:
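Placeholders can also be numbered or named, which helps when a value is reused or the template is long. A sketch using the same name, age and city values; the f-string at the end is an alternative syntax that requires Python 3.6 or later:

```python
name = "Joe"
age = 10
city = "Paris"

# Numbered placeholders refer to arguments by position and can repeat
by_position = "{0} is {1}. {0} lives in {2}.".format(name, age, city)

# Named placeholders refer to keyword arguments
by_name = "My name is {name} and I am {age}".format(name=name, age=age)

# f-strings (Python 3.6+) evaluate expressions inside the braces
f_string = f"{name} will be {age + 1} next year"

print(by_position)  # Joe is 10. Joe lives in Paris.
print(by_name)      # My name is Joe and I am 10
print(f_string)     # Joe will be 11 next year
```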
Read more about string formatting here.
Wrap Up
Basic sequences like lists, tuples and strings appear everywhere in Python code, so it is essential to understand the basics of how they work before we can start using Python for data analysis. We're almost ready to dive into data structures designed specifically for data analysis, but before we do, we need to cover two more useful built-in Python data structures: dictionaries and sets.