Friday, October 30, 2015

Python for Data Analysis Part 6: Tuples and Strings




In the last lesson we learned about lists, Python's jack-of-all-trades sequence data type. In this lesson we'll take a look at two more Python sequences: tuples and strings.

Tuples

Tuples are an immutable sequence data type commonly used to hold short collections of related data. For instance, if you wanted to store latitude and longitude coordinates for cities, tuples might be a good choice, because the values are related and not likely to change. Like lists, tuples can store objects of different types.
Construct a tuple with a comma-separated sequence of objects within parentheses:
In [1]:
my_tuple = (1,3,5)

print(my_tuple)
(1, 3, 5)
Alternatively, you can construct a tuple by passing an iterable into the tuple() function:
In [2]:
my_list = [2,3,1,4]

another_tuple = tuple(my_list)

another_tuple
Out[2]:
(2, 3, 1, 4)
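As a small illustration of the coordinates use case mentioned earlier, the sketch below stores a city's name, latitude and longitude in a single tuple. The variable name and the rounded coordinate values are made up for illustration:

```python
# Hypothetical record: a city name with rounded latitude and longitude
paris = ("Paris", 48.86, 2.35)

print(type(paris))    # <class 'tuple'>
print(len(paris))     # 3
print(paris[0])       # Paris
print(paris[1:])      # (48.86, 2.35)
```

Note that the tuple mixes a string with two floats, which is fine because tuples can hold objects of different types.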
Tuples generally support the same indexing and slicing operations as lists and they also support some of the same functions, with the caveat that tuples cannot be changed after they are created. This means we can do things like find the length, max or min of a tuple, but we can't append new values to them or remove values from them:
In [3]:
another_tuple[2]     # You can index into tuples
Out[3]:
1
In [4]:
another_tuple[2:4]   # You can slice tuples
Out[4]:
(1, 4)
In [5]:
# You can use common sequence functions on tuples:

print(len(another_tuple))
print(min(another_tuple))
print(max(another_tuple))
print(sum(another_tuple))
4
1
4
10
In [6]:
another_tuple.append(1)    # You can't append to a tuple
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-26174f458975> in <module>()
----> 1 another_tuple.append(1)    # You can't append to a tuple

AttributeError: 'tuple' object has no attribute 'append'
In [7]:
del another_tuple[1]      # You can't delete from a tuple
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-c547ee9ba53d> in <module>()
----> 1 del another_tuple[1]      # You can't delete from a tuple

TypeError: 'tuple' object doesn't support item deletion
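Assigning a new value to an index fails for the same reason. The sketch below, which uses a hypothetical tuple named point, catches that error and then builds a new tuple instead of changing the old one:

```python
point = (2, 3, 1, 4)

message = ""
try:
    point[0] = 10                  # Item assignment is not allowed on tuples
except TypeError as error:
    message = str(error)
    print(message)

# To "change" a tuple, build a new one from pieces of the old one
new_point = (10,) + point[1:]

print(new_point)                   # (10, 3, 1, 4)
print(point)                       # (2, 3, 1, 4), the original is unchanged
```

The trailing comma in (10,) is what makes it a one-element tuple rather than a number in parentheses.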
You can sort the objects in a tuple using the sorted() function, but doing so creates a new list containing the result rather than sorting the original tuple in place the way the list.sort() method does with lists:
In [8]:
sorted(another_tuple)
Out[8]:
[1, 2, 3, 4]
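If you need the sorted result as a tuple rather than a list, one option is to pass the list back into tuple(). A short sketch, using a hypothetical tuple named values:

```python
values = (2, 3, 1, 4)

sorted_values = tuple(sorted(values))                   # sorted() returns a list; tuple() converts it back
descending = tuple(sorted(values, reverse=True))        # reverse=True sorts from largest to smallest

print(sorted_values)   # (1, 2, 3, 4)
print(descending)      # (4, 3, 2, 1)
print(values)          # (2, 3, 1, 4), the original tuple is untouched
```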
Although tuples are immutable themselves, they can contain mutable objects like lists. This means that the contents of a shallow copy of a tuple containing a list will change if the nested list changes:
In [9]:
list1 = [1,2,3]

tuple1 = ("Tuples are Immutable", list1)

tuple2 = tuple1[:]                       # Make a shallow copy

list1.append("But lists are mutable")

print(tuple2)                            # Print the copy
('Tuples are Immutable', [1, 2, 3, 'But lists are mutable'])
To avoid this behavior, make a deep copy using copy.deepcopy() from the copy module:
In [11]:
import copy

list1 = [1,2,3]

tuple1 = ("Tuples are Immutable", list1)

tuple2 = copy.deepcopy(tuple1)           # Make a deep copy

list1.append("But lists are mutable")

print(tuple2)                            # Print the copy
('Tuples are Immutable', [1, 2, 3])
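One way to see the difference between the two kinds of copy is to check object identity with the is operator. The sketch below uses hypothetical names (inner, original, shallow, deep) and compares the nested list inside each copy with the original list:

```python
import copy

inner = [1, 2, 3]
original = ("label", inner)

shallow = original[:]             # Shallow copy: nested objects are shared
deep = copy.deepcopy(original)    # Deep copy: nested objects are copied too

print(shallow[1] is inner)   # True: the shallow copy points at the same list
print(deep[1] is inner)      # False: the deep copy has its own list
print(deep[1] == inner)      # True: the contents are equal at this point
```

Because the shallow copy shares the nested list, any later change to inner shows up through shallow but not through deep.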

Strings

We already learned a little bit about strings in the lesson on basic data types, but strings are technically sequences: immutable sequences of text characters. As sequences, they support indexing operations where the first character of a string is index 0. This means we can get individual letters or slices of letters with indexing:
In [12]:
my_string = "Hello world"
In [13]:
my_string[3]    # Get the character at index 3
Out[13]:
'l'
In [14]:
my_string[3:]   # Slice from index 3 to the end
Out[14]:
'lo world'
In [15]:
my_string[::-1]  # Reverse the string
Out[15]:
'dlrow olleH'
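A few more slicing patterns work on strings exactly as they do on lists. The sketch below uses a hypothetical variable named greeting to show slicing from the start, negative indexes and a step value:

```python
greeting = "Hello world"

first_word = greeting[:5]      # Everything before index 5
last_word = greeting[-5:]      # The last five characters
last_char = greeting[-1]       # Negative indexes count from the end
every_other = greeting[::2]    # Every second character

print(first_word)     # Hello
print(last_word)      # world
print(last_char)      # d
print(every_other)    # Hlowrd
```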
In addition, the built-in len() function and the str.count() method work on strings:
In [16]:
len(my_string)
Out[16]:
11
In [17]:
my_string.count("l")  # Count the l's in the string
Out[17]:
3
Because strings are immutable, you can't change a string in place: every time you transform a string with a function, Python makes a new string object rather than altering the original string that exists in your computer's memory.
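A minimal sketch of what immutability means in practice, using a hypothetical variable named text: assigning to an index raises an error, and a transformation only "sticks" if you assign its result to a name:

```python
text = "Hello world"

assignment_failed = False
try:
    text[0] = "J"              # Strings do not support item assignment
except TypeError:
    assignment_failed = True

shouted = text.upper()         # Returns a new string
print(shouted)                 # HELLO WORLD
print(text)                    # Hello world, the original is unchanged

text = text.upper()            # Rebind the name to keep the result
print(text)                    # HELLO WORLD
```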
Strings have many associated functions. Some basic string functions include:
In [18]:
# str.lower()     

my_string.lower()   # Make all characters lowercase
Out[18]:
'hello world'
In [19]:
# str.upper()     

my_string.upper()   # Make all characters uppercase
Out[19]:
'HELLO WORLD'
In [20]:
# str.title()

my_string.title()   # Make the first letter of each word uppercase
Out[20]:
'Hello World'
Find the index of the first occurrence of a substring within a string using str.find(). If the substring does not appear, find() returns -1:
In [21]:
my_string.find("W")
Out[21]:
-1
Notice that since strings are immutable, we never actually changed the original value of my_string with any of the code above, but instead generated new strings that were printed to the console. This means "W" does not exist in my_string even though our call to str.title() produced the output 'Hello World'. The original lowercase "w" still exists at index position 6:
In [22]:
my_string.find("w")
Out[22]:
6
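Since -1 is also a valid index (it refers to the last character), it is worth checking the result of find() before using it. The sketch below, with hypothetical variable names, shows that check along with the in operator and a simple case-insensitive search:

```python
text = "Hello world"

index = text.find("w")
if index != -1:                       # Only slice when the substring was found
    found = text[index:]
    print(found)                      # world

missing = text.find("W")              # -1, because "W" is not in the string
print(text[missing:])                 # d (index -1 is the last character, not "not found")

print("W" in text)                    # False
print(text.lower().find("world"))     # 6
```

If you only need to know whether a substring is present, the in operator is usually the clearer choice.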
Find and replace a target substring within a string using str.replace():
In [23]:
my_string.replace("world",    # Substring to replace
                  "friend")   # New substring
Out[23]:
'Hello friend'
Split a string into a list of substrings based on a given separating character with str.split():
In [24]:
my_string.split()     # str.split() splits on whitespace by default
Out[24]:
['Hello', 'world']
In [25]:
my_string.split("l")  # Supply a substring to split on other values
Out[25]:
['He', '', 'o wor', 'd']
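The default behavior and an explicit " " separator are not quite the same thing. The sketch below, using a made-up string with repeated spaces, shows the difference, along with the optional maxsplit argument that limits the number of splits:

```python
spaced = "Hello   big  world"

default_split = spaced.split()        # Runs of whitespace count as one separator
space_split = spaced.split(" ")       # Every single space is a separator
limited = "a,b,c".split(",", 1)       # Split at most once

print(default_split)    # ['Hello', 'big', 'world']
print(space_split)      # ['Hello', '', '', 'big', '', 'world']
print(limited)          # ['a', 'b,c']
```

The empty strings in the second result are the same effect seen above when splitting "Hello world" on "l".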
Split a multi-line string into a list of lines using str.splitlines():
In [26]:
multiline_string = """I am
a multiline 
string!
"""

multiline_string.splitlines()
Out[26]:
['I am', 'a multiline ', 'string!']
Strip leading and trailing characters from a string with str.strip():
In [27]:
# str.strip() removes whitespace by default

"    white space   ".strip() 
Out[27]:
'white space'
Override the default by supplying a string containing all characters you'd like to strip as an argument to the function:
In [28]:
"xXxxBuyNOWxxXx".strip("xX")
Out[28]:
'BuyNOW'
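The argument to strip() is treated as a set of characters, not as a prefix or suffix to match, and characters in the middle of the string are never touched. A short sketch with made-up strings:

```python
inner_kept = "xXxxBuyxNOWxxXx".strip("xX")     # The x inside the text survives
as_a_set = "www.example.com".strip("cmow.")    # Removes any of c, m, o, w and . from both ends

print(inner_kept)    # BuyxNOW
print(as_a_set)      # example
```

In the second call, stripping stops at the first character from each end that is not in the set, which is the "e" on both sides.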
You can strip characters from the left or right sides only with str.lstrip() and str.rstrip() respectively:
In [29]:
"   white space   ".lstrip() 
Out[29]:
'white space   '
In [30]:
"   white space   ".rstrip()
Out[30]:
'   white space'
You can join (concatenate) two strings with the plus (+) operator:
In [31]:
"Hello " + "World"
Out[31]:
'Hello World'
Convert a list of strings into a single string separated by a given delimiter with str.join():
In [32]:
" ".join(["Hello", "World!", "Join", "Me!"])
Out[32]:
'Hello World! Join Me!'
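str.join() expects every item to be a string, so a list of numbers has to be converted first. A minimal sketch, using a hypothetical list named numbers, that also shows join() undoing a split():

```python
numbers = [1, 2, 3]

joined = ", ".join(str(n) for n in numbers)   # Convert each number to a string first
print(joined)                                 # 1, 2, 3

parts = "2015-10-30".split("-")               # ['2015', '10', '30']
rejoined = "-".join(parts)                    # Back to the original string
print(rejoined)                               # 2015-10-30
```

Passing the numbers directly, as in ", ".join(numbers), raises a TypeError.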
Although the + operator works for string concatenation, things can get messy if you start trying to join more than a couple of values together with pluses:
In [33]:
name = "Joe"
age = 10
city = "Paris"

"My name is " + name + " I am " + str(age) + " and I live in " + city
Out[33]:
'My name is Joe I am 10 and I live in Paris'
For complex string operations of this sort, it is preferable to use the str.format() method. str.format() is called on a template string that uses curly braces as placeholders, and the arguments you pass to it are filled into those placeholders in order:
In [34]:
template_string = "My name is {} I am {} and I live in {}"

template_string.format(name, age, city)
Out[34]:
'My name is Joe I am 10 and I live in Paris'
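Placeholders can also be numbered or named, which helps when a value is reused or when the template is long. The sketch below reuses the name, age and city variables from above. The last line uses an f-string, which is not covered in this lesson and assumes Python 3.6 or later:

```python
name = "Joe"
age = 10
city = "Paris"

# Named placeholders are filled from keyword arguments
named = "My name is {name} I am {age} and I live in {city}".format(
    name=name, age=age, city=city)

# Numbered placeholders refer to positional arguments and can repeat
indexed = "{0} lives in {1}. {0} is {2}.".format(name, city, age)

# f-strings (Python 3.6+) read the variables directly from the surrounding code
f_string = f"My name is {name} I am {age} and I live in {city}"

print(named)       # My name is Joe I am 10 and I live in Paris
print(indexed)     # Joe lives in Paris. Joe is 10.
print(f_string)    # My name is Joe I am 10 and I live in Paris
```

Note that none of these approaches requires calling str() on age, unlike concatenation with the + operator.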
Read more about string formatting here.

Wrap Up

Basic sequences like lists, tuples and strings appear everywhere in Python code, so it is essential to understand the basics of how they work before we can start using Python for data analysis. We're almost ready to dive into data structures designed specifically for data analysis, but before we do, we need to cover two more useful built-in Python data structures: dictionaries and sets.
