Monday, October 26, 2015

Python for Data Analysis Part 3: Basic Data Types


* Edit Jan 2021: I recently completed a YouTube video covering the topics in this post.

In the last lesson we learned that Python can act as a powerful calculator, but numbers are just one of many basic data types you'll encounter in data analysis. A solid understanding of basic data types is essential for working with data in Python.

Integers

Integers, or "ints" for short, are whole-number values. Any positive or negative number (or 0) without a decimal point is an integer in Python. Integer values have unlimited precision, meaning an integer is always exact no matter how large it gets. You can check the type of a Python object with the type() function. Let's run type() on an integer:
In [1]:
type(12)
Out[1]:
int
Above we see that the type of 12 is "int". You can also use the function isinstance() to check whether an object is an instance of a given type:
In [2]:
# Check if 12 is an instance of type "int"

isinstance(12, int)
Out[2]:
True
The output True confirms that 12 is an int.
Integers support all the basic math operations we covered last time. Most operations on integers return integers, but division with / always returns a float, because the result is often not a whole number:
In [3]:
1/3  # A third is not a whole number*
Out[3]:
0.3333333333333333
In [4]:
type(1/3)  # So the type of the result is not an int
Out[4]:
float
*Note: In Python 2, dividing two ints with / performs floor division instead of returning a float as we see here in Python 3, so 1/3 would return 0 instead of 0.3333333.
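As a quick sketch of the points above (the variable names are only for illustration): Python integers stay exact even when they get very large, the / operator returns a float even when the answer is a whole number, and the // and % operators give you floor division and the remainder if you want to stay in integers.

```python
# Variable names here are illustrative, not part of the lesson above.
big = 2 ** 100            # ints never overflow or lose precision
print(big)                # 1267650600228229401496703205376

true_div = 4 / 2          # "/" returns a float even when the result is whole
floor_div = 7 // 2        # "//" rounds down to the nearest whole number
remainder = 7 % 2         # "%" gives the remainder of the division

print(true_div, floor_div, remainder)   # 2.0 3 1
```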

Floats

Floating point numbers, or "floats", are numbers with decimal values. Unlike integers, floats don't have unlimited precision. A float is stored in a fixed amount of memory, so values that would need infinitely many digits (such as 1/3), and even many short decimals like 0.1 that have no exact binary representation, can only be approximated. This leads to small rounding errors. The error is so minuscule that it usually isn't a concern, but it can add up when making many repeated calculations.
Every number in Python with a decimal point is a float, even if there are no non-zero digits after the decimal point:
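To make the rounding error concrete, here is a small sketch using the well-known 0.1 + 0.2 example. It also uses math.isclose() from the standard library, which is one reasonable way (not the only one) to compare floats without being tripped up by tiny errors. The variable names are just for illustration.

```python
import math

total = 0.1 + 0.2
print(total)                       # 0.30000000000000004
print(total == 0.3)                # False: the tiny error breaks exact equality
print(math.isclose(total, 0.3))    # True: equal within a small tolerance

# The error can accumulate over repeated calculations
running = 0.0
for _ in range(10):
    running += 0.1
print(running == 1.0)              # False
```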
In [5]:
type(1.0)
Out[5]:
float
In [6]:
isinstance(0.33333, float)
Out[6]:
True
The arithmetic operations we learned last time work on floats as well as ints. If you use both floats and ints in the same math expression, the result is a float:
In [7]:
5 + 1.0
Out[7]:
6.0
You can convert a float to an integer using the int() function, which discards anything after the decimal point:
In [8]:
int(6.0)
Out[8]:
6
You can convert an integer to a float with the float() function:
In [9]:
float(6)
Out[9]:
6.0
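One detail worth seeing in action: int() does not round, it simply drops the decimal part. The short sketch below (variable names are illustrative) contrasts it with the built-in round() function:

```python
truncated_pos = int(6.9)     # the decimal part is dropped, not rounded
truncated_neg = int(-6.9)    # truncation goes toward zero, not downward
rounded = round(6.9)         # round() goes to the nearest whole number

print(truncated_pos)   # 6
print(truncated_neg)   # -6
print(rounded)         # 7
```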
Floats can also take on a few special values: Inf, -Inf and NaN. Inf and -Inf stand for infinity and negative infinity, respectively, and NaN stands for "not a number", which is sometimes used as a placeholder for missing or erroneous numerical values.
In [10]:
type(float("Inf"))
Out[10]:
float
In [11]:
type(float("NaN"))
Out[11]:
float
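These special values behave a little differently from ordinary numbers, which matters when they show up in real data. The sketch below (variable names are illustrative) shows that infinity compares larger than any number, that NaN is not equal to anything, including itself, and that the standard library's math.isnan() and math.isinf() are a safer way to test for them:

```python
import math

pos_inf = float("inf")
nan = float("nan")

print(pos_inf > 10 ** 100)    # True: infinity is larger than any number
print(nan == nan)             # False: NaN is not equal to anything, even itself
print(math.isnan(nan))        # True: use isnan() to test for NaN
print(math.isinf(-pos_inf))   # True: negative infinity counts as infinite too
```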
*Note: Python has a third, less common numeric data type, "complex", which is used to store complex numbers.

Booleans

Booleans, or "bools", are true/false values that result from logical expressions. In Python, boolean values are written with a capital first letter: True and False are recognized as bools, but true and false are not. We've already seen an example of booleans when we used the isinstance() function above.
In [12]:
type(True)
Out[12]:
bool
In [13]:
isinstance(False, bool)  # Check if False is of type bool
Out[13]:
True
You can create boolean values with logical expressions. Python supports all of the standard comparison and logical operators you'd expect:
In [14]:
# Use > and < for greater than and less than:

20 > 10
Out[14]:
True
In [15]:
20 < 5
Out[15]:
False
In [16]:
# Use >= and <= for greater than or equal and less than or equal:

20 >= 20
Out[16]:
True
In [17]:
30 <= 29
Out[17]:
False
In [18]:
# Use == (two equal signs in a row) to check equality:

10 == 10
Out[18]:
True
In [19]:
"cat" == "cat" 
Out[19]:
True
In [20]:
True == False
Out[20]:
False
In [21]:
40 == 40.0  # Equivalent ints and floats are considered equal
Out[21]:
True
In [22]:
# Use != to check inequality (think of != as "not equal to"):

1 != 2
Out[22]:
True
In [23]:
10 != 10
Out[23]:
False
In [24]:
# Use the keyword "not" for negation:

not False
Out[24]:
True
In [25]:
not (2 == 2)
Out[25]:
False
In [26]:
# Use the keyword "and" for logical and:

(2 > 1) and (10 > 9)
Out[26]:
True
In [27]:
False and True
Out[27]:
False
In [28]:
# Use the keyword "or" for logical or:

(2 > 3) or (10 > 9)
Out[28]:
True
In [29]:
False or True
Out[29]:
True
Similar to math expressions, logical expressions have a fixed order of operations. In a logical expression, comparisons like >, < and == are evaluated first, followed by "not", then "and" and finally "or". The Python documentation on operator precedence covers this in more detail. Use parentheses to enforce the desired order of operations.
In [30]:
2 > 1 or 10 < 8 and not True
Out[30]:
True
In [31]:
((2 > 1) or (10 < 8)) and not True
Out[31]:
False
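If the order of operations is hard to keep in your head, it can help to write out the grouping Python applies by default. The sketch below (variable names are just for illustration) adds explicit parentheses to the first expression above and confirms that the result is unchanged:

```python
# Without parentheses, Python applies: comparisons, then not, then and, then or
implicit = 2 > 1 or 10 < 8 and not True

# The same expression with the default grouping written out
explicit = (2 > 1) or ((10 < 8) and (not True))

print(implicit, explicit)      # True True
print(implicit == explicit)    # True
```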
You can convert numbers into boolean values using the bool() function. All numbers other than 0 convert to True:
In [32]:
bool(1)
Out[32]:
True
In [33]:
bool(-12.5)
Out[33]:
True
In [34]:
bool(0)
Out[34]:
False
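A few edge cases are worth a quick look. In the sketch below (variable names are illustrative), zero converts to False whether it is an int or a float, while tiny non-zero values and even NaN convert to True. The conversion also works in the other direction, with True and False becoming 1 and 0:

```python
# Zero is the only number that converts to False, whether int or float
print(bool(0.0))             # False
print(bool(0.0001))          # True
print(bool(float("nan")))    # True: NaN is not zero

# Going the other way, True and False convert to 1 and 0
true_as_int = int(True)
false_as_int = int(False)
print(true_as_int, false_as_int)   # 1 0
```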

Strings

Text data in Python is known as a string or str. Surround text with single or double quotation marks to create a string:
In [35]:
type("cat")
Out[35]:
str
In [36]:
type('1')
Out[36]:
str
In [37]:
isinstance("hello!", str)
Out[37]:
True
You can define a multi-line string using triple quotes:
In [38]:
print( """This string spans
multiple lines """ )
This string spans
multiple lines 
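Behind the scenes, a triple-quoted string is an ordinary string that contains a newline character, written "\n", wherever you pressed Enter. A small sketch, with variable names chosen only for illustration:

```python
multi = """This string spans
multiple lines"""

# The same text written on one line with an explicit newline character
single = "This string spans\nmultiple lines"

print(multi == single)            # True
print(len(multi.splitlines()))    # 2
```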
You can convert numbers from their integer or float representation to a string representation and vice versa using the int(), float() and str() functions:
In [39]:
str(1)          # Convert an int to a string
Out[39]:
'1'
In [40]:
str(3.333)      # Convert a float to a string
Out[40]:
'3.333'
In [41]:
int('1')        # Convert a string to an int
Out[41]:
1
In [42]:
float('3.333')  # Convert a string to a float
Out[42]:
3.333
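Conversions from strings only work when the text looks like the type you are asking for. For instance, int() raises a ValueError when given text with a decimal point. The sketch below shows one possible way to handle that, along with a reminder that numbers stored as strings don't add like numbers. The variable names and the try/except approach are illustrative choices, not the only option:

```python
text = "3.333"

try:
    whole = int(text)            # int() rejects text that isn't a whole number
except ValueError:
    whole = int(float(text))     # go through float first, then drop the decimals

print(whole)                     # 3

# Strings that hold digits join together; convert them first to do math
print(str(1) + str(2))           # 12
print(int("1") + int("2"))       # 3
```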
Two quotation marks right next to each other (such as '' or "") without anything in between them are known as the empty string. The empty string often represents a missing text value.
Numeric data and logical data are generally well-behaved, but strings of text data can be very messy and difficult to work with. Cleaning text data is often one of the most laborious steps in preparing real data sets for analysis. We will revisit strings and functions to help you clean text data in a future lesson.
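When checking for missing text, keep in mind that the empty string has a length of zero and is not the same thing as a string containing a space. A brief sketch, with illustrative variable names:

```python
empty = ""

print(len(empty))        # 0: the empty string contains no characters
print(empty == '')       # True: single and double quotes give the same string
print(empty == " ")      # False: a space is a character
print(len(" "))          # 1
```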

None

In Python, None is a special value, the only value of the type NoneType, that is often used to represent a missing value. For example, if you define a function that doesn't return anything (does not give you back some resulting value), it will return None by default.
In [43]:
type(None)  
Out[43]:
NoneType
In [44]:
# Define a function that prints the input but returns nothing*

def my_function(x):
    print(x)
    
my_function("hello") == None  # The output of my_function equals None
hello
Out[44]:
True
*Note: We will cover defining custom functions in detail in a future lesson.
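A common idiom for checking whether a value is None is the "is" operator, since there is only ever one None object. The sketch below uses a hypothetical function, print_only, that behaves like the one above. It also shows that None is not equal to other "empty-looking" values such as 0, the empty string or False:

```python
def print_only(x):       # hypothetical function that prints but returns nothing
    print(x)

result = print_only("hello")   # prints: hello

print(result is None)    # True: the usual way to check for None

# None is its own value; it does not equal 0, "" or False
print(result == 0, result == "", result == False)   # False False False
```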

Wrap Up

This lesson covered the most common basic data types in Python, but it is not an exhaustive list of Python's data types or the functions that work with them. The official Python documentation has a more thorough summary of built-in types, but it is more verbose and detailed than necessary when you are first getting started with the language.
Now that we know about the basic data types, it would be nice to know how to save values to use them later. We'll cover that in the next lesson.
