Thursday, July 16, 2015

Introduction to R Part 3: Atomic Data Types



In Part 2 we saw that R can function as a powerful calculator but numbers are just one of several atomic or basic data types in R. Learning R's basic data types and how to convert between them is an essential skill for data analysis.

Atomic Type #1: Double

Numbers themselves come in three different flavors. Decimal numbers (real numbers) in R are known as doubles. Doubles are the default numeric data type, so when you type a number such as 1 or -10.5 into R, you are working with a double.
You can check an object's type using the typeof() function:
In [1]:
typeof(1)      # Numbers are type double by default

typeof(-10.5)
Out[1]:
"double"
Out[1]:
"double"
In [2]:
# Doubles can take on the values Inf and -Inf, indicating infinity and -infinity

typeof(Inf)

typeof(-Inf)
Out[2]:
"double"
Out[2]:
"double"

Atomic Type #2: Integer

Integers are a second numeric data type that only takes whole-number values. Sometimes data you load into R will be integers instead of doubles. If you want to create an integer yourself, you can convert a double with the as.integer() function or append L to a whole number, as in 1L. (In practice, however, there is little reason to convert doubles to integers.)
In [3]:
as.integer(1)            # Convert the double 1 into the integer 1

typeof(as.integer(1))    # Confirm that it is of type integer

# Converting a double to an integer drops the decimal part (it does not round)
as.integer(1.6)
Out[3]:
1
Out[3]:
"integer"
Out[3]:
1
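As a small sketch of two details that are easy to miss (the specific numbers are only illustrative): the L suffix creates an integer directly, and as.integer() truncates toward zero instead of rounding, which matters for negative values.

```r
typeof(1L)          # "integer": the L suffix creates an integer directly
typeof(1)           # "double": without the suffix, the number is a double
as.integer(-1.6)    # -1: truncated toward zero, not rounded down to -2
round(1.6)          # 2: use round() when you want the nearest whole number
typeof(round(1.6))  # "double": round() does not change the type
```

If you need a rounded integer, one option is to combine the two, as in as.integer(round(1.6)).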
You can convert an integer to a double using the as.numeric() function:
In [4]:
typeof(as.numeric(as.integer(1)))
Out[4]:
"double"
*Note: the previous line of code may look confusing because it nests three function calls. As with math expressions, when you nest functions, the innermost function executes first. In this case, we first create the integer 1 using the as.integer() function, then we convert it to a double using the as.numeric() function, and finally we check its type with the typeof() function.
If you happen to encounter data that consists of both doubles and integers, you can still perform math operations as normal. R knows that doubles and integers are both numeric data types, so it converts integers to doubles as needed to perform mathematical operations:
In [5]:
10.1 + as.integer(20)
Out[5]:
30.1
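To see which type comes out of a mixed calculation, you can wrap the expression in typeof(). The following is a minimal sketch with arbitrary numbers; the result is a double whenever a double is involved, and division with / returns a double even when both inputs are integers.

```r
typeof(10.1 + as.integer(20))           # "double": the integer is promoted
typeof(as.integer(2) + as.integer(3))   # "integer": both operands are integers
as.integer(7) / as.integer(2)           # 3.5
typeof(as.integer(7) / as.integer(2))   # "double": / always returns a double
```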

Atomic Type #3: Complex

As a math-focused language, R also supports complex numbers as a third numeric basic data type. Complex numbers are numbers with both a real and imaginary component, which are frequently encountered in fields like engineering, physics and signal processing. Complex numbers are expressed in the form: 2 + 3i where 2 is the real part, and 3i is the imaginary part.
In [6]:
typeof(2 + 3i)  

typeof(3i)
Out[6]:
"complex"
Out[6]:
"complex"
In the realm of complex numbers, i is defined as the square root of -1. If you try to take the square root of a negative real number, the result is NaN or not a number:
In [7]:
sqrt(-1)
Warning message:
In sqrt(-1): NaNs produced
Out[7]:
NaN
You can, however, take the square root of the complex version of -1:
In [8]:
sqrt(-1 + 0i)
Out[8]:
[1] 0+1i
In [9]:
as.complex(10)  # Convert doubles to complex with as.complex()
Out[9]:
[1] 10+0i
In [10]:
# Converting a complex number to a double drops the imaginary part
as.numeric(10+1i)
Warning message:
In eval(expr, envir, enclos): imaginary parts discarded in coercion
Out[10]:
10
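If you want the individual pieces of a complex number without triggering the coercion warning, base R provides functions for that. A short sketch, using example values chosen only for illustration:

```r
Re(2 + 3i)     # 2: the real part
Im(2 + 3i)     # 3: the imaginary part (returned as a double)
Mod(3 + 4i)    # 5: the modulus, or distance from zero
Conj(2 + 3i)   # 2-3i: the complex conjugate
1i * 1i        # -1+0i: i times i is -1
```

Using Re() instead of as.numeric() makes it explicit that you intend to keep only the real part.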

Atomic Type #4: Logical

Our first non-numeric data type is the logical. A logical takes on the value TRUE or FALSE. You must type TRUE and FALSE in all capital letters for R to recognize them as logical values. Data that takes only the values true or false is also called Boolean data.
In [11]:
typeof(TRUE)

typeof(FALSE)
Out[11]:
"logical"
Out[11]:
"logical"
You can create logical values with logical comparisons. R supports a variety of standard logic operators:
In [12]:
# Use >  and  < for greater than and less than:

20>10 

20<5
Out[12]:
TRUE
Out[12]:
FALSE
In [13]:
# Use >= and  <= for greater than or equal and less than or equal:

20>=20

30<=29
Out[13]:
TRUE
Out[13]:
FALSE
In [14]:
# Use == (two equal signs in a row) to check equality:

10 == 10

40 == as.integer(40)

"cat" == "cat"

TRUE == FALSE
Out[14]:
TRUE
Out[14]:
TRUE
Out[14]:
TRUE
Out[14]:
FALSE
In [15]:
# Use ! for negation. (think of ! as "not")

!FALSE

!(2==2)
Out[15]:
TRUE
Out[15]:
FALSE
In [16]:
# Use != to check inequality. (think of != as "not equal to")

1 != 2

10 != 10
Out[16]:
TRUE
Out[16]:
FALSE
In [17]:
# Use & for logical and:

(2 > 1) & (10 > 9)

!FALSE & !TRUE
Out[17]:
TRUE
Out[17]:
FALSE
In [18]:
# Use | for logical or:

(2 > 3) | (10 > 9)

!FALSE | !TRUE
Out[18]:
TRUE
Out[18]:
TRUE
Similar to math expressions, logical operations have a fixed order of operations. Comparisons (==, !=, <, >, <= and >=) are evaluated first, followed by !, then & and finally |.
In [19]:
2 > 1 | 10 < 8 & !TRUE

# Use parentheses to enforce the order of operations you desire
((2 > 1) | (10 < 8)) & !TRUE
Out[19]:
TRUE
Out[19]:
FALSE
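The grouping R applies can be checked directly. In this sketch (the values are arbitrary), comparisons bind more tightly than !, which binds more tightly than &, which in turn binds more tightly than |:

```r
!1 == 2                   # TRUE: parsed as !(1 == 2)
(!1) == 2                 # FALSE: !1 is FALSE, and FALSE == 2 is FALSE
TRUE | FALSE & FALSE      # TRUE: parsed as TRUE | (FALSE & FALSE)
(TRUE | FALSE) & FALSE    # FALSE: parentheses force the | to run first
```

When an expression mixes several operators, adding parentheses costs nothing and removes any doubt about the order.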
Use the xor() function for the exclusive or logical operation. Exclusive or returns TRUE if exactly one of the two arguments is TRUE; it returns FALSE if both arguments are TRUE or both are FALSE:
In [20]:
xor(TRUE,TRUE)

xor(TRUE,FALSE)
Out[20]:
FALSE
Out[20]:
TRUE
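To round out the picture, here is a sketch of the remaining input combinations, along with the same logic written using only |, & and !. It reads as "one or the other, but not both":

```r
xor(FALSE, FALSE)                   # FALSE
xor(FALSE, TRUE)                    # TRUE

(TRUE | FALSE) & !(TRUE & FALSE)    # TRUE, same as xor(TRUE, FALSE)
(TRUE | TRUE) & !(TRUE & TRUE)      # FALSE, same as xor(TRUE, TRUE)
```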
You can convert numeric values to logical using the as.logical() function. Zero converts to FALSE, and every nonzero number, positive or negative, converts to TRUE.
In [21]:
as.logical(1)

as.logical(0)

as.logical(-10)
Out[21]:
TRUE
Out[21]:
FALSE
Out[21]:
TRUE
You can convert logical values to numeric using the as.numeric() function. FALSE translates to 0, while TRUE translates to 1:
In [22]:
as.numeric(TRUE)

as.numeric(FALSE)
Out[22]:
1
Out[22]:
0
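R also performs this conversion automatically when logical values appear in arithmetic, which gives a convenient way to count how many conditions hold. A brief sketch with made-up comparisons:

```r
TRUE + TRUE                    # 2: each TRUE counts as 1
TRUE + FALSE                   # 1
(5 > 3) + (2 > 3) + (4 > 1)    # 2: two of the three comparisons are TRUE
typeof(TRUE + TRUE)            # "integer": logicals become integers in arithmetic
```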
Logical operations are very useful for filtering data as we'll see in coming lessons.

Atomic Type #5: Character

Strings of text in R are known as characters. Surround text with single or double quotation marks to create a character:
In [23]:
typeof("cat")

typeof("1")

typeof('hello!')
Out[23]:
"character"
Out[23]:
"character"
Out[23]:
"character"
Convert numbers that are in character form to numeric using the as.numeric() function:
In [24]:
as.numeric("12")

typeof(as.numeric("12"))
Out[24]:
12
Out[24]:
"double"
Convert numbers to character using the as.character() function:
In [25]:
as.character(12)

typeof(as.character(12))
Out[25]:
"12"
Out[25]:
"character"
While numeric data and logical data are generally well-behaved, strings of text data can be very messy and difficult to work with. Cleaning text data is often one of the most laborious steps in preparing real data sets for analysis. We will revisit character data and functions to help you clean it in a future lesson.
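As one small illustration of that messiness, as.numeric() only understands text that looks exactly like a number. When it cannot make sense of the text, it returns NA (R's marker for a missing value) and prints a warning. The inputs below are made-up examples, not data from this lesson:

```r
as.numeric("3.14")     # 3.14
as.numeric("cat")      # NA, with a warning: NAs introduced by coercion
as.numeric("1,200")    # NA: the comma is not treated as part of a number
as.numeric("$12")      # NA: neither is the dollar sign
```

Stray symbols like these usually need to be stripped out before text can be converted to numbers.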
There is a sixth, rarely used data type known as "raw", intended to hold raw bytes. You'll very rarely, if ever, encounter data in the raw format, so it doesn't warrant more discussion.
Now that we know about all the basic data types, it would be nice to know how to save values to use them later in calculations and functions. We'll learn that next time.
