In R, a matrix is a two-dimensional data structure that holds atomic elements of the same type. Just like a vector, a matrix can hold numeric values, logical values, characters or other atomic data types. A matrix is arranged in rows and columns and can have any number of each, including a single row or a single column. When talking about matrices, the letter m conventionally refers to the number of rows and n refers to the number of columns, so an m x n matrix has m rows and n columns.
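A quick way to check the m x n shape of a matrix is with dim(), nrow() and ncol(). The sketch below also shows that a matrix with just one row is still a matrix; the variable names M and row_matrix are only illustrative.

```r
# A 2 x 3 matrix: m = 2 rows, n = 3 columns
M <- matrix(1:6, nrow = 2, ncol = 3)
dim(M)   # 2 3 (rows first, then columns)
nrow(M)  # 2
ncol(M)  # 3

# A matrix may also have a single row (or a single column)
row_matrix <- matrix(1:3, nrow = 1)
is.matrix(row_matrix)  # TRUE
dim(row_matrix)        # 1 3
```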
Matrix Construction
To construct a matrix in R, use the matrix() function, which takes several arguments:
In [1]:
X <- matrix(data = c(1,2,3,4,5,6), # a vector used to construct the matrix
            nrow = 2,              # number of rows in the matrix
            ncol = 3,              # number of columns in the matrix
            byrow = FALSE)         # FALSE (the default) fills column by column; TRUE fills row by row
print(X)
By convention, matrices are often assigned to variables with capitalized names to distinguish them from vectors.
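To see what the byrow argument changes, here is a small sketch that fills the same six values both ways. The names by_col and by_row are just for illustration.

```r
# Same data, filled two different ways
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # byrow = FALSE (default)
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```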
You can add to a matrix you've already constructed with the rbind() and cbind() functions. rbind() takes a sequence of vectors, matrices or data frames (we'll cover those later) as arguments and combines them by rows, while cbind() combines its arguments by columns. Both functions return a new object rather than modifying the original, so assign the result back to keep it:
In [2]:
X <- rbind(X, c(7,8,9) ) # Adds the vector c(7,8,9) as a new row
print(X)
In [3]:
Y <- matrix( seq(10,15,1), 3, 2) # Create a new 3x2 matrix
print(Y)
X <- cbind(X, Y) # Add the new matrix to X by column
print(X)
*Note: the matrices passed to rbind() must have the same number of columns, and the matrices passed to cbind() must have the same number of rows.
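Here's a minimal sketch of what happens when that rule is broken, using two illustrative matrices m23 (2 x 3) and m32 (3 x 2). tryCatch() is used only so the error message can be inspected instead of stopping the script.

```r
m23 <- matrix(1:6, nrow = 2)   # 2 x 3
m32 <- matrix(1:6, nrow = 3)   # 3 x 2

# cbind() needs matching row counts (2 vs 3), so R signals an error;
# tryCatch() captures the error message instead of halting
msg <- tryCatch(cbind(m23, m32), error = function(e) conditionMessage(e))
print(msg)

# Transposing m32 gives it 2 rows, so now the row counts match
combined <- cbind(m23, t(m32))
dim(combined)  # 2 6
```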
Since cbind() and rbind() work on vectors, you can use them to construct matrices from vectors row by row or column by column:
In [4]:
Z <- rbind(c(1,2,3), c(4,5,6), c(7,8,9))
print(Z)
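For comparison, here is a short sketch that passes the same three vectors to cbind() instead. Each vector becomes a column rather than a row, so the result is the transpose of the rbind() version. The names Z_rows and Z_cols are illustrative.

```r
Z_rows <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))  # vectors become rows
Z_cols <- cbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))  # vectors become columns

print(Z_cols)

# One result is the transpose of the other
all(Z_cols == t(Z_rows))  # TRUE
```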
You can turn a matrix's rows into columns and columns into rows using the transpose function t():
In [5]:
X <- t(X)
print(X)
Transposing flips a matrix over its main diagonal: the element in row i, column j moves to row j, column i.
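A small sketch of those properties, using an illustrative 2 x 3 matrix M: the dimensions swap, element [i, j] lands at [j, i], and transposing twice returns the original values.

```r
M  <- matrix(1:6, nrow = 2)  # 2 x 3
Mt <- t(M)                   # 3 x 2

dim(M)   # 2 3
dim(Mt)  # 3 2

# Element [1, 3] of M ends up at [3, 1] of t(M)
M[1, 3] == Mt[3, 1]  # TRUE

# Transposing twice gives back the original matrix
all(t(Mt) == M)      # TRUE
```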
You can also flatten a matrix into a vector with the c() function, which drops the matrix's dimensions and reads the values column by column:
In [6]:
c(X) # Convert a matrix into a vector by column
# If you want to convert a matrix to vector by row, take the transpose first:
c(t(X))
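Since a matrix is stored as a vector plus its dimensions, flattening and rebuilding are inverses of each other. This sketch (with an illustrative matrix M) also shows as.vector(), which gives the same column-by-column result as c().

```r
M <- matrix(1:6, nrow = 2)   # columns: (1, 2), (3, 4), (5, 6)

c(M)           # 1 2 3 4 5 6  (column by column)
c(t(M))        # 1 3 5 2 4 6  (row by row)
as.vector(M)   # same result as c(M)

# Refilling the vector with the original number of rows rebuilds the matrix
v <- c(M)
rebuilt <- matrix(v, nrow = 2)
all(rebuilt == M)  # TRUE
```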
Matrix Indexing
Similar to vectors, you can access the elements inside a matrix with indexing. You may have noticed that when we printed matrices above, the rows and columns were labeled with values in square brackets. Those labels are the index values of the rows and columns. Since matrices have two dimensions, they have two indexes, a row index and a column index, which are separated by a comma. You can use indices to grab specific values, rows or columns in a matrix:
In [7]:
X[3,2] # Get the value at row 3 column 2
# Leave the row or column index blank to take the entire row or column:
X[3, ] # Get row 3
X[ ,2] # Get column 2
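Note that taking a single row or column this way returns a plain vector rather than a one-row or one-column matrix. The drop argument of the [ operator controls this; the sketch below uses a small illustrative matrix M.

```r
M <- matrix(1:6, nrow = 2)

row_vec <- M[2, ]                 # a plain vector: 2 4 6
row_mat <- M[2, , drop = FALSE]   # still a 1 x 3 matrix

is.matrix(row_vec)  # FALSE
is.matrix(row_mat)  # TRUE
dim(row_mat)        # 1 3
```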
You can also take slices of rows and columns, just like you can with vectors:
In [8]:
print( X[4:5, 2:3] ) # Get data points in rows 4 and 5 and columns 2 and 3
The vector indexing techniques from the previous lesson (indexing with vectors, negative indices and logical indices) also work for matrices:
In [9]:
print( X[c(1,3,5), c(1,3)] ) # Vector indexing
In [10]:
print( X[ -2, -2] ) # Remove row 2 and remove column 2
In [11]:
X_logical <- (X %% 2 == 0) # Create a logical matrix identifying even numbers
print(X_logical)
In [12]:
X[X_logical] # Use the logical matrix as an index to get even values in X
In [13]:
X[X %in% c(2,6,8,9,10,15,100)] # Get matrix values contained in a vector
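Logical indexing returns the matching values as a vector in column-by-column order, so their positions are lost. A sketch, assuming you want those positions back: which() with arr.ind = TRUE reports the row and column of each TRUE cell. A logical index can also appear on the left side of an assignment. The names M, is_even, pos and M2 are illustrative.

```r
M <- matrix(1:6, nrow = 2)   # rows: 1 3 5 / 2 4 6
is_even <- M %% 2 == 0

M[is_even]                          # 2 4 6, returned as a plain vector

# Row/column positions of the TRUE cells (columns named "row" and "col")
pos <- which(is_even, arr.ind = TRUE)
print(pos)

# Replace every even value with 0 in a copy of M
M2 <- M
M2[is_even] <- 0L
print(M2)
```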
A matrix can also have named dimensions. You can assign dimension names when creating a matrix by passing a list of two vectors to the dimnames argument: the first vector holds the row names and the second holds the column names:
In [14]:
Z <- matrix(c(1,2,3,4), 2, 2, dimnames = list( c("r1","r2"), c("c1","c2")) )
print(Z)
You can also create or reassign dimnames after creating a matrix:
In [15]:
dimnames(Z) <- list( c("first_r","second_r"), c("first_c","second_c"))
print(Z)
When dimensions are named, you can index the matrix using the dimension names or the normal numeric indexes:
In [16]:
Z[2,2] # Get the value at 2,2 with numeric indexing
Z["second_r","second_c"] # Get the value at 2,2 with named dimensions
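Row and column names can also be read individually with rownames() and colnames(), and names work for selecting whole rows or columns too. A short sketch with an illustrative named matrix N:

```r
N <- matrix(c(1, 2, 3, 4), 2, 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

rownames(N)   # "r1" "r2"
colnames(N)   # "c1" "c2"

# Names select whole rows/columns and can be mixed with numbers
N["r1", ]     # named vector: c1 = 1, c2 = 3
N[, "c2"]     # named vector: r1 = 3, r2 = 4
N["r2", 1]    # 2
```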
Matrix Operations
Matrices in R offer many of the same conveniences as vectors. For instance, you can perform element-wise math operations on matrices of the same dimensions using the standard arithmetic operators:
In [17]:
X <- Y <- matrix(c(1,2,-1,1,1,2,1,2,3),3,3) # Make two identical matrices
print(X)
print(Y)
# You can use the dim() function to check matrix dimensions:
dim(X)
dim(Y)
In [18]:
print(X + Y) # Element-wise addition
In [19]:
print(X * Y) # Element-wise multiplication
In [20]:
print(X / Y) # Element-wise division
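Arithmetic between a matrix and a single number, or a shorter vector, is element-wise too: R recycles the shorter operand down the columns. This sketch uses an illustrative matrix M; with a vector of length 2 and two rows, each row gets its own offset.

```r
M <- matrix(1:6, nrow = 2)   # rows: 1 3 5 / 2 4 6

M * 10                       # every element multiplied by 10

# c(100, 200) is recycled down each column:
# row 1 gets +100, row 2 gets +200
res <- M + c(100, 200)
print(res)                   # rows: 101 103 105 / 202 204 206
```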
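For true matrix multiplication (rows times columns), use the %*% operator. The number of columns in the first matrix must equal the number of rows in the second: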
In [21]:
print( X %*% Y )
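Here is a small sketch of the dimension rule with two illustrative non-square matrices: a 2 x 3 matrix times a 3 x 1 matrix gives a 2 x 1 result. It also shows that multiplying by the identity matrix from diag(3) leaves a matrix unchanged.

```r
P <- matrix(1:6, nrow = 2)   # 2 x 3, rows: 1 3 5 / 2 4 6
Q <- matrix(1:3, nrow = 3)   # 3 x 1

# Inner dimensions (3 and 3) match, so the product is 2 x 1
PQ <- P %*% Q
print(PQ)                    # 22 and 28

# Multiplying by the 3 x 3 identity matrix leaves P unchanged
all(P %*% diag(3) == P)      # TRUE
```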
Here are a few other useful matrix operations:
In [22]:
diag(X) # Get elements on the main diagonal of a matrix
In [23]:
solve(X) # Get the inverse of a square, invertible (non-singular) matrix
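A quick sanity check on the inverse, plus solve()'s other common use: given a second argument b, solve(a, b) solves the linear system a %*% x = b. The vector b below is chosen so that each entry equals the corresponding row sum of X, which means the solution should be c(1, 1, 1).

```r
X <- matrix(c(1, 2, -1, 1, 1, 2, 1, 2, 3), 3, 3)  # same X as above
X_inv <- solve(X)

# A matrix times its inverse gives the identity (up to rounding error)
all.equal(X %*% X_inv, diag(3))   # TRUE

# Solve X %*% x = b for x
b <- c(3, 5, 4)
x <- solve(X, b)
print(x)
all.equal(as.vector(X %*% x), b)  # TRUE
```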
In [24]:
eigen(X) # Get the eigenvalues and eigenvectors of a square matrix
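eigen() returns a list with values and vectors components, with one eigenvector per column. This sketch uses a small symmetric matrix S (an illustrative choice, not the X above) so the eigenvalues are guaranteed to be real; for a general matrix, eigen() may return complex values. It checks the defining property S v = lambda v for the first eigenpair.

```r
S <- matrix(c(2, 1, 1, 2), nrow = 2)   # a small symmetric matrix
e <- eigen(S)

e$values    # 3 1 (returned in decreasing order)
e$vectors   # one eigenvector per column

# Check S %*% v = lambda * v for the first eigenpair
v1 <- e$vectors[, 1]
all.equal(as.vector(S %*% v1), e$values[1] * v1)  # TRUE
```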
In [25]:
rowSums(X) # Get the sums of the rows
colSums(X) # Get the sums of the columns
rowMeans(X) # Get the means of the rows
colMeans(X) # Get the means of the columns
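The same row and column summaries can be written with apply(), where the second argument is 1 for rows and 2 for columns. rowSums() and friends are faster, but apply() works with any function, for example max(), which has no row-wise shortcut in base R.

```r
X <- matrix(c(1, 2, -1, 1, 1, 2, 1, 2, 3), 3, 3)  # same X as above

apply(X, 1, sum)    # same as rowSums(X)
apply(X, 2, mean)   # same as colMeans(X)

# Works with any function, e.g. the largest value in each row
apply(X, 1, max)    # 1 2 3
```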
In [26]:
sum(X) # Sum all the values in X
In [27]:
min(X) # Get the min value in X
In [28]:
max(X) # Get the max value in X
In [29]:
mean(X) # Get the mean of all the values in X
*Note: sum(), min(), max() and mean() also work on vectors.
Whenever you need a particular matrix operation, there's a good chance R already has a built-in function for it, or that one is available in a package. Google is your friend.
R also contains an array data structure that stores elements of the same atomic type in an arbitrary number of dimensions. An array is just a vector stored with an extra attribute "dim" that specifies its dimensions:
In [30]:
A <- array(1:64, dim = c(4,4,4)) # Create a 4 x 4 x 4 array
dim(A) # Check the dimensions of the array
print(A)
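Arrays support the same kinds of indexing and vectorized operations as vectors and matrices, just with a different number of dimensions. In fact, a matrix is simply an array with two dimensions. Arrays with more than two dimensions are uncommon in everyday data analysis, so we won't study them any further.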
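A brief sketch of array indexing (one index per dimension), showing that a two-dimensional slice of an array is a matrix and that attaching a dim attribute to a plain vector turns it into an array. The names slice and v are illustrative.

```r
A <- array(1:64, dim = c(4, 4, 4))

A[1, 2, 3]          # 37: row 1, column 2, layer 3
slice <- A[, , 1]   # the first 4 x 4 layer
is.matrix(slice)    # TRUE

# A matrix is itself an array with exactly two dimensions
is.array(matrix(1:4, 2, 2))  # TRUE

# An array is a vector with a "dim" attribute attached
v <- 1:8
dim(v) <- c(2, 2, 2)
is.array(v)         # TRUE
```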
So far, all the data structures we've covered hold values of a single atomic type. Real-world data comes in all shapes and forms, so these structures alone aren't sufficient for most real data sets. Next time we'll learn about our first heterogeneous data structure: lists.