In this article, we will study the various available data structures in R and when they are used.
Data structures are collections of elements stored together under a single name. A collection may contain elements of the same type or of different types. Each data structure has its own set of attributes, such as length or dimensions, which can be inspected with built-in functions.
The major data structures in R are as follows:
Vectors in R
A vector is a homogeneous collection of elements stored together sequentially. Vectors are also known as one-dimensional arrays. Each vector has a length attribute, which specifies the number of elements it contains. An empty vector has a length of 0. A vector can hold elements of any basic data type, such as integer, logical, or character (string), but all elements of a given vector share the same type; if values of different types are combined, R coerces them to a common type.
# declaring vectors
vec1 <- c(2, 5, -1, 8, 10)
vec2 <- c("Hi", "There")
# printing the contents of vec1
cat("Vec1", vec1, "\n")
# printing the length of vec1
cat("length of vec1", length(vec1), "\n")
The code produces the following output:
Vec1 2 5 -1 8 10
length of vec1 5
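The points above about homogeneity, length, and element access can be illustrated with a short sketch. The variable names `mixed` and `empty` are chosen here for illustration only and are not part of the original example.

```r
# Combining values of different types coerces them to one common type
mixed <- c(1, "two", TRUE)
class(mixed)     # "character"
mixed            # "1" "two" "TRUE"

# An empty numeric vector has length 0
empty <- numeric(0)
length(empty)    # 0

# Elements are accessed by 1-based position
vec1 <- c(2, 5, -1, 8, 10)
vec1[3]          # -1
```

Because a vector holds a single type, the number 1 and the logical TRUE are both converted to strings once a character value is present.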
Lists in R
A list is a heterogeneous collection of elements stored together. A list may contain matrices, vectors, other lists, or single elements. Lists are also known as generic vectors, because their elements can be of different data types. The list() function is used to create a list in R.
# declaring a list
list_obj <- list("a", c(1:4), 4+3i, list(TRUE, -10))
# printing the contents of the list
print("list Contents")
print(list_obj)
The code produces the following output:
[1] "list Contents"
[[1]]
[1] "a"

[[2]]
[1] 1 2 3 4

[[3]]
[1] 4+3i

[[4]]
[[4]][[1]]
[1] TRUE

[[4]][[2]]
[1] -10
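As a brief sketch of how list elements can be retrieved, the example below contrasts double brackets, which return the element itself, with single brackets, which return a sub-list. The `named` list and its fields are hypothetical and added only for illustration.

```r
list_obj <- list("a", c(1:4), 4+3i, list(TRUE, -10))

# [[ ]] extracts the element itself
second <- list_obj[[2]]
class(second)                  # "integer"

# [ ] returns a list containing the selected element
sub <- list_obj[2]
class(sub)                     # "list"

# Nested lists are reached by chaining [[ ]]
nested <- list_obj[[4]][[2]]   # -10

# List elements can also be named and accessed with $
named <- list(id = 1L, tags = c("x", "y"))
named$tags                     # "x" "y"
```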
Data Frames in R
A data frame is a collection of heterogeneous elements stored together in tabular form. The elements are arranged in rows and columns. Data frames are two-dimensional and can be created using the data.frame() function in R. Every column in a data frame must have the same number of elements. The elements within a single column must have the same data type, although different columns may hold different types. The syntax of this function is:
data.frame(col1, …, coln)
where col1, …, coln are vectors of equal length, each holding values of a single data type.
# declaring a data frame
data_frame <- data.frame(
  # col1 containing numerical values
  col1 = c(1:3),
  # col2 containing string values
  col2 = c("Hi", "Readers", "This is about DS"))
print("Data Frame")
print(data_frame)
The code produces the following output:
[1] "Data Frame"
  col1             col2
1    1               Hi
2    2          Readers
3    3 This is about DS
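To make the column-wise typing concrete, the sketch below inspects the dimensions and per-column classes of the same data frame. Passing `stringsAsFactors = FALSE` is an assumption made here so that the result is the same across R versions; before R 4.0.0 the default converted strings to factors.

```r
data_frame <- data.frame(col1 = c(1:3),
                         col2 = c("Hi", "Readers", "This is about DS"),
                         stringsAsFactors = FALSE)

# number of rows and columns
dim(data_frame)                          # 3 2

# each column has its own type
col_types <- sapply(data_frame, class)   # col1: "integer", col2: "character"

# access by column name, or by row and column
data_frame$col2[2]                       # "Readers"
data_frame[3, "col1"]                    # 3
```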
Matrices
A matrix is an ordered collection of homogeneous elements arranged in rows and columns. It may be square or rectangular. Matrices are two-dimensional R objects, created by the matrix() function. By default, the elements are filled in column-wise order. If only the number of rows or only the number of columns is specified, the other dimension is inferred from the number of elements; if neither is given, the result is a single column. The function has the following syntax:
matrix(data, nrow = , ncol = )
where data is a vector of elements, nrow is the number of rows, and ncol is the number of columns.
# declaring a matrix
mat <- matrix(c(1:12), ncol = 4)
print("Matrix")
print(mat)
The code produces the following output:
[1] "Matrix"
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
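The column-wise fill order described above can be compared with row-wise filling, which matrix() supports through its `byrow` argument. The sketch below uses illustrative variable names `by_col` and `by_row`.

```r
# default: values fill the matrix column by column
by_col <- matrix(1:12, ncol = 4)

# byrow = TRUE fills row by row instead
by_row <- matrix(1:12, ncol = 4, byrow = TRUE)

# the number of rows is inferred from the data length
dim(by_col)      # 3 4

# the same position holds different values under the two fill orders
by_col[2, 3]     # 8
by_row[2, 3]     # 7
```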
Arrays
Arrays are n-dimensional R objects containing homogeneous elements. An array is created using the array() function, which takes as input a vector of elements and arranges it according to the specified dimensions. For a three-dimensional array, the syntax is:
array(vec, dim = c(row, col, num)), where dim specifies num matrices, each having row x col dimensions.
# declaring an array
arr <- array(c(1:20), dim = c(5, 2, 2))
# printing the array elements
print("Array Contents : ")
print(arr)
The code produces the following output:
[1] "Array Contents : "
, , 1

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

, , 2

     [,1] [,2]
[1,]   11   16
[2,]   12   17
[3,]   13   18
[4,]   14   19
[5,]   15   20
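A minimal sketch of how individual elements and whole slices of this array might be accessed, using one index per dimension. The variable names `element` and `first_slice` are illustrative.

```r
arr <- array(1:20, dim = c(5, 2, 2))

# dimensions: rows, columns, number of matrices
dim(arr)                     # 5 2 2

# row 2, column 1 of the second matrix
element <- arr[2, 1, 2]      # 12

# leaving an index empty selects everything along that dimension
first_slice <- arr[, , 1]
dim(first_slice)             # 5 2
```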
Factors
A factor is a vector of values in which each unique value corresponds to a level. By default, the number of levels in a factor equals the number of unique values within it. Factors represent categorical data and are commonly used in statistical modelling and machine learning.
# declaring a factor
fac <- factor(c("R", "Python", "C++", "Python", "C++", "R", "R", "C++"))
print("Unique Levels of factor")
print(fac)
The output produced by the code is as follows:
[1] "Unique Levels of factor"
[1] R      Python C++    Python C++    R      R      C++
Levels: C++ Python R
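The relationship between values and levels can be inspected directly. The sketch below uses the same factor and a few base R functions (levels(), nlevels(), as.integer(), and table()) to show the levels, the integer codes stored internally, and the count of each level.

```r
fac <- factor(c("R", "Python", "C++", "Python", "C++", "R", "R", "C++"))

# the distinct levels, sorted alphabetically by default
levels(fac)               # "C++" "Python" "R"
nlevels(fac)              # 3

# each value is stored as an integer code pointing at its level
codes <- as.integer(fac)  # 3 2 1 2 1 3 3 1

# frequency of each level
counts <- table(fac)      # C++: 3, Python: 2, R: 3
```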