In this article, we look at the factor data structure in R: how factors are created, how their levels are defined, and how to check whether a vector is a factor.
Factors in R are data objects for categorical data: they classify observations into a fixed set of categories. A factor can contain repeated values, which makes it suitable for data that can take only a limited number of distinct values. Internally, R stores a factor as an integer vector together with a levels attribute: each distinct value is mapped to an integer code, and each element of the factor stores the code of its value. A factor therefore prints much like a character vector, but it is actually a sequence of integer codes that point into its levels.
The levels in a factor represent the unique values in a factor.
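To make the integer-code representation concrete, here is a small sketch using only base R functions (`levels()`, `as.integer()`, `typeof()`, `unclass()`). The vector and the name `fac` are illustrative.

```r
# A factor with repeated values
fac <- factor(c("b", "a", "c", "a", "b"))

# By default the levels are the sorted distinct values: "a" "b" "c"
levels(fac)

# Each element is stored as an integer code indexing into the levels: 2 1 3 1 2
codes <- as.integer(fac)
codes

# The underlying storage type is integer, not character
typeof(fac)

# Looking up each code in the levels recovers the original values
levels(fac)[codes]

# unclass() drops the "factor" class and shows the codes plus the levels attribute
unclass(fac)
```

The last call makes the structure visible: an integer vector with an attached `levels` attribute.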
Creating factors in R
A factor in R is created with the factor() function. It takes a vector and builds a second vector of levels from the distinct values of the first. Factors can be created from numeric vectors as well as from character (string) vectors, whether or not they contain repeated values.
# create a factor from a numeric vector
fac <- factor(c(1,2,4,1,2,2,5))
# print the factor values
print(fac)
The code produces the following output:
[1] 1 2 4 1 2 2 5
Levels: 1 2 4 5
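However, a vector that is not passed through factor() (or as.factor()) is not a factor, whether it holds numbers or strings. Character vectors are not turned into factors automatically either. Historically, data.frame() and read.table() converted character columns to factors by default (stringsAsFactors = TRUE), but since R 4.0.0 that default is FALSE.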
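The following sketch contrasts a plain character vector with its factor version and shows the `stringsAsFactors` behaviour of `data.frame()`. It assumes R 4.0.0 or later; the names `langs`, `df1` and `df2` are illustrative.

```r
# A plain character vector is not a factor
langs <- c("r", "python", "c", "r", "r")
class(langs)                 # "character"

# Explicit conversion with as.factor() (factor() works the same way here)
langs_fac <- as.factor(langs)
class(langs_fac)             # "factor"
levels(langs_fac)            # "c" "python" "r"

# Since R 4.0.0, data.frame() keeps character columns as character by default
df1 <- data.frame(lang = langs)
class(df1$lang)              # "character"

# stringsAsFactors = TRUE restores the old automatic conversion
df2 <- data.frame(lang = langs, stringsAsFactors = TRUE)
class(df2$lang)              # "factor"
```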
Defining Levels of factors in R
By default, factor() sets the levels to the sorted distinct values found in the data. The levels can also be supplied explicitly through the levels argument, independently of which values actually occur in the data.
# create a factor; the levels are taken from the data
fac1 <- factor(c(1,2,4,1,2,2,5))
print(fac1)
# print the levels of the factor
cat("initial levels:", levels(fac1), "\n")
# create a factor with user-defined levels
fac2 <- factor(c(1,2,4,1,2,2,5), levels = c(1,2,3,4,5))
print(fac2)
# print the levels of the factor
cat("custom levels:", levels(fac2), "\n")
The output produced by the code is:
[1] 1 2 4 1 2 2 5
Levels: 1 2 4 5
initial levels: 1 2 4 5
[1] 1 2 4 1 2 2 5
Levels: 1 2 3 4 5
custom levels: 1 2 3 4 5
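Because user-defined levels are independent of the data, two side effects are worth seeing: a level that never occurs is still counted (with zero) by `table()`, and a value that is missing from the supplied levels becomes `NA`. A minimal sketch in base R; `fac3` is an illustrative name.

```r
# Level 3 never occurs in the data but is kept as a level
fac2 <- factor(c(1, 2, 4, 1, 2, 2, 5), levels = c(1, 2, 3, 4, 5))

# table() reports a count for every level, including the unused level 3
table(fac2)

# The value 5 is not among the supplied levels, so it becomes NA
fac3 <- factor(c(1, 2, 4, 1, 2, 2, 5), levels = c(1, 2, 4))
print(fac3)
is.na(fac3)
```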
Checking whether a vector is a factor in R
There are two ways to check whether a vector is a factor. The first is the class() function, which returns the class of an object:
class(vec)
where vec is the input vector.
# create a factor
fac1 <- factor(c(1,2,4,1,2,2,5))
# check the class of fac1
cat("fac1 class:", class(fac1), "\n")
# create a plain character vector (not a factor)
fac2 <- c("r", "python", "c", "r", "r")
# check the class of fac2
cat("fac2 class:", class(fac2), "\n")
The code produces the following output:
fac1 class: factor
fac2 class: character
The built-in is.factor() function can also be used to check whether its argument is a factor. It returns TRUE or FALSE:
is.factor(vec)
# create a factor
fac1 <- factor(c(1,2,4,1,2,2,5))
# check whether fac1 is a factor
cat("fac1 factor?", is.factor(fac1), "\n")
# create a plain character vector (not a factor)
fac2 <- c("r", "python", "c", "r", "r")
# check whether fac2 is a factor
cat("fac2 factor?", is.factor(fac2), "\n")
The output produced by the code is:
fac1 factor? TRUE
fac2 factor? FALSE
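The two checks can disagree in practice. An ordered factor (created with `ordered = TRUE`) has two classes, so comparing `class()` to `"factor"` yields one result per class rather than a single answer, while `is.factor()` and `inherits()` return a single `TRUE`. A small sketch; `sizes` is an illustrative name.

```r
# An ordered factor carries the classes "ordered" and "factor"
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
class(sizes)                 # "ordered" "factor"

# Comparing class() with "factor" returns one value per class: FALSE TRUE
class(sizes) == "factor"

# is.factor() and inherits() give a single answer for multi-class objects
is.factor(sizes)             # TRUE
inherits(sizes, "factor")    # TRUE
```

For this reason, is.factor() is usually the safer test inside conditions such as if().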