When working with data, it’s common to have a set of observations organized into structured groups. These sets of related information are called tables, and in R they are represented by a data structure called the Data Frame. The Data Frame is one of the most commonly used R data structures and works much like a spreadsheet: it has rows and columns, and while all values within a single column share one type, different columns can hold different types of data (numbers, text, logical values, and so on).
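A quick way to see the "one type per column, different types across columns" rule is to ask R for the class of every column. The small data frame `df` and its column names below are made up purely for illustration:

```r
# Build a small data frame whose columns each hold a different type
df <- data.frame(
  id     = 1:3,                      # integer column
  name   = c("Ann", "Bob", "Cy"),    # character column
  active = c(TRUE, FALSE, TRUE),     # logical column
  stringsAsFactors = FALSE           # keep text as character (already the default since R 4.0.0)
)

# A data frame is a list of columns, so sapply() can apply class() to each one
col_types <- sapply(df, class)
print(col_types)
```

Each column reports a single class, while the three columns report three different ones.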
In this blog post, you will learn what Data Frames are, how they work with column names and row indices, how to create them from scratch, load them from different file formats and manage their attributes in order to make your analysis easier.
Accessing individual elements of the data frame
Individual cells of a data frame can be accessed by specifying the row number and then the column number of the element to be extracted. The syntax is as follows:
Syntax
data_frame[row_index, col_index]
Example
# create a data frame (stringsAsFactors = FALSE keeps col2 and col4 as character in every R version)
data_frame = data.frame(col1 = 1:5,
                        col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                        col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                        col4 = letters[1:5],
                        stringsAsFactors = FALSE)
# print the data frame
print("Data Frame")
print(data_frame)
# access the element in the third row and second column
ele = data_frame[3, 2]
cat("Element at row 3, column 2:", ele, "\n")
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
Element at row 3, column 2: cathy
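The column index does not have to be a number; a column name works too, which keeps code correct if the column order changes. The sketch below redefines the same `data_frame` so it runs on its own and reaches the same cell in three ways, using the standard base R operators `[ ]`, `$` and `[[ ]]`:

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# Same cell as data_frame[3, 2], but the column is selected by name
by_name <- data_frame[3, "col2"]

# Pull the column out as a vector first, then index into that vector
by_dollar         <- data_frame$col2[3]
by_double_bracket <- data_frame[["col2"]][3]

cat(by_name, by_dollar, by_double_bracket, "\n")
```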
Accessing entire row data from the data frame
An entire row can be extracted by leaving the column index empty. Because the columns of a data frame can have different types, the result is not a vector but a one-row data frame. The syntax required to access an entire row is:
Syntax
data_frame[row_index, ]
where row_index is the row to be retrieved.
Example
# create a data frame (stringsAsFactors = FALSE keeps col2 and col4 as character in every R version)
data_frame = data.frame(col1 = 1:5,
                        col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                        col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                        col4 = letters[1:5],
                        stringsAsFactors = FALSE)
# print the data frame
print("Data Frame")
print(data_frame)
# access the 3rd row (the result is a one-row data frame)
row_3 = data_frame[3, ]
print("Elements of third row")
print(row_3)
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
[1] "Elements of third row"
  col1  col2 col3 col4
3    3 cathy TRUE    c
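Since a row mixes types, it stays a data frame, as the row name `3` in the output hints. The sketch below checks that and shows two common ways to flatten the row; the variable names are illustrative. Note that `unlist()` must coerce every value to one common type, which is character here:

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

row_3 <- data_frame[3, ]
print(class(row_3))        # still a data frame, not a vector

# as.list() keeps each value with its original type
row_list <- as.list(row_3)

# unlist() flattens to an atomic vector, coercing everything to character
row_vec <- unlist(row_3)
print(row_vec)
```

Use `as.list()` when the values' types matter, and `unlist()` only when a single-type vector is acceptable.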
Accessing column data from the data frame
An entire column can be extracted by leaving the row index empty. Since every value in a column has the same type, the result is returned as a vector. The syntax required to access an entire column is:
Syntax
data_frame[, col_index]
where col_index is the column to be retrieved.
Example
# create a data frame (stringsAsFactors = FALSE keeps col2 and col4 as character in every R version)
data_frame = data.frame(col1 = 1:5,
                        col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                        col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                        col4 = letters[1:5],
                        stringsAsFactors = FALSE)
# print the data frame
print("Data Frame")
print(data_frame)
# access the 2nd column as a vector
vec = data_frame[, 2]
print("Elements of second column")
print(vec)
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
[1] "Elements of second column"
[1] "Amma"  "baba"  "cathy" "daddy" "emma"
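A single column comes back as a vector because `[` drops the now-redundant dimension by default. Passing `drop = FALSE`, a documented argument of the data frame `[` method, keeps the result as a one-column data frame, which can matter when later code expects a data frame. A short sketch:

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

col_vec     <- data_frame[, 2]                # drop = TRUE by default: character vector
col_df      <- data_frame[, 2, drop = FALSE]  # one-column data frame
col_by_name <- data_frame$col2                # same vector, selected by name

print(class(col_vec))
print(class(col_df))
```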
Accessing a range of rows and columns from the data frame
A range of rows or columns can be extracted by giving the starting and ending indices separated by the colon (:) operator, for the rows, the columns, or both.
Syntax
data_frame[start_row:end_row, start_col:end_col]
where:
- start_row: starting row index
- end_row: ending row index
- start_col: starting column index
- end_col: ending column index
The result is the subset of the data frame formed by the intersection of the selected rows and columns. If the column index is left empty, all columns of the selected rows are returned; if the row index is left empty, all rows of the selected columns are returned.
Example
# create a data frame (stringsAsFactors = FALSE keeps col2 and col4 as character in every R version)
data_frame = data.frame(col1 = 1:5,
                        col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                        col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                        col4 = letters[1:5],
                        stringsAsFactors = FALSE)
# print the data frame
print("Data Frame")
print(data_frame)
# access rows 2 to 4 (all columns)
rows_2_to_4 = data_frame[2:4, ]
print("Elements of second to fourth row")
print(rows_2_to_4)
# access columns 1 to 2 (all rows)
cols_1_to_2 = data_frame[, 1:2]
print("Elements of first to second column")
print(cols_1_to_2)
# access the intersection of rows 4-5 and columns 2-3
sub_df = data_frame[4:5, 2:3]
print("Elements from intersection of 4th and 5th row and 2nd and 3rd column")
print(sub_df)
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
[1] "Elements of second to fourth row"
  col1  col2  col3 col4
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
[1] "Elements of first to second column"
  col1  col2
1    1  Amma
2    2  baba
3    3 cathy
4    4 daddy
5    5  emma
[1] "Elements from intersection of 4th and 5th row and 2nd and 3rd column"
   col2  col3
4 daddy  TRUE
5  emma FALSE
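The colon is not special syntax inside `[ ]`: `2:4` simply builds the integer vector `c(2, 3, 4)`. That means any index vector works the same way, whether or not it is contiguous, and column names can stand in for column numbers. In this sketch the chosen rows and columns are arbitrary:

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# 2:4 is just an ordinary integer vector
print(2:4)

# Rows 1, 3 and 5, columns selected by name instead of by a numeric range
picked <- data_frame[c(1, 3, 5), c("col2", "col4")]
print(picked)
```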
Extracting the number of rows and columns of the data frame
The number of rows and columns of a data frame can be obtained with the nrow() and ncol() functions, respectively. Because a data frame is stored as a list of columns, the length() function also returns the number of columns. Each of these functions takes the data frame as its only argument.
Example
# create a data frame (stringsAsFactors = FALSE keeps col2 and col4 as character in every R version)
data_frame = data.frame(col1 = 1:5,
                        col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                        col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                        col4 = letters[1:5],
                        stringsAsFactors = FALSE)
# print the data frame
print("Data Frame")
print(data_frame)
# "\n" ends each line, since cat() does not add a newline on its own
cat("Number of rows:", nrow(data_frame), "\n")
cat("Number of columns:", ncol(data_frame), "\n")
Output
[1] "Data Frame"
  col1  col2  col3 col4
1    1  Amma  TRUE    a
2    2  baba FALSE    b
3    3 cathy  TRUE    c
4    4 daddy  TRUE    d
5    5  emma FALSE    e
Number of rows: 5
Number of columns: 4
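The example prints counts with nrow() and ncol() but does not show length(). A minimal sketch confirming that length() and ncol() agree on a data frame, because length() counts the elements of the underlying list, i.e. the columns:

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# A data frame is a list of equal-length columns, so length() counts columns
n_cols_length <- length(data_frame)
n_cols_ncol   <- ncol(data_frame)

cat("length():", n_cols_length, " ncol():", n_cols_ncol, "\n")
```

ncol() states the intent more clearly, so it is usually the better choice in analysis code.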
The dim() function returns both counts at once, as a two-element vector of the form c(rows, columns).
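A short sketch of dim() on the same example data, with each element of the returned vector used separately:

```r
data_frame <- data.frame(col1 = 1:5,
                         col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                         col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                         col4 = letters[1:5],
                         stringsAsFactors = FALSE)

# dim() returns c(number of rows, number of columns)
dims <- dim(data_frame)
cat("Rows:", dims[1], " Columns:", dims[2], "\n")
```

dims[1] is always the same as nrow(data_frame) and dims[2] the same as ncol(data_frame).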