In R, a data frame stores tabular data in a single variable and provides a convenient way to manipulate it. Unlike a matrix, which holds values of a single type, a data frame lets each column have its own type. This article covers what data frames are and how they can be created and inspected in the R programming language.
Introduction to Data Frames in R
A data frame is a tabular arrangement of data in rows and columns. It is similar to the matrix data structure, but each column in a data frame can hold a different data type. A column can be of any vector type, such as integer, numeric, character, or logical. A data frame in R has the following properties:
- There are a fixed number of rows and columns in the data frame.
- The names of columns in a data frame cannot be empty.
- Row names cannot be repeated. Each row name must be unique.
- Every column contains the same number of data items.
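The properties above can be checked directly. The following is a minimal sketch using a small made-up data frame (the names df, unequal, and dup_rows are illustrative only). It assumes R 4.0 or later, where character columns are not converted to factors by default.

```r
# Columns of different types in one data frame (illustrative data)
df <- data.frame(
  id     = 1:3,
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE)
)

# Each column keeps its own type
col_types <- sapply(df, class)
print(col_types)

# Every column has the same number of items, so the shape is rectangular
print(dim(df))

# Column lengths that cannot be recycled to a common length are rejected
unequal <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error"
)
print(unequal)

# Duplicate row names are rejected as well
dup_rows <- tryCatch(
  data.frame(x = 1:2, row.names = c("r1", "r1")),
  error = function(e) "error"
)
print(dup_rows)
```

Both tryCatch() calls return the string "error" here because data.frame() refuses to build an object that would break these rules.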
R provides us with a large number of functions to create a data frame as well as access the various attributes associated with it.
Creating a Data Frame in R
A data frame in R can be created using the data.frame() function, which takes the column names and their data as arguments. Each column name is associated with a vector of values, and the columns may or may not share the same data type. The number of entries in each column should be the same.
    # creating a data frame
    data_frame <- data.frame(col1 = 1:5,
                             col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                             col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE))

    # printing the data frame
    print("Data Frame")
    print(data_frame)
The code produces the following output:

    [1] "Data Frame"
      col1  col2  col3
    1    1  Amma  TRUE
    2    2  baba FALSE
    3    3 cathy  TRUE
    4    4 daddy  TRUE
    5    5  emma FALSE
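Once a data frame exists, its dimensions, column names, and contents can be read back with base R functions. The sketch below rebuilds the same example data and shows a few common accessors. It is one possible way to inspect the object, not an exhaustive list.

```r
# Same illustrative data as the example above
data_frame <- data.frame(
  col1 = 1:5,
  col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
  col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

print(nrow(data_frame))    # number of rows
print(ncol(data_frame))    # number of columns
print(names(data_frame))   # column names

# A single column comes back as a plain vector
print(data_frame$col2)

# Rows and columns can be selected by position or by name
print(data_frame[2, "col2"])

# A logical vector selects the rows where it is TRUE
true_rows <- data_frame[data_frame$col3, ]
print(true_rows)
```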
Structure of the Data Frame
The structure of a data frame is a compact description of the object: its class, the number of rows (observations) and columns (variables), and the type and first few values of each column. The built-in str() function in R displays this internal structure. It takes the data frame as its argument.
    # creating a data frame
    data_frame <- data.frame(col1 = 1:5,
                             col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                             col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE))

    # printing the data frame
    print("Data Frame")
    print(data_frame)

    # printing the structure of the data frame
    print("DataFrame Structure")
    str(data_frame)
Output :
    [1] "Data Frame"
      col1  col2  col3
    1    1  Amma  TRUE
    2    2  baba FALSE
    3    3 cathy  TRUE
    4    4 daddy  TRUE
    5    5  emma FALSE
    [1] "DataFrame Structure"
    'data.frame':	5 obs. of  3 variables:
     $ col1: int  1 2 3 4 5
     $ col2: chr  "Amma" "baba" "cathy" "daddy" ...
     $ col3: logi  TRUE FALSE TRUE TRUE FALSE
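Because str() prints its report rather than returning it, the text has to be captured if it is needed programmatically. The following sketch shows one way to do that with capture.output(), along with a per-column class lookup. The variable names structure_lines and column_classes are illustrative, and the class results assume R 4.0 or later.

```r
data_frame <- data.frame(
  col1 = 1:5,
  col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
  col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# str() prints to the console and returns NULL invisibly,
# so capture the printed lines as a character vector
structure_lines <- capture.output(str(data_frame))
print(length(structure_lines))   # one header line plus one line per column

# The class of each column, as a named character vector
column_classes <- vapply(data_frame, function(col) class(col)[1], character(1))
print(column_classes)
```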
Summary of the Data Frame in R
The summary() function in R generates summary statistics for each column individually. For a numeric column, it returns the minimum, first quartile, median, mean, third quartile, and maximum. For a character column, it returns the length, class, and mode of the column. For a logical column, it returns the counts of FALSE and TRUE values.
    # creating a data frame
    data_frame <- data.frame(col1 = 1:5,
                             col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
                             col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE))

    # printing the summary of the data frame
    print("DataFrame Summary")
    summary(data_frame)
Output :

    [1] "DataFrame Summary"
          col1       col2              col3
     Min.   :1   Length:5           Mode :logical
     1st Qu.:2   Class :character   FALSE:2
     Median :3   Mode  :character   TRUE :3
     Mean   :3
     3rd Qu.:4
     Max.   :5
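The same function also works on a single column, and the individual statistics can then be read by name. The sketch below uses the example data again. The variable names col1_summary and logical_counts are illustrative, and table() is shown as one way to reproduce the FALSE/TRUE counts reported for logical columns.

```r
data_frame <- data.frame(
  col1 = 1:5,
  col2 = c("Amma", "baba", "cathy", "daddy", "emma"),
  col3 = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Summary of one numeric column
col1_summary <- summary(data_frame$col1)
print(col1_summary)

# Individual statistics can be extracted by name
print(col1_summary[["Median"]])
print(col1_summary[["Mean"]])

# Counts of FALSE and TRUE values in the logical column
logical_counts <- table(data_frame$col3)
print(logical_counts)
```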