
How to Remove Duplicates from DataFrame in Julia?

To remove duplicate rows from a dataframe in Julia, you can use the unique function, which the DataFrames package extends to work on dataframes. Here is an example of how to use it:

Remove Duplicates from DataFrame in Julia Example

using DataFrames

# create a sample dataframe
df = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])
6×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     2      5
   4 │     3      6
   5 │     3      6
   6 │     3      6
# remove duplicates
df_unique = unique(df)
3×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6
# display the resulting dataframe
df_unique
3×2 DataFrame
 Row │ x      y     
     │ Int64  Int64 
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

This creates a new dataframe df_unique that contains only the unique rows of df; df itself is left unchanged. By default, unique compares every column, so two rows count as duplicates only when they match in all columns, and the first occurrence of each row is kept. To compare only a subset of the columns, pass them as the second argument to unique like this:

df_unique = unique(df, [:x])

This removes duplicates based on the values in the x column only: for each distinct value of x, the first matching row is kept, with all of its columns.
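
The difference between comparing all columns and comparing a subset is easiest to see when the other columns disagree. The following is a minimal sketch, assuming a made-up dataframe df2 (not part of the example above) in which x repeats but y does not. It also uses nonunique, which returns a Bool vector flagging rows that repeat an earlier row, as a way to inspect what would be dropped before dropping it.

```julia
using DataFrames

# Hypothetical data: x repeats, but y differs between the repeated rows
df2 = DataFrame(x = [1, 1, 2], y = [10, 20, 30])

# Comparing all columns: no two rows match completely, so nothing is dropped
all_cols = unique(df2)

# Comparing only x: the second row (x = 1, y = 20) counts as a duplicate
by_x = unique(df2, :x)

# nonunique marks each row that repeats an earlier row in the chosen columns
dup_mask = nonunique(df2, :x)
```

Here all_cols still has three rows, while by_x keeps only the first row for each value of x, so the row with y = 20 is discarded. Checking dup_mask first can help confirm that the rows being removed are the ones you expect.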

You can also use the unique! function to remove duplicates in place, modifying the original dataframe.

unique!(df)
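
Because unique! changes the dataframe it is given, the duplicate rows cannot be recovered afterwards. As a small sketch of that behavior, assuming a fresh dataframe named df3 (a name chosen here for illustration) with the same contents as the sample above:

```julia
using DataFrames

df3 = DataFrame(x = [1, 2, 2, 3, 3, 3], y = [4, 5, 5, 6, 6, 6])
rows_before = nrow(df3)

# Take a copy first if the original rows are still needed later
backup = copy(df3)

# unique! drops the duplicate rows from df3 itself and returns df3
result = unique!(df3)
rows_after = nrow(df3)
```

After this runs, df3 holds three rows and backup still holds all six. Prefer unique when the original data should stay intact, and unique! when a separate deduplicated copy is not needed.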
