I've got a lovely data frame, my very first, and I'm starting to get the hang of R. One thing I haven't been able to find is a test for duplicate values. I have one column that I'm pretty sure contains only unique values, but I don't know for sure.
Is there a way I can ask? For simplicity, let's pretend this is my data:
  var1 var2 var3
1    1    A    1
2    2    B    3
3    3    C   NA
4    4    D   NA
5    5    E    4
and I want to know whether var1 ever repeats.
In SQL, you would use the GROUP BY clause to group all rows by the target column(s), i.e. the column(s) you want to check for duplicate values, and then use the COUNT function in the HAVING clause to find any group with more than one row; those groups are the duplicate values.
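The same group-and-count idea can be sketched in base R with table(), which counts how often each value occurs. This is a minimal sketch that assumes the data frame from the question is called dat (the name used in the duplicated() answer); the data is typed in by hand from the printout above.

```r
# Rebuild the example data frame from the question (assumed name: dat)
dat <- data.frame(
  var1 = c(1, 2, 3, 4, 5),
  var2 = c("A", "B", "C", "D", "E"),
  var3 = c(1, 3, NA, NA, 4)
)

# "GROUP BY var1": count how many rows share each value of var1
counts <- table(dat$var1)

# "HAVING COUNT(*) > 1": keep only the values that occur more than once
dup_values <- names(counts)[counts > 1]

dup_values
# character(0): no value of var1 repeats
```

Note that table() drops NA by default, so this approach ignores missing values when counting.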
The distinct() function from the dplyr package can be used to filter out duplicate rows: pass the data frame and the column name(s) as arguments to distinct().
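A minimal sketch of using distinct() as a duplicate test, assuming dplyr is installed and the data frame is called dat: if keeping one row per value of var1 drops no rows, var1 has no repeats.

```r
library(dplyr)

# Example data from the question (assumed name: dat)
dat <- data.frame(
  var1 = c(1, 2, 3, 4, 5),
  var2 = c("A", "B", "C", "D", "E"),
  var3 = c(1, 3, NA, NA, 4)
)

# Keep one row per distinct value of var1 (returns only the var1 column)
unique_var1 <- distinct(dat, var1)

# Same number of rows as the original means no value of var1 was repeated
nrow(unique_var1) == nrow(dat)
# [1] TRUE
```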
To find the unique values in a data frame column, use the unique() function in R. It returns the values with duplicates removed, which makes it useful during exploratory data analysis; if length(unique(x)) equals length(x), the column x has no repeated values.
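As a small sketch, the comparison can be applied to var1 and var3 from the question, typed in here as plain vectors. It also shows that unique() treats NA as an ordinary value, so the two NAs in var3 count as a repeat.

```r
var1 <- c(1, 2, 3, 4, 5)
var3 <- c(1, 3, NA, NA, 4)

# Duplicates removed; the two NAs collapse into one
unique(var3)
# [1]  1  3 NA  4

length(unique(var1)) == length(var1)  # TRUE: var1 never repeats
length(unique(var3)) == length(var3)  # FALSE: NA appears twice
```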
Check out the duplicated function:
duplicated(dat$var1)  # TRUE for each element of var1 that repeats an earlier value
The documentation is available from within R via ?duplicated.
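Because duplicated() returns one logical value per element, wrapping it in any() gives a single yes/no answer to the question. A minimal sketch, assuming the data frame is called dat:

```r
# Example data from the question (assumed name: dat)
dat <- data.frame(
  var1 = c(1, 2, 3, 4, 5),
  var2 = c("A", "B", "C", "D", "E"),
  var3 = c(1, 3, NA, NA, 4)
)

# One logical per element: TRUE where the value was already seen earlier
duplicated(dat$var1)
# [1] FALSE FALSE FALSE FALSE FALSE

# Collapse to a single answer: does var1 ever repeat?
any(duplicated(dat$var1))
# [1] FALSE

# Rows whose var1 value is a repeat (an empty data frame here)
dat[duplicated(dat$var1), ]
```

Only the second and later occurrences are flagged TRUE, so subsetting with duplicated() shows the repeats rather than every copy of a repeated value.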
You should also look at the unique function.