I have a data frame with multiple columns, and I want to isolate two of the columns and count the unique values across both of them. Here's an example of what I mean:
Let's say I have a data frame df:
df <- data.frame(v1 = c(1, 2, 3, 2, "a"), v2 = c("a", 2, "b", "b", 4))
df
v1 v2
1 1 a
2 2 2
3 3 b
4 2 b
5 a 4
Now what I'm trying to do is extract just the unique values over the two columns. If I just used unique() on each column, the output would look like this:
> unique(df[,1])
[1] 1 2 3 a
> unique(df[,2])
[1] a 2 b 4
But this is no good, as it only finds the unique values per column, whereas I need the total number of unique values over the two columns. For instance, 'a' appears in both columns, but I only want it counted once. As an example of the output I need, imagine the columns v1 and v2 are placed on top of each other like so:
V1_V2
1 1
2 2
3 3
4 2
5 a
6 a
7 2
8 b
9 b
10 4
The unique values of V1_V2 would be:
V1_V2
1 1
2 2
3 3
5 a
8 b
10 4
Then I could just count the rows using nrow(). Any ideas how I'd achieve this?
The unique() function in R removes duplicate elements from a vector, or duplicate rows from a data frame or matrix. It is useful in exploratory data analysis (EDA) because it returns the distinct values in the data directly.
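As a small illustration of that behaviour, here is a minimal base R sketch. The names x and rows are made up for this example and are not part of the question's data.

```r
# Hypothetical example data, not taken from the question
x <- c(1, 2, 2, 3, 1)
unique_x <- unique(x)          # duplicates dropped, first occurrences kept in order

rows <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))
unique_rows <- unique(rows)    # the second row duplicates the first and is dropped
```

Note that on a data frame unique() compares whole rows, which is why calling it on df directly does not pool values from different columns.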
The function distinct() from the dplyr package keeps only the unique/distinct rows of a data frame. If there are duplicate rows, only the first one is preserved. It behaves much like base R's unique() on a data frame and is usually faster.
This is well suited for union():
data.frame(V1_V2=union(df$v1, df$v2))
# V1_V2
#1 1
#2 2
#3 3
#4 a
#5 b
#6 4
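To get the count itself, the data frame wrapper and nrow() are not strictly needed. The sketch below is one way to do it, assuming the columns are stored as character vectors (stringsAsFactors = FALSE is set explicitly, since older versions of R would create factors by default). The variable names stacked, distinct_vals, n_distinct and n_union are only illustrative.

```r
df <- data.frame(v1 = c(1, 2, 3, 2, "a"),
                 v2 = c("a", 2, "b", "b", 4),
                 stringsAsFactors = FALSE)

# Stack the two columns into one character vector, then drop duplicates.
# as.character() keeps this safe if the columns happen to be factors.
stacked <- c(as.character(df$v1), as.character(df$v2))
distinct_vals <- unique(stacked)

# Number of distinct values across both columns
n_distinct <- length(distinct_vals)

# The same count via union(), which removes duplicates itself
n_union <- length(union(df$v1, df$v2))
```

For the example data both counts come out as 6, matching the six rows in the union() output above.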