Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check if exists any duplicate in Java 8 Streams?

In java 8, what's the best way to check if a List contains any duplicate?

My idea was something like:

list.size() != list.stream().distinct().count() 

Is it the best way?

like image 675
pedrorijo91 Avatar asked May 05 '15 12:05

pedrorijo91


People also ask

How do I find duplicates in a string in Java 8?

In Java 8 Stream, filter with Set. Add() is the fastest algorithm to find duplicate elements, because it loops only one time. Set<T> items = new HashSet<>(); return list. stream() .

How does Stream detect duplicate values in a list?

Get the stream of elements in which the duplicates are to be found. For each element in the stream, count the frequency of each element, using Collections. frequency() method. Then for each element in the collection list, if the frequency of any element is more than one, then this element is a duplicate element.

What is the function used to duplicate a stream?

CopyTo(Stream) Reads the bytes from the current stream and writes them to another stream. Both streams positions are advanced by the number of bytes copied.


2 Answers

Your code would need to iterate over all elements. If you want to make sure that there are no duplicates simple method like

public static <T> boolean areAllUnique(List<T> list){     Set<T> set = new HashSet<>();      for (T t: list){         if (!set.add(t))             return false;     }      return true; } 

would be more efficient since it can give you false immediately when first non-unique element would be found.

This method could also be rewritten as (assuming non-parallel streams and thread-safe environment) using Stream#allMatch which also is short-circuit (returns false immediately for first element which doesn't fulfill provided condition)

public static <T> boolean areAllUnique(List<T> list){     Set<T> set = new HashSet<>();     return list.stream().allMatch(t -> set.add(t)); } 

or as @Holger mentioned in comment

public static <T> boolean areAllUnique(List<T> list){     return list.stream().allMatch(new HashSet<>()::add); } 
like image 110
Pshemo Avatar answered Oct 11 '22 16:10

Pshemo


I used the following:
1. return list.size() == new HashSet<>(list).size();.

I'm not sure how it compares to:
2. return list.size() == list.stream().distinct().count();
and
3. return list.stream().sequential().allMatch(new HashSet<>()::add);
in terms of performance.

The last one (#3) has possibility to handle not only collections (e.g. lists), but also streams (without explicitly collecting them).

Upd.: The last one (#3) seems to be the best not only because it can handle pure streams, but also because it stops on the first duplicate (while #1 and #2 always iterate till the end) — as @Pshemo said in comment.

like image 44
Sasha Avatar answered Oct 11 '22 17:10

Sasha