Fastest way to test if there are any duplicates in a Java list/array
We will try to determine if there are any duplicates in an array or a list in Java programming language. We will compare three different algorithms and compare their durations.
1. The first algorithm runs on an array. We have implemented two loops to compare the elements of the list to other elements. The complexity is O(n2).
2. The second algorithms is similar to the first one. Instead of using an array, we take a list as an input.
3. And the last algorithm uses Java HashSets. The complexity of inserting an element to a HashSet is O(1), but the worst case performance might increase up to O(n). Thus, the complexity of inserting all the elements may be somewhere between O(n) and O(n2).
Here is the test code and comparison results.
public class TestDuplicate {
public static boolean hasDuplicateArray(double[] arr)
{
int count = arr.length;
for (int i = 1; i < count; ++i)
for (int j = 0; j < i; ++j)
if (arr[i] == arr[j])
return true;
return false;
}
public static boolean hasDuplicateList(List list)
{
int count = list.size();
for (int i = 1; i < count; ++i)
{
Double elem1 = list.get(i);
for (int j = 0; j < i; ++j)
{
Double elem2 = list.get(j);
if (elem1 == elem2)
return true;
}
}
return false;
}
public static boolean hasDuplicateMap(List list)
{
HashSet set = new HashSet();
int count = list.size();
for (int i = 0; i < count; ++i)
{
Double elem = list.get(i);
if (set.contains(elem))
return true;
set.add(elem);
}
return false;
}
public static void main(String[] args)
{
Random rand = new Random(2020);
double[] arr = new double[100000];
List list = new ArrayList();
for (int i = 0; i < arr.length; ++i)
{
arr[i] = rand.nextDouble();
list.add(arr[i]);
}
long startTime = System.currentTimeMillis();
boolean result = hasDuplicateArray(arr);
long duration = System.currentTimeMillis() - startTime;
System.out.println("Java Array: \t" + result + "\t" + duration + " ms");
startTime = System.currentTimeMillis();
result = hasDuplicateList(list);
duration = System.currentTimeMillis() - startTime;
System.out.println("Java List: \t" + result + "\t" + duration + " ms");
startTime = System.currentTimeMillis();
result = hasDuplicateMap(list);
duration = System.currentTimeMillis() - startTime;
System.out.println("Java HashSet: \t" + result + "\t" + duration + " ms");
}
}
The test results are obtained on Intel Core i5-8250U processor. Here are the results for 100K elements:
Java Array: false 2910 ms
Java List: false 3976 ms
Java HashSet: false 27 ms
If you want to repeat the test for 1M elements, the result is:
Java Array: false 409085 ms
Java List: false 1793858 ms
Java HashSet: false 605 ms
According to the results, the fastest way to understand if there are any duplicates in a Java collection is to use a HashSet.