Fastest way to test if there are any duplicates in a Java list/array

Fastest way to test if there are any duplicates in a Java list/array

We will try to determine if there are any duplicates in an array or a list in Java programming language. We will compare three different algorithms and compare their durations.

1. The first algorithm runs on an array. We have implemented two loops to compare the elements of the list to other elements. The complexity is O(n2).

2. The second algorithms is similar to the first one. Instead of using an array, we take a list as an input.

3. And the last algorithm uses Java HashSets. The complexity of inserting an element to a HashSet is O(1), but the worst case performance might increase up to O(n). Thus, the complexity of inserting all the elements may be somewhere between O(n) and O(n2).

Here is the test code and comparison results.

public class TestDuplicate {
   
    public static boolean hasDuplicateArray(double[] arr)
    {
        int count = arr.length;
       
        for (int i = 1; i < count; ++i)
            for (int j = 0; j < i; ++j)
                if (arr[i] == arr[j])
                    return true;
       
        return false;
    }
   
    public static boolean hasDuplicateList(List list)
    {
        int count = list.size();
       
        for (int i = 1; i < count; ++i)
        {
            Double elem1 = list.get(i);
           
            for (int j = 0; j < i; ++j)
            {
                Double elem2 = list.get(j);
               
                if (elem1 == elem2)
                    return true;
            }
        }

        return false;
    }
   
    public static boolean hasDuplicateMap(List list)
    {
        HashSet set = new HashSet();
        int count = list.size();
       
        for (int i = 0; i < count; ++i)
        {
            Double elem = list.get(i);
           
            if (set.contains(elem))
                return true;
           
            set.add(elem);
        }
       
        return false;
    }
   
    public static void main(String[] args)
    {
        Random rand = new Random(2020);
        double[] arr = new double[100000];
        List list = new ArrayList();
       
        for (int i = 0; i < arr.length; ++i)
        {
            arr[i] = rand.nextDouble();
            list.add(arr[i]);
        }

        long startTime = System.currentTimeMillis();
        boolean result = hasDuplicateArray(arr);
        long duration = System.currentTimeMillis() - startTime;
        System.out.println("Java Array: \t" + result + "\t" + duration + " ms");
       
        startTime = System.currentTimeMillis();
        result = hasDuplicateList(list);
        duration = System.currentTimeMillis() - startTime;
        System.out.println("Java List: \t" + result + "\t" + duration + " ms");
       
        startTime = System.currentTimeMillis();
        result = hasDuplicateMap(list);
        duration = System.currentTimeMillis() - startTime;
        System.out.println("Java HashSet: \t" + result + "\t" + duration + " ms");
    }
}

The test results are obtained on Intel Core i5-8250U processor. Here are the results for 100K elements:

Java Array:        false    2910 ms
Java List:         false    3976 ms
Java HashSet:      false    27 ms

If you want to repeat the test for 1M elements, the result is:

Java Array:        false    409085 ms
Java List:         false    1793858 ms
Java HashSet:      false    605 ms

According to the results, the fastest way to understand if there are any duplicates in a Java collection is to use a HashSet.


added 4 years 6 months ago

- Fastest way to test if there are any duplicates in a Java list/array
- C# Equation and Formula Parser