Java Stream distinct() Example

1. Introduction

The distinct() method in Java Streams is used to remove duplicate elements from the stream, ensuring that only unique elements are processed. This method is particularly useful in data cleaning and preparation, where duplicates can skew analysis or impact performance negatively. By providing a straightforward way to filter out duplicates, distinct() enhances the robustness and accuracy of data-driven applications.

Key Points

1. distinct() is an intermediate operation that returns a stream with duplicate values removed.

2. It uses the equals() method to determine equality and hence, duplicates.

3. This method can be combined with other stream operations to perform complex data processing tasks efficiently.

2. Program Steps

1. Import necessary classes.

2. Create a stream with duplicate elements.

3. Apply the distinct() method to filter out duplicates.

4. Use a terminal operation to collect and print the results.

3. Code Program

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDistinctExample {

    public static void main(String[] args) {
        // Step 2: Create a stream with duplicate elements
        List<String> items = Arrays.asList("apple", "apple", "banana", "apple", "orange", "banana", "papaya");

        // Step 3: Apply distinct() to remove duplicates
        List<String> distinctItems = items.stream()
                                          .distinct()
                                          .collect(Collectors.toList());

        // Step 4: Use forEach to output the results
        System.out.println("Distinct items:");
        distinctItems.forEach(System.out::println);
    }
}

Output:

Distinct items:
apple
banana
orange
papaya

Explanation:

1. Creation of Data: The program starts with a list of strings where some items like 'apple' and 'banana' appear more than once.

2. Application of distinct(): The stream created from the list is passed through the distinct() method, which filters out duplicate elements. It does this by checking the equality of items using the equals() method inherent to the object.

3. Collecting Results: After duplicates are removed, the stream is collected into a new list using Collectors.toList(), a common terminal operation that gathers elements into a list.

4. Output: The distinct elements are then printed, showing that duplicates have been successfully filtered out. This demonstrates the distinct() method's utility in ensuring that data processed is unique, enhancing data quality for subsequent operations or analyses.

5. Use Cases: This functionality is crucial in scenarios where data integrity and uniqueness are paramount, such as in database operations, data migration tasks, or when preparing datasets for analytics where duplicates can lead to skewed results.


Comments