Find Median from Data Stream - Java Solution

1. Introduction

This blog post examines the "Find Median from Data Stream" problem, a classic challenge in the design of data structures. The problem entails continuously finding the median value from a stream of integers. The median is the middle value in an ordered list of numbers. This problem is a great example of applying priority queues to manage dynamically changing data.

Problem

Implement a MedianFinder class that can continuously find the median from a stream of integers. The class should have methods to add numbers to the data stream and to find the median of all numbers added so far.

Example interaction:

1. Initialize MedianFinder.

2. Add numbers to the stream.

3. Find the median at different stages.

2. Solution Steps

1. Use two priority queues (heaps): a max heap to store the smaller half of the numbers and a min heap for the larger half.

2. Ensure that the sizes of the two heaps differ by at most 1.

3. Add numbers to the appropriate heap based on their value relative to the median.

4. Balance the heaps if their sizes differ by more than 1.

5. For finding the median:

a. If the heaps are of equal size, the median is the average of the two roots.

b. If they're of unequal size, the median is the root of the larger heap.

6. Implement addNum to add numbers and findMedian to find the median.

3. Code Program

import java.util.Collections;
import java.util.PriorityQueue;

class MedianFinder {
    private PriorityQueue<Integer> maxHeap;
    private PriorityQueue<Integer> minHeap;

    public MedianFinder() {
        maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // Max heap
        minHeap = new PriorityQueue<>(); // Min heap
    }

    public void addNum(int num) {
        if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
            maxHeap.add(num);
        } else {
            minHeap.add(num);
        }

        // Balance heaps
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.add(maxHeap.poll());
        } else if (minHeap.size() > maxHeap.size()) {
            maxHeap.add(minHeap.poll());
        }
    }

    public double findMedian() {
        if (maxHeap.size() == minHeap.size()) {
            return (maxHeap.peek() + minHeap.peek()) / 2.0;
        } else {
            return maxHeap.peek();
        }
    }
}

public class Main {
    public static void main(String[] args) {
        MedianFinder medianFinder = new MedianFinder();
        medianFinder.addNum(1);
        medianFinder.addNum(2);
        System.out.println(medianFinder.findMedian()); // Output: 1.5
        medianFinder.addNum(3);
        System.out.println(medianFinder.findMedian()); // Output: 2.0
    }
}

Output:

1.5
2.0

Explanation:

1. MedianFinder: This class maintains two priority queues to store the lower and upper halves of the input numbers.

2. addNum: This method adds a new number to the data structure. It places the number into the max heap if it's smaller or equal to the current median, otherwise into the min heap. After adding, it balances the heaps to maintain their size property.

3. findMedian: This method calculates the median based on the sizes of the heaps. If both heaps are the same size, the median is the average of their roots. Otherwise, it's the root of the larger heap.

4. The use of two heaps allows for efficient insertion and median calculation, adhering to the requirement for a dynamic and continuously updating data structure.

5. This approach ensures that the median is always accessible in constant time after each insertion.


Comments