1. Introduction
In this blog post, we will tackle a common database challenge in PostgreSQL: deleting duplicate email addresses while preserving the entry with the smallest id. This operation is essential for maintaining data quality and integrity.
Problem
The goal is to write a PostgreSQL query to delete all duplicate email entries in the Person table, ensuring that only the entry with the smallest id for each email is kept.
Database Table:
Person Table:id | |
---|---|
1 | john@example.com |
2 | bob@example.com |
3 | john@example.com |
3. PostgreSQL Solution
DELETE FROM Person
WHERE id IN (
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) as rn
FROM Person
) t
WHERE t.rn > 1
);
Output:
id | |
---|---|
1 | john@example.com |
2 | bob@example.com |
Explanation:
1. The query employs a subquery that uses the ROW_NUMBER() window function, partitioning by email and ordering by id.
2. This window function assigns a row number to each email based on its id, within each partition of duplicate emails.
3. In the outer query, the DELETE statement removes rows where the row number is greater than 1, which means all duplicate emails except the one with the smallest id.
4. As a result, only one unique email with the smallest id remains for each set of duplicate emails.
Comments
Post a Comment