Structured Query Language (SQL) is one of the most popular languages today to querying database and there are many reasons for this. It has many qualities that make it suitable for development for users at most all levels. Let’s discuss how efficiently remove duplicate records in a SQL Server Database table.
SQL Server is a Relational Database Management System (RDBMS) and the most widely used client–server model RDBMS in the current trend applications. RDBMS is perhaps the most important Database system used in the workplace today.
Plan your Data Structure and table design before you start your application. It purely depends on the amount of Data and type of application.
There are two types of table structures, they are
- With Unique Column
- Without Unique Column
First we discuss about “with Unique Column”. Let us consider the SQL Query to create an EmplyeeInfo Table for our further illustration purpose.
- Table Contains Unique Column : Number
Create Table: EmployeeInfo
SQL Server Syntax:
1 2 3 4 5 6 7 |
CREATE TABLE [dbo].[EmployeeInfo]( [id] [int] IDENTITY(1,1) NOT NULL, [firstName] [varchar](50) NOT NULL, [lastName] [varchar](50) NOT NULL, [dob] [date] NOT NULL, [gender] [bit] NOT NULL, ) |
Insert Data:
Insert the data into the EmployeeInfo table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Prayag', 'Pavithran', '1985-09-15', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Sakthivel', 'G', '1986-12-11', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Prayag', 'Pavithran', '1985-09-15', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Sakthivel', 'G', '1986-12-11', 1) GO INSERT INTO EmployeeInfo([firstName], [lastName], [dob], [gender]) VALUES ('Prayag', 'Pavithran', '1985-09-15', 1) GO |
Select Data:
Select the data from the EmployeeInfo table.
1 |
SELECT * FROM EmployeeInfo |
Id | firstName | lastName | dob | gender |
1 | Bala | Manigandan | 1985-12-06 | 1 |
2 | Bala | Manigandan | 1985-12-06 | 1 |
3 | Bala | Manigandan | 1985-12-06 | 1 |
4 | Prayag | Pavithran | 1985-09-15 | 1 |
5 | Bala | Manigandan | 1985-12-06 | 1 |
6 | Sakthivel | G | 1986-12-11 | 1 |
7 | Prayag | Pavithran | 1985-09-15 | 1 |
8 | Bala | Manigandan | 1985-12-06 | 1 |
9 | Sakthivel | G | 1986-12-11 | 1 |
10 | Prayag | Pavithran | 1985-09-15 | 1 |
In the above EmployeeInfo table, id 1, 4, 6 has duplicate values with unique id
Case 1: without Column ‘id’
1 2 3 4 5 6 |
SELECT DISTINCT firstName ,lastName ,dob ,gender FROM EmployeeInfo |
firstName | lastName | dob | Gender |
Bala | Manigandan | 1985-12-06 | 1 |
Prayag | Pavithran | 1985-09-15 | 1 |
Sakthivel | G | 1986-12-11 | 1 |
Case 2: with Column ‘id’
1 2 3 4 5 6 |
SELECT * FROM EmployeeInfo E1 WHERE E1.id = (SELECT MIN(id) FROM EmployeeInfo E2 WHERE E1.firstName = E2.firstName AND E1.lastName = E2.lastName AND E1.dob = E2.dob AND E1.gender = E2.gender) |
Id | firstName | lastName | dob | Gender |
1 | Bala | Manigandan | 1985-12-06 | 1 |
4 | Prayag | Pavithran | 1985-09-15 | 1 |
6 | Sakthivel | G | 1986-12-11 | 1 |
The possible way to delete duplicate record in the above table using the following approach, for that first we have the select the duplicate rows in the said table. Let us see the SQL for selecting duplicate rows
1 2 3 4 5 6 |
SELECT * FROM EmployeeInfo E1 WHERE E1.id > (SELECT MIN(id) FROM EmployeeInfo E2 WHERE E1.firstName = E2.firstName AND E1.lastName = E2.lastName AND E1.dob = E2.dob AND E1.gender = E2.gender) |
Id | firstName | lastName | Dob | gender |
2 | Bala | Manigandan | 1985-12-06 | 1 |
3 | Bala | Manigandan | 1985-12-06 | 1 |
5 | Bala | Manigandan | 1985-12-06 | 1 |
7 | Prayag | Pavithran | 1985-09-15 | 1 |
8 | Bala | Manigandan | 1985-12-06 | 1 |
9 | Sakthivel | G | 1986-12-11 | 1 |
10 | Prayag | Pavithran | 1985-09-15 | 1 |
The above SQL find out’s the duplicate rows perfectly, so by using this query we can delete the duplicate rows from the above said database table EmployeeInfo. The SQL is
1 2 3 4 5 6 |
DELETE FROM EmployeeInfo WHERE id > (SELECT MIN(id) FROM EmployeeInfo E2 WHERE EmployeeInfo.firstName = E2.firstName AND EmployeeInfo.lastName = E2.lastName AND EmployeeInfo.dob = E2.dob AND EmployeeInfo.gender = E2.gender) |
Table Contains Unique Column: Varchar (Alphabets | Numbers | Special Character)
Create Table: EmployeeInfo
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE TABLE [dbo].[EmpUID]( [id] [varchar](50) NOT NULL, [firstName] [varchar](50) NOT NULL, [lastName] [varchar](50) NOT NULL, [dob] [date] NOT NULL, [gender] [bit] NOT NULL, CONSTRAINT [PK_EmpUID] PRIMARY KEY CLUSTERED ( [id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] |
Insert Data:
Insert the data into the EmployeeInfo table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E001', 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E002', 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E003', 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E004', 'Prayag', 'Pavithran', '1985-09-15', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E005', 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E006', 'Sakthivel', 'G', '1986-12-11', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E007', 'Prayag', 'Pavithran', '1985-09-15', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E008', 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E009', 'Sakthivel', 'G', '1986-12-11', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES ('E010', 'Prayag', 'Pavithran', '1985-09-15', 1) GO |
Select Data:
1 |
SELECT * FROM EmployeeInfo |
Select the data from the EmployeeInfo table.
Id | firstName | lastName | dob | gender |
E001 | Bala | Manigandan | 1985-12-06 | 1 |
E002 | Bala | Manigandan | 1985-12-06 | 1 |
E003 | Bala | Manigandan | 1985-12-06 | 1 |
E004 | Prayag | Pavithran | 1985-09-15 | 1 |
E005 | Bala | Manigandan | 1985-12-06 | 1 |
E006 | Sakthivel | G | 1986-12-11 | 1 |
E007 | Prayag | Pavithran | 1985-09-15 | 1 |
E008 | Bala | Manigandan | 1985-12-06 | 1 |
E009 | Sakthivel | G | 1986-12-11 | 1 |
E010 | Prayag | Pavithran | 1985-09-15 | 1 |
In the above EmployeeInfo table, id 1, 4, 6 has duplicate values with unique id
The possible way to delete duplicate record in the above table using the following approach, for that first we have the select the duplicate rows in the said table. Let us see the SQL for selecting duplicate rows
1 2 3 4 5 6 |
SELECT * FROM EmployeeInfo E1 WHERE E1.id NOT LIKE (SELECT TOP 1 id FROM EmployeeInfo E2 WHERE E1.firstName = E2.firstName AND E1.lastName = E2.lastName AND E1.dob = E2.dob AND E1.gender = E2.gender) |
Id | firstName | lastName | Dob | Gender |
E002 | Bala | Manigandan | 1985-12-06 | 1 |
E003 | Bala | Manigandan | 1985-12-06 | 1 |
E005 | Bala | Manigandan | 1985-12-06 | 1 |
E007 | Prayag | Pavithran | 1985-09-15 | 1 |
E008 | Bala | Manigandan | 1985-12-06 | 1 |
E009 | Sakthivel | G | 1986-12-11 | 1 |
E010 | Prayag | Pavithran | 1985-09-15 | 1 |
The above SQL find out’s the duplicate rows perfectly, so by using this query we can delete the duplicate rows from the above said database table EmployeeInfo. The SQL is
1 2 3 4 5 6 |
DELETE FROM EmployeeInfo WHERE id NOT LIKE (SELECT TOP 1 id FROM EmployeeInfo E2 WHERE EmployeeInfo.firstName = E2.firstName AND EmployeeInfo.lastName = E2.lastName AND EmployeeInfo.dob = E2.dob AND EmployeeInfo.gender = E2.gender) |
- Table Contains Non – Unique Column
Create Table: EmployeeInfo
1 2 3 4 5 6 7 |
CREATE TABLE [dbo].[EmployeeInfo]( [id] [int] NOT NULL, [firstName] [varchar](50) NOT NULL, [lastName] [varchar](50) NOT NULL, [dob] [date] NOT NULL, [gender] [bit] NOT NULL, ) |
Insert Data:
Insert the data into the EmployeeInfo table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (1, 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (1, 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (1, 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (2, 'Prayag', 'Pavithran', '1985-09-15', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (1, 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (3, 'Sakthivel', 'G', '1986-12-11', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (2, 'Prayag', 'Pavithran', '1985-09-15', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (1, 'Bala', 'Manigandan', '1985-12-06', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (3, 'Sakthivel', 'G', '1986-12-11', 1) GO INSERT INTO EmployeeInfo([id], [firstName], [lastName], [dob], [gender]) VALUES (4, 'Ram', 'Kumar', '1968-11-02', 1) GO |
Select Data:
Select the data from the EmployeeInfo table.
1 |
SELECT * FROM EmployeeInfo |
Id | firstName | lastName | dob | gender |
1 | Bala | Manigandan | 1985-12-06 | 1 |
1 | Bala | Manigandan | 1985-12-06 | 1 |
1 | Bala | Manigandan | 1985-12-06 | 1 |
2 | Prayag | Pavithran | 1985-09-15 | 1 |
1 | Bala | Manigandan | 1985-12-06 | 1 |
3 | Sakthivel | G | 1986-12-11 | 1 |
2 | Prayag | Pavithran | 1985-09-15 | 1 |
1 | Bala | Manigandan | 1985-12-06 | 1 |
3 | Sakthivel | G | 1986-12-11 | 1 |
4 | Ram | Kumar | 1968-11-02 | 1 |
In the above EmployeeInfo table, id 1, 2, 3 has duplicate values. The following SQL gives the DISTINCT records
1 |
SELECT DISTINCT * FROM EmployeeInfo |
Id | firstName | lastName | dob | Gender |
1 | Bala | Manigandan | 1985-12-06 | 1 |
2 | Prayag | Pavithran | 1985-09-15 | 1 |
3 | Sakthivel | G | 1986-12-11 | 1 |
4 | Ram | Kumar | 1968-11-02 | 1 |
The possible way to delete duplicate record in the above table using the following approach, for that first we have the select the duplicate rows in the said table. Let us see the SQL for selecting duplicate rows
1 2 3 |
SELECT * FROM EmployeeNonUNQ GROUP BY id, firstName, lastName, dob, gender HAVING COUNT(*) >1 |
Id | firstName | lastName | dob | Gender |
1 | Bala | Manigandan | 1985-12-06 | 1 |
2 | Prayag | Pavithran | 1985-09-15 | 1 |
3 | Sakthivel | G | 1986-12-11 | 1 |
The above SQL find out’s the duplicate rows perfectly, append the Row Number with the SELECT Statement, then the SQL is
1 2 3 4 |
SELECT *, ROW_NUMBER() OVER(PARTITION BY id, firstName, lastName, dob, gender ORDER BY firstName ASC) AS RN FROM EmployeeNonUNQ |
Id | firstName | lastName | dob | gender | RN |
1 | Bala | Manigandan | 1985-12-06 | 1 | 1 |
1 | Bala | Manigandan | 1985-12-06 | 1 | 2 |
2 | Bala | Manigandan | 1985-12-06 | 1 | 3 |
2 | Prayag | Pavithran | 1985-09-15 | 1 | 1 |
1 | Bala | Manigandan | 1985-12-06 | 1 | 4 |
3 | Sakthivel | G | 1986-12-11 | 1 | 1 |
2 | Prayag | Pavithran | 1985-09-15 | 1 | 2 |
1 | Bala | Manigandan | 1985-12-06 | 1 | 5 |
3 | Sakthivel | G | 1986-12-11 | 1 | 2 |
4 | Ram | Kumar | 1968-11-02 | 1 | 1 |
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query, by using this query we can delete the duplicate rows from the above said database table EmployeeInfo. The SQL is
1 2 3 4 5 6 7 8 9 10 11 |
WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY id, firstName, lastName, dob, gender ORDER BY firstName ASC) AS RN FROM EmployeeNonUNQ ) DELETE FROM CTE WHERE RN > 1; |
Conclusion:
This article targets at understanding how efficiently we delete duplicate records with both scenario with Unique Value / without Unique Value This article is intended for the beginner/intermediate level. I hope, this post will solve all your basic functionality deletion of duplicate records in a SQL Server Database. Think Big… Start Small… Do Fast…