Discussion in 'SQL PL/SQL' started by Vicky, Jun 18, 2014.
Could you tell me the best way to eliminate duplicate records from a table?!
Candidate solutions can be a little...
the best option can be defined proceeding from characteristics of data: volume, data types, etc.
!!! it is necessary to define the rule by which duplicate records will be excluded from selection !!!
DROP TABLE test_tab purge ;
CREATE TABLE test_tab
(id INT PRIMARY KEY,
INSERT INTO test_tab
CONNECT BY level <= 10000;
rem USE GROUP BY
SELECT x,y,MAX(id) keep (dense_rank FIRST ORDER BY x) id FROM test_tab GROUP BY x,y;
rem USE analityc FUNCTION
SELECT x,y,id FROM
(SELECT x,y ,id ,ROW_NUMBER() OVER (partition BY x,y ORDER BY x) rn FROM test_tab
WHERE rn = 1;
If your table has been properly designed, it should be having at least a Primary Key which would technically prevent it from having duplicates.
However, if there are "duplicates" based on other rules (for ex. , no 2 records may have same value for column x, y and z) then you may enforce them with unique keys/indexes.
If these constraints are not there, then as Sergey mentioned, it's necessary to know how YOU define the criteria for duplicate records and use it to remove the duplicates.
You can refer to this article which explores various ways to do it: