1. Get rid of all advertisements and get unlimited access to documents by upgrading to Premium Membership. Upgrade to Premium Now and also get a Premium Badge!

Odd duplicate issue

Discussion in 'SQL PL/SQL' started by ptrumpy, Oct 23, 2012.

  1. ptrumpy

    ptrumpy Active Member

    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    55
    I am faced with an interesting dilemma. What I need to do is look at a set of records and determine which ones to keep.

    The records look like this:

    ID Units Dollars
    N123456 30 45
    123456 35 50
    333444 100 200
    333555 200 400

    I need to look at the ID field to see if there are duplicates when you exclude the leading 'N' from the records that contain them (this will be a small subset of the records). Easy enough. The 2nd part, though, is what is giving me trouble. If there are duplicates, I need to keep the entire row of the record with the leading N and discard the other. So, from the above 4 records, I need to return a set that includes the 1st, 3rd, and 4th rows and get rid of the 2nd row.

    Hopefully I've been clear enough. If not, I apologize. Thanks in advance for any suggestions.
     
  2. dariyoosh

    dariyoosh Forum Advisor

    Messages:
    118
    Likes Received:
    19
    Trophy Points:
    260
    Hello there,


    You can try the following:

    (I just added several more arbitrary values in order to make the example more pertinent,
    however your original values provided in the OP were conserved in the following table
    Code (Text):

    WITH tmptab AS
    (
        SELECT 'N123456' AS col1, 30  AS col2, 45  AS col3 FROM DUAL UNION
        SELECT '123456'  AS col1, 35  AS col2, 50  AS col3 FROM DUAL UNION
        SELECT '333444'  AS col1, 100 AS col2, 200 AS col3 FROM DUAL UNION
        SELECT '333555'  AS col1, 200 AS col2, 400 AS col3 FROM DUAL UNION
        SELECT 'N333555' AS col1, 30  AS col2, 45  AS col3 FROM DUAL UNION
        SELECT '888777'  AS col1, 20  AS col2, 10  AS col3 FROM DUAL UNION
        SELECT '111222'  AS col1, 50  AS col2, 60  AS col3 FROM DUAL UNION
        SELECT '999777'  AS col1, 23  AS col2, 92  AS col3 FROM DUAL UNION
        SELECT 'N999777' AS col1, 23  AS col2, 92  AS col3 FROM DUAL
    )
    SELECT col1
    FROM tmptab t1
    WHERE 'N' || t1.col1 NOT IN (SELECT t2.col1 FROM tmptab t2)
    ORDER BY t1.col1 ASC;


    COL1
    -------
    111222
    333444
    888777
    N123456
    N333555
    N999777

    6 rows selected.

    SQL>

     


    Yet, there are 2 points that I think had to be more clear in your post:

    - For the first column, you put an alphanumeric value for the first row, and for others you provided numeric values. I considered
    that you forgot to format the data for all rows properly (enclosing the values with single quotes), and I considered that the
    type of the first column is VARCHAR2 because otherwise my solution will not work.

    - You didn't specify the constraints put on your table in terms of the primary key. My solution is valid only if
    we assume that the column ID (the first column in your table) is the primary key.


    Kind Regards,
    Dariyoosh
     
  3. ptrumpy

    ptrumpy Active Member

    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    55
    dariyoosh,

    Thank you. That works great and makes perfect sense. My apologies for leaving some things out of my post. This was my first post to this forum and I didn't spend enough time making sure everything made sense. You are correct - My primary key in this case is the ID field and is a varchar2 field. I appreciate you taking the time to show me a solution even though I left out some details.

    Again - many thanks.

    Phil
     
  4. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,351
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    Please read the original post again -- he has no problem excluding the leading 'N' from the small subset with that id format and deciding there are duplicates. The issue is how to delete the rows where the numeric id matches the numeric part of the 'N'-prefixed ids, preserving the 'N'-prefixed rows. There is obviously reason for creating id values with this scheme, we simply are not informed of the business reason.

    A query will not return values in single quotes; his example was not of insert statements but of a query against the table. You also presume too much with your 'solution' as we have no idea if there is a primary key to this table. Before offering up such solutions it would be wise to ask for further information from the person who originally posted the question. Then we can accurately assess the situation and formulate a solution based on actual conditions.

    To ptrumpy:

    Provide a valid create table statement for the table in question, including any constraints and/or indexes created against it.
    Provide a bit more sample data so that a solution can possibly be provided.
     
  5. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,351
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    I am now confused -- the example does not provide what I thought you were asking for, a query to return the non-'N'-prefixed rows of essentially duplicated data yet it's a solution, according to your most recent post.

    Please explain.
     
  6. ptrumpy

    ptrumpy Active Member

    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    55
    David,

    I wanted the N prefixed row returned if one exists. I can, however, see how you interpreted it the other way. Sorry for the confusion.
     
  7. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,351
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    " If there are duplicates, I need to keep the entire row of the record with the leading N and discard the other."

    This I understand and to do that I would expect you'd want to return the non-N-prefixed row to feed a delete statement.
     
  8. ptrumpy

    ptrumpy Active Member

    Messages:
    4
    Likes Received:
    0
    Trophy Points:
    55
    David,

    I do not need to delete anything. By discard, I just meant from the query results, but I will take note of the confusion caused here and adjust my language to hopefully avoid that confusion in the future.

    Phil
     
  9. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,351
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    Then it was my error in reading the post.
     
  10. dhwanil_18r

    dhwanil_18r Active Member

    Messages:
    3
    Likes Received:
    0
    Trophy Points:
    55
    Location:
    India
    hi,

    Please find your solution.

    Code (SQL):

    SELECT TRIM (LEADING 'N' FROM ID) ID, units, dollars
      FROM (SELECT ID, units, dollars,
                   ROW_NUMBER () OVER (PARTITION BY TRIM (LEADING 'N' FROM ID) ORDER BY ID DESC)
                                                                               rw
              FROM dummy)
     WHERE rw = 1;
    Here 'dummy' is your table/query, from which you need desired output.

    Please let us know,if this is useful.


    Dhwanil
     
  11. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,351
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    Not really as it doesn't provide the correct output. The first set of records is correct:

    Code (SQL):
    SQL> SELECT id, units, dollars
      2  FROM yapoo
      3  WHERE 'N'||id NOT IN (SELECT id FROM yapoo)
      4  ORDER BY id DESC
      5  /
     
    ID                        UNITS    DOLLARS
    -------------------- ---------- ----------
    N333555                      20        400
    N222955                     200        400
    N123456                      30         45
    333444                      100        200
    333111                      200        400
    331115                      200        400
    311155                      200        400
    111555                      200        400
     
    8 ROWS selected.
     
    SQL>
    This is the output from your query:

    Code (SQL):
    SQL> SELECT TRIM (LEADING 'N' FROM ID) ID, units, dollars
      2  FROM (SELECT ID, units, dollars,ROW_NUMBER () OVER (PARTITION BY TRIM (LEADING 'N' FROM ID) ORDER BY ID DESC) rw
      3          FROM yapoo)
      4  WHERE rw = 1;
     
    ID                        UNITS    DOLLARS
    -------------------- ---------- ----------
    111555                      200        400
    123456                       30         45
    222955                      200        400
    311155                      200        400
    331115                      200        400
    333111                      200        400
    333444                      100        200
    333555                       20        400
     
    8 ROWS selected.
     
    SQL>
    The results should include the 'N'-prefixed values; your query does not return them.
     
  12. dhwanil_18r

    dhwanil_18r Active Member

    Messages:
    3
    Likes Received:
    0
    Trophy Points:
    55
    Location:
    India
    If ID with 'N' Prefix is needed,we can remove TRIM from main query.

    Code (SQL):
    SELECT ID, units, dollars
    FROM (SELECT ID, units, dollars,ROW_NUMBER () OVER (PARTITION BY TRIM (LEADING 'N' FROM ID) ORDER BY ID DESC) rw
    FROM yapoo)
    WHERE rw = 1;
    Dhwanil
     
  13. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,351
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    Please read the initial post again as it states returning the rows with the 'N'-prefixed ids is required.

    It always helps to read the original problem before submitting a solution.
     
  14. dhwanil_18r

    dhwanil_18r Active Member

    Messages:
    3
    Likes Received:
    0
    Trophy Points:
    55
    Location:
    India
    There is the only difference between desired output and my output, is 'the N' prefix.

    Its just the TRIM to remove/add 'N' prefix in query.

    Last query will provide desired output.

    Let OP decide submitted solution is helpful or not/ it requires any modification.

    Dhwanil
     
  15. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,351
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    My comment was directed toward reading the original post in order to understand the problem. There was no need to get defensive or rude. Your latest response appears to be both.
     
  16. rajavu

    rajavu Forum Guru

    Messages:
    815
    Likes Received:
    52
    Trophy Points:
    610
    Location:
    @ Bangalore , India
    Dwanil might have intended as below.

    Code (SQL):
    WITH dummy AS
    (
        SELECT 'N123456' AS Id, 30  AS units, 45  AS dollars FROM DUAL UNION
        SELECT '123456'  AS col1, 35  AS col2, 50  AS col3 FROM DUAL UNION
        SELECT '333444'  AS col1, 100 AS col2, 200 AS col3 FROM DUAL UNION
        SELECT '333555'  AS col1, 200 AS col2, 400 AS col3 FROM DUAL UNION
        SELECT 'N333555' AS col1, 30  AS col2, 45  AS col3 FROM DUAL UNION
        SELECT '444555'  AS col1, 20  AS col2, 10  AS col3 FROM DUAL UNION
        SELECT '666777'  AS col1, 50  AS col2, 60  AS col3 FROM DUAL UNION
        SELECT 'N888888'  AS col1, 50  AS col2, 60  AS col3 FROM DUAL UNION
        SELECT '999777'  AS col1, 23  AS col2, 92  AS col3 FROM DUAL UNION
        SELECT 'N999777' AS col1, 23  AS col2, 92  AS col3 FROM DUAL
    )
    SELECT   ID, units, dollars
      FROM (SELECT ID, units, dollars,
                   ROW_NUMBER () OVER (PARTITION BY TRIM (LEADING 'N' FROM ID) ORDER BY ID DESC) rw
              FROM dummy)
     WHERE rw = 1
     
  17. rajavu

    rajavu Forum Guru

    Messages:
    815
    Likes Received:
    52
    Trophy Points:
    610
    Location:
    @ Bangalore , India
    This can be done with LAG() also.

    Code (SQL):
    WITH dummy AS
    (
        SELECT 'N123456' AS Id, 30  AS units, 45  AS dollars FROM DUAL UNION
        SELECT '123456', 35  , 50  FROM DUAL UNION
        SELECT '333444', 100 , 200 FROM DUAL UNION
        SELECT '333555', 200 , 400 FROM DUAL UNION
        SELECT 'N333555',30  , 45  FROM DUAL UNION
        SELECT '444555', 20  , 10  FROM DUAL UNION
        SELECT '666777', 50  , 60  FROM DUAL UNION
        SELECT 'N888888', 50 , 60  FROM DUAL UNION
        SELECT '999777', 23  , 92  FROM DUAL UNION
        SELECT 'N999777' , 23  , 92   FROM DUAL),
    rec AS (    
     SELECT ID, units, dollars,
                   LAG (TRIM (LEADING 'N' FROM ID)) OVER (  partition BY TRIM (LEADING 'N' FROM ID) ORDER BY ID DESC) lagid
              FROM dummy )
    SELECT ID, units, dollars FROM rec WHERE lagid IS NULL