1. Get rid of all advertisements and get unlimited access to documents by upgrading to Premium Membership. Upgrade to Premium Now and also get a Premium Badge!

Limit results after select distinct with join

Discussion in 'SQL PL/SQL' started by Robertjb1, Nov 18, 2014.

  1. Robertjb1

    Robertjb1 Active Member

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    55
    Hi. I have the need to pull a sample of 30 consumer accounts for every product code. The problem that I am running into is that there is a unique identifier, for every account, that gets eliminated when I run my select distinct to get the product codes. The tables are similar to this.

    Table1.UniqueID, Tabel1.ProductKey
    Table2.ProductKey, Table2.ProductCode
    Table2.UniqueID, Table3.AccountNumber

    If I run a select distinct on the first table to get the ProductKey (joined to Table2- to get the product code), I lose the UniqueID, which is the masked customer account unique ID.
    If I run a similar query on Table3, there is no way, that I know of, to guarantee that I will have all product codes represented.

    My end result should return something similar to this:

    Product1 Acct1
    Product1 Acct2
    Product1 Acct3
    Product1 Acct4
    etc...
    Product2 Acct1
    Product2 Acct2
    Product2 Acct3
    Product2 Acct4
    etc...

    I would greatly appreciate any advice on how to handle this.
     
  2. Robertjb1

    Robertjb1 Active Member

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    55
    Clarification: Table1 has a multitude of fields that are all keys to other tables. The fields that I am interested in for this query are the named fields. (UniqueID is the common key to most tables)

    Correction: Table2 structure is Table2.UniqueID, Table2.ProductKey, Table2.ProductCode
    My query without limits, returning over 1Million records, is as follows:
    Select
    2.ProductCode,
    3.AccountNumber
    FROM Table1
    LEFT JOIN Table2 2 ON 2.UniqueID = 1.UniqueID
    INNER JOIN Table3 3 ON 3.UniqueID = 1.UniqueID
    Where 1.system = 'Consumer'

    Thanks
     
  3. ocprep

    ocprep Forum Advisor

    Messages:
    277
    Likes Received:
    76
    Trophy Points:
    410
    Location:
    Orlando, Florida
    What do you mean you are 'losing' the unique identifier. You aren't selecting it. Your desired output doesn't contain it, so how exactly are you planning to not 'lose' it?

    Based on your second post, the UniqueID exists in all three tables you are selecting from. However, if it is truly supposed to be the value to link across all of your tables, it's a little strange that you are using a LEFT JOIN between Table1 and Table2 in your sample query.

    The way your left join is set up, the query will return rows in Table 1 even where the UniqueID matching Table2 does not exist. That would certainly be one definition of 'losing' the value.
     
    Robertjb1 likes this.
  4. Robertjb1

    Robertjb1 Active Member

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    55
    I can definitely stand to be corrected so please let me know if this is faulty logic.

    First of all the ‘correction’ in my second comment is incorrect. Table2 does not have the UniqueID. I apologize for that confusion.

    My universe of data (Table1) is for consumer accounts with a ledger balance > 0, currently over 1million records.
    The UniqueID is the random number linked to the actual individual consumer account number in Table3. Table2 has all Product Codes available, currently over 4800, but not all of these are in Table1 because of the ledger balance limitation on the consumer account. I did an initial select distinct on Table1 to identify the subset of product codes represented in that table so I could then use that list to go back to Table1 again and get 30 UniqueIDs for each. If I had included the UniqueID in the select distinct, wouldn’t that have returned more than just the list of Unique Product Codes? This is where my ‘faulty logic’ comment may come into play! Table2 is joined to Table1 on the ProductKey for the purpose of identifying the actual product code. Table3 is joined to Table1 on the UniqueID for the purpose of identifying the actual consumer account number.
    I’ve been googling for possible examples and have just found this structure that I've altered to hit Table1. This query returns 30 UniqueIDs per ProductKey. I’m confused on where I need to have the JOINS to get the real Product Codes and Account Numbers. I appreciate your time and your questions.

    Code (SQL):
    SELECT *
    FROM
       (  
          SELECT 1.*, ROW_NUMBER() OVER (PARTITION BY ProductKey ORDER BY ProductKey ASC) AS rn
          FROM Table1 1
        )
      WHERE rn < 31
      ORDER BY ProductKey
     
     
  5. zargon

    zargon Community Moderator Forum Guru

    Messages:
    2,347
    Likes Received:
    348
    Trophy Points:
    1,430
    Location:
    Aurora, CO
    Please provide create table statements and sample data. It's impossible for anyone to replicate your work without these items.
     
    Robertjb1 likes this.
  6. ocprep

    ocprep Forum Advisor

    Messages:
    277
    Likes Received:
    76
    Trophy Points:
    410
    Location:
    Orlando, Florida
    >The UniqueID is the random number linked to the actual
    >individual consumer account number in Table3.

    Hopefully you do not really mean the number is random because random doesn't guarantee uniqueness. In the first post you spoke of it being masked. Once again, hopefully the masking doesn't destroy the uniqueness. SSNs for example are often asked as something like '###-##-1234'. The original SSN was unique, the masked value isn't.

    Joining the results of your analytic query to the other two tables could be done as a subquery, but a WITH clause is probably a little clearer. Obviously I can't test the following, but it should be reasonably close to what you need.

    Code (Text):
    WITH Prodlist AS
    (
    SELECT UniqueID, ProductKey
    FROM
       (  
        SELECT UniqueID, ProductKey,
               ROW_NUMBER() OVER (PARTITION BY ProductKey ORDER BY ProductKey ASC) AS rn
          FROM Table1 T1
        )
      WHERE rn < 31
      ORDER BY ProductKey
    )
    SELECT T2.ProductCode, T3.AccountNumber
    FROM   Table2 T2
           INNER JOIN Prodlist P1
           ON T2.ProductKey = P1.ProductKey
           INNER JOIN Table3 T3
           ON P1.UniqueID = T3.UniqueID;
     
    Robertjb1 likes this.
  7. Robertjb1

    Robertjb1 Active Member

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    55
    @Matthew. There is some sort of logical algorithm that associates a 'random' number to each real account number (before I ever see it) so it isn't random in the literal sense of the word. It's more of a logically derived random number-account number association that follows the data associated with each account. The reason for the 'random' UniqueID is for compliance with the "Gramm-Leach-Bliley Act". I implemented your suggestions and it returned exactly what I needed. You're awesome. I'm just a banker trying to get data. Thanks for your advice/suggestions.
     
  8. Robertjb1

    Robertjb1 Active Member

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    55
    Thanks for your reply/comment, David. I was able to resolve the issue with Mathew's suggestions.