1. Get rid of all advertisements and get unlimited access to documents by upgrading to Premium Membership. Upgrade to Premium Now and also get a Premium Badge!

Analytic Functions III : Conclusion

Discussion in 'SQL PL/SQL' started by simply_dba, Oct 30, 2008.

  1. simply_dba

    simply_dba Forum Advisor

    Messages:
    95
    Likes Received:
    5
    Trophy Points:
    140
    Location:
    Kolkata, India
    Analytic Functions I
    Analytic Functions II

    After having understood the structure of analytic functions, we will discuss a few examples in this part to show what we can do with this functionality.

    Let us start with an answer to the question I asked on the last thread.

    "Give me a set of employees makin top salaries in each department".

    We will use a set of 2.

    Code (Text):
    break on deptno skip 2

    select * from (select deptno, ename, sal,
           dense_rank() over (partition by deptno order by sal desc) dr
           from emp)
    where dr <= 2
    order by deptno, sal desc

        DEPTNO ENAME             SAL         DR
    ---------- ---------- ---------- ----------
            10 KING             5000          1
               CLARK            2450          2


            20 SCOTT            3000          1
               FORD             3000          1
               JONES            2975          2


            30 BLAKE            2850          1
               ALLEN            1600          2



    7 rows selected.
    Here the DENSE_RANK() function was used to get the top two salaries. We assigned the dense rank to the salary column and sorted it in a descending order. If you recall from part I, a dense rank does not skip numbers and will assign the same number to those rows with the same value. Hence, after the resultset is built in the inline view, we can simply select all of the rows with a dense rank of three or less, this gives us everyone who makes the top two salaries by department number. Just to show what would happen if we tried to use RANK and encountered these duplicate values:

    Code (Text):
    select deptno, ename, sal,
           dense_rank() over (partition by deptno order by sal desc) dr,
           rank() over (partition by deptno order by sal desc) r
    from emp
    order by deptno, sal desc

        DEPTNO ENAME             SAL         DR          R
    ---------- ---------- ---------- ---------- ----------
            10 KING             5000          1          1
               CLARK            2450          2          2
               MILLER           1300          3          3

            20 SCOTT            3000          1          1
               FORD             3000          1          1
               JONES            2975          2          3
               ADAMS            1100          3          4
               SMITH             800          4          5

            30 BLAKE            2850          1          1
               ALLEN            1600          2          2
               TURNER           1500          3          3
               MARTIN           1250          4          4
               WARD             1250          4          4
               JAMES             950          5          6


    14 rows selected.
    If we had used RANK, it would have left JONES (because he is ranked at 3) out of the resultset but JONES is one of the people in department 20 making the top 2 salaries belongs in the result. In this case, using RANK over DENSE_RANK would not have answered our specific query. Lastly, we had to use an inline view and alias the dense rank column to DR. This is because we cannot use analytic functions in a WHERE or HAVING clause directly so had to compute the view and then filter just the rows we wanted to keep.
     
  2. simply_dba

    simply_dba Forum Advisor

    Messages:
    95
    Likes Received:
    5
    Trophy Points:
    140
    Location:
    Kolkata, India
    Pivot Queries​


    A Pivot query is a query which transposes rows into columns.

    For example, if we want to see the distinct jobs of each department, then the query would be :

    Code (Text):
    select distinct deptno, job from emp order by deptno;

        DEPTNO JOB
    ---------- ---------
            10 CLERK
            10 MANAGER
            10 PRESIDENT
            20 ANALYST
            20 CLERK
            20 MANAGER
            30 CLERK
            30 MANAGER
            30 SALESMAN

    9 rows selected.
    But we want the output to look like as below

    Let us first analyse the output. As we can see, the query needs to return exactly one row per department and will have four columns one for DEPTNO and the other three for three distinct jobs.We need to fix the number of columns here (that is, number of rows before transposition) before we began to construct the query.

    The query would be :

    Code (Text):
    select  deptno,
        max(decode(seq,1,job,null)) job_1,
        max(decode(seq,2,job,null)) job_2,
        max(decode(seq,3,job,null)) job_3
    from
        (Select  deptno,job,dense_rank() over (partition by deptno order by job nulls last) seq
         from emp)
    group by deptno;

        DEPTNO JOB_1     JOB_2     JOB_3
    ---------- --------- --------- ---------
            10 CLERK     MANAGER   PRESIDENT
            20 ANALYST   CLERK     MANAGER
            30 CLERK     MANAGER   SALESMAN
    This simply created an inner resultset that had a sequence assigned to jobs by department number in order of the jobs. The decode in the outer query keeps only rows with sequences 1,2,or 3 and assigns them to the correct column. The group by gets rid of the redundant rows and we are left with our collapsed result.To see it's effect let us remove the group by and max from the query and see what happens.

    Code (Text):
    select  deptno,
        decode(seq,1,job,null) job_1,
        decode(seq,2,job,null) job_2,
        decode(seq,3,job,null) job_3
    from
        (Select  deptno,job,dense_rank() over (partition by deptno order by job nulls last) seq
        from emp)


        DEPTNO JOB_1     JOB_2     JOB_3
    ---------- --------- --------- ---------
            10 CLERK
            10           MANAGER
            10                     PRESIDENT
            20 ANALYST
            20 ANALYST
            20           CLERK
            20           CLERK
            20                     MANAGER
            30 CLERK
            30           MANAGER
            30                     SALESMAN
            30                     SALESMAN
            30                     SALESMAN
            30                     SALESMAN

    14 rows selected.
    Also, again, note the use of the function dense_rank(). It was used to assign same sequence number to same jobs in the same dept do that redundant jobs would get eliminated and we would be left with distinct jobs in each dept. If we would have used row_number() instead, then this would not have been possible.

    Code (Text):

    select  deptno,
        max(decode(seq,1,job,null)) job_1,
        max(decode(seq,2,job,null)) job_2,
        max(decode(seq,3,job,null)) job_3
    from
        (Select  deptno,job,dense_rank() over (partition by deptno order by job nulls last) seq
         from emp)
    group by deptno;

        DEPTNO JOB_1     JOB_2     JOB_3
    ---------- --------- --------- ---------
            10 CLERK     MANAGER   PRESIDENT
            20 [COLOR="DarkRed"]ANALYST   ANALYST   [/COLOR]CLERK
            30 CLERK     MANAGER   SALESMAN
    To understand this deeply let us use just the subquery and see the sequence numbers assigned

    Code (Text):

    Select  deptno,job,dense_rank() over (partition by deptno order by job nulls last) seq
    from emp


        DEPTNO JOB              SEQ
    ---------- --------- ----------
            10 CLERK              1
            10 MANAGER            2
            10 PRESIDENT          3
            20 ANALYST            1
            20 ANALYST            1
            20 CLERK              2
            20 CLERK              2
            20 MANAGER            3
            30 CLERK              1
            30 MANAGER            2
            30 SALESMAN           3
            30 SALESMAN           3
            30 SALESMAN           3
            30 SALESMAN           3

    14 rows selected.


    Select  deptno,job,row_number() over (partition by deptno order by job nulls last) seq
    from emp

        DEPTNO JOB              SEQ
    ---------- --------- ----------
            10 CLERK              1
            10 MANAGER            2
            10 PRESIDENT          3
            20 ANALYST            1
            20 ANALYST            2
            20 CLERK              3
            20 CLERK              4
            20 MANAGER            5
            30 CLERK              1
            30 MANAGER            2
            30 SALESMAN           3
            30 SALESMAN           4
            30 SALESMAN           5
            30 SALESMAN           6

    14 rows selected.
    With this, I think no explanatioin is needed for the power of dense_rank() function.
     
  3. simply_dba

    simply_dba Forum Advisor

    Messages:
    95
    Likes Received:
    5
    Trophy Points:
    140
    Location:
    Kolkata, India
    Manipulation Around the Current Row (Use of lag and lead)​


    A typical business requirement would be :
    "Find out, for each employee, the number of days between his hiredate and the last hire , the number of days betwen his hiredate and the next hire."
    Now, using convetional sql this would be too difficult, if not impossible. First you have to find out, for each employee, the employee who was hired just before and calculate the difference days, then you have to find out the employee eho was hired next and calculate the difference days.

    Now with analytics we can achieve this very simply by using lag() and lead() functions.Before answering the query, let us understand these functions.

    Lag(Arg1,Arg2,Arg3) Lead(Arg1,Arg2,Arg3)

    Arg1 is the expression to be returned from the other row

    Arg2 is the offset into the partition from the current row you wish to retrieve. This is a positive integer offset from the current row. In the case of LAG, it is an index back into preceding rows. In the case of LEAD, it is an index forward into the rows to come. The default value for this argument is 1.

    Arg3 is what to return by default when the index supplied by Arg2 goes out of the window. For example, the first row in every partition does not have a preceding row so LAG(..., 1 ) for that row will not be defined. You can allow it to return NULL by default or supply a value. It should be noted that windows are not used with LAG and LEAD ‐ you may only use the PARTITION BY and ORDER BY, but not ROWS or RANGE.

    Now let us construct the query:

    Code (Text):
    Break on deptno skip 2

    select deptno, ename, hiredate,
          lag( hiredate, 1, null ) over ( partition by deptno order by hiredate, ename ) last_hire,
          hiredate ‐ lag( hiredate, 1, null ) over ( partition by deptno order by hiredate, ename ) days_last,
          lead( hiredate, 1, null ) over ( partition by deptno order by hiredate, ename ) next_hire,
          lead( hiredate, 1, null ) over ( partition by deptno order by hiredate, ename ) ‐ hiredate days_next
    from emp

        DEPTNO ENAME      HIREDATE  LAST_HIRE  DAYS_LAST NEXT_HIRE  DAYS_NEXT
    ---------- ---------- --------- --------- ---------- --------- ----------
            10 CLARK      09-JUN-81                      17-NOV-81        161
               KING       17-NOV-81 09-JUN-81        161 23-JAN-82         67
               MILLER     23-JAN-82 17-NOV-81         67


            20 SMITH      17-DEC-80                      02-APR-81        106
               JONES      02-APR-81 17-DEC-80        106 03-DEC-81        245
               FORD       03-DEC-81 02-APR-81        245 19-APR-87       1963
               SCOTT      19-APR-87 03-DEC-81       1963 23-MAY-87         34
               ADAMS      23-MAY-87 19-APR-87         34


            30 ALLEN      20-FEB-81                      22-FEB-81          2
               WARD       22-FEB-81 20-FEB-81          2 01-MAY-81         68
               BLAKE      01-MAY-81 22-FEB-81         68 08-SEP-81        130
               TURNER     08-SEP-81 01-MAY-81        130 28-SEP-81         20
               MARTIN     28-SEP-81 08-SEP-81         20 03-DEC-81         66
               JAMES      03-DEC-81 28-SEP-81         66



    14 rows selected.
    Limitations​

    Altough analytic functions are generally usefull, however, they come with the following limitations:

    Cannot use directly in pl/sql
    The SQL parser used by PL/SQL does not understand this syntax as yet. To get around this, use dynamically opened refcursor (i.e, use single quotes for the query in the construct (open :x for '<analytic query>')


    Cannot use in Where Clause
    It should be noted that analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. What this means is that we cannot use an analytic function directly in a predicate ‐ you cannot use where or having clause on them. Rather we will have to use an inline view if we need to select from a resultset based on the outcome of an analytic function. Analytic functions can appear only in the select list or ORDER BY clause of a query.

    NULLS and Sorting
    NULLS can affect the outcome of the analytic functions ‐ especially when you use a descending sort. By default, NULLS are greater than any other value.

    Conclusion​

    I hope the series of threads have given you a clear understanding of the analytic functions.The ease of their use and the power of their construct really make them a valuable component of the system

    Cheers
    :D