1. Get rid of all advertisements and get unlimited access to documents by upgrading to Premium Membership. Upgrade to Premium Now and also get a Premium Badge!

Analytic Functions II : The Windowing Clause.

Discussion in 'SQL PL/SQL' started by simply_dba, Oct 21, 2008.

  1. simply_dba

    simply_dba Forum Advisor

    Messages:
    95
    Likes Received:
    5
    Trophy Points:
    140
    Location:
    Kolkata, India
    Analytic Functions II : The Windowing Clause

    First Read Analytic Functions Part I, An Introduction.

    In this part we will have a detailed discussion on the windowing clause of an analytic function.

    As was said earlier that this clause defines a sliding window of data, within the group, on which the analytic function acts upon.For example, the clause 'RANGE UNBOUNDED PRECEDING' acts upon every row from the first row till the current row.

    We can set up data windows based on two criteria: RANGES of data values or ROWS offset from the current row

    Range Windows

    Range windows collect rows togather based on a 'Where' clause. For exmaple, the clause 'range 5 preceding' will generate a window of rows, that are within 5 units of the current row and preceding it. This units may either be numeric comparisions or date comparisions. Any other datatypes are not valid for range comparision.

    an example

    Code (Text):
    Select count(*) over (order by hiredate asc range 50 preceding) from emp;
    This will select all the rows in the partition such that the hiredate is within 50 days of the current row's hiredate.In this case,since the hiredate is sorted in ascending order,the values in the window would consist of all the rows in the current group such that the 'hiredate' was less then the current row's 'hiredate' and within 50 days of it. If the query would have been :

    Code (Text):
    Select count(*) over (order by hiredate desc range 50 preceding) from emp;
    then the rows selected would have 'hiredate' greater then current row's hiredate and within 50 days ot it.

    The following example would make this clearer.

    Code (Text):
    select ename, sal, hiredate, hiredate-50 "50_days_prior",
           first_value(ename) over (order by hiredate asc range 50 preceding) first_ename,
           first_value(hiredate) over (order by hiredate asc range 50 preceding) first_hdate
    from emp
    order by hiredate asc

    ENAME             SAL HIREDATE  50_days_p FIRST_ENAM FIRST_HDA
    ---------- ---------- --------- --------- ---------- ---------
    SMITH             800 17-DEC-80 28-OCT-80 SMITH      17-DEC-80
    ALLEN            1600 20-FEB-81 01-JAN-81 ALLEN      20-FEB-81
    WARD             1250 22-FEB-81 03-JAN-81 ALLEN      20-FEB-81
    JONES            2975 02-APR-81 11-FEB-81 ALLEN      20-FEB-81
    BLAKE            2850 01-MAY-81 12-MAR-81 JONES      02-APR-81
    CLARK            2450 09-JUN-81 20-APR-81 BLAKE      01-MAY-81
    TURNER           1500 08-SEP-81 20-JUL-81 TURNER     08-SEP-81
    MARTIN           1250 28-SEP-81 09-AUG-81 TURNER     08-SEP-81
    KING             5000 17-NOV-81 28-SEP-81 MARTIN     28-SEP-81
    FORD             3000 03-DEC-81 14-OCT-81 KING       17-NOV-81
    JAMES             950 03-DEC-81 14-OCT-81 KING       17-NOV-81
    MILLER           1300 23-JAN-82 04-DEC-81 MILLER     23-JAN-82
    SCOTT            3000 19-APR-87 28-FEB-87 SCOTT      19-APR-87
    ADAMS            1100 23-MAY-87 03-APR-87 SCOTT      19-APR-87

    14 rows selected.
    The function first_value() returns the value of the column provided from the first row in the window.

    In the above query, we sorted the partition by hiredate ascending. We used the function first_value() to select the value of first 'ename' and the value of
    first 'hiredate' in the sorted list.

    Now, if we look at BLAKE, we can see that his hiredate was 01-MAY-81 and 50 days prior to it was 12-MAR-81.The function then defined the window as every row that preceded the row of Blake within the sorted list and where the hiredate was between 01-MAY-81 and 12-MAR-81. The first value for that window was that of Jones (having the earliest possible hire date which is within 50 days of Balke's hire date).

    Now what would happen if we sortred on descending order instead of ascending order ?

    Code (Text):
    select ename, sal, hiredate, hiredate+50 "within_next_50_days",
           first_value(ename) over (order by hiredate desc range 50 preceding) first_ename,
           first_value(hiredate) over (order by hiredate desc range 50 preceding) first_hdate
    from emp
    order by hiredate desc

    ENAME             SAL HIREDATE  within_ne FIRST_ENAM FIRST_HDA
    ---------- ---------- --------- --------- ---------- ---------
    ADAMS            1100 23-MAY-87 12-JUL-87 ADAMS      23-MAY-87
    SCOTT            3000 19-APR-87 08-JUN-87 ADAMS      23-MAY-87
    MILLER           1300 23-JAN-82 14-MAR-82 MILLER     23-JAN-82
    FORD             3000 03-DEC-81 22-JAN-82 FORD       03-DEC-81
    JAMES             950 03-DEC-81 22-JAN-82 FORD       03-DEC-81
    KING             5000 17-NOV-81 06-JAN-82 FORD       03-DEC-81
    MARTIN           1250 28-SEP-81 17-NOV-81 KING       17-NOV-81
    TURNER           1500 08-SEP-81 28-OCT-81 MARTIN     28-SEP-81
    CLARK            2450 09-JUN-81 29-JUL-81 CLARK      09-JUN-81
    BLAKE            2850 01-MAY-81 20-JUN-81 CLARK      09-JUN-81
    JONES            2975 02-APR-81 22-MAY-81 BLAKE      01-MAY-81
    WARD             1250 22-FEB-81 13-APR-81 JONES      02-APR-81
    ALLEN            1600 20-FEB-81 11-APR-81 JONES      02-APR-81
    SMITH             800 17-DEC-80 05-FEB-81 SMITH      17-DEC-80

    14 rows selected.
    If we look at BLAKE again, this time the window is different since thw data partition is sorted differently. BLAKE's window of 'range 50 preceding' is CLARKE since CLARKE's hiredate is the last hiredate preceding BLAKE's record and within 50 days of it.
     
  2. simply_dba

    simply_dba Forum Advisor

    Messages:
    95
    Likes Received:
    5
    Trophy Points:
    140
    Location:
    Kolkata, India
    Row Windows

    These are physical units; physical numbers of rows, to include in the window.

    Code (Text):
    Select count(*) over (order by hiredate ASC ROWS 5 preceding) from emp;
    This window will consist of six rows, the current one and 5 rows in front of this row (where in front of is defined by the ORDER BY clause). With ROW partitions, we do not have the limitations of the RANGE partition ‐ the data may be of any type and the order by may include many columns.

    Let us revisit the previous example once again.

    Code (Text):
    select ename, sal, hiredate,
           first_value(ename) over (order by hiredate asc rows 5 preceding) first_ename,
           first_value(hiredate) over (order by hiredate asc rows 5 preceding) first_hdate
    from emp
    order by hiredate asc


    ENAME             SAL HIREDATE  FIRST_ENAM FIRST_HDA
    ---------- ---------- --------- ---------- ---------
    SMITH             800 17-DEC-80 SMITH      17-DEC-80
    ALLEN            1600 20-FEB-81 SMITH      17-DEC-80
    WARD             1250 22-FEB-81 SMITH      17-DEC-80
    JONES            2975 02-APR-81 SMITH      17-DEC-80
    BLAKE            2850 01-MAY-81 SMITH      17-DEC-80
    CLARK            2450 09-JUN-81 SMITH      17-DEC-80
    TURNER           1500 08-SEP-81 ALLEN      20-FEB-81
    MARTIN           1250 28-SEP-81 WARD       22-FEB-81
    KING             5000 17-NOV-81 JONES      02-APR-81
    JAMES             950 03-DEC-81 BLAKE      01-MAY-81
    FORD             3000 03-DEC-81 CLARK      09-JUN-81
    MILLER           1300 23-JAN-82 TURNER     08-SEP-81
    SCOTT            3000 19-APR-87 MARTIN     28-SEP-81
    ADAMS            1100 23-MAY-87 KING       17-NOV-81

    14 rows selected.
    Looking at TURNER's record, if we count upto 5 upwards we will find ALLEN which is given in TURNER's record as the first_ename.Sorting the group in descending order reverses the window .

    Code (Text):
    select ename, sal, hiredate,
           first_value(ename) over (order by hiredate desc rows 5 preceding) first_ename,
           first_value(hiredate) over (order by hiredate desc rows 5 preceding) first_hdate
    from emp
    order by hiredate desc

    ENAME             SAL HIREDATE  FIRST_ENAM FIRST_HDA
    ---------- ---------- --------- ---------- ---------
    ADAMS            1100 23-MAY-87 ADAMS      23-MAY-87
    SCOTT            3000 19-APR-87 ADAMS      23-MAY-87
    MILLER           1300 23-JAN-82 ADAMS      23-MAY-87
    FORD             3000 03-DEC-81 ADAMS      23-MAY-87
    JAMES             950 03-DEC-81 ADAMS      23-MAY-87
    KING             5000 17-NOV-81 ADAMS      23-MAY-87
    MARTIN           1250 28-SEP-81 SCOTT      19-APR-87
    TURNER           1500 08-SEP-81 MILLER     23-JAN-82
    CLARK            2450 09-JUN-81 FORD       03-DEC-81
    BLAKE            2850 01-MAY-81 JAMES      03-DEC-81
    JONES            2975 02-APR-81 KING       17-NOV-81
    WARD             1250 22-FEB-81 MARTIN     28-SEP-81
    ALLEN            1600 20-FEB-81 TURNER     08-SEP-81
    SMITH             800 17-DEC-80 CLARK      09-JUN-81

    14 rows selected.
    Now MILLER is the first value in the set of 5 rows preceding TURNER's group.


    Now that we understood the working of these windows, let us construct a meaningful query out of this.

    The business requires the average salary of 5 persons joined before and after of joining of a particuler employee.The query would be :

    Code (Text):
    select ename, hiredate, sal,
           avg(sal) over (order by hiredate asc rows 5 preceding) avg_5_before,
           count(*) over (order by hiredate asc rows 5 preceding) nos_before,
           avg(sal) over (order by hiredate desc rows 5 preceding) avg_5_after,
           count(*) over (order by hiredate desc rows 5 preceding) nos_after
    from emp
    order by hiredate

    ENAME      HIREDATE         SAL AVG_5_BEFORE NOS_BEFORE AVG_5_AFTER  NOS_AFTER
    ---------- --------- ---------- ------------ ---------- ----------- ----------
    SMITH      17-DEC-80        800          800          1      1987.5          6
    ALLEN      20-FEB-81       1600         1200          2  2104.16667          6
    WARD       22-FEB-81       1250   1216.66667          3  2045.83333          6
    JONES      02-APR-81       2975      1656.25          4  2670.83333          6
    BLAKE      01-MAY-81       2850         1895          5  2333.33333          6
    CLARK      09-JUN-81       2450       1987.5          6  2358.33333          6
    TURNER     08-SEP-81       1500   2104.16667          6  2166.66667          6
    MARTIN     28-SEP-81       1250   2045.83333          6  2416.66667          6
    KING       17-NOV-81       5000   2670.83333          6  2391.66667          6
    JAMES      03-DEC-81        950   2333.33333          6        1870          5
    FORD       03-DEC-81       3000   2358.33333          6        2100          4
    MILLER     23-JAN-82       1300   2166.66667          6        1800          3
    SCOTT      19-APR-87       3000   2416.66667          6        2050          2
    ADAMS      23-MAY-87       1100   2391.66667          6        1100          1

    14 rows selected.
    An extra count(*) was added to demonstrate how many rows contributed to the making of the average. We can clearly see that only 2 rows made up the average for ALLEN (800+1600/2) hired before him, while 4 persons made up the average for FORD hired after him.


    Now that we have understood the difference between range windows and row windows we can examine the ways we can implement them.

    In it's simplest form the windows are specified in the following three mutually exclusive forms

    UNBOUNDED PRECEDING: The window starts from the first row of the current partition and ends with the current row being processed.
    CURRENT ROW: The window starts and ends with the current row being processed.
    n PRECEDING: where n is any number. The window starts from the row that is n rows in front of the current row for ROWS and starts from the row whoʹs order by value is less then the current row by n for RANGE.
    n FOLLOWING : where n is any number. The window ends (or starts) from the row that is n rows after the current row for ROWS and starts (or ends) from the row who's order by value is more then the current row by n for RANGE

    Note : The CURRENT ROW range would probably never be used in the simple form as it would restrict the analytic function to the single row only, something

    for which an analytic function is not needed. In a more complex form, the window is specified with a BETWEEN clause. There, we might use the CURRENT ROW as either a starting point or ending point of the window.

    That's it for now . In my next thread, i'll take a few examples and form analytic queries to some common requirements.

    Untill then try to work out a analytic query for the following requirement.

    "Give me a set of employees makin top salaries in each department".

    Cheers
    :D