Objective The article explains the Aggregate functions in SQL. 1. Introduction Apart from the single row functions like CONCAT, SUBSTR, INSTR, NVL and others, Oracle provides Aggregate functions also. Unlike single Row SQL functions, Aggregate functions operate on GROUPS. They are not data manipulation functions but derive and produce additional information from the selected columns. Since they operate on GROUPS, GROUP BY clause is a mandatory clause in the SELECT statement for group formation. If they operate on whole result set, GROUP BY clause can be ignored. Syntax: Code (Text): GROUP_BY_FUNCTION ([DISTINCT | ALL] COLUMN_NAME) Illustrations For instance, maximum salary has to be found against each job id working in a department. This implies that first all departments must be grouped. Within each group, JOB IDs of that department must be grouped, and then aggregate function would operate on these subgroups to get the maximum salary. Query for the above figure looks like: Code (SQL): SELECT DEPARTMENT_ID, JOB_ID, MAX(SALARY) FROM EMPLOYEE GROUP BY DEPARTMENT_ID, JOB_ID 2. Use of DISTINCT, ALL keywords with Aggregate functions DISTINCT ignores duplicate values. Aggregate function uses only unique values of the column. ALL is the default specification. Aggregate function includes all values of a not null columns . 3. Types of Group Functions Below is the list of GROUP BY functions. Each function takes an argument and returns the single output for each row. COUNT – Counts the number of records. SUM – Sum of definite column value. AVG – Average of specified column value. MAX, MIN – To find maximum and minimum values of a column. STDDEV, VARIANCE – Calculates standard deviation and variance of a column value. Note that except COUNT(*), all other aggregate functions ignore NULL values. 3.1. AVG AVG function calculates the average of the specified column value. It works with number data type only and returns numeric output. Using with DISTINCT identifier, it returns the average of only the unique column values. Syntax Code (Text): AVG (DISTINCT/ALL column name) Example a : Below query calculates the average salary of all the employees in the organization Code (SQL): SQL> SELECT AVG (SALARY) FROM EMPLOYEE; AVG (SAL -------- 5763.32 Note that since result set is not grouped, AVG is calculated on full organization. 3.2. SUM This function sums up the numeric column values of the column based group. It operates on number data ad returns numeric output. Using with DISTINCT identifier, it returns the sum of only the unique column values. Syntax SUM (DISTINCT/ALL column name) Example a: Below query calculates sum of salary for each job id. Code (SQL): SQL> SELECT JOB_ID, SUM (SALARY) FROM EMPLOYEE GROUP BY JOB_ID JOB_ID SUM(SALARY) -------- ------------------- MGR 12000 DEV 23500 HR 14000 ADM 17000 3.3. MAX, MIN Maximum and minimum functions are used to find the maximum and minimum values of the column. It takes one mandatory parameter, which is a column of any SQL supported data type i.e. number, character and date. Syntax: Code (Text): MAX(DISTINCT/ALL Column name) MIN(DISTINCT/ALL Column name) Example a: Below query lists the maximum salary drawn under each job category in the company Code (SQL): SELECT JOB_ID, MAX(SALARY) FROM EMPLOYEE GROUP BY JOB_ID JOB_ID MAX(SALARY) -------- ------------------- MGR 8320 DEV 12300 HR 7150 ADM 8270 Example b: Below query returns the oldest employee in each department. Code (SQL): SELECT DEPARTMENT_ID, MIN(HIRE_DATE) FROM EMPLOYEE GROUP BY DEPARTMENT_ID 3.4. COUNT Count function counts the number of rows in the result set of SELECT query. It returns zero, if no records are found in the result set. Example a: Below query counts the employees in each department Code (SQL): SQL> SELECT DEPARTMENT_ID, COUNT (*) FROM EMPLOYEE GROUP BY DEPARTMENT_ID Example b: Below query counts the employees in whole organization, whose salary exceeds 5000 Code (SQL): SQL> SELECT COUNT(EMPLOYEE_ID) FROM EMPLOYEE WHERE SAL > 5000 4. Handling NULL values in GROUP BY functions GROUP BY functions work with actual values of the column only i.e. ignoring the NULL values (COUNT(*) is exception here). But this feature creates major difference in terms of calculations, end analytic results. But to avoid their ignoring, handling of NULL values in the columns is necessary because ignoring of NULL values might produce incorrect results. For instance, refer the below figure and queries Above figure shows the difference created in the Average salary of the employees due to NULL value. In the first screen shot, four salary for employees add up to 13785. But since average ignored NULL values column, it calculated average of only 3 employees as 4595, which is wrong. Next screen shot shows the handling of NULL value. It has bee replaced by zero using NULL handling function. Now, total salary gets distirbuted in 4 employees 5. Nesting of Group functions Multiple group by functions can be used together to produce meaningful results. Suppose, you require maximum average across all the departments or the department with maximum employees, in such cases comes the application of nested group function. Closely observing the requirement, it says Department with Maximum Count of employee. Code (SQL): SQL>SELECT MAX(COUNT(EMPNO)) FROM EMP GROUP BY DEPARTMENT_ID Notes: Group by function must be specified in the query. Multiple single-row functions can be nested together, but group functions can be nested up to two levels only. The nested function call COUNT(SUM(AVG( X))) returns the error, “ORA-00935: group function is nested too deeply.” However, using nested single-row functions within group functions works fine. Consider the following query: Code (SQL): SELECT SUM(AVG(LENGTH(LAST_NAME))) FROM EMPLOYEES GROUP BY DEPARTMENT_ID It calculates the sum of the average length of LAST_NAME values per department. References