With Group by we can arrange rows in groups and apply aggregate functions against those groups.
USE TSQL2012;
SELECT COUNT(*) AS numorders
FROM Sales.Orders;
This query generates the following output.
numorders
———–
830
Because there is no GROUP BY clause, all rows queried from the Sales.Orders table are arranged in one group, and then the COUNT(*) function counts the number of rows in that group. Grouped queries return one result row per group, and because the query defines only one group, it returns only one row in the result set.
Now Using GROUP BY clause, you can group the rows based on a specified grouping set of expressions.
For example, the following query groups the rows by shipper ID and
counts the number of rows (orders, in this case) per each distinct group.
SELECT shipperid, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY shipperid;
This query generates the following output.
shipperid numorders
———– ———–
1 249
2 326
3 255
The query identifies three groups because there are three distinct shipper IDs.
The grouping set can be made of multiple elements. For example, the following query groups the rows by shipper ID and shipped year.
SELECT shipperid, YEAR(shippeddate) AS shippedyear,
COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY shipperid, YEAR(shippeddate);
shipperid shippedyear numorders
———– ———– ———–
1 2008 79
3 2008 73
1 NULL 4
3 NULL 6
1 2006 36
2 2007 143
2 NULL 11
3 2006 51
1 2007 130
2 2008 116
2 2006 56
3 2007 125
Notice that you get a group for each distinct shipper ID and shipped year combination that exists in the data, even when the shipped year is NULL. Remember that a NULL in the shippeddate column represents unshipped orders, so a NULL in the shippedyear column represents the group of unshipped orders for the respective shipper.
If you need to filter entire groups, you need a filtering option that is evaluated at the group level not like the WHERE clause, which is evaluated at the row level. For this, SQL provides the HAVING clause. Like the WHERE clause, the HAVING clause uses a predicate but evaluates the predicate per group as opposed to per row. This means that you can refer to aggregate computations because the data has already been grouped.
For example, suppose that you need to group only shipped orders by shipper ID and shipping year, and filter only groups having fewer than 100 orders. You can use the following query to achieve this task.
SELECT shipperid, YEAR(shippeddate) AS shippedyear,
COUNT(*) AS numorders
FROM Sales.Orders
WHERE shippeddate IS NOT NULL
GROUP BY shipperid, YEAR(shippeddate)
HAVING COUNT(*) < 100;
This query generates the following output
shipperid shippedyear numorders
———– ———– ———–
1 2008 79
3 2008 73
1 2006 36
3 2006 51
2 2006 56
Notice that the query filters only shipped orders in the WHERE clause. This filter is applied at the row level conceptually before the data is grouped. Next the query groups the data by shipper ID and shipped year. Then the HAVING clause filters only groups that have a count of rows (orders) that is less than 100. Finally, the SELECT clause returns the shipper ID, shipped year, and count of orders per each remaining group.
The following query invokes the COUNT(*) function, in addition to a number of general set functions, including COUNT.
SELECT shipperid,
COUNT(*) AS numorders,
COUNT(shippeddate) AS shippedorders,
MIN(shippeddate) AS firstshipdate,
MAX(shippeddate) AS lastshipdate,
SUM(val) AS totalvalue
FROM Sales.OrderValues
GROUP BY shipperid;
This query generates the following output (dates formatted for readability).
shipperid numorders shippedorders firstshipdate lastshipdate totalvalue
———– ———- ———— ————- ————- ———–
3 255 249 2006-07-15 2008-05-01 383405.53
1 249 245 2006-07-10 2008-05-04 348840.00
2 326 315 2006-07-11 2008-05-06 533547.69
The difference in count(*) and count(shippeddate) is count(*) does not ignore NULL where count(shippeddate) does.
With general set functions, you can work with distinct occurrences by specifying a DISTINCT clause before the expression, as follows.
SELECT shipperid, COUNT(DISTINCT shippeddate) AS numshippingdates
FROM Sales.Orders
GROUP BY shipperid;
This query generates the following output.
shipperid numshippingdates
———– —————–
1 188
2 215
3 198
SELECT shipperid, COUNT(shippeddate) AS numshippingdates
FROM Sales.Orders
GROUP BY shipperid;
This query generates the following output.
shipperid numshippingdates
———– —————–
1 249
2 245
3 315
Note that the DISTINCT option is available not only to the COUNT function, but also to other general set functions. However, it’s more common to use it with COUNT.
SELECT S.shipperid, S.companyname,
COUNT(*) AS numorders
FROM Sales.Shippers AS S
INNER JOIN Sales.Orders AS O
ON S.shipperid = O.shipperid
GROUP BY S.shipperid, S.companyname;
This query generates the following output.
shipperid companyname numorders
———– ————– ———–
1 Shipper GVSUA 249
2 Shipper ETYNR 326
3 Shipper ZHISN 255
Thanks