Chuck's Code Blog

Assorted solutions and musings

Using OVER() with Aggregate Functions


I was recently asked about a T-SQL function that I hadn’t heard about, so I did some digging around and found out that the OVER() function was added to Microsoft SQL 2005.  It gives you the ability to add aggregate functions to any SELECT (even without a GROUP BY clause) by specifying a partition for each function. The way it works is similar to joining an aggregated copy of a SELECT to itself. For example, consider the following:

SELECT customerID, productID, orderDate, orderAmount
FROM Orders

customerID  productID   orderDate               orderAmount
———– ———– ———————– ———————
1           1           2007-01-01 00:00:00.000 20.00
1           2           2007-01-02 00:00:00.000 30.00
1           2           2007-01-05 00:00:00.000 23.00
1           3           2007-01-04 00:00:00.000 18.00
2           1           2007-01-03 00:00:00.000 74.00
2           1           2007-01-06 00:00:00.000 34.00
2           2           2007-01-08 00:00:00.000 10.00

(7 row(s) affected)

You can now easily return the total orderAmount per customer as an additional column in this SELECT, simply by adding an aggregate SUM() function with an OVER() clause:

SELECT customerID,  productID, orderDate, orderAmount,
      SUM(orderAmount) OVER (Partition by CustomerID) AS Total
FROM Orders

customerID  productID   orderDate               orderAmount   Total
----------- ----------- ----------------------- ------------- ---------
1           1           2007-01-01 00:00:00.000 20.00         91.00
1           2           2007-01-02 00:00:00.000 30.00         91.00
1           2           2007-01-05 00:00:00.000 23.00         91.00
1           3           2007-01-04 00:00:00.000 18.00         91.00
2           1           2007-01-03 00:00:00.000 74.00         118.00
2           1           2007-01-06 00:00:00.000 34.00         118.00
2           2           2007-01-08 00:00:00.000 10.00         118.00

(7 row(s) affected)

The previous SQL is essentially shorthand for:

SELECT
    o.customerID, o.productID, o.orderDate, o.orderAmount, t.Total
FROM
    Orders o
INNER JOIN
   (
    SELECT customerID, SUM(orderAmount) AS Total
    FROM Orders
    GROUP BY customerID
   )
  t ON t.customerID = o.customerID

since the two return the same results.

Note that the total returned using SUM(..) OVER (..) is not the total for the entire table, just for the scope of the SELECT where it is used. For example, if you add a filter to the SELECT to return only rows for ProductID 2, the totals will reflect that criteria as well:

SELECT customerID,  productID, orderDate, orderAmount,
      SUM(orderAmount) OVER (Partition by CustomerID) AS Total
FROM Orders
WHERE productID = 2

customerID  productID   orderDate               orderAmount   Total
----------- ----------- ----------------------- ------------  ------------
1           2           2007-01-02 00:00:00.000 30.00         53.00
1           2           2007-01-05 00:00:00.000 23.00         53.00
2           2           2007-01-08 00:00:00.000 10.00         10.00

(3 row(s) affected)

That is a nice advantage over the old way of linking to a derived table, since in that case you’d need to repeat the criteria for both the primary (outer) SELECT and also the derived table.

Typically, SUM(..) OVER(..) is most useful for calculating a percentage of a total for each row. For example, for each Order we can calculate the percentage of that order’s orderAmount compared to the customer’s total orderAmount:

SELECT customerID,  productID, orderDate, orderAmount,
       orderAmount / SUM(orderAmount) OVER (Partition by CustomerID) AS Pct
FROM Orders

customerID  productID   orderDate               orderAmount   Pct
----------- ----------- ----------------------- ------------  -------
1           1           2007-01-01 00:00:00.000 20.00         0.2197
1           2           2007-01-02 00:00:00.000 30.00         0.3296
1           2           2007-01-05 00:00:00.000 23.00         0.2527
1           3           2007-01-04 00:00:00.000 18.00         0.1978
2           1           2007-01-03 00:00:00.000 74.00         0.6271
2           1           2007-01-06 00:00:00.000 34.00         0.2881
2           2           2007-01-08 00:00:00.000 10.00         0.0847

(7 row(s) affected)

Of course, be sure that you don’t encounter any divide by zero errors by using a CASE if necessary.

While I’ve made many references to using the SUM() function, of course this technique works with any of the other aggregate functions as well, such as MIN() or AVG(). For example, you could return only Orders where the orderAmount is below the average for the product that was ordered using this:

SELECT x.*
FROM
(
	SELECT customerId, productID, orderDate, orderAmount,
		AVG(orderAmount) OVER (partition BY productID) AS ProductAvg
	FROM orders
) x
WHERE x.orderAmount < x.productAvg

customerId  productID   orderDate               orderAmount   ProductAvg
----------- ----------- ----------------------- ------------- -----------
1           1           2007-01-01 00:00:00.000 20.00         42.6666
2           1           2007-01-06 00:00:00.000 34.00         42.6666
2           2           2007-01-08 00:00:00.000 10.00         21.00

(3 row(s) affected)

It is my understanding that some SQL implementations allow you to use SUM(..) OVER (..) to calculate running totals for a SELECT, but unfortunately that does not appear to be possible using SQL Server 2005. However, there are other ways to accomplish this in T-SQL if you really need to; my general recommendation is to do this at your presentation layer if those totals are not needed for further processing at the database.

Advertisements

January 6, 2012 Posted by | Uncategorized | , , | Leave a comment

New Twitter “Tweet Button”

New Twitter “Tweet Button” For those of you who have been dreaming of an easier way for your readers to share your posts on Twitter, that day has come. WordPress has added an official Tweet Button as an option for all WordPress.com blogs.  How it works: When one of your readers hits the Tweet Button, they will be shown a popup that includes a shortened link to your post. Readers can add in a quick message, and then hit “Tweet” to send the post to their … Read More

August 18, 2010 Posted by | Uncategorized | Leave a comment