Chapter 7.第7章. Beyond the fundamentals of querying超越查询的基本原理

This chapter covers querying topics that go beyond the fundamentals and is provided as optional reading. 本章涵盖了超出基础知识的查询主题,并作为可选阅读提供。It starts with the powerful window functions, which you can use to apply data-analysis calculations in a flexible and efficient manner. 它从强大的窗口函数开始,您可以使用这些函数以灵活高效的方式应用数据分析计算。The chapter continues with techniques for pivoting and unpivoting data. 本章继续介绍数据透视和反透视技术。Pivoting rotates data from a state of rows to columns, and unpivoting rotates data from columns to rows, similar to pivot tables in Excel. 数据透视将数据从一个行状态旋转到另一个列,取消透视将数据从一列旋转到另一行,这与Excel中的数据透视表类似。The chapter finishes with a discussion about grouping sets, which are the sets of expressions that you group the data by. 本章以关于分组集的讨论结束,分组集是对数据进行分组的表达式集。It covers techniques for defining multiple grouping sets in the same query.它涵盖了在同一查询中定义多个分组集的技术。

What's common to the features covered in this chapter, beyond being more advanced than other features in this book, is that they're mostly used for analytical purposes. 本章介绍的功能除了比本书中的其他功能更先进之外,还有一个共同点,那就是它们主要用于分析目的。If you need more details about these features beyond what's covered in this chapter, you can find them in the books: T-SQL Querying (Microsoft Press, 2015) and Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions (Microsoft Press, 2012).如果您需要有关这些功能的更多详细信息,请参阅以下书籍:《T-SQL查询》(Microsoft出版社,2015)和《Microsoft SQL Server 2012使用窗口函数的高性能T-SQL》(Microsoft出版社,2012)。

Window functions窗口函数

A window function is a function that, for each row, computes a scalar result value based on a calculation against a subset of the rows from the underlying query. 窗口函数是一个函数,它基于对基础查询中的行子集的计算,为每一行计算标量结果值。The subset of rows is known as a window and is based on a window descriptor that relates to the current row. 行的子集称为“窗口”,它基于与当前行相关的窗口描述符。The syntax for window functions uses a clause called OVER, in which you provide the window specification.窗口函数的语法使用了一个名为OVER的子句,您可以在其中提供窗口规范。

If this sounds too technical, simply think of the need to perform a calculation against a set and return a single value. 如果这听起来太技术化,只需考虑对集合执行计算并返回单个值的需要。A classic example is aggregate calculations—such as SUM, COUNT, and AVG—but there are others as well, such as ranking and offset functions. 一个典型的例子是聚合计算,比如SUMCOUNTAVG,但也有其他的,比如排名函数和偏移函数。You're familiar already with a couple of ways to apply aggregate calculations—one is by using grouped queries, and another is by using subqueries. 您已经熟悉了应用聚合计算的两种方法,一种是使用分组查询,另一种是使用子查询。However, both options have shortcomings that window functions elegantly resolve.然而,这两个选项都有窗口功能优雅解决的缺点。


Image Note

If the upcoming explanations about the design of window functions seem a bit overwhelming, bear with me until I show code samples, which should help clarify things. 如果即将到来的关于窗口函数设计的解释似乎有点让人不知所措,请耐心等待,直到我展示代码示例,这将有助于澄清问题。After you see a couple of code samples, you might want to reread the next several paragraphs.在看到两个代码示例后,您可能需要重新阅读接下来的几段。


Grouped queries do provide insights into new information in the form of aggregates, but they also cause you to lose something—the detail. 分组查询确实以聚合的形式提供对新信息的洞察,但它们也会导致您丢失一些细节。After you group the rows, all computations in the query have to be done in the context of the defined groups. 将行分组后,查询中的所有计算都必须在定义的组的上下文中完成。Often, you need to perform calculations that involve both detail and aggregate elements. 通常,您需要执行同时涉及细节和聚合元素的计算。Window functions are not limited in the same way. 窗口功能不受相同方式的限制。A window function is evaluated per detailed row, and it's applied to a subset of rows that is derived from the underlying query result set. 窗口函数按详细的行计算,并应用于从基础查询结果集派生的行子集。The result of the window function is a scalar value, which is added as another column to the query result. 窗口函数的结果是一个标量值,它作为另一列添加到查询结果中。In other words, unlike grouped functions, window functions don't cause you to lose the detail. 换句话说,与分组函数不同,窗口函数不会导致丢失细节。For example, suppose you want to query order values and return the current order value and the percent it constitutes out of the customer total. 例如,假设要查询订单值,并返回当前订单值及其占客户总数的百分比。If you group by the customer, you can get only the customer total. 如果按客户分组,则只能得到客户总数。With a window function, you can return the customer total in addition to the detail order value, and you can even compute the percent of the current order value out of the customer total. 通过一个窗口函数,您可以在返回详细订单值的同时返回客户总数,甚至可以计算当前订单值占客户总数的百分比。I'll demonstrate the code to achieve this later in the chapter.我将在本章后面演示实现这一点的代码。

As for subqueries, you can use them to apply a scalar aggregate calculation against a set, but their starting point is a fresh view of the data rather than the underlying query result set. 至于子查询,可以使用它们对集合应用标量聚合计算,但它们的起点是数据的新视图,而不是底层查询结果集。Suppose the underlying query has table operators, filters, and other query elements; those do not affect what a subquery sees as its starting point. 假设基础查询具有表运算符、筛选器和其他查询元素;这些不会影响子查询的起点。If you need the subquery to apply to the underlying query result set as its starting point, you need to repeat all the underlying query logic in the subquery. 如果需要将子查询应用于基础查询结果集作为其起点,则需要重复子查询中的所有基础查询逻辑。In contrast, a window function is applied to a subset of rows from the underlying query's result set—not a fresh view of the data. 相反,窗口函数应用于基础查询结果集中的行子集,而不是数据的新视图。Therefore, anything you add to the underlying query is automatically applicable to all window functions used in the query. 因此,添加到基础查询中的任何内容都会自动适用于查询中使用的所有窗口函数。If you want, you can further restrict the window.如果需要,可以进一步限制窗口。

Another benefit of using window functions is that you gain the ability to define order, when applicable, as part of the specification of the calculation. 使用窗口函数的另一个好处是,您可以在适用时将顺序定义为计算规范的一部分。This does not conflict with relational aspects of the result. 这与结果的相关方面并不冲突。That is, order is defined for the calculation and not confused with presentation ordering. 也就是说,顺序是为计算定义的,不会与表示顺序混淆。The ordering specification for the window function, if applicable, is different from the ordering specification for presentation. 窗口功能的订购规范(如适用)不同于演示文稿的订购规范。If you don't include a presentation ORDER BY clause, you have no assurances that the result will be returned in a particular order. 如果没有包含ORDER BY子句,则无法保证结果将按特定顺序返回。If you do decide to force a certain presentation ordering, the resulting ordering can be different than the ordering for the window function.如果您确实决定强制执行某个演示文稿排序,则生成的排序可能与窗口函数的排序不同。

Following is an example of a query against the Sales.EmpOrders view in the TSQLV4 database that uses a window aggregate function to compute the running-total values for each employee and month:以下是对TSQLV4数据库中Sales.EmpOrders视图的查询示例,该视图使用窗口聚合函数计算每个员工和月份的运行总值:

USE TSQLV4;

SELECT empid, ordermonth, val,
  SUM(val) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING
                         AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;

Here's the output of this query, shown in abbreviated form:以下是该查询的输出,以缩写形式显示:

empid  ordermonth  val      runval
------ ----------- -------- ----------
1      2014-07-01  1614.88  1614.88
1      2014-08-01  5555.90  7170.78
1      2014-09-01  6651.00  13821.78
1      2014-10-01  3933.18  17754.96
1      2014-11-01  9562.65  27317.61
...
2      2014-07-01  1176.00  1176.00
2      2014-08-01  1814.00  2990.00
2      2014-09-01  2950.80  5940.80
2      2014-10-01  5164.00  11104.80
2      2014-11-01  4614.58  15719.38
...

(192 row(s) affected)

There are up to three parts in the definition of a window function, which you specify in a clause called OVER: the window-partition clause, window-order clause, and window-frame clause. 窗口函数的定义中最多有三个部分,您可以在名为OVER的子句中指定它们:窗口分区子句、窗口顺序子句和窗口框架子句。An empty OVER() clause represents the entire underlying query's result set. 一个空OVER()子句表示整个基础查询的结果集。Then anything you add to the window specification essentially restricts the window.那么,添加到窗口规范中的任何内容实际上都会限制窗口。

The window-partition clause (PARTITION BY) restricts the window to the subset of rows that have the same values in the partitioning columns as in the current row. 窗口分区子句(PARTITION BY)将窗口限制为分区列中的值与当前行中的值相同的行子集。In the last query, the window is partitioned by empid. 在最后一个查询中,窗口按empid进行分区。For an underlying row with employee ID 1, the window exposed to the function filters only the rows where the employee ID is 1.对于员工ID为1的基础行,函数公开的窗口仅筛选员工ID为1的行。

The window-order clause (ORDER BY) defines ordering, but don't confuse this with presentation ordering. 窗口顺序子句(ORDER BY)定义了顺序,但不要将其与表示顺序混淆。In a window aggregate function, window ordering supports a frame specification. 在窗口聚合函数中,窗口排序支持框架规范。In a window ranking function, window ordering gives meaning to the rank. 在窗口排名函数中,窗口排序赋予排名意义。In our example, the window ordering is based on ordermonth.在示例中,窗口排序基于ordermonth

A window-frame clause (ROWS BETWEEN <top delimiter> AND <bottom delimiter>) filters a frame, or a subset, of rows from the window partition between the two specified delimiters. 窗框子句(ROWS BETWEEN <top delimiter> AND <bottom delimiter>)在两个指定的分隔符之间筛选窗口分区中的一个框或行的子集。In this example, the frame is defined with no low boundary point (UNBOUNDED PRECEDING) and extends until the current row (CURRENT ROW). 在本例中,框架的定义没有低边界点(UNBOUNDED PRECEDING),并一直延伸到当前行(CURRENT ROW)。In addition to the window-frame unit ROWS, there's another unit called RANGE, but it's implemented in T-SQL in a limited capacity.除了窗框单元ROWS之外,还有另一个称为RANGE的单元,但它在T-SQL中以有限的容量实现。

Putting all these together, you get the running-total values for each employee and month from the function in the example.将所有这些放在一起,您可以从示例中的函数中获得每个员工和月份的运行总值。

Note that because the starting point of a window function is the underlying query's result set, and the underlying query's result set is generated only when you reach the SELECT phase, window functions are allowed only in the SELECT and ORDER BY clauses of a query. 请注意,由于窗口函数的起点是基础查询的结果集,并且基础查询的结果集仅在到达SELECT阶段时生成,因此窗口函数仅在查询的SELECTORDER BY子句中被允许。Mostly, you'll use window functions in the SELECT clause. 大多数情况下,您将在SELECT子句中使用窗口函数。If you need to refer to a window function in an earlier logical-query processing phase (such as WHERE), you need to use a table expression. 如果需要在早期的逻辑查询处理阶段(例如WHERE)中引用窗口函数,则需要使用表表达式。You specify the window function in the SELECT list of the inner query and assign it with an alias. 在内部查询的SELECT列表中指定窗口函数,并为其分配别名。Then, in the outer query, you can refer to that alias anywhere you like.然后,在外部查询中,您可以在任何地方引用该别名。

The windowing concept can take some getting used to, but when you're comfortable with it, you'll realize it's immensely powerful. 窗口的概念可能需要一些时间来适应,但当你对它感到满意时,你会意识到它非常强大。Beyond being used for the obvious data-analysis calculations, window functions can be used to perform a variety of tasks, typically more elegantly and more efficiently than with alternative methods.除了用于明显的数据分析计算之外,窗口函数还可用于执行各种任务,通常比其他方法更优雅、更高效。

The following sections cover ranking, offset, and aggregate window functions. 以下部分介绍排序、偏移和聚合窗口函数。Because this book is about fundamentals, I will not get into certain topics here. 因为这本书是关于基础知识的,所以我不会在这里讨论某些主题。Those topics include the optimization of window functions, distribution functions, and the RANGE window frame unit. 这些主题包括优化窗口函数、分布函数和RANGE窗框单元。For details about those items, see my book on the topic, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions (Microsoft Press, 2012).有关这些项目的详细信息,请参阅我的书《使用窗口函数的Microsoft SQL Server 2012高性能T-SQL》(Microsoft 出版社,2012)。

Ranking window functions排名窗口函数

You use ranking window functions to rank each row with respect to others in the window. 您可以使用排名窗口函数,根据窗口中的其他行对每一行进行排名。T-SQL supports four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. T-SQL支持四个排名函数:ROW_NUMBERRANKDENSE_RANKNTILEThe following query demonstrates the use of these functions:以下查询演示了这些函数的使用:

SELECT orderid, custid, val,
  ROW_NUMBER() OVER(ORDER BY val) AS rownum,
  RANK()       OVER(ORDER BY val) AS rank,
  DENSE_RANK() OVER(ORDER BY val) AS dense_rank,
  NTILE(100)   OVER(ORDER BY val) AS ntile
FROM Sales.OrderValues
ORDER BY val;

This query generates the following output, shown here in abbreviated form:此查询生成以下输出,以缩写形式显示:

orderid     custid      val       rownum  rank    dense_rank ntile
----------- ----------- --------- ------- ------- ---------- -----
10782       12          12.50     1       1       1          1
10807       27          18.40     2       2       2          1
10586       66          23.80     3       3       3          1
10767       76          28.00     4       4       4          1
10898       54          30.00     5       5       5          1
10900       88          33.75     6       6       6          1
10883       48          36.00     7       7       7          1
11051       41          36.00     8       7       7          1
10815       71          40.00     9       9       8          1
10674       38          45.00     10      10      9          1
...
10691       63          10164.80  821     821     786        10
10540       63          10191.70  822     822     787        10
10479       65          10495.60  823     823     788        10
10897       37          10835.24  824     824     789        10
10817       39          10952.85  825     825     790        10
10417       73          11188.40  826     826     791        10
10889       65          11380.00  827     827     792        10
11030       71          12615.05  828     828     793        10
10981       34          15810.00  829     829     794        10
10865       63          16387.50  830     830     795        10

(830 row(s) affected)

The ROW_NUMBER function assigns incremental sequential integers to the rows in the query result based on the mandatory window ordering. ROW_NUMBER函数根据强制窗口顺序为查询结果中的行分配递增顺序整数。In the sample query, the ordering is based on the val column; therefore, you can see in the output that when the value increases, the row number increases as well. 在示例查询中,排序基于val列;因此,您可以在输出中看到,当值增加时,行号也会增加。However, even when the ordering value doesn't increase, the row number still must increase. 然而,即使排序值没有增加,行号也必须增加。Therefore, if the ROW_NUMBER function's ORDER BY list is not unique, as in the preceding example, the query is nondeterministic—that is, more than one correct result is possible. 因此,如果ROW_NUMBER函数的ORDER BY列表不是唯一的,如上例所示,则查询是不确定的,也就是说,可能有多个正确的结果。For example, observe that two rows with the value 36.00 got the row numbers 7 and 8. 例如,观察值为36.00的两行得到了行号7和8。Any arrangement of these row numbers would be considered correct. 这些行号的任何排列都将被认为是正确的。If you want to make a row number calculation deterministic, you need to add a tiebreaker to the ORDER BY list to make it unique. 如果要使行号计算具有确定性,则需要向ORDER BY列表中添加一个决胜器,以使其唯一。For example, you can add the orderid column.例如,可以添加orderid列。

As mentioned, the ROW_NUMBER function must produce unique values even when there are ties in the ordering values, making it nondeterministic when there are ties. 如前所述,ROW_NUMBER函数必须生成唯一的值,即使排序值中存在关联,这使得存在关联时不确定性。If you want to produce the same rank value given the same ordering value, use the RANK or DENSE_RANK function instead. 如果要在给定相同排序值的情况下生成相同的秩值,请改用RANKDENSE_RANK函数。The difference between the two is that RANK reflects the count of rows that have a lower ordering value than the current row (plus 1), whereas DENSE_RANK reflects the count of distinct ordering values that are lower than the current row (plus 1). 两者之间的区别在于,RANK反映的是排序值低于当前行(加1)的行数,而DENSE_RANK反映的是排序值低于当前行(加1)的不同行数。For example, in the sample query, a rank of 9 indicates eight rows have lower values. 例如,在示例查询中,9表示有8行的值较低。In the same row, a dense rank of 8 indicates seven rows that have distinct lower values.在同一行中,密集的秩8表示有七行的值明显较低。

You use the NTILE function to associate the rows in the result with tiles (equally sized groups of rows) by assigning a tile number to each row. 通过为每一行分配一个平铺编号,可以使用NTILE函数将结果中的行与平铺(大小相等的行组)相关联。You specify the number of tiles you are after and window ordering. 您可以指定要购买的瓷砖数量和窗口顺序。The sample query has 830 rows and the request was for 10 tiles; therefore, the tile size is 83 (830 divided by 10). 示例查询有830行,请求的是10个tile;因此,方块尺寸为83(830除以10)。Window ordering is based on the val column. 窗口排序基于val列。This means that the 83 rows with the lowest values are assigned tile number 1, the next 83 are assigned tile number 2, the next 83 are assigned tile number 3, and so on. 这意味着具有最低值的83行被分配了磁贴编号1,接下来的83行被分配了磁贴编号2,接下来的83行被分配了磁贴编号3,依此类推。If the number of rows can't be evenly divided by the number of tiles, an extra row is added to each of the first tiles from the remainder. 如果行数不能被平铺数平均分割,那么将在剩余的每一个平铺中添加一行。For example, if 102 rows and five tiles were requested, the first two tiles would have 21 rows instead of 20.例如,如果请求102行和5个磁贴,前两个磁贴将有21行而不是20行。

Like all window functions, ranking functions support a window partition clause. 与所有窗口函数一样,排名函数支持窗口分区子句。Remember that window partitioning restricts the window to only those rows that have the same values in the partitioning attributes as in the current row. 请记住,窗口分区仅将窗口限制为分区属性中的值与当前行中的值相同的行。For example, the expression ROW_NUMBER() OVER(PARTITION BY custid ORDER BY val) assigns row numbers independently for each customer. 例如,表达式ROW_NUMBER() OVER(PARTITION BY custid ORDER BY val)为每个客户独立分配行号。Here's the expression in a query:下面是查询中的表达式:

SELECT orderid, custid, val,
  ROW_NUMBER() OVER(PARTITION BY custid
                    ORDER BY val) AS rownum
FROM Sales.OrderValues
ORDER BY custid, val;

This query generates the following output, shown here in abbreviated form:此查询生成以下输出,以缩写形式显示:

orderid     custid      val          rownum
----------- ----------- ------------ -------
10702       1           330.00       1
10952       1           471.20       2
10643       1           814.50       3
10835       1           845.80       4
10692       1           878.00       5
11011       1           933.50       6
10308       2           88.80        1
10759       2           320.00       2
10625       2           479.75       3
10926       2           514.40       4
10682       3           375.50       1
...

(830 row(s) affected)

Remember that window ordering has nothing to do with presentation ordering and does not change the nature of the result from being relational. 请记住,窗口顺序与表示顺序无关,不会改变结果的关系性质。If you need to guarantee presentation ordering, you have to add a presentation ORDER BY clause, as I did in the last two.如果您需要保证演示文稿的顺序,您必须添加一个演示ORDER BY子句,就像我在上两篇文章中所做的那样。

Window functions are logically evaluated as part of the SELECT list, before the DISTINCT clause is evaluated. 在计算DISTINCT子句之前,窗口函数作为SELECT列表的一部分进行逻辑计算。If you're wondering why it matters, I'll explain this with an example. 如果你想知道为什么这很重要,我会用一个例子来解释。Currently, the OrderValues view has 830 rows with 795 distinct values in the val column. 目前,OrderValues视图有830行,val列中有795个不同的值。Consider the following query and its output, shown here in abbreviated form:考虑以下查询及其输出,此处以缩写形式显示:

SELECT DISTINCT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM Sales.OrderValues;

val        rownum
---------- -------
12.50      1
18.40      2
23.80      3
28.00      4
30.00      5
33.75      6
36.00      7
36.00      8
40.00      9
45.00      10
...
12615.05   828
15810.00   829
16387.50   830

(830 row(s) affected)

The ROW_NUMBER function is processed before the DISTINCT clause. ROW_NUMBER函数在DISTINCT子句之前处理。First, unique row numbers are assigned to the 830 rows from the OrderValues view. 首先,从OrderValues视图为830行分配唯一的行号。Then the DISTINCT clause is processed—but there are no duplicate rows to remove. 然后处理DISTINCT子句,但没有要删除的重复行。The DISTINCT clause has no effect here. DISTINCT子句在此处无效。If you want to assign row numbers to the 795 unique values, you need to come up with a different solution. 如果要将行号指定给795唯一值,则需要想出不同的解决方案。For example, because the GROUP BY phase is processed before the SELECT phase, you can use the following query:例如,由于GROUP BY阶段是在SELECT阶段之前处理的,因此可以使用以下查询:

SELECT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM Sales.OrderValues
GROUP BY val;

This query generates the following output, shown here in abbreviated form:此查询生成以下输出,以缩写形式显示:

val       rownum
--------- -------
12.50     1
18.40     2
23.80     3
28.00     4
30.00     5
33.75     6
36.00     7
40.00     8
45.00     9
48.00     10
...
12615.05  793
15810.00  794
16387.50  795

(795 row(s) affected)

Here, the GROUP BY phase produces 795 groups for the 795 distinct values, and then the SELECT phase produces a row for each group, with val and a row number based on val order. 这里,GROUP BY阶段为795个不同的值生成795个组,然后SELECT阶段为每个组生成一行,其中包含val和基于val顺序的行号。You'll get to work on an alternative solution in this chapter's exercises.在本章的练习中,您将学习另一种解决方案。

Offset window functions偏移窗口函数

You use offset window functions to return an element from a row that is at a certain offset from the current row or at the beginning or end of a window frame. 使用偏移窗口函数可以从与当前行或窗框的开头或结尾有一定偏移的行返回元素。T-SQL supports two pairs of offset functions: LAG and LEAD, and FIRST_VALUE and LAST_VALUE.T-SQL支持两对偏移函数:LAGLEAD,以及FIRST_VALUELAST_VALUE

The LAG and LEAD functions support window partitions and window order clauses. LAGLEAD函数支持窗口分区和窗口顺序子句。There's no relevance to window framing here. 此处与窗框无关。You use these functions to obtain an element from a row that is at a certain offset from the current row within the partition, based on the indicated ordering. 您可以使用这些函数根据指定的顺序,从与分区内当前行有一定偏移量的行中获取元素。The LAG function looks before the current row, and the LEAD function looks ahead. LAG函数在当前行之前,LEAD函数在前面。The first argument to the functions (which is mandatory) is the element you want to return; the second argument (optional) is the offset (1 if not specified); the third argument (optional) is the default value to return if there is no row at the requested offset (which is NULL if not specified otherwise).函数的第一个参数(强制)是要返回的元素;第二个参数(可选)是偏移量(如果未指定,则为1);第三个参数(可选)是在请求的偏移量处没有行时要返回的默认值(如果未另行指定,则为NULL)。

As an example, the following query returns order information from the OrderValues view. 例如,以下查询从OrderValues视图返回订单信息。For each customer order, the query uses the LAG function to return the value of the previous customer's order and the LEAD function to return the value of the next customer's order:对于每个客户订单,查询使用LAG函数返回上一个客户订单的值,使用LEAD函数返回下一个客户订单的值:

SELECT custid, orderid, val,
  LAG(val)  OVER(PARTITION BY custid
                 ORDER BY orderdate, orderid) AS prevval,
  LEAD(val) OVER(PARTITION BY custid
                 ORDER BY orderdate, orderid) AS nextval
FROM Sales.OrderValues
ORDER BY custid, orderdate, orderid;

Here's the output of this query in abbreviated form:以下是该查询的简略输出:

custid  orderid  val      prevval  nextval
------- -------- -------- -------- --------
1       10643    814.50   NULL     878.00
1       10692    878.00   814.50   330.00
1       10702    330.00   878.00   845.80
1       10835    845.80   330.00   471.20
1       10952    471.20   845.80   933.50
1       11011    933.50   471.20   NULL
2       10308    88.80    NULL     479.75
2       10625    479.75   88.80    320.00
2       10759    320.00   479.75   514.40
2       10926    514.40   320.00   NULL
3       10365    403.20   NULL     749.06
3       10507    749.06   403.20   1940.85
3       10535    1940.85  749.06   2082.00
3       10573    2082.00  1940.85  813.37
3       10677    813.37   2082.00  375.50
3       10682    375.50   813.37   660.00
3       10856    660.00   375.50   NULL
...

(830 row(s) affected)

Because you didn't indicate the offset, the functions assumed 1 by default. 因为没有指明偏移量,所以函数默认为1LAG obtained the value of the immediately previous customer's order, and LEAD obtained it from the next order. LAG获得了上一个客户订单的值,LEAD从下一个订单获得了值。Also, because you didn't specify a third argument, NULL was assumed by default when there was no previous or next row. 此外,由于没有指定第三个参数,当没有上一行或下一行时,默认情况下假定为NULLThe expression LAG(val, 3, 0) obtains the value from three rows back and returns 0 if a row wasn't found.表达式LAG(val, 3, 0)从后面三行中获取值,如果找不到行,则返回0。

In this example, I just returned the values from the previous and next orders, but normally you compute something based on the returned values. 在本例中,我只是返回了上一个订单和下一个订单的值,但通常您会根据返回的值计算一些内容。For example, you can compute the difference between the values of the current and previous customers' orders using val – LAG(val) OVER(...). 例如,您可以使用val – LAG(val) OVER(...)计算当前和以前客户订单的值之间的差异。Or you can compute the difference between the current and next customers' orders using val – LEAD(val) OVER(...).或者,您可以使用val – LEAD(val) OVER(...)计算当前和下一个客户订单之间的差异。

You use the FIRST_VALUE and LAST_VALUE functions to return an element from the first and last rows in the window frame, respectively. 使用FIRST_VALUELAST_VALUE函数分别从窗口框架的第一行和最后一行返回元素。Therefore, these functions support window-partition, window-order, and window-frame clauses. 因此,这些函数支持窗口分区、窗口顺序和窗口框架子句。If you want the element from the first row in the window partition, use FIRST_VALUE with the window frame extent ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. 如果需要窗口分区中第一行的元素,请将FIRST_VALUEROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW的窗框范围一起使用。If you want the element from the last row in the window partition, use LAST_VALUE with the window frame extent ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING. 如果需要窗口分区中最后一行中的元素,请将LAST_VALUEROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING窗框范围一起使用。Note that if you specify ORDER BY without a window-frame unit (such as ROWS), the bottom delimiter will by default be CURRENT ROW, and clearly that's not what you want with LAST_VALUE. 请注意,如果在没有窗口框架单元(例如ROWS)的情况下指定ORDER BY,则默认情况下,底部分隔符将是当前行,显然这不是您想要的LAST_VALUEAlso, for performance-related reasons that are beyond the scope of this book, you should be explicit about the window-frame extent even for FIRST_VALUE.此外,出于超出本书范围的与性能相关的原因,即使对于FIRST_VALUE,也应该明确说明窗框的范围。

As an example, the following query uses the FIRST_VALUE function to return the value of the first customer's order and the LAST_VALUE function to return the value of the last customer's order:例如,以下查询使用FIRST_VALUE函数返回第一个客户订单的值,使用LAST_VALUE函数返回最后一个客户订单的值:

SELECT custid, orderid, val,
  FIRST_VALUE(val) OVER(PARTITION BY custid
                        ORDER BY orderdate, orderid
                        ROWS BETWEEN UNBOUNDED PRECEDING
                                 AND CURRENT ROW) AS firstval,
  LAST_VALUE(val)  OVER(PARTITION BY custid
                        ORDER BY orderdate, orderid
                        ROWS BETWEEN CURRENT ROW
                                 AND UNBOUNDED FOLLOWING) AS lastval
FROM Sales.OrderValues
ORDER BY custid, orderdate, orderid;

This query generates the following output, shown here in abbreviated form:此查询生成以下输出,此处以缩写形式显示:

custid  orderid  val      firstval  lastval
------- -------- -------- --------- --------
1       10643    814.50   814.50    933.50
1       10692    878.00   814.50    933.50
1       10702    330.00   814.50    933.50
1       10835    845.80   814.50    933.50
1       10952    471.20   814.50    933.50
1       11011    933.50   814.50    933.50
2       10308    88.80    88.80     514.40
2       10625    479.75   88.80     514.40
2       10759    320.00   88.80     514.40
2       10926    514.40   88.80     514.40
3       10365    403.20   403.20    660.00
3       10507    749.06   403.20    660.00
3       10535    1940.85  403.20    660.00
3       10573    2082.00  403.20    660.00
3       10677    813.37   403.20    660.00
3       10682    375.50   403.20    660.00
3       10856    660.00   403.20    660.00
...

(830 row(s) affected)

As with LAG and LEAD, normally you compute something based on the returned values. LAGLEAD一样,通常根据返回的值计算一些内容。For example, you can compute the difference between the current and the first customer's order values: val – FIRST_VALUE(val) OVER(...). 例如,您可以计算当前客户订单值和第一个客户订单值之间的差值:val – FIRST_VALUE(val) OVER(...)Or you can compute the difference between the current and last customer's order values: val – LAST_VALUE(val) OVER(...).或者,您可以计算当前客户订单值与上次客户订单值之间的差值:val – LAST_VALUE(val) OVER(...)

Aggregate window functions聚合窗口函数

You use aggregate window functions to aggregate the rows in the defined window. 使用聚合窗口函数聚合定义窗口中的行。They support window-partition, window-order, and window-frame clauses.它们支持窗口分区、窗口顺序和窗口框架子句。

I'll start with an example that doesn't involve ordering and framing. 我将从一个不涉及排序和框架的示例开始。Recall that using an OVER clause with empty parentheses exposes a window of all rows from the underlying query's result set to the function. For example, SUM(val) OVER() returns the grand total of all values. 回想一下,使用带有空括号的OVER子句会向函数公开一个包含基础查询结果集中所有行的窗口。例如,SUM(val) OVER()返回所有值的总和。If you do add a window-partition clause, you expose a restricted window to the function, with only those rows from the underlying query's result set that share the same values in the partitioning elements as in the current row. 如果确实添加了一个窗口分区子句,则向函数公开一个受限窗口,只有来自基础查询结果集中的那些行在分区元素中与当前行中共享相同的值。For example, SUM(val) OVER(PARTITION BY custid) returns the total values for the current customer.例如,SUM(val) OVER(PARTITION BY custid)返回当前客户的总值。

Here's a query against OrderValues that returns, along with each order, the grand total of all order values, as well as the customer total:下面是对OrderValues的查询,该查询将与每个订单一起返回所有订单值的总计以及客户总计:

SELECT orderid, custid, val,
  SUM(val) OVER() AS totalvalue,
  SUM(val) OVER(PARTITION BY custid) AS custtotalvalue
FROM Sales.OrderValues;

This query returns the following output, shown here in abbreviated form:此查询返回以下输出,此处以缩写形式显示:

orderid     custid      val          totalvalue       custtotalvalue
----------- ----------- ------------ ---------------- ---------------
10643       1           814.50       1265793.22       4273.00
10692       1           878.00       1265793.22       4273.00
10702       1           330.00       1265793.22       4273.00
10835       1           845.80       1265793.22       4273.00
10952       1           471.20       1265793.22       4273.00
11011       1           933.50       1265793.22       4273.00
10926       2           514.40       1265793.22       1402.95
10759       2           320.00       1265793.22       1402.95
10625       2           479.75       1265793.22       1402.95
10308       2           88.80        1265793.22       1402.95
10365       3           403.20       1265793.22       7023.98
...

(830 row(s) affected)

The totalvalue column shows, for each row, the grand total of all values. totalvalue列显示每行所有值的总计。The column custtotalvalue has the total values for the current customer.custtotalvalue包含当前客户的总值。


Image Important

As mentioned at the beginning of the chapter, one of the great advantages of window functions is that they don't hide the detail. 正如本章开头提到的,窗口函数的一个最大优点是它们不会隐藏细节。This means you can write expressions that mix detail and aggregates. 这意味着您可以编写混合细节和聚合的表达式。The next example demonstrates this.下一个示例演示了这一点。


As an example of mixing detail and aggregates, the following query calculates for each row the percentage of the current value out of the grand total, as well as out of the customer total:作为混合明细和聚合的示例,以下查询为每一行计算当前值在总计和客户总计中的百分比:

SELECT orderid, custid, val,
  100. * val / SUM(val) OVER() AS pctall,
  100. * val / SUM(val) OVER(PARTITION BY custid) AS pctcust
FROM Sales.OrderValues;

This query returns the following output, shown here in abbreviated form:此查询返回以下输出,此处以缩写形式显示:

orderid      custid val         pctall                        pctcust
------------ ------ ----------- ----------------------------- -----------------------------
10643        1      814.50      0.0643470029014691672941      19.0615492628130119354083
10692        1      878.00      0.0693636200705830925528      20.5476246197051252047741
10702        1      330.00      0.0260706089103558320528      7.7229113035338169904048
10835        1      845.80      0.0668197606556938265161      19.7940556985724315469225
10952        1      471.20      0.0372256694501808123130      11.0273812309852562602387
11011        1      933.50      0.0737482224782338461253      21.8464778843903580622513
10926        2      514.40      0.0406385491620819394181      36.6655974910011048148544
10759        2      320.00      0.0252805904585268674452      22.8090808653195053280587
10625        2      479.75      0.0379011352264945770526      34.1958017035532271285505
10308        2      88.80       0.0070153638522412057160      6.3295199401261627285362
10365        3      403.20      0.0318535439777438529809      5.7403352515240647040566
...

(830 row(s) affected)

Aggregate window functions also support a window frame. The frame allows for more sophisticated calculations, such as running and moving aggregates, YTD and MTD calculations, and others. 聚合窗口函数还支持窗口框架。该框架允许进行更复杂的计算,例如运行和移动骨料、YTD和MTD计算等。Let's re-examine the query I used in the introduction to the section about window functions:让我们重新检查一下我在介绍窗口函数一节中使用的查询:

SELECT empid, ordermonth, val,
  SUM(val) OVER(PARTITION BY empid
                ORDER BY ordermonth
                ROWS BETWEEN UNBOUNDED PRECEDING
                         AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;

This query generates the following output (abbreviated):此查询生成以下输出(缩写):

empid  ordermonth  val      runval
------ ----------- -------- ----------
1      2014-07-01  1614.88  1614.88
1      2014-08-01  5555.90  7170.78
1      2014-09-01  6651.00  13821.78
1      2014-10-01  3933.18  17754.96
1      2014-11-01  9562.65  27317.61
...
2      2014-07-01  1176.00  1176.00
2      2014-08-01  1814.00  2990.00
2      2014-09-01  2950.80  5940.80
2      2014-10-01  5164.00  11104.80
2      2014-11-01  4614.58  15719.38
...

(192 row(s) affected)

Each row in the EmpOrders view holds information about the order activity for an employee and month. EmpOrders视图中的每一行都保存有关员工和月份的订单活动的信息。The query returns for each employee and month the monthly value, plus the running-total values from the beginning of the employee's activity until the current month. 查询返回每个员工和月份的月度值,加上从员工活动开始到当月的运行总值。To apply the calculation to each employee independently, you partition the window by empid. 要将计算单独应用于每个员工,请按empid对窗口进行分区。Then you define ordering based on ordermonth, giving meaning to the window frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This frame means “all activity from the beginning of the partition until the current month.”

T-SQL supports other delimiters for the ROWS window-frame unit. You can indicate an offset back from the current row as well as an offset forward. For example, to capture all rows from two rows before the current row until one row ahead, you use ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING. Also, if you do not want an upper bound, you can use UNBOUNDED FOLLOWING.

Because window functions are so profound and have so many practical uses, I urge you to invest the time and effort to get to know the concept well.因为窗口功能非常深刻,有很多实际用途,我敦促您投入时间和精力来更好地了解这个概念。

Pivoting data数据透视

Pivoting data involves rotating data from a state of rows to a state of columns, possibly aggregating values along the way. 数据透视涉及将数据从行的状态旋转到列的状态,可能会在此过程中聚合值。In many cases, the pivoting of data is handled by the presentation layer for purposes such as reporting. 在许多情况下,数据透视由表示层处理,以用于报告等目的。This section teaches you how to handle pivoting with T-SQL for cases you do decide to handle in the database.本节将教您如何使用T-SQL处理决定在数据库中处理的情况。

For the rest of the topics in this chapter, I use a sample table called dbo.Orders that you create and populate in the TSQLV4 database by running Listing 7-1.对于本章的其余主题,我使用一个名为dbo.Orders的示例表,您可以通过运行Listing 7-1TSQLV4数据库中创建和填充该表。

LISTING 7-1 Code to create and populate the dbo.Orders table创建和填充dbo.Orders表的代码


USE TSQLV4;

DROP TABLE IF EXISTS dbo.Orders;

CREATE TABLE dbo.Orders
(
  orderid   INT        NOT NULL,
  orderdate DATE       NOT NULL,
  empid     INT        NOT NULL,
  custid    VARCHAR(5) NOT NULL,
  qty       INT        NOT NULL,
  CONSTRAINT PK_Orders PRIMARY KEY(orderid)
);

INSERT INTO dbo.Orders(orderid, orderdate, empid, custid, qty)
VALUES
  (30001, '20140802', 3, 'A', 10),
  (10001, '20141224', 2, 'A', 12),
  (10005, '20141224', 1, 'B', 20),
  (40001, '20150109', 2, 'A', 40),
  (10006, '20150118', 1, 'C', 14),
  (20001, '20150212', 2, 'B', 12),
  (40005, '20160212', 3, 'A', 10),
  (20002, '20160216', 1, 'C', 20),
  (30003, '20160418', 2, 'B', 15),
  (30004, '20140418', 3, 'C', 22),
  (30007, '20160907', 3, 'D', 30);

SELECT * FROM dbo.Orders;


The query at the end of Listing 7-1 produces the following output:Listing 7-1末尾的查询生成以下输出:

orderid     orderdate   empid          custid    qty
----------- ----------- -------------- --------- -----------
10001       2014-12-24  2              A         12
10005       2014-12-24  1              B         20
10006       2015-01-18  1              C         14
20001       2015-02-12  2              B         12
20002       2016-02-16  1              C         20
30001       2014-08-02  3              A         10
30003       2016-04-18  2              B         15
30004       2014-04-18  3              C         22
30007       2016-09-07  3              D         30
40001       2015-01-09  2              A         40
40005       2016-02-12  3              A         10

Suppose you need to query this table and return the total order quantity for each employee and customer. 假设您需要查询此表并返回每个员工和客户的总订单数量。The following grouped query achieves this task:以下分组查询完成此任务:

SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid, custid;

This query generates the following output:此查询生成以下输出:

empid       custid    sumqty
----------- --------- -----------
2           A         52
3           A         20
1           B         20
2           B         27
1           C         34
3           C         22
3           D         30

Suppose, however, you have a requirement to produce the output in the form shown in Table 7-1.然而,假设您需要以Table 7-1所示的形式生成输出。

Image

TABLE 7-1 Pivoted view of total quantity per employee (on rows) and customer (on columns)每个员工(在行)和客户(在列)的总数量的数据透视视图

What you see in Table 7-1 is an aggregated and pivoted view of the data; the technique for generating this view is called pivoting.您在Table 7-1中看到的是数据的聚合和透视视图;生成此视图的技术称为pivoting(转秩)。

Every pivoting request involves three logical processing phases, each with associated elements:每个数据透视请求都涉及三个逻辑处理阶段,每个阶段都有相关元素:

1. A grouping phase with an associated grouping or on rows element具有关联分组或on rows元素的分组阶段

2. A spreading phase with an associated spreading or on cols element具有相关展开或on cols元素的展开阶段

3. An aggregation phase with an associated aggregation element and aggregate function具有关联聚合元素和聚合函数的聚合阶段

In this example, you need to produce a single row in the result for each unique employee ID. 在本例中,需要为每个唯一的员工ID在结果中生成一行。This means the rows from the dbo.Orders table need to be grouped by the empid attribute (the grouping element).这意味着dbo.Orders表中的行需要按empid属性(分组元素)分组。

The dbo.Orders table has a single column that holds all customer ID values and a single column that holds their ordered quantities. dbo.Orders表有一列保存所有客户ID值,还有一列保存其订购数量。The pivoting task requires you to produce a different result column for each unique customer ID, holding the aggregated quantities for that customer. 数据透视任务要求您为每个唯一的客户ID生成不同的结果列,其中包含该客户的聚合数量。You can think of this process as “spreading” the quantities by customer ID. 您可以将此过程视为按客户ID“分散”数量。The spreading element in this case is the custid attribute.本例中的展开元素是custid属性。

Finally, because pivoting involves grouping, you need to aggregate data to produce the result values in the “intersection” of the grouping (employee) and spreading (customer) elements. 最后,由于数据透视涉及分组,因此需要聚合数据,以在分组(employee)和展开(customer)元素的“交点”中生成结果值。You need to identify the aggregate function (SUM, in this case) and the aggregation element (the qty attribute, in this case).您需要标识聚合函数(本例中为SUM)和聚合元素(本例中为qty属性)。

To recap, pivoting involves grouping, spreading, and aggregating. 总而言之,数据透视涉及分组、展开和聚合。In this example, you group by empid, spread (quantities) by custid, and aggregate with SUM(qty). 在本例中,您按empid分组,按custid展开(数量),并用SUM(qty)聚合。After you identify the elements involved in pivoting, the rest is just a matter of incorporating those elements in the right places in a generic query template for pivoting.在确定了数据透视所涉及的元素之后,剩下的只是将这些元素合并到通用查询模板中正确的位置以进行数据透视。

This chapter presents two solutions for pivoting—one based on an explicit grouped query and another with a table operator called PIVOT.本章介绍两种数据透视解决方案,一种基于显式分组查询,另一种使用名为PIVOT的表运算符。

Pivoting with a grouped query使用分组查询进行数据透视

The solution using a grouped query handles all three phases in an explicit and straightforward manner.使用分组查询的解决方案以明确而直接的方式处理所有三个阶段。

The grouping phase is achieved with a GROUP BY clause—in this case, GROUP BY empid.分组阶段是通过GROUP BY子句实现的。在本例中,GROUP BY empid

The spreading phase is achieved in the SELECT clause with a CASE expression for each target column. 展开阶段在SELECT子句中实现,每个目标列都有一个CASE表达式。You need to know the spreading element values ahead of time and specify a separate expression for each. 您需要提前知道展开元素的值,并为每个值指定一个单独的表达式。Because in this case you need to “spread” the quantities of four customers (A, B, C, and D), there are four CASE expressions. 因为在本例中,您需要“分散”四个客户(A、B、C和D)的数量,所以有四个CASE表达式。For example, here's the CASE expression for customer A:例如,下面是客户A的CASE表达式:

CASE WHEN custid = 'A' THEN qty END

This expression returns the quantity from the current row only when the current row represents an order for customer A; otherwise, the expression returns a NULL. 仅当当前行表示客户A的订单时,此表达式才返回当前行中的数量;否则,表达式将返回NULLRemember that if an ELSE clause is not specified in a CASE expression, the implied default is ELSE NULL. 请记住,如果在CASE表达式中未指定ELSE子句,则隐含的默认值为ELSE NULLThis means that in the target column for customer A, only quantities associated with customer A appear as column values, and in all other cases the column values are NULL.这意味着在客户A的目标列中,只有与客户A关联的数量显示为列值,在所有其他情况下,列值均为NULL

If you don't know the spreading values ahead of time (the distinct customer IDs in this case), you need to query them from the data, construct the query string yourself by using an aggregate string concatenation method, and use dynamic SQL to execute it. 如果您事先不知道展开值(本例中是不同的客户ID),则需要从数据中查询它们,使用聚合字符串串联方法自己构造查询字符串,并使用动态SQL执行它。I'll demonstrate how to achieve this in Chapter 11, “Programmable objects.”我将在第11章可编程对象”中演示如何实现这一点

Finally, the aggregation phase is achieved by applying the relevant aggregate function (SUM, in this case) to the result of each CASE expression. 最后,通过将相关的聚合函数(在本例中为SUM)应用于每个CASE表达式的结果来实现聚合阶段。For example, here's the expression that produces the result column for customer A:例如,下面是为客户A生成结果列的表达式:

SUM(CASE WHEN custid = 'A' THEN qty END) AS A

Of course, depending on the request, you might need to use another aggregate function (such as MAX, MIN, or COUNT).当然,根据请求的不同,您可能需要使用另一个聚合函数(如MAXMINCOUNT)。

Here's the complete solution query that pivots order data, returning the total quantity for each employee (on rows) and customer (on columns):这是一个完整的解决方案查询,它以订单数据为中心,返回每个员工(在行)和客户(在列)的总数量:

SELECT empid,
  SUM(CASE WHEN custid = 'A' THEN qty END) AS A,
  SUM(CASE WHEN custid = 'B' THEN qty END) AS B,
  SUM(CASE WHEN custid = 'C' THEN qty END) AS C,
  SUM(CASE WHEN custid = 'D' THEN qty END) AS D
FROM dbo.Orders
GROUP BY empid;

This query produces the output shown earlier in Table 7-1. 此查询产生的输出如Table 7-1所示。SQL Server also generates the following warning in the Messages pane of SSMS:SQL Server还在SSMS的消息窗格中生成以下警告:

Warning: Null value is eliminated by an aggregate or other SET operation.

The warning notifies you that NULLs are ignored by an aggregate function, as we discussed earlier.该警告通知您,聚合函数会忽略NULL,正如我们前面讨论的那样。

Pivoting with the PIVOT operator使用PIVOT运算符旋转

The solution for pivoting based on an explicit grouped query is standard. 基于显式分组查询的数据透视解决方案是标准的。T-SQL also supports a proprietary table operator called PIVOT that you can use to achieve pivoting in a more concise manner. T-SQL还支持一个名为PIVOT的专有表运算符,您可以使用它以更简洁的方式实现数据透视。As a table operator, PIVOT operates in the context of the FROM clause like any other table operator (for example, JOIN). 作为一个表运算符,PIVOTFROM子句的上下文中进行操作,就像任何其他表运算符(例如JOIN)一样。It operates on the source table or table expression provided to it as its left input, pivots the data, and returns a result table. 它对作为其左输入提供给它的源表或表表达式进行操作,旋转数据并返回结果表。The PIVOT operator involves the same logical processing phases as described earlier (grouping, spreading, and aggregating), only it requires less code than the previous solution.PIVOT运算符涉及与前面描述的相同的逻辑处理阶段(分组、展开和聚合),只是它需要的代码比之前的解决方案少。

The general form of a query with the PIVOT operator is shown here:使用PIVOT运算符的查询的一般形式如下所示:

SELECT ...
FROM <input_table>
  PIVOT(<agg_function>(<aggregation_element>)
          FOR <spreading_element> IN (<list_of_target_columns>)) AS <result_table_alias>
WHERE ...;

In the parentheses of the PIVOT operator, you specify the aggregate function (SUM, in this example), aggregation element (qty), spreading element (custid), and the list of target column names (A, B, C, D). PIVOT运算符的括号中,指定聚合函数(本例中为SUM)、聚合元素(qty)、展开元素(custid)和目标列名列表(ABCD)。Following the parentheses of the PIVOT operator, you specify an alias for the result table.PIVOT运算符的括号之后,可以为结果表指定别名。

Note that with the PIVOT operator, you do not explicitly specify the grouping elements, removing the need for GROUP BY in the query. 请注意,使用PIVOT运算符,您不会显式指定分组元素,从而消除了查询中对GROUP BY的需要。The PIVOT operator figures out the grouping elements implicitly by elimination. PIVOT运算符通过消去隐式地计算出分组元素。The grouping elements are all attributes from the source table that were not specified as either the spreading element or the aggregation element. 分组元素是源表中未指定为扩展元素或聚合元素的所有属性。You must ensure that the source table for the PIVOT operator has no attributes other than the grouping, spreading, and aggregation elements so that the implied selection of the grouping elements will be what you want. 您必须确保PIVOT运算符的源表除了分组、扩展和聚合元素之外没有其他属性,以便分组元素的隐含选择将是您想要的。You achieve this by using a table expression that includes only the attributes you need as the input of the operator. 通过使用仅包含作为运算符输入所需的属性的表表达式,可以实现这一点。For example, here's the solution query to the original pivoting request, using the PIVOT operator:例如,以下是使用PIVOT运算符对原始透视请求的解决方案查询:

SELECT empid, A, B, C, D
FROM (SELECT empid, custid, qty
      FROM dbo.Orders) AS D
  PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;

Instead of operating directly on the dbo.Orders table, the PIVOT operator operates on a derived table called D that includes only the pivoting elements empid, custid, and qty. PIVOT运算符不是直接对dbo.Orders表进行操作,而是对一个名为D的派生表进行操作,该派生表只包括数据透视元素empidcustidqtyWhen you account for the spreading element, which is custid, and the aggregation element, which is qty, what's left is the implied grouping element empid.当您考虑展开元素(custid)和聚合元素(qty)时,剩下的是隐含的分组元素empid

This query returns the output shown earlier in Table 7-1.此查询返回Table 7-1中前面显示的输出。

To understand why you're required to use a table expression here, consider the following query that applies the PIVOT operator directly to the dbo.Orders table:要理解为什么需要在这里使用表表达式,请考虑以下将PIVOT运算符直接应用于dbo.Orders表的查询:

SELECT empid, A, B, C, D
FROM dbo.Orders
  PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;

The dbo.Orders table contains the attributes orderid, orderdate, empid, custid, and qty. dbo.Orders表包含属性orderidorderdateempidcustidqtyBecause the query specifies custid as the spreading element and qty as the aggregation element, the remaining attributes (orderid, orderdate, and empid) are all considered the grouping elements. 因为查询指定custid作为扩展元素,qty作为聚合元素,所以其余属性(orderidorderdateempid)都被视为分组元素。This query, therefore, returns the following output:因此,此查询返回以下输出:

empid       A           B           C           D
----------- ----------- ----------- ----------- -----------
2           12          NULL        NULL        NULL
1           NULL        20          NULL        NULL
1           NULL        NULL        14          NULL
2           NULL        12          NULL        NULL
1           NULL        NULL        20          NULL
3           10          NULL        NULL        NULL
2           NULL        15          NULL        NULL
3           NULL        NULL        22          NULL
3           NULL        NULL        NULL        30
2           40          NULL        NULL        NULL
3           10          NULL        NULL        NULL

(11 row(s) affected)

Because orderid is part of the grouping elements, you get a row for each order instead of a row for each employee. 因为orderid是分组元素的一部分,所以您可以为每个订单获取一行,而不是为每个员工获取一行。The logical equivalent of this query that uses the standard solution for pivoting has orderid, orderdate, and empid listed in the GROUP BY list as follows:使用标准解决方案进行数据透视的此查询的逻辑等效项在GROUP BY列表中列出了orderidorderdateempid,如下所示:

SELECT empid,
  SUM(CASE WHEN custid = 'A' THEN qty END) AS A,
  SUM(CASE WHEN custid = 'B' THEN qty END) AS B,
  SUM(CASE WHEN custid = 'C' THEN qty END) AS C,
  SUM(CASE WHEN custid = 'D' THEN qty END) AS D
FROM dbo.Orders
GROUP BY orderid, orderdate, empid;

As a best practice with the PIVOT operator, you should always work with a table expression and not query the underlying table directly. 作为PIVOT运算符的最佳实践,您应该始终使用表表达式,而不是直接查询基础表。Even if currently the table contains only the columns that are supposed to take part in pivoting, you might add columns to the table in the future, rendering your queries incorrect. 即使当前表中只包含应该参与数据透视的列,将来也可能会向表中添加列,从而导致查询不正确。It is also important to enumerate the columns both in the table expression's inner query and in the outer query.在表表达式的内部查询和外部查询中枚举列也很重要。

As another example of a pivoting request, suppose that instead of returning employees on rows and customers on columns, you want it the other way around: the grouping element is custid, the spreading element is empid, and the aggregation element and aggregate function remain SUM(qty). 作为数据透视请求的另一个示例,假设不返回行中的员工和列中的客户,而是以另一种方式返回:分组元素是custid,扩展元素是empid,聚合元素和聚合函数保持SUM(qty)After you learn the “template” for a pivoting solution (with the grouped query or with the PIVOT operator), it's just a matter of fitting those elements in the right places. 学习了数据透视解决方案的“模板”(使用分组查询或使用PIVOT运算符)后,只需将这些元素拟合到正确的位置即可。The following solution query uses the PIVOT operator to achieve the result:以下解决方案查询使用PIVOT运算符获得结果:

SELECT custid, [1], [2], [3]
FROM (SELECT empid, custid, qty
      FROM dbo.Orders) AS D
  PIVOT(SUM(qty) FOR empid IN([1], [2], [3])) AS P;

The employee IDs 1, 2, and 3 are values in the empid column in the source table, but in terms of the result, these values become target column names. 员工ID 1、2和3是源表中empid列中的值,但就结果而言,这些值成为目标列名。You must refer to them as identifiers in the IN clause. 您必须在IN子句中将它们作为标识符引用。When identifiers are irregular (for example, when they start with a digit), you need to delimit them—hence, the use of square brackets.当标识符不规则时(例如,当标识符以数字开头时),需要对其进行分隔,因此需要使用方括号。

This query returns the following output:此查询返回以下输出:

custid    1           2           3
--------- ----------- ----------- -----------
A         NULL        52          20
B         20          27          NULL
C         34          NULL        22
D         NULL        NULL        30

Unpivoting data取消激活数据

Unpivoting is a technique that rotates data from a state of columns to a state of rows. 是一种将数据从列状态旋转到行状态的技术。Usually, it involves querying a pivoted state of the data and producing from each source row multiple result rows, each with a different source column value. 通常,它涉及查询数据的数据透视状态,并从每个源行生成多个结果行,每个结果行具有不同的源列值。A common use case is to unpivot data you imported from a spreadsheet into the database for easier manipulation.一个常见的用例是将从电子表格导入的数据解压到数据库中,以便于操作。

Run the following code to create and populate a table called EmpCustOrders in the TSQLV4 sample database. 运行以下代码在TSQLV4示例数据库中创建并填充名为EmpCustOrders的表。I'll use this table to demonstrate unpivoting techniques:我将使用此表演示取消激励的技巧:

USE TSQLV4;

DROP TABLE IF EXISTS dbo.EmpCustOrders;

CREATE TABLE dbo.EmpCustOrders
(
  empid INT NOT NULL
    CONSTRAINT PK_EmpCustOrders PRIMARY KEY,
  A VARCHAR(5) NULL,
  B VARCHAR(5) NULL,
  C VARCHAR(5) NULL,
  D VARCHAR(5) NULL
);

INSERT INTO dbo.EmpCustOrders(empid, A, B, C, D)
  SELECT empid, A, B, C, D
  FROM (SELECT empid, custid, qty
        FROM dbo.Orders) AS D
    PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;

SELECT * FROM dbo.EmpCustOrders;

Here's the output of the query against EmpCustOrders showing its contents:以下是针对EmpCustOrders的查询输出,显示了其内容:

empid       A           B           C           D
----------- ----------- ----------- ----------- -----------
1           NULL        20          34          NULL
2           52          27          NULL        NULL
3           20          NULL        22          30

The table has a row for each employee; a column for each of the four customers A, B, C, and D; and the order quantity for each employee and customer. 该表为每个员工都有一行;四个客户ABCD各一列;以及每个员工和客户的订单数量。Notice that irrelevant intersections (employee-customer combinations that had no intersecting order activity) are represented by NULLs. 请注意,不相关的交叉点(没有交叉订单活动的员工-客户组合)由Null表示。Suppose you get a request to unpivot the data, requiring you to return a row for each employee and customer, along with the order quantity. 假设您收到一个请求,要求您为每个员工和客户返回一行数据以及订单数量。The result should look like this:结果应如下所示:

empid       custid    qty
----------- --------- -----------
1           B         20
1           C         34
2           A         52
2           B         27
3           A         20
3           C         22
3           D         30

In the following sections, I'll discuss two techniques for solving this problem—one using the APPLY operator and another using the UNPIVOT operator.在下面的部分中,我将讨论解决此问题的两种技术,一种是使用APPLY运算符,另一种是使用UNPIVOT运算符。

Unpivoting with the APPLY operator使用APPLY运算符取消激活

Unpivoting involves three logical processing phases: producing copies, extracting values, and eliminating irrelevant rows.取消激活涉及三个逻辑处理阶段:生成副本、提取值和消除不相关的行。

The first step in the solution involves producing multiple copies of each source row—one for each column you need to unpivot. 解决方案中的第一步涉及到为每个源行生成多个副本,每个源行对应一个需要取消归档的列。In this case, you need to produce a copy for each of the columns A, B, C, and D, which represent customer IDs. 在这种情况下,您需要为代表客户ID的ABCD列中的每一列生成一个副本。You can achieve this step by applying a cross join between the EmpCustOrders table and a table that has a row for each customer.可以通过在EmpCustOrders表和每个客户都有一行的表之间应用交叉联接来实现这一步骤。

If you already have a table of customers in your database, you can use that table in the cross join. 如果数据库中已有客户表,则可以在交叉联接中使用该表。If you don't have a table of customers, you can build a virtual one on the fly using a table-value constructor based on the VALUES clause. 如果没有客户表,可以使用基于VALUES子句的表值构造函数动态构建虚拟客户表。Here's the code that implements this step:以下是实现此步骤的代码:

SELECT *
FROM dbo.EmpCustOrders
  CROSS JOIN (VALUES('A'),('B'),('C'),('D')) AS C(custid);

The VALUES clause defines a set of four rows, each with a single customer ID value. VALUES子句定义了一组四行,每行有一个客户ID值。The code defines a derived table called C based on this clause and names the only column in it custid. 代码基于此子句定义了一个名为C的派生表,并将其中唯一的列命名为custidThe code then applies a cross join between EmpCustOrders and C.然后,代码在EmpCustOrdersC之间应用交叉连接。


Image Note

If you're not familiar yet with the VALUES clause, it's described in detail in Chapter 8, “Data modification.”如果您还不熟悉VALUES子句,请参阅第8章数据修改”中的详细说明


In this example, the query that implements the first step in the solution returns the following output:在本例中,实现解决方案中第一步的查询返回以下输出:

empid       A           B           C           D           custid
----------- ----------- ----------- ----------- ----------- ------
1           NULL        20          34          NULL        A
1           NULL        20          34          NULL        B
1           NULL        20          34          NULL        C
1           NULL        20          34          NULL        D
2           52          27          NULL        NULL        A
2           52          27          NULL        NULL        B
2           52          27          NULL        NULL        C
2           52          27          NULL        NULL        D
3           20          NULL        22          30          A
3           20          NULL        22          30          B
3           20          NULL        22          30          C
3           20          NULL        22          30          D

As you can see, four copies were produced for each source row—one each for customers A, B, C, and D.如您所见,为每个源行生成了四个副本,分别为客户ABCD生成一个副本。

The second step in the solution is to extract a value from one of the original customer quantity columns (A, B, C, or D) to return a single value column (call it qty in our case). 解决方案中的第二步是从一个原始客户数量列(ABCD)中提取一个值,以返回一个值列(在本例中称为qty)。You need to extract the value from the column that corresponds to the current custid value. 您需要从与当前custid值相对应的列中提取值。If custid is 'A', the qty column should return the value from column A, if custid is 'B', qty should return the value from column B, and so on. 如果custid'A',则qty列应返回A列的值,如果custid'B',则qty应返回B列的值,依此类推。To achieve this step, you might think you can simply add the qty column as a second column to each row in the table value constructor (the VALUES clause), like this:要实现此步骤,您可能认为可以简单地将qty列作为第二列添加到表值构造函数(VALUES子句)的每一行中,如下所示:

SELECT empid, custid, qty
FROM dbo.EmpCustOrders
  CROSS JOIN (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty);

However, remember that a join treats its two inputs as a set; hence, there's no order between those inputs. 但是,请记住,联接将其两个输入视为一个集合;因此,这些输入之间没有顺序。You can't refer to the elements of either of the inputs when constructing the other. 在构造另一个输入时,不能引用其中任何一个输入的元素。In our case, the table-value constructor on the right side of the join has references to the columns A, B, C, and D from the left side of the join (EmpCustOrders). 在示例中,联接右侧的表值构造函数引用了联接左侧(EmpCustOrders)的列ABCDConsequently, when you try to run this code, you get the following errors:因此,当您尝试运行此代码时,会出现以下错误:

Msg 207, Level 16, State 1, Line 222
Invalid column name 'A'.
Msg 207, Level 16, State 1, Line 222
Invalid column name 'B'.
Msg 207, Level 16, State 1, Line 222
Invalid column name 'C'.
Msg 207, Level 16, State 1, Line 222
Invalid column name 'D'.

The solution is to use the CROSS APPLY operator instead of the CROSS JOIN operator. 解决方案是使用CROSS APPLY运算符而不是CROSS JOIN运算符。They are similar, but the former evaluates the left side first and then applies the right side to each left row, making the left side's elements accessible to the right side. 它们类似,但前者首先计算左侧,然后将右侧应用于每一行,从而使左侧的元素可供右侧访问。Here's the code implementing this step with the CROSS APPLY operator:下面是使用CROSS APPLY运算符实现此步骤的代码:

SELECT empid, custid, qty
FROM dbo.EmpCustOrders
  CROSS APPLY (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty);

This query runs successfully, returning the following output:此查询成功运行,返回以下输出:

empid       custid    qty
----------- --------- -----------
1           A         NULL
1           B         20
1           C         34
1           D         NULL
2           A         52
2           B         27
2           C         NULL
2           D         NULL
3           A         20
3           B         NULL
3           C         22
3           D         30

Recall that in the original table NULLs represent irrelevant intersections. 回想一下,在原始表中,NULL值表示不相关的交点。In the result, there's typically no reason to keep irrelevant rows where qty is NULL. 因此,通常没有理由在qtyNULL的情况下保留不相关的行。The nice thing in our case is that the CROSS APPLY operator, which created the column qty, was processed in the FROM clause, and the FROM clause is evaluated before the WHERE clause. 在例子中,好的方面是创建列qtyCROSS APPLY运算符在FROM子句中处理,FROM子句在WHERE子句之前求值。This means that the qty column is accessible to expressions in the WHERE clause. 这意味着WHERE子句中的表达式可以访问qty列。To remove irrelevant rows, add a filter in the WHERE clause that discards rows with a NULL in the qty column, like this:要删除不相关的行,请在WHERE子句中添加一个筛选器,该筛选器将丢弃qty列中为NULL的行,如下所示:

SELECT empid, custid, qty
FROM dbo.EmpCustOrders
  CROSS APPLY (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty)
WHERE qty IS NOT NULL;

This query returns the following output:此查询返回以下输出:

empid       custid    qty
----------- --------- -----------
1           B         20
1           C         34
2           A         52
2           B         27
3           A         20
3           C         22
3           D         30

Unpivoting with the UNPIVOT operator使用UNPIVOT运算符取消透视

Unpivoting data involves producing two result columns from any number of source columns—one to hold the source column names as strings and another to hold the source column values. 取消激活数据涉及从任意数量的源列中生成两个结果列,一个用于将源列名作为字符串保存,另一个用于保存源列值。In this example, you need to unpivot the source columns A, B, C, and D, producing the result names column custid and values column qty. 在本例中,需要取消对源列ABCD的透视,生成结果名称列custid和值列qtySimilar to the PIVOT operator, T-SQL also supports the UNPIVOT operator to enable you to unpivot data. PIVOT运算符类似,T-SQL还支持UNPIVOT运算符,使您能够取消透视数据。The general form of a query with the UNPIVOT operator is as follows:使用UNPIVOT运算符的查询的一般形式如下所示:

SELECT ...
FROM <input_table>
  UNPIVOT(<values_column> FOR <names_column> IN(<source_columns>)) AS <result_table_alias>
WHERE ...;

Like the PIVOT operator, UNPIVOT was also implemented as a table operator in the context of the FROM clause. PIVOT运算符一样,UNPIVOT也是在FROM子句的上下文中作为表运算符实现的。It operates on a source table or table expression (EmpCustOrders in this case). 它对源表或表表达式(本例中为EmpCustOrders)进行操作。Within the parentheses of the UNPIVOT operator, you specify the name you want to assign to the column that will hold the source-column values (qty here), the name you want to assign to the column that will hold the source-column names (custid), and the list of source-column names (A, B, C, and D). UNPIVOT运算符的括号内,指定要分配给将保存源列值的列的名称(此处为qty)、要分配给将保存源列名称(custid)的列的名称以及源列名称列表(ABCD)。Following the parentheses, you provide an alias to the table resulting from the table operator.在括号后面,您可以为表运算符生成的表提供别名。

Here's the query that uses the UNPIVOT operator to handle our unpivoting task:以下是使用UNPIVOT运算符处理反透视任务的查询:

SELECT empid, custid, qty
FROM dbo.EmpCustOrders
  UNPIVOT(qty FOR custid IN(A, B, C, D)) AS U;

Note that the UNPIVOT operator implements the same logical-processing phases described earlier—generating copies, extracting elements, and eliminating NULL intersections. 请注意,UNPIVOT运算符实现了前面描述的相同逻辑处理阶段,即生成副本、提取元素和消除NULL交点。However, the last phase is not optional as in the solution with the APPLY operator. 但是,最后一个阶段不是可选的,因为在使用APPLY运算符的解决方案中是可选的。When you need to apply the third phase, it's convenient to use the solution with the UNPIVOT operator because it's more concise. 当需要应用第三阶段时,可以方便地将解与UNPIVOT运算符一起使用,因为它更简洁。When you need to keep the rows with the NULLs, use the solution with the APPLY operator.当需要将行保留为Null时,请使用带有APPLY运算符的解决方案。

When you're done, run the following code for cleanup:完成后,运行以下代码进行清理:

DROP TABLE IF EXISTS dbo.EmpCustOrders;

Grouping sets分组集合

This section describes what a grouping set is and the features in T-SQL that support grouping sets.本节介绍什么是分组集以及T-SQL中支持分组集的功能。

A grouping set is a set of expressions you group the data by in a grouped query (a query with a GROUP BY clause). “分组集”是在分组查询(具有GROUP BY子句的查询)中对数据分组所依据的一组表达式。The reason for using the term “set” here is that there's no significance to the order in which you specify the expressions in the GROUP BY clause. 这里使用术语“set”的原因是,在GROUP BY子句中指定表达式的顺序没有意义。Traditionally in SQL, a single grouped query defines a single grouping set. 传统上,在SQL中,单个分组查询定义单个分组集。For example, each of the following four queries defines a single grouping set:例如,以下四个查询中的每一个都定义了一个分组集:

SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid, custid;

SELECT empid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid;

SELECT custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY custid;

SELECT SUM(qty) AS sumqty
FROM dbo.Orders;

The first query defines the grouping set (empid, custid); the second (empid), the third (custid), and the last query defines what's known as the empty grouping set, (). 第一个查询定义分组集(empid, custid);第二个查询(empid)、第三个查询(custid)和最后一个查询定义了所谓的空分组集()This code returns four result sets—one for each of the four queries.这段代码为四个查询中的每一个返回四个结果集。

Suppose, for reporting purposes, that instead of wanting four separate result sets returned, you want a single unified result set. 假设出于报告目的,您不希望返回四个单独的结果集,而是希望返回一个统一的结果集。You can achieve this by using the UNION ALL operator between the queries, after planting NULLs as placeholders for columns that appear in one query but not others. 可以通过在查询之间使用UNION ALL运算符来实现这一点,方法是将Null作为一个查询中出现的列(而不是其他查询中出现的列)的占位符。Here's what the code looks like:代码如下所示:

SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid, custid

UNION ALL

SELECT empid, NULL, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid

UNION ALL

SELECT NULL, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY custid

UNION ALL

SELECT NULL, NULL, SUM(qty) AS sumqty
FROM dbo.Orders;

This code generates a single result set with the aggregates for all four groupings:此代码生成一个结果集,其中包含所有四个分组的聚合:

empid       custid    sumqty
----------- --------- -----------
2           A         52
3           A         20
1           B         20
2           B         27
1           C         34
3           C         22
3           D         30
1           NULL      54
2           NULL      79
3           NULL      72
NULL        A         72
NULL        B         47
NULL        C         56
NULL        D         30
NULL        NULL      205

(15 row(s) affected)

Even though you managed to get what you were after, this solution has two main problems—the length of the code and performance. 尽管您设法获得了想要的东西,但此解决方案有两个主要问题:代码长度和性能。It's long because you have a separate query for each grouping set. 它很长,因为每个分组集都有一个单独的查询。Also, SQL Server needs to apply a separate scanning of the data for each query.此外,SQL Server需要对每个查询应用单独的数据扫描。

T-SQL supports standard features you can use to define multiple grouping sets in the same query. T-SQL支持可用于在同一查询中定义多个分组集的标准功能。Those are the GROUPING SETS, CUBE, and ROLLUP subclauses of the GROUP BY clause, and the GROUPING and GROUPING_ID functions. 这些是GROUP BY子句的GROUPING SETSCUBEROLLUP亚子句,以及GROUPING函数和GROUPING_ID函数。The main use cases are reporting and data analysis. 主要用例是报告和数据分析。These features usually need the presentation layer to use more sophisticated GUI controls to display the data than the typical grid control with its columns and rows. 与具有列和行的典型网格控件相比,这些功能通常需要表示层使用更复杂的GUI控件来显示数据。But this book's focus is the T-SQL code in the database and not the presentation layer.但本书的重点是数据库中的T-SQL代码,而不是表示层。

The GROUPING SETS subclauseGROUPING SETS亚子句

The GROUPING SETS subclause is a powerful enhancement to the GROUP BY clause. GROUPING SETS亚子句是GROUP BY子句的一个强大增强。You can use it to define multiple grouping sets in the same query. 可以使用它在同一查询中定义多个分组集。Simply list the grouping sets you want, separated by commas within the parentheses of the GROUPING SETS subclause, and for each grouping set list the members, separated by commas, within parentheses. 只需列出所需的分组集,在GROUPING SETS亚子句的括号内用逗号分隔,并为每个分组集列出成员,在括号内用逗号分隔。For example, the following query defines four grouping sets: (empid, custid), (empid), (custid), and ():例如,以下查询定义了四个分组集:(empid, custid)(empid)(custid)()

SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY
  GROUPING SETS
  (
    (empid, custid),
    (empid),
    (custid),
    ()
  );

The last grouping set is the empty grouping set representing the grand total. 最后一个分组集是表示总计的空分组集。This query is a logical equivalent of the previous solution that unified the result sets of four aggregate queries. 此查询在逻辑上等效于以前的解决方案,该解决方案统一了四个聚合查询的结果集。Only this one is much shorter, plus it gets optimized better. 只有这一个短得多,而且它得到了更好的优化。SQL Server typically needs fewer scans of the data than the number of grouping sets because it can roll up aggregates internally.SQL Server通常需要的数据扫描比分组集的数量少,因为它可以在内部汇总聚合。

The CUBE subclauseCUBE亚子句

The CUBE subclause of the GROUP BY clause provides an abbreviated way to define multiple grouping sets. GROUP BY子句的CUBE亚子句提供了定义多个分组集的简化方法。In the parentheses of the CUBE subclause, you provide a set of members separated by commas, and you get all possible grouping sets that can be defined based on the input members. CUBE亚子句的括号中,您提供了一组由逗号分隔的成员,并且可以得到所有可能的分组集,这些分组集可以根据输入成员定义。For example, CUBE(a, b, c) is equivalent to GROUPING SETS( (a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), () ). 例如,CUBE(a, b, c)等价于GROUPING SETS( (a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), () )In set theory, the set of all subsets of elements that can be produced from a particular set is called the power set. 在集合论中,可以从特定集合生成的所有元素子集的集合称为“幂集”。You can think of the CUBE subclause as producing the power set of grouping sets that can be formed from the given set of elements.可以将CUBE亚子句视为生成分组集的幂集,分组集可以由给定的元素集形成。

Instead of using the GROUPING SETS subclause in the previous query to define the four grouping sets (empid, custid), (empid), (custid), and (), you can simply use CUBE(empid, custid). 不必在前面的查询中使用GROUPING SETS亚子句来定义四个分组集(empid, custid)(empid)(custid)(),只需使用CUBE(empid, custid)Here's the complete query:以下是完整的查询:

SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);

The ROLLUP subclauseROLLUP亚子句

The ROLLUP subclause of the GROUP BY clause also provides an abbreviated way to define multiple grouping sets. GROUP BY子句的ROLLUP亚子句还提供了定义多个分组集的简化方法。However, unlike the CUBE subclause, ROLLUP doesn't produce all possible grouping sets. 然而,与CUBE亚子句不同,ROLLUP不会生成所有可能的分组集。ROLLUP assumes a hierarchy among the input members and produces only grouping sets that form leading combinations of the input members. ROLLUP在输入成员之间采用层次结构,只生成构成输入成员前导组合的分组集。For example, whereas CUBE(a, b, c) produces all eight possible grouping sets, ROLLUP(a, b, c) produces only four based on the hierarchy a>b>c. 例如,虽然CUBE(a, b, c)生成所有八个可能的分组集,但ROLLUP(a, b, c)仅基于层次结构a>b>c生成四个分组集。It is the equivalent of specifying GROUPING SETS( (a, b, c), (a, b), (a), () ).它相当于指定GROUPING SETS( (a, b, c), (a, b), (a), () )

For example, suppose you want to return total quantities for all grouping sets that can be defined based on the time hierarchy of order year, order month, order day. 例如,假设要返回所有分组集的总数量,这些分组集可以根据订单年、订单月、订单日的时间层次结构定义。You can use the GROUPING SETS subclause and explicitly list all four possible grouping sets:您可以使用GROUPING SETS亚子句并显式列出所有四个可能的分组集:

GROUPING SETS(
  (YEAR(orderdate), MONTH(orderdate), DAY(orderdate)),
  (YEAR(orderdate), MONTH(orderdate)),
  (YEAR(orderdate)),
  () )

The logical equivalent that uses the ROLLUP subclause is much more concise:使用ROLLUP亚子句的逻辑等价物要简洁得多:

ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate))

Here's the complete query you need to run:以下是您需要运行的完整查询:

SELECT
  YEAR(orderdate) AS orderyear,
  MONTH(orderdate) AS ordermonth,
  DAY(orderdate) AS orderday,
  SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate));

This query produces the following output:此查询生成以下输出:

orderyear   ordermonth     orderday    sumqty
----------- -------------- ----------- -----------
2014        4              18          22
2014        4              NULL        22
2014        8              2           10
2014        8              NULL        10
2014        12             24          32
2014        12             NULL        32
2014        NULL           NULL        64
2015        1              9           40
2015        1              18          14
2015        1              NULL        54
2015        2              12          12
2015        2              NULL        12
2015        NULL           NULL        66
2016        2              12          10
2016        2              16          20
2016        2              NULL        30
2016        4              18          15
2016        4              NULL        15
2016        9              7           30
2016        9              NULL        30
2016        NULL           NULL        75
NULL        NULL           NULL        205

The GROUPING and GROUPING_ID functionsGROUPINGGROUPING_ID函数

When you have a single query that defines multiple grouping sets, you might need to associate result rows and grouping sets. 当您有一个定义多个分组集的查询时,可能需要关联结果行和分组集。As long as all grouping elements are defined as NOT NULL, this is easy. 只要所有分组元素都定义为NOT NULL,这就很容易。For example, consider the following query:例如,考虑以下查询:

SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);

This query produces the following output:此查询生成以下输出:

empid       custid    sumqty
----------- --------- -----------
2           A         52
3           A         20
NULL        A         72
1           B         20
2           B         27
NULL        B         47
1           C         34
3           C         22
NULL        C         56
3           D         30
NULL        D         30
NULL        NULL      205
1           NULL      54
2           NULL      79
3           NULL      72

Because both the empid and custid columns were defined in the dbo.Orders table as NOT NULL, a NULL in those columns can only represent a placeholder, indicating that the column did not participate in the current grouping set. 由于empidcustid列在dbo.Orders表中都定义为NOT NULL,因此这些列中的NULL只能表示占位符,表示该列未参与当前分组集。For example, all rows in which empid is not NULL and custid is not NULL are associated with the grouping set (empid, custid). 例如,empid不为NULLcustid不为NULL的所有行都与分组集(empid, custid)关联。All rows in which empid is not NULL and custid is NULL are associated with the grouping set (empid), and so on.empid不为NULLcustidNULL的所有行都与分组集(empid)相关联,依此类推。

However, if a grouping column allows NULLs in the table, you cannot tell for sure whether a NULL in the result set originated from the data or is a placeholder for a nonparticipating member in a grouping set. 但是,如果分组列在表中允许NULL值,则无法确定结果集中的NULL值是源于数据还是分组集中非参与成员的占位符。One way to solve this problem is to use the GROUPING function. 解决此问题的一种方法是使用GROUPING函数。This function accepts a name of a column and returns 0 if it is a member of the current grouping set (a detail element) and 1 otherwise (an aggregate element).此函数接受列的名称,如果它是当前分组集的成员(详细信息元素),则返回0,否则返回1(聚合元素)。


Image Note

I find it counterintuitive that the GROUPING function returns 1 when the element isn't part of the grouping set and 0 when it is. 我发现,当元素不是分组集的一部分时,GROUPING函数返回1,当元素是分组集的一部分时返回0,这与直觉相反。To me, it would make more sense to return 1 when the element is part of the grouping set and 0 otherwise. 对我来说,当元素是分组集的一部分时,返回1、否则返回0更有意义。The current perspective is to use 1 to indicate the element is an aggregate element and 0 when it's a detail element. 当前透视图使用1表示元素是聚合元素,使用0表示元素是细节元素。You just need to make sure you realize this fact.你只需要确保你意识到这个事实。


For example, the following query invokes the GROUPING function for each of the grouping elements:例如,以下查询为每个分组元素调用GROUPING函数:

SELECT
  GROUPING(empid) AS grpemp,
  GROUPING(custid) AS grpcust,
  empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);

This query returns the following output:此查询返回以下输出:

grpemp    grpcust    empid       custid    sumqty
--------- ---------- ----------- --------- -----------
0         0          2           A         52
0         0          3           A         20
1         0          NULL        A         72
0         0          1           B         20
0         0          2           B         27
1         0          NULL        B         47
0         0          1           C         34
0         0          3           C         22
1         0          NULL        C         56
0         0          3           D         30
1         0          NULL        D         30
1         1          NULL        NULL      205
0         1          1           NULL      54
0         1          2           NULL      79
0         1          3           NULL      72

(15 row(s) affected)

Now you don't need to rely on the NULLs anymore to figure out the association between result rows and grouping sets. 现在,您不再需要依赖NULL来确定结果行和分组集之间的关联。For example, all rows in which grpemp is 0 and grpcust is 0 are associated with the grouping set (empid, custid). 例如,grpemp0grpcust0的所有行都与分组集(empid, custid)关联。All rows in which grpemp is 0 and grpcust is 1 are associated with the grouping set (empid), and so on.grpemp0grpcust1的所有行都与分组集(empid)相关联,依此类推。

T-SQL supports another function, called GROUPING_ID, that can further simplify the process of associating result rows and grouping sets. T-SQL支持另一个名为GROUPING_ID的函数,它可以进一步简化关联结果行和分组集的过程。You provide the function with all elements that participate in any grouping set as inputs—for example, GROUPING_ID(a, b, c, d)—and the function returns an integer bitmap in which each bit represents a different input element—the rightmost element represented by the rightmost bit. 您为函数提供了参与任何分组集的所有元素作为输入,例如GROUPING_ID(a, b, c, d),函数返回一个整数位图,其中每个位表示不同的输入元素最右边的元素由最右边的位表示。For example, the grouping set (a, b, c, d) is represented by the integer 0 (0×8 + 0×4 + 0×2 + 0×1). 例如,分组集(a, b, c, d)由整数0(0×8 + 0×4 + 0×2 + 0×1)表示。The grouping set (a, c) is represented by the integer 5 (0×8 + 1×4 + 0×2 + 1×1), and so on.分组集(a, c)由整数5(0×8 + 1×4 + 0×2 + 1×1)表示,依此类推。

Instead of calling the GROUPING function for each grouping element as in the previous query, you can call the GROUPING_ID function once and provide it with all grouping elements as input, as shown here:您可以一次性调用GROUPING_ID函数并将所有分组元素作为输入提供给它,而不是像前面的查询那样为每个分组元素调用GROUPING函数,如下所示:

SELECT
  GROUPING_ID(empid, custid) AS groupingset,
  empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);

This query produces the following output:此查询生成以下输出:

groupingset    empid       custid    sumqty
-------------- ----------- --------- -----------
0              2           A         52
0              3           A         20
2              NULL        A         72
0              1           B         20
0              2           B         27
2              NULL        B         47
0              1           C         34
0              3           C         22
2              NULL        C         56
0              3           D         30
2              NULL        D         30
3              NULL        NULL      205
1              1           NULL      54
1              2           NULL      79
1              3           NULL      72

Now you can easily figure out which grouping set each row is associated with. 现在,您可以很容易地找出每行与哪个分组集相关联。The integer 0 (binary 00) represents the grouping set (empid, custid); the integer 1 (binary 01) represents (empid); the integer 2 (binary 10) represents (custid); and the integer 3 (binary 11) represents ().整数0(二进制00)表示分组集(empid, custid);整数1(二进制01)表示(empid);整数2(二进制10)表示(custid);整数3(二进制11)表示()

Conclusion结论

This chapter covered window functions, pivoting and unpivoting data, and features related to grouping sets.本章介绍了窗口函数、数据透视和取消透视以及与分组集相关的功能。

You use window functions to perform data-analysis calculations in a more flexible and efficient manner than you can when using alternative methods. 与使用其他方法相比,您可以使用窗口函数以更灵活、更高效的方式执行数据分析计算。Window functions have numerous practical uses, so it's worth your time to get to know them well.窗口函数有许多实际用途,因此值得花时间深入了解它们。

I covered two techniques to handle pivoting: one using a standard grouped query and another using the more concise yet proprietary PIVOT operator. 我介绍了两种处理数据透视的技术:一种是使用标准分组查询,另一种是使用更简洁但专有的数据PIVOT运算符。I also covered two methods to handle unpivoting: one using the APPLY operator, which allows you to control whether to remove rows with NULLs, and another using the UNPIVOT operator, which is more concise but removes rows with NULLs as a mandatory step.我还介绍了两种处理取消激活的方法:一种是使用APPLY运算符,它允许您控制是否删除带null的行;另一种是使用UNPIVOT运算符,它更简洁,但作为一个强制步骤删除带null的行。

T-SQL supports features that make the handling of grouping sets flexible and efficient: the GROUPING SETS, CUBE, and ROLLUP subclauses and the GROUPING and GROUPING_ID functions.T-SQL支持使分组集的处理灵活高效的特性:GROUPING SETSCUBEROLLUP亚子句以及GROUPINGGROUPING_ID函数。

Exercises练习

This section provides exercises to help you familiarize yourself with the subjects discussed in Chapter 7. 本节提供练习,帮助您熟悉第7章中讨论的主题。All exercises for this chapter involve querying the dbo.Orders table in the TSQLV4 database that you created and populated earlier in this chapter by running the code in Listing 7-1.本章的所有练习都涉及查询TSQLV4数据库中的dbo.Orders表,您在本章前面通过运行Listing 7-1中的代码创建并填充了该数据库。

Exercise 练习1

Write a query against the dbo.Orders table that computes both a rank and a dense rank for each customer order, partitioned by custid and ordered by qty:针对dbo.Orders表编写一个查询,该表计算每个客户订单的排名和密集排名,按custid划分,按qty排序:

Image Table involved: TSQLV4 database, dbo.Orders table涉及的表:TSQLV4数据库、dbo.Orders

Image Desired output:所需输出:

custid orderid     qty         rnk                  drnk
------ ----------- ----------- -------------------- --------------------
A      30001       10          1                    1
A      40005       10          1                    1
A      10001       12          3                    2
A      40001       40          4                    3
B      20001       12          1                    1
B      30003       15          2                    2
B      10005       20          3                    3
C      10006       14          1                    1
C      20002       20          2                    2
C      30004       22          3                    3
D      30007       30          1                    1

Exercise 练习2

Earlier in the chapter in the section “Ranking window functions,” I provided the following query against the Sales.OrderValues view to return distinct values and their associated row numbers:在“排名窗口函数”一节的前面一章中,我针对Sales.OrderValues视图提供了以下查询,以返回不同的值及其关联的行号:

SELECT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM Sales.OrderValues
GROUP BY val;

Can you think of an alternative way to achieve the same task?你能想出另一种方法来完成同样的任务吗?

Image Table involved: TSQLV4 database, Sales.OrderValues view涉及的表:TSQLV4数据库、Sales.OrderValues视图

Image Desired output:所需输出:

val       rownum
--------- -------
12.50     1
18.40     2
23.80     3
28.00     4
30.00     5
33.75     6
36.00     7
40.00     8
45.00     9
48.00     10
...
12615.05  793
15810.00  794
16387.50  795

(795 row(s) affected)

Exercise 练习3

Write a query against the dbo.Orders table that computes for each customer order both the difference between the current order quantity and the customer's previous order quantity and the difference between the current order quantity and the customer's next order quantity:针对dbo.Orders表编写一个查询,该表计算每个客户订单的当前订单数量与客户以前订单数量之间的差异,以及当前订单数量与客户下一订单数量之间的差异:

Image Table involved: TSQLV4 database, dbo.Orders table涉及的表:TSQLV4数据库、dbo.Orders

Image Desired output:所需输出:

custid orderid     qty         diffprev    diffnext
------ ----------- ----------- ----------- -----------
A      30001       10          NULL        -2
A      10001       12          2           -28
A      40001       40          28          30
A      40005       10          -30         NULL
B      10005       20          NULL        8
B      20001       12          -8          -3
B      30003       15          3           NULL
C      30004       22          NULL        8
C      10006       14          -8          -6
C      20002       20          6           NULL
D      30007       30          NULL        NULL

Exercise 练习4

Write a query against the dbo.Orders table that returns a row for each employee, a column for each order year, and the count of orders for each employee and order year:针对dbo.Orders表编写查询,该表为每个员工返回一行,为每个订单年返回一列,并为每个员工和订单年返回订单计数:

Image Table involved: TSQLV4 database, dbo.Orders table涉及的表:TSQLV4数据库、dbo.Orders

Image Desired output:所需输出:

empid       cnt2014     cnt2015     cnt2016
----------- ----------- ----------- -----------
1           1           1           1
2           1           2           1
3           2           0           2

Exercise 练习5

Run the following code to create and populate the EmpYearOrders table:运行以下代码创建并填充EmpYearOrders表:

USE TSQLV4;

DROP TABLE IF EXISTS dbo.EmpYearOrders;

CREATE TABLE dbo.EmpYearOrders
(
  empid INT NOT NULL
    CONSTRAINT PK_EmpYearOrders PRIMARY KEY,
  cnt2014 INT NULL,
  cnt2015 INT NULL,
  cnt2016 INT NULL
);

INSERT INTO dbo.EmpYearOrders(empid, cnt2014, cnt2015, cnt2016)
  SELECT empid, [2014] AS cnt2014, [2015] AS cnt2015, [2016] AS cnt2016
  FROM (SELECT empid, YEAR(orderdate) AS orderyear
        FROM dbo.Orders) AS D
    PIVOT(COUNT(orderyear)
          FOR orderyear IN([2014], [2015], [2016])) AS P;

SELECT * FROM dbo.EmpYearOrders;

Here's the output for the query:以下是查询的输出:

empid       cnt2014     cnt2015     cnt2016
----------- ----------- ----------- -----------
1           1           1           1
2           1           2           1
3           2           0           2

Write a query against the EmpYearOrders table that unpivots the data, returning a row for each employee and order year with the number of orders. 针对EmpYearOrders表编写一个查询,该表取消对数据的透视,为每个员工和订单年返回一行订单数。Exclude rows in which the number of orders is 0 (in this example, employee 3 in the year 2015).排除订单数为0的行(在本例中为2015年的员工3)。

Image Desired output:所需输出:

empid       orderyear   numorders
----------- ----------- -----------
1           2014        1
1           2015        1
1           2016        1
2           2014        1
2           2015        2
2           2016        1
3           2014        2
3           2016        2

Exercise 练习6

Write a query against the dbo.Orders table that returns the total quantities for each of the following: (employee, customer, and order year), (employee and order year), and (customer and order year). 针对dbo.Orders表编写一个查询,该表返回以下各项的总数量:(employee、customer和order year)、(employee和order year)和(customer和order year)。Include a result column in the output that uniquely identifies the grouping set with which the current row is associated:在输出中包括一个结果列,该列唯一标识与当前行关联的分组集:

Image Table involved: TSQLV4 database, dbo.Orders table涉及的表:TSQLV4数据库、dbo.Orders

Image Desired output:所需输出:

groupingset    empid       custid    orderyear   sumqty
-------------- ----------- --------- ----------- -----------
0              2           A         2014        12
0              3           A         2014        10
4              NULL        A         2014        22
0              2           A         2015        40
4              NULL        A         2015        40
0              3           A         2016        10
4              NULL        A         2016        10
0              1           B         2014        20
4              NULL        B         2014        20
0              2           B         2015        12
4              NULL        B         2015        12
0              2           B         2016        15
4              NULL        B         2016        15
0              3           C         2014        22
4              NULL        C         2014        22
0              1           C         2015        14
4              NULL        C         2015        14
0              1           C         2016        20
4              NULL        C         2016        20
0              3           D         2016        30
4              NULL        D         2016        30
2              1           NULL      2014        20
2              2           NULL      2014        12
2              3           NULL      2014        32
2              1           NULL      2015        14
2              2           NULL      2015        52
2              1           NULL      2016        20
2              2           NULL      2016        15
2              3           NULL      2016        40

(29 row(s) affected)

When you're done with the exercises in this chapter, run the following code for cleanup:完成本章中的练习后,请运行以下代码进行清理:

DROP TABLE IF EXISTS dbo.Orders;

Solutions解决方案

This section provides solutions to the Chapter 7 exercises.本节提供了第7章练习的解决方案。

Exercise 练习1

This exercise is very technical. Figuring it out is just a matter of being familiar with the syntax for window-ranking functions. 这个练习非常技术性。弄清楚这一点只需熟悉窗口排名函数的语法即可。Here's the solution query, returning for each order both the rank and the dense rank, partitioned by custid and ordered by qty:以下是解决方案查询,为每个订单返回排名和密集排名,按custid划分,按qty排序:

SELECT custid, orderid, qty,
  RANK() OVER(PARTITION BY custid ORDER BY qty) AS rnk,
  DENSE_RANK() OVER(PARTITION BY custid ORDER BY qty) AS drnk
FROM dbo.Orders;

Exercise 练习2

Another way to handle this task is to write a query that returns distinct values without a row number computation, define a table expression based on this query, and then compute row numbers in the outer query against the table expression. 处理此任务的另一种方法是编写一个返回不同值的查询,而无需计算行号,基于此查询定义一个表表达式,然后根据表表达式计算外部查询中的行号。Here's the solution query:以下是解决方案查询:

WITH C AS
(
  SELECT DISTINCT val
  FROM Sales.OrderValues
)
SELECT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM C;

Exercise 练习3

You use the window offset functions LAG and LEAD to return an element from the previous row and the next row, respectively, based on the indicated partitioning and ordering specification. 根据指定的分区和排序规范,可以使用窗口偏移函数LAGLEAD分别返回前一行和下一行的元素。In this exercise, you need to perform the calculations within each customer's orders; hence, the window partitioning should be based on custid. 在本练习中,您需要在每个客户的订单中执行计算;因此,窗口分区应该基于custidAs for ordering, use orderdate as the first ordering column and orderid as the tiebreaker. 对于订购,使用orderdate作为第一个订购列,orderid作为分阶段符。Here's the complete solution query:以下是完整的解决方案查询:

SELECT custid, orderid, qty,
  qty - LAG(qty) OVER(PARTITION BY custid
                      ORDER BY orderdate, orderid) AS diffprev,
  qty - LEAD(qty) OVER(PARTITION BY custid
                       ORDER BY orderdate, orderid) AS diffnext
FROM dbo.Orders;

This query is a good example that shows you can mix detail elements from the row with window functions in the same expression.此查询是一个很好的示例,它显示可以在同一表达式中将行中的细节元素与窗口函数混合使用。

Exercise 练习4

Solving a pivoting problem is all about identifying the elements involved: the grouping element, spreading element, aggregation element, and aggregate function. 解决数据透视问题就是要识别所涉及的元素:分组元素、扩展元素、聚合元素和聚合函数。After you identify those, you simply fit them into the “template” query for pivoting—whether it's the solution with the grouped query or the solution using the PIVOT operator.在识别出它们之后,只需将它们放入“模板”查询中,以进行数据透视,无论是使用分组查询的解决方案还是使用PIVOT运算符的解决方案。

In this exercise, the grouping element is the employee (empid), the spreading element is order year (YEAR(orderdate)), and the aggregate function is COUNT; however, identifying the aggregation element is not that straightforward. 在本练习中,分组元素是雇员(empid),扩展元素是order year(YEAR(orderdate)),聚合函数是COUNT;然而,识别聚合元素并不是那么简单。You want the COUNT aggregate function to count matching rows—you don't really care which attribute it counts. 您希望COUNT聚合函数对匹配行进行计数,而不关心它对哪个属性进行计数。In other words, you can use any attribute you want, as long as the attribute does not allow NULLs, because aggregate functions ignore NULLs.换句话说,您可以使用任何想要的属性,只要该属性不允许Null,因为聚合函数会忽略Null

If it doesn't really matter which attribute you use as the input to the COUNT aggregate, why not use the same attribute you already use as the spreading element? 如果使用哪个属性作为COUNT聚合的输入并不重要,为什么不使用已经用作扩展元素的相同属性呢?In this case, you can use the order year as both the spreading element and aggregation element.在这种情况下,您可以使用订单年作为扩展元素和聚合元素。

Now that you've identified all pivoting elements, you're ready to write the complete solution. 现在,您已经识别了所有枢轴元素,可以编写完整的解决方案了。Here's the solution query without using the PIVOT operator:以下是不使用PIVOT运算符的解决方案查询:

USE TSQLV4;

SELECT empid,
  COUNT(CASE WHEN orderyear = 2014 THEN orderyear END) AS cnt2014,
  COUNT(CASE WHEN orderyear = 2015 THEN orderyear END) AS cnt2015,
  COUNT(CASE WHEN orderyear = 2016 THEN orderyear END) AS cnt2016
FROM (SELECT empid, YEAR(orderdate) AS orderyear
      FROM dbo.Orders) AS D
GROUP BY empid;

Recall that if you do not specify an ELSE clause in a CASE expression, an implicit ELSE NULL is assumed. 回想一下,如果没有在CASE表达式中指定ELSE子句,则假定为隐式ELSE NULLThus, the CASE expression produces non-NULLs only for matching orders (orders placed by the current employee in the current order year), and only those matching orders are taken into consideration by the COUNT aggregate.因此,CASE表达式仅为匹配的订单(当前员工在当前订单年下的订单)生成非Null,并且计数聚合仅考虑那些匹配的订单。

Notice that even though this solution does not require you to use a table expression, I used one here to alias the YEAR(orderdate) expression as orderyear to avoid repeating the expression.请注意,尽管此解决方案不需要使用表表达式,但我在这里使用了一个表表达式来将YEAR(orderdate)表达式别名为orderyear,以避免重复该表达式。

Here's the solution query that uses the PIVOT operator:以下是使用PIVOT运算符的解决方案查询:

SELECT empid, [2014] AS cnt2014, [2015] AS cnt2015, [2016] AS cnt2016
FROM (SELECT empid, YEAR(orderdate) AS orderyear
      FROM dbo.Orders) AS D
  PIVOT(COUNT(orderyear)
        FOR orderyear IN([2014], [2015], [2016])) AS P;

As you can see, it's just a matter of fitting the pivoting elements in the right places.正如您所看到的,这只是将枢轴元素安装到正确位置的问题。

If you prefer to use your own target column names and not the ones based on the actual data, you can provide your own aliases in the SELECT list. 如果希望使用自己的目标列名而不是基于实际数据的列名,可以在SELECT列表中提供自己的别名。In this query, I aliased the result columns [2014], [2015], and [2016] as cnt2014, cnt2015, and cnt2016, respectively.在这个查询中,我将结果列[2014][2015][2016]分别取别名为cnt2014cnt2015cnt2016

Exercise 练习5

This exercise involves a request to unpivot the source columns cnt2014, cnt2015, and cnt2016 to two target columns—orderyear to hold the year that the source column name represents and numorders to hold the source-column value. 此练习涉及请求将源列cnt2014cnt2015cnt2016解压到两个目标列orderyear以保存源列名表示的年份,并解压以保存源列值。You can use the solutions I showed in the chapter as the basis for solving this exercise with a couple of small revisions.您可以使用我在本章中展示的解决方案作为解决此练习的基础,并进行一些小的修改。

In the examples I used in the chapter, NULLs in the table represented irrelevant column values. 在我在本章中使用的示例中,表中的null表示不相关的列值。The unpivoting solutions I presented filtered out rows with NULLs. 我提供的取消激活解决方案筛选掉了带有null的行。The EmpYearOrders table has no NULLs, but it does have zeros in some cases, and the request is to filter out rows with 0 number of orders. EmpYearOrders表没有NULL值,但在某些情况下它确实有零,请求是筛选出订单数为0的行。With the solution that is based on the APPLY operator, simply use the predicate numorders <> 0 instead of using IS NOT NULL, like this:对于基于APPLY运算符的解决方案,只需使用谓词numorders<>0,而不是使用IS NOT NULL,如下所示:

SELECT empid, orderyear, numorders
FROM dbo.EmpYearOrders
  CROSS APPLY (VALUES(2014, cnt2014),
                     (2015, cnt2015),
                     (2016, cnt2016)) AS A(orderyear, numorders)
WHERE numorders <> 0;

As for the solution that uses the UNPIVOT operator, remember that it eliminates NULLs as an integral part of its logic. 至于使用UNPIVOT运算符的解决方案,请记住,它消除了NULL,将其作为其逻辑的一个组成部分。However, it does not eliminate zeros—you have to take care of eliminating zeros yourself by adding a WHERE clause, like this:但是,它不能消除零您必须通过添加WHERE子句来自行消除零,如下所示:

SELECT empid, CAST(RIGHT(orderyear, 4) AS INT) AS orderyear, numorders
FROM dbo.EmpYearOrders
  UNPIVOT(numorders FOR orderyear IN(cnt2014, cnt2015, cnt2016)) AS U
WHERE numorders <> 0;

Notice the expression used in the SELECT list to produce the orderyear result column: CAST(RIGHT(orderyear, 4) AS INT). 请注意SELECT列表中用于生成orderyear结果列的表达式:CAST(RIGHT(orderyear, 4) AS INT)The original column names that the query unpivots are cnt2014, cnt2015, and cnt2016. 查询unpivot的原始列名是cnt2014cnt2015cnt2016These column names become the values 'cnt2014', 'cnt2015', and 'cnt2016', respectively, in the orderyear column in the result of the UNPIVOT operator. 这些列名分别成为UNPIVOT运算符结果中orderyear列中的值'cnt2014''cnt2015''cnt2016'The purpose of this expression is to extract the four rightmost characters representing the order year and convert the value to an integer. 此表达式的目的是提取表示订单年份的四个最右边的字符,并将该值转换为整数。This manipulation was not required in the standard solution because the constants used to construct the table expression A were specified explicitly.标准溶液中不需要此操作,因为用于构造表表达式A的常量是显式指定的。

Exercise 练习6

You can use the GROUPING SETS subclause to list the requested grouping sets and the GROUPING_ID function to produce a unique identifier for the grouping sets. 您可以使用GROUPING SETS子类列出请求的分组集,并使用GROUPING_ID函数为分组集生成唯一标识符。Here's the complete solution query:以下是完整的解决方案查询:

SELECT
  GROUPING_ID(empid, custid, YEAR(Orderdate)) AS groupingset,
  empid, custid, YEAR(Orderdate) AS orderyear, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY
  GROUPING SETS
  (
    (empid, custid, YEAR(orderdate)),
    (empid, YEAR(orderdate)),
    (custid, YEAR(orderdate))
  );

The requested grouping sets are neither a power set nor a rollup of some set of attributes. 请求的分组集既不是幂集,也不是某些属性集的汇总。Therefore, you cannot use either the CUBE or ROLLUP subclause to further abbreviate the code.因此,您不能使用CUBEROLLUP子类来进一步缩写代码。