This chapter covers querying topics that go beyond the fundamentals and is provided as optional reading. 本章涵盖了超出基础知识的查询主题,并作为可选阅读提供。It starts with the powerful window functions, which you can use to apply data-analysis calculations in a flexible and efficient manner. 它从强大的窗口函数开始,您可以使用这些函数以灵活高效的方式应用数据分析计算。The chapter continues with techniques for pivoting and unpivoting data. 本章继续介绍数据透视和反透视技术。Pivoting rotates data from a state of rows to columns, and unpivoting rotates data from columns to rows, similar to pivot tables in Excel. 数据透视将数据从一个行状态旋转到另一个列,取消透视将数据从一列旋转到另一行,这与Excel中的数据透视表类似。The chapter finishes with a discussion about 本章以关于grouping sets
, which are the sets of expressions that you group the data by. 分组集
的讨论结束,分组集是对数据进行分组的表达式集。It covers techniques for defining multiple grouping sets in the same query.它涵盖了在同一查询中定义多个分组集的技术。
What's common to the features covered in this chapter, beyond being more advanced than other features in this book, is that they're mostly used for analytical purposes. 本章介绍的功能除了比本书中的其他功能更先进之外,还有一个共同点,那就是它们主要用于分析目的。If you need more details about these features beyond what's covered in this chapter, you can find them in the books: 如果您需要有关这些功能的更多详细信息,请参阅以下书籍:《T-SQL查询》(Microsoft出版社,2015)和《Microsoft SQL Server 2012使用窗口函数的高性能T-SQL》(Microsoft出版社,2012)。T-SQL Querying
(Microsoft Press, 2015) and Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions
(Microsoft Press, 2012).
A 窗口函数是一个函数,它基于对基础查询中的行子集的计算,为每一行计算标量结果值。window function
is a function that, for each row, computes a scalar result value based on a calculation against a subset of the rows from the underlying query. The subset of rows is known as a 行的子集称为“窗口”,它基于与当前行相关的窗口描述符。window
and is based on a window descriptor that relates to the current row. The syntax for window functions uses a clause called 窗口函数的语法使用了一个名为OVER
, in which you provide the window specification.OVER
的子句,您可以在其中提供窗口规范。
If this sounds too technical, simply think of the need to perform a calculation against a set and return a single value. 如果这听起来太技术化,只需考虑对集合执行计算并返回单个值的需要。A classic example is aggregate calculations—such as 一个典型的例子是聚合计算,比如SUM
, COUNT
, and AVG
—but there are others as well, such as ranking and offset functions. SUM
、COUNT
和AVG
,但也有其他的,比如排名函数和偏移函数。You're familiar already with a couple of ways to apply aggregate calculations—one is by using grouped queries, and another is by using subqueries. 您已经熟悉了应用聚合计算的两种方法,一种是使用分组查询,另一种是使用子查询。However, both options have shortcomings that window functions elegantly resolve.然而,这两个选项都有窗口功能优雅解决的缺点。
Note
If the upcoming explanations about the design of window functions seem a bit overwhelming, bear with me until I show code samples, which should help clarify things. 如果即将到来的关于窗口函数设计的解释似乎有点让人不知所措,请耐心等待,直到我展示代码示例,这将有助于澄清问题。After you see a couple of code samples, you might want to reread the next several paragraphs.在看到两个代码示例后,您可能需要重新阅读接下来的几段。
Grouped queries do provide insights into new information in the form of aggregates, but they also cause you to lose something—the detail. 分组查询确实以聚合的形式提供对新信息的洞察,但它们也会导致您丢失一些细节。After you group the rows, all computations in the query have to be done in the context of the defined groups. 将行分组后,查询中的所有计算都必须在定义的组的上下文中完成。Often, you need to perform calculations that involve both detail and aggregate elements. 通常,您需要执行同时涉及细节和聚合元素的计算。Window functions are not limited in the same way. 窗口功能不受相同方式的限制。A window function is evaluated per detailed row, and it's applied to a subset of rows that is derived from the underlying query result set. 窗口函数按详细的行计算,并应用于从基础查询结果集派生的行子集。The result of the window function is a scalar value, which is added as another column to the query result. 窗口函数的结果是一个标量值,它作为另一列添加到查询结果中。In other words, unlike grouped functions, window functions don't cause you to lose the detail. 换句话说,与分组函数不同,窗口函数不会导致丢失细节。For example, suppose you want to query order values and return the current order value and the percent it constitutes out of the customer total. 例如,假设要查询订单值,并返回当前订单值及其占客户总数的百分比。If you group by the customer, you can get only the customer total. 如果按客户分组,则只能得到客户总数。With a window function, you can return the customer total in addition to the detail order value, and you can even compute the percent of the current order value out of the customer total. 通过一个窗口函数,您可以在返回详细订单值的同时返回客户总数,甚至可以计算当前订单值占客户总数的百分比。I'll demonstrate the code to achieve this later in the chapter.我将在本章后面演示实现这一点的代码。
As for subqueries, you can use them to apply a scalar aggregate calculation against a set, but their starting point is a fresh view of the data rather than the underlying query result set. 至于子查询,可以使用它们对集合应用标量聚合计算,但它们的起点是数据的新视图,而不是底层查询结果集。Suppose the underlying query has table operators, filters, and other query elements; those do not affect what a subquery sees as its starting point. 假设基础查询具有表运算符、筛选器和其他查询元素;这些不会影响子查询的起点。If you need the subquery to apply to the underlying query result set as its starting point, you need to repeat all the underlying query logic in the subquery. 如果需要将子查询应用于基础查询结果集作为其起点,则需要重复子查询中的所有基础查询逻辑。In contrast, a window function is applied to a subset of rows from the underlying query's result set—not a fresh view of the data. 相反,窗口函数应用于基础查询结果集中的行子集,而不是数据的新视图。Therefore, anything you add to the underlying query is automatically applicable to all window functions used in the query. 因此,添加到基础查询中的任何内容都会自动适用于查询中使用的所有窗口函数。If you want, you can further restrict the window.如果需要,可以进一步限制窗口。
Another benefit of using window functions is that you gain the ability to define order, when applicable, as part of the specification of the calculation. 使用窗口函数的另一个好处是,您可以在适用时将顺序定义为计算规范的一部分。This does not conflict with relational aspects of the result. 这与结果的相关方面并不冲突。That is, order is defined for the calculation and not confused with presentation ordering. 也就是说,顺序是为计算定义的,不会与表示顺序混淆。The ordering specification for the window function, if applicable, is different from the ordering specification for presentation. 窗口功能的订购规范(如适用)不同于演示文稿的订购规范。If you don't include a presentation 如果没有包含ORDER BY
clause, you have no assurances that the result will be returned in a particular order. ORDER BY
子句,则无法保证结果将按特定顺序返回。If you do decide to force a certain presentation ordering, the resulting ordering can be different than the ordering for the window function.如果您确实决定强制执行某个演示文稿排序,则生成的排序可能与窗口函数的排序不同。
Following is an example of a query against the 以下是对Sales.EmpOrders
view in the TSQLV4
database that uses a window aggregate function to compute the running-total values for each employee and month:TSQLV4
数据库中Sales.EmpOrders
视图的查询示例,该视图使用窗口聚合函数计算每个员工和月份的运行总值:
USE TSQLV4;
SELECT empid, ordermonth, val,
SUM(val) OVER(PARTITION BY empid
ORDER BY ordermonth
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;
Here's the output of this query, shown in abbreviated form:以下是该查询的输出,以缩写形式显示:
empid ordermonth val runval
------ ----------- -------- ----------
1 2014-07-01 1614.88 1614.88
1 2014-08-01 5555.90 7170.78
1 2014-09-01 6651.00 13821.78
1 2014-10-01 3933.18 17754.96
1 2014-11-01 9562.65 27317.61
...
2 2014-07-01 1176.00 1176.00
2 2014-08-01 1814.00 2990.00
2 2014-09-01 2950.80 5940.80
2 2014-10-01 5164.00 11104.80
2 2014-11-01 4614.58 15719.38
...
(192 row(s) affected)
There are up to three parts in the definition of a window function, which you specify in a clause called 窗口函数的定义中最多有三个部分,您可以在名为OVER
: the window-partition clause, window-order clause, and window-frame clause. OVER
的子句中指定它们:窗口分区子句、窗口顺序子句和窗口框架子句。An empty 一个空OVER()
clause represents the entire underlying query's result set. OVER()
子句表示整个基础查询的结果集。Then anything you add to the window specification essentially restricts the window.那么,添加到窗口规范中的任何内容实际上都会限制窗口。
The window-partition clause (窗口分区子句(PARTITION BY
) restricts the window to the subset of rows that have the same values in the partitioning columns as in the current row. PARTITION BY
)将窗口限制为分区列中的值与当前行中的值相同的行子集。In the last query, the window is partitioned by 在最后一个查询中,窗口按empid
. empid
进行分区。For an underlying row with employee ID 1, the window exposed to the function filters only the rows where the employee ID is 1.对于员工ID为1的基础行,函数公开的窗口仅筛选员工ID为1的行。
The window-order clause (窗口顺序子句(ORDER BY
) defines ordering, but don't confuse this with presentation ordering. ORDER BY
)定义了顺序,但不要将其与表示顺序混淆。In a window aggregate function, window ordering supports a frame specification. 在窗口聚合函数中,窗口排序支持框架规范。In a window ranking function, window ordering gives meaning to the rank. 在窗口排名函数中,窗口排序赋予排名意义。In our example, the window ordering is based on 在示例中,窗口排序基于ordermonth
.ordermonth
。
A window-frame clause (窗框子句(ROWS BETWEEN <top delimiter> AND <bottom delimiter>
) filters a frame, or a subset, of rows from the window partition between the two specified delimiters. ROWS BETWEEN <top delimiter> AND <bottom delimiter>
)在两个指定的分隔符之间筛选窗口分区中的一个框或行的子集。In this example, the frame is defined with no low boundary point (在本例中,框架的定义没有低边界点(UNBOUNDED PRECEDING
) and extends until the current row (CURRENT ROW
). UNBOUNDED PRECEDING
),并一直延伸到当前行(CURRENT ROW
)。In addition to the window-frame unit 除了窗框单元ROWS
, there's another unit called RANGE
, but it's implemented in T-SQL in a limited capacity.ROWS
之外,还有另一个称为RANGE
的单元,但它在T-SQL中以有限的容量实现。
Putting all these together, you get the running-total values for each employee and month from the function in the example.将所有这些放在一起,您可以从示例中的函数中获得每个员工和月份的运行总值。
Note that because the starting point of a window function is the underlying query's result set, and the underlying query's result set is generated only when you reach the 请注意,由于窗口函数的起点是基础查询的结果集,并且基础查询的结果集仅在到达SELECT
phase, window functions are allowed only in the SELECT
and ORDER BY
clauses of a query. SELECT
阶段时生成,因此窗口函数仅在查询的SELECT
和ORDER BY
子句中被允许。Mostly, you'll use window functions in the 大多数情况下,您将在SELECT
clause. SELECT
子句中使用窗口函数。If you need to refer to a window function in an earlier logical-query processing phase (such as 如果需要在早期的逻辑查询处理阶段(例如WHERE
), you need to use a table expression. WHERE
)中引用窗口函数,则需要使用表表达式。You specify the window function in the 在内部查询的SELECT
list of the inner query and assign it with an alias. SELECT
列表中指定窗口函数,并为其分配别名。Then, in the outer query, you can refer to that alias anywhere you like.然后,在外部查询中,您可以在任何地方引用该别名。
The windowing concept can take some getting used to, but when you're comfortable with it, you'll realize it's immensely powerful. 窗口的概念可能需要一些时间来适应,但当你对它感到满意时,你会意识到它非常强大。Beyond being used for the obvious data-analysis calculations, window functions can be used to perform a variety of tasks, typically more elegantly and more efficiently than with alternative methods.除了用于明显的数据分析计算之外,窗口函数还可用于执行各种任务,通常比其他方法更优雅、更高效。
The following sections cover ranking, offset, and aggregate window functions. 以下部分介绍排序、偏移和聚合窗口函数。Because this book is about fundamentals, I will not get into certain topics here. 因为这本书是关于基础知识的,所以我不会在这里讨论某些主题。Those topics include the optimization of window functions, distribution functions, and the 这些主题包括优化窗口函数、分布函数和RANGE
window frame unit. RANGE
窗框单元。For details about those items, see my book on the topic, 有关这些项目的详细信息,请参阅我的书《使用窗口函数的Microsoft SQL Server 2012高性能T-SQL》(Microsoft 出版社,2012)。Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions
(Microsoft Press, 2012).
You use ranking window functions to rank each row with respect to others in the window. 您可以使用排名窗口函数,根据窗口中的其他行对每一行进行排名。T-SQL supports four ranking functions: T-SQL支持四个排名函数:ROW_NUMBER
, RANK
, DENSE_RANK
, and NTILE
. ROW_NUMBER
、RANK
、DENSE_RANK
和NTILE
。The following query demonstrates the use of these functions:以下查询演示了这些函数的使用:
SELECT orderid, custid, val,
ROW_NUMBER() OVER(ORDER BY val) AS rownum,
RANK() OVER(ORDER BY val) AS rank,
DENSE_RANK() OVER(ORDER BY val) AS dense_rank,
NTILE(100) OVER(ORDER BY val) AS ntile
FROM Sales.OrderValues
ORDER BY val;
This query generates the following output, shown here in abbreviated form:此查询生成以下输出,以缩写形式显示:
orderid custid val rownum rank dense_rank ntile
----------- ----------- --------- ------- ------- ---------- -----
10782 12 12.50 1 1 1 1
10807 27 18.40 2 2 2 1
10586 66 23.80 3 3 3 1
10767 76 28.00 4 4 4 1
10898 54 30.00 5 5 5 1
10900 88 33.75 6 6 6 1
10883 48 36.00 7 7 7 1
11051 41 36.00 8 7 7 1
10815 71 40.00 9 9 8 1
10674 38 45.00 10 10 9 1
...
10691 63 10164.80 821 821 786 10
10540 63 10191.70 822 822 787 10
10479 65 10495.60 823 823 788 10
10897 37 10835.24 824 824 789 10
10817 39 10952.85 825 825 790 10
10417 73 11188.40 826 826 791 10
10889 65 11380.00 827 827 792 10
11030 71 12615.05 828 828 793 10
10981 34 15810.00 829 829 794 10
10865 63 16387.50 830 830 795 10
(830 row(s) affected)
The ROW_NUMBER
function assigns incremental sequential integers to the rows in the query result based on the mandatory window ordering. ROW_NUMBER
函数根据强制窗口顺序为查询结果中的行分配递增顺序整数。In the sample query, the ordering is based on the 在示例查询中,排序基于val
column; therefore, you can see in the output that when the value increases, the row number increases as well. val
列;因此,您可以在输出中看到,当值增加时,行号也会增加。However, even when the ordering value doesn't increase, the row number still must increase. 然而,即使排序值没有增加,行号也必须增加。Therefore, if the 因此,如果ROW_NUMBER
function's ORDER BY
list is not unique, as in the preceding example, the query is nondeterministic—that is, more than one correct result is possible. ROW_NUMBER
函数的ORDER BY
列表不是唯一的,如上例所示,则查询是不确定的,也就是说,可能有多个正确的结果。For example, observe that two rows with the value 36.00 got the row numbers 7 and 8. 例如,观察值为36.00的两行得到了行号7和8。Any arrangement of these row numbers would be considered correct. 这些行号的任何排列都将被认为是正确的。If you want to make a row number calculation deterministic, you need to add a tiebreaker to the 如果要使行号计算具有确定性,则需要向ORDER BY
list to make it unique. ORDER BY
列表中添加一个决胜器,以使其唯一。For example, you can add the 例如,可以添加orderid
column.orderid
列。
As mentioned, the 如前所述,ROW_NUMBER
function must produce unique values even when there are ties in the ordering values, making it nondeterministic when there are ties. ROW_NUMBER
函数必须生成唯一的值,即使排序值中存在关联,这使得存在关联时不确定性。If you want to produce the same rank value given the same ordering value, use the 如果要在给定相同排序值的情况下生成相同的秩值,请改用RANK
or DENSE_RANK
function instead. RANK
或DENSE_RANK
函数。The difference between the two is that 两者之间的区别在于,RANK
reflects the count of rows that have a lower ordering value than the current row (plus 1), whereas DENSE_RANK
reflects the count of distinct ordering values that are lower than the current row (plus 1). RANK
反映的是排序值低于当前行(加1)的行数,而DENSE_RANK
反映的是排序值低于当前行(加1)的不同行数。For example, in the sample query, a rank of 9 indicates eight rows have lower values. 例如,在示例查询中,9表示有8行的值较低。In the same row, a dense rank of 8 indicates seven rows that have distinct lower values.在同一行中,密集的秩8表示有七行的值明显较低。
You use the 通过为每一行分配一个平铺编号,可以使用NTILE
function to associate the rows in the result with tiles (equally sized groups of rows) by assigning a tile number to each row. NTILE
函数将结果中的行与平铺(大小相等的行组)相关联。You specify the number of tiles you are after and window ordering. 您可以指定要购买的瓷砖数量和窗口顺序。The sample query has 830 rows and the request was for 10 tiles; therefore, the tile size is 83 (830 divided by 10). 示例查询有830行,请求的是10个tile;因此,方块尺寸为83(830除以10)。Window ordering is based on the 窗口排序基于val
column. val
列。This means that the 83 rows with the lowest values are assigned tile number 1, the next 83 are assigned tile number 2, the next 83 are assigned tile number 3, and so on. 这意味着具有最低值的83行被分配了磁贴编号1,接下来的83行被分配了磁贴编号2,接下来的83行被分配了磁贴编号3,依此类推。If the number of rows can't be evenly divided by the number of tiles, an extra row is added to each of the first tiles from the remainder. 如果行数不能被平铺数平均分割,那么将在剩余的每一个平铺中添加一行。For example, if 102 rows and five tiles were requested, the first two tiles would have 21 rows instead of 20.例如,如果请求102行和5个磁贴,前两个磁贴将有21行而不是20行。
Like all window functions, ranking functions support a window partition clause. 与所有窗口函数一样,排名函数支持窗口分区子句。Remember that window partitioning restricts the window to only those rows that have the same values in the partitioning attributes as in the current row. 请记住,窗口分区仅将窗口限制为分区属性中的值与当前行中的值相同的行。For example, the expression 例如,表达式ROW_NUMBER() OVER(PARTITION BY custid ORDER BY val)
assigns row numbers independently for each customer. ROW_NUMBER() OVER(PARTITION BY custid ORDER BY val)
为每个客户独立分配行号。Here's the expression in a query:下面是查询中的表达式:
SELECT orderid, custid, val,
ROW_NUMBER() OVER(PARTITION BY custid
ORDER BY val) AS rownum
FROM Sales.OrderValues
ORDER BY custid, val;
This query generates the following output, shown here in abbreviated form:此查询生成以下输出,以缩写形式显示:
orderid custid val rownum
----------- ----------- ------------ -------
10702 1 330.00 1
10952 1 471.20 2
10643 1 814.50 3
10835 1 845.80 4
10692 1 878.00 5
11011 1 933.50 6
10308 2 88.80 1
10759 2 320.00 2
10625 2 479.75 3
10926 2 514.40 4
10682 3 375.50 1
...
(830 row(s) affected)
Remember that window ordering has nothing to do with presentation ordering and does not change the nature of the result from being relational. 请记住,窗口顺序与表示顺序无关,不会改变结果的关系性质。If you need to guarantee presentation ordering, you have to add a presentation 如果您需要保证演示文稿的顺序,您必须添加一个演示ORDER BY
clause, as I did in the last two.ORDER BY
子句,就像我在上两篇文章中所做的那样。
Window functions are logically evaluated as part of the 在计算SELECT
list, before the DISTINCT
clause is evaluated. DISTINCT
子句之前,窗口函数作为SELECT
列表的一部分进行逻辑计算。If you're wondering why it matters, I'll explain this with an example. 如果你想知道为什么这很重要,我会用一个例子来解释。Currently, the 目前,OrderValues
view has 830 rows with 795 distinct values in the val
column. OrderValues
视图有830行,val
列中有795个不同的值。Consider the following query and its output, shown here in abbreviated form:考虑以下查询及其输出,此处以缩写形式显示:
SELECT DISTINCT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM Sales.OrderValues;
val rownum
---------- -------
12.50 1
18.40 2
23.80 3
28.00 4
30.00 5
33.75 6
36.00 7
36.00 8
40.00 9
45.00 10
...
12615.05 828
15810.00 829
16387.50 830
(830 row(s) affected)
The ROW_NUMBER
function is processed before the DISTINCT
clause. ROW_NUMBER
函数在DISTINCT
子句之前处理。First, unique row numbers are assigned to the 830 rows from the 首先,从OrderValues
view. OrderValues
视图为830行分配唯一的行号。Then the 然后处理DISTINCT
clause is processed—but there are no duplicate rows to remove. DISTINCT
子句,但没有要删除的重复行。The DISTINCT
clause has no effect here. DISTINCT
子句在此处无效。If you want to assign row numbers to the 795 unique values, you need to come up with a different solution. 如果要将行号指定给795唯一值,则需要想出不同的解决方案。For example, because the 例如,由于GROUP BY
phase is processed before the SELECT
phase, you can use the following query:GROUP BY
阶段是在SELECT
阶段之前处理的,因此可以使用以下查询:
SELECT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM Sales.OrderValues
GROUP BY val;
This query generates the following output, shown here in abbreviated form:此查询生成以下输出,以缩写形式显示:
val rownum
--------- -------
12.50 1
18.40 2
23.80 3
28.00 4
30.00 5
33.75 6
36.00 7
40.00 8
45.00 9
48.00 10
...
12615.05 793
15810.00 794
16387.50 795
(795 row(s) affected)
Here, the 这里,GROUP BY
phase produces 795 groups for the 795 distinct values, and then the SELECT
phase produces a row for each group, with val
and a row number based on val
order. GROUP BY
阶段为795个不同的值生成795个组,然后SELECT
阶段为每个组生成一行,其中包含val
和基于val
顺序的行号。You'll get to work on an alternative solution in this chapter's exercises.在本章的练习中,您将学习另一种解决方案。
You use offset window functions to return an element from a row that is at a certain offset from the current row or at the beginning or end of a window frame. 使用偏移窗口函数可以从与当前行或窗框的开头或结尾有一定偏移的行返回元素。T-SQL supports two pairs of offset functions: T-SQL支持两对偏移函数:LAG
and LEAD
, and FIRST_VALUE
and LAST_VALUE
.LAG
和LEAD
,以及FIRST_VALUE
和LAST_VALUE
。
The LAG
and LEAD
functions support window partitions and window order clauses. LAG
和LEAD
函数支持窗口分区和窗口顺序子句。There's no relevance to window framing here. 此处与窗框无关。You use these functions to obtain an element from a row that is at a certain offset from the current row within the partition, based on the indicated ordering. 您可以使用这些函数根据指定的顺序,从与分区内当前行有一定偏移量的行中获取元素。The LAG
function looks before the current row, and the LEAD
function looks ahead. LAG
函数在当前行之前,LEAD
函数在前面。The first argument to the functions (which is mandatory) is the element you want to return; the second argument (optional) is the offset (1 if not specified); the third argument (optional) is the default value to return if there is no row at the requested offset (which is 函数的第一个参数(强制)是要返回的元素;第二个参数(可选)是偏移量(如果未指定,则为1);第三个参数(可选)是在请求的偏移量处没有行时要返回的默认值(如果未另行指定,则为NULL
if not specified otherwise).NULL
)。
As an example, the following query returns order information from the 例如,以下查询从OrderValues
view. OrderValues
视图返回订单信息。For each customer order, the query uses the 对于每个客户订单,查询使用LAG
function to return the value of the previous customer's order and the LEAD
function to return the value of the next customer's order:LAG
函数返回上一个客户订单的值,使用LEAD
函数返回下一个客户订单的值:
SELECT custid, orderid, val,
LAG(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid) AS prevval,
LEAD(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid) AS nextval
FROM Sales.OrderValues
ORDER BY custid, orderdate, orderid;
Here's the output of this query in abbreviated form:以下是该查询的简略输出:
custid orderid val prevval nextval
------- -------- -------- -------- --------
1 10643 814.50 NULL 878.00
1 10692 878.00 814.50 330.00
1 10702 330.00 878.00 845.80
1 10835 845.80 330.00 471.20
1 10952 471.20 845.80 933.50
1 11011 933.50 471.20 NULL
2 10308 88.80 NULL 479.75
2 10625 479.75 88.80 320.00
2 10759 320.00 479.75 514.40
2 10926 514.40 320.00 NULL
3 10365 403.20 NULL 749.06
3 10507 749.06 403.20 1940.85
3 10535 1940.85 749.06 2082.00
3 10573 2082.00 1940.85 813.37
3 10677 813.37 2082.00 375.50
3 10682 375.50 813.37 660.00
3 10856 660.00 375.50 NULL
...
(830 row(s) affected)
Because you didn't indicate the offset, the functions assumed 因为没有指明偏移量,所以函数默认为1
by default. 1
。LAG
obtained the value of the immediately previous customer's order, and LEAD
obtained it from the next order. LAG
获得了上一个客户订单的值,LEAD
从下一个订单获得了值。Also, because you didn't specify a third argument, 此外,由于没有指定第三个参数,当没有上一行或下一行时,默认情况下假定为NULL
was assumed by default when there was no previous or next row. NULL
。The expression 表达式LAG(val, 3, 0)
obtains the value from three rows back and returns 0 if a row wasn't found.LAG(val, 3, 0)
从后面三行中获取值,如果找不到行,则返回0。
In this example, I just returned the values from the previous and next orders, but normally you compute something based on the returned values. 在本例中,我只是返回了上一个订单和下一个订单的值,但通常您会根据返回的值计算一些内容。For example, you can compute the difference between the values of the current and previous customers' orders using 例如,您可以使用val – LAG(val) OVER(...)
. val – LAG(val) OVER(...)
计算当前和以前客户订单的值之间的差异。Or you can compute the difference between the current and next customers' orders using 或者,您可以使用val – LEAD(val) OVER(...)
.val – LEAD(val) OVER(...)
计算当前和下一个客户订单之间的差异。
You use the 使用FIRST_VALUE
and LAST_VALUE
functions to return an element from the first and last rows in the window frame, respectively. FIRST_VALUE
和LAST_VALUE
函数分别从窗口框架的第一行和最后一行返回元素。Therefore, these functions support window-partition, window-order, and window-frame clauses. 因此,这些函数支持窗口分区、窗口顺序和窗口框架子句。If you want the element from the first row in the window partition, use 如果需要窗口分区中第一行的元素,请将FIRST_VALUE
with the window frame extent ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. FIRST_VALUE
与ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
的窗框范围一起使用。If you want the element from the last row in the window partition, use 如果需要窗口分区中最后一行中的元素,请将LAST_VALUE
with the window frame extent ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
. LAST_VALUE
与ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
窗框范围一起使用。Note that if you specify 请注意,如果在没有窗口框架单元(例如ORDER BY
without a window-frame unit (such as ROWS
), the bottom delimiter will by default be CURRENT ROW
, and clearly that's not what you want with LAST_VALUE
. ROWS
)的情况下指定ORDER BY
,则默认情况下,底部分隔符将是当前行,显然这不是您想要的LAST_VALUE
。Also, for performance-related reasons that are beyond the scope of this book, you should be explicit about the window-frame extent even for 此外,出于超出本书范围的与性能相关的原因,即使对于FIRST_VALUE
.FIRST_VALUE
,也应该明确说明窗框的范围。
As an example, the following query uses the 例如,以下查询使用FIRST_VALUE
function to return the value of the first customer's order and the LAST_VALUE
function to return the value of the last customer's order:FIRST_VALUE
函数返回第一个客户订单的值,使用LAST_VALUE
函数返回最后一个客户订单的值:
SELECT custid, orderid, val,
FIRST_VALUE(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS firstval,
LAST_VALUE(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS lastval
FROM Sales.OrderValues
ORDER BY custid, orderdate, orderid;
This query generates the following output, shown here in abbreviated form:此查询生成以下输出,此处以缩写形式显示:
custid orderid val firstval lastval
------- -------- -------- --------- --------
1 10643 814.50 814.50 933.50
1 10692 878.00 814.50 933.50
1 10702 330.00 814.50 933.50
1 10835 845.80 814.50 933.50
1 10952 471.20 814.50 933.50
1 11011 933.50 814.50 933.50
2 10308 88.80 88.80 514.40
2 10625 479.75 88.80 514.40
2 10759 320.00 88.80 514.40
2 10926 514.40 88.80 514.40
3 10365 403.20 403.20 660.00
3 10507 749.06 403.20 660.00
3 10535 1940.85 403.20 660.00
3 10573 2082.00 403.20 660.00
3 10677 813.37 403.20 660.00
3 10682 375.50 403.20 660.00
3 10856 660.00 403.20 660.00
...
(830 row(s) affected)
As with 与LAG
and LEAD
, normally you compute something based on the returned values. LAG
和LEAD
一样,通常根据返回的值计算一些内容。For example, you can compute the difference between the current and the first customer's order values: 例如,您可以计算当前客户订单值和第一个客户订单值之间的差值:val – FIRST_VALUE(val) OVER(...)
. val – FIRST_VALUE(val) OVER(...)
。Or you can compute the difference between the current and last customer's order values: 或者,您可以计算当前客户订单值与上次客户订单值之间的差值:val – LAST_VALUE(val) OVER(...)
.val – LAST_VALUE(val) OVER(...)
。
You use aggregate window functions to aggregate the rows in the defined window. 使用聚合窗口函数聚合定义窗口中的行。They support window-partition, window-order, and window-frame clauses.它们支持窗口分区、窗口顺序和窗口框架子句。
I'll start with an example that doesn't involve ordering and framing. 我将从一个不涉及排序和框架的示例开始。Recall that using an 回想一下,使用带有空括号的OVER
clause with empty parentheses exposes a window of all rows from the underlying query's result set to the function. For example, SUM(val) OVER()
returns the grand total of all values. OVER
子句会向函数公开一个包含基础查询结果集中所有行的窗口。例如,SUM(val) OVER()
返回所有值的总和。If you do add a window-partition clause, you expose a restricted window to the function, with only those rows from the underlying query's result set that share the same values in the partitioning elements as in the current row. 如果确实添加了一个窗口分区子句,则向函数公开一个受限窗口,只有来自基础查询结果集中的那些行在分区元素中与当前行中共享相同的值。For example, 例如,SUM(val) OVER(PARTITION BY custid)
returns the total values for the current customer.SUM(val) OVER(PARTITION BY custid)
返回当前客户的总值。
Here's a query against 下面是对OrderValues
that returns, along with each order, the grand total of all order values, as well as the customer total:OrderValues
的查询,该查询将与每个订单一起返回所有订单值的总计以及客户总计:
SELECT orderid, custid, val,
SUM(val) OVER() AS totalvalue,
SUM(val) OVER(PARTITION BY custid) AS custtotalvalue
FROM Sales.OrderValues;
This query returns the following output, shown here in abbreviated form:此查询返回以下输出,此处以缩写形式显示:
orderid custid val totalvalue custtotalvalue
----------- ----------- ------------ ---------------- ---------------
10643 1 814.50 1265793.22 4273.00
10692 1 878.00 1265793.22 4273.00
10702 1 330.00 1265793.22 4273.00
10835 1 845.80 1265793.22 4273.00
10952 1 471.20 1265793.22 4273.00
11011 1 933.50 1265793.22 4273.00
10926 2 514.40 1265793.22 1402.95
10759 2 320.00 1265793.22 1402.95
10625 2 479.75 1265793.22 1402.95
10308 2 88.80 1265793.22 1402.95
10365 3 403.20 1265793.22 7023.98
...
(830 row(s) affected)
The totalvalue
column shows, for each row, the grand total of all values. totalvalue
列显示每行所有值的总计。The column 列custtotalvalue
has the total values for the current customer.custtotalvalue
包含当前客户的总值。
Important
As mentioned at the beginning of the chapter, one of the great advantages of window functions is that they don't hide the detail. 正如本章开头提到的,窗口函数的一个最大优点是它们不会隐藏细节。This means you can write expressions that mix detail and aggregates. 这意味着您可以编写混合细节和聚合的表达式。The next example demonstrates this.下一个示例演示了这一点。
As an example of mixing detail and aggregates, the following query calculates for each row the percentage of the current value out of the grand total, as well as out of the customer total:作为混合明细和聚合的示例,以下查询为每一行计算当前值在总计和客户总计中的百分比:
SELECT orderid, custid, val,
100. * val / SUM(val) OVER() AS pctall,
100. * val / SUM(val) OVER(PARTITION BY custid) AS pctcust
FROM Sales.OrderValues;
This query returns the following output, shown here in abbreviated form:此查询返回以下输出,此处以缩写形式显示:
orderid custid val pctall pctcust
------------ ------ ----------- ----------------------------- -----------------------------
10643 1 814.50 0.0643470029014691672941 19.0615492628130119354083
10692 1 878.00 0.0693636200705830925528 20.5476246197051252047741
10702 1 330.00 0.0260706089103558320528 7.7229113035338169904048
10835 1 845.80 0.0668197606556938265161 19.7940556985724315469225
10952 1 471.20 0.0372256694501808123130 11.0273812309852562602387
11011 1 933.50 0.0737482224782338461253 21.8464778843903580622513
10926 2 514.40 0.0406385491620819394181 36.6655974910011048148544
10759 2 320.00 0.0252805904585268674452 22.8090808653195053280587
10625 2 479.75 0.0379011352264945770526 34.1958017035532271285505
10308 2 88.80 0.0070153638522412057160 6.3295199401261627285362
10365 3 403.20 0.0318535439777438529809 5.7403352515240647040566
...
(830 row(s) affected)
Aggregate window functions also support a window frame. The frame allows for more sophisticated calculations, such as running and moving aggregates, YTD and MTD calculations, and others. 聚合窗口函数还支持窗口框架。该框架允许进行更复杂的计算,例如运行和移动骨料、YTD和MTD计算等。Let's re-examine the query I used in the introduction to the section about window functions:让我们重新检查一下我在介绍窗口函数一节中使用的查询:
SELECT empid, ordermonth, val,
SUM(val) OVER(PARTITION BY empid
ORDER BY ordermonth
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;
This query generates the following output (abbreviated):此查询生成以下输出(缩写):
empid ordermonth val runval
------ ----------- -------- ----------
1 2014-07-01 1614.88 1614.88
1 2014-08-01 5555.90 7170.78
1 2014-09-01 6651.00 13821.78
1 2014-10-01 3933.18 17754.96
1 2014-11-01 9562.65 27317.61
...
2 2014-07-01 1176.00 1176.00
2 2014-08-01 1814.00 2990.00
2 2014-09-01 2950.80 5940.80
2 2014-10-01 5164.00 11104.80
2 2014-11-01 4614.58 15719.38
...
(192 row(s) affected)
Each row in the EmpOrders
view holds information about the order activity for an employee and month. EmpOrders
视图中的每一行都保存有关员工和月份的订单活动的信息。The query returns for each employee and month the monthly value, plus the running-total values from the beginning of the employee's activity until the current month. 查询返回每个员工和月份的月度值,加上从员工活动开始到当月的运行总值。To apply the calculation to each employee independently, you partition the window by 要将计算单独应用于每个员工,请按empid
. empid
对窗口进行分区。Then you define ordering based on ordermonth
, giving meaning to the window frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. This frame means “all activity from the beginning of the partition until the current month.”
T-SQL supports other delimiters for the ROWS
window-frame unit. You can indicate an offset back from the current row as well as an offset forward. For example, to capture all rows from two rows before the current row until one row ahead, you use ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING
. Also, if you do not want an upper bound, you can use UNBOUNDED FOLLOWING
.
Because window functions are so profound and have so many practical uses, I urge you to invest the time and effort to get to know the concept well.因为窗口功能非常深刻,有很多实际用途,我敦促您投入时间和精力来更好地了解这个概念。
Pivoting data involves rotating data from a state of rows to a state of columns, possibly aggregating values along the way. 数据透视涉及将数据从行的状态旋转到列的状态,可能会在此过程中聚合值。In many cases, the pivoting of data is handled by the presentation layer for purposes such as reporting. 在许多情况下,数据透视由表示层处理,以用于报告等目的。This section teaches you how to handle pivoting with T-SQL for cases you do decide to handle in the database.本节将教您如何使用T-SQL处理决定在数据库中处理的情况。
For the rest of the topics in this chapter, I use a sample table called 对于本章的其余主题,我使用一个名为dbo.Orders
that you create and populate in the TSQLV4
database by running Listing 7-1.dbo.Orders
的示例表,您可以通过运行Listing 7-1在TSQLV4
数据库中创建和填充该表。
LISTING 7-1 Code to create and populate the 创建和填充dbo.Orders
tabledbo.Orders
表的代码
USE TSQLV4;
DROP TABLE IF EXISTS dbo.Orders;
CREATE TABLE dbo.Orders
(
orderid INT NOT NULL,
orderdate DATE NOT NULL,
empid INT NOT NULL,
custid VARCHAR(5) NOT NULL,
qty INT NOT NULL,
CONSTRAINT PK_Orders PRIMARY KEY(orderid)
);
INSERT INTO dbo.Orders(orderid, orderdate, empid, custid, qty)
VALUES
(30001, '20140802', 3, 'A', 10),
(10001, '20141224', 2, 'A', 12),
(10005, '20141224', 1, 'B', 20),
(40001, '20150109', 2, 'A', 40),
(10006, '20150118', 1, 'C', 14),
(20001, '20150212', 2, 'B', 12),
(40005, '20160212', 3, 'A', 10),
(20002, '20160216', 1, 'C', 20),
(30003, '20160418', 2, 'B', 15),
(30004, '20140418', 3, 'C', 22),
(30007, '20160907', 3, 'D', 30);
SELECT * FROM dbo.Orders;
The query at the end of Listing 7-1 produces the following output:Listing 7-1末尾的查询生成以下输出:
orderid orderdate empid custid qty
----------- ----------- -------------- --------- -----------
10001 2014-12-24 2 A 12
10005 2014-12-24 1 B 20
10006 2015-01-18 1 C 14
20001 2015-02-12 2 B 12
20002 2016-02-16 1 C 20
30001 2014-08-02 3 A 10
30003 2016-04-18 2 B 15
30004 2014-04-18 3 C 22
30007 2016-09-07 3 D 30
40001 2015-01-09 2 A 40
40005 2016-02-12 3 A 10
Suppose you need to query this table and return the total order quantity for each employee and customer. 假设您需要查询此表并返回每个员工和客户的总订单数量。The following grouped query achieves this task:以下分组查询完成此任务:
SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid, custid;
This query generates the following output:此查询生成以下输出:
empid custid sumqty
----------- --------- -----------
2 A 52
3 A 20
1 B 20
2 B 27
1 C 34
3 C 22
3 D 30
Suppose, however, you have a requirement to produce the output in the form shown in Table 7-1.然而,假设您需要以Table 7-1所示的形式生成输出。
TABLE 7-1 Pivoted view of total quantity per employee (on rows) and customer (on columns)每个员工(在行)和客户(在列)的总数量的数据透视视图
What you see in Table 7-1 is an aggregated and pivoted view of the data; the technique for generating this view is called 您在Table 7-1中看到的是数据的聚合和透视视图;生成此视图的技术称为pivoting
.pivoting
(转秩)。
Every pivoting request involves three logical processing phases, each with associated elements:每个数据透视请求都涉及三个逻辑处理阶段,每个阶段都有相关元素:
1. A grouping phase with an associated grouping or 具有关联分组或on rows
elementon rows
元素的分组阶段
2. A spreading phase with an associated spreading or 具有相关展开或on cols
elementon cols
元素的展开阶段
3. An aggregation phase with an associated aggregation element and aggregate function具有关联聚合元素和聚合函数的聚合阶段
In this example, you need to produce a single row in the result for each unique employee ID. 在本例中,需要为每个唯一的员工ID在结果中生成一行。This means the rows from the 这意味着dbo.Orders
table need to be grouped by the empid
attribute (the grouping element).dbo.Orders
表中的行需要按empid属性(分组元素)分组。
The dbo.Orders
table has a single column that holds all customer ID values and a single column that holds their ordered quantities. dbo.Orders
表有一列保存所有客户ID值,还有一列保存其订购数量。The pivoting task requires you to produce a different result column for each unique customer ID, holding the aggregated quantities for that customer. 数据透视任务要求您为每个唯一的客户ID生成不同的结果列,其中包含该客户的聚合数量。You can think of this process as “spreading” the quantities by customer ID. 您可以将此过程视为按客户ID“分散”数量。The spreading element in this case is the 本例中的展开元素是custid
attribute.custid
属性。
Finally, because pivoting involves grouping, you need to aggregate data to produce the result values in the “intersection” of the grouping (employee) and spreading (customer) elements. 最后,由于数据透视涉及分组,因此需要聚合数据,以在分组(employee)和展开(customer)元素的“交点”中生成结果值。You need to identify the aggregate function (您需要标识聚合函数(本例中为SUM
, in this case) and the aggregation element (the qty
attribute, in this case).SUM
)和聚合元素(本例中为qty
属性)。
To recap, pivoting involves grouping, spreading, and aggregating. 总而言之,数据透视涉及分组、展开和聚合。In this example, you group by 在本例中,您按empid
, spread (quantities) by custid
, and aggregate with SUM(qty)
. empid
分组,按custid
展开(数量),并用SUM(qty)
聚合。After you identify the elements involved in pivoting, the rest is just a matter of incorporating those elements in the right places in a generic query template for pivoting.在确定了数据透视所涉及的元素之后,剩下的只是将这些元素合并到通用查询模板中正确的位置以进行数据透视。
This chapter presents two solutions for pivoting—one based on an explicit grouped query and another with a table operator called 本章介绍两种数据透视解决方案,一种基于显式分组查询,另一种使用名为PIVOT
.PIVOT
的表运算符。
The solution using a grouped query handles all three phases in an explicit and straightforward manner.使用分组查询的解决方案以明确而直接的方式处理所有三个阶段。
The grouping phase is achieved with a 分组阶段是通过GROUP BY
clause—in this case, GROUP BY empid
.GROUP BY
子句实现的。在本例中,GROUP BY empid
。
The spreading phase is achieved in the 展开阶段在SELECT
clause with a CASE
expression for each target column. SELECT
子句中实现,每个目标列都有一个CASE
表达式。You need to know the spreading element values ahead of time and specify a separate expression for each. 您需要提前知道展开元素的值,并为每个值指定一个单独的表达式。Because in this case you need to “spread” the quantities of four customers (A, B, C, and D), there are four 因为在本例中,您需要“分散”四个客户(A、B、C和D)的数量,所以有四个CASE
expressions. CASE
表达式。For example, here's the 例如,下面是客户A的CASE
expression for customer A:CASE
表达式:
CASE WHEN custid = 'A' THEN qty END
This expression returns the quantity from the current row only when the current row represents an order for customer A; otherwise, the expression returns a 仅当当前行表示客户A的订单时,此表达式才返回当前行中的数量;否则,表达式将返回NULL
. NULL
。Remember that if an 请记住,如果在ELSE
clause is not specified in a CASE
expression, the implied default is ELSE NULL
. CASE
表达式中未指定ELSE
子句,则隐含的默认值为ELSE NULL
。This means that in the target column for customer A, only quantities associated with customer A appear as column values, and in all other cases the column values are 这意味着在客户A的目标列中,只有与客户A关联的数量显示为列值,在所有其他情况下,列值均为NULL
.NULL
。
If you don't know the spreading values ahead of time (the distinct customer IDs in this case), you need to query them from the data, construct the query string yourself by using an aggregate string concatenation method, and use dynamic SQL to execute it. 如果您事先不知道展开值(本例中是不同的客户ID),则需要从数据中查询它们,使用聚合字符串串联方法自己构造查询字符串,并使用动态SQL执行它。I'll demonstrate how to achieve this in Chapter 11, “Programmable objects.”我将在第11章“可编程对象”中演示如何实现这一点
Finally, the aggregation phase is achieved by applying the relevant aggregate function (最后,通过将相关的聚合函数(在本例中为SUM
, in this case) to the result of each CASE
expression. SUM
)应用于每个CASE
表达式的结果来实现聚合阶段。For example, here's the expression that produces the result column for customer A:例如,下面是为客户A生成结果列的表达式:
SUM(CASE WHEN custid = 'A' THEN qty END) AS A
Of course, depending on the request, you might need to use another aggregate function (such as 当然,根据请求的不同,您可能需要使用另一个聚合函数(如MAX, MIN,
or COUNT
).MAX
、MIN
或COUNT
)。
Here's the complete solution query that pivots order data, returning the total quantity for each employee (on rows) and customer (on columns):这是一个完整的解决方案查询,它以订单数据为中心,返回每个员工(在行)和客户(在列)的总数量:
SELECT empid,
SUM(CASE WHEN custid = 'A' THEN qty END) AS A,
SUM(CASE WHEN custid = 'B' THEN qty END) AS B,
SUM(CASE WHEN custid = 'C' THEN qty END) AS C,
SUM(CASE WHEN custid = 'D' THEN qty END) AS D
FROM dbo.Orders
GROUP BY empid;
This query produces the output shown earlier in Table 7-1. 此查询产生的输出如Table 7-1所示。SQL Server also generates the following warning in the Messages pane of SSMS:SQL Server还在SSMS的消息窗格中生成以下警告:
Warning: Null value is eliminated by an aggregate or other SET operation.
The warning notifies you that 该警告通知您,聚合函数会忽略NULLs
are ignored by an aggregate function, as we discussed earlier.NULL
,正如我们前面讨论的那样。
PIVOT
operatorPIVOT
运算符旋转The solution for pivoting based on an explicit grouped query is standard. 基于显式分组查询的数据透视解决方案是标准的。T-SQL also supports a proprietary table operator called T-SQL还支持一个名为PIVOT
that you can use to achieve pivoting in a more concise manner. PIVOT
的专有表运算符,您可以使用它以更简洁的方式实现数据透视。As a table operator, 作为一个表运算符,PIVOT
operates in the context of the FROM
clause like any other table operator (for example, JOIN
). PIVOT
在FROM
子句的上下文中进行操作,就像任何其他表运算符(例如JOIN
)一样。It operates on the source table or table expression provided to it as its left input, pivots the data, and returns a result table. 它对作为其左输入提供给它的源表或表表达式进行操作,旋转数据并返回结果表。The PIVOT
operator involves the same logical processing phases as described earlier (grouping, spreading, and aggregating), only it requires less code than the previous solution.PIVOT
运算符涉及与前面描述的相同的逻辑处理阶段(分组、展开和聚合),只是它需要的代码比之前的解决方案少。
The general form of a query with the 使用PIVOT
operator is shown here:PIVOT
运算符的查询的一般形式如下所示:
SELECT ...
FROM <input_table>
PIVOT(<agg_function>(<aggregation_element>)
FOR <spreading_element> IN (<list_of_target_columns>)) AS <result_table_alias>
WHERE ...;
In the parentheses of the 在PIVOT
operator, you specify the aggregate function (SUM
, in this example), aggregation element (qty
), spreading element (custid
), and the list of target column names (A, B, C, D
). PIVOT
运算符的括号中,指定聚合函数(本例中为SUM
)、聚合元素(qty
)、展开元素(custid
)和目标列名列表(A
、B
、C
、D
)。Following the parentheses of the 在PIVOT
operator, you specify an alias for the result table.PIVOT
运算符的括号之后,可以为结果表指定别名。
Note that with the 请注意,使用PIVOT
operator, you do not explicitly specify the grouping elements, removing the need for GROUP BY
in the query. PIVOT
运算符,您不会显式指定分组元素,从而消除了查询中对GROUP BY
的需要。The PIVOT
operator figures out the grouping elements implicitly by elimination. PIVOT
运算符通过消去隐式地计算出分组元素。The grouping elements are all attributes from the source table that were not specified as either the spreading element or the aggregation element. 分组元素是源表中未指定为扩展元素或聚合元素的所有属性。You must ensure that the source table for the 您必须确保PIVOT
operator has no attributes other than the grouping, spreading, and aggregation elements so that the implied selection of the grouping elements will be what you want. PIVOT
运算符的源表除了分组、扩展和聚合元素之外没有其他属性,以便分组元素的隐含选择将是您想要的。You achieve this by using a table expression that includes only the attributes you need as the input of the operator. 通过使用仅包含作为运算符输入所需的属性的表表达式,可以实现这一点。For example, here's the solution query to the original pivoting request, using the 例如,以下是使用PIVOT
operator:PIVOT
运算符对原始透视请求的解决方案查询:
SELECT empid, A, B, C, D
FROM (SELECT empid, custid, qty
FROM dbo.Orders) AS D
PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;
Instead of operating directly on the dbo.Orders
table, the PIVOT
operator operates on a derived table called D
that includes only the pivoting elements empid
, custid
, and qty
. PIVOT
运算符不是直接对dbo.Orders
表进行操作,而是对一个名为D
的派生表进行操作,该派生表只包括数据透视元素empid
、custid
和qty
。When you account for the spreading element, which is 当您考虑展开元素(custid
, and the aggregation element, which is qty
, what's left is the implied grouping element empid
.custid
)和聚合元素(qty
)时,剩下的是隐含的分组元素empid
。
This query returns the output shown earlier in Table 7-1.此查询返回Table 7-1中前面显示的输出。
To understand why you're required to use a table expression here, consider the following query that applies the 要理解为什么需要在这里使用表表达式,请考虑以下将PIVOT
operator directly to the dbo.Orders
table:PIVOT
运算符直接应用于dbo.Orders
表的查询:
SELECT empid, A, B, C, D
FROM dbo.Orders
PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;
The dbo.Orders
table contains the attributes orderid
, orderdate
, empid
, custid
, and qty
. dbo.Orders
表包含属性orderid
、orderdate
、empid
、custid
和qty
。Because the query specifies 因为查询指定custid
as the spreading element and qty
as the aggregation element, the remaining attributes (orderid
, orderdate
, and empid
) are all considered the grouping elements. custid
作为扩展元素,qty
作为聚合元素,所以其余属性(orderid
、orderdate
和empid
)都被视为分组元素。This query, therefore, returns the following output:因此,此查询返回以下输出:
empid A B C D
----------- ----------- ----------- ----------- -----------
2 12 NULL NULL NULL
1 NULL 20 NULL NULL
1 NULL NULL 14 NULL
2 NULL 12 NULL NULL
1 NULL NULL 20 NULL
3 10 NULL NULL NULL
2 NULL 15 NULL NULL
3 NULL NULL 22 NULL
3 NULL NULL NULL 30
2 40 NULL NULL NULL
3 10 NULL NULL NULL
(11 row(s) affected)
Because 因为orderid
is part of the grouping elements, you get a row for each order instead of a row for each employee. orderid
是分组元素的一部分,所以您可以为每个订单获取一行,而不是为每个员工获取一行。The logical equivalent of this query that uses the standard solution for pivoting has 使用标准解决方案进行数据透视的此查询的逻辑等效项在orderid
, orderdate
, and empid
listed in the GROUP BY
list as follows:GROUP BY
列表中列出了orderid
、orderdate
和empid
,如下所示:
SELECT empid,
SUM(CASE WHEN custid = 'A' THEN qty END) AS A,
SUM(CASE WHEN custid = 'B' THEN qty END) AS B,
SUM(CASE WHEN custid = 'C' THEN qty END) AS C,
SUM(CASE WHEN custid = 'D' THEN qty END) AS D
FROM dbo.Orders
GROUP BY orderid, orderdate, empid;
As a best practice with the 作为PIVOT
operator, you should always work with a table expression and not query the underlying table directly. PIVOT
运算符的最佳实践,您应该始终使用表表达式,而不是直接查询基础表。Even if currently the table contains only the columns that are supposed to take part in pivoting, you might add columns to the table in the future, rendering your queries incorrect. 即使当前表中只包含应该参与数据透视的列,将来也可能会向表中添加列,从而导致查询不正确。It is also important to enumerate the columns both in the table expression's inner query and in the outer query.在表表达式的内部查询和外部查询中枚举列也很重要。
As another example of a pivoting request, suppose that instead of returning employees on rows and customers on columns, you want it the other way around: the grouping element is 作为数据透视请求的另一个示例,假设不返回行中的员工和列中的客户,而是以另一种方式返回:分组元素是custid
, the spreading element is empid
, and the aggregation element and aggregate function remain SUM(qty)
. custid
,扩展元素是empid
,聚合元素和聚合函数保持SUM(qty)
。After you learn the “template” for a pivoting solution (with the grouped query or with the 学习了数据透视解决方案的“模板”(使用分组查询或使用PIVOT
operator), it's just a matter of fitting those elements in the right places. PIVOT
运算符)后,只需将这些元素拟合到正确的位置即可。The following solution query uses the 以下解决方案查询使用PIVOT
operator to achieve the result:PIVOT
运算符获得结果:
SELECT custid, [1], [2], [3]
FROM (SELECT empid, custid, qty
FROM dbo.Orders) AS D
PIVOT(SUM(qty) FOR empid IN([1], [2], [3])) AS P;
The employee IDs 1, 2, and 3 are values in the 员工ID 1、2和3是源表中empid
column in the source table, but in terms of the result, these values become target column names. empid
列中的值,但就结果而言,这些值成为目标列名。You must refer to them as identifiers in the 您必须在IN
clause. IN
子句中将它们作为标识符引用。When identifiers are irregular (for example, when they start with a digit), you need to delimit them—hence, the use of square brackets.当标识符不规则时(例如,当标识符以数字开头时),需要对其进行分隔,因此需要使用方括号。
This query returns the following output:此查询返回以下输出:
custid 1 2 3
--------- ----------- ----------- -----------
A NULL 52 20
B 20 27 NULL
C 34 NULL 22
D NULL NULL 30
Unpivoting
is a technique that rotates data from a state of columns to a state of rows. 是一种将数据从列状态旋转到行状态的技术。Usually, it involves querying a pivoted state of the data and producing from each source row multiple result rows, each with a different source column value. 通常,它涉及查询数据的数据透视状态,并从每个源行生成多个结果行,每个结果行具有不同的源列值。A common use case is to unpivot data you imported from a spreadsheet into the database for easier manipulation.一个常见的用例是将从电子表格导入的数据解压到数据库中,以便于操作。
Run the following code to create and populate a table called 运行以下代码在EmpCustOrders
in the TSQLV4
sample database. TSQLV4
示例数据库中创建并填充名为EmpCustOrders
的表。I'll use this table to demonstrate unpivoting techniques:我将使用此表演示取消激励的技巧:
USE TSQLV4;
DROP TABLE IF EXISTS dbo.EmpCustOrders;
CREATE TABLE dbo.EmpCustOrders
(
empid INT NOT NULL
CONSTRAINT PK_EmpCustOrders PRIMARY KEY,
A VARCHAR(5) NULL,
B VARCHAR(5) NULL,
C VARCHAR(5) NULL,
D VARCHAR(5) NULL
);
INSERT INTO dbo.EmpCustOrders(empid, A, B, C, D)
SELECT empid, A, B, C, D
FROM (SELECT empid, custid, qty
FROM dbo.Orders) AS D
PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;
SELECT * FROM dbo.EmpCustOrders;
Here's the output of the query against 以下是针对EmpCustOrders
showing its contents:EmpCustOrders
的查询输出,显示了其内容:
empid A B C D
----------- ----------- ----------- ----------- -----------
1 NULL 20 34 NULL
2 52 27 NULL NULL
3 20 NULL 22 30
The table has a row for each employee; a column for each of the four customers 该表为每个员工都有一行;四个客户A, B, C
, and D;
and the order quantity for each employee and customer. A
、B
、C
和D
各一列;以及每个员工和客户的订单数量。Notice that irrelevant intersections (employee-customer combinations that had no intersecting order activity) are represented by 请注意,不相关的交叉点(没有交叉订单活动的员工-客户组合)由NULLs
. Null
表示。Suppose you get a request to unpivot the data, requiring you to return a row for each employee and customer, along with the order quantity. 假设您收到一个请求,要求您为每个员工和客户返回一行数据以及订单数量。The result should look like this:结果应如下所示:
empid custid qty
----------- --------- -----------
1 B 20
1 C 34
2 A 52
2 B 27
3 A 20
3 C 22
3 D 30
In the following sections, I'll discuss two techniques for solving this problem—one using the 在下面的部分中,我将讨论解决此问题的两种技术,一种是使用APPLY
operator and another using the UNPIVOT
operator.APPLY
运算符,另一种是使用UNPIVOT
运算符。
APPLY
operatorAPPLY
运算符取消激活Unpivoting involves three logical processing phases: producing copies, extracting values, and eliminating irrelevant rows.取消激活涉及三个逻辑处理阶段:生成副本、提取值和消除不相关的行。
The first step in the solution involves producing multiple copies of each source row—one for each column you need to unpivot. 解决方案中的第一步涉及到为每个源行生成多个副本,每个源行对应一个需要取消归档的列。In this case, you need to produce a copy for each of the columns 在这种情况下,您需要为代表客户ID的A, B, C
, and D
, which represent customer IDs. A
、B
、C
和D
列中的每一列生成一个副本。You can achieve this step by applying a cross join between the 可以通过在EmpCustOrders
table and a table that has a row for each customer.EmpCustOrders
表和每个客户都有一行的表之间应用交叉联接来实现这一步骤。
If you already have a table of customers in your database, you can use that table in the cross join. 如果数据库中已有客户表,则可以在交叉联接中使用该表。If you don't have a table of customers, you can build a virtual one on the fly using a table-value constructor based on the 如果没有客户表,可以使用基于VALUES
clause. VALUES
子句的表值构造函数动态构建虚拟客户表。Here's the code that implements this step:以下是实现此步骤的代码:
SELECT *
FROM dbo.EmpCustOrders
CROSS JOIN (VALUES('A'),('B'),('C'),('D')) AS C(custid);
The VALUES
clause defines a set of four rows, each with a single customer ID value. VALUES
子句定义了一组四行,每行有一个客户ID值。The code defines a derived table called 代码基于此子句定义了一个名为C
based on this clause and names the only column in it custid
. C
的派生表,并将其中唯一的列命名为custid
。The code then applies a cross join between 然后,代码在EmpCustOrders
and C
.EmpCustOrders
和C
之间应用交叉连接。
Note
If you're not familiar yet with the 如果您还不熟悉VALUES
clause, it's described in detail in Chapter 8, “Data modification.”VALUES
子句,请参阅第8章“数据修改”中的详细说明
In this example, the query that implements the first step in the solution returns the following output:在本例中,实现解决方案中第一步的查询返回以下输出:
empid A B C D custid
----------- ----------- ----------- ----------- ----------- ------
1 NULL 20 34 NULL A
1 NULL 20 34 NULL B
1 NULL 20 34 NULL C
1 NULL 20 34 NULL D
2 52 27 NULL NULL A
2 52 27 NULL NULL B
2 52 27 NULL NULL C
2 52 27 NULL NULL D
3 20 NULL 22 30 A
3 20 NULL 22 30 B
3 20 NULL 22 30 C
3 20 NULL 22 30 D
As you can see, four copies were produced for each source row—one each for customers 如您所见,为每个源行生成了四个副本,分别为客户A, B, C
, and D.
A
、B
、C
和D
生成一个副本。
The second step in the solution is to extract a value from one of the original customer quantity columns (解决方案中的第二步是从一个原始客户数量列(A
, B
, C
, or D
) to return a single value column (call it qty
in our case). A
、B
、C
或D
)中提取一个值,以返回一个值列(在本例中称为qty
)。You need to extract the value from the column that corresponds to the current 您需要从与当前custid
value. custid
值相对应的列中提取值。If 如果custid
is 'A'
, the qty
column should return the value from column A
, if custid
is 'B'
, qty
should return the value from column B
, and so on. custid
为'A'
,则qty
列应返回A
列的值,如果custid
为'B'
,则qty
应返回B
列的值,依此类推。To achieve this step, you might think you can simply add the 要实现此步骤,您可能认为可以简单地将qty
column as a second column to each row in the table value constructor (the VALUES
clause), like this:qty
列作为第二列添加到表值构造函数(VALUES
子句)的每一行中,如下所示:
SELECT empid, custid, qty
FROM dbo.EmpCustOrders
CROSS JOIN (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty);
However, remember that a join treats its two inputs as a set; hence, there's no order between those inputs. 但是,请记住,联接将其两个输入视为一个集合;因此,这些输入之间没有顺序。You can't refer to the elements of either of the inputs when constructing the other. 在构造另一个输入时,不能引用其中任何一个输入的元素。In our case, the table-value constructor on the right side of the join has references to the columns 在示例中,联接右侧的表值构造函数引用了联接左侧(A
, B
, C
, and D
from the left side of the join (EmpCustOrders
). EmpCustOrders
)的列A
、B
、C
和D
。Consequently, when you try to run this code, you get the following errors:因此,当您尝试运行此代码时,会出现以下错误:
Msg 207, Level 16, State 1, Line 222
Invalid column name 'A'.
Msg 207, Level 16, State 1, Line 222
Invalid column name 'B'.
Msg 207, Level 16, State 1, Line 222
Invalid column name 'C'.
Msg 207, Level 16, State 1, Line 222
Invalid column name 'D'.
The solution is to use the 解决方案是使用CROSS APPLY
operator instead of the CROSS JOIN
operator. CROSS APPLY
运算符而不是CROSS JOIN
运算符。They are similar, but the former evaluates the left side first and then applies the right side to each left row, making the left side's elements accessible to the right side. 它们类似,但前者首先计算左侧,然后将右侧应用于每一行,从而使左侧的元素可供右侧访问。Here's the code implementing this step with the 下面是使用CROSS APPLY
operator:CROSS APPLY
运算符实现此步骤的代码:
SELECT empid, custid, qty
FROM dbo.EmpCustOrders
CROSS APPLY (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty);
This query runs successfully, returning the following output:此查询成功运行,返回以下输出:
empid custid qty
----------- --------- -----------
1 A NULL
1 B 20
1 C 34
1 D NULL
2 A 52
2 B 27
2 C NULL
2 D NULL
3 A 20
3 B NULL
3 C 22
3 D 30
Recall that in the original table 回想一下,在原始表中,NULLs
represent irrelevant intersections. NULL
值表示不相关的交点。In the result, there's typically no reason to keep irrelevant rows where 因此,通常没有理由在qty
is NULL
. qty
为NULL
的情况下保留不相关的行。The nice thing in our case is that the 在例子中,好的方面是创建列CROSS APPLY
operator, which created the column qty
, was processed in the FROM
clause, and the FROM
clause is evaluated before the WHERE
clause. qty
的CROSS APPLY
运算符在FROM
子句中处理,FROM
子句在WHERE
子句之前求值。This means that the 这意味着qty
column is accessible to expressions in the WHERE
clause. WHERE
子句中的表达式可以访问qty
列。To remove irrelevant rows, add a filter in the 要删除不相关的行,请在WHERE
clause that discards rows with a NULL
in the qty
column, like this:WHERE
子句中添加一个筛选器,该筛选器将丢弃qty
列中为NULL
的行,如下所示:
SELECT empid, custid, qty
FROM dbo.EmpCustOrders
CROSS APPLY (VALUES('A', A),('B', B),('C', C),('D', D)) AS C(custid, qty)
WHERE qty IS NOT NULL;
This query returns the following output:此查询返回以下输出:
empid custid qty
----------- --------- -----------
1 B 20
1 C 34
2 A 52
2 B 27
3 A 20
3 C 22
3 D 30
UNPIVOT
operatorUNPIVOT
运算符取消透视Unpivoting data involves producing two result columns from any number of source columns—one to hold the source column names as strings and another to hold the source column values. 取消激活数据涉及从任意数量的源列中生成两个结果列,一个用于将源列名作为字符串保存,另一个用于保存源列值。In this example, you need to unpivot the source columns 在本例中,需要取消对源列A, B, C
, and D
, producing the result names column custid
and values column qty
. A
、B
、C
和D
的透视,生成结果名称列custid
和值列qty
。Similar to the 与PIVOT
operator, T-SQL also supports the UNPIVOT
operator to enable you to unpivot data. PIVOT
运算符类似,T-SQL还支持UNPIVOT
运算符,使您能够取消透视数据。The general form of a query with the 使用UNPIVOT
operator is as follows:UNPIVOT
运算符的查询的一般形式如下所示:
SELECT ...
FROM <input_table>
UNPIVOT(<values_column> FOR <names_column> IN(<source_columns>)) AS <result_table_alias>
WHERE ...;
Like the 与PIVOT
operator, UNPIVOT
was also implemented as a table operator in the context of the FROM
clause. PIVOT
运算符一样,UNPIVOT
也是在FROM
子句的上下文中作为表运算符实现的。It operates on a source table or table expression (它对源表或表表达式(本例中为EmpCustOrders
in this case). EmpCustOrders
)进行操作。Within the parentheses of the 在UNPIVOT
operator, you specify the name you want to assign to the column that will hold the source-column values (qty
here), the name you want to assign to the column that will hold the source-column names (custid
), and the list of source-column names (A, B, C
, and D
). UNPIVOT
运算符的括号内,指定要分配给将保存源列值的列的名称(此处为qty
)、要分配给将保存源列名称(custid
)的列的名称以及源列名称列表(A
、B
、C
和D
)。Following the parentheses, you provide an alias to the table resulting from the table operator.在括号后面,您可以为表运算符生成的表提供别名。
Here's the query that uses the 以下是使用UNPIVOT
operator to handle our unpivoting task:UNPIVOT
运算符处理反透视任务的查询:
SELECT empid, custid, qty
FROM dbo.EmpCustOrders
UNPIVOT(qty FOR custid IN(A, B, C, D)) AS U;
Note that the 请注意,UNPIVOT
operator implements the same logical-processing phases described earlier—generating copies, extracting elements, and eliminating NULL
intersections. UNPIVOT
运算符实现了前面描述的相同逻辑处理阶段,即生成副本、提取元素和消除NULL
交点。However, the last phase is not optional as in the solution with the 但是,最后一个阶段不是可选的,因为在使用APPLY
operator. APPLY
运算符的解决方案中是可选的。When you need to apply the third phase, it's convenient to use the solution with the 当需要应用第三阶段时,可以方便地将解与UNPIVOT
operator because it's more concise. UNPIVOT
运算符一起使用,因为它更简洁。When you need to keep the rows with the 当需要将行保留为NULLs
, use the solution with the APPLY
operator.Null
时,请使用带有APPLY
运算符的解决方案。
When you're done, run the following code for cleanup:完成后,运行以下代码进行清理:
DROP TABLE IF EXISTS dbo.EmpCustOrders;
This section describes what a grouping set is and the features in T-SQL that support grouping sets.本节介绍什么是分组集以及T-SQL中支持分组集的功能。
A “分组集”是在分组查询(具有grouping set
is a set of expressions you group the data by in a grouped query (a query with a GROUP BY
clause). GROUP BY
子句的查询)中对数据分组所依据的一组表达式。The reason for using the term “set” here is that there's no significance to the order in which you specify the expressions in the 这里使用术语“set”的原因是,在GROUP BY
clause. GROUP BY
子句中指定表达式的顺序没有意义。Traditionally in SQL, a single grouped query defines a single grouping set. 传统上,在SQL中,单个分组查询定义单个分组集。For example, each of the following four queries defines a single grouping set:例如,以下四个查询中的每一个都定义了一个分组集:
SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid, custid;
SELECT empid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid;
SELECT custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY custid;
SELECT SUM(qty) AS sumqty
FROM dbo.Orders;
The first query defines the grouping set 第一个查询定义分组集(empid, custid)
; the second (empid)
, the third (custid)
, and the last query defines what's known as the empty grouping set, ()
. (empid, custid)
;第二个查询(empid)
、第三个查询(custid)
和最后一个查询定义了所谓的空分组集()
。This code returns four result sets—one for each of the four queries.这段代码为四个查询中的每一个返回四个结果集。
Suppose, for reporting purposes, that instead of wanting four separate result sets returned, you want a single unified result set. 假设出于报告目的,您不希望返回四个单独的结果集,而是希望返回一个统一的结果集。You can achieve this by using the 可以通过在查询之间使用UNION ALL
operator between the queries, after planting NULLs
as placeholders for columns that appear in one query but not others. UNION ALL
运算符来实现这一点,方法是将Null
作为一个查询中出现的列(而不是其他查询中出现的列)的占位符。Here's what the code looks like:代码如下所示:
SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid, custid
UNION ALL
SELECT empid, NULL, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY empid
UNION ALL
SELECT NULL, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY custid
UNION ALL
SELECT NULL, NULL, SUM(qty) AS sumqty
FROM dbo.Orders;
This code generates a single result set with the aggregates for all four groupings:此代码生成一个结果集,其中包含所有四个分组的聚合:
empid custid sumqty
----------- --------- -----------
2 A 52
3 A 20
1 B 20
2 B 27
1 C 34
3 C 22
3 D 30
1 NULL 54
2 NULL 79
3 NULL 72
NULL A 72
NULL B 47
NULL C 56
NULL D 30
NULL NULL 205
(15 row(s) affected)
Even though you managed to get what you were after, this solution has two main problems—the length of the code and performance. 尽管您设法获得了想要的东西,但此解决方案有两个主要问题:代码长度和性能。It's long because you have a separate query for each grouping set. 它很长,因为每个分组集都有一个单独的查询。Also, SQL Server needs to apply a separate scanning of the data for each query.此外,SQL Server需要对每个查询应用单独的数据扫描。
T-SQL supports standard features you can use to define multiple grouping sets in the same query. T-SQL支持可用于在同一查询中定义多个分组集的标准功能。Those are the 这些是GROUPING SETS
, CUBE
, and ROLLUP
subclauses of the GROUP BY
clause, and the GROUPING
and GROUPING_ID
functions. GROUP BY
子句的GROUPING SETS
、CUBE
和ROLLUP
亚子句,以及GROUPING
函数和GROUPING_ID
函数。The main use cases are reporting and data analysis. 主要用例是报告和数据分析。These features usually need the presentation layer to use more sophisticated GUI controls to display the data than the typical grid control with its columns and rows. 与具有列和行的典型网格控件相比,这些功能通常需要表示层使用更复杂的GUI控件来显示数据。But this book's focus is the T-SQL code in the database and not the presentation layer.但本书的重点是数据库中的T-SQL代码,而不是表示层。
GROUPING SETS
subclauseGROUPING SETS
亚子句The GROUPING SETS
subclause is a powerful enhancement to the GROUP BY
clause. GROUPING SETS
亚子句是GROUP BY
子句的一个强大增强。You can use it to define multiple grouping sets in the same query. 可以使用它在同一查询中定义多个分组集。Simply list the grouping sets you want, separated by commas within the parentheses of the 只需列出所需的分组集,在GROUPING SETS
subclause, and for each grouping set list the members, separated by commas, within parentheses. GROUPING SETS
亚子句的括号内用逗号分隔,并为每个分组集列出成员,在括号内用逗号分隔。For example, the following query defines four grouping sets: 例如,以下查询定义了四个分组集:(empid, custid)
, (empid)
, (custid)
, and ()
:(empid, custid)
、(empid)
、(custid)
和()
:
SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY
GROUPING SETS
(
(empid, custid),
(empid),
(custid),
()
);
The last grouping set is the empty grouping set representing the grand total. 最后一个分组集是表示总计的空分组集。This query is a logical equivalent of the previous solution that unified the result sets of four aggregate queries. 此查询在逻辑上等效于以前的解决方案,该解决方案统一了四个聚合查询的结果集。Only this one is much shorter, plus it gets optimized better. 只有这一个短得多,而且它得到了更好的优化。SQL Server typically needs fewer scans of the data than the number of grouping sets because it can roll up aggregates internally.SQL Server通常需要的数据扫描比分组集的数量少,因为它可以在内部汇总聚合。
CUBE
subclauseCUBE
亚子句The CUBE
subclause of the GROUP BY
clause provides an abbreviated way to define multiple grouping sets. GROUP BY
子句的CUBE
亚子句提供了定义多个分组集的简化方法。In the parentheses of the 在CUBE
subclause, you provide a set of members separated by commas, and you get all possible grouping sets that can be defined based on the input members. CUBE
亚子句的括号中,您提供了一组由逗号分隔的成员,并且可以得到所有可能的分组集,这些分组集可以根据输入成员定义。For example, 例如,CUBE(a, b, c)
is equivalent to GROUPING SETS( (a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), () )
. CUBE(a, b, c)
等价于GROUPING SETS( (a, b, c), (a, b), (a, c), (b, c), (a), (b), (c), () )
。In set theory, the set of all subsets of elements that can be produced from a particular set is called the 在集合论中,可以从特定集合生成的所有元素子集的集合称为“幂集”。power set
. You can think of the 可以将CUBE
subclause as producing the power set of grouping sets that can be formed from the given set of elements.CUBE
亚子句视为生成分组集的幂集,分组集可以由给定的元素集形成。
Instead of using the 不必在前面的查询中使用GROUPING SETS
subclause in the previous query to define the four grouping sets (empid, custid)
, (empid)
, (custid)
, and ()
, you can simply use CUBE(empid, custid)
. GROUPING SETS
亚子句来定义四个分组集(empid, custid)
、(empid)
、(custid)
和()
,只需使用CUBE(empid, custid)
。Here's the complete query:以下是完整的查询:
SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);
ROLLUP
subclauseROLLUP
亚子句The ROLLUP
subclause of the GROUP BY
clause also provides an abbreviated way to define multiple grouping sets. GROUP BY
子句的ROLLUP
亚子句还提供了定义多个分组集的简化方法。However, unlike the 然而,与CUBE
subclause, ROLLUP
doesn't produce all possible grouping sets. CUBE
亚子句不同,ROLLUP
不会生成所有可能的分组集。ROLLUP
assumes a hierarchy among the input members and produces only grouping sets that form leading combinations of the input members. ROLLUP
在输入成员之间采用层次结构,只生成构成输入成员前导组合的分组集。For example, whereas 例如,虽然CUBE(a, b, c)
produces all eight possible grouping sets, ROLLUP(a, b, c)
produces only four based on the hierarchy a>b>c. CUBE(a, b, c)
生成所有八个可能的分组集,但ROLLUP(a, b, c)
仅基于层次结构a>b>c生成四个分组集。It is the equivalent of specifying 它相当于指定GROUPING SETS( (a, b, c), (a, b), (a), () )
.GROUPING SETS( (a, b, c), (a, b), (a), () )
。
For example, suppose you want to return total quantities for all grouping sets that can be defined based on the time hierarchy of 例如,假设要返回所有分组集的总数量,这些分组集可以根据订单年、订单月、订单日的时间层次结构定义。order year, order month, order day
. You can use the 您可以使用GROUPING SETS
subclause and explicitly list all four possible grouping sets:GROUPING SETS
亚子句并显式列出所有四个可能的分组集:
GROUPING SETS(
(YEAR(orderdate), MONTH(orderdate), DAY(orderdate)),
(YEAR(orderdate), MONTH(orderdate)),
(YEAR(orderdate)),
() )
The logical equivalent that uses the 使用ROLLUP
subclause is much more concise:ROLLUP
亚子句的逻辑等价物要简洁得多:
ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate))
Here's the complete query you need to run:以下是您需要运行的完整查询:
SELECT
YEAR(orderdate) AS orderyear,
MONTH(orderdate) AS ordermonth,
DAY(orderdate) AS orderday,
SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate));
This query produces the following output:此查询生成以下输出:
orderyear ordermonth orderday sumqty
----------- -------------- ----------- -----------
2014 4 18 22
2014 4 NULL 22
2014 8 2 10
2014 8 NULL 10
2014 12 24 32
2014 12 NULL 32
2014 NULL NULL 64
2015 1 9 40
2015 1 18 14
2015 1 NULL 54
2015 2 12 12
2015 2 NULL 12
2015 NULL NULL 66
2016 2 12 10
2016 2 16 20
2016 2 NULL 30
2016 4 18 15
2016 4 NULL 15
2016 9 7 30
2016 9 NULL 30
2016 NULL NULL 75
NULL NULL NULL 205
GROUPING
and GROUPING_ID
functionsGROUPING
和GROUPING_ID
函数When you have a single query that defines multiple grouping sets, you might need to associate result rows and grouping sets. 当您有一个定义多个分组集的查询时,可能需要关联结果行和分组集。As long as all grouping elements are defined as 只要所有分组元素都定义为NOT NULL
, this is easy. NOT NULL
,这就很容易。For example, consider the following query:例如,考虑以下查询:
SELECT empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);
This query produces the following output:此查询生成以下输出:
empid custid sumqty
----------- --------- -----------
2 A 52
3 A 20
NULL A 72
1 B 20
2 B 27
NULL B 47
1 C 34
3 C 22
NULL C 56
3 D 30
NULL D 30
NULL NULL 205
1 NULL 54
2 NULL 79
3 NULL 72
Because both the 由于empid
and custid
columns were defined in the dbo.Orders
table as NOT NULL
, a NULL
in those columns can only represent a placeholder, indicating that the column did not participate in the current grouping set. empid
和custid
列在dbo.Orders
表中都定义为NOT NULL
,因此这些列中的NULL
只能表示占位符,表示该列未参与当前分组集。For example, all rows in which 例如,empid
is not NULL
and custid
is not NULL
are associated with the grouping set (empid, custid)
. empid
不为NULL
且custid
不为NULL
的所有行都与分组集(empid, custid)
关联。All rows in which empid
is not NULL
and custid
is NULL
are associated with the grouping set (empid)
, and so on.empid
不为NULL
且custid
为NULL
的所有行都与分组集(empid)
相关联,依此类推。
However, if a grouping column allows 但是,如果分组列在表中允许NULLs
in the table, you cannot tell for sure whether a NULL
in the result set originated from the data or is a placeholder for a nonparticipating member in a grouping set. NULL
值,则无法确定结果集中的NULL
值是源于数据还是分组集中非参与成员的占位符。One way to solve this problem is to use the 解决此问题的一种方法是使用GROUPING
function. GROUPING
函数。This function accepts a name of a column and returns 此函数接受列的名称,如果它是当前分组集的成员(详细信息元素),则返回0
if it is a member of the current grouping set (a detail element) and 1
otherwise (an aggregate element).0
,否则返回1
(聚合元素)。
Note
I find it counterintuitive that the 我发现,当元素不是分组集的一部分时,GROUPING
function returns 1
when the element isn't part of the grouping set and 0
when it is. GROUPING
函数返回1
,当元素是分组集的一部分时返回0
,这与直觉相反。To me, it would make more sense to return 对我来说,当元素是分组集的一部分时,返回1
when the element is part of the grouping set and 0
otherwise. 1
、否则返回0
更有意义。The current perspective is to use 当前透视图使用1
to indicate the element is an aggregate element and 0
when it's a detail element. 1
表示元素是聚合元素,使用0
表示元素是细节元素。You just need to make sure you realize this fact.你只需要确保你意识到这个事实。
For example, the following query invokes the 例如,以下查询为每个分组元素调用GROUPING
function for each of the grouping elements:GROUPING
函数:
SELECT
GROUPING(empid) AS grpemp,
GROUPING(custid) AS grpcust,
empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);
This query returns the following output:此查询返回以下输出:
grpemp grpcust empid custid sumqty
--------- ---------- ----------- --------- -----------
0 0 2 A 52
0 0 3 A 20
1 0 NULL A 72
0 0 1 B 20
0 0 2 B 27
1 0 NULL B 47
0 0 1 C 34
0 0 3 C 22
1 0 NULL C 56
0 0 3 D 30
1 0 NULL D 30
1 1 NULL NULL 205
0 1 1 NULL 54
0 1 2 NULL 79
0 1 3 NULL 72
(15 row(s) affected)
Now you don't need to rely on the 现在,您不再需要依赖NULLs
anymore to figure out the association between result rows and grouping sets. NULL
来确定结果行和分组集之间的关联。For example, all rows in which 例如,grpemp
is 0
and grpcust
is 0
are associated with the grouping set (empid, custid)
. grpemp
为0
且grpcust
为0
的所有行都与分组集(empid, custid)
关联。All rows in which grpemp
is 0
and grpcust
is 1
are associated with the grouping set (empid)
, and so on.grpemp
为0
且grpcust
为1
的所有行都与分组集(empid)
相关联,依此类推。
T-SQL supports another function, called T-SQL支持另一个名为GROUPING_ID
, that can further simplify the process of associating result rows and grouping sets. GROUPING_ID
的函数,它可以进一步简化关联结果行和分组集的过程。You provide the function with all elements that participate in any grouping set as inputs—for example, 您为函数提供了参与任何分组集的所有元素作为输入,例如GROUPING_ID(a, b, c, d)
—and the function returns an integer bitmap in which each bit represents a different input element—the rightmost element represented by the rightmost bit. GROUPING_ID(a, b, c, d)
,函数返回一个整数位图,其中每个位表示不同的输入元素最右边的元素由最右边的位表示。For example, the grouping set 例如,分组集(a, b, c, d)
is represented by the integer 0 (0×8 + 0×4 + 0×2 + 0×1). (a, b, c, d)
由整数0(0×8 + 0×4 + 0×2 + 0×1)表示。The grouping set 分组集(a, c)
is represented by the integer 5 (0×8 + 1×4 + 0×2 + 1×1), and so on.(a, c)
由整数5(0×8 + 1×4 + 0×2 + 1×1)表示,依此类推。
Instead of calling the 您可以一次性调用GROUPING
function for each grouping element as in the previous query, you can call the GROUPING_ID
function once and provide it with all grouping elements as input, as shown here:GROUPING_ID
函数并将所有分组元素作为输入提供给它,而不是像前面的查询那样为每个分组元素调用GROUPING
函数,如下所示:
SELECT
GROUPING_ID(empid, custid) AS groupingset,
empid, custid, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY CUBE(empid, custid);
This query produces the following output:此查询生成以下输出:
groupingset empid custid sumqty
-------------- ----------- --------- -----------
0 2 A 52
0 3 A 20
2 NULL A 72
0 1 B 20
0 2 B 27
2 NULL B 47
0 1 C 34
0 3 C 22
2 NULL C 56
0 3 D 30
2 NULL D 30
3 NULL NULL 205
1 1 NULL 54
1 2 NULL 79
1 3 NULL 72
Now you can easily figure out which grouping set each row is associated with. 现在,您可以很容易地找出每行与哪个分组集相关联。The integer 整数0
(binary 00) represents the grouping set (empid, custid)
; the integer 1
(binary 01) represents (empid)
; the integer 2
(binary 10) represents (custid)
; and the integer 3
(binary 11) represents ()
.0
(二进制00)表示分组集(empid, custid)
;整数1
(二进制01)表示(empid)
;整数2
(二进制10)表示(custid)
;整数3
(二进制11)表示()
。
This chapter covered window functions, pivoting and unpivoting data, and features related to grouping sets.本章介绍了窗口函数、数据透视和取消透视以及与分组集相关的功能。
You use window functions to perform data-analysis calculations in a more flexible and efficient manner than you can when using alternative methods. 与使用其他方法相比,您可以使用窗口函数以更灵活、更高效的方式执行数据分析计算。Window functions have numerous practical uses, so it's worth your time to get to know them well.窗口函数有许多实际用途,因此值得花时间深入了解它们。
I covered two techniques to handle pivoting: one using a standard grouped query and another using the more concise yet proprietary 我介绍了两种处理数据透视的技术:一种是使用标准分组查询,另一种是使用更简洁但专有的数据PIVOT
operator. PIVOT
运算符。I also covered two methods to handle unpivoting: one using the 我还介绍了两种处理取消激活的方法:一种是使用APPLY
operator, which allows you to control whether to remove rows with NULLs
, and another using the UNPIVOT
operator, which is more concise but removes rows with NULLs
as a mandatory step.APPLY
运算符,它允许您控制是否删除带null
的行;另一种是使用UNPIVOT
运算符,它更简洁,但作为一个强制步骤删除带null
的行。
T-SQL supports features that make the handling of grouping sets flexible and efficient: the T-SQL支持使分组集的处理灵活高效的特性:GROUPING SETS
, CUBE
, and ROLLUP
subclauses and the GROUPING
and GROUPING_ID
functions.GROUPING SETS
、CUBE
和ROLLUP
亚子句以及GROUPING
和GROUPING_ID
函数。
This section provides exercises to help you familiarize yourself with the subjects discussed in Chapter 7. 本节提供练习,帮助您熟悉第7章中讨论的主题。All exercises for this chapter involve querying the 本章的所有练习都涉及查询dbo.Orders
table in the TSQLV4
database that you created and populated earlier in this chapter by running the code in Listing 7-1.TSQLV4
数据库中的dbo.Orders
表,您在本章前面通过运行Listing 7-1中的代码创建并填充了该数据库。
Write a query against the 针对dbo.Orders
table that computes both a rank and a dense rank for each customer order, partitioned by custid
and ordered by qty
:dbo.Orders
表编写一个查询,该表计算每个客户订单的排名和密集排名,按custid
划分,按qty
排序:
Table involved: 涉及的表:TSQLV4
database, dbo.Orders
tableTSQLV4
数据库、dbo.Orders
表
Desired output:所需输出:
custid orderid qty rnk drnk
------ ----------- ----------- -------------------- --------------------
A 30001 10 1 1
A 40005 10 1 1
A 10001 12 3 2
A 40001 40 4 3
B 20001 12 1 1
B 30003 15 2 2
B 10005 20 3 3
C 10006 14 1 1
C 20002 20 2 2
C 30004 22 3 3
D 30007 30 1 1
Earlier in the chapter in the section “Ranking window functions,” I provided the following query against the 在“排名窗口函数”一节的前面一章中,我针对Sales.OrderValues
view to return distinct values and their associated row numbers:Sales.OrderValues
视图提供了以下查询,以返回不同的值及其关联的行号:
SELECT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM Sales.OrderValues
GROUP BY val;
Can you think of an alternative way to achieve the same task?你能想出另一种方法来完成同样的任务吗?
Table involved: 涉及的表:TSQLV4
database, Sales.OrderValues
viewTSQLV4
数据库、Sales.OrderValues
视图
Desired output:所需输出:
val rownum
--------- -------
12.50 1
18.40 2
23.80 3
28.00 4
30.00 5
33.75 6
36.00 7
40.00 8
45.00 9
48.00 10
...
12615.05 793
15810.00 794
16387.50 795
(795 row(s) affected)
Write a query against the 针对dbo.Orders
table that computes for each customer order both the difference between the current order quantity and the customer's previous order quantity and
the difference between the current order quantity and the customer's next order quantity:dbo.Orders
表编写一个查询,该表计算每个客户订单的当前订单数量与客户以前订单数量之间的差异,以及当前订单数量与客户下一订单数量之间的差异:
Table involved: 涉及的表:TSQLV4
database, dbo.Orders
tableTSQLV4
数据库、dbo.Orders
表
Desired output:所需输出:
custid orderid qty diffprev diffnext
------ ----------- ----------- ----------- -----------
A 30001 10 NULL -2
A 10001 12 2 -28
A 40001 40 28 30
A 40005 10 -30 NULL
B 10005 20 NULL 8
B 20001 12 -8 -3
B 30003 15 3 NULL
C 30004 22 NULL 8
C 10006 14 -8 -6
C 20002 20 6 NULL
D 30007 30 NULL NULL
Write a query against the 针对dbo.Orders
table that returns a row for each employee, a column for each order year, and the count of orders for each employee and order year:dbo.Orders
表编写查询,该表为每个员工返回一行,为每个订单年返回一列,并为每个员工和订单年返回订单计数:
Table involved: 涉及的表:TSQLV4
database, dbo.Orders
tableTSQLV4
数据库、dbo.Orders
表
Desired output:所需输出:
empid cnt2014 cnt2015 cnt2016
----------- ----------- ----------- -----------
1 1 1 1
2 1 2 1
3 2 0 2
Run the following code to create and populate the 运行以下代码创建并填充EmpYearOrders
table:EmpYearOrders
表:
USE TSQLV4;
DROP TABLE IF EXISTS dbo.EmpYearOrders;
CREATE TABLE dbo.EmpYearOrders
(
empid INT NOT NULL
CONSTRAINT PK_EmpYearOrders PRIMARY KEY,
cnt2014 INT NULL,
cnt2015 INT NULL,
cnt2016 INT NULL
);
INSERT INTO dbo.EmpYearOrders(empid, cnt2014, cnt2015, cnt2016)
SELECT empid, [2014] AS cnt2014, [2015] AS cnt2015, [2016] AS cnt2016
FROM (SELECT empid, YEAR(orderdate) AS orderyear
FROM dbo.Orders) AS D
PIVOT(COUNT(orderyear)
FOR orderyear IN([2014], [2015], [2016])) AS P;
SELECT * FROM dbo.EmpYearOrders;
Here's the output for the query:以下是查询的输出:
empid cnt2014 cnt2015 cnt2016
----------- ----------- ----------- -----------
1 1 1 1
2 1 2 1
3 2 0 2
Write a query against the 针对EmpYearOrders
table that unpivots the data, returning a row for each employee and order year with the number of orders. EmpYearOrders
表编写一个查询,该表取消对数据的透视,为每个员工和订单年返回一行订单数。Exclude rows in which the number of orders is 0 (in this example, employee 3 in the year 2015).排除订单数为0的行(在本例中为2015年的员工3)。
Desired output:所需输出:
empid orderyear numorders
----------- ----------- -----------
1 2014 1
1 2015 1
1 2016 1
2 2014 1
2 2015 2
2 2016 1
3 2014 2
3 2016 2
Write a query against the 针对dbo.Orders
table that returns the total quantities for each of the following: (employee, customer, and order year), (employee and order year), and (customer and order year). dbo.Orders
表编写一个查询,该表返回以下各项的总数量:(employee、customer和order year)、(employee和order year)和(customer和order year)。Include a result column in the output that uniquely identifies the grouping set with which the current row is associated:在输出中包括一个结果列,该列唯一标识与当前行关联的分组集:
Table involved: 涉及的表:TSQLV4
database, dbo.Orders
tableTSQLV4
数据库、dbo.Orders
表
Desired output:所需输出:
groupingset empid custid orderyear sumqty
-------------- ----------- --------- ----------- -----------
0 2 A 2014 12
0 3 A 2014 10
4 NULL A 2014 22
0 2 A 2015 40
4 NULL A 2015 40
0 3 A 2016 10
4 NULL A 2016 10
0 1 B 2014 20
4 NULL B 2014 20
0 2 B 2015 12
4 NULL B 2015 12
0 2 B 2016 15
4 NULL B 2016 15
0 3 C 2014 22
4 NULL C 2014 22
0 1 C 2015 14
4 NULL C 2015 14
0 1 C 2016 20
4 NULL C 2016 20
0 3 D 2016 30
4 NULL D 2016 30
2 1 NULL 2014 20
2 2 NULL 2014 12
2 3 NULL 2014 32
2 1 NULL 2015 14
2 2 NULL 2015 52
2 1 NULL 2016 20
2 2 NULL 2016 15
2 3 NULL 2016 40
(29 row(s) affected)
When you're done with the exercises in this chapter, run the following code for cleanup:完成本章中的练习后,请运行以下代码进行清理:
DROP TABLE IF EXISTS dbo.Orders;
This exercise is very technical. Figuring it out is just a matter of being familiar with the syntax for window-ranking functions. 这个练习非常技术性。弄清楚这一点只需熟悉窗口排名函数的语法即可。Here's the solution query, returning for each order both the rank and the dense rank, partitioned by 以下是解决方案查询,为每个订单返回排名和密集排名,按custid
and ordered by qty
:custid
划分,按qty
排序:
SELECT custid, orderid, qty,
RANK() OVER(PARTITION BY custid ORDER BY qty) AS rnk,
DENSE_RANK() OVER(PARTITION BY custid ORDER BY qty) AS drnk
FROM dbo.Orders;
Another way to handle this task is to write a query that returns distinct values without a row number computation, define a table expression based on this query, and then compute row numbers in the outer query against the table expression. 处理此任务的另一种方法是编写一个返回不同值的查询,而无需计算行号,基于此查询定义一个表表达式,然后根据表表达式计算外部查询中的行号。Here's the solution query:以下是解决方案查询:
WITH C AS
(
SELECT DISTINCT val
FROM Sales.OrderValues
)
SELECT val, ROW_NUMBER() OVER(ORDER BY val) AS rownum
FROM C;
You use the window offset functions 根据指定的分区和排序规范,可以使用窗口偏移函数LAG
and LEAD
to return an element from the previous row and the next row, respectively, based on the indicated partitioning and ordering specification. LAG
和LEAD
分别返回前一行和下一行的元素。In this exercise, you need to perform the calculations within each customer's orders; hence, the window partitioning should be based on 在本练习中,您需要在每个客户的订单中执行计算;因此,窗口分区应该基于custid
. custid
。As for ordering, use 对于订购,使用orderdate
as the first ordering column and orderid
as the tiebreaker. orderdate
作为第一个订购列,orderid
作为分阶段符。Here's the complete solution query:以下是完整的解决方案查询:
SELECT custid, orderid, qty,
qty - LAG(qty) OVER(PARTITION BY custid
ORDER BY orderdate, orderid) AS diffprev,
qty - LEAD(qty) OVER(PARTITION BY custid
ORDER BY orderdate, orderid) AS diffnext
FROM dbo.Orders;
This query is a good example that shows you can mix detail elements from the row with window functions in the same expression.此查询是一个很好的示例,它显示可以在同一表达式中将行中的细节元素与窗口函数混合使用。
Solving a pivoting problem is all about identifying the elements involved: the grouping element, spreading element, aggregation element, and aggregate function. 解决数据透视问题就是要识别所涉及的元素:分组元素、扩展元素、聚合元素和聚合函数。After you identify those, you simply fit them into the “template” query for pivoting—whether it's the solution with the grouped query or the solution using the 在识别出它们之后,只需将它们放入“模板”查询中,以进行数据透视,无论是使用分组查询的解决方案还是使用PIVOT
operator.PIVOT
运算符的解决方案。
In this exercise, the grouping element is the employee (在本练习中,分组元素是雇员(empid
), the spreading element is order year (YEAR(orderdate)
), and the aggregate function is COUNT
; however, identifying the aggregation element is not that straightforward. empid
),扩展元素是order year(YEAR(orderdate)
),聚合函数是COUNT
;然而,识别聚合元素并不是那么简单。You want the 您希望COUNT
aggregate function to count matching rows—you don't really care which attribute it counts. COUNT
聚合函数对匹配行进行计数,而不关心它对哪个属性进行计数。In other words, you can use any attribute you want, as long as the attribute does not allow 换句话说,您可以使用任何想要的属性,只要该属性不允许NULLs
, because aggregate functions ignore NULLs
.Null
,因为聚合函数会忽略Null
。
If it doesn't really matter which attribute you use as the input to the 如果使用哪个属性作为COUNT
aggregate, why not use the same attribute you already use as the spreading element? COUNT
聚合的输入并不重要,为什么不使用已经用作扩展元素的相同属性呢?In this case, you can use the order year as both the spreading element and aggregation element.在这种情况下,您可以使用订单年作为扩展元素和聚合元素。
Now that you've identified all pivoting elements, you're ready to write the complete solution. 现在,您已经识别了所有枢轴元素,可以编写完整的解决方案了。Here's the solution query without using the 以下是不使用PIVOT
operator:PIVOT
运算符的解决方案查询:
USE TSQLV4;
SELECT empid,
COUNT(CASE WHEN orderyear = 2014 THEN orderyear END) AS cnt2014,
COUNT(CASE WHEN orderyear = 2015 THEN orderyear END) AS cnt2015,
COUNT(CASE WHEN orderyear = 2016 THEN orderyear END) AS cnt2016
FROM (SELECT empid, YEAR(orderdate) AS orderyear
FROM dbo.Orders) AS D
GROUP BY empid;
Recall that if you do not specify an 回想一下,如果没有在ELSE
clause in a CASE
expression, an implicit ELSE NULL
is assumed. CASE
表达式中指定ELSE
子句,则假定为隐式ELSE NULL
。Thus, the 因此,CASE
expression produces non-NULLs
only for matching orders (orders placed by the current employee in the current order year), and only those matching orders are taken into consideration by the COUNT
aggregate.CASE
表达式仅为匹配的订单(当前员工在当前订单年下的订单)生成非Null
,并且计数聚合仅考虑那些匹配的订单。
Notice that even though this solution does not require you to use a table expression, I used one here to alias the 请注意,尽管此解决方案不需要使用表表达式,但我在这里使用了一个表表达式来将YEAR(orderdate)
expression as orderyear
to avoid repeating the expression.YEAR(orderdate)
表达式别名为orderyear
,以避免重复该表达式。
Here's the solution query that uses the 以下是使用PIVOT
operator:PIVOT
运算符的解决方案查询:
SELECT empid, [2014] AS cnt2014, [2015] AS cnt2015, [2016] AS cnt2016
FROM (SELECT empid, YEAR(orderdate) AS orderyear
FROM dbo.Orders) AS D
PIVOT(COUNT(orderyear)
FOR orderyear IN([2014], [2015], [2016])) AS P;
As you can see, it's just a matter of fitting the pivoting elements in the right places.正如您所看到的,这只是将枢轴元素安装到正确位置的问题。
If you prefer to use your own target column names and not the ones based on the actual data, you can provide your own aliases in the 如果希望使用自己的目标列名而不是基于实际数据的列名,可以在SELECT
list. SELECT
列表中提供自己的别名。In this query, I aliased the result columns 在这个查询中,我将结果列[2014]
, [2015]
, and [2016]
as cnt2014
, cnt2015
, and cnt2016
, respectively.[2014]
、[2015]
和[2016]
分别取别名为cnt2014
、cnt2015
和cnt2016
。
This exercise involves a request to unpivot the source columns 此练习涉及请求将源列cnt2014
, cnt2015
, and cnt2016
to two target columns—orderyear
to hold the year that the source column name represents and numorders
to hold the source-column value. cnt2014
、cnt2015
和cnt2016
解压到两个目标列orderyear
以保存源列名表示的年份,并解压以保存源列值。You can use the solutions I showed in the chapter as the basis for solving this exercise with a couple of small revisions.您可以使用我在本章中展示的解决方案作为解决此练习的基础,并进行一些小的修改。
In the examples I used in the chapter, 在我在本章中使用的示例中,表中的NULLs
in the table represented irrelevant column values. null
表示不相关的列值。The unpivoting solutions I presented filtered out rows with 我提供的取消激活解决方案筛选掉了带有NULLs
. null
的行。The EmpYearOrders
table has no NULLs
, but it does have zeros in some cases, and the request is to filter out rows with 0 number of orders. EmpYearOrders
表没有NULL
值,但在某些情况下它确实有零,请求是筛选出订单数为0的行。With the solution that is based on the 对于基于APPLY
operator, simply use the predicate numorders
<> 0 instead of using IS NOT NULL
, like this:APPLY
运算符的解决方案,只需使用谓词numorders<>0
,而不是使用IS NOT NULL
,如下所示:
SELECT empid, orderyear, numorders
FROM dbo.EmpYearOrders
CROSS APPLY (VALUES(2014, cnt2014),
(2015, cnt2015),
(2016, cnt2016)) AS A(orderyear, numorders)
WHERE numorders <> 0;
As for the solution that uses the 至于使用UNPIVOT
operator, remember that it eliminates NULLs
as an integral part of its logic. UNPIVOT
运算符的解决方案,请记住,它消除了NULL
,将其作为其逻辑的一个组成部分。However, it does not eliminate zeros—you have to take care of eliminating zeros yourself by adding a 但是,它不能消除零您必须通过添加WHERE
clause, like this:WHERE
子句来自行消除零,如下所示:
SELECT empid, CAST(RIGHT(orderyear, 4) AS INT) AS orderyear, numorders
FROM dbo.EmpYearOrders
UNPIVOT(numorders FOR orderyear IN(cnt2014, cnt2015, cnt2016)) AS U
WHERE numorders <> 0;
Notice the expression used in the 请注意SELECT
list to produce the orderyear
result column: CAST(RIGHT(orderyear, 4) AS INT)
. SELECT
列表中用于生成orderyear
结果列的表达式:CAST(RIGHT(orderyear, 4) AS INT)
。The original column names that the query unpivots are 查询cnt2014
, cnt2015
, and cnt2016
. unpivot
的原始列名是cnt2014
、cnt2015
和cnt2016
。These column names become the values 这些列名分别成为'cnt2014'
, 'cnt2015'
, and 'cnt2016'
, respectively, in the orderyear
column in the result of the UNPIVOT
operator. UNPIVOT
运算符结果中orderyear
列中的值'cnt2014'
、'cnt2015'
和'cnt2016'
。The purpose of this expression is to extract the four rightmost characters representing the order year and convert the value to an integer. 此表达式的目的是提取表示订单年份的四个最右边的字符,并将该值转换为整数。This manipulation was not required in the standard solution because the constants used to construct the table expression 标准溶液中不需要此操作,因为用于构造表表达式A
were specified explicitly.A
的常量是显式指定的。
You can use the 您可以使用GROUPING SETS
subclause to list the requested grouping sets and the GROUPING_ID
function to produce a unique identifier for the grouping sets. GROUPING SETS
子类列出请求的分组集,并使用GROUPING_ID
函数为分组集生成唯一标识符。Here's the complete solution query:以下是完整的解决方案查询:
SELECT
GROUPING_ID(empid, custid, YEAR(Orderdate)) AS groupingset,
empid, custid, YEAR(Orderdate) AS orderyear, SUM(qty) AS sumqty
FROM dbo.Orders
GROUP BY
GROUPING SETS
(
(empid, custid, YEAR(orderdate)),
(empid, YEAR(orderdate)),
(custid, YEAR(orderdate))
);
The requested grouping sets are neither a power set nor a rollup of some set of attributes. 请求的分组集既不是幂集,也不是某些属性集的汇总。Therefore, you cannot use either the 因此,您不能使用CUBE
or ROLLUP
subclause to further abbreviate the code.CUBE
或ROLLUP
子类来进一步缩写代码。