The 查询的FROM
clause of a query is the first clause to be logically processed, and within the FROM
clause, table operators operate on input tables. FROM
子句是第一个逻辑处理的子句,在FROM
子句中,表运算符对输入表进行操作。T-SQL supports four table operators: T-SQL支持四个表运算符:JOIN
, APPLY
, PIVOT
, and UNPIVOT
. JOIN
、APPLY
、PIVOT
和UNPIVOT
。The JOIN
table operator is standard, whereas APPLY
, PIVOT
, and UNPIVOT
are T-SQL extensions to the standard. JOIN
表运算符是标准的,而APPLY
、PIVOT
和UNPIVOT
是标准的T-SQL扩展。Each table operator acts on tables provided to it as input, applies a set of logical query processing phases, and returns a table result. 每个表运算符对作为输入提供给它的表进行操作,应用一组逻辑查询处理阶段,并返回一个表结果。This chapter focuses on the 本章主要介绍JOIN
table operator. JOIN
表运算符。The APPLY
operator will be covered in Chapter 5, “Table expressions,” and the PIVOT
and UNPIVOT
operators will be covered in Chapter 7, “Beyond the fundamentals of querying.”APPLY
运算符将在第5章“表表达式”中介绍,PIVOT
和UNPIVOT
运算符将在第7章“超越查询基础”中介绍
A JOIN
table operator operates on two input tables. JOIN
表运算符对两个输入表进行操作。The three fundamental types of joins are cross joins, inner joins, and outer joins. 连接的三种基本类型是交叉连接、内部连接和外部连接。These three types of joins differ in how they apply their logical query processing phases; each type applies a different set of phases. 这三种类型的联接在应用逻辑查询处理阶段的方式上有所不同;每种类型应用一组不同的阶段。A cross join applies only one phase—Cartesian Product. 交叉连接仅应用一个阶段:笛卡尔积。An inner join applies two phases—Cartesian Product and Filter. 内部联接应用两个阶段:笛卡尔积和筛选器。An outer join applies three phases—Cartesian Product, Filter, and Add Outer Rows. 外部联接应用三个阶段:笛卡尔积、筛选器和添加外部行。This chapter explains in detail each of the join types and the phases involved.本章详细解释了每种连接类型和涉及的阶段。
“逻辑查询处理”描述了一系列通用的逻辑步骤,对于任何指定的查询,这些步骤都会生成正确的结果,而物理查询处理实际上是RDBMS引擎处理查询的方式。Logical query processing
describes a generic series of logical steps that for any specified query produces the correct result, whereas physical query processing
is the way the query is processed by the RDBMS engine in practice. Some phases of logical query processing of joins might sound inefficient, but remember that the query will be optimized by the physical implementation. 连接的逻辑查询处理的某些阶段可能听起来效率低下,但请记住,查询将通过物理实现进行优化。It's important to stress the term 在逻辑查询处理中,强调“逻辑”一词很重要。logical
in logical query processing
. The steps in the process apply operations to the input tables based on relational algebra. 该过程中的步骤基于关系代数将操作应用于输入表。The database engine does not have to follow logical query processing phases literally, as long as it can guarantee that the result that it produces is the same as that dictated by logical query processing. 数据库引擎不必严格遵循逻辑查询处理阶段,只要它能保证生成的结果与逻辑查询处理所指示的结果相同。The query optimizer often applies shortcuts when it knows it can still produce the correct result. 当查询优化器知道它仍然可以生成正确的结果时,它通常会应用快捷方式。Even though this book's focus is on understanding the logical aspects of querying, I want to stress this point to avoid performance-related concerns.尽管本书的重点是理解查询的逻辑方面,但我想强调这一点,以避免与性能相关的担忧。
The cross join is the simplest type of join. 交叉连接是最简单的连接类型。It implements only one logical query processing phase—a Cartesian Product. 它只实现了一个逻辑查询处理阶段——笛卡尔积。This phase operates on the two tables provided as inputs and produces a Cartesian product of the two. 该阶段对作为输入提供的两个表进行操作,并生成这两个表的笛卡尔乘积。That is, each row from one input is matched with all rows from the other. 也就是说,一个输入的每一行都与另一个输入的所有行匹配。So if you have 因此,如果一个表中有m
rows in one table and n
rows in the other, you get m×n
rows in the result.m
行,另一个表中有n
行,则结果中有mn
行。
T-SQL supports two standard syntaxes for cross joins: the ISO/ANSI SQL-92 and ISO/ANSI SQL-89 syntaxes. T-SQL支持交叉联接的两种标准语法:ISO/ANSI SQL-92和ISO/ANSI SQL-89语法。I recommend you use the SQL 92 syntax for reasons I'll describe shortly. 我建议您使用SQL 92语法,原因我将简要介绍。It's also the main syntax I use throughout the book. 这也是我在整本书中使用的主要语法。For the sake of completeness, I describe both syntaxes in this section.为了完整起见,我在本节中描述了这两种语法。
The following query applies a cross join between the 以下查询在TSQLV4数据库中的Customers
and Employees
tables (using the SQL-92 syntax) in the TSQLV4
database, and returns the custid
and empid
attributes in the result set:Customers
和Employees
表(使用SQL-92语法)之间应用交叉联接,并在结果集中返回custid
和empid
属性:
USE TSQLV4;
SELECT C.custid, E.empid
FROM Sales.Customers AS C
CROSS JOIN HR.Employees AS E;
Because there are 91 rows in the 因为Customers
table and 9 rows in the Employees
table, this query produces a result set with 819 rows, as shown here in abbreviated form:Customers
表中有91行,Employees
表中有9行,所以此查询生成一个包含819行的结果集,如以下缩略形式所示:
custid empid
----------- -----------
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
9 2
11 2
...
(819 row(s) affected)
When you use the SQL-92 syntax, you specify the CROSS JOIN keywords between the two tables involved in the join.使用SQL-92语法时,需要在连接中涉及的两个表之间指定CROSS JOIN
关键字。
Notice that in the 注意,在前面查询的FROM
clause of the preceding query, I assigned the aliases C
and E
to the Customers
and Employees
tables, respectively. FROM
子句中,我分别将别名C
和E
分配给Customers
和Employees
表。The result set produced by the cross join is a virtual table with attributes that originate from both sides of the join. 交叉联接产生的结果集是一个虚拟表,其属性源自联接的两侧。Because I assigned aliases to the source tables, the names of the columns in the virtual table are prefixed by the table aliases (for example, 因为我为源表分配了别名,所以虚拟表中的列的名称以表别名作为前缀(例如,C.custid
, E.empid
). C.custid
、E.empid
)。The column prefixes do not appear in the final query result. 列前缀不会出现在最终查询结果中。If you do not assign aliases to the tables in the 如果未在FROM
clause, the names of the columns in the virtual table are prefixed by the full source-table names (for example, Customers.custid
, Employees.empid
). FROM
子句中为表指定别名,则虚拟表中列的名称将以完整的源表名称作为前缀(例如Customer.CustId
、Employees.empid
)。The purpose of the prefixes is to facilitate the identification of columns in an unambiguous manner when the same column name appears in both tables. 前缀的目的是,当两个表中出现相同的列名时,便于以明确的方式识别列。The aliases of the tables are assigned for brevity. 为简洁起见,表的别名被指定。Note that you are required to use column prefixes only when referring to ambiguous column names (column names that appear in more than one table); in unambiguous cases, column prefixes are optional. 请注意,只有在引用不明确的列名(出现在多个表中的列名)时,才需要使用列前缀;在明确的情况下,列前缀是可选的。However, it's a good practice to always use column prefixes for the sake of clarity. 然而,为了清晰起见,最好总是使用列前缀。Also note that if you assign an alias to a table, it's invalid to use the full table name as a column prefix; in ambiguous cases, you have to use the table alias as a prefix.还要注意,如果为表指定别名,则使用完整表名作为列前缀是无效的;在不明确的情况下,必须使用表别名作为前缀。
T-SQL also supports an older syntax for cross joins that was introduced in SQL-89. T-SQL还支持SQL-89中引入的用于交叉联接的旧语法。In this syntax, you simply specify a comma between the table names, like this:在这种语法中,只需在表名之间指定一个逗号,如下所示:
SELECT C.custid, E.empid
FROM Sales.Customers AS C, HR.Employees AS E;
There is no logical or performance difference between the two syntaxes. 这两种语法在逻辑或性能上没有区别。Both are integral parts of the SQL standard, and both are fully supported by T-SQL. 两者都是SQL标准不可分割的一部分,都完全受T-SQL支持。I'm not aware of any plans to deprecate the older syntax in the SQL standard or in T-SQL. 我不知道有任何计划反对SQL标准或T-SQL中的旧语法。However, I recommend using the SQL-92 syntax for reasons that will become clear after I explain inner and outer joins.然而,我建议使用SQL-92语法,原因在我解释内部联接和外部联接之后会变得很清楚。
You can join multiple instances of the same table. 可以联接同一个表的多个实例。This capability is known as a 这种功能称为“自联接”,所有基本联接类型(交叉联接、内部联接和外部联接)都支持这种功能。self join
and is supported with all fundamental join types (cross joins, inner joins, and outer joins). For example, the following query performs a self cross join between two instances of the 例如,以下查询在Employees
table:Employees
表的两个实例之间执行自交叉联接:
SELECT
E1.empid, E1.firstname, E1.lastname,
E2.empid, E2.firstname, E2.lastname
FROM HR.Employees AS E1
CROSS JOIN HR.Employees AS E2;
This query produces all possible combinations of pairs of employees. 此查询生成所有可能的成对员工组合。Because the 因为Employees
table has 9 rows, this query returns 81 rows, as shown here in abbreviated form:Employees
表有9行,所以该查询返回81行,如下所示:
empid firstname lastname empid firstname lastname
----------- ---------- -------------------- ----------- ---------- --------------------
1 Sara Davis 1 Sara Davis
2 Don Funk 1 Sara Davis
3 Judy Lew 1 Sara Davis
4 Yael Peled 1 Sara Davis
5 Sven Mortensen 1 Sara Davis
6 Paul Suurs 1 Sara Davis
7 Russell King 1 Sara Davis
8 Maria Cameron 1 Sara Davis
9 Patricia Doyle 1 Sara Davis
1 Sara Davis 2 Don Funk
2 Don Funk 2 Don Funk
3 Judy Lew 2 Don Funk
4 Yael Peled 2 Don Funk
5 Sven Mortensen 2 Don Funk
6 Paul Suurs 2 Don Funk
7 Russell King 2 Don Funk
8 Maria Cameron 2 Don Funk
9 Patricia Doyle 2 Don Funk
...
(81 row(s) affected)
In a self join, aliasing tables is not optional. Without table aliases, all column names in the result of the join would be ambiguous.在自联接中,别名表不是可选的。如果没有表别名,联接结果中的所有列名都将是不明确的。
One situation in which cross joins can be handy is when they are used to produce a result set with a sequence of integers (1, 2, 3, and so on). 交叉连接非常方便的一种情况是,使用交叉连接生成包含整数序列(1、2、3等等)的结果集。Such a sequence of numbers is an extremely powerful tool that I use for many purposes. 这样的数字序列是一个非常强大的工具,我可以用于很多目的。By using cross joins, you can produce the sequence of integers in a very efficient manner.通过使用交叉连接,可以非常高效地生成整数序列。
You can start by creating a table called 首先,您可以创建一个名为Digits
with a column called digit
, and populate the table with 10 rows with the digits 0 through 9. Digits
的表和一个名为digit
的列,然后用数字0到9的10行填充该表。Run the following code to create the 运行以下代码在TSQLV4数据库中创建Digits
table in the TSQLV4
database and populate it with the 10 digits:Digits
表,并用10位数字填充它:
USE TSQLV4;
DROP TABLE IF EXISTS dbo.Digits;
CREATE TABLE dbo.Digits(digit INT NOT NULL PRIMARY KEY);
INSERT INTO dbo.Digits(digit)
VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
SELECT digit FROM dbo.Digits;
This code also uses an 此代码还使用INSERT
statement to populate the Digits
table. INSERT
语句填充Digits
表。If you're not familiar with the syntax of the 如果您不熟悉INSERT
statement, see Chapter 8, “Data modification,” for details.INSERT
语句的语法,请参阅第8章“数据修改”,了解详细信息。
This code generates the following output:此代码生成以下输出:
digit
-----------
0
1
2
3
4
5
6
7
8
9
Suppose you need to write a query that produces a sequence of integers in the range 1 through 1,000. 假设您需要编写一个查询,以生成1到1000范围内的整数序列。You apply cross joins between three instances of the 在Digits
table, each representing a different power of 10 (1, 10, 100). Digits
表的三个实例之间应用交叉联接,每个实例表示不同的10次方(1、10、100)。By multiplying three instances of the same table, each instance with 10 rows, you get a result set with 1,000 rows. 将同一个表的三个实例相乘,每个实例有10行,得到一个有1000行的结果集。To produce the actual number, multiply the digit from each instance by the power of 10 it represents, sum the results, and add 1. 要产生实际数字,将每个实例的数字乘以它所代表的10的幂,将结果相加,再加1。Here's the complete query:以下是完整的查询:
SELECT D3.digit * 100 + D2.digit * 10 + D1.digit + 1 AS n
FROM dbo.Digits AS D1
CROSS JOIN dbo.Digits AS D2
CROSS JOIN dbo.Digits AS D3
ORDER BY n;
This query returns the following output, shown here in abbreviated form:此查询返回以下输出,以缩写形式显示:
n
-----------
1
2
3
4
5
6
7
8
9
10
...
998
999
1000
(1000 row(s) affected)
This was an example that produces a sequence of 1,000 integers. 这是一个生成1000个整数序列的示例。If you need more numbers, you can add more instances of the 如果需要更多数字,可以向查询中添加更多Digits
table to the query. Digits
表实例。For example, if you need to produce a sequence of 1,000,000 rows, you need to join six instances.例如,如果需要生成1000000行的序列,则需要连接六个实例。
An inner join applies two logical query processing phases—it applies a Cartesian product between the two input tables like in a cross join, and then it filters rows based on a predicate you specify. 内部联接应用两个逻辑查询处理阶段它在两个输入表之间应用笛卡尔积,就像在交叉联接中一样,然后根据指定的谓词筛选行。Like cross joins, inner joins have two standard syntaxes: SQL-92 and SQL-89.与交叉联接一样,内部联接有两个标准语法:SQL-92和SQL-89。
Using the SQL-92 syntax, you specify the INNER JOIN keywords between the table names. 使用SQL-92语法,可以指定表名之间的内部联接关键字。The INNER keyword is optional, because an inner join is the default. INNER
关键字是可选的,因为默认情况下是内部联接。So you can specify the JOIN keyword alone. 因此,您可以单独指定JOIN
关键字。You specify the predicate that is used to filter rows in a designated clause called 指定用于ON
. ON
调用的指定子句中筛选行的谓词。This predicate is also known as the 这个谓词也称为“连接条件”。join condition
.
For example, the following query performs an inner join between the 例如,以下查询在TSQLV4数据库中的Employees
and Orders
tables in the TSQLV4
database, matching employees and orders based on the predicate E.empid
= O.empid
:Employees
和Orders
表之间执行内部联接,根据谓词E.empid=O.empid
匹配Employees
和Orders
:
USE TSQLV4;
SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM HR.Employees AS E
INNER JOIN Sales.Orders AS O
ON E.empid = O.empid;
This query produces the following result set, shown here in abbreviated form:此查询生成以下结果集,以缩写形式显示:
empid firstname lastname orderid
----------- ---------- -------------------- -----------
1 Sara Davis 10258
1 Sara Davis 10270
1 Sara Davis 10275
1 Sara Davis 10285
1 Sara Davis 10292
...
2 Don Funk 10265
2 Don Funk 10277
2 Don Funk 10280
2 Don Funk 10295
2 Don Funk 10300
...
(830 row(s) affected)
For most people, the easiest way to think of such an inner join is as matching each employee row with all order rows that have the same employee ID as in the employee row. 对于大多数人来说,考虑这种内部联接的最简单方法是将每个员工行与所有订单行匹配,这些订单行与员工行中的员工ID相同。This is a simplified way to think of the join. 这是一种简单的连接方式。The more formal way to think of it is based on relational algebra. 更正式的思考方式是基于关系代数。First, the join performs a Cartesian product between the two tables (9 employee rows × 830 order rows = 7,470 rows). 首先,联接在两个表之间执行笛卡尔乘积(9个员工行830个订单行=7470行)。Then, the join filters rows based on the predicate 然后,联接根据谓词E.empid
= O.empid
, eventually returning 830 rows. E.empid=O.empid
筛选行,最终返回830行。As mentioned earlier, that's just the logical way that the join is processed; in practice, physical processing of the query by the database engine can be different.如前所述,这只是处理连接的逻辑方式;实际上,数据库引擎对查询的物理处理可能会有所不同。
Recall the discussion from previous chapters about the three-valued predicate logic used by SQL. 回想一下前几章中关于SQL使用的三值谓词逻辑的讨论。As with the 与WHERE
and HAVING
clauses, the ON
clause also returns only rows for which the predicate returns TRUE
, and it does not return rows for which the predicate evaluates to FALSE
or UNKNOWN
.WHERE
和HAVING
子句一样,ON
子句也只返回谓词返回TRUE
的行,而不返回谓词计算为FALSE
或UNKNOWN
的行。
In the 在TSQLV4
database, all employees have related orders, so all employees show up in the output. TSQLV4
数据库中,所有员工都有相关的订单,因此所有员工都显示在输出中。However, had there been employees with no related orders, they would have been discarded by the filter phase. 然而,如果有没有相关订单的员工,他们将被筛选阶段丢弃。The same would apply to orders with no related employees, although a foreign-key relationship forbids those in our sample database.这同样适用于没有相关员工的订单,尽管外键关系禁止我们样本数据库中的订单。
Similar to cross joins, inner joins can be expressed by using the SQL-89 syntax. 与交叉联接类似,内部联接可以使用SQL-89语法来表示。You specify a comma between the table names just as in a cross join, and you specify the join condition in the query's 在表名之间指定逗号,就像在交叉联接中一样,并在查询的WHERE
clause, like this:WHERE
子句中指定联接条件,如下所示:
SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM HR.Employees AS E, Sales.Orders AS O
WHERE E.empid = O.empid;
Note that the SQL-89 syntax has no 请注意,SQL-89语法没有ON
clause.ON
子句。
Again, both syntaxes are standard, fully supported by T-SQL, and interpreted in the same way by the database engine, so you shouldn't expect any performance difference between them. 同样,这两种语法都是标准的,完全由T-SQL支持,并由数据库引擎以相同的方式进行解释,因此您不应该期望它们之间有任何性能差异。But one syntax is safer, as explained in the next section.但有一种语法更安全,如下一节所述。
I strongly recommend that you stick to the SQL-92 join syntax because it's less prone to errors. 我强烈建议您坚持使用SQL-92连接语法,因为它不太容易出错。Suppose you intend to write an inner join query, and by mistake you forget to specify the join condition. 假设您打算编写一个内部连接查询,但由于错误,您忘记了指定连接条件。With the SQL-92 syntax, the query becomes invalid, and the parser generates an error. 使用SQL-92语法,查询将无效,解析器将生成一个错误。For example, try to run the following code:例如,尝试运行以下代码:
SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM HR.Employees AS E
INNER JOIN Sales.Orders AS O;
You get the following error:出现以下错误:
Msg 102, Level 15, State 1, Line 74
Incorrect syntax near ';'.
Even though it might not be immediately obvious from the error message that the error involves a missing join condition, you will figure it out eventually and fix the query. 尽管从错误消息中可能无法立即看出该错误涉及缺少连接条件,但最终您会找到答案并修复查询。However, if you forget to specify the join condition when you're using the SQL-89 syntax, you get a valid query that performs a cross join:但是,如果在使用SQL-89语法时忘记指定连接条件,则会得到一个执行交叉连接的有效查询:
SELECT E.empid, E.firstname, E.lastname, O.orderid
FROM HR.Employees AS E, Sales.Orders AS O;
Because the query doesn't fail, the logical error might go unnoticed for a while, and users of your application might end up relying on incorrect results. 由于查询没有失败,逻辑错误可能会在一段时间内被忽略,应用程序的用户可能最终依赖于错误的结果。It's unlikely that a programmer would forget to specify the join condition with such short and simple queries; however, most production queries are much more complicated and have multiple tables, filters, and other query elements. 程序员不太可能忘记用如此简短的查询指定连接条件;然而,大多数生产查询要复杂得多,并且有多个表、筛选器和其他查询元素。In those cases, the likelihood of forgetting to specify a join condition increases.在这些情况下,忘记指定连接条件的可能性会增加。
If I've convinced you that it's important to use the SQL-92 syntax for inner joins, you might wonder whether the recommendation holds for cross joins. 如果我已经说服您,对内部联接使用SQL-92语法很重要,那么您可能会想,对于交叉联接,该建议是否适用。Because no join condition is involved, you might think that both syntaxes are just as good for cross joins. 因为不涉及连接条件,所以您可能会认为这两种语法对于交叉连接同样有效。However, I recommend staying with the SQL-92 syntax with cross joins for a couple of reasons—one being consistency. 然而,我建议使用带有交叉连接的SQL-92语法,原因有两个,一个是一致性。Also, suppose you do use the SQL-89 syntax. 另外,假设您确实使用了SQL-89语法。Even if you intended to write a cross join, when other developers need to review or maintain your code, how will they know whether you intended to write a cross join or intended to write an inner join and forgot to specify the join condition?即使您打算编写交叉联接,当其他开发人员需要查看或维护您的代码时,他们如何知道您是打算编写交叉联接,还是打算编写内部联接,却忘记指定联接条件?
This section covers a few join examples that are known by specific names: composite joins, non-equi joins, and multi-join queries.本节将介绍几个以特定名称命名的联接示例:复合联接、非等联接和多联接查询。
A “复合连接”只是一种连接,需要从每一侧匹配多个属性。composite join
is simply a join where you need to match multiple attributes from each side. You usually need such a join when a primary key–foreign key relationship is based on more than one attribute. 当主键-外键关系基于多个属性时,通常需要这样的连接。For example, suppose you have a foreign key defined on 例如,假设在dbo.Table2
, columns col1
, col2
, referencing dbo.Table1
, columns col1
, col2
, and you need to write a query that joins the two based on this relationship. dbo.Table2
的col1
、col2
列上定义了一个外键,引用了dbo.Table1
的col1
、col2
列,并且需要编写一个基于此关系连接这两个列的查询。The 查询的FROM
clause of the query would look like this:FROM
子句如下所示:
FROM dbo.Table1 AS T1
INNER JOIN dbo.Table2 AS T2
ON T1.col1 = T2.col1
AND T1.col2 = T2.col2
For a more tangible example, suppose you need to audit updates to column values against the 举一个更具体的例子,假设您需要根据OrderDetails
table in the TSQLV4
database. TSQLV4
数据库中的OrderDetails
表审核列值的更新。You create a custom auditing table called 创建名为OrderDetailsAudit
:OrderDetailsAudit
的自定义审核表:
USE TSQLV4;
DROP TABLE IF EXISTS Sales.OrderDetailsAudit;
CREATE TABLE Sales.OrderDetailsAudit
(
lsn INT NOT NULL IDENTITY,
orderid INT NOT NULL,
productid INT NOT NULL,
dt DATETIME NOT NULL,
loginname sysname NOT NULL,
columnname sysname NOT NULL,
oldval SQL_VARIANT,
newval SQL_VARIANT,
CONSTRAINT PK_OrderDetailsAudit PRIMARY KEY(lsn),
CONSTRAINT FK_OrderDetailsAudit_OrderDetails
FOREIGN KEY(orderid, productid)
REFERENCES Sales.OrderDetails(orderid, productid)
);
Each audit row stores a log serial number (每个审核行存储一个日志序列号(lsn
), the key of the modified row (orderid
, productid
), the name of the modified column (columnname
), the old value (oldval
), the new value (newval
), when the change took place (dt
), and who made the change (loginname
). lsn
)、修改行的键(orderid
、productid
)、修改列的名称(columnname
)、旧值(oldval
)、新值(newval
)、更改发生的时间(dt
)以及进行更改的人(loginname
)。The table has a foreign key defined on the attributes 该表在属性orderid
, productid
, referencing the primary key of the OrderDetails
table, which is defined on the attributes orderid
, productid
. orderid
、productid
上定义了一个外键,它引用了OrderDetails
表的主键,该表在属性orderid
、productid
上定义。Assume you already have in place the process that logs changes in the 假设您已经有了一个流程,每当OrderDetailsAudit
table whenever columns are updated in the OrderDetails
table.OrderDetails
表中的列更新时,该流程都会在OrderDetailsAudit
表中记录更改。
You need to write a query against the 您需要针对OrderDetails
and OrderDetailsAudit
tables that returns information about all value changes that took place in the column qty
. OrderDetails
和OrderDetailsAudit
表编写一个查询,返回有关列qty
中发生的所有值更改的信息。In each result row, you need to return the current value from the 在每个结果行中,您需要返回OrderDetails
table and the values before and after the change from the OrderDetailsAudit
table. OrderDetails
表中的当前值以及OrderDetailsAudit
表中更改前后的值。You need to join the two tables based on a primary key–foreign key relationship, like this:您需要基于主键-外键关系连接两个表,如下所示:
SELECT OD.orderid, OD.productid, OD.qty,
ODA.dt, ODA.loginname, ODA.oldval, ODA.newval
FROM Sales.OrderDetails AS OD
INNER JOIN Sales.OrderDetailsAudit AS ODA
ON OD.orderid = ODA.orderid
AND OD.productid = ODA.productid
WHERE ODA.columnname = N'qty';
Because the relationship is based on multiple attributes, the join condition is composite.因为关系基于多个属性,所以连接条件是复合的。
When a join condition involves only an equality operator, the join is said to be an 当连接条件只涉及相等运算符时,该连接称为“相等连接”。equi join
. When a join condition involves any operator besides equality, the join is said to be a 当一个连接条件涉及除相等外的任何运算符时,该连接称为非相等连接。non-equi join
.
Note
Standard SQL supports a concept called 标准SQL支持名为“自然联接”的概念,它表示基于两侧同名列之间匹配的内部联接。natural join
, which represents an inner join based on a match between columns with the same name in both sides. For example, 例如,T1 NATURAL JOIN T2
joins the rows between T1
and T2
based on a match between the columns with the same names on both sides. T1 NATURAL JOIN T2
根据两侧同名列之间的匹配来连接T1和T2之间的行。T-SQL doesn't have an implementation of a natural join. T-SQL没有自然连接的实现。A join that has an explicit join predicate that is based on a binary operator (equality or inequality) is known as a 具有基于二进制运算符(相等或不相等)的显式连接谓词的连接称为“θ连接”。theta join
. So both equi joins and non-equi joins are types of theta joins.所以等连接和非等连接都是θ连接的类型。
As an example of a non-equi join, the following query joins two instances of the 作为非相等联接的示例,以下查询联接Employees
table to produce unique pairs of employees:Employees
表的两个实例,以生成唯一的雇员对:
SELECT
E1.empid, E1.firstname, E1.lastname,
E2.empid, E2.firstname, E2.lastname
FROM HR.Employees AS E1
INNER JOIN HR.Employees AS E2
ON E1.empid < E2.empid;
Notice the predicate specified in the 注意ON
clause. ON
子句中指定的谓词。The purpose of the query is to produce unique pairs of employees. 查询的目的是生成唯一的员工对。Had a cross join been used, the result would have included self pairs (for example, 1 with 1) and also mirrored pairs (for example, 1 with 2 and also 2 with 1). 如果使用交叉连接,结果将包括自对(例如,1与1)和镜像对(例如,1与2以及2与1)。Using an inner join with a join condition that says the key on the left side must be smaller than the key on the right side eliminates the two inapplicable cases. 使用带有连接条件的内部连接,即左侧的键必须小于右侧的键,可以消除这两种不适用的情况。Self pairs are eliminated because both sides are equal. 因为两边相等,所以消除了自配对。With mirrored pairs, only one of the two cases qualifies because, of the two cases, only one will have a left key that is smaller than the right key. 对于镜像对,这两种情况中只有一种符合条件,因为在这两种情况中,只有一种情况的左键比右键小。In this example, of the 81 possible pairs of employees a cross join would have returned, this query returns the 36 unique pairs shown here:在本例中,交叉联接可能会返回81对员工,该查询返回36对唯一的员工,如下所示:
empid firstname lastname empid firstname lastname
----------- ---------- -------------------- ----------- ---------- --------------------
1 Sara Davis 2 Don Funk
1 Sara Davis 3 Judy Lew
2 Don Funk 3 Judy Lew
1 Sara Davis 4 Yael Peled
2 Don Funk 4 Yael Peled
3 Judy Lew 4 Yael Peled
1 Sara Davis 5 Sven Mortensen
2 Don Funk 5 Sven Mortensen
3 Judy Lew 5 Sven Mortensen
4 Yael Peled 5 Sven Mortensen
1 Sara Davis 6 Paul Suurs
2 Don Funk 6 Paul Suurs
3 Judy Lew 6 Paul Suurs
4 Yael Peled 6 Paul Suurs
5 Sven Mortensen 6 Paul Suurs
1 Sara Davis 7 Russell King
2 Don Funk 7 Russell King
3 Judy Lew 7 Russell King
4 Yael Peled 7 Russell King
5 Sven Mortensen 7 Russell King
6 Paul Suurs 7 Russell King
1 Sara Davis 8 Maria Cameron
2 Don Funk 8 Maria Cameron
3 Judy Lew 8 Maria Cameron
4 Yael Peled 8 Maria Cameron
5 Sven Mortensen 8 Maria Cameron
6 Paul Suurs 8 Maria Cameron
7 Russell King 8 Maria Cameron
1 Sara Davis 9 Patricia Doyle
2 Don Funk 9 Patricia Doyle
3 Judy Lew 9 Patricia Doyle
4 Yael Peled 9 Patricia Doyle
5 Sven Mortensen 9 Patricia Doyle
6 Paul Suurs 9 Patricia Doyle
7 Russell King 9 Patricia Doyle
8 Maria Cameron 9 Patricia Doyle
(36 row(s) affected)
If it's still not clear to you what this query does, try to process it one step at a time with a smaller set of employees. 如果你仍然不清楚这个查询是做什么的,试着用一小部分员工一步一步地处理它。For example, suppose the 例如,假设Employees
table contained only employees 1, 2, and 3. Employees
表只包含Employees 1、Employees 2和Employees 3。First, produce the Cartesian product of two instances of the table:首先,生成表的两个实例的笛卡尔乘积:
E1.empid E2.empid
------------- -------------
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Next, filter the rows based on the predicate 接下来,根据谓词E1.empid < E2.empid
, and you are left with only three rows:E1.empid < E2.empid
筛选行,只剩下三行:
E1.empid E2.empid
------------- -------------
1 2
1 3
2 3
A join table operator operates only on two tables, but a single query can have multiple joins. 联接表运算符只能对两个表进行操作,但一个查询可以有多个联接。In general, when more than one table operator appears in the 通常,当FROM
clause, the table operators are logically processed from left to right. FROM
子句中出现多个表运算符时,表运算符将从左到右进行逻辑处理。That is, the result table of the first table operator is treated as the left input to the second table operator; the result of the second table operator is treated as the left input to the third table operator; and so on. 也就是说,第一个表运算符的结果表被视为第二个表运算符的左输入;第二个表运算符的结果被视为第三个表运算符的左输入;等等So if there are multiple joins in the 因此,如果FROM
clause, the first join operates on two base tables, but all other joins get the result of the preceding join as their left input. FROM
子句中有多个联接,则第一个联接对两个基表进行操作,但所有其他联接都将前一个联接的结果作为其左输入。With cross joins and inner joins, the database engine can (and often does) internally rearrange join ordering for optimization purposes because it won't have an impact on the correctness of the result of the query.对于交叉连接和内部连接,数据库引擎可以(而且经常)在内部重新排列连接顺序以达到优化目的,因为它不会影响查询结果的正确性。
As an example, the following query joins the 例如,下面的查询联接Customers
and Orders
tables to match customers with their orders, and then it joins the result of the first join with the OrderDetails
table to match orders with their order lines:Customers
和Orders
表以将客户与其订单匹配,然后将第一次联接的结果联接OrderDetails
表以将订单与其订单行匹配:
SELECT
C.custid, C.companyname, O.orderid,
OD.productid, OD.qty
FROM Sales.Customers AS C
INNER JOIN Sales.Orders AS O
ON C.custid = O.custid
INNER JOIN Sales.OrderDetails AS OD
ON O.orderid = OD.orderid;
This query returns the following output, shown here in abbreviated form:此查询返回以下输出,以缩写形式显示:
custid companyname orderid productid qty
----------- ----------------- ----------- ----------- ------
85 Customer ENQZT 10248 11 12
85 Customer ENQZT 10248 42 10
85 Customer ENQZT 10248 72 5
79 Customer FAPSM 10249 14 9
79 Customer FAPSM 10249 51 40
34 Customer IBVRG 10250 41 10
34 Customer IBVRG 10250 51 35
34 Customer IBVRG 10250 65 15
84 Customer NRCSK 10251 22 6
84 Customer NRCSK 10251 57 15
...
(2155 row(s) affected)
Compared to the other types of joins, outer joins are usually harder for people to grasp. 与其他类型的连接相比,外部连接通常更难让人理解。First I will describe the fundamentals of outer joins. 首先,我将描述外部联接的基本原理。If by the end of the “Fundamentals of outer joins” section you feel ready for more advanced content, you can proceed to the optional section “Beyond the fundamentals of outer joins.” 如果在“外部连接的基本原理”部分结束时,您觉得已经准备好阅读更高级的内容,您可以继续阅读可选部分“超越外部连接的基本原理”。Otherwise, feel free to skip that part and return to it when you feel more comfortable with the material.否则,可以跳过这一部分,当你对材料感到更舒适时再回到这一部分。
Outer joins were introduced in SQL-92 and, unlike inner joins and cross joins, have only one standard syntax—the one in which the JOIN keyword is specified between the table names and the join condition is specified in the 外部联接是在SQL-92中引入的,与内部联接和交叉联接不同,外部联接只有一种标准语法,即在表名之间指定联接关键字,在ON
clause. ON
子句中指定联接条件。Outer joins apply the two logical processing phases that inner joins apply (Cartesian Product and the 外部联接应用内部联接所应用的两个逻辑处理阶段(笛卡尔积和ON
filter), plus a third phase called Adding Outer Rows that is unique to this type of join.ON
筛选器),再加上第三个阶段,称为添加外部行,这是这种联接所特有的。
In an outer join, you mark a table as a “preserved” table by using the keywords LEFT OUTER JOIN, RIGHT OUTER JOIN, or FULL OUTER JOIN between the table names. 在外部联接中,通过在表名之间使用关键字LEFT OUTER JOIN
、RIGHT OUTER JOIN
或FULL OUTER JOIN
,将表标记为“保留”表。The OUTER keyword is optional. OUTER
关键字是可选的。The LEFT keyword means that the rows of the left table (the one to the left of the JOIN keyword) are preserved; the RIGHT keyword means that the rows in the right table are preserved; and the FULL keyword means that the rows in both the left and right tables are preserved. LEFT
关键字表示保留左表的行(JOIN
关键字左边的那一行);RIGHT
关键字表示保留了右表中的行;FULL
关键字意味着左表和右表中的行都被保留。The third logical query processing phase of an outer join identifies the rows from the preserved table that did not find matches in the other table based on the 外部联接的第三个逻辑查询处理阶段根据ON
predicate. ON
谓词识别保留表中未在另一个表中找到匹配项的行。This phase adds those rows to the result table produced by the first two phases of the join, and it uses 此阶段将这些行添加到联接的前两个阶段生成的结果表中,并使用NULLs
as placeholders for the attributes from the nonpreserved side of the join in those outer rows.NULL
值作为这些外部行中联接的非保留端属性的占位符。
A good way to understand outer joins is through an example. 理解外部联接的一个好方法是通过一个例子。The following query joins the 以下查询基于客户的客户ID和订单的客户ID之间的匹配,将Customers
and Orders
tables, based on a match between the customer's customer ID and the order's customer ID, to return customers and their orders. Customers
表和Orders
表连接起来,以返回客户及其订单。The join type is a left outer join; therefore, the query also returns customers who did not place any orders:联接类型为左外联接;因此,查询还返回未下订单的客户:
SELECT C.custid, C.companyname, O.orderid
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid;
This query returns the following output, shown here in abbreviated form:此查询返回以下输出,以缩写形式显示:
custid companyname orderid
----------- --------------- -----------
1 Customer NRZBB 10643
1 Customer NRZBB 10692
1 Customer NRZBB 10702
1 Customer NRZBB 10835
1 Customer NRZBB 10952
...
21 Customer KIDPX 10414
21 Customer KIDPX 10512
21 Customer KIDPX 10581
21 Customer KIDPX 10650
21 Customer KIDPX 10725
22 Customer DTDMN NULL
23 Customer WVFAF 10408
23 Customer WVFAF 10480
23 Customer WVFAF 10634
23 Customer WVFAF 10763
23 Customer WVFAF 10789
...
56 Customer QNIVZ 10684
56 Customer QNIVZ 10766
56 Customer QNIVZ 10833
56 Customer QNIVZ 10999
56 Customer QNIVZ 11020
57 Customer WVAXS NULL
58 Customer AHXHT 10322
58 Customer AHXHT 10354
58 Customer AHXHT 10474
58 Customer AHXHT 10502
58 Customer AHXHT 10995
...
91 Customer CCFIZ 10792
91 Customer CCFIZ 10870
91 Customer CCFIZ 10906
91 Customer CCFIZ 10998
91 Customer CCFIZ 11044
(832 row(s) affected)
Two customers in the 客户表中的两位客户没有下任何订单。Customers
table did not place any orders. Their IDs are 22 and 57. 他们的ID是22和57。Observe that in the output of the query, both customers are returned with 请注意,在查询的输出中,NULLs
in the attributes from the Orders
table. Orders
表中的属性都返回NULL
值。Logically, the rows for these two customers were discarded by the second phase of the join (the filter based on the 从逻辑上讲,连接的第二阶段(基于ON
predicate), but the third phase added those as outer rows. ON
谓词的筛选器)丢弃了这两个客户的行,但第三阶段将它们添加为外部行。Had the join been an inner join, these two rows would not have been returned. 如果联接是内部联接,则不会返回这两行。These two rows are added to preserve all the rows of the left table.添加这两行是为了保留左表中的所有行。
It might help to think of the result of an outer join as having two kinds of rows with respect to the preserved side—inner rows and outer rows. 将外部联接的结果视为有两种类型的行(相对于保留侧的内部行和外部行),这可能会有所帮助。Inner rows are rows that have matches on the other side based on the 内行是另一侧基于ON
predicate, and outer rows are rows that don't. ON
谓词具有匹配项的行,外行是不具有匹配项的行。An inner join returns only inner rows, whereas an outer join returns both inner and outer rows.内部联接只返回内部行,而外部联接同时返回内部行和外部行。
A common question about outer joins that is the source of a lot of confusion is whether to specify a predicate in the 关于外部联接的一个常见问题是,是否在查询的ON
or WHERE
clause of a query. ON
或WHERE
子句中指定一个谓词,这是造成大量混淆的原因。You can see that with respect to rows from the preserved side of an outer join, the filter based on the 可以看到,对于外部联接的保留端的行,基于ON
predicate is not final. ON
谓词的筛选器不是最终的。In other words, the 换句话说,ON谓词并不确定一行是否会显示在输出中,只确定它是否会与另一侧的行匹配。ON
predicate does not determine whether a row will show up in the output, only whether it will be matched with rows from the other side. So when you need to express a predicate that is not final—meaning a predicate that determines which rows to match from the nonpreserved side—specify the predicate in the 因此,当您需要表达一个不是final的谓词时,可以在ON
clause. ON
子句中指定谓词,该谓词决定从非保留端匹配哪些行。When you need a filter to be applied after outer rows are produced, and you want the filter to be final, specify the predicate in the 如果需要在生成外部行后应用筛选器,并且希望筛选器是最终的,请在WHERE
clause. WHERE
子句中指定谓词。The WHERE
clause is processed after the FROM
clause—specifically, after all table operators have been processed and (in the case of outer joins) after all outer rows have been produced. WHERE
子句是在FROM
子句之后处理的,尤其是在所有表运算符处理完毕之后,以及(在外部联接的情况下)在所有外部行生成之后。Also, the 此外,WHERE
clause is final with respect to rows that it filters out, unlike the ON
clause. WHERE
子句对于它筛选掉的行是最终的,这与ON
子句不同。To recap, in the 总而言之,在ON
clause you specify nonfinal, or matching, predicates. ON
子句中指定非最终谓词或匹配谓词。In the 在WHERE
clause you specify final, or filtering, predicates.WHERE
子句中指定final或filtering谓词。
Suppose you need to return only customers who did not place any orders or, more technically speaking, you need to return only outer rows. 假设只需要返回没有下订单的客户,或者更严格地说,只需要返回外行。You can use the previous query as your basis, adding a 可以使用前面的查询作为基础,添加一个WHERE
clause that filters only outer rows. WHERE
子句,该子句只筛选外部行。Remember that outer rows are identified by the 请记住,外部行由连接的非保留端的属性中的NULLs
in the attributes from the nonpreserved side of the join. NULL
值标识。So you can filter only the rows in which one of the attributes on the nonpreserved side of the join is 因此,您只能筛选联接非保留端的一个属性为NULL
, like this:NULL
的行,如下所示:
SELECT C.custid, C.companyname
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid
WHERE O.orderid IS NULL;
This query returns only two rows, with the customers 22 and 57:
custid companyname
----------- ---------------
22 Customer DTDMN
57 Customer WVAXS
(2 row(s) affected)
Notice a couple of important things about this query. 请注意关于这个查询的几个重要事项。Recall the discussions about 回想一下本书前面关于NULLs
earlier in the book: When looking for a NULL
, you should use the operator IS NULL
and not an equality operator. NULL
的讨论:当查找NULL
时,应该使用运算符IS NULL
,而不是相等运算符。You do this because when an equality operator compares something with a 这样做是因为当相等运算符将某个值与NULL
, it always returns UNKNOWN
—even when it's comparing two NULLs
. NULL
值进行比较时,即使在比较两个NULL
值时,它也总是返回UNKNOWN
。Also, the choice of which attribute from the nonpreserved side of the join to filter is important. 此外,从连接到筛选器的非保留端选择哪个属性也很重要。You should choose an attribute that can have only a 您应该选择一个属性,当该行是外部行时,该属性只能为NULL
when the row is an outer row and not otherwise (for example, not a NULL
originating from the base table). NULL
,否则不能为NULL
(例如,不是源于基表的NULL
)。For this purpose, three cases are safe to consider: a primary key column, a join column, and a column defined as 为此,可以安全地考虑三种情况:主键列、联接列和定义为NOT NULL
. NOT NULL
的列。A primary key column cannot be 主键列不能为NULL
; therefore, a NULL
in such a column can only mean that the row is an outer row. NULL
;因此,此类列中的NULL
只能表示该行是外部行。If a row has a 如果一行的联接列中有一个NULL
in the join column, that row is filtered out by the second phase of the join, so a NULL
in such a column can only mean that it's an outer row. NULL
值,则该行将被联接的第二阶段筛选掉,因此该列中的NULL
值只能表示它是一个外部行。And obviously, a 显然,定义为NULL
in a column that is defined as NOT NULL
can only mean that the row is an outer row.NOT NULL
的列中的NULL
只能表示该行是外部行。
To practice what you learned and get a better grasp of outer joins, make sure you perform the exercises for this chapter.为了练习所学内容并更好地掌握外部连接,请确保完成本章的练习。
This section covers more advanced aspects of outer joins and is provided as optional reading for when you feel comfortable with the fundamentals of outer joins.本节介绍了外部联接的更高级方面,当您对外部联接的基本原理感到满意时,本节将作为可选阅读提供。
You can use outer joins to identify and include missing values when querying data. 查询数据时,可以使用外部联接来标识和包含缺失的值。For example, suppose you need to query all orders from the 例如,假设需要从Orders
table in the TSQLV4
database. TSQLV4
数据库的orders
表中查询所有订单。You need to ensure that you get at least one row in the output for each date in the range January 1, 2014 through December 31, 2016. 您需要确保在2014年1月1日到2016年12月31日的范围内,每个日期的输出中至少有一行。You don't want to do anything special with dates within the range that have orders, but you do want the output to include the dates with no orders, with 您不希望对范围内有订单的日期执行任何特殊操作,但希望输出包含没有订单的日期,并在订单属性中使用NULLs
as placeholders in the attributes of the order.NULL
值作为占位符。
To solve the problem, you can first write a query that returns a sequence of all dates in the requested period. 为了解决这个问题,您可以首先编写一个查询,返回请求期间内所有日期的序列。You can then perform a left outer join between that set and the 然后,可以在该集合和Orders
table. Orders
表之间执行左外联接。This way, the result also includes the missing dates.这样,结果还包括丢失的日期。
To produce a sequence of dates in a given range, I usually use an auxiliary table of numbers. 为了生成给定范围内的日期序列,我通常使用一个辅助数字表。I create a table called 我创建了一个名为dbo.Nums
with a column called n
, and populate it with a sequence of integers (1, 2, 3, and so on). dbo.Nums
的表,其中有一个名为n
的列,并用一系列整数(1、2、3等等)填充它。I find that an auxiliary table of numbers is an extremely powerful general-purpose tool I end up using to solve many problems. 我发现辅助数字表是一个非常强大的通用工具,我最终用它来解决许多问题。You need to create it only once in the database and populate it with as many numbers as you might need. 您只需要在数据库中创建一次,并用您可能需要的数字填充它。The TSQLV4
sample database already has such an auxiliary table.TSQLV4
示例数据库已经有了这样一个辅助表。
As the first step in the solution, you need to produce a sequence of all dates in the requested range. 作为解决方案的第一步,您需要生成请求范围内所有日期的序列。You can achieve this by querying the 可以通过查询Nums
table and filtering as many numbers as the number of days in the requested date range. Nums
表并筛选请求日期范围内的天数来实现这一点。You can use the 您可以使用DATEDIFF
function to calculate that number. DATEDIFF
函数来计算该数字。By adding 通过在日期范围的起始点(2014年1月1日)加上n – 1
days to the starting point of the date range (January 1, 2014), you get the actual date in the sequence. Here's the solution query:n-1
天,可以得到序列中的实际日期。以下是解决方案查询:
SELECT DATEADD(day, n-1, CAST('20140101' AS DATE)) AS orderdate
FROM dbo.Nums
WHERE n <= DATEDIFF(day, '20140101', '20161231') + 1
ORDER BY orderdate;
This query returns a sequence of all dates in the range January 1, 2014 through December 31, 2016, as shown here in abbreviated form:此查询返回2014年1月1日至2016年12月31日范围内的所有日期序列,如以下缩写形式所示:
orderdate
-----------
2014-01-01
2014-01-02
2014-01-03
2014-01-04
2014-01-05
...
2016-12-27
2016-12-28
2016-12-29
2016-12-30
2016-12-31
(1096 row(s) affected)
The next step is to extend the previous query, adding a left outer join between 下一步是扩展前面的查询,在Nums
and the Orders
tables. Nums
和Orders
表之间添加一个左外部联接。The join condition compares the order date produced from the 联接条件使用表达式Nums
table and the orderdate
from the Orders
table by using the expression DATEADD(day, Nums.n – 1, CAST('20140101' AS DATE))
like this:DATEADD(day, Nums.n – 1, CAST('20140101' AS DATE))
比较Nums
表中生成的订单日期和Orders
表中的订单日期,如下所示:
SELECT DATEADD(day, Nums.n - 1, CAST('20140101' AS DATE)) AS orderdate,
O.orderid, O.custid, O.empid
FROM dbo.Nums
LEFT OUTER JOIN Sales.Orders AS O
ON DATEADD(day, Nums.n - 1, CAST('20140101' AS DATE)) = O.orderdate
WHERE Nums.n <= DATEDIFF(day, '20140101', '20161231') + 1
ORDER BY orderdate;
This query produces the following output, shown here in abbreviated form:此查询生成以下输出,以缩写形式显示:
orderdate orderid custid empid
----------- ----------- ----------- -----------
2014-01-01 NULL NULL NULL
2014-01-02 NULL NULL NULL
2014-01-03 NULL NULL NULL
2014-01-04 NULL NULL NULL
2014-01-05 NULL NULL NULL
...
2014-06-29 NULL NULL NULL
2014-06-30 NULL NULL NULL
2014-07-01 NULL NULL NULL
2014-07-02 NULL NULL NULL
2014-07-03 NULL NULL NULL
2014-07-04 10248 85 5
2014-07-05 10249 79 6
2014-07-06 NULL NULL NULL
2014-07-07 NULL NULL NULL
2014-07-08 10250 34 4
2014-07-08 10251 84 3
2014-07-09 10252 76 4
2014-07-10 10253 34 3
2014-07-11 10254 14 5
2014-07-12 10255 68 9
2014-07-13 NULL NULL NULL
2014-07-14 NULL NULL NULL
2014-07-15 10256 88 3
2014-07-16 10257 35 4
...
2008-12-2 NULL NULL NULL
2008-12-2 NULL NULL NULL
2008-12-2 NULL NULL NULL
2008-12-3 NULL NULL NULL
2008-12-3 NULL NULL NULL
(1446 row(s) affected)
Dates that do not appear as order dates in the 在Orders
table appear in the output of the query with NULLs
in the order attributes.Orders
表中未显示为订单日期的日期将显示在查询的输出中,order
属性中的日期为空。
When you need to review code involving outer joins to look for logical bugs, one of the things you should examine is the 当需要检查涉及外部联接的代码以查找逻辑错误时,应该检查WHERE
clause. WHERE
子句。If the predicate in the 如果WHERE
clause refers to an attribute from the nonpreserved side of the join using an expression in the form <attribute> <operator> <value>
, it's usually an indication of a bug. WHERE
子句中的谓词使用<attribute> <operator> <value>
形式的表达式引用连接的非保留端的属性,则通常表示存在错误。This is because attributes from the nonpreserved side of the join are 这是因为联接的非保留端的属性在外部行中为NULLs
in outer rows, and an expression in the form NULL <operator> <value>
yields UNKNOWN
(unless it's the IS NULL
operator explicitly looking for NULLs
). NULL
,而NULL <operator> <value>
形式的表达式会产生UNKNOWN
(除非它是显式查找NULL
的IS NULL
运算符)。Recall that a 回想一下,WHERE
clause filters UNKNOWN
out. WHERE
子句会筛选掉UNKNOWN
的内容。Such a predicate in the WHERE
clause causes all outer rows to be filtered out, effectively nullifying the outer join. WHERE
子句中的这样一个谓词会导致筛选掉所有外部行,从而有效地使外部连接无效。Effectively, the join becomes an inner join. 实际上,连接变成了内部连接。So the programmer either made a mistake in the join type or in the predicate.因此,程序员要么在连接类型中出错,要么在谓词中出错。
If this is not clear yet, the following example might help. 如果这还不清楚,下面的例子可能会有所帮助。Consider the following query:考虑下面的查询:
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid
WHERE O.orderdate >= '20160101';
The query performs a left outer join between the 查询在Customers
and Orders
tables. Customers
和Orders
表之间执行左外联接。Prior to applying the 在应用WHERE
filter, the join operator returns inner rows for customers who placed orders and outer rows for customers who didn't place orders, with NULLs
in the order attributes. WHERE
筛选器之前,join运算符会为下订单的客户返回内行,为未下订单的客户返回外行,订单属性中为NULL
。The predicate O.orderdate >= '20160101'
in the WHERE
clause evaluates to UNKNOWN
for all outer rows, because those have a NULL
in the O.orderdate
attribute. WHERE
子句中的谓词O.orderdate>='20160101'
对于所有外部行的计算结果都是UNKNOWN
,因为这些行的O.orderdate
属性中有NULL
。All outer rows are eliminated by the 所有外部行都被WHERE
filter, as you can see in the output of the query, shown here in abbreviated form:WHERE
筛选器删除,正如您在查询的输出中所看到的,这里以缩写形式显示:
custid companyname orderid orderdate
----------- --------------- ----------- ----------
1 Customer NRZBB 10835 2016-01-15
1 Customer NRZBB 10952 2016-03-16
1 Customer NRZBB 11011 2016-04-09
2 Customer MLTDN 10926 2016-03-04
3 Customer KBUDE 10856 2016-01-28
...
90 Customer XBBVR 10910 2016-02-26
91 Customer CCFIZ 10906 2016-02-25
91 Customer CCFIZ 10870 2016-02-04
91 Customer CCFIZ 10998 2016-04-03
91 Customer CCFIZ 11044 2016-04-23
(270 row(s) affected)
This means that the use of an outer join here was futile. 这意味着在这里使用外部连接是徒劳的。The programmer either made a mistake in using an outer join or in specifying the predicate in the 程序员在使用外部联接或在WHERE
clause.WHERE
子句中指定谓词时出错。
Recall the discussion about all-at-once operations in Chapter 2, “Single-table queries.” 回想一下第2章“单表查询”中关于一次性所有操作的讨论。The concept describes the fact that all expressions that appear in the same logical query processing phase are evaluated as a set, at the same point in time. 这个概念描述了一个事实,即在同一个逻辑查询处理阶段出现的所有表达式在同一时间点作为一个集合进行计算。However, this concept is not applicable to the processing of table operators in the 然而,这个概念不适用于FROM
phase. FROM
阶段中表运算符的处理。Table operators are logically evaluated from left to right. 表运算符从左到右进行逻辑计算。Rearranging the order in which outer joins are processed might result in different output, so you cannot rearrange them at will.重新排列外部联接的处理顺序可能会导致不同的输出,因此不能随意重新排列它们。
Some interesting bugs have to do with the logical order in which outer joins are processed. 一些有趣的错误与处理外部联接的逻辑顺序有关。For example, a common bug could be considered a variation of the bug in the previous section. 例如,一个常见的bug可以被认为是前一节中bug的变体。Suppose you write a multi-join query with an outer join between two tables, followed by an inner join with a third table. 假设您编写了一个多连接查询,其中两个表之间有一个外部连接,后面是一个与第三个表的内部连接。If the predicate in the inner join's 如果内部联接的ON
clause compares an attribute from the nonpreserved side of the outer join and an attribute from the third table, all outer rows are discarded. ON
子句中的谓词将外部联接的非保留端的属性与第三个表中的属性进行比较,则所有外部行都将被丢弃。Remember that outer rows have 请记住,外部行在连接的非保留端的属性中有NULLs
in the attributes from the nonpreserved side of the join, and comparing a NULL
with anything yields UNKNOWN
. NULL
值,将空值与任何内容进行比较会产生UNKNOWN
结果。UNKNOWN
is filtered out by the ON
filter. UNKNOWN
由ON
筛选器筛选掉。In other words, such a predicate nullifies the outer join, effectively turning it into an inner join. 换句话说,这样的谓词会使外部连接无效,有效地将其转化为内部连接。For example, consider the following query:例如,考虑下面的查询:
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid
INNER JOIN Sales.OrderDetails AS OD
ON O.orderid = OD.orderid;
The first join is an outer join returning customers and their orders and also customers who did not place any orders. 第一个连接是一个外部连接,返回客户及其订单,以及没有下任何订单的客户。The outer rows representing customers with no orders have 表示没有订单的客户的外部行在订单属性中为NULLs
in the order attributes. NULL
。The second join matches order lines from the 第二个连接将OrderDetails
table with rows from the result of the first join, based on the predicate O.orderid = OD.orderid
; however, in the rows representing customers with no orders, the O.orderid
attribute is NULL
. OrderDetails
表中的订单行与第一个连接结果中的行相匹配,基于谓词O.orderid = OD.orderid
;但是,在表示没有订单的客户的行中,O.orderid
属性为NULL
。Therefore, the predicate evaluates to 因此,谓词的计算结果为UNKNOWN
, and those rows are discarded. UNKNOWN
,这些行将被丢弃。The output shown here in abbreviated form doesn't contain the customers 22 and 57, the two customers who did not place orders:此处以缩写形式显示的输出不包含客户22和57,这两个客户没有下订单:
custid orderid productid qty
----------- ----------- ----------- ------
85 10248 11 12
85 10248 42 10
85 10248 72 5
79 10249 14 9
79 10249 51 40
...
65 11077 64 2
65 11077 66 1
65 11077 73 2
65 11077 75 4
65 11077 77 2
(2155 row(s) affected)
Generally, outer rows are dropped whenever any kind of outer join (left, right, or full) is followed by a subsequent inner join or right outer join. 通常,当任何类型的外部联接(左、右或全)后面紧跟着后续的内部联接或右外部联接时,都会删除外部行。That's assuming, of course, that the join condition compares the 当然,这是假设连接条件将左侧的空值与右侧的空值进行比较。NULLs
from the left side with something from the right side.
There are several ways to get around the problem if you want to return customers with no orders in the output. 如果您想退回输出中没有订单的客户,有几种方法可以解决这个问题。One option is to use a left outer join in the second join as well:一个选项是在第二个联接中也使用左外部联接:
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid
LEFT OUTER JOIN Sales.OrderDetails AS OD
ON O.orderid = OD.orderid;
This way, the outer rows produced by the first join aren't filtered out, as you can see in the output shown here in abbreviated form:这样,第一次联接产生的外部行就不会被筛选掉,正如您在这里以缩写形式显示的输出中所看到的:
custid orderid productid qty
----------- ----------- ----------- ------
85 10248 11 12
85 10248 42 10
85 10248 72 5
79 10249 14 9
79 10249 51 40
...
65 11077 64 2
65 11077 66 1
65 11077 73 2
65 11077 75 4
65 11077 77 2
22 NULL NULL NULL
57 NULL NULL NULL
(2157 row(s) affected)
This solution is usually not a good one because it preserves all rows from 这种解决方案通常不是很好,因为它保留了Orders
. Orders
中的所有行。What if there were rows in 如果订单中有一些行在Orders
that didn't have matches in OrderDetails
, and you wanted those rows to be discarded. OrderDetails
中没有匹配项,而您希望丢弃这些行,该怎么办。What you want is an inner join between 你想要的是Orders
and OrderDetails
.Orders
和OrderDetails
之间的内在连接。
A second option is to use an inner join between 第二个选项是在Orders
and OrderDetails
, and then join the result with the Customers
table using a right outer join:Orders
和OrderDetails
之间使用内部联接,然后使用右外部联接将结果与Customers
表联接:
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Orders AS O
INNER JOIN Sales.OrderDetails AS OD
ON O.orderid = OD.orderid
RIGHT OUTER JOIN Sales.Customers AS C
ON O.custid = C.custid;
This way, the outer rows are produced by the last join and are not filtered out.这样,外部行由最后一次连接生成,不会被筛选掉。
A third option is to use parentheses to turn the inner join between 第三个选项是使用括号将Orders
and OrderDetails
into an independent unit. Orders
和OrderDetails
之间的内部连接变成一个独立的单元。This way, you can apply a left outer join between the 通过这种方式,可以在Customers
table and that unit. Customers
表和该单元之间应用左外联接。The query would look like this:查询如下所示:
SELECT C.custid, O.orderid, OD.productid, OD.qty
FROM Sales.Customers AS C
LEFT OUTER JOIN
(Sales.Orders AS O
INNER JOIN Sales.OrderDetails AS OD
ON O.orderid = OD.orderid)
ON C.custid = O.custid;
COUNT
aggregate with outer joinsCOUNT
聚合与外部联接一起使用Another common bug involves using 另一个常见的错误是将COUNT
with outer joins. COUNT
与外部联接一起使用。When you group the result of an outer join and use the 当对外部联接的结果进行分组并使用COUNT(*)
aggregate, the aggregate takes into consideration both inner rows and outer rows, because it counts rows regardless of their contents. COUNT(*)
聚合时,聚合会同时考虑内部行和外部行,因为它对行进行计数,而不考虑其内容。Usually, you're not supposed to take outer rows into consideration for the purposes of counting. 通常情况下,你不应该为了计数的目的而考虑外行。For example, the following query is supposed to return the count of orders for each customer:例如,下面的查询应该返回每个客户的订单数量:
SELECT C.custid, COUNT(*) AS numorders
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid
GROUP BY C.custid;
Customers such as 22 and 57 who did not place orders, each have an outer row in the result of the join; therefore, they show up in the output with a count of 1:22和57等没有下订单的客户,每一个都有一个外行加入的结果;因此,它们以1的计数显示在输出中:
custid numorders
----------- -----------
1 6
2 4
3 7
4 13
5 18
...
22 1
...
57 1
...
87 15
88 9
89 14
90 7
91 7
(91 row(s) affected)
The COUNT(*)
aggregate function cannot detect whether a row really represents an order. COUNT(*)
聚合函数无法检测一行是否真正代表订单。To fix the problem, you should use 要解决此问题,应该使用COUNT(<column>)
instead of COUNT(*)
and provide a column from the nonpreserved side of the join. COUNT(<column>)
而不是COUNT(*)
,并从联接的非保留端提供一个列。This way, the 这样,COUNT()
aggregate ignores outer rows because they have a NULL
in that column. COUNT()
聚合将忽略外部行,因为它们在该列中有一个NULL
。Remember to use a column that can only be 记住,如果行是外行,则使用只能为NULL
in case the row is an outer row—for example, the primary key column orderid
:NULL
的列,例如,主键列orderid
:
SELECT C.custid, COUNT(O.orderid) AS numorders
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON C.custid = O.custid
GROUP BY C.custid;
Notice in the output that the customers 22 and 57 now show up with a count of 0:请注意,在输出中,客户22和57现在显示为0:
custid numorders
----------- -----------
1 6
2 4
3 7
4 13
5 18
...
22 0
...
57 0
...
87 15
88 9
89 14
90 7
91 7
(91 row(s) affected)
This chapter covered the 本章介绍了JOIN
table operator. JOIN
表运算符。It described the logical query processing phases involved in the three fundamental types of joins: cross joins, inner joins, and outer joins. 它描述了三种基本类型的连接所涉及的逻辑查询处理阶段:交叉连接、内部连接和外部连接。The chapter also covered further join examples, including composite joins, non-equi joins, and multi-join queries. 本章还介绍了更多连接示例,包括复合连接、非等连接和多连接查询。The chapter concluded with an optional reading section covering more advanced aspects of outer joins. 本章最后有一个可选阅读部分,涵盖了外部连接的更高级方面。To practice what you learned, go over the exercises for this chapter.要练习所学内容,请复习本章的练习。
This section provides exercises to help you familiarize yourself with the subjects discussed in this chapter. 本节提供练习,帮助您熟悉本章讨论的主题。All exercises involve querying objects in the 所有练习都涉及查询TSQLV4
database.TSQLV4
数据库中的对象。
Write a query that generates five copies of each employee row:编写一个查询,生成每个员工行的五个副本:
Tables involved: 涉及的表格:HR.Employees
and dbo.Nums
HR.Employees
和dbo.Nums
Desired output:期望输出:
empid firstname lastname n
----------- ---------- -------------------- -----------
1 Sara Davis 1
2 Don Funk 1
3 Judy Lew 1
4 Yael Peled 1
5 Sven Mortensen 1
6 Paul Suurs 1
7 Russell King 1
8 Maria Cameron 1
9 Patricia Doyle 1
1 Sara Davis 2
2 Don Funk 2
3 Judy Lew 2
4 Yael Peled 2
5 Sven Mortensen 2
6 Paul Suurs 2
7 Russell King 2
8 Maria Cameron 2
9 Patricia Doyle 2
1 Sara Davis 3
2 Don Funk 3
3 Judy Lew 3
4 Yael Peled 3
5 Sven Mortensen 3
6 Paul Suurs 3
7 Russell King 3
8 Maria Cameron 3
9 Patricia Doyle 3
1 Sara Davis 4
2 Don Funk 4
3 Judy Lew 4
4 Yael Peled 4
5 Sven Mortensen 4
6 Paul Suurs 4
7 Russell King 4
8 Maria Cameron 4
9 Patricia Doyle 4
1 Sara Davis 5
2 Don Funk 5
3 Judy Lew 5
4 Yael Peled 5
5 Sven Mortensen 5
6 Paul Suurs 5
7 Russell King 5
8 Maria Cameron 5
9 Patricia Doyle 5
(45 row(s) affected)
Write a query that returns a row for each employee and day in the range June 12, 2016 through June 16, 2016:编写一个查询,返回2016年6月12日至2016年6月16日范围内每个员工和日期的行:
Tables involved: 涉及的表格:HR.Employees
and dbo.Nums
HR.Employees
和dbo.Nums
Desired output:期望输出:
empid dt
----------- -----------
1 2016-06-12
1 2016-06-13
1 2016-06-14
1 2016-06-15
1 2016-06-16
2 2016-06-12
2 2016-06-13
2 2016-06-14
2 2016-06-15
2 2016-06-16
3 2016-06-12
3 2016-06-13
3 2016-06-14
3 2016-06-15
3 2016-06-16
4 2016-06-12
4 2016-06-13
4 2016-06-14
4 2016-06-15
4 2016-06-16
5 2016-06-12
5 2016-06-13
5 2016-06-14
5 2016-06-15
5 2016-06-16
6 2016-06-12
6 2016-06-13
6 2016-06-14
6 2016-06-15
6 2016-06-16
7 2016-06-12
7 2016-06-13
7 2016-06-14
7 2016-06-15
7 2016-06-16
8 2016-06-12
8 2016-06-13
8 2016-06-14
8 2016-06-15
8 2016-06-16
9 2016-06-12
9 2016-06-13
9 2016-06-14
9 2016-06-15
9 2016-06-16
(45 row(s) affected)
Explain what's wrong in the following query, and provide a correct alternative:解释以下查询中的错误,并提供正确的替代方案:
SELECT Customers.custid, Customers.companyname, Orders.orderid, Orders.orderdate
FROM Sales.Customers AS C
INNER JOIN Sales.Orders AS O
ON Customers.custid = Orders.custid;
Return US customers, and for each customer return the total number of orders and total quantities:将客户退回,并为每个客户退回订单总数和总数量:
Tables involved: 涉及的表格:Sales.Customers
, Sales.Orders
, and Sales.OrderDetails
Sales.Customers
、Sales.Orders
和Sales.OrderDetails
Desired output:期望输出:
custid numorders totalqty
----------- ----------- -----------
32 11 345
36 5 122
43 2 20
45 4 181
48 8 134
55 10 603
65 18 1383
71 31 4958
75 9 327
77 4 46
78 3 59
82 3 89
89 14 1063
(13 row(s) affected)
Return customers and their orders, including customers who placed no orders:退回客户及其订单,包括未下订单的客户:
Tables involved: 涉及的表格:Sales.Customers
and Sales.Orders
Sales.Customers
和Sales.Orders
Desired output (abbreviated):期望输出(缩写):
custid companyname orderid orderdate
----------- --------------- ----------- -----------
85 Customer ENQZT 10248 2014-07-04
79 Customer FAPSM 10249 2014-07-05
34 Customer IBVRG 10250 2014-07-08
84 Customer NRCSK 10251 2014-07-08
...
73 Customer JMIKW 11074 2016-05-06
68 Customer CCKOT 11075 2016-05-06
9 Customer RTXGC 11076 2016-05-06
65 Customer NYUHS 11077 2016-05-06
22 Customer DTDMN NULL NULL
57 Customer WVAXS NULL NULL
(832 row(s) affected)
Return customers who placed no orders:退回未下订单的客户:
Tables involved: 涉及的表格:Sales.Customers
and Sales.Orders
Sales.Customers
和Sales.Orders
Desired output:期望输出:
custid companyname
----------- ---------------
22 Customer DTDMN
57 Customer WVAXS
(2 row(s) affected)
Return customers with orders placed on February 12, 2016, along with their orders:退回2016年2月12日下订单的客户及其订单:
Tables involved: 涉及的表格:Sales.Customers
and Sales.Orders
Sales.Customers
和Sales.Orders
Desired output:期望输出:
custid companyname orderid orderdate
----------- --------------- ----------- ----------
48 Customer DVFMB 10883 2016-02-12
45 Customer QXPPT 10884 2016-02-12
76 Customer SFOGW 10885 2016-02-12
(3 row(s) affected)
Write a query that returns all customers in the output, but matches them with their respective orders only if they were placed on February 12, 2016:编写一个查询,返回输出中的所有客户,但仅当他们在2016年2月12日下单时,才将他们与各自的订单进行匹配:
Tables involved: 涉及的表格:Sales.Customers
and Sales.Orders
Sales.Customers
和Sales.Orders
Desired output (abbreviated):期望输出(缩写):
custid companyname orderid orderdate
----------- --------------- ----------- ----------
72 Customer AHPOP NULL NULL
58 Customer AHXHT NULL NULL
25 Customer AZJED NULL NULL
18 Customer BSVAR NULL NULL
91 Customer CCFIZ NULL NULL
68 Customer CCKOT NULL NULL
49 Customer CQRAA NULL NULL
24 Customer CYZTN NULL NULL
22 Customer DTDMN NULL NULL
48 Customer DVFMB 10883 2016-02-12
10 Customer EEALV NULL NULL
40 Customer EFFTC NULL NULL
85 Customer ENQZT NULL NULL
82 Customer EYHKM NULL NULL
79 Customer FAPSM NULL NULL
...
51 Customer PVDZC NULL NULL
52 Customer PZNLA NULL NULL
56 Customer QNIVZ NULL NULL
8 Customer QUHWH NULL NULL
67 Customer QVEPD NULL NULL
45 Customer QXPPT 10884 2016-02-12
7 Customer QXVLA NULL NULL
60 Customer QZURI NULL NULL
19 Customer RFNQC NULL NULL
9 Customer RTXGC NULL NULL
76 Customer SFOGW 10885 2016-02-12
69 Customer SIUIH NULL NULL
86 Customer SNXOJ NULL NULL
88 Customer SRQVM NULL NULL
54 Customer TDKEG NULL NULL
20 Customer THHDP NULL NULL
...
(91 row(s) affected)
Explain why the following query isn't a correct solution query for Exercise 7:解释为什么以下查询不是练习7的正确解决方案查询:
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON O.custid = C.custid
WHERE O.orderdate = '20160212'
OR O.orderid IS NULL;
Return all customers, and for each return a Yes/No value depending on whether the customer placed orders on February 12, 2016:退回所有客户,根据客户是否在2016年2月12日下单,每次退回的值为是/否:
Tables involved: 涉及的表格:Sales.Customers
and Sales.Orders
Sales.Customers
和Sales.Orders
Desired output (abbreviated):期望输出(缩写):
custid companyname HasOrderOn20160212
----------- --------------- ------------------
...
40 Customer EFFTC No
41 Customer XIIWM No
42 Customer IAIJK No
43 Customer UISOJ No
44 Customer OXFRU No
45 Customer QXPPT Yes
46 Customer XPNIK No
47 Customer PSQUZ No
48 Customer DVFMB Yes
49 Customer CQRAA No
50 Customer JYPSC No
51 Customer PVDZC No
52 Customer PZNLA No
53 Customer GCJSG No
...
(91 row(s) affected)
This section provides solutions to the exercises for this chapter.本节提供本章练习的解决方案。
Producing multiple copies of rows can be achieved with a cross join. 通过交叉连接可以生成行的多个副本。If you need to produce five copies of each employee row, you need to perform a cross join between the 如果需要生成每个员工行的五个副本,则需要在员工表和具有五行的表之间执行交叉联接;或者,您可以在Employees
table and a table that has five rows; alternatively, you can perform a cross join between Employees
and a table that has more than five rows, but filter only five from that table in the WHERE
clause. Employees
和包含五行以上的表之间执行交叉联接,但在WHERE
子句中仅从该表中筛选五行。The Nums
table is convenient for this purpose. Nums
表为此目的很方便。Simply join 只需连接Employees
and Nums
, and filter from Nums
as many rows as the number of requested copies (five, in this case). Employees
和Nums
,并从Nums
中筛选所需拷贝数(本例中为五)的行数。Here's the solution query:以下是解决方案查询:
SELECT E.empid, E.firstname, E.lastname, N.n
FROM HR.Employees AS E
CROSS JOIN dbo.Nums AS N
WHERE N.n <= 5
ORDER BY n, empid;
This exercise is an extension of the previous exercise. 本练习是上一练习的延伸。Instead of being asked to produce a predetermined constant number of copies of each employee row, you are asked to produce a copy for each day in a certain date range. 要求您为特定日期范围内的每一天制作一份副本,而不是为每一员工行制作预定的固定数量的副本。So here you need to calculate the number of days in the requested date range by using the 因此,这里需要使用DATEDIFF
function, and refer to the result of that expression in the query's WHERE
clause instead of referring to a constant. DATEDIFF
函数计算请求日期范围内的天数,并在查询的WHERE
子句中引用该表达式的结果,而不是引用常量。To produce the dates, simply add 要生成日期,只需在开始请求范围的日期上添加n – 1
days to the date that starts the requested range. n-1
天。Here's the solution query:以下是解决方案查询:
SELECT E.empid,
DATEADD(day, D.n - 1, CAST('20160612' AS DATE)) AS dt
FROM HR.Employees AS E
CROSS JOIN dbo.Nums AS D
WHERE D.n <= DATEDIFF(day, '20160612', '20160616') + 1
ORDER BY empid, dt;
The DATEDIFF
function returns 4
because there is a four-day difference between June 12, 2016 and June 16, 2016. DATEDIFF
函数返回4,因为2016年6月12日和2016年6月16日之间有四天的差异。Add 1 to the result, and you get 5 for the five days in the range. 在结果中加1,在该范围内的五天内得到5。So the 所以WHERE
clause filters five rows from Nums
where n
is less than or equal to 5. WHERE
子句从NUM中筛选五行,其中n小于或等于5。By adding 将n-1天添加到2016年6月12日,您将获得2016年6月12日至2016年6月16日范围内的所有日期。n – 1
days to June 12, 2016, you get all dates in the range June 12, 2016 through June 16, 2016.
The first step in the processing of the JOIN table operator assigns to the JOIN表运算符处理的第一步是分别为Customers
and Orders
tables the shorter aliases C
and O
, respectively. Customers
和Orders
表指定较短的别名C
和O
。The aliasing effectively renames the tables for the purposes of the query. 出于查询的目的,别名有效地重命名了表。In all subsequent phases of logical query processing, the original table names are not accessible, rather only the shorter aliases are. 在逻辑查询处理的所有后续阶段中,无法访问原始表名,而只能访问较短的别名。You have two options to fix the query. 有两个选项可以修复该查询。One is to avoid aliasing and use the original table names as prefixes, like so:一种是避免别名,并使用原始表名作为前缀,如下所示:
SELECT Customers.custid, Customers.companyname, Orders.orderid, Orders.orderdate
FROM Sales.Customers
INNER JOIN Sales.Orders
ON Customers.custid = Orders.custid;
Another solution is to keep the aliases, but to make sure to use the aliases as prefixes, like so:另一种解决方案是保留别名,但要确保将别名用作前缀,如下所示:
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
INNER JOIN Sales.Orders AS O
ON C.custid = O.custid;
This exercise requires you to write a query that joins three tables: 本练习要求您编写一个连接三个表的查询:Customers
, Orders
, and OrderDetails
. Customers
、Orders
和OrderDetails
。The query should use the 查询应该使用WHERE
clause to filter only rows where the customer's country is the United States. WHERE
子句只筛选客户所在国家为美国的行。Because you are asked to return aggregates per customer, the query should group the rows by customer ID. 因为要求您返回每个客户的聚合,所以查询应该按客户ID对行进行分组。You need to resolve a tricky issue here to return the right number of orders for each customer. 您需要解决一个棘手的问题,才能为每位客户返回正确数量的订单。Because of the join between 由于Orders
and OrderDetails
, you don't get only one row per order—you get one row per order line. Orders
和OrderDetails
之间的连接,每个订单不能只有一行,每个订单行只能有一行。So if you use the 因此,如果在COUNT(*)
function in the SELECT
list, you get back the number of order lines for each customer and not the number of orders.SELECT
列表中使用COUNT(*)
函数,则返回每个客户的订单行数,而不是订单数。
To resolve this issue, you need to take each order into consideration only once. 要解决此问题,您只需考虑一次每个订单。You can do this by using 可以通过使用COUNT(DISTINCT O.orderid)
instead of COUNT(*).
COUNT(DISTINCT O.orderid)
而不是COUNT(*)
来实现这一点。The total quantities don't create any special issues because the quantity is associated with the order line and not the order. 总数量不会产生任何特殊问题,因为数量与订单行而不是订单关联。Here's the solution query:以下是解决方案查询:
SELECT C.custid, COUNT(DISTINCT O.orderid) AS numorders, SUM(OD.qty) AS totalqty
FROM Sales.Customers AS C
INNER JOIN Sales.Orders AS O
ON O.custid = C.custid
INNER JOIN Sales.OrderDetails AS OD
ON OD.orderid = O.orderid
WHERE C.country = N'USA'
GROUP BY C.custid;
To get both customers who placed orders and customers who didn't place orders in the result, you need to use an outer join, like this:要在结果中同时获得下订单的客户和未下订单的客户,需要使用外部联接,如下所示:
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON O.custid = C.custid;
This query returns 832 rows (including the customers 22 and 57, who didn't place orders). 此查询返回832行(包括未下订单的客户22和57)。An inner join between the tables would return only 830 rows, without those customers.如果没有这些客户,表之间的内部联接将只返回830行。
This exercise is an extension of the previous one. 这个练习是前一个练习的延伸。To return only customers who didn't place orders, you need to add a 要仅返回未下订单的客户,需要在查询中添加WHERE
clause to the query that filters only outer rows—namely, rows that represent customers with no orders. WHERE
子句,该子句只筛选外部行,即表示没有订单的客户的行。Outer rows have 外部行在连接(NULLs
in the attributes from the nonpreserved side of the join (Orders
). Orders
)的非保留端的属性中有NULL
值。But to make sure that the 但为了确保NULL
is a placeholder for an outer row and not a NULL
that originated from the table, it's recommended that you refer to an attribute that is the primary key, or the join column, or one defined as not allowing NULLs
. NULL
是外部行的占位符,而不是源于表的NULL
,建议您引用主键属性、联接列或定义为不允许NULL
的属性。Here's the solution query, which refers to the primary key of the 下面是解决方案查询,它引用Orders
table in the WHERE
clause:WHERE
子句中Orders
表的主键:
SELECT C.custid, C.companyname
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON O.custid = C.custid
WHERE O.orderid IS NULL;
This query returns only two rows, for customers 22 and 57, who didn't place orders.这个查询只返回两行,分别是没有下订单的客户22和57。
This exercise involves writing a query that performs an inner join between 本练习涉及编写一个查询,在Customers
and Orders
and filters only rows in which the order date is February 12, 2016:Customers
和Orders
之间执行内部联接,并只筛选订单日期为2016年2月12日的行:
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
INNER JOIN Sales.Orders AS O
ON O.custid = C.custid
WHERE O.orderdate = '20160212';
The WHERE
clause filtered out customers who didn't place orders on February 12, 2016, but that was the request.WHERE
子句筛选掉了2016年2月12日没有下订单的客户,但这就是请求。
This exercise builds on the previous one. 这个练习建立在前一个练习的基础上。The trick here is to realize two things. 这里的诀窍是实现两件事。First, you need an outer join because you are supposed to preserve all customers, even if they don't have matching orders. 首先,您需要一个外部连接,因为您应该保留所有客户,即使他们没有匹配的订单。Second, the predicate based on the order date is a nonfinal matching predicate; as such it must appear in the 其次,基于订单日期的谓词是非最终匹配谓词;因此,它必须出现在ON
clause and not the WHERE
clause. ON
子句中,而不是WHERE
子句中。Remember that the 记住,WHERE
clause is a final filter that is applied after outer rows are added. WHERE
子句是在添加外部行之后应用的最终筛选器。Your goal is to match orders to customers only if the order was placed by the customer on February 12, 2016. 您的目标是,仅当客户在2016年2月12日下订单时,才将订单与客户匹配。You still want to get customers who didn't place orders on that date in the output. 您仍然希望在输出中找到未在该日期下订单的客户。Hence, the 因此,ON
clause should match customers and orders based on both an equality between the customer's customer ID and the order's customer ID, and on the order date being February 12, 2016. ON
子句应根据客户的客户ID和订单的客户ID之间的相等性,以及订单日期为2016年2月12日的情况,匹配客户和订单。Here's the solution query:以下是解决方案查询:
SELECT C.custid, C.companyname, O.orderid, O.orderdate
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON O.custid = C.custid
AND O.orderdate = '20160212';
The outer join matches all customers with their respective orders, and it preserves also customers who didn't place any orders. 外部连接将所有客户与其各自的订单进行匹配,并且还保留了没有下任何订单的客户。Customers without orders have 没有订单的客户在订单属性中为NULLs
in the order attributes. NULL
。Then the 然后WHERE
filter keeps only rows where the order date is February 12, 2016 or the order ID is NULL (a customer without orders at all). WHERE
筛选器只保留订单日期为2016年2月12日或订单ID为空(完全没有订单的客户)的行。The filter discards customers who didn't place orders on February 12, 2016 but did place orders on other dates, and according to Exercise 7 the query is supposed to return all customers. 筛选器会丢弃那些在2016年2月12日没有下订单但在其他日期下订单的客户,根据练习7,查询应该返回所有客户。Here's the output of the incorrect query:以下是错误查询的输出:
custid companyname orderid orderdate
----------- --------------- ----------- ----------
48 Customer DVFMB 10883 2016-02-12
45 Customer QXPPT 10884 2016-02-12
76 Customer SFOGW 10885 2016-02-12
22 Customer DTDMN NULL NULL
57 Customer WVAXS NULL NULL
(5 row(s) affected)
The first three rows represent orders that were placed on February 12, 2016. The last two rows represent customers who didn't place orders at all. 前三行代表2016年2月12日下的订单。最后两行代表根本没有下订单的客户。Observe that many of the 91 customers from the 请注意,Customers
table are missing. Customers
表中的91位客户中有许多人失踪。As mentioned, those are customers who didn't place orders on February 12, 2016, but did place orders on other dates.如前所述,这些客户没有在2016年2月12日下单,但在其他日期下单。
This exercise is an extension of Exercise 7. Here, instead of returning matching orders, you just need to return a Yes/No value indicating whether there is a matching order. 本练习是练习7的延伸。在这里,您不需要返回匹配的订单,只需要返回一个Yes/No值,指示是否存在匹配的订单。Remember that in an outer join, a nonmatch is identified as an outer row with 请记住,在外部联接中,非匹配被标识为非保留端的属性中包含NULLs
in the attributes of the nonpreserved side. NULL
值的外部行。So you can use a simple 因此,您可以使用一个简单的CASE
expression that checks whether the current row is not an outer one, in which case it returns Yes
; otherwise, it returns No
. CASE
表达式来检查当前行是否不是外部行,在这种情况下,它会返回Yes
;否则,它返回No
。Because technically you can have more than one match per customer, you should add a 因为从技术上讲,每个客户可以有多个匹配项,所以应该在DISTINCT
clause to the SELECT
list. SELECT
列表中添加一个DISTINCT
子句。This way, you get only one row back for each customer. 这样,每个客户只返回一行。Here's the solution query:以下是解决方案查询:
SELECT DISTINCT C.custid, C.companyname,
CASE WHEN O.orderid IS NOT NULL THEN 'Yes' ELSE 'No' END AS HasOrderOn20160212
FROM Sales.Customers AS C
LEFT OUTER JOIN Sales.Orders AS O
ON O.custid = C.custid
AND O.orderdate = '20160212';