Trail: Collections
Lesson: ~~Interfaces~~接口

« Previous • Trail • Next »

~~The Java Tutorials have been written for JDK 8.~~Java教程是为JDK 8编写的。~~Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available.~~本页中描述的示例和实践没有利用后续版本中引入的改进，并且可能使用不再可用的技术。
~~See Java Language Changes for a summary of updated language features in Java SE 9 and subsequent releases.~~有关Java SE 9及其后续版本中更新的语言特性的摘要，请参阅Java语言更改。
~~See JDK Release Notes for information about new features, enhancements, and removed or deprecated options for all JDK releases.~~有关所有JDK版本的新功能、增强功能以及已删除或不推荐的选项的信息，请参阅JDK发行说明。

The Set InterfaceSet接口

~~A Set is a Collection that cannot contain duplicate elements.~~ Set是不能包含重复元素的Collection。~~It models the mathematical set abstraction.~~ 它对数学集合抽象进行建模。~~The Set interface contains only methods inherited from Collection and adds the restriction that duplicate elements are prohibited.~~ Set接口只包含从Collection继承的方法，并添加了禁止重复元素的限制。~~Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set instances to be compared meaningfully even if their implementation types differ.~~ Set还对equals和hashCode操作的行为增加了更严格的约定，允许对Set实例进行有意义的比较，即使它们的实现类型不同。~~Two Set instances are equal if they contain the same elements.~~如果两个Set实例包含相同的元素，则它们是相等的。

~~The Java platform contains three general-purpose Set implementations: HashSet, TreeSet, and LinkedHashSet.~~ Java平台包含三种通用的Set实现：HashSet、TreeSet和LinkedHashSet。HashSet, which stores its elements in a hash table, is the best-performing implementation; however it makes no guarantees concerning the order of iteration. TreeSet, which stores its elements in a red-black tree, orders its elements based on their values; it is substantially slower than HashSet. LinkedHashSet, which is implemented as a hash table with a linked list running through it, orders its elements based on the order in which they were inserted into the set (insertion-order). LinkedHashSet spares its clients from the unspecified, generally chaotic ordering provided by HashSet at a cost that is only slightly higher.

Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.

Collection<Type> noDups = new HashSet<Type>(c);

It works by creating a Set (which, by definition, cannot contain duplicates), initially containing all the elements in c. It uses the standard conversion constructor described in the The Collection Interface section.

Or, if using JDK 8 or later, you could easily collect into a Set using aggregate operations:

c.stream()
.collect(Collectors.toSet()); // no duplicates

Here's a slightly longer example that accumulates a Collection of names into a TreeSet:

Set<String> set = people.stream()
.map(Person::getName)
.collect(Collectors.toCollection(TreeSet::new));

And the following is a minor variant of the first idiom that preserves the order of the original collection while removing duplicate elements:

Collection<Type> noDups = new LinkedHashSet<Type>(c);

The following is a generic method that encapsulates the preceding idiom, returning a Set of the same generic type as the one passed.

public static <E> Set<E> removeDups(Collection<E> c) {
    return new LinkedHashSet<E>(c);
}

Set Interface Basic Operations

The size operation returns the number of elements in the Set (its cardinality). The isEmpty method does exactly what you think it would. The add method adds the specified element to the Set if it is not already present and returns a boolean indicating whether the element was added. Similarly, the remove method removes the specified element from the Set if it is present and returns a boolean indicating whether the element was present. The iterator method returns an Iterator over the Set.

The following program prints out all distinct words in its argument list. Two versions of this program are provided. The first uses JDK 8 aggregate operations. The second uses the for-each construct.

Using JDK 8 Aggregate Operations:

import java.util.*;
import java.util.stream.*;

public class FindDups {
    public static void main(String[] args) {
        Set<String> distinctWords = Arrays.asList(args).stream()
		.collect(Collectors.toSet()); 
        System.out.println(distinctWords.size()+ 
                           " distinct words: " + 
                           distinctWords);
    }
}

Using the for-each Construct:

import java.util.*;

public class FindDups {
    public static void main(String[] args) {
        Set<String> s = new HashSet<String>();
        for (String a : args)
               s.add(a);
        System.out.println(s.size() + " distinct words: " + s);
    }
}

Now run either version of the program.

java FindDups i came i saw i left

The following output is produced:

4 distinct words: [left, came, saw, i]

Note that the code always refers to the Collection by its interface type (Set) rather than by its implementation type. This is a strongly recommended programming practice because it gives you the flexibility to change implementations merely by changing the constructor. If either of the variables used to store a collection or the parameters used to pass it around are declared to be of the Collection's implementation type rather than its interface type, all such variables and parameters must be changed in order to change its implementation type.

Furthermore, there's no guarantee that the resulting program will work. If the program uses any nonstandard operations present in the original implementation type but not in the new one, the program will fail. Referring to collections only by their interface prevents you from using any nonstandard operations.

The implementation type of the Set in the preceding example is HashSet, which makes no guarantees as to the order of the elements in the Set. If you want the program to print the word list in alphabetical order, merely change the Set's implementation type from HashSet to TreeSet. Making this trivial one-line change causes the command line in the previous example to generate the following output.

java FindDups i came i saw i left

4 distinct words: [came, i, left, saw]

Set Interface Bulk Operations

Bulk operations are particularly well suited to Sets; when applied, they perform standard set-algebraic operations. Suppose s1 and s2 are sets. Here's what bulk operations do:

s1.containsAll(s2) — returns true if s2 is a subset of s1. (s2 is a subset of s1 if set s1 contains all of the elements in s2.)
s1.addAll(s2) — transforms s1 into the union of s1 and s2. (The union of two sets is the set containing all of the elements contained in either set.)
s1.retainAll(s2) — transforms s1 into the intersection of s1 and s2. (The intersection of two sets is the set containing only the elements common to both sets.)
s1.removeAll(s2) — transforms s1 into the (asymmetric) set difference of s1 and s2. (For example, the set difference of s1 minus s2 is the set containing all of the elements found in s1 but not in s2.)

To calculate the union, intersection, or set difference of two sets nondestructively (without modifying either set), the caller must copy one set before calling the appropriate bulk operation. The following are the resulting idioms.

Set<Type> union = new HashSet<Type>(s1);
union.addAll(s2);

Set<Type> intersection = new HashSet<Type>(s1);
intersection.retainAll(s2);

Set<Type> difference = new HashSet<Type>(s1);
difference.removeAll(s2);

The implementation type of the result Set in the preceding idioms is HashSet, which is, as already mentioned, the best all-around Set implementation in the Java platform. However, any general-purpose Set implementation could be substituted.

Let's revisit the FindDups program. Suppose you want to know which words in the argument list occur only once and which occur more than once, but you do not want any duplicates printed out repeatedly. This effect can be achieved by generating two sets — one containing every word in the argument list and the other containing only the duplicates. The words that occur only once are the set difference of these two sets, which we know how to compute. Here's how the resulting program looks.

import java.util.*;

public class FindDups2 {
    public static void main(String[] args) {
        Set<String> uniques = new HashSet<String>();
        Set<String> dups    = new HashSet<String>();

        for (String a : args)
            if (!uniques.add(a))
                dups.add(a);

        // Destructive set-difference
        uniques.removeAll(dups);

        System.out.println("Unique words:    " + uniques);
        System.out.println("Duplicate words: " + dups);
    }
}

When run with the same argument list used earlier (i came i saw i left), the program yields the following output.

Unique words:    [left, saw, came]
Duplicate words: [i]

A less common set-algebraic operation is the symmetric set difference — the set of elements contained in either of two specified sets but not in both. The following code calculates the symmetric set difference of two sets nondestructively.

Set<Type> symmetricDiff = new HashSet<Type>(s1);
symmetricDiff.addAll(s2);
Set<Type> tmp = new HashSet<Type>(s1);
tmp.retainAll(s2);
symmetricDiff.removeAll(tmp);

Set Interface Array Operations

The array operations don't do anything special for Sets beyond what they do for any other Collection. These operations are described in The Collection Interface section.

« Previous • Trail • Next »