

The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available.
See Java Language Changes for a summary of updated language features in Java SE 9 and subsequent releases.
See JDK Release Notes for information about new features, enhancements, and removed or deprecated options for all JDK releases.

Converting Non-Unicode Text

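In the Java programming language, char values are 16-bit UTF-16 code units that represent Unicode characters. Unicode is a character encoding standard that supports the world's major languages. It is not limited to 16 bits: it defines code points from U+0000 to U+10FFFF, and a character above U+FFFF is stored in a Java String as two char values, called a surrogate pair. You can learn more about the Unicode standard at the Unicode Consortium Web site.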

When this section was written, few text editors supported Unicode text entry. The text editor we used to write this section's code examples supports only ASCII characters, which are limited to 7 bits. To indicate Unicode characters that cannot be represented in ASCII, such as ö, we used the \uXXXX escape sequence, where each X is a hexadecimal digit. The following example shows how to indicate the ö character with an escape sequence:

String str = "\u00F6";
char c = '\u00F6';
// Boxed Character; the new Character(char) constructor is deprecated since Java 9
Character letter = Character.valueOf('\u00F6');
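To make the escape mechanism and the 16-bit char type concrete, here is a minimal sketch. The class name EscapeDemo and its constants are hypothetical, chosen for illustration. It contrasts a character that fits in one char (U+00F6) with one outside the Basic Multilingual Plane (U+1F600, chosen only as an example), which needs a surrogate pair. The code deliberately uses only ASCII in its source so it compiles regardless of the source-file encoding.

```java
// A minimal sketch: Unicode escapes and the 16-bit char type.
class EscapeDemo {
    // U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) fits in a single char.
    static final String O_UMLAUT = "\u00F6";

    // U+1F600 lies above U+FFFF, so a Java String stores it as
    // two char values (a high surrogate followed by a low surrogate).
    static final String GRINNING_FACE = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(O_UMLAUT.length());                         // 1
        System.out.println(Integer.toHexString(O_UMLAUT.charAt(0)));   // f6
        System.out.println(GRINNING_FACE.length());                    // 2 chars
        System.out.println(
            GRINNING_FACE.codePointCount(0, GRINNING_FACE.length()));  // 1 code point
        System.out.println(
            Integer.toHexString(GRINNING_FACE.codePointAt(0)));        // 1f600
    }
}
```

When code must handle arbitrary text, iterating by code point (for example with codePointAt and Character.charCount) rather than by char avoids splitting a surrogate pair.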

Systems around the world use a variety of character encodings, and many of them are not Unicode encodings. Because your program expects characters in Unicode, text data it gets from the system must be converted into Unicode, and Unicode text it sends back to the system must be converted into the system's encoding. Data in text files is automatically converted to Unicode when its encoding matches the default file encoding of the Java Virtual Machine. You can identify the default file encoding by creating an OutputStreamWriter without specifying an encoding and asking for the name of the encoding it uses:

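// Requires java.io.ByteArrayOutputStream and java.io.OutputStreamWriter
OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
// getEncoding() returns the historical name of the encoding if it has one
System.out.println(out.getEncoding());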
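The same default can also be read through the java.nio.charset API. The sketch below, with a hypothetical class name DefaultEncodingDemo, prints both names side by side. Because getEncoding() prefers an encoding's historical name, the two strings can be spelled differently (for example UTF8 versus UTF-8) while still naming the same charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// A minimal sketch comparing two ways to find the JVM's default encoding.
class DefaultEncodingDemo {
    // Name reported by an OutputStreamWriter created without an explicit charset.
    // The writer wraps an in-memory stream, so leaving it unclosed holds no resources;
    // note that getEncoding() would return null after close().
    static String writerEncoding() {
        OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
        return out.getEncoding();
    }

    // Canonical name of the default charset (Charset.defaultCharset exists since Java 5).
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("OutputStreamWriter.getEncoding(): " + writerEncoding());
        System.out.println("Charset.defaultCharset():         " + defaultCharsetName());
    }
}
```

On JDK 18 and later the default charset is UTF-8 unless it is overridden, so the result of this check depends on the release you run as well as on the platform.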

If the default file encoding differs from the encoding of the text data you want to process, then you must perform the conversion yourself. You might need to do this when processing text from another country or computing platform.

This section discusses the APIs you use to translate non-Unicode text into Unicode. Before using these APIs, verify that the character encoding you wish to convert into Unicode is supported. Every Java implementation must support six standard charsets: US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16. Beyond these, the list of supported character encodings is not part of the Java programming language specification, so the encodings available to these APIs may vary by platform. To see which encodings the Java Development Kit supports, see the Supported Encodings document.
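Support for a particular encoding can also be checked at run time rather than assumed. The following is a minimal sketch using Charset.isSupported and Charset.availableCharsets; the class name CharsetSupportDemo and its supports helper are hypothetical, and Shift_JIS is used only as an example of a charset that is common but not guaranteed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;

// A minimal sketch: check charset support at run time instead of assuming it.
class CharsetSupportDemo {
    // True if this JVM can encode and decode the named charset.
    // Charset.isSupported throws IllegalCharsetNameException for a malformed name.
    static boolean supports(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // The standard charsets are guaranteed; StandardCharsets (Java 7+) exposes them.
        System.out.println(StandardCharsets.UTF_8.name()); // UTF-8
        // A common charset that the specification does not require.
        System.out.println("Shift_JIS supported? " + supports("Shift_JIS"));
        // Every charset this particular JVM provides, with its aliases.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        for (Charset cs : all.values()) {
            System.out.println(cs.name() + " " + cs.aliases());
        }
    }
}
```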

The material that follows describes two techniques for converting non-Unicode text to Unicode. You can convert non-Unicode byte arrays into String objects, and vice versa. Or you can translate between streams of Unicode characters and byte streams of non-Unicode text.

Byte Encodings and Strings

This section shows you how to convert non-Unicode byte arrays into String objects, and vice versa.
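As a rough preview of this technique, the sketch below decodes a byte array with new String(byte[], Charset) and encodes a String with getBytes(Charset), both available since Java 6. The class ByteStringDemo and its decode and encode helpers are hypothetical names. It shows that the same character, U+00F6, occupies one byte in ISO-8859-1 but two bytes in UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// A minimal sketch of converting between encoded bytes and a String.
class ByteStringDemo {
    // Decode bytes that are known to be in the given encoding into Unicode.
    static String decode(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }

    // Encode a Unicode String into bytes in the given encoding.
    static byte[] encode(String text, Charset encoding) {
        return text.getBytes(encoding);
    }

    public static void main(String[] args) {
        // U+00F6 is the single byte 0xF6 in ISO-8859-1 ...
        byte[] latin1 = { (byte) 0xF6 };
        String s = decode(latin1, StandardCharsets.ISO_8859_1);
        // ... but the two bytes 0xC3 0xB6 in UTF-8.
        byte[] utf8 = encode(s, StandardCharsets.UTF_8);
        System.out.println(s.length());            // 1
        System.out.println(Arrays.toString(utf8)); // [-61, -74]
    }
}
```

These convenience methods silently replace malformed or unmappable input with the charset's replacement string. When you need an error instead, a CharsetDecoder obtained from Charset.newDecoder(), which reports such errors by default, is the stricter option.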

Character and Byte Streams

In this section you'll learn how to translate between streams of Unicode characters and byte streams of non-Unicode text.
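The java.io classes InputStreamReader and OutputStreamWriter bridge byte streams and character streams: the reader decodes bytes into Unicode characters, and the writer encodes characters back into bytes. The sketch below chains the two to re-encode ISO-8859-1 bytes as UTF-8. The class StreamConvertDemo and its transcode method are hypothetical, and in-memory streams stand in for files or sockets so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// A minimal sketch: decode a byte stream as one charset, re-encode it as another.
class StreamConvertDemo {
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Reader: bytes in 'from' -> Unicode chars. Writer: Unicode chars -> bytes in 'to'.
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer out = new OutputStreamWriter(sink, to)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            // In-memory streams do no real I/O, so this is not expected to happen.
            throw new UncheckedIOException(e);
        }
        // try-with-resources closed (and therefore flushed) the writer into sink.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // K, U+00F6, l, n encoded in ISO-8859-1: one byte per character.
        byte[] latin1 = { 'K', (byte) 0xF6, 'l', 'n' };
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5, because U+00F6 takes two bytes in UTF-8
    }
}
```

For files, the same pattern applies with FileInputStream and FileOutputStream in place of the in-memory streams; wrapping the reader and writer in BufferedReader and BufferedWriter is a common refinement.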
