The utfmb4
character set has these characteristics:
Supports BMP and supplementary characters.
Requires a maximum of four bytes per multibyte character.
utf8mb4
contrasts with the utf8mb3
character set, which supports only BMP characters and uses a maximum of three bytes per character:
For a BMP character, utf8mb4
and utf8mb3
have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8mb4
requires four bytes to store it, whereas utf8mb3
cannot store the character at all. When converting utf8mb3
columns to utf8mb4
, you need not worry about converting supplementary characters because there are none.
utf8mb4
is a superset of utf8mb3
, so for an operation such as the following concatenation, the result has character set utf8mb4
and the collation of utf8mb4_col
:
SELECT CONCAT(utf8mb3_col, utf8mb4_col);
Similarly, the following comparison in the WHERE
clause works according to the collation of utf8mb4_col
:
SELECT * FROM utf8mb3_tbl, utf8mb4_tbl WHERE utf8mb3_tbl.utf8mb3_col = utf8mb4_tbl.utf8mb4_col;
For information about data type storage as it relates to multibyte character sets, see String Type Storage Requirements.