$strLenBytes
New in version 3.4.
Returns the number of UTF-8 encoded bytes in the specified string.
$strLenBytes has the following operator expression syntax:
{ $strLenBytes: <string expression> }
The argument can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.
If the argument resolves to a value of null or refers to a missing field, $strLenBytes returns an error.
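The argument rules above can be mimicked outside the server. The following JavaScript sketch uses a hypothetical helper, strLenBytesLike (not a MongoDB API), that counts UTF-8 bytes with the standard TextEncoder and rejects null and undefined, with undefined standing in for a missing field. Rejecting every other non-string value as well is an assumption based on the requirement that the argument resolve to a string; the error text is made up for the sketch.

```javascript
// Hypothetical stand-in for { $strLenBytes: <string expression> }; not a MongoDB API.
// undefined plays the role of a missing field.
function strLenBytesLike(value) {
  if (typeof value !== "string") {
    // Assumption: reject every non-string, not only null and missing values.
    const kind = value === null ? "null" : typeof value;
    throw new TypeError("strLenBytesLike: expected a string, got " + kind);
  }
  // TextEncoder always encodes to UTF-8, so the array length is the byte count.
  return new TextEncoder().encode(value).length;
}

const doc = { name: "apple" };
console.log(strLenBytesLike(doc.name)); // 5
try {
  strLenBytesLike(doc.color); // missing field, so the value is undefined
} catch (e) {
  console.log(e.message);
}
```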
The $strLenBytes operator counts the number of UTF-8 encoded bytes in a string, where each character may use between one and four bytes.
For example, US-ASCII characters are encoded using one byte. Most characters with diacritic marks and other Latin letters outside the English alphabet are encoded using two bytes. Chinese, Japanese, and Korean characters typically require three bytes, and characters in the supplementary planes of Unicode (most emoji, the Mathematical Alphanumeric Symbols block, and so on) require four bytes.
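The one-to-four-byte rule follows directly from the code point ranges that UTF-8 assigns to each sequence length. The JavaScript sketch below (the function names are illustrative, not part of MongoDB) computes the byte count from those ranges and can be cross-checked against TextEncoder.

```javascript
// Number of bytes UTF-8 uses for a single Unicode code point.
function utf8Width(codePoint) {
  if (codePoint < 0x80) return 1;    // U+0000..U+007F: US-ASCII
  if (codePoint < 0x800) return 2;   // U+0080..U+07FF: e.g. é, ñ, λ
  if (codePoint < 0x10000) return 3; // U+0800..U+FFFF: e.g. €, most CJK characters
  return 4;                          // U+10000..U+10FFFF: supplementary planes, e.g. most emoji
}

// Sum the widths over the string's code points. for...of iterates by code
// point, so a surrogate pair such as an emoji is visited once, not twice.
function utf8ByteLength(s) {
  let total = 0;
  for (const ch of s) total += utf8Width(ch.codePointAt(0));
  return total;
}

// "A", "é", "€", "寿", and a grinning-face emoji: widths 1, 2, 3, 3, 4
for (const s of ["A", "\u00e9", "\u20ac", "\u5bff", "\u{1F600}"]) {
  console.log(s, utf8ByteLength(s), new TextEncoder().encode(s).length);
}
```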
The $strLenBytes operator differs from the $strLenCP operator, which counts the code points in the specified string regardless of how many bytes each character uses.
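The difference between the two counts can be sketched in JavaScript, assuming well-formed strings: the byte length corresponds to what $strLenBytes reports and the code point count to what $strLenCP reports. Neither helper below is a MongoDB API. Note that JavaScript's own String length counts UTF-16 code units, which matches neither operator for characters outside the Basic Multilingual Plane.

```javascript
// Rough analogues: bytes ~ $strLenBytes, code points ~ $strLenCP.
const byteLength = (s) => new TextEncoder().encode(s).length;
const codePointLength = (s) => [...s].length; // spreading a string iterates by code point

for (const s of ["abc", "caf\u00e9", "\u5bff\u53f8", "\u{1F600}"]) {
  // abc: 3 / 3 / 3, café: 5 / 4 / 4, 寿司: 6 / 2 / 2, emoji: 4 / 1 / 2
  console.log(s, "bytes:", byteLength(s), "code points:", codePointLength(s), "UTF-16 units:", s.length);
}
```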
| Result | Notes |
| --- | --- |
| 5 | Each character is encoded using one byte. |
| 12 | Each character is encoded using one byte. |
| 9 | Each character is encoded using one byte. |
| 11 | é is encoded using two bytes. |
| 0 | Empty strings return 0. |
| 7 | € is encoded using three bytes. λ is encoded using two bytes. |
| 6 | Each character is encoded using three bytes. |
A collection named food contains the following documents:
The following operation uses the $strLenBytes operator to calculate the length of each name value:
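To make the shape of such a result concrete without a running server, the JavaScript sketch below imitates in memory a stage of the form { $project: { name: 1, length: { $strLenBytes: "$name" } } }. The function projectNameLength and the sample documents are hypothetical and illustrative only; they are not the food collection's contents and not the MongoDB driver.

```javascript
// Hypothetical in-memory imitation of
//   { $project: { name: 1, length: { $strLenBytes: "$name" } } }
// It only mirrors the output shape; it does not talk to MongoDB.
function projectNameLength(docs) {
  return docs.map((doc) => {
    if (typeof doc.name !== "string") {
      // Mirrors the documented rule: a null or missing name is an error.
      throw new TypeError(`document ${doc._id}: name must be a string`);
    }
    // $project keeps _id by default, so the sketch keeps it too.
    return { _id: doc._id, name: doc.name, length: new TextEncoder().encode(doc.name).length };
  });
}

// Illustrative documents only.
const sample = [
  { _id: 1, name: "apple" },
  { _id: 2, name: "cr\u00e8me" },      // "crème": è takes two bytes
  { _id: 3, name: "\u5bff\u53f8" },   // "寿司": three bytes per character
];
console.log(projectNameLength(sample)); // expected lengths: 5, 6, 6
```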
The operation returns the following results:
The documents with _id: 3 and _id: 5 each contain a diacritic character (é and ñ, respectively) that requires two bytes to encode. The document with _id: 8 contains two Japanese characters that are encoded using three bytes each. As a result, length is greater than the number of characters in name for the documents with _id: 3, _id: 5, and _id: 8.
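To see which characters account for the extra bytes, a per-character breakdown helps. The JavaScript sketch below uses a hypothetical helper, byteBreakdown, and an illustrative word with a precomposed ñ (U+00F1); a decomposed "n" plus combining tilde would count differently.

```javascript
// Lists each character (code point) of a string with the number of UTF-8
// bytes it takes, showing where a byte length above the character count comes from.
function byteBreakdown(s) {
  const encoder = new TextEncoder();
  return [...s].map((ch) => ({ char: ch, bytes: encoder.encode(ch).length }));
}

const word = "ni\u00f1o"; // "niño" with a precomposed ñ
const parts = byteBreakdown(word);
const total = parts.reduce((sum, p) => sum + p.bytes, 0);
console.log(parts.filter((p) => p.bytes > 1)); // only ñ takes more than one byte
console.log(`${parts.length} characters, ${total} bytes`); // 4 characters, 5 bytes
```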
See also