$binarySize (aggregation)

On this page本页内容

Definition定义

$binarySize

New in version 4.4.版本4.4中的新功能。

Returns the size of a given string or binary data value’s content in bytes.返回给定字符串或二进制数据值内容的大小(字节)。

$binarySize has the following syntax:语法如下所示:

{ $binarySize: <string or binData> }

The argument can be any valid expression as long as it resolves to either a string or binary data value. 参数可以是任何有效的表达式,只要它解析为字符串或二进制数据值。For more information on expressions, see Expressions.有关表达式的详细信息,请参阅表达式

Behavior行为

The argument for $binarySize must resolve to either:$binarySize的参数必须解析为:

If the argument is a string or binary data value, the expression returns the size of the argument in bytes.如果参数是字符串或二进制数据值,则表达式将以字节为单位返回参数的大小。

If the argument is null, the expression returns null.如果参数为null,则表达式返回null

If the argument resolves to any other data type, $binarySize errors.如果参数解析为任何其他数据类型,$binarySize错误。

String Size Calculation字符串大小计算

If the argument for $binarySize is a string, the operator counts the number of UTF-8 encoded bytes in a string where each character may use between one and four bytes.如果$binarySize的参数是一个字符串,则运算符计算字符串中UTF-8编码的字节数,其中每个字符可以使用1到4个字节。

For example, US-ASCII characters are encoded using one byte. 例如,US-ASCII字符使用一个字节进行编码。Characters with diacritic markings and additional Latin alphabetical characters (i.e. Latin characters outside of the English alphabet) are encoded using two bytes. 带有变音符号的字符和附加的拉丁字母字符(即英语字母表之外的拉丁字符)使用两个字节进行编码。Chinese, Japanese and Korean characters typically require three bytes, and other planes of unicode (emoji, mathematical symbols, etc.) require four bytes.中文、日文和韩文字符通常需要三个字节,而其他unicode平面(表情符号、数学符号等)则需要四个字节。

Consider the following examples:考虑下面的例子:

Example示例Results结果Notes备注
{ $binarySize: "abcde" }
5 Each character is encoded using one byte.每个字符用一个字节编码。
{ $binarySize: "Hello World!" }
12 Each character is encoded using one byte.每个字符用一个字节编码。
{ $binarySize: "cafeteria" }
9 Each character is encoded using one byte.每个字符用一个字节编码。
{ $binarySize: "cafétéria" }
11 é is encoded using two bytes.使用两个字节进行编码。
{ $binarySize: "" }
0 Empty strings return 0.空字符串返回0。
{ $binarySize: "$€λG" }
7 is encoded using three bytes. 使用三个字节进行编码。λ is encoded using two bytes.使用两个字节进行编码。
{ $binarySize: "寿司" }
6 Each character is encoded using three bytes.每个字符用三个字节编码。

Example示例

From the mongo shell, create a sample collection named images with the following documents:mongo shell中,使用以下文档创建名为images的示例集合:

db.images.insertMany([
  { _id: 1, name: "cat.jpg", binary: new BinData(0, "OEJTfmD8twzaj/LPKLIVkA==")},
  { _id: 2, name: "big_ben.jpg", binary: new BinData(0, "aGVsZmRqYWZqYmxhaGJsYXJnYWZkYXJlcTU1NDE1Z2FmZCBmZGFmZGE=")},
  { _id: 3, name: "tea_set.jpg", binary: new BinData(0, "MyIRAFVEd2aImaq7zN3u/w==")},
  { _id: 4, name: "concert.jpg", binary: new BinData(0, "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=")},
  { _id: 5, name: "empty.jpg", binary: new BinData(0, "") }
])

The following aggregation projects:以下聚合projects

db.images.aggregate([
  {
    $project: {
      "name": "$name",
      "imageSize": { $binarySize: "$binary" }
    }
  }
])

The operation returns the following result:操作返回以下结果:

{ "_id" : 1, "name" : "cat.jpg", "imageSize" : 16 }
{ "_id" : 2, "name" : "big_ben.jpg", "imageSize" : 41 }
{ "_id" : 3, "name" : "teaset.jpg", "imageSize" : 16 }
{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }
{ "_id" : 5, "name" : "empty.jpg", "imageSize" : 0 }

Find Largest Binary Data查找最大的二进制数据

The following pipeline returns the image with the largest binary data size:以下管道返回具有最大二进制数据大小的图像:

db.images.aggregate([
   // First Stage
   { $project: { name: "$name", imageSize: { $binarySize: "$binary" } }  },
   // Second Stage
   { $sort: { "imageSize" : -1 } },
   // Third Stage
   { $limit: 1 }
])
First Stage第一阶段

The first stage of the pipeline projects:管道projects的第一阶段:

  • The name fieldname字段
  • The imageSize field, which uses binarySize to return the size of the document’s binary field in bytes.imageSize字段,它使用binarySize以字节为单位返回文档binary字段的大小。

This stage outputs the following documents to the next stage:本阶段向下一阶段输出以下文件:

{ "_id" : 1, "name" : "cat.jpg", "imageSize" : 16 }
{ "_id" : 2, "name" : "big_ben.jpg", "imageSize" : 41 }
{ "_id" : 3, "name" : "teaset.jpg", "imageSize" : 16 }
{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }
{ "_id" : 5, "name" : "empty.jpg", "imageSize" : 0 }
Second Stage

The second stage sorts the documents by imageSize in descending order.第二阶段sortsimageSize降序排列文档。

This stage outputs the following documents to the next stage:本阶段向下一阶段输出以下文件:

{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }
{ "_id" : 2, "name" : "big_ben.jpg", "imageSize" : 41 }
{ "_id" : 1, "name" : "cat.jpg", "imageSize" : 16 }
{ "_id" : 3, "name" : "teaset.jpg", "imageSize" : 16 }
{ "_id" : 5, "name" : "empty.jpg", "imageSize" : 0 }
Third Stage第三阶段

The third stage limits the output documents to only return the document appearing first in the sort order:第三阶段将输出文档limits为仅返回排序顺序中最先出现的文档:

{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }