GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16 MB.
Note
GridFS does not support multi-document transactions.
Instead of storing a file in a single document, GridFS divides the file into parts, or chunks [1], and stores each chunk as a separate document. By default, GridFS uses a chunk size of 255 kB; that is, GridFS divides a file into chunks of 255 kB, with the exception of the last chunk. The last chunk is only as large as necessary. Similarly, files that are no larger than the chunk size have only a final chunk, using only as much space as needed plus some additional metadata.
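The chunking rule above can be sketched in plain Node.js. This is a minimal illustration, not a driver implementation: splitIntoChunks and the "example-file-id" value are hypothetical, and the sketch assumes 255 kB means 255 * 1024 bytes.

```javascript
// Default GridFS chunk size: 255 kB, taken here as 255 * 1024 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper: split a Buffer into chunk-like documents.
// `filesId` stands in for the _id of the parent document in the files collection.
function splitIntoChunks(fileBytes, filesId, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < fileBytes.length; offset += chunkSize, n += 1) {
    // subarray clamps the end index, so the last chunk is only as large as needed.
    const data = fileBytes.subarray(offset, offset + chunkSize);
    chunks.push({ files_id: filesId, n: n, data: data });
  }
  return chunks;
}

// A 600,000-byte file yields two full chunks and one shorter final chunk.
const exampleChunks = splitIntoChunks(Buffer.alloc(600000), "example-file-id");
console.log(exampleChunks.map((chunk) => chunk.data.length)); // [ 261120, 261120, 77760 ]
```

A file no larger than the chunk size produces a single chunk with n equal to 0.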
GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata. The section GridFS Collections describes each collection in detail.
When you query GridFS for a file, the driver will reassemble the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, such as to “skip” to the middle of a video or audio file.
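To see why skipping into a file does not require reading all of it, the following sketch maps a byte range to the chunk sequence numbers that hold it. The helper chunksForRange is hypothetical and does plain arithmetic only; a real driver may expose this differently.

```javascript
// Hypothetical helper: sequence numbers (n) of the chunks that hold
// the half-open byte range [start, end).
function chunksForRange(start, end, chunkSize) {
  if (start < 0 || end <= start) return [];
  const first = Math.floor(start / chunkSize);
  const last = Math.floor((end - 1) / chunkSize);
  const numbers = [];
  for (let n = first; n <= last; n += 1) numbers.push(n);
  return numbers;
}

// With the default 255 kB chunk size, bytes 500,000 to 799,999 live in chunks 1 to 3.
const rangeChunkSize = 255 * 1024;
console.log(chunksForRange(500000, 800000, rangeChunkSize)); // [ 1, 2, 3 ]
```

Only those chunks need to be fetched; chunk 0 and any chunks after 3 are never read.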
GridFS is useful not only for storing files that exceed 16 MB but also for storing any files for which you want access without having to load the entire file into memory. See also When to Use GridFS.
In MongoDB, use GridFS for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
mongod instances and facilities.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative, you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing each file in a single document instead of using GridFS. You may use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
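To store and retrieve files using GridFS, use either of the following:
A MongoDB driver. See your driver's documentation for information on using GridFS with that driver.
The mongofiles command-line tool. See the mongofiles reference for documentation.
GridFS stores files in two collections: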
chunks stores the binary chunks. For details, see The chunks Collection.
files stores the file’s metadata. For details, see The files Collection.
GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections with a bucket named fs:
fs.files
fs.chunks
You can choose a different bucket name, as well as create multiple buckets in a single database. The full collection name, which includes the bucket name, is subject to the namespace length limit.
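As a small illustration of the naming rule, the sketch below derives the two collection names from a bucket name. The helper bucketCollectionNames and the bucket name "photos" are hypothetical.

```javascript
// Hypothetical helper: collection names for a GridFS bucket.
// The bucket name is used as a prefix; "fs" is the default.
function bucketCollectionNames(bucketName = "fs") {
  return {
    files: `${bucketName}.files`,
    chunks: `${bucketName}.chunks`,
  };
}

console.log(bucketCollectionNames());         // { files: 'fs.files', chunks: 'fs.chunks' }
console.log(bucketCollectionNames("photos")); // { files: 'photos.files', chunks: 'photos.chunks' }
```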
The chunks Collection
Each document in the chunks [1] collection represents a distinct chunk of a file as represented in GridFS. Documents in this collection have the following form:
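The following is an illustrative chunks document written as a plain JavaScript object. The values are placeholders: in a stored document, _id and files_id are typically ObjectId values and data is BSON binary. The _id and data fields come from the GridFS specification rather than from the field list on this page.

```javascript
// Illustrative chunks document; all values are placeholders.
const exampleChunkDocument = {
  _id: "<unique identifier of this chunk>",
  files_id: "<_id of the parent document in the files collection>",
  n: 0,                                 // sequence number of the chunk, starting with 0
  data: Buffer.from("chunk payload"),   // the chunk's bytes (BSON binary when stored)
};

console.log(Object.keys(exampleChunkDocument)); // [ '_id', 'files_id', 'n', 'data' ]
```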
A document from the chunks collection contains the following fields:
chunks.files_id
The _id of the “parent” document, as specified in the files collection.
chunks.n
The sequence number of the chunk. GridFS numbers all chunks, starting with 0.
The files Collection
Each document in the files collection represents a file in GridFS.
Documents in the files collection contain some or all of the following fields:
files._id
The unique identifier for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectId.
files.length
The size of the document in bytes.
files.chunkSize
The size of each chunk in bytes. GridFS divides the document into chunks of size chunkSize, except for the last, which is only as large as needed. The default size is 255 kilobytes (kB).
files.uploadDate
The date the document was first stored by GridFS. This value has the Date type.
files.md5
Deprecated
The MD5 algorithm is prohibited by FIPS 140-2. MongoDB drivers deprecate MD5 support and will remove MD5 generation in future releases. Applications that require a file digest should implement it outside of GridFS and store it in files.metadata.
An MD5 hash of the complete file returned by the filemd5 command. This value has the String type.
files.filename
Optional. A human-readable name for the GridFS file.
files.contentType
Deprecated
Optional. A valid MIME type for the GridFS file. For application use only.
Use files.metadata for storing information related to the MIME type of the GridFS file.
files.aliases
Deprecated
Optional. An array of alias strings. For application use only.
Use files.metadata for storing alias information for the GridFS file.
files.metadata
Optional. The metadata field may be of any data type and can hold any additional information you want to store. If you wish to add additional arbitrary fields to documents in the files collection, add them to an object in the metadata field.
GridFS uses indexes on each of the chunks and files collections for efficiency. Drivers that conform to the GridFS specification automatically create these indexes for convenience. You can also create any additional indexes as desired to suit your application’s needs.
The chunks Index
GridFS uses a unique, compound index on the chunks collection using the files_id and n fields. This allows for efficient retrieval of chunks, as demonstrated in the following example:
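A sketch of such a query for the mongo shell, assuming the default fs bucket. The value of myFileID is a placeholder for the _id of a document in fs.files. The typeof guard is only there so the snippet stays inert when loaded outside the shell, where db is not defined.

```javascript
// Placeholder: the _id of the file's document in fs.files.
var myFileID = "<_id of the file>";

// Select all chunks of one file and return them in sequence order.
var chunkQuery = { files_id: myFileID };
var chunkOrder = { n: 1 };

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.fs.chunks.find(chunkQuery).sort(chunkOrder).forEach(function (chunk) {
    printjson(chunk.n); // print each sequence number rather than the binary payload
  });
}
```

Because the query filters on files_id and sorts on n, it matches the key order of the compound index.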
Drivers that conform to the GridFS specification will automatically ensure that this index exists before read and write operations. See the relevant driver documentation for the specific behavior of your GridFS application.
If this index does not exist, you can issue the following operation to create it using the mongo shell:
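A sketch of that operation, assuming the default fs bucket. The key pattern and the unique option follow the description of the index above; the typeof guard only keeps the snippet inert outside the shell.

```javascript
// Unique, compound index on files_id and n, as described above.
var chunksIndexKeys = { files_id: 1, n: 1 };
var chunksIndexOptions = { unique: true };

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.fs.chunks.createIndex(chunksIndexKeys, chunksIndexOptions);
}
```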
The files Index
GridFS uses an index on the files collection using the filename and uploadDate fields. This index allows for efficient retrieval of files, as shown in this example:
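A sketch of such a query for the mongo shell, assuming the default fs bucket. The value of myFileName is a hypothetical filename. The typeof guard only keeps the snippet inert outside the shell.

```javascript
// Hypothetical filename to look up.
var myFileName = "example.mp4";

// Find every stored revision of the file, ordered by upload date.
var fileQuery = { filename: myFileName };
var fileOrder = { uploadDate: 1 };

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.fs.files.find(fileQuery).sort(fileOrder).forEach(printjson);
}
```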
Drivers that conform to the GridFS specification will automatically ensure that this index exists before read and write operations. See the relevant driver documentation for the specific behavior of your GridFS application.
If this index does not exist, you can issue the following operation to create it using the mongo shell:
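A sketch of that operation, assuming the default fs bucket. The key pattern follows the description of the index above; the typeof guard only keeps the snippet inert outside the shell.

```javascript
// Compound index on filename and uploadDate, as described above.
var filesIndexKeys = { filename: 1, uploadDate: 1 };

// `db` exists only inside the mongo shell.
if (typeof db !== "undefined") {
  db.fs.files.createIndex(filesIndexKeys);
}
```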
[1] The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.
When sharding GridFS, there are two collections to consider: files and chunks.
chunks Collection
To shard the chunks collection, use either { files_id : 1, n : 1 } or { files_id : 1 } as the shard key index. files_id is an ObjectId and changes monotonically.
For MongoDB drivers that do not run filemd5 to verify successful upload (for example, MongoDB drivers that support MongoDB 4.0 or greater), you can use Hashed Sharding for the chunks collection.
If the MongoDB driver runs filemd5, you cannot use Hashed Sharding. For details, see SERVER-9888.
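A sketch of the two shard key choices for the mongo shell. The namespace "test.fs.chunks" is hypothetical (the default fs bucket in a database named test), and the sketch assumes you are connected to a mongos with sharding already set up for that database. The typeof guard only keeps the snippet inert outside the shell.

```javascript
// Hypothetical namespace: default fs bucket in a database named "test".
var chunksNamespace = "test.fs.chunks";

// Ranged shard key from the text above.
var rangedShardKey = { files_id: 1, n: 1 };

// Hashed alternative, only for drivers that do not run filemd5.
var hashedShardKey = { files_id: "hashed" };

// `sh` exists only inside the mongo shell; pick one key, not both.
if (typeof sh !== "undefined") {
  sh.shardCollection(chunksNamespace, rangedShardKey);
}
```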
files Collection
The files collection is small and only contains metadata. None of the required keys for GridFS lend themselves to an even distribution in a sharded environment. Leaving files unsharded allows all the file metadata documents to live on the primary shard.
If you must shard the files collection, use the _id field, possibly in combination with an application field.