
File Formats

[Figure: circo graph of format support, with graph legend]

Format | Read | Write

Excel Worksheet/Workbook Formats
Excel 2007+ XML Formats (XLSX/XLSM)
Excel 2007+ Binary Format (XLSB BIFF12)
Excel 2003-2004 XML Format (XML "SpreadsheetML")
Excel 97-2004 (XLS BIFF8)
Excel 5.0/95 (XLS BIFF5)
Excel 4.0 (XLS/XLW BIFF4)
Excel 3.0 (XLS BIFF3)
Excel 2.0/2.1 / Multiplan 4.x DOS (XLS BIFF2)

Excel Supported Text Formats
Delimiter-Separated Values (CSV/TXT)
Data Interchange Format (DIF)
Symbolic Link (SYLK/SLK)
Lotus Formatted Text (PRN)
UTF-16 Unicode Text (TXT)

Other Workbook/Worksheet Formats
Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS)
OpenDocument Spreadsheet (ODS)
Flat XML ODF Spreadsheet (FODS)
Uniform Office Format Spreadsheet (标文通 UOS1/UOS2)
dBASE II/III/IV / Visual FoxPro (DBF)
Lotus 1-2-3 (WK1/WK3)
Lotus 1-2-3 (WKS/WK2/WK4/123)
Quattro Pro Spreadsheet (WQ1/WQ2/WB1/WB2/WB3/QPW)
Works 1.x-3.x DOS / 2.x-5.x Windows Spreadsheet (WKS)
Works 6.x-9.x Spreadsheet (XLR)

Other Common Spreadsheet Output Formats
HTML Tables
Rich Text Format tables (RTF)
Ethercalc Record Format (ETH)

Features not supported by a given file format will not be written. Worksheets that exceed a format's range limits will be silently truncated to fit:

| Format | Last Cell | Max Cols | Max Rows |
|:-----|:-----:|-----:|-----:|
| Excel 2007+ XML Formats (XLSX/XLSM) | XFD1048576 | 16384 | 1048576 |
| Excel 2007+ Binary Format (XLSB BIFF12) | XFD1048576 | 16384 | 1048576 |
| Numbers 12.0 (NUMBERS) | ALL1000000 | 1000 | 1000000 |
| Quattro Pro 9+ (QPW) | IV1000000 | 256 | 1000000 |
| Excel 97-2004 (XLS BIFF8) | IV65536 | 256 | 65536 |
| Excel 5.0/95 (XLS BIFF5) | IV16384 | 256 | 16384 |
| Excel 4.0 (XLS BIFF4) | IV16384 | 256 | 16384 |
| Excel 3.0 (XLS BIFF3) | IV16384 | 256 | 16384 |
| Excel 2.0/2.1 (XLS BIFF2) | IV16384 | 256 | 16384 |
| Lotus 1-2-3 R2 - R5 (WK1/WK3/WK4) | IV8192 | 256 | 8192 |
| Lotus 1-2-3 R1 (WKS) | IV2048 | 256 | 2048 |
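
The "Last Cell" column determines the other two: the letters encode the column count in bijective base 26 and the digits give the row count. The following Python sketch is an illustration only, not the library's code. It derives the limits for a few rows of the table and shows one way a grid could be clipped to them. The names `col_to_index`, `split_cell`, `LAST_CELLS` and `truncate` are hypothetical.

```python
def col_to_index(col):
    """Convert a column label such as "XFD" to a 1-based column number."""
    n = 0
    for ch in col.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def split_cell(ref):
    """Split an A1-style reference into (column count, row count)."""
    letters = "".join(c for c in ref if c.isalpha())
    digits = "".join(c for c in ref if c.isdigit())
    return col_to_index(letters), int(digits)


# A few "Last Cell" values copied from the table above.
LAST_CELLS = {
    "XLSX": "XFD1048576",
    "NUMBERS": "ALL1000000",
    "BIFF8": "IV65536",
    "WKS": "IV2048",
}
limits = {fmt: split_cell(ref) for fmt, ref in LAST_CELLS.items()}


def truncate(rows, fmt):
    """Drop rows and columns that fall outside the limits for fmt."""
    max_cols, max_rows = limits[fmt]
    return [row[:max_cols] for row in rows[:max_rows]]
```

For example, `limits["BIFF8"]` evaluates to `(256, 65536)`, matching the Max Cols and Max Rows columns.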

Excel 2003 SpreadsheetML range limits are governed by the version of Excel and are not enforced by the writer.

Excel 2007+ XML (XLSX/XLSM)

XLSX and XLSM files are ZIP containers holding a series of XML files in accordance with the Open Packaging Conventions (OPC). The XLSM format, almost identical to XLSX, is used for files containing macros.
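
To make the container idea concrete, here is a small Python sketch that builds a ZIP archive in memory and lists its parts with the standard `zipfile` module. The part names follow the usual XLSX layout, but the XML bodies are empty placeholders, so the result is not a workbook that Excel would open. `PARTS`, `build_package` and `list_parts` are names invented for this illustration.

```python
import io
import zipfile

# Placeholder parts: typical XLSX part names with stub XML bodies.
# A real workbook contains more parts and real content.
PARTS = {
    "[Content_Types].xml": "<Types/>",
    "_rels/.rels": "<Relationships/>",
    "xl/workbook.xml": "<workbook/>",
    "xl/worksheets/sheet1.xml": "<worksheet/>",
}


def build_package(parts):
    """Write each part into an in-memory ZIP archive and return the bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    return buf.getvalue()


def list_parts(data):
    """Return the part names stored in a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


package = build_package(PARTS)
part_names = list_parts(package)
```

Running `list_parts` on a real XLSX or XLSM file read as bytes is a quick way to see which sub-files a workbook contains.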

The format is standardized in ECMA-376 and later in ISO/IEC 29500. Excel does not follow the specification exactly, and additional documents discuss how Excel deviates from it.

Excel 2.0-95 (BIFF2/BIFF3/BIFF4/BIFF5)

BIFF2/3 XLS files are single-sheet streams of binary records. Excel 4 introduced the concept of a workbook (XLW files) but also kept a single-sheet XLS format. The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12 extended the format in various ways but largely stuck to the same record format.
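
A "stream of binary records" can be walked with very little code. The Python sketch below assumes the BIFF2 through BIFF8 framing of a 2-byte little-endian record type followed by a 2-byte little-endian payload length. The sample stream is synthetic: 0x0009 and 0x000A are the BIFF2 BOF and EOF record types, while 0x7FFF and all payload bytes are made up for the example. It illustrates the framing only and is not the library's parser.

```python
import struct


def iter_records(data):
    """Yield (record_type, payload) pairs from a BIFF-style record stream."""
    pos = 0
    while pos + 4 <= len(data):
        # Header: 2-byte type and 2-byte length, both little-endian.
        rtype, length = struct.unpack_from("<HH", data, pos)
        pos += 4
        payload = data[pos:pos + length]
        if len(payload) < length:
            raise ValueError("truncated record")
        yield rtype, payload
        pos += length


def record(rtype, payload=b""):
    """Frame a payload as a single record (helper for the synthetic sample)."""
    return struct.pack("<HH", rtype, len(payload)) + payload


# Synthetic stream: BOF, one made-up record, EOF. Payloads are placeholders.
stream = (
    record(0x0009, b"\x02\x00\x10\x00")
    + record(0x7FFF, b"abc")
    + record(0x000A)
)
records = list(iter_records(stream))
```

Because every record carries its own length, a reader can skip record types it does not understand, which is what makes data extraction feasible even for partially documented formats.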

Multiplan 4 "Normal" files are identical in structure to BIFF2 and use the same cell value records. Some record types differ for more advanced features like Print Settings. The BIFF2 writer generates files that can be read in Multiplan 4, and the parser can extract values from "Normal" files.

There is no official specification for any of these formats. Excel 95 can write files in these formats, so record lengths and fields were determined by writing in all of the supported formats and comparing files. Excel 2016 can generate BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.

Excel 97-2004 Binary (BIFF8)

BIFF8 exclusively uses the Compound File Binary container format, splitting some content into streams within the file. At its core, it still uses an extended version of the binary record format from older versions of BIFF.
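
Since BIFF8 lives in a Compound File Binary (CFB) container while XLSX and XLSB use ZIP, the first bytes of a file are enough to tell the containers apart. The Python sketch below checks for the 8-byte CFB signature and the ZIP local file header signature. `sniff_container` is a hypothetical helper written for this example, and it identifies the container only, not the spreadsheet format stored inside.

```python
# Signature at the start of every Compound File Binary file.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature found at the start of typical ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data):
    """Classify a byte string as "cfb", "zip" or "unknown" by its signature."""
    if data.startswith(CFB_MAGIC):
        return "cfb"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

A result of "cfb" is consistent with BIFF8 XLS but also with other CFB-based files, so a full reader still has to inspect the streams inside.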

The MS-XLS specification covers the basics of the file format, and other specifications expand on serialization of features like properties.

Excel 2003-2004 (SpreadsheetML)

Predating XLSX, SpreadsheetML files are simple XML files. There is no official and comprehensive specification, although Microsoft has released documentation on the format. Since Excel 2016 can generate SpreadsheetML files, mapping features is straightforward.

Excel 2007+ Binary (XLSB, BIFF12)

Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture with the content separation and ZIP container of XLSX. For the most part, nodes in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.

The MS-XLSB specification covers the basics of the file format, and other specifications expand on serialization of features like properties.

Delimiter-Separated Values (CSV/TXT)

Excel CSV deviates from RFC4180 in a number of important ways. The generated CSV files should generally work in Excel, although they may not work in RFC4180-compatible readers. The parser should generally understand Excel CSV. The writer proactively generates cells for formulae if values are unavailable.

Excel TXT uses tab as the delimiter and code page 1200 (UTF-16LE).
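
As a rough illustration of a tab-delimited, UTF-16LE text file, the Python sketch below writes and reads such data. The leading byte order mark and the CRLF line endings are assumptions made for the example, and the sketch does no quoting, so it only handles cells without tabs or newlines. `to_unicode_text` and `from_unicode_text` are hypothetical names.

```python
import codecs


def to_unicode_text(rows):
    """Encode rows as tab-delimited UTF-16LE text with a BOM (assumed)."""
    lines = ["\t".join(str(cell) for cell in row) for row in rows]
    text = "\r\n".join(lines) + "\r\n"
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def from_unicode_text(data):
    """Decode BOM-prefixed UTF-16 text and split it into rows of strings."""
    text = data.decode("utf-16")  # the BOM selects the byte order
    return [line.split("\t") for line in text.splitlines()]
```

Every value comes back as a string; deciding which strings are numbers or dates is a separate step.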

As in Excel, files starting with 0x49 0x44 ("ID") are treated as Symbolic Link files. Unlike Excel, if the file does not have a valid SYLK header, it will be proactively reinterpreted as CSV. Some semicolon-delimited files can also be read as valid SYLK files. For the broadest compatibility, all cells with the value ID are automatically wrapped in double quotes.
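
The effect of quoting ID cells is easy to see in a small sketch: once the first cell is written as "ID" with quotes, the file no longer begins with the bytes 0x49 0x44. The Python below is a simplified illustration with hypothetical helper names (`looks_like_sylk`, `quote_field`, `to_csv`). It is not the library's writer and ignores most other CSV details.

```python
def looks_like_sylk(data):
    """Return True when the data starts with 0x49 0x44 ("ID")."""
    return data[:2] == b"ID"


def quote_field(value, delimiter=","):
    """Quote a field when it is exactly ID or contains special characters."""
    special = (delimiter, '"', "\n", "\r")
    if value == "ID" or any(ch in value for ch in special):
        return '"' + value.replace('"', '""') + '"'
    return value


def to_csv(rows, delimiter=","):
    """Join rows into delimiter-separated text with minimal quoting."""
    return "\n".join(
        delimiter.join(quote_field(str(cell), delimiter) for cell in row)
        for row in rows
    )
```

With rows `[["ID", "Name"], ["1", "a,b"]]`, a naive join would produce text starting with `ID,Name`, which trips the SYLK check, while `to_csv` starts the output with a double quote.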

Miscellaneous Workbook Formats

Support for other formats is generally far behind XLS/XLSB/XLSX support, due in part to a lack of publicly available documentation. Test files were produced in the respective apps and compared to their XLS exports to determine structure. The main focus is data extraction.

Lotus 1-2-3 (WKS/WK1/WK2/WK3/WK4/123)

The Lotus formats consist of binary records similar to the BIFF structure. Lotus released a specification decades ago covering the original WK1 format. Other features were deduced by producing files and comparing them against Excel's support for the formats.

Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0.

Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0.

Quattro Pro (WQ1/WQ2/WB1/WB2/WB3/QPW)

The Quattro Pro formats use binary records in the same way as BIFF and Lotus. Some of the newer formats (namely WB3 and QPW) use a CFB container just like BIFF8 XLS.

Works for DOS / Windows Spreadsheet (WKS/XLR)

All versions of Works were limited to a single worksheet.

Works for DOS 1.x - 3.x and Works for Windows 2.x extend the Lotus WKS format with additional record types.

Works for Windows 3.x - 5.x uses the same format and WKS extension. The BOF record has type FF.

Works for Windows 6.x - 9.x use the XLR format. XLR is nearly identical to BIFF8 XLS: it uses the CFB container with a Workbook stream. Works 9 saves an identical Workbook stream for the XLR and the 97-2003 XLS exports. Works 6 XLS includes two empty worksheets, but the main worksheet has an identical encoding. XLR also includes a WksSSWorkBook stream similar to Lotus FM3/FMT files.

Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS)

iWork 2013 (Numbers 3.0 / Pages 5.0 / Keynote 6.0) switched from a proprietary XML-based format to the current file format based on the iWork Archive (IWA). This format has remained in use through the current release (Numbers 11.2).

The parser focuses on extracting raw data from tables. Numbers technically supports multiple tables in a logical worksheet, including custom titles. This parser will generate one worksheet per Numbers table.

The writer currently exports a small range from the first worksheet.

OpenDocument Spreadsheet (ODS/FODS)

ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to SpreadsheetML. Both are detailed in the OASIS standard, but tools like LO/OO add undocumented extensions. The parsers and writers do not implement the full standard, instead focusing on parts necessary to extract and store raw data.

Uniform Office Spreadsheet (UOS1/2)

UOS is very similar to the OpenDocument formats, and it comes in 2 varieties corresponding to ODS and FODS respectively. For the most part, the difference between the formats is in the names of tags and attributes.

Miscellaneous Worksheet Formats

Many older formats supported only one worksheet:

dBASE and Visual FoxPro (DBF)

DBF is really a typed table format: each column can only hold one data type and each record omits type information. The parser generates a header row and inserts records starting at the second row of the worksheet. The writer produces files that are compatible with Visual FoxPro extensions.
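
The sketch below shows why the column definitions matter: a record is just fixed-width bytes, and the type of each slice comes from the column it falls in. It is a deliberately simplified Python model, not a DBF reader. The `FIELDS` and `RECORDS` data are invented, the real file header is not parsed, and the deletion flag byte that starts each record in an actual DBF file is left out.

```python
# Hypothetical column definitions: (name, type code, width in bytes).
FIELDS = [("NAME", "C", 6), ("QTY", "N", 4), ("OK", "L", 1)]

# Invented fixed-width records; they carry no type information themselves.
RECORDS = [b"apple   12T", b"pear     3F"]


def decode_value(raw, ftype):
    """Interpret a fixed-width slice according to its column type."""
    text = raw.decode("ascii").strip()
    if ftype == "N":  # numeric column
        if not text:
            return None
        return float(text) if "." in text else int(text)
    if ftype == "L":  # logical column
        return text.upper() in ("T", "Y")
    return text  # character column


def records_to_rows(fields, records):
    """Build a header row, then one worksheet row per record."""
    rows = [[name for name, _, _ in fields]]
    for rec in records:
        pos = 0
        row = []
        for _, ftype, width in fields:
            row.append(decode_value(rec[pos:pos + width], ftype))
            pos += width
        rows.append(row)
    return rows


rows = records_to_rows(FIELDS, RECORDS)
```

The first element of `rows` is the generated header row, and the data records follow from the second row onward, mirroring the layout described above.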

Multi-file extensions like external memos and tables are currently unsupported, because web browsers generally cannot read arbitrary files. The reader understands DBF Level 7 extensions like DATETIME.

Symbolic Link (SYLK)

https://oss.sheetjs.com/notes/sylk/ is an informal specification based on our experimentation and previous documentation efforts.

Lotus Formatted Text (PRN)

There is no real documentation, and in fact Excel treats PRN as an output-only file format. Nevertheless, we can guess the column widths and reverse-engineer the original layout. Excel's 240-character width limitation is not enforced.

Data Interchange Format (DIF)

There is no unified definition. VisiCalc DIF differs from Lotus DIF, and both differ from Excel DIF. Where ambiguous, the parser/writer follows the expected behavior from Excel. In particular, Excel extends DIF in incompatible ways:

  • Since Excel automatically converts numbers-as-strings to numbers, numeric string constants are converted to formulae: "0.3" -> "=""0.3"""
  • DIF technically expects numeric cells to hold the raw numeric data, but Excel permits formatted numbers (including dates)
  • DIF technically has no support for formulae, but Excel will automatically convert plain formulae. Array formulae are not preserved.
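
The first point above can be sketched in a few lines of Python: a string that looks numeric is rewritten as a formula that returns the string, and the result is then quoted with embedded quotes doubled. The `looks_numeric` test here is a loose stand-in based on `float` and is an assumption of this example; Excel's own notion of a numeric string is broader. `dif_string` is a hypothetical helper name.

```python
def looks_numeric(text):
    """Loose check for numeric-looking text (an assumption of this sketch)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def dif_string(value):
    """Return the quoted string token for a text cell."""
    if looks_numeric(value):
        # Wrap as a formula so the text is not converted to a number.
        value = '="' + value + '"'
    # Double embedded quotes, then wrap the whole value in quotes.
    return '"' + value.replace('"', '""') + '"'
```

Ordinary text passes through with plain quoting, so `dif_string("abc")` returns `"abc"` including the surrounding quotes.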

HTML

Excel HTML worksheets include special metadata encoded in styles. For example, mso-number-format is a localized string containing the number format. Despite the metadata, the output is valid HTML, although bare & symbols are accepted.

The writer adds type metadata to the TD elements via the t attribute. The parser looks for that attribute and overrides the default interpretation. For example, text like <td>12345</td> will be parsed as a number but <td t="s">12345</td> will be parsed as text.
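
The override can be mimicked with Python's standard `html.parser` module. In the sketch below, a cell is read as a number when its text looks numeric, unless the TD element carries t="s", in which case the text is kept as a string. Only the "s" value shown above is handled; `CellParser`, `interpret` and `parse_cells` are names made up for this illustration and do not reflect the library's implementation.

```python
from html.parser import HTMLParser


def interpret(text, cell_type):
    """Apply the type hint first, then fall back to guessing a number."""
    if cell_type == "s":
        return text
    try:
        return float(text) if "." in text else int(text)
    except ValueError:
        return text


class CellParser(HTMLParser):
    """Collect the values of TD elements, honoring the t attribute."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._type = None
        self._text = None  # list of text chunks while inside a TD

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._type = dict(attrs).get("t")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            self.cells.append(interpret("".join(self._text), self._type))
            self._text = None


def parse_cells(html):
    """Return the interpreted values of all TD cells in document order."""
    parser = CellParser()
    parser.feed(html)
    parser.close()
    return parser.cells
```

Feeding it a row that contains both <td>12345</td> and <td t="s">12345</td> yields the integer 12345 followed by the string "12345".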

Rich Text Format (RTF)

Excel RTF worksheets are stored in the clipboard when copying cells or ranges from a worksheet. The supported codes are a subset of the Word RTF support.

Ethercalc Record Format (ETH)

Ethercalc is an open source web spreadsheet powered by a record format reminiscent of SYLK wrapped in a MIME multi-part message.