Skip to main content

Reading Files读取文件

XLSX.read(data, options)

read attempts to parse data and return a workbook object尝试分析data并返回工作簿对象

The type of the options object determines how data is interpreted. For string data, the default interpretation is Base64.options对象的type决定了如何解释data。对于字符串数据,默认解释为Base64。

XLSX.readFile(filename, options)

readFile attempts to read a local file with specified filename.尝试读取具有指定filename的本地文件。

This method only works in specific environments. It does not work in browsers!此方法仅适用于特定环境。它在浏览器中不起作用!

The NodeJS installation note includes additional instructions for non-standard use cases.NodeJS安装说明包括非标准用例的附加说明。

Parsing Options分析选项

The read functions accept an options argument:read函数接受一个options参数:

Option Name选项名称Default违约Description描述
typeInput data encoding (see Input Type below)输入数据编码(请参阅下面的输入类型)
rawfalseIf true, plain text parsing will not parse values如果为true,则纯文本解析将不解析值 **
densefalseIf true, use a dense worksheet representation如果为true,请使用密集工作表表示 **
codepageIf specified, use code page when appropriate如果指定,在适当的时候使用代码页 **
cellFormulatrueSave formulae to the .f field将公式保存到.f字段
cellHTMLtrueParse rich text and save HTML to the .h field分析富文本并将HTML保存到.h字段
cellNFfalseSave number format string to the .z field将数字格式字符串保存到.z字段
cellStylesfalseSave style/theme info to the .s field将样式/主题信息保存到.s字段
cellTexttrueGenerated formatted text to the .w field.w字段生成格式化文本
cellDatesfalseStore dates as type d (default is n)将日期存储为类型d(默认为n
dateNFIf specified, use the string for date code 14如果指定,请使用日期代码14的字符串 **
sheetStubsfalseCreate cell objects of type z for stub cells为存根单元格创建z类型的单元格对象
sheetRows0If >0, read the first sheetRows rows如果>0,则读取第一个sheetRows **
bookDepsfalseIf true, parse calculation chains如果为true,则解析计算链
bookFilesfalseIf true, add raw files to book object如果为true,则将原始文件添加到book对象 **
bookPropsfalseIf true, only parse enough to get book metadata如果为true,则只进行足够的解析以获取书籍元数据 **
bookSheetsfalseIf true, only parse enough to get the sheet names如果为true,则仅进行足够的解析以获取图纸名称
bookVBAfalseIf true, copy VBA blob to vbaraw field如果为true,则将VBA blob复制到vbaraw字段 **
password""If defined and file is encrypted, use password如果已定义并且文件已加密,请使用密码 **
WTFfalseIf true, throw errors on unexpected file features如果为true,则在意外的文件功能上引发错误 **
sheetsIf specified, only parse specified sheets如果指定,则仅分析指定的图纸 **
PRNfalseIf true, allow parsing of PRN files如果为true,则允许解析PRN文件 **
xlfnfalseIf true, preserve _xlfn. prefixes in formulae如果为true,则保留公式中的_xlfn.前缀 **
FSDSV Field Separator overrideDSV字段分隔符覆盖
UTCtrueIf explicitly false, parse text dates in local time如果显式为false,则以本地时间解析文本日期
  • Even if cellNF is false, formatted text will be generated and saved to .w即使cellNFfalse,也会生成格式化文本并将其保存到.w
  • In some cases, sheets may be parsed even if bookSheets is false.在某些情况下,即使bookSheetsfalse,也可能解析图纸。
  • Excel aggressively tries to interpret values from CSV and other plain text. This leads to surprising behavior! Excel积极尝试解释CSV和其他纯文本中的值。这导致了令人惊讶的行为!The raw option suppresses value parsing.raw选项禁止进行值解析。
  • bookSheets and bookProps combine to give both sets of informationbookSheetsbookProps结合起来提供两组信息
  • Deps will be an empty object if bookDeps is false如果bookDepsfalseDeps将为空对象
  • bookFiles behavior depends on file type:行为取决于文件类型:
    • keys array (paths in the ZIP) for ZIP-based formats基于ZIP格式的keys数组(ZIP中的路径)
    • files hash (mapping paths to objects representing the files) for ZIPZIP的hash(将路径映射到表示文件的对象)
    • cfb object for formats using CFB containers使用CFB容器的格式的对象
  • sheetRows-1 rows will be generated when looking at the JSON object output (since the header row is counted as a row when parsing the data)在查看JSON对象输出时将生成行(因为解析数据时头行被算作一行)
  • By default all worksheets are parsed. 默认情况下,将解析所有工作表。sheets restricts based on input type:根据输入类型进行限制:
    • number: zero-based index of worksheet to parse (0 is first worksheet)number:要分析的工作表的从零开始的索引(0是第一个工作表)
    • string: name of worksheet to parse (case insensitive)string:要分析的工作表的名称(不区分大小写)
    • array of numbers and strings to select multiple worksheets.数字和字符串的数组,以选择多个工作表。
  • bookVBA merely exposes the raw VBA CFB object. 仅公开原始VBA CFB对象。It does not parse the data. XLSM and XLSB store the VBA CFB object in xl/vbaProject.bin. 它不解析数据。XLSM和XLSB将VBA CFB对象存储在xl/vbaProject.bin中。BIFF8 XLS mixes the VBA entries alongside the core Workbook entry, so the library generates a new blob from the XLS CFB container that works in XLSM and XLSB files.BIFF8XLS将VBA条目与核心工作簿条目混合在一起,因此库从XLSCFB容器生成一个新的blob,该容器可在XLSM和XLSB文件中工作。
  • codepage is applied to BIFF2 - BIFF5 files without CodePage records and to CSV files without BOM in type:"binary". 应用于没有CodePage记录的BIFF2-BIFF5文件和没有BOM的type:"binary"的CSV文件。BIFF8 XLS always defaults to 1200.BIFF8 XLS始终默认为1200。
  • PRN affects parsing of text files without a common delimiter character.影响对没有公共分隔符的文本文件的解析。
  • Currently only XOR encryption is supported. 目前仅支持XOR加密。Unsupported error will be thrown for files employing other encryption methods.对于使用其他加密方法的文件,将引发不支持的错误。
  • Newer Excel functions are serialized with the _xlfn. prefix, hidden from the user. 较新的Excel函数使用_xlfn.前缀进行序列化,对用户隐藏。SheetJS will strip _xlfn. normally. The xlfn option preserves them.SheetJS将正常地剥离_xlfn.xlfn选项将保留它们。
  • WTF is mainly for development. 主要用于发展。By default, the parser will suppress read errors on single worksheets, allowing you to read from the worksheets that do parse properly. 默认情况下,解析器将抑制单个工作表上的读取错误,允许您从正确解析的工作表中读取。Setting WTF:true forces those errors to be thrown.设置WTF:true将强制抛出这些错误。
  • By default, "sparse" mode worksheets are generated. Individual cells are accessed by indexing the worksheet object with an A1-Style address. "dense" worksheets store cells in an array of arrays at sheet["!data"].默认情况下,会生成“稀疏”模式的工作表。通过使用A1样式地址对工作表对象进行索引,可以访问各个单元格。“密集”工作表将单元格存储在sheet["!data"]的数组中。
  • UTC applies to CSV, Text and HTML formats. 适用于CSV、文本和HTML格式。When explicitly set to false, the parsers will assume the files are specified in local time. 当显式设置为false时,解析器将假定文件是在本地时间指定的。By default, as is the case for other file formats, dates and times are interpreted in UTC.默认情况下,与其他文件格式一样,日期和时间以UTC表示。

Input Type输入类型

Strings can be interpreted in multiple ways. 字符串可以用多种方式解释。The type parameter for read tells the library how to parse the data argument:readtype参数告诉库如何解析数据参数:

typeexpected input预期输入
"base64"string: Base64 encoding of the filestring:文件的Base64编码
"binary"string: binary string (byte n is data.charCodeAt(n))string:二进制字符串(字节ndata.charCodeAt(n)
"string"string: JS string (only appropriate for UTF-8 text formats)string:JS字符串(仅适用于UTF-8文本格式)
"buffer"nodejs Buffer
"array"array: array of 8-bit unsigned integers (byte n is data[n])array:8位无符号整数的数组(字节ndata[n]
"file"string: path of file that will be read (nodejs only)string:将被读取的文件的路径(仅限nodejs)

Some common types are automatically deduced from the data input type, including NodeJS Buffer objects, Uint8Array and ArrayBuffer objects, and arrays of numbers.一些常见的类型是从数据输入类型自动推导出来的,包括NodeJS Buffer对象、Uint8ArrayArrayBuffer对象以及数字数组。

When a JS string is passed with no type, the library assumes the data is a Base64 string. 当传递的JSstring没有type时,库会假定数据是Base64字符串。FileReader#readAsBinaryString or ASCII data requires "binary" type. 或ASCII数据需要"binary"类型。DOM strings including FileReader#readAsText should use type "string".包括FileReader#readAsText在内的DOM字符串应使用类型"string"

Guessing File Type猜测文件类型

Implementation Details实施细节 (click to show)

Excel and other spreadsheet tools read the first few bytes and apply other heuristics to determine a file type. Excel和其他电子表格工具读取前几个字节,并应用其他启发式方法来确定文件类型。This enables file type punning: renaming files with the .xls extension will tell your computer to use Excel to open the file but Excel will know how to handle it. This library applies similar logic:这允许文件类型双关:重命名扩展名为.xls的文件会告诉你的计算机使用Excel打开文件,但Excel会知道如何处理它。这个库应用了类似的逻辑:

Byte 0Raw File TypeSpreadsheet Types电子表格类型
0xD0CFB ContainerBIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLRBIFF 5/8或受保护的XLSX/XLSB或WQ3/QPW或XLR
0x09BIFF StreamBIFF 2/3/4/5
0x3CXML/HTMLSpreadsheetML / Flat ODS / UOS1 / HTML / plain text电子表格ML/平面ODS/UOS1/HTML/纯文本
0x50ZIP ArchiveXLSB or XLSX/M or ODS or UOS2 or NUMBERS or textXLSB或XLSX/M或ODS或UOS2或NUMBERS或text
0x49Plain TextSYLK or plain text
0x54Plain TextDIF or plain text
0xEFUTF-8 TextSpreadsheetML / Flat ODS / UOS1 / HTML / plain text
0xFFUTF-16 TextSpreadsheetML / Flat ODS / UOS1 / HTML / plain text
0x00Record StreamLotus WK* or Quattro Pro or plain text
0x7BPlain textRTF or plain text
0x0APlain textSpreadsheetML / Flat ODS / UOS1 / HTML / plain text
0x0DPlain textSpreadsheetML / Flat ODS / UOS1 / HTML / plain text
0x20Plain textSpreadsheetML / Flat ODS / UOS1 / HTML / plain text

DBF files are detected based on the first byte as well as the third and fourth bytes (corresponding to month and day of the file date)DBF文件是基于第一个字节以及第三个和第四个字节(对应于文件日期的月份和日期)来检测的

Works for Windows files are detected based on the BOF record with type 0xFF基于类型为0xFFBOF记录检测到适用于Windows的Works文件

Plain text format guessing follows the priority order:纯文本格式猜测遵循优先级顺序:

FormatTest
XML<?xml appears in the first 1024 characters出现在前1024个字符中
HTMLstarts with < and HTML tags appear in the first 1024 characters<开头,HTML标记出现在前1024个字符中 *
XMLstarts with < and the first tag is valid<开头,第一个标记有效
RTFstarts with {\rt{\rt开头
DSVstarts with /sep=.$/, separator is the specified character/sep=.$/开头,分隔符是指定的字符
DSVmore unquoted 更多未成对的`
DSVmore unquoted ; chars than \t or , in the first 1024在前1024个字符中未成对的;字符多于\t,
TSVmore unquoted \t chars than , chars in the first 1024前1024个字符中的未成对的\t多于,
CSVone of the first 1024 characters is a comma ","前1024个字符之一是逗号","
ETHstarts with socialcalc:version:socialcalc:version:开始
PRNPRN option is set to true选项设置为true
CSV(fallback)
  • HTML tags include: html, table, head, meta, script, style, divHTML标签包括:htmltableheadmetascriptstylediv
Why are random text files valid?为什么随机文本文件有效? (click to hide)

Excel is extremely aggressive in reading files. Excel在读取文件方面极为激进。Adding an XLS extension to any display text file (where the only characters are ANSI display chars) tricks Excel into thinking that the file is potentially a CSV or TSV file, even if it is only one column! This library attempts to replicate that behavior.向任何显示文本文件添加XLS扩展名(其中唯一的字符是ANSI显示字符)会诱使Excel认为该文件可能是CSV或TSV文件,即使它只有一列!此库试图复制该行为。

The best approach is to validate the desired worksheet and ensure it has the expected number of rows or columns. Extracting the range is extremely simple:最好的方法是验证所需的工作表,并确保它具有预期的行数或列数。提取范围非常简单:

var range = XLSX.utils.decode_range(worksheet['!ref']);
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;