Parsing Options
XLSX.read(data, read_opts) attempts to parse data.
XLSX.readFile(filename, read_opts) attempts to read filename and parse it.
The read functions accept an options argument:
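Option Name | Default | Description |
---|---|---|
type | | Input data encoding (see Input Type below) |
raw | false | If true, plain text parsing will not parse values ** |
codepage | | If specified, use code page when appropriate ** |
cellFormula | true | Save formulae to the .f field |
cellHTML | true | Parse rich text and save HTML to the .h field |
cellNF | false | Save number format string to the .z field |
cellStyles | false | Save style/theme info to the .s field |
cellText | true | Generate formatted text to the .w field |
cellDates | false | Store dates as type d (default is n) |
dateNF | | If specified, use the string for date code 14 ** |
sheetStubs | false | Create cell objects of type z for stub cells |
sheetRows | 0 | If >0, read the first sheetRows rows ** |
bookDeps | false | If true, parse calculation chains |
bookFiles | false | If true, add raw files to book object ** |
bookProps | false | If true, only parse enough to get book metadata ** |
bookSheets | false | If true, only parse enough to get the sheet names |
bookVBA | false | If true, copy VBA blob to vbaraw field ** |
password | "" | If defined and file is encrypted, use password ** |
WTF | false | If true, throw errors on unexpected file features ** |
sheets | | If specified, only parse specified sheets ** |
PRN | false | If true, allow parsing of PRN files ** |
xlfn | false | If true, preserve _xlfn. prefixes in formulae ** |
FS | | DSV Field Separator override |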
Even if cellNF is false, formatted text will be generated and saved to .w.
In some cases, sheets may be parsed even if bookSheets is false.
Excel aggressively tries to interpret values from CSV and other plain text. This leads to surprising behavior! The raw option suppresses value parsing.
bookSheets and bookProps combine to give both sets of information.
Deps will be an empty object if bookDeps is false.
bookFiles behavior depends on file type: ZIP-based formats get a keys array (paths in the ZIP) and a files hash (mapping paths to objects representing the files); formats using CFB containers get a cfb object.
sheetRows-1 rows will be generated when looking at the JSON object output (since the header row is counted as a row when parsing the data).
By default all worksheets are parsed. sheets restricts based on input type: a number is the zero-based index of the worksheet to parse (0 is the first worksheet); a string is the name of the worksheet to parse (case insensitive); an array of numbers and strings selects multiple worksheets.
bookVBA merely exposes the raw VBA CFB object. It does not parse the data. XLSM and XLSB store the VBA CFB object in xl/vbaProject.bin. BIFF8 XLS mixes the VBA entries alongside the core Workbook entry, so the library generates a new XLSB-compatible blob from the XLS CFB container.
codepage is applied to BIFF2 - BIFF5 files without CodePage records and to CSV files without BOM in type:"binary". BIFF8 XLS always defaults to 1200.
PRN affects parsing of text files without a common delimiter character.
Currently only XOR encryption is supported. An unsupported error will be thrown for files employing other encryption methods.
Newer Excel functions are serialized with the _xlfn. prefix, hidden from the user. SheetJS will strip _xlfn. normally. The xlfn option preserves them.
WTF is mainly for development. By default, the parser will suppress read errors on single worksheets, allowing you to read from the worksheets that do parse properly. Setting WTF:true forces those errors to be thrown.
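As a rough illustration of how several of these options combine, the sketch below reads only a preview of the first worksheet. The helper name readPreview and its parameters are hypothetical and not part of the library. The XLSX argument is assumed to be the SheetJS module object, passed in by the caller so the helper does not depend on how the library was loaded.

```javascript
// Hypothetical helper, not part of the library.
// XLSX:  the SheetJS module object (assumed to be loaded by the caller)
// data:  a nodejs Buffer holding the file contents
// nrows: how many rows of the worksheet to read
function readPreview(XLSX, data, nrows) {
  var opts = {
    type: "buffer",   // input is a nodejs Buffer
    sheets: 0,        // only parse the first worksheet
    sheetRows: nrows, // only read the first nrows rows
    WTF: true         // throw worksheet read errors instead of suppressing them
  };
  try {
    return XLSX.read(data, opts);
  } catch (e) {
    // With WTF:true a worksheet that fails to parse surfaces here
    console.error("read failed: " + e.message);
    return null;
  }
}
```

Returning null on failure is a design choice of this sketch; a caller that wants to fall back to the worksheets that do parse could retry with WTF left at its default of false.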
Input Type
Strings can be interpreted in multiple ways. The type parameter for read tells the library how to parse the data argument:
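type | expected input |
---|---|
"base64" | string: Base64 encoding of the file |
"binary" | string: binary string (byte n is data.charCodeAt(n)) |
"string" | string: JS string (characters interpreted as UTF8) |
"buffer" | nodejs Buffer |
"array" | array: array of 8-bit unsigned int (byte n is data[n]) |
"file" | string: path of file that will be read (nodejs only) |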
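The "binary" and "array" rows describe the same bytes held in two different containers. The sketch below converts between them to make the byte n rule concrete. The helper names binaryStringToArray and arrayToBinaryString are hypothetical and not part of the library.

```javascript
// Hypothetical helpers illustrating the "binary" and "array" input shapes.

// "binary" -> "array": byte n of the file is str.charCodeAt(n)
function binaryStringToArray(str) {
  var out = new Uint8Array(str.length);
  for (var n = 0; n < str.length; ++n) out[n] = str.charCodeAt(n) & 0xFF;
  return out;
}

// "array" -> "binary": byte n of the file is arr[n]
function arrayToBinaryString(arr) {
  var chars = [];
  for (var n = 0; n < arr.length; ++n) chars.push(String.fromCharCode(arr[n]));
  return chars.join("");
}

var bytes = binaryStringToArray("PK\x03\x04"); // first four bytes of a ZIP archive
console.log(bytes[0].toString(16)); // 50
```

Either value could then be handed to read with the matching type, for example type:"binary" for the string and type:"array" for the byte array.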
Guessing File Type
Implementation Details
Excel and other spreadsheet tools read the first few bytes and apply other heuristics to determine a file type. This enables file type punning: renaming files with the .xls extension will tell your computer to use Excel to open the file but Excel will know how to handle it. This library applies similar logic:
Byte 0 | Raw File Type | Spreadsheet Types |
---|---|---|
0xD0 | CFB Container | BIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLR |
0x09 | BIFF Stream | BIFF 2/3/4/5 |
0x3C | XML/HTML | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x50 | ZIP Archive | XLSB or XLSX/M or ODS or UOS2 or NUMBERS or text |
0x49 | Plain Text | SYLK or plain text |
0x54 | Plain Text | DIF or plain text |
0xEF | UTF8 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0xFF | UTF16 Encoded | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x00 | Record Stream | Lotus WK* or Quattro Pro or plain text |
0x7B | Plain text | RTF or plain text |
0x0A | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x0D | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x20 | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
DBF files are detected based on the first byte as well as the third and fourth bytes (corresponding to month and day of the file date).
Works for Windows files are detected based on the BOF record with type 0xFF.
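The first-byte table can be read as a simple lookup. The sketch below is not the library's implementation: the names RAW_TYPES and guessRawType are hypothetical, it only mirrors the Byte 0 and Raw File Type columns, and it ignores the DBF and Works for Windows checks.

```javascript
// Hypothetical lookup mirroring the "Byte 0" and "Raw File Type" columns.
var RAW_TYPES = {
  0xD0: "CFB Container",
  0x09: "BIFF Stream",
  0x3C: "XML/HTML",
  0x50: "ZIP Archive",
  0x49: "Plain Text",
  0x54: "Plain Text",
  0xEF: "UTF8 Encoded",
  0xFF: "UTF16 Encoded",
  0x00: "Record Stream",
  0x7B: "Plain Text",
  0x0A: "Plain Text",
  0x0D: "Plain Text",
  0x20: "Plain Text"
};

// bytes: array of 8-bit unsigned ints (the "array" input type)
function guessRawType(bytes) {
  if (!bytes || bytes.length === 0) return null; // nothing to inspect
  var t = RAW_TYPES[bytes[0]];
  // Assumption of this sketch: unlisted first bytes are treated as plain text
  return t || "Plain Text";
}
```

The raw type only narrows the candidates. As the third column shows, most entries still need further inspection to pick the actual spreadsheet format.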
Plain text format guessing follows the priority order:
Format | Test |
---|---|
XML | <?xml appears in the first 1024 characters |
HTML | starts with < and HTML tags appear in the first 1024 characters * |
XML | starts with < and the first tag is valid |
RTF | starts with {\rt |
DSV | starts with /sep=.$/, separator is the specified character |
DSV | more unquoted vertical bar (pipe) chars than ; \t or , in the first 1024 |
DSV | more unquoted ; chars than \t or , in the first 1024 |
TSV | more unquoted \t chars than , chars in the first 1024 |
CSV | one of the first 1024 characters is a comma "," |
ETH | starts with socialcalc:version: |
PRN | PRN option is set to true |
CSV | (fallback) |
HTML tags include: html, table, head, meta, script, style, div
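To make the delimiter rows of the priority table concrete, the sketch below covers only the sep= line and the counts of ; \t and , in the first 1024 characters. It is an illustration, not the library's implementation. The function name guessDelimiter is hypothetical, the sep= test is interpreted as a first line of the form sep=X, "more than \t or ," is read as more than each of them, and characters inside double quotes are skipped as a simplifying assumption.

```javascript
// Hypothetical sketch of the separator tests, not the library's code.
function guessDelimiter(text) {
  // An explicit "sep=X" first line names the separator directly
  var m = /^sep=(.)(\r?\n|$)/.exec(text);
  if (m) return m[1];

  // Count candidate separators outside double quotes in the first 1024 chars
  var head = text.slice(0, 1024);
  var counts = { ";": 0, "\t": 0, ",": 0 };
  var inQuote = false;
  for (var i = 0; i < head.length; ++i) {
    var ch = head.charAt(i);
    if (ch === '"') inQuote = !inQuote;
    else if (!inQuote && counts.hasOwnProperty(ch)) counts[ch]++;
  }

  if (counts[";"] > counts["\t"] && counts[";"] > counts[","]) return ";";
  if (counts["\t"] > counts[","]) return "\t";
  return ","; // a comma is present, or the CSV fallback applies
}
```

A single-column text file with no separators at all falls through to the comma result, which matches the CSV fallback at the bottom of the table.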
Why are random text files valid?
Excel is extremely aggressive in reading files. Adding an XLS extension to any display text file (where the only characters are ANSI display chars) tricks Excel into thinking that the file is potentially a CSV or TSV file, even if it is only one column! This library attempts to replicate that behavior.
The best approach is to validate the desired worksheet and ensure it has the expected number of rows or columns. Extracting the range is extremely simple:
var range = XLSX.utils.decode_range(worksheet['!ref']);
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;
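Building on the range arithmetic, a validation step might look like the following sketch. The helper name hasExpectedShape and its thresholds are hypothetical. It takes an already decoded range object, so it can be tried without loading the library.

```javascript
// Hypothetical check: does a decoded range have at least the expected size?
// range has the shape produced by XLSX.utils.decode_range:
//   { s: { c: firstCol, r: firstRow }, e: { c: lastCol, r: lastRow } }
function hasExpectedShape(range, minRows, minCols) {
  var ncols = range.e.c - range.s.c + 1;
  var nrows = range.e.r - range.s.r + 1;
  return nrows >= minRows && ncols >= minCols;
}

// "A1:C10" decodes to columns 0..2 and rows 0..9
var sample = { s: { c: 0, r: 0 }, e: { c: 2, r: 9 } };
console.log(hasExpectedShape(sample, 2, 3)); // true
console.log(hasExpectedShape(sample, 2, 4)); // false
```

In practice the first argument would be XLSX.utils.decode_range(worksheet['!ref']). A random text file parsed as a one-column sheet would fail any check that asks for two or more columns.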