Large Datasets大型数据集
For maximal compatibility, the library reads entire files at once and generates files at once. Browsers and other JS engines enforce tight memory limits. In these cases, the library offers strategies to optimize for memory or space by using platform-specific APIs.为了实现最大的兼容性,库一次读取整个文件并一次生成文件。浏览器和其他JS引擎强制执行严格的内存限制。在这些情况下,库提供了通过使用特定于平台的API来优化内存或空间的策略。
Dense Mode密集模式
read
, readFile
and aoa_to_sheet
accept the dense
option. read
、readFile
和aoa_to_sheet
接受密集选项。When enabled, the methods create worksheet objects that store cells in arrays of arrays:启用后,这些方法将创建工作表对象,将单元格存储在数组中:
var dense_wb = XLSX.read(ab, {dense: true});
var dense_sheet = XLSX.utils.aoa_to_sheet(aoa, {dense: true});
Historical Note历史记录 (click to show)
The earliest versions of the library aimed for IE6+ compatibility. In early testing, both in Chrome 26 and in IE6, the most efficient worksheet storage for small sheets was a large object whose keys were cell addresses.该库的最早版本旨在实现IE6+兼容性。在早期的测试中,无论是在Chrome 26还是在IE6中,对于小表格来说,最有效的工作表存储是一个大对象,其键是单元格地址。
Over time, V8 (the engine behind Chrome and NodeJS) evolved in a way that made the array of arrays approach more efficient but reduced the performance of the large object approach.随着时间的推移,V8(Chrome和NodeJS背后的引擎)的发展使数组方法更加高效,但降低了大对象方法的性能。
In the interest of preserving backwards compatibility, the library opts to make the array of arrays approach available behind a special 为了保持向后兼容性,库选择在一个特殊的dense
option.dense
选项后面提供数组方法。
The various API functions will seamlessly handle dense and sparse worksheets.各种API函数将无缝处理密集和稀疏的工作表。
Streaming Write流式写入
The streaming write functions are available in the 流写入功能在XLSX.stream
object. They take the same arguments as the normal write functions:XLSX.stream
对象中可用。它们采用与正常写入函数相同的参数:
XLSX.stream.to_csv
is the streaming version of是XLSX.utils.sheet_to_csv
.XLSX.utils.sheet_to_csv
的流媒体版本。XLSX.stream.to_html
is the streaming version of是XLSX.utils.sheet_to_html
.XLSX.utils.sheet_to_html
的流媒体版本。XLSX.stream.to_json
is the streaming version of是XLSX.utils.sheet_to_json
.XLSX.utils.sheet_to_json
的流媒体版本。
"Stream" refers to the NodeJS push streams API.“流”是指NodeJS推送流API。
Historical Note (click to show)
NodeJS push streams were introduced in 2012. The text streaming methods NodeJS推送流于2012年推出。NodeJS v0.10及更高版本支持文本流方法to_csv
and to_html
are supported in NodeJS v0.10 and later while the object streaming method to_json
is supported in NodeJS v0.12 and later.to_csv
和to_html
,而NodeJS v0.12及更高版支持对象流方法to_json
。
The first streaming write function, 第一个流媒体写入功能to_csv
, was introduced in early 2017. It used and still uses the same NodeJS streaming API.to_csv
于2017年初推出。它使用并仍然使用相同的NodeJS流API。
Years later, browser vendors are settling on a different stream API.多年后,浏览器供应商开始采用不同的流API。
For maximal compatibility, the library uses NodeJS push streams.为了实现最大的兼容性,该库使用NodeJS推送流。
NodeJS
In a CommonJS context, NodeJS Streams and 在CommonJS上下文中,NodeJS Streams和fs
immediately work with SheetJS:fs
可以立即使用SheetJS:
const XLSX = require("xlsx"); // "just works"
In NodeJS ESM, the dependency must be loaded manually:在NodeJS ESM中,依赖项必须手动加载:
import * as XLSX from 'xlsx';
import { Readable } from 'stream';
XLSX.stream.set_readable(Readable); // manually load stream helpers
Additionally, for file-related operations in NodeJS ESM, 此外,对于NodeJS ESM中与文件相关的操作,必须加载fs
must be loaded:fs
:
import * as XLSX from 'xlsx';
import * as fs from 'fs';
XLSX.set_fs(fs); // manually load fs helpers
It is strongly encouraged to use CommonJS in NodeJS whenever possible.强烈建议尽可能在NodeJS中使用CommonJS。
XLSX.stream.to_csv
This example reads a worksheet passed as an argument to the script, pulls the first worksheet, converts to CSV and writes to 此示例读取作为参数传递给脚本的工作表,提取第一个工作表,转换为CSV并写入SheetJSNodeJStream.csv
:SheetJSNodeJStream.csv
:
var XLSX = require("xlsx"), fs = require("fs");
var wb = XLSX.readFile(process.argv[2]);
var ws = wb.Sheets[wb.SheetNames[0]];
var ostream = fs.createWriteStream("SheetJSNodeJStream.csv");
XLSX.stream.to_csv(ws).pipe(ostream);
XLSX.stream.to_json
stream.to_json
uses Object-mode streams. 使用对象模式流。A Transform
stream can be used to generate a normal stream for streaming to a file or the screen:Transform
流可用于生成用于流式传输到文件或屏幕的正常流:
var XLSX = require("xlsx"), Transform = require("stream").Transform;
var wb = XLSX.readFile(process.argv[2], {dense: true});
var ws = wb.Sheets[wb.SheetNames[0]];
/* this Transform stream converts JS objects to text此转换流将JS对象转换为文本 */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };
/* pipe `to_json` -> transformer -> standard output */
XLSX.stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);
Demo
This demo was last tested in the following deployments:此演示上次在以下部署中进行了测试:
Node Version | Date | Node Status when tested |
---|---|---|
0.12.18 | 2023-09-02 | End-of-Life |
4.9.1 | 2023-09-02 | End-of-Life |
6.17.1 | 2023-09-02 | End-of-Life |
8.17.0 | 2023-09-02 | End-of-Life |
10.24.1 | 2023-09-02 | End-of-Life |
12.22.12 | 2023-09-02 | End-of-Life |
14.21.3 | 2023-09-02 | End-of-Life |
16.20.0 | 2023-09-02 | Maintenance LTS |
18.17.1 | 2023-09-02 | Active LTS |
20.5.1 | 2023-09-02 | Current |
While streaming methods work in End-of-Life versions of NodeJS, production deployments should upgrade to a Current or LTS version of NodeJS.虽然流媒体方法在NodeJS的报废版本中工作,但生产部署应该升级到NodeJS的当前或LTS版本。
1) Install the 安装NodeJS module
npm i --save https://cdn.sheetjs.com/xlsx-0.20.1/xlsx-0.20.1.tgz
2) Download SheetJSNodeJStream.js:下载表SheetJSNodeJStream.js:
curl -LO https://docs.sheetjs.com/stream/SheetJSNodeJStream.js
3) Download the test file:下载测试文件:
curl -LO https://sheetjs.com/pres.xlsx
4) Run the script:运行脚本:
node SheetJSNodeJStream.js pres.xlsx
Expected Output (click to show)
The console will display a list of objects:控制台将显示对象列表:
{"Name":"Bill Clinton","Index":42}
{"Name":"GeorgeW Bush","Index":43}
{"Name":"Barack Obama","Index":44}
{"Name":"Donald Trump","Index":45}
{"Name":"Joseph Biden","Index":46}
The script will also generate 该脚本还将生成SheetJSNodeJStream.csv
:SheetJSNodeJStream.csv
:
Name,Index
Bill Clinton,42
GeorgeW Bush,43
Barack Obama,44
Donald Trump,45
Joseph Biden,46
Browser浏览器
The live demo was last tested on 2023-09-02 in Chromium 116.现场演示最后一次测试是在2023-09-02的Chromium 116中。
NodeJS streaming APIs are not available in the browser. The following function supplies a pseudo stream object compatible with the NodeJS流式API在浏览器中不可用。以下函数提供与to_csv
function:to_csv
函数兼容的伪流对象:
function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
XLSX.stream.set_readable(() => ({
__done: false,
// this function will be assigned by the SheetJS stream methods该函数将由SheetJS流方法分配
_read: function() { this.__done = true; },
// this function is called by the stream methods该函数由流方法调用
push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
}));
return XLSX.stream.to_csv(ws, opts);
}
// assuming `workbook` is a workbook, stream the first sheet
const ws = workbook.Sheets[workbook.SheetNames[0]];
const strm = sheet_to_csv_cb(ws, (csv)=>{ if(csv != null) console.log(csv); });
strm.resume();
Web Workers
For processing large files in the browser, it is strongly encouraged to use Web Workers. 对于在浏览器中处理大型文件,强烈建议使用Web Workers。The Worker demo includes examples using the File System Access API.Worker演示包括使用文件系统访问API的示例。
Web Worker Details (click to show)
Typically, the file and stream processing occurs in the Web Worker. CSV rows can be sent back to the main thread in the callback:通常,文件和流处理发生在Web Worker中。CSV行可以在回调中发送回主线程:
/* load standalone script from CDN从CDN加载独立脚本 */
importScripts("https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js");
function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
XLSX.stream.set_readable(() => ({
__done: false,
// this function will be assigned by the SheetJS stream methods该函数将由SheetJS流方法分配
_read: function() { this.__done = true; },
// this function is called by the stream methods该函数由流方法调用
push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
}));
return XLSX.stream.to_csv(ws, opts);
}
/* this callback will run once the main context sends a message该回调将在主上下文发送消息后运行 */
self.addEventListener('message', async(e) => {
try {
postMessage({state: "fetching " + e.data.url});
/* Fetch file获取文件 */
const res = await fetch(e.data.url);
const ab = await res.arrayBuffer();
/* Parse file分析文件 */
postMessage({state: "parsing"});
const wb = XLSX.read(ab, {dense: true});
const ws = wb.Sheets[wb.SheetNames[0]];
/* Generate CSV rows生成CSV行 */
postMessage({state: "csv"});
const strm = sheet_to_csv_cb(ws, (csv) => {
if(csv != null) postMessage({csv});
else postMessage({state: "done"});
});
strm.resume();
} catch(e) {
/* Pass the error message back将错误消息传回 */
postMessage({error: String(e.message || e) });
}
}, false);
The main thread will receive messages with CSV rows for further processing:主线程将接收带有CSV行的消息以供进一步处理:
worker.onmessage = function(e) {
if(e.data.error) { console.error(e.data.error); /* show an error message显示错误消息 */ }
else if(e.data.state) { console.info(e.data.state); /* current state当前状态 */ }
else {
/* e.data.csv is the row generated by the stream是流生成的行 */
console.log(e.data.csv);
}
};
Live Demo现场演示
The following live demo fetches and parses a file in a Web Worker. 下面的实时演示获取并解析Web Worker中的文件。The to_csv
streaming function is used to generate CSV rows and pass back to the main thread for further processing.to_csv
流函数用于生成csv行,并传递回主线程进行进一步处理。
For Chromium browsers, the File System Access API provides a modern worker-only approach. 对于Chromium浏览器,文件系统访问API提供了一种仅限工作人员使用的现代方法。The Web Workers demo includes a live example of CSV streaming write.WebWorkers演示包括一个CSV流写入的实时示例。
The demo has a URL input box. Feel free to change the URL. For example,这个演示有一个URL输入框。请随意更改URL。例如
https://raw.githubusercontent.com/SheetJS/test_files/master/large_strings.xls
is an XLS file over 50 MB
https://raw.githubusercontent.com/SheetJS/libreoffice_test-files/master/calc/xlsx-import/perf/8-by-300000-cells.xlsx
is an XLSX file with 300000 rows (approximately 20 MB)是一个具有300000行(约20 MB)的XLSX文件
Deno
Deno does not support NodeJS streams in normal execution, so a wrapper is usedDeno在正常执行中不支持NodeJS流,因此使用了包装器:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.1/package/types/index.d.ts"
import { stream } from 'https://cdn.sheetjs.com/xlsx-0.20.1/package/xlsx.mjs';
/* Callback invoked on each row (string) and at the end (null)对每行(字符串)和末尾(null
)调用的回调 */
const csv_cb = (d:string|null) => {
if(d == null) return;
/* The strings include line endings, so raw write ops should be used字符串包括行结尾,因此应该使用原始写操作 */
Deno.stdout.write(new TextEncoder().encode(d));
};
/* Prepare `Readable` function准备`Readable`函数 */
const Readable = () => ({
__done: false,
// this function will be assigned by the SheetJS stream methods该函数将由SheetJS流方法分配
_read: function() { this.__done = true; },
// this function is called by the stream methods该函数由流方法调用
push: function(d: any) {
if(!this.__done) csv_cb(d);
if(d == null) this.__done = true;
},
resume: function pump() {
for(var i = 0; i < 1000 && !this.__done; ++i) this._read();
if(!this.__done) setTimeout(pump.bind(this), 0);
}
})
/* Wire up接线 */
stream.set_readable(Readable);
/* assuming `workbook` is a workbook, stream the first sheet假设`workbook`是工作簿,流式传输第一张工作表 */
const ws = workbook.Sheets[workbook.SheetNames[0]];
stream.to_csv(wb.Sheets[wb.SheetNames[0]]).resume();
This demo was last tested on 2023-09-02 against Deno 此演示最后一次测试是在2023-09-02对Deno 1.36.4
1.36.4
SheetJSDenoStream.ts is a small example script that downloads https://sheetjs.com/pres.numbers and prints CSV row objects.是一个下载的小示例脚本https://sheetjs.com/pres.numbers并打印CSV行对象。
1) Run 运行deno run -A https://docs.sheetjs.com/stream/SheetJSDenoStream.ts