Skip to main content

Large Datasets大型数据集

For maximal compatibility, the library reads entire files at once and generates files at once. Browsers and other JS engines enforce tight memory limits. In these cases, the library offers strategies to optimize for memory or space by using platform-specific APIs.为了实现最大的兼容性,库一次读取整个文件并一次生成文件。浏览器和其他JS引擎强制执行严格的内存限制。在这些情况下,库提供了通过使用特定于平台的API来优化内存或空间的策略。

Dense Mode密集模式

read, readFile and aoa_to_sheet accept the dense option. readreadFileaoa_to_sheet接受密集选项。When enabled, the methods create worksheet objects that store cells in arrays of arrays:启用后,这些方法将创建工作表对象,将单元格存储在数组中:

var dense_wb = XLSX.read(ab, {dense: true});

var dense_sheet = XLSX.utils.aoa_to_sheet(aoa, {dense: true});
Historical Note历史记录 (click to show)

The earliest versions of the library aimed for IE6+ compatibility. In early testing, both in Chrome 26 and in IE6, the most efficient worksheet storage for small sheets was a large object whose keys were cell addresses.该库的最早版本旨在实现IE6+兼容性。在早期的测试中,无论是在Chrome 26还是在IE6中,对于小表格来说,最有效的工作表存储是一个大对象,其键是单元格地址。

Over time, V8 (the engine behind Chrome and NodeJS) evolved in a way that made the array of arrays approach more efficient but reduced the performance of the large object approach.随着时间的推移,V8(Chrome和NodeJS背后的引擎)的发展使数组方法更加高效,但降低了大对象方法的性能。

In the interest of preserving backwards compatibility, the library opts to make the array of arrays approach available behind a special dense option.为了保持向后兼容性,库选择在一个特殊的dense选项后面提供数组方法。

The various API functions will seamlessly handle dense and sparse worksheets.各种API函数将无缝处理密集和稀疏的工作表。

Streaming Write流式写入

The streaming write functions are available in the XLSX.stream object. They take the same arguments as the normal write functions:流写入功能在XLSX.stream对象中可用。它们采用与正常写入函数相同的参数:

  • XLSX.stream.to_csv is the streaming version of XLSX.utils.sheet_to_csv.XLSX.utils.sheet_to_csv的流媒体版本。
  • XLSX.stream.to_html is the streaming version of XLSX.utils.sheet_to_html.XLSX.utils.sheet_to_html的流媒体版本。
  • XLSX.stream.to_json is the streaming version of XLSX.utils.sheet_to_json.XLSX.utils.sheet_to_json的流媒体版本。

"Stream" refers to the NodeJS push streams API.“流”是指NodeJS推送流API。

Historical Note (click to show)

NodeJS push streams were introduced in 2012. The text streaming methods to_csv and to_html are supported in NodeJS v0.10 and later while the object streaming method to_json is supported in NodeJS v0.12 and later.NodeJS推送流于2012年推出。NodeJS v0.10及更高版本支持文本流方法to_csvto_html,而NodeJS v0.12及更高版支持对象流方法to_json

The first streaming write function, to_csv, was introduced in early 2017. It used and still uses the same NodeJS streaming API.第一个流媒体写入功能to_csv于2017年初推出。它使用并仍然使用相同的NodeJS流API。

Years later, browser vendors are settling on a different stream API.多年后,浏览器供应商开始采用不同的流API。

For maximal compatibility, the library uses NodeJS push streams.为了实现最大的兼容性,该库使用NodeJS推送流。

NodeJS

In a CommonJS context, NodeJS Streams and fs immediately work with SheetJS:在CommonJS上下文中,NodeJS Streams和fs可以立即使用SheetJS:

const XLSX = require("xlsx"); // "just works"
ECMAScript Module MachinationsECMAScript模块机械

In NodeJS ESM, the dependency must be loaded manually:在NodeJS ESM中,依赖项必须手动加载:

import * as XLSX from 'xlsx';
import { Readable } from 'stream';

XLSX.stream.set_readable(Readable); // manually load stream helpers

Additionally, for file-related operations in NodeJS ESM, fs must be loaded:此外,对于NodeJS ESM中与文件相关的操作,必须加载fs

import * as XLSX from 'xlsx';
import * as fs from 'fs';

XLSX.set_fs(fs); // manually load fs helpers

It is strongly encouraged to use CommonJS in NodeJS whenever possible.强烈建议尽可能在NodeJS中使用CommonJS。

XLSX.stream.to_csv

This example reads a worksheet passed as an argument to the script, pulls the first worksheet, converts to CSV and writes to SheetJSNodeJStream.csv:此示例读取作为参数传递给脚本的工作表,提取第一个工作表,转换为CSV并写入SheetJSNodeJStream.csv

var XLSX = require("xlsx"), fs = require("fs");

var wb = XLSX.readFile(process.argv[2]);
var ws = wb.Sheets[wb.SheetNames[0]];
var ostream = fs.createWriteStream("SheetJSNodeJStream.csv");

XLSX.stream.to_csv(ws).pipe(ostream);

XLSX.stream.to_json

stream.to_json uses Object-mode streams. 使用对象模式流。A Transform stream can be used to generate a normal stream for streaming to a file or the screen:Transform流可用于生成用于流式传输到文件或屏幕的正常流:

var XLSX = require("xlsx"), Transform = require("stream").Transform;
var wb = XLSX.readFile(process.argv[2], {dense: true});
var ws = wb.Sheets[wb.SheetNames[0]];

/* this Transform stream converts JS objects to text此转换流将JS对象转换为文本 */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };

/* pipe `to_json` -> transformer -> standard output */
XLSX.stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);

Demo

note

This demo was last tested in the following deployments:此演示上次在以下部署中进行了测试:

Node VersionDateNode Status when tested
0.12.182023-09-02End-of-Life
4.9.12023-09-02End-of-Life
6.17.12023-09-02End-of-Life
8.17.02023-09-02End-of-Life
10.24.12023-09-02End-of-Life
12.22.122023-09-02End-of-Life
14.21.32023-09-02End-of-Life
16.20.02023-09-02Maintenance LTS
18.17.12023-09-02Active LTS
20.5.12023-09-02Current

While streaming methods work in End-of-Life versions of NodeJS, production deployments should upgrade to a Current or LTS version of NodeJS.虽然流媒体方法在NodeJS的报废版本中工作,但生产部署应该升级到NodeJS的当前或LTS版本。

1) Install the 安装NodeJS module

npm i --save https://cdn.sheetjs.com/xlsx-0.20.1/xlsx-0.20.1.tgz

2) Download SheetJSNodeJStream.js:下载表SheetJSNodeJStream.js

curl -LO https://docs.sheetjs.com/stream/SheetJSNodeJStream.js

3) Download the test file:下载测试文件

curl -LO https://sheetjs.com/pres.xlsx

4) Run the script:运行脚本:

node SheetJSNodeJStream.js pres.xlsx
Expected Output (click to show)

The console will display a list of objects:控制台将显示对象列表:

{"Name":"Bill Clinton","Index":42}
{"Name":"GeorgeW Bush","Index":43}
{"Name":"Barack Obama","Index":44}
{"Name":"Donald Trump","Index":45}
{"Name":"Joseph Biden","Index":46}

The script will also generate SheetJSNodeJStream.csv:该脚本还将生成SheetJSNodeJStream.csv

Name,Index
Bill Clinton,42
GeorgeW Bush,43
Barack Obama,44
Donald Trump,45
Joseph Biden,46

Browser浏览器

note

The live demo was last tested on 2023-09-02 in Chromium 116.现场演示最后一次测试是在2023-09-02的Chromium 116中。

NodeJS streaming APIs are not available in the browser. The following function supplies a pseudo stream object compatible with the to_csv function:NodeJS流式API在浏览器中不可用。以下函数提供与to_csv函数兼容的伪流对象:

function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
XLSX.stream.set_readable(() => ({
__done: false,
// this function will be assigned by the SheetJS stream methods该函数将由SheetJS流方法分配
_read: function() { this.__done = true; },
// this function is called by the stream methods该函数由流方法调用
push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
}));
return XLSX.stream.to_csv(ws, opts);
}

// assuming `workbook` is a workbook, stream the first sheet
const ws = workbook.Sheets[workbook.SheetNames[0]];
const strm = sheet_to_csv_cb(ws, (csv)=>{ if(csv != null) console.log(csv); });
strm.resume();

Web Workers

For processing large files in the browser, it is strongly encouraged to use Web Workers. 对于在浏览器中处理大型文件,强烈建议使用Web Workers。The Worker demo includes examples using the File System Access API.Worker演示包括使用文件系统访问API的示例。

Web Worker Details (click to show)

Typically, the file and stream processing occurs in the Web Worker. CSV rows can be sent back to the main thread in the callback:通常,文件和流处理发生在Web Worker中。CSV行可以在回调中发送回主线程:

worker.js
/* load standalone script from CDN从CDN加载独立脚本 */
importScripts("https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js");

function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
XLSX.stream.set_readable(() => ({
__done: false,
// this function will be assigned by the SheetJS stream methods该函数将由SheetJS流方法分配
_read: function() { this.__done = true; },
// this function is called by the stream methods该函数由流方法调用
push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
}));
return XLSX.stream.to_csv(ws, opts);
}

/* this callback will run once the main context sends a message该回调将在主上下文发送消息后运行 */
self.addEventListener('message', async(e) => {
try {
postMessage({state: "fetching " + e.data.url});
/* Fetch file获取文件 */
const res = await fetch(e.data.url);
const ab = await res.arrayBuffer();

/* Parse file分析文件 */
postMessage({state: "parsing"});
const wb = XLSX.read(ab, {dense: true});
const ws = wb.Sheets[wb.SheetNames[0]];

/* Generate CSV rows生成CSV行 */
postMessage({state: "csv"});
const strm = sheet_to_csv_cb(ws, (csv) => {
if(csv != null) postMessage({csv});
else postMessage({state: "done"});
});
strm.resume();
} catch(e) {
/* Pass the error message back将错误消息传回 */
postMessage({error: String(e.message || e) });
}
}, false);

The main thread will receive messages with CSV rows for further processing:主线程将接收带有CSV行的消息以供进一步处理:

main.js
worker.onmessage = function(e) {
if(e.data.error) { console.error(e.data.error); /* show an error message显示错误消息 */ }
else if(e.data.state) { console.info(e.data.state); /* current state当前状态 */ }
else {
/* e.data.csv is the row generated by the stream是流生成的行 */
console.log(e.data.csv);
}
};

Live Demo现场演示

The following live demo fetches and parses a file in a Web Worker. 下面的实时演示获取并解析Web Worker中的文件。The to_csv streaming function is used to generate CSV rows and pass back to the main thread for further processing.to_csv流函数用于生成csv行,并传递回主线程进行进一步处理。

For Chromium browsers, the File System Access API provides a modern worker-only approach. 对于Chromium浏览器,文件系统访问API提供了一种仅限工作人员使用的现代方法。The Web Workers demo includes a live example of CSV streaming write.WebWorkers演示包括一个CSV流写入的实时示例。

The demo has a URL input box. Feel free to change the URL. For example,这个演示有一个URL输入框。请随意更改URL。例如

https://raw.githubusercontent.com/SheetJS/test_files/master/large_strings.xls is an XLS file over 50 MB

https://raw.githubusercontent.com/SheetJS/libreoffice_test-files/master/calc/xlsx-import/perf/8-by-300000-cells.xlsx is an XLSX file with 300000 rows (approximately 20 MB)是一个具有300000行(约20 MB)的XLSX文件

Result
Loading...
Live Editor

Deno

Deno does not support NodeJS streams in normal execution, so a wrapper is usedDeno在正常执行中不支持NodeJS流,因此使用了包装器:

// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.1/package/types/index.d.ts"
import { stream } from 'https://cdn.sheetjs.com/xlsx-0.20.1/package/xlsx.mjs';

/* Callback invoked on each row (string) and at the end (null)对每行(字符串)和末尾(null)调用的回调 */
const csv_cb = (d:string|null) => {
if(d == null) return;
/* The strings include line endings, so raw write ops should be used字符串包括行结尾,因此应该使用原始写操作 */
Deno.stdout.write(new TextEncoder().encode(d));
};

/* Prepare `Readable` function准备`Readable`函数 */
const Readable = () => ({
__done: false,
// this function will be assigned by the SheetJS stream methods该函数将由SheetJS流方法分配
_read: function() { this.__done = true; },
// this function is called by the stream methods该函数由流方法调用
push: function(d: any) {
if(!this.__done) csv_cb(d);
if(d == null) this.__done = true;
},
resume: function pump() {
for(var i = 0; i < 1000 && !this.__done; ++i) this._read();
if(!this.__done) setTimeout(pump.bind(this), 0);
}
})
/* Wire up接线 */
stream.set_readable(Readable);

/* assuming `workbook` is a workbook, stream the first sheet假设`workbook`是工作簿,流式传输第一张工作表 */
const ws = workbook.Sheets[workbook.SheetNames[0]];
stream.to_csv(wb.Sheets[wb.SheetNames[0]]).resume();
note

This demo was last tested on 2023-09-02 against Deno 1.36.4此演示最后一次测试是在2023-09-02对Deno 1.36.4

SheetJSDenoStream.ts is a small example script that downloads https://sheetjs.com/pres.numbers and prints CSV row objects.是一个下载的小示例脚本https://sheetjs.com/pres.numbers并打印CSV行对象。

1) Run 运行deno run -A https://docs.sheetjs.com/stream/SheetJSDenoStream.ts