Skip to main content

Data Import数据导入

Parsing Workbooks解析工作簿

API

Extract data from spreadsheet bytes从电子表格字节中提取数据

var workbook = XLSX.read(data, opts);

The read method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array or ArrayBuffer).read方法可以从存储在JS字符串、“二进制字符串”、Node.js缓冲区或类型化数组(Uint8ArrayArrayBuffer)中的电子表格字节中提取数据。

Read spreadsheet bytes from a local file and extract data从本地文件读取电子表格字节并提取数据

var workbook = XLSX.readFile(filename, opts);

The readFile method attempts to read a spreadsheet file at the supplied path.readFile方法尝试在提供的路径上读取电子表格文件。

The second opts argument is optional. 第二个opts参数是可选的。"Parsing Options"“解析选项” covers the supported properties and behaviors.涵盖支持的属性和行为。

warning警告

Browsers generally do not allow reading files by specifying filename (it is a security risk), and running XLSX.readFile in the browser will throw an error.浏览器通常不允许通过指定文件名来读取文件(这有安全风险),在浏览器中运行XLSX.readFile会引发错误。

Deno scripts must be invoked with --allow-read to read from the filesystem.调用Deno脚本时必须使用--allow-read以从文件流中读取。

Examples示例

Here are a few common scenarios (click on each subtitle to see the code).以下是一些常见的场景(单击每个子标题以查看代码)。

The demos cover special deployments in more detail.演示更详细地介绍了特殊部署。

Example: Local File示例:本地文件

XLSX.readFile supports reading local files in platforms like NodeJS. 支持在Node.js等平台中读取本地文件。In other platforms like React Native, XLSX.read should be called with file data.在其他平台(如React Native)中,应使用文件数据调用XLSX.read

In-browser processing where users drag-and-drop files or use a file element are covered in the "User Submissions" example.“用户提交”示例介绍了在浏览器处理中,用户拖放文件或使用文件元素。

readFile uses fs.readFileSync under the hood:在引擎盖下使用fs.readFileSync

var XLSX = require("xlsx");

var workbook = XLSX.readFile("test.xlsx");

For Node ESM, fs must be loaded manually:对于节点ESM,必须手动加载fs

import * as fs from "fs";
import { readFile, set_fs } from "xlsx/xlsx.mjs";
set_fs(fs);

const workbook = readFile("test.xlsx");

Example: User Submissions示例:用户提交

This example focuses on user-submitted files through a drag-and-drop event, HTML file input element, or network request.本例重点介绍用户通过拖放事件、HTML文件输入元素或网络请求提交的文件。

For modern websites targeting Chrome 76+针对Chrome 76的现代网站+, File#arrayBuffer is recommended:,建议使用File#arrayBuffer

Assume drop_dom_element is the DOM element that will listen for changes:假设drop_dom_element是将侦听更改的dom元素:

<div id="drop_dom_element">Drop files here</div>

The event property is e.dataTransfer. 事件属性为e.dataTransferThe code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的差异:

// XLSX is a global from the standalone script

async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);

https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.演示FileReader技术。

For maximal compatibility (IE10+)最大兼容性(IE10+), the FileReader approach is recommended:,建议使用FileReader方法:

Assume drop_dom_element is the DOM element that will listen for changes:假设drop_dom_element是将侦听更改的dom元素:

<div id="drop_dom_element">Drop files here</div>

The event property is e.dataTransfer. 事件属性为e.dataTransferThe code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的差异:

function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);

The oldie demo shows an IE-compatible fallback scenario.oldie演示展示了一个与IE兼容的回退场景。

Example: Remote File示例:远程文件

This example focuses on fetching files ("Ajax" in browser parlance) using APIs like XMLHttpRequest and fetch as well as third-party libraries.本例重点介绍使用XMLHttpRequestfetch等API以及第三方库获取文件(用浏览器的说法是“Ajax”)。

For modern websites targeting Chrome 42+, fetch is recommended:对于以Chrome 42+为目标的现代网站,建议使用fetch

// XLSX is a global from the standalone script

(async() => {
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
})();

For broader support, the XMLHttpRequest approach is recommended:为了获得更广泛的支持,建议使用XMLHttpRequest方法:

var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";

/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";

req.onload = function(e) {
var workbook = XLSX.read(req.response);

/* DO SOMETHING WITH workbook HERE */
};

req.send();

The xhr demo includes a longer discussion and more examples.xhr演示包括更长的讨论和更多的示例。

http://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.显示了IE6+的回退方法。

Example: Readable Streams示例:可读流

caution小心

The recommended approach is to buffer streams in memory and process once all of the data has been collected. 建议的方法是在内存中缓冲流,并在收集完所有数据后进行处理。A proper streaming parse is technically impossible.正确的流解析在技术上是不可能的。

Technical details技术细节 (click to show)

XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. XLSX、XLSB、NUMBERS和ODS文件最终是包含二进制和XML条目的ZIP文件。The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. ZIP文件格式将目录(“中央目录结束”记录)存储在文件的末尾,因此正确解析ZIP文件需要从末尾进行扫描。Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.流不提供对数据的随机访问,因此唯一正确的方法是缓冲整个流。

XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". 用于Mac文件的XLS、XLR、QPW和Works 4使用“复合文件二进制格式”。It is a container format that can hold multiple "files" and "folders". 它是一种容器格式,可以容纳多个“文件”和“文件夹”。It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! 它还有一个目录(“目录扇区”),但可以放在文件中的任何位置! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.唯一正确的方法是缓冲足够的流以找到完整的目录,但在测试由各种版本的Excel和其他工具生成的真实文件时,增加的复杂性几乎没有好处。

When dealing with ReadableStream, the easiest approach is to buffer the stream and process the whole thing at the end:在处理ReadableStream时,最简单的方法是缓冲流并在最后处理整个过程:

// XLSX is a global from the standalone script

async function process_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}

/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));

let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}

return out;
}

const data = await process_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data, {type: 'array'});

More detailed examples are covered in the included demos包含的演示中包含了更详细的示例

Processing JSON and JS Data处理JSON和JS数据

JSON and JS data tend to represent single worksheets. JSON和JS数据往往表示单个工作表。 This section will use a few utility functions to generate workbooks.本节将使用一些实用程序函数来生成工作簿。

Create a new Workbook创建新工作簿

var workbook = XLSX.utils.book_new();

The book_new utility function creates an empty workbook with no worksheets.book_new实用工具函数创建一个没有工作表的空工作簿。

Spreadsheet software generally require at least one worksheet and enforce the requirement in the user interface. 电子表格软件通常需要至少一个工作表,并在用户界面中强制执行该要求。This library enforces the requirement at write time, throwing errors if an empty workbook is passed to write functions.该库在写入时强制执行该要求,如果将空工作簿传递给写入函数,则会引发错误。

API

Create a worksheet from an array of arrays of JS values从JS值数组创建工作表

var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);

The aoa_to_sheet utility function walks an "array of arrays" in row-major order, generating a worksheet object. aoa_to_sheet实用程序函数按行主顺序遍历“数组数组”,生成一个工作表对象。The following snippet generates a sheet with cell A1 set to the string A1, cell B1 set to B1, etc:以下代码段生成一张表,其中单元格A1设置为字符串A1,单元格B1设置为B1,等等:

var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);

"Array of Arrays Input"“数组的数组输入” describes the function and the optional opts argument in more detail.更详细地描述了函数和可选opts参数。

Create a worksheet from an array of JS objects从JS对象数组创建工作表

var worksheet = XLSX.utils.json_to_sheet(jsa, opts);

The json_to_sheet utility function walks an array of JS objects in order, generating a worksheet object. json_to_sheet实用程序函数按顺序遍历JS对象数组,生成一个工作表对象。By default, it will generate a header row and one row per object in the array. 默认情况下,它将在数组中为每个对象生成标题行和一行。The optional opts argument has settings to control the column order and header output.可选opts参数具有控制列顺序和标题输出的设置。

"Array of Objects Input"“对象数组输入” describes the function and the optional opts argument in more detail.更详细地描述了函数和可选opts参数。

Examples示例

"Complete Example" contains a detailed example "Get Data from a JSON Endpoint and Generate a Workbook"包含一个详细示例“从JSON端点获取数据并生成工作簿”

x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser. 是一种交互式数据网格,用于在web浏览器中预览和修改结构化数据。 The demo includes a sample script with the xtos function for converting from x-spreadsheet to a workbook. 演示包括一个带有xtos函数的示例脚本,用于将x-spreadsheet转换为工作簿。Live Demo: 现场演示:https://oss.sheetjs.com/sheetjs/x-spreadsheet

Records from a database query (SQL or no-SQL)数据库查询中的记录(SQL或无SQL) (click to show)

The database demodatabase演示 includes examples of working with databases and query results.包括使用数据库和查询结果的示例。

Numerical Computations with TensorFlow.jsTensorFlow.js的数值计算 (click to show)

@tensorflow/tfjs and other libraries expect data in simple arrays, well-suited for worksheets where each column is a data vector. 其他库希望数据以简单数组形式存在,非常适合于每列都是数据向量的工作表。 That is the transpose of how most people use spreadsheets, where each row is a vector.这是大多数人使用电子表格方式的转换,其中每一行都是一个向量。

When recovering data from tfjs, the returned data points are stored in a typed array. tfjs恢复数据时,返回的数据点存储在类型化数组中。An array of arrays can be constructed with loops. 数组数组可以用循环构造。Array#unshift can prepend a title row before the conversion:Array#unshift可以在转换之前预加标题行:

const XLSX = require("xlsx");
const tf = require('@tensorflow/tfjs');

/* suppose xs and ys are vectors (1D tensors) -> tfarr will be a typed array */
const tfdata = tf.stack([xs, ys]).transpose();
const shape = tfdata.shape;
const tfarr = tfdata.dataSync();

/* construct the array of arrays */
const aoa = [];
for(let j = 0; j < shape[0]; ++j) {
aoa[j] = [];
for(let i = 0; i < shape[1]; ++i) aoa[j][i] = tfarr[j * shape[1] + i];
}
/* add headers to the top */
aoa.unshift(["x", "y"]);

/* generate worksheet */
const worksheet = XLSX.utils.aoa_to_sheet(aoa);

The array demo shows a complete example.array演示展示了一个完整的示例。

Processing HTML Tables处理HTML表

API

Create a worksheet by scraping an HTML TABLE in the page通过在页面中删除HTML表来创建工作表

var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);

The table_to_sheet utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. table_to_sheet实用程序函数接受DOM table元素,并遍历行以生成工作表。 The opts argument is optional. opts参数是可选的。"HTML Table Input" describes the function in more detail.更详细地描述了该功能。

Create a workbook by scraping an HTML TABLE in the page通过在页面中删除HTML表来创建工作簿

var workbook = XLSX.utils.table_to_book(dom_element, opts);

The table_to_book utility function follows the same logic as table_to_sheet. table_to_book实用程序函数遵循与table_to_sheet相同的逻辑。After generating a worksheet, it creates a blank workbook and appends the spreadsheet.生成工作表后,它会创建一个空白工作簿并附加电子表格。

The options argument supports the same options as table_to_sheet, with the addition of a sheet property to control the worksheet name. opts参数支持与table_to_sheet相同的选项,并添加了一个sheet属性来控制工作表名称。If the property is missing or no options are specified, the default name Sheet1 is used.如果缺少属性或未指定任何选项,则使用默认名称Sheet1

Examples示例

Here are a few common scenarios (click on each subtitle to see the code):以下是一些常见场景(单击每个子标题以查看代码):

HTML TABLE element in a webpage网页中的HTML表元素 (click to show)
<!-- include the standalone script and shim.  this uses the UNPKG CDN -->
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>

<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>

<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));

/* DO SOMETHING WITH workbook HERE */
</script>

Multiple tables on a web page can be converted to individual worksheets:网页上的多个表可以转换为单个工作表:

/* create new workbook */
var workbook = XLSX.utils.book_new();

/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");

/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");

/* workbook now has 2 worksheets */

Alternatively, the HTML code can be extracted and parsed:或者,可以提取和解析HTML代码:

var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)

The chrome demo shows a complete example and details the required permissions and other settings.chrome演示展示了一个完整的示例,并详细介绍了所需的权限和其他设置。

In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:在扩展中,建议在内容脚本中生成工作簿,并将对象传递回扩展:

/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table")
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
Server-Side HTML Tables with Headless Chrome带有无头Chrome的服务器端HTML表 (click to show)

The headless demo includes a complete demo to convert HTML files to XLSB workbooks. headless演示包括将HTML文件转换为XLSB工作簿的完整演示。The core idea is to add the script to the page, parse the table in the page context, generate a base64 workbook and send it back for further processing:核心思想是将脚本添加到页面,在页面上下文中解析表,生成base64工作簿并将其发送回以进行进一步处理:

const XLSX = require("xlsx");
const { readFileSync } = require("fs"), puppeteer = require("puppeteer");

const url = `https://sheetjs.com/demos/table`;

/* get the standalone build source (node_modules/xlsx/dist/xlsx.full.min.js) */
const lib = readFileSync(require.resolve("xlsx/dist/xlsx.full.min.js"), "utf8");

(async() => {
/* start browser and go to web page */
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, {waitUntil: "networkidle2"});

/* inject library */
await page.addScriptTag({content: lib});

/* this function `s5s` will be called by the script below, receiving the Base64-encoded file */
await page.exposeFunction("s5s", async(b64) => {
const workbook = XLSX.read(b64, {type: "base64" });

/* DO SOMETHING WITH workbook HERE */
});

/* generate XLSB file in webpage context and send back result */
await page.addScriptTag({content: `
/* call table_to_book on first table */
var workbook = XLSX.utils.table_to_book(document.querySelector("TABLE"));

/* generate XLSX file */
var b64 = XLSX.write(workbook, {type: "base64", bookType: "xlsb"});

/* call "s5s" hook exposed from the node process */
window.s5s(b64);
`});

/* cleanup */
await browser.close();
})();
Server-Side HTML Tables with Headless WebKit带有无头WebKit的服务器端HTML表 (click to show)

The headless demo includes a complete demo to convert HTML files to XLSB workbooks using PhantomJS. headless演示包括使用PhantomJS将HTML文件转换为XLSB工作簿的完整演示。The core idea is to add the script to the page, parse the table in the page context, generate a binary workbook and send it back for further processing:核心思想是将脚本添加到页面中,在页面上下文中解析表,生成binary工作簿并将其发送回进行进一步处理:

var XLSX = require('xlsx');
var page = require('webpage').create();

/* this code will be run in the page */
var code = [ "function(){",
/* call table_to_book on first table */
"var wb = XLSX.utils.table_to_book(document.body.getElementsByTagName('table')[0]);",

/* generate XLSB file and return binary string */
"return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
"}" ].join("");

page.open('https://sheetjs.com/demos/table', function() {
/* Load the browser script from the UNPKG CDN */
page.includeJs("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js", function() {
/* The code will return an XLSB file encoded as binary string */
var bin = page.evaluateJavaScript(code);

var workbook = XLSX.read(bin, {type: "binary"});
/* DO SOMETHING WITH workbook HERE */

phantom.exit();
});
});
NodeJS HTML Tables without a browser没有浏览器的Node.js HTML表 (click to show)

NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. Node.js不包括DOM实现,而Puppeter需要大量的Chromium构建。jsdom is a lightweight alternative:是一种轻量级替代方案:

const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string. This example reads from test.html */
const html_str = fs.readFileSync("test.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);