Skip to main content

Data Import数据导入

The "Import Tutorial" is a gentle introduction to data import and analysis.“导入教程”是对数据导入和分析的简单介绍。

Parsing Workbooks分析工作簿

API

Extract data from spreadsheet bytes从电子表格字节中提取数据

var workbook = XLSX.read(data, opts);

The read method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array or ArrayBuffer).read方法可以从存储在JS字符串、“二进制字符串”、NodeJS缓冲区或类型化数组(Uint8ArrayArrayBuffer)中的电子表格字节中提取数据。

Read spreadsheet bytes from a local file and extract data从本地文件读取电子表格字节并提取数据

var workbook = XLSX.readFile(filename, opts);

The readFile method attempts to read a spreadsheet file at the supplied path.readFile方法尝试在提供的路径上读取电子表格文件。

The second opts argument is optional. 第二个opts参数是可选的。"Parsing Options" covers the supported properties and behaviors.涵盖了支持的属性和行为。

Browsers generally do not allow reading files by specifying filename (it is a security risk), and running XLSX.readFile in the browser will throw an error.浏览器通常不允许通过指定文件名来读取文件(这是一个安全风险),在浏览器中运行XLSX.readFile会引发错误。

Deno scripts must be invoked with --allow-read to read from the filesystem.必须使用--allow-read来读取从文件系统调用Deno脚本。

Examples示例

Here are a few common scenarios (click on each subtitle to see the code).以下是一些常见的场景(点击每个副标题查看代码)。

The demos cover special deployments in more detail.演示文档更详细地介绍了特殊部署。

Example: Local File

XLSX.readFile supports reading local files in platforms like NodeJS. In other platforms like React Native, XLSX.read should be called with file data.支持在NodeJS等平台上读取本地文件。在React Native等其他平台中,应使用文件数据调用XLSX.read

In-browser processing where users drag-and-drop files or use a file element are covered in the "User Submissions" example.“用户提交”示例涵盖了用户拖放文件或使用文件元素的浏览器内处理。

readFile uses fs.readFileSync under the hood:

var XLSX = require("xlsx");

var workbook = XLSX.readFile("test.xlsx");

For Node ESM, fs must be loaded manually:对于节点ESM,必须手动加载fs

import * as fs from "fs";
import { readFile, set_fs } from "xlsx";
set_fs(fs);

const workbook = readFile("test.xlsx");

Example: User Submissions示例:用户提交

This example focuses on user-submitted files through a drag-and-drop event, HTML file input element, or network request.此示例主要关注用户通过拖放事件、HTML文件输入元素或网络请求提交的文件。

For modern websites targeting Chrome 76+, File#arrayBuffer is recommended:

Assume drop_dom_element is the DOM element that will listen for changes:假设drop_dom_element是将侦听更改的dom元素:

<div id="drop_dom_element">Drop files here</div>

The event property is e.dataTransfer. 事件属性为e.dataTransfer The code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的区别:

// XLSX is a global from the standalone script

async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);

https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.

For maximal compatibility (IE10+), the FileReader approach is recommended:

Assume drop_dom_element is the DOM element that will listen for changes:假设drop_dom_element是将侦听更改的dom元素:

<div id="drop_dom_element">Drop files here</div>

The event property is e.dataTransfer. 事件属性为e.dataTransfer The code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的区别:

function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);

The oldie demo shows an IE-compatible fallback scenario.

Example: Remote File

This example focuses on fetching files ("Ajax" in browser parlance) using APIs like XMLHttpRequest and fetch as well as third-party libraries.本示例重点介绍使用XMLHttpRequestfetch等API以及第三方库获取文件(浏览器中的“Ajax”)。

For modern websites targeting Chrome 42+, fetch is recommended:对于以Chrome 42+为目标的现代网站,建议使用fetch

// XLSX is a global from the standalone script

(async() => {
const url = "https://sheetjs.com/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
})();

For broader support, the XMLHttpRequest approach is recommended:

var url = "https://sheetjs.com/pres.xlsx";

/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";

req.onload = function(e) {
var workbook = XLSX.read(req.response);

/* DO SOMETHING WITH workbook HERE */
};

req.send();

The xhr demo includes a longer discussion and more examples.xhr演示包括更长的讨论和更多的示例。

https://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.显示了IE6+的回退方法。

Example: Readable Streams示例:可读流

The recommended approach is to buffer streams in memory and process once all of the data has been collected. A proper streaming parse is technically impossible.建议的方法是在内存中缓冲流,并在收集完所有数据后进行处理。正确的流解析在技术上是不可能的。

Technical details技术细节 (click to show)

XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.XLSX、XLSB、NUMBERS和ODS文件最终是包含二进制和XML项的ZIP文件。ZIP文件格式将目录(“中心目录末尾”记录)存储在文件末尾,因此ZIP文件的正确解析需要从末尾进行扫描。流不提供对数据的随机访问,因此唯一正确的方法是缓冲整个流。

XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". It is a container format that can hold multiple "files" and "folders". It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.Mac文件的XLS、XLR、QPW和Works 4使用“复合文件二进制格式”。它是一种容器格式,可以容纳多个“文件”和“文件夹”。它还有一个目录(“目录扇区”),但这些可以放在文件中的任何位置!唯一正确的方法是缓冲足够的流以找到完整的目录,但在针对各种版本的Excel和其他工具生成的真实世界文件进行测试时,增加的复杂性几乎没有好处。

When dealing with ReadableStream, the easiest approach is to buffer the stream and process the whole thing at the end:在处理ReadableStream时,最简单的方法是缓冲流并在最后处理整个过程:

// XLSX is a global from the standalone script

async function buffer_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}

/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));

let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}

return out;
}

const data = await buffer_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data);

More detailed examples are covered in the included demos附带的演示中包含了更详细的示例

Processing JSON and JS Data处理JSON和JS数据

JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks.JSON和JS数据往往表示单个工作表。本节将使用一些实用程序函数来生成工作簿。

Create a new Workbook创建新工作簿

var workbook = XLSX.utils.book_new();

The book_new utility function creates an empty workbook with no worksheets.book_new实用程序函数创建一个没有工作表的空工作簿。

API

Create a worksheet from an array of arrays of JS values从JS值数组创建工作表

var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);

The aoa_to_sheet utility function walks an "array of arrays" in row-major order, generating a worksheet object. aoa_to_sheet实用程序函数按行主顺序遍历“数组数组”,生成一个工作表对象。The following snippet generates a sheet with cell A1 set to the string A1, cell B1 set to B1, etc:

var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);

"Array of Arrays Input" describes the function and the optional opts argument in more detail.

Create a worksheet from an array of JS objects从JS对象数组创建工作表

var worksheet = XLSX.utils.json_to_sheet(jsa, opts);

The json_to_sheet utility function walks an array of JS objects in order, generating a worksheet object. json_to_sheet实用程序函数按顺序遍历一组JS对象,生成一个工作表对象。By default, it will generate a header row and one row per object in the array. 默认情况下,它将为数组中的每个对象生成一个标题行和一行。The optional opts argument has settings to control the column order and header output.可选的opts参数具有控制列顺序和标头输出的设置。

"Array of Objects Input" describes the function and the optional opts argument in more detail.更详细地描述了函数和可选的opts参数。

Examples

"Export Tutorial" contains a detailed example of fetching data from a JSON Endpoint and generating a workbook.包含从JSON端点获取数据并生成工作簿的详细示例。

x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser.是一个交互式数据网格,用于在web浏览器中预览和修改结构化数据。

"TensorFlow.js" covers strategies for creating worksheets from ML library exports (datasets stored in Typed Arrays).涵盖了从ML库导出(存储在类型数组中的数据集)创建工作表的策略。

Records from a database query (SQL or no-SQL)数据库查询中的记录(SQL或无SQL) (click to show)

The data demo includes examples of working with databases and query results.data演示包括使用数据库和查询结果的示例。

Processing HTML Tables处理HTML表

API

Create a worksheet by scraping an HTML TABLE in the page通过在页面中抓取HTML TABLE创建工作表

var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);

The table_to_sheet utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. The opts argument is optional. "HTML Table Input" describes the function in more detail.

Create a workbook by scraping an HTML TABLE in the page通过在页面中抓取HTML TABLE创建工作簿

var workbook = XLSX.utils.table_to_book(dom_element, opts);

The table_to_book utility function follows the same logic as table_to_sheet. After generating a worksheet, it creates a blank workbook and appends the spreadsheet.

The options argument supports the same options as table_to_sheet, with the addition of a sheet property to control the worksheet name. If the property is missing or no options are specified, the default name Sheet1 is used.

Examples

The Headless Demo includes examples of server-side spreadsheet generation from HTML TABLE elements using headless Chromium ("Puppeteer") and other browsers ("Playwright")

Here are a few common scenarios (click on each subtitle to see the code):以下是一些常见的场景(点击每个副标题查看代码):

HTML TABLE element in a webpage (click to show)
<!-- include the standalone script and shim -->
<script src="https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js"></script>

<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>

<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));

/* DO SOMETHING WITH workbook HERE */
XLSX.writeFile(workbook, "SheetJSHTMLExport.xlsx");
</script>

Multiple tables on a web page can be converted to individual worksheets:网页上的多个表可以转换为单个工作表:

/* create new workbook */
var workbook = XLSX.utils.book_new();

/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");

/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");

/* workbook now has 2 worksheets */

Alternatively, the HTML code can be extracted and parsed:或者,可以提取和解析HTML代码:

var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)

The chrome demo shows a complete example and details the required permissions and other settings.

In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:在扩展中,建议在内容脚本中生成工作簿,并将对象传递回扩展:

/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table")
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
NodeJS HTML Tables without a browser没有浏览器的NodeJS HTML表 (click to show)

NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. NodeJS不包括DOM实现,Puppeter需要大量的Chromium构建。The "Synthetic DOM" demo includes examples for NodeJS.“Synthetic DOM”演示包括NodeJS的示例。