Data Import数据导入
Parsing Workbooks解析工作簿
API
Extract data from spreadsheet bytes从电子表格字节中提取数据
var workbook = XLSX.read(data, opts);
The read
method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array
or ArrayBuffer
).read
方法可以从存储在JS字符串、“二进制字符串”、Node.js缓冲区或类型化数组(Uint8Array
或ArrayBuffer
)中的电子表格字节中提取数据。
Read spreadsheet bytes from a local file and extract data从本地文件读取电子表格字节并提取数据
var workbook = XLSX.readFile(filename, opts);
The readFile
method attempts to read a spreadsheet file at the supplied path.readFile
方法尝试在提供的路径上读取电子表格文件。
The second 第二个opts
argument is optional. opts
参数是可选的。"Parsing Options"“解析选项” covers the supported properties and behaviors.涵盖支持的属性和行为。
warning警告
Browsers generally do not allow reading files by specifying filename (it is a security risk), and running 浏览器通常不允许通过指定文件名来读取文件(这有安全风险),在浏览器中运行XLSX.readFile
in the browser will throw an error.XLSX.readFile
会引发错误。
Deno scripts must be invoked with 调用Deno脚本时必须使用--allow-read
to read from the filesystem.--allow-read
以从文件流中读取。
Examples示例
Here are a few common scenarios (click on each subtitle to see the code).以下是一些常见的场景(单击每个子标题以查看代码)。
The demos cover special deployments in more detail.演示更详细地介绍了特殊部署。
Example: Local File示例:本地文件
XLSX.readFile
supports reading local files in platforms like NodeJS. 支持在Node.js等平台中读取本地文件。In other platforms like React Native, 在其他平台(如React Native)中,应使用文件数据调用XLSX.read
should be called with file data.XLSX.read
。
In-browser processing where users drag-and-drop files or use a file element are covered in the "User Submissions" example.“用户提交”示例介绍了在浏览器处理中,用户拖放文件或使用文件元素。
- NodeJS
- Deno
- Electron
- React Native
- Photoshop
readFile
uses 在引擎盖下使用fs.readFileSync
under the hood:fs.readFileSync
:
var XLSX = require("xlsx");
var workbook = XLSX.readFile("test.xlsx");
For Node ESM, 对于节点ESM,必须手动加载fs
must be loaded manually:fs
:
import * as fs from "fs";
import { readFile, set_fs } from "xlsx/xlsx.mjs";
set_fs(fs);
const workbook = readFile("test.xlsx");
readFile
uses 在引擎盖下使用Deno.readFileSync
under the hood:Deno.readFileSync
:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.18.9/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.18.9/package/xlsx.mjs';
const workbook = XLSX.readFile("test.xlsx");
Applications reading files must be invoked with the 读取文件的应用程序必须使用--allow-read
flag. --allow-read
标志调用。The deno
demo has more examplesdeno
演示有更多示例
readFile
can be used in the renderer process:可以在渲染器过程中使用:
/* From the renderer process */
var XLSX = require("xlsx");
var workbook = XLSX.readFile(path);
Electron APIs have changed over time. 随着时间的推移,Electron API发生了变化。The electron
demo shows a complete example and details the required version-specific settings.electron
演示展示了一个完整的示例,并详细介绍了所需的特定于版本的设置。
caution
React Native does not provide a way to read files from the filesystem. React Native不提供从文件系统读取文件的方法。A separate third-party library must be used.必须使用单独的第三方库。
Since React Native internals change between releases, libraries may only work with specific versions of React Native. 由于React本机内部在不同版本之间会发生变化,因此库只能与React本机的特定版本一起工作。Project documentation should be consulted before picking a library.在选择库之前,应查阅项目文档。
The react
demo includes a sample React Native app.react
演示包括一个示例react Native应用程序。
The following libraries have been tested:以下库已经过测试:
The base64
encoding returns strings compatible with the base64
type:base64
编码返回与base64
类型兼容的字符串:
import * as XLSX from "xlsx";
import { FileSystem } from "react-native-file-access";
const b64 = await FileSystem.readFile(path, "base64");
/* b64 is a base64 string */
const workbook = XLSX.read(b64, {type: "base64"});
The ascii
encoding returns binary strings compatible with the binary
type:ascii
编码返回与binary
类型兼容的二进制字符串:
import * as XLSX from "xlsx";
import { readFile } from "react-native-fs";
const bstr = await readFile(path, "ascii");
/* bstr is a binary string */
const workbook = XLSX.read(bstr, {type: "binary"});
readFile
wraps the 在Photoshop和其他ExtendScript目标中包装File
logic in Photoshop and other ExtendScript targets. File
逻辑。The specified path should be an absolute path:指定的路径应为绝对路径:
#include "xlsx.extendscript.js"
/* Read test.xlsx from the Documents folder */
var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
For user-configurable paths, 对于用户可配置的路径,openDialog
can show a file picker:openDialog
可以显示文件选择器:
#include "xlsx.extendscript.js"
/* Ask user to select path */
var thisFile = File.openDialog("Select a spreadsheet");
var workbook = XLSX.readFile(thisFile.absoluteURI);
The extendscript
demo includes a more complex example.extendscript
演示包括一个更复杂的示例。
Example: User Submissions示例:用户提交
This example focuses on user-submitted files through a drag-and-drop event, HTML file input element, or network request.本例重点介绍用户通过拖放事件、HTML文件输入元素或网络请求提交的文件。
- Browser
- NodeJS
- Deno
For modern websites targeting Chrome 76+针对Chrome 76的现代网站+, ,建议使用File#arrayBuffer
is recommended:File#arrayBuffer
:
Drag and Drop拖放HTML File Input ElementHTML文件输入元素
Assume 假设drop_dom_element
is the DOM element that will listen for changes:drop_dom_element
是将侦听更改的dom元素:
<div id="drop_dom_element">Drop files here</div>
The event property is 事件属性为e.dataTransfer
. e.dataTransfer
。The code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的差异:
// XLSX is a global from the standalone script
async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);
Starting with an HTML INPUT element with 从HTMLtype="file"
:<INPUT type="file" />
元素开始:
<input type="file" id="input_dom_element">
The event property is 事件属性为e.target
. e.target
。The code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的差异:
// XLSX is a global from the standalone script
async function handleFileAsync(e) {
const file = e.target.files[0];
const data = await file.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
input_dom_element.addEventListener("change", handleFileAsync, false);
https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.演示FileReader技术。
For maximal compatibility (IE10+)最大兼容性(IE10+), the ,建议使用FileReader
approach is recommended:FileReader
方法:
Drag and Drop拖放HTML File Input ElementHTML文件输入元素
Assume 假设drop_dom_element
is the DOM element that will listen for changes:drop_dom_element
是将侦听更改的dom元素:
<div id="drop_dom_element">Drop files here</div>
The event property is 事件属性为e.dataTransfer
. e.dataTransfer
。The code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的差异:
function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);
Starting with an HTML INPUT element with 从HTMLtype="file"
:<INPUT type="file" >
元素开始:
<input type="file" id="input_dom_element">
The event property is 事件属性为e.target
. e.target
。The code snippet highlights the difference between the drag-and-drop example and the file input example:代码片段突出显示了拖放示例和文件输入示例之间的差异:
function handleFile(e) {
var file = e.target.files[0];
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(e.target.result);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(file);
}
input_dom_element.addEventListener("change", handleFile, false);
The oldie
demo shows an IE-compatible fallback scenario.oldie
演示展示了一个与IE兼容的回退场景。
read
can accept a NodeJS buffer. 可以接受Node.js缓冲区。readFile
can read files generated by a HTTP POST request body parser like formidable:可以读取HTTP POST请求主体解析器生成的文件,如formidable:
const XLSX = require("xlsx");
const http = require("http");
const formidable = require("formidable");
const server = http.createServer((req, res) => {
const form = new formidable.IncomingForm();
form.parse(req, (err, fields, files) => {
/* grab the first file */
const f = Object.entries(files)[0][1];
const path = f.filepath;
const workbook = XLSX.readFile(path);
/* DO SOMETHING WITH workbook HERE */
});
}).listen(process.env.PORT || 7262);
The server
demo has more advanced examples.server
演示有更高级的示例。
Drash is a framework for Deno's HTTP server. 是Deno的HTTP服务器的框架。In a 在POST
request handler, the body parser can pull file data into a Uint8Array
:POST
请求处理程序中,主体解析器可以将文件数据拉入Uint8Array
:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.18.9/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.18.9/package/xlsx.mjs';
/* load the codepage support library for extended support with older formats */
import * as cptable from 'https://cdn.sheetjs.com/xlsx-0.18.9/package/dist/cpexcel.full.mjs';
XLSX.set_cptable(cptable);
import * as Drash from "https://deno.land/x/drash@v2.5.4/mod.ts";
class SheetResource extends Drash.Resource {
public paths = ["/"];
public POST(request: Drash.Request, response: Drash.Response) {
const file = request.bodyParam<Drash.Types.BodyFile>("file");
if (!file) throw new Error("File is required!");
var wb = XLSX.read(file.content, {type: "buffer"});
var html = XLSX.utils.sheet_to_html(wb.Sheets[wb.SheetNames[0]]);
return response.html(html);
}
}
const server = new Drash.Server({ hostname: "", port: 7262, protocol: "http",
resources: [
SheetResource,
],
});
server.run();
note
Deno must be run with the 必须使用--allow-net
flag to enable network requests:--allow-net
标志运行Deno以启用网络请求:
$ deno run --allow-net test-server.ts
To test, submit a POST request to http://localhost:7262 including a file:要进行测试,请向提交POST请求http://localhost:7262包括文件:
curl -X POST -F "file=@test.xlsx" http://localhost:7262/
Example: Remote File示例:远程文件
This example focuses on fetching files ("Ajax" in browser parlance) using APIs like 本例重点介绍使用XMLHttpRequest
and fetch
as well as third-party libraries.XMLHttpRequest
和fetch
等API以及第三方库获取文件(用浏览器的说法是“Ajax”)。
- Browser
- NodeJS
- Deno
- Electron
For modern websites targeting Chrome 42+, 对于以Chrome 42+为目标的现代网站,建议使用fetch
is recommended:fetch
:
// XLSX is a global from the standalone script
(async() => {
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
})();
For broader support, the 为了获得更广泛的支持,建议使用XMLHttpRequest
approach is recommended:XMLHttpRequest
方法:
var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function(e) {
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
};
req.send();
The xhr
demo includes a longer discussion and more examples.xhr
演示包括更长的讨论和更多的示例。
http://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.显示了IE6+的回退方法。
Node 17.5 and 18.0 have native support for fetch:Node 17.5和18.0原生支持fetch
:
const XLSX = require("xlsx");
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
For broader compatibility, third-party modules are recommended.为了更广泛的兼容性,建议使用第三方模块。
request requires a 需要null
encoding to yield Buffers:null
编码以生成缓冲区:
var XLSX = require("xlsx");
var request = require("request");
var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
request({url: url, encoding: null}, function(err, resp, body) {
var workbook = XLSX.read(body);
/* DO SOMETHING WITH workbook HERE */
});
axios works the same way in browser and in NodeJS:在浏览器和Node.js中的工作方式相同:
const XLSX = require("xlsx");
const axios = require("axios");
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
(async() => {
const res = await axios.get(url, {responseType: "arraybuffer"});
/* res.data is a Buffer */
const workbook = XLSX.read(res.data);
/* DO SOMETHING WITH workbook HERE */
})();
Deno has native support for fetch.Deno原生支持fetch
。
// @deno-types="https://cdn.sheetjs.com/xlsx-0.18.9/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.18.9/package/xlsx.mjs';
/* load the codepage support library for extended support with older formats */
import * as cptable from 'https://cdn.sheetjs.com/xlsx-0.18.9/package/dist/cpexcel.full.mjs';
XLSX.set_cptable(cptable);
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
note
Deno must be run with the 必须使用--allow-net
flag to enable network requests:--allow-net
标志运行Deno以启用网络请求:
$ deno run --allow-net test-fetch.ts
The 主进程中的net
module in the main process can make HTTP/HTTPS requests to external resources. net
模块可以向外部资源发出HTTP/HTTPS请求。Responses should be manually concatenated using 应使用Buffer.concat
:Buffer.concat
手动连接响应:
const XLSX = require("xlsx");
const { net } = require("electron");
const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
const req = net.request(url);
req.on("response", (res) => {
const bufs = []; // this array will collect all of the buffers
res.on("data", (chunk) => { bufs.push(chunk); });
res.on("end", () => {
const workbook = XLSX.read(Buffer.concat(bufs));
/* DO SOMETHING WITH workbook HERE */
});
});
req.end();
Example: Readable Streams示例:可读流
caution小心
The recommended approach is to buffer streams in memory and process once all of the data has been collected. 建议的方法是在内存中缓冲流,并在收集完所有数据后进行处理。A proper streaming parse is technically impossible.正确的流解析在技术上是不可能的。
Technical details技术细节 (click to show)
XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. XLSX、XLSB、NUMBERS和ODS文件最终是包含二进制和XML条目的ZIP文件。The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. ZIP文件格式将目录(“中央目录结束”记录)存储在文件的末尾,因此正确解析ZIP文件需要从末尾进行扫描。Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.流不提供对数据的随机访问,因此唯一正确的方法是缓冲整个流。
XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". 用于Mac文件的XLS、XLR、QPW和Works 4使用“复合文件二进制格式”。It is a container format that can hold multiple "files" and "folders". 它是一种容器格式,可以容纳多个“文件”和“文件夹”。It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! 它还有一个目录(“目录扇区”),但可以放在文件中的任何位置! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.唯一正确的方法是缓冲足够的流以找到完整的目录,但在测试由各种版本的Excel和其他工具生成的真实文件时,增加的复杂性几乎没有好处。
- Browser
- NodeJS
When dealing with 在处理ReadableStream
, the easiest approach is to buffer the stream and process the whole thing at the end:ReadableStream
时,最简单的方法是缓冲流并在最后处理整个过程:
// XLSX is a global from the standalone script
async function process_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}
/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}
return out;
}
const data = await process_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data, {type: 'array'});
When dealing with Readable Streams, the easiest approach is to buffer the stream and process the whole thing at the end:在处理可读流时,最简单的方法是缓冲流并在最后处理整个过程:
var fs = require("fs");
var XLSX = require("xlsx");
function process_RS(stream, cb) {
var buffers = [];
stream.on("data", function(data) { buffers.push(data); });
stream.on("end", function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer, {type:"buffer"});
/* DO SOMETHING WITH workbook IN THE CALLBACK */
cb(workbook);
});
}
More detailed examples are covered in the included demos包含的演示中包含了更详细的示例
Processing JSON and JS Data处理JSON和JS数据
JSON and JS data tend to represent single worksheets. JSON和JS数据往往表示单个工作表。 This section will use a few utility functions to generate workbooks.本节将使用一些实用程序函数来生成工作簿。
Create a new Workbook创建新工作簿
var workbook = XLSX.utils.book_new();
The book_new
utility function creates an empty workbook with no worksheets.book_new
实用工具函数创建一个没有工作表的空工作簿。
Spreadsheet software generally require at least one worksheet and enforce the requirement in the user interface. 电子表格软件通常需要至少一个工作表,并在用户界面中强制执行该要求。This library enforces the requirement at write time, throwing errors if an empty workbook is passed to write functions.该库在写入时强制执行该要求,如果将空工作簿传递给写入函数,则会引发错误。
API
Create a worksheet from an array of arrays of JS values从JS值数组创建工作表
var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
The aoa_to_sheet
utility function walks an "array of arrays" in row-major order, generating a worksheet object. aoa_to_sheet
实用程序函数按行主顺序遍历“数组数组”,生成一个工作表对象。The following snippet generates a sheet with cell 以下代码段生成一张表,其中单元格A1
set to the string A1
, cell B1
set to B1
, etc:A1
设置为字符串A1
,单元格B1
设置为B1
,等等:
var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);
"Array of Arrays Input"“数组的数组输入” describes the function and the optional 更详细地描述了函数和可选opts
argument in more detail.opts
参数。
Create a worksheet from an array of JS objects从JS对象数组创建工作表
var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
The json_to_sheet
utility function walks an array of JS objects in order, generating a worksheet object. json_to_sheet
实用程序函数按顺序遍历JS对象数组,生成一个工作表对象。By default, it will generate a header row and one row per object in the array. 默认情况下,它将在数组中为每个对象生成标题行和一行。The optional 可选opts
argument has settings to control the column order and header output.opts
参数具有控制列顺序和标题输出的设置。
"Array of Objects Input"“对象数组输入” describes the function and the optional 更详细地描述了函数和可选opts
argument in more detail.opts
参数。
Examples示例
"Complete Example" contains a detailed example "Get Data from a JSON Endpoint and Generate a Workbook"包含一个详细示例“从JSON端点获取数据并生成工作簿”
x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser. 是一种交互式数据网格,用于在web浏览器中预览和修改结构化数据。 The demo includes a sample script with the 演示包括一个带有xtos
function for converting from x-spreadsheet to a workbook. xtos
函数的示例脚本,用于将x-spreadsheet转换为工作簿。Live Demo: 现场演示:https://oss.sheetjs.com/sheetjs/x-spreadsheet
Records from a database query (SQL or no-SQL)数据库查询中的记录(SQL或无SQL) (click to show)
The database
demodatabase
演示 includes examples of working with databases and query results.包括使用数据库和查询结果的示例。
Numerical Computations with TensorFlow.jsTensorFlow.js的数值计算 (click to show)
@tensorflow/tfjs
and other libraries expect data in simple arrays, well-suited for worksheets where each column is a data vector. 其他库希望数据以简单数组形式存在,非常适合于每列都是数据向量的工作表。 That is the transpose of how most people use spreadsheets, where each row is a vector.这是大多数人使用电子表格方式的转换,其中每一行都是一个向量。
When recovering data from 从tfjs
, the returned data points are stored in a typed array. tfjs
恢复数据时,返回的数据点存储在类型化数组中。An array of arrays can be constructed with loops. 数组数组可以用循环构造。Array#unshift
can prepend a title row before the conversion:Array#unshift
可以在转换之前预加标题行:
const XLSX = require("xlsx");
const tf = require('@tensorflow/tfjs');
/* suppose xs and ys are vectors (1D tensors) -> tfarr will be a typed array */
const tfdata = tf.stack([xs, ys]).transpose();
const shape = tfdata.shape;
const tfarr = tfdata.dataSync();
/* construct the array of arrays */
const aoa = [];
for(let j = 0; j < shape[0]; ++j) {
aoa[j] = [];
for(let i = 0; i < shape[1]; ++i) aoa[j][i] = tfarr[j * shape[1] + i];
}
/* add headers to the top */
aoa.unshift(["x", "y"]);
/* generate worksheet */
const worksheet = XLSX.utils.aoa_to_sheet(aoa);
The array
demo shows a complete example.array
演示展示了一个完整的示例。
Processing HTML Tables处理HTML表
API
Create a worksheet by scraping an HTML TABLE in the page通过在页面中删除HTML表来创建工作表
var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
The table_to_sheet
utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. table_to_sheet
实用程序函数接受DOM table元素,并遍历行以生成工作表。 The opts
argument is optional. opts
参数是可选的。"HTML Table Input" describes the function in more detail.更详细地描述了该功能。
Create a workbook by scraping an HTML TABLE in the page通过在页面中删除HTML表来创建工作簿
var workbook = XLSX.utils.table_to_book(dom_element, opts);
The table_to_book
utility function follows the same logic as table_to_sheet
. table_to_book
实用程序函数遵循与table_to_sheet
相同的逻辑。After generating a worksheet, it creates a blank workbook and appends the spreadsheet.生成工作表后,它会创建一个空白工作簿并附加电子表格。
The options argument supports the same options as table_to_sheet
, with the addition of a sheet
property to control the worksheet name. opts
参数支持与table_to_sheet
相同的选项,并添加了一个sheet
属性来控制工作表名称。If the property is missing or no options are specified, the default name 如果缺少属性或未指定任何选项,则使用默认名称Sheet1
is used.Sheet1
。
Examples示例
Here are a few common scenarios (click on each subtitle to see the code):以下是一些常见场景(单击每个子标题以查看代码):
HTML TABLE element in a webpage网页中的HTML表元素 (click to show)
<!-- include the standalone script and shim. this uses the UNPKG CDN -->
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>
<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>
<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));
/* DO SOMETHING WITH workbook HERE */
</script>
Multiple tables on a web page can be converted to individual worksheets:网页上的多个表可以转换为单个工作表:
/* create new workbook */
var workbook = XLSX.utils.book_new();
/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
/* workbook now has 2 worksheets */
Alternatively, the HTML code can be extracted and parsed:或者,可以提取和解析HTML代码:
var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)
The chrome
demo shows a complete example and details the required permissions and other settings.chrome
演示展示了一个完整的示例,并详细介绍了所需的权限和其他设置。
In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:在扩展中,建议在内容脚本中生成工作簿,并将对象传递回扩展:
/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table")
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
Server-Side HTML Tables with Headless Chrome带有无头Chrome的服务器端HTML表 (click to show)
The headless
demo includes a complete demo to convert HTML files to XLSB workbooks. headless
演示包括将HTML文件转换为XLSB工作簿的完整演示。The core idea is to add the script to the page, parse the table in the page context, generate a 核心思想是将脚本添加到页面,在页面上下文中解析表,生成base64
workbook and send it back for further processing:base64
工作簿并将其发送回以进行进一步处理:
const XLSX = require("xlsx");
const { readFileSync } = require("fs"), puppeteer = require("puppeteer");
const url = `https://sheetjs.com/demos/table`;
/* get the standalone build source (node_modules/xlsx/dist/xlsx.full.min.js) */
const lib = readFileSync(require.resolve("xlsx/dist/xlsx.full.min.js"), "utf8");
(async() => {
/* start browser and go to web page */
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, {waitUntil: "networkidle2"});
/* inject library */
await page.addScriptTag({content: lib});
/* this function `s5s` will be called by the script below, receiving the Base64-encoded file */
await page.exposeFunction("s5s", async(b64) => {
const workbook = XLSX.read(b64, {type: "base64" });
/* DO SOMETHING WITH workbook HERE */
});
/* generate XLSB file in webpage context and send back result */
await page.addScriptTag({content: `
/* call table_to_book on first table */
var workbook = XLSX.utils.table_to_book(document.querySelector("TABLE"));
/* generate XLSX file */
var b64 = XLSX.write(workbook, {type: "base64", bookType: "xlsb"});
/* call "s5s" hook exposed from the node process */
window.s5s(b64);
`});
/* cleanup */
await browser.close();
})();
Server-Side HTML Tables with Headless WebKit带有无头WebKit的服务器端HTML表 (click to show)
The headless
demo includes a complete demo to convert HTML files to XLSB workbooks using PhantomJS. headless
演示包括使用PhantomJS将HTML文件转换为XLSB工作簿的完整演示。The core idea is to add the script to the page, parse the table in the page context, generate a 核心思想是将脚本添加到页面中,在页面上下文中解析表,生成binary
workbook and send it back for further processing:binary
工作簿并将其发送回进行进一步处理:
var XLSX = require('xlsx');
var page = require('webpage').create();
/* this code will be run in the page */
var code = [ "function(){",
/* call table_to_book on first table */
"var wb = XLSX.utils.table_to_book(document.body.getElementsByTagName('table')[0]);",
/* generate XLSB file and return binary string */
"return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
"}" ].join("");
page.open('https://sheetjs.com/demos/table', function() {
/* Load the browser script from the UNPKG CDN */
page.includeJs("https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js", function() {
/* The code will return an XLSB file encoded as binary string */
var bin = page.evaluateJavaScript(code);
var workbook = XLSX.read(bin, {type: "binary"});
/* DO SOMETHING WITH workbook HERE */
phantom.exit();
});
});
NodeJS HTML Tables without a browser没有浏览器的Node.js HTML表 (click to show)
NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. Node.js不包括DOM实现,而Puppeter需要大量的Chromium构建。jsdom is a lightweight alternative:是一种轻量级替代方案:
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");
/* obtain HTML string. This example reads from test.html */
const html_str = fs.readFileSync("test.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);