Skip to main content

Import Tutorial导入教程

Many government agencies distribute official data and statistics in workbooks. SheetJS libraries help translate these files to useful information.许多政府机构以工作簿的形式分发官方数据和统计数据。SheetJS库有助于将这些文件转换为有用的信息。

The goal of this example is to process Federal Student Aid Portfolio data from a XLS worksheet. We will download and parse a workbook from the US Department of Education. 本示例的目标是处理XLS工作表中的联邦学生资助组合数据。我们将从美国教育部下载并解析一份工作簿。Once the raw data is parsed, we will extract the total outstanding dollar amount and display the data in a table.一旦解析了原始数据,我们将提取未偿美元总额,并将数据显示在表中。

The "Live Demo" section includes a working demo in this page!“现场演示”部分包括此页面中的工作演示! "Run the Demo Locally"“在本地运行演示” shows how to run the workflow in iOS / Android apps, desktop apps, NodeJS scripts and other environments.展示了如何在iOS/Android应用程序、桌面应用程序、NodeJS脚本和其他环境中运行工作流。

The following sequence diagram shows the process:以下序列图显示了该过程:

UserPageSiteparse fileprocess datagenerate tableload sitefetch fileraw fileshow tableUserPageSite

Download File下载文件

The raw data is available in a XLS workbook1. 原始数据在XLS工作簿1中可用。It has been mirrored at https://sheetjs.com/data/PortfolioSummary.xls它已镜像在https://sheetjs.com/data/PortfolioSummary.xls

This official dataset is distributed in XLS workbooks.该官方数据集分布在XLS工作簿中。

SheetJS supports a number of legacy and modern formats, ensuring that historical data is not lost in the sands of time.SheetJS支持多种传统和现代格式,确保历史数据不会在时间的沙滩上丢失。

Downloading the file is straightforward with fetch:下载文件只需使用fetch即可:

const url = "https://sheetjs.com/data/PortfolioSummary.xls";
const file = await (await fetch(url)).arrayBuffer();
Code Explanation代码说明 (click to show)

fetch is a low-level API for downloading data from an endpoint. It separates the network step from the response parsing step.是用于从端点下载数据的低级API。它将网络步骤与响应解析步骤分开。

Network Step网络步骤

fetch(url) returns a Promise representing the network request. 返回表示网络请求的PromiseThe browser will attempt to download data from the URL. 浏览器将尝试从URL下载数据。If the network request succeeded, the Promise will "return" with a Response object.如果网络请求成功,Promise将“返回”一个Response对象。

Using modern syntax, inside an async function, code should await the fetch:使用现代语法,在async函数内部,代码应该await获取:

const response = await fetch(url);

Checking Status Code检查状态代码

If the file is not available, the fetch will still succeed.如果该文件不可用,则fetch仍将成功。

The status code, stored in the status property of the Response object, is a standard HTTP status code number. Code should check the result.状态代码存储在Response对象的status属性中,是一个标准的HTTP状态代码编号。代码应该检查结果。

Typically servers will return status 404 "File not Found" if the file is not available. A successful request should have status 200 "OK".如果文件不可用,服务器通常会返回状态404“找不到文件”。成功的请求应具有状态200“OK”。

Extracting Data提取数据

Response#arrayBuffer will pull the raw bytes into an ArrayBuffer, an object which can represent the file data. 将原始字节拉入ArrayBuffer,该对象可以表示文件数据。Like fetch, the arrayBuffer method returns a Promise that must be await-ed:fetch一样,arrayBuffer方法返回一个必须等待的Promise

const file = await response.arrayBuffer();

The Response object has other useful methods. Response对象还有其他有用的方法。Response#json will parse the data with JSON.parse, suitable for data from an API endpoint.将使用JSON.parse解析数据,适用于来自API端点的数据。

Production Use生产用途

Functions can test each part independently and report different errors:功能可以独立测试每个部件,并报告不同的错误:

async function get_file_from_endpoint(url) {
/* perform network request执行网络请求 */
let response;
try {
response = await fetch(url);
} catch(e) {
/* network error网络错误 */
throw new Error(`Network Error: ${e.message}`);
}

/* check status code检查状态代码 */
if(response.status == 404) {
/* server 404 error服务器404错误 -- file not found */
throw new Error("File not found");
}
if(response.status != 200) {
/* for most servers, a successful response will have status 200对于大多数服务器,成功响应的状态为200 */
throw new Error(`Server status ${response.status}: ${response.statusText}`);
}

/* get data */
let ab;
try {
ab = await response.arrayBuffer();
} catch(e) {
/* data error */
throw new Error(`Data Error: ${e.message}`);
}

return ab;
}

The file data is stored in an ArrayBuffer.文件数据存储在ArrayBuffer中。

Parse File分析文件

With the file data in hand, XLSX.read2 parses the workbook:有了文件数据,XLSX.read2将解析工作簿:

const workbook = XLSX.read(file);

The workbook object follows the "Common Spreadsheet Format"3, an in-memory format for representing workbooks, worksheets, cells, and spreadsheet features.workbook对象遵循“通用电子表格格式”3,这是一种内存格式,用于表示工作簿、工作表、单元格和电子表格功能。

Explore Dataset浏览数据集

Spreadsheets in the wild use many different inconsistent conventions.电子表格在野外使用许多不同的不一致约定。

To determine how to process the data, it is best to inspect the file first.要确定如何处理数据,最好先检查文件。

List Sheet Names列出数据表名称

As explained in the "Workbook Object"4 section, the SheetNames property is a ordered list of the sheet names in the workbook.如“工作簿对象”4部分所述,SheetNames属性是工作簿中工作表名称的有序列表。

The following live code block displays an ordered list of the sheet names:以下活动代码块显示数据表名称的有序列表:

Result
Loading...
Live EditorLive编辑器

Inspect Worksheet Data检查工作表数据

The Sheets property of the workbook object5 is an object whose keys are sheet names and whose values are sheet objects. 工作簿对象5Sheets属性是一个对象,其键是工作表名称,其值是工作表对象。For example, the first worksheet is pulled by indexing SheetNames and using the name to index Sheets:例如,第一个工作表是通过对SheetNames进行索引并使用该名称对Sheets进行索引来提取的:

var first_sheet = workbook.Sheets[workbook.SheetNames[0]];

The actual worksheet object can be inspected directly6, but it is strongly recommended to use utility functions to present JS-friendly data structures.实际的工作表对象可以直接检查6,但强烈建议使用实用程序函数来呈现JS友好的数据结构。

Preview HTML预览HTML

The sheet_to_html utility function7 generates an HTML table from worksheet objects. sheet_to_html实用程序函数7从工作表对象生成html表。The following live example shows the first 20 rows of data in a table:下面的实际示例显示了表中的前20行数据:

Live example现场示例 (click to show)

SheetJS CE primarily focuses on data processing.SheetJS CE主要专注于数据处理。

SheetJS Pro supports reading cell styles from files and generating styled HTML tables with colors, fonts, alignment and rich text.支持从文件中读取单元格样式,并生成具有颜色、字体、对齐方式和富文本的样式HTML表。

Result
Loading...
Live EditorLive编辑器

The key points from looking at the table are:查看表格的要点如下:

  • The data starts on row 7数据从第7行开始
  • Rows 5 and 6 are the header rows, with merged cells for common titles第5行和第6行是标题行,具有用于公共标题的合并单元格
  • For yearly data (2007-2012), columns A and B are merged对于年度数据(2007-2012),A列和B列合并
  • For quarterly data (2013Q1 - 2023Q2), column A stores the year. Cells may be merged vertically to span 4 quarters对于季度数据(2013Q1-2023Q2),A列存储年份。单元格可以垂直合并为四个四分之一

Extract Data提取数据

Extract Raw Data提取原始数据

XLSX.utils.sheet_to_json8 generates arrays of data from worksheet objects.从工作表对象生成数据数组。

For a complex layout like this, it is easiest to generate an "array of arrays" where each row is an array of cell values. The screenshot shows rows 5-8:对于这样的复杂布局,最容易生成“数组数组”,其中每行都是单元格值的数组。屏幕截图显示第5-8行:

Rows 5-8

In the array of arrays, row 5 has a number of gaps corresponding to empty cells and cells that are covered in the merge ranges:在数组的数组中,行5具有与空单元格和合并范围中覆盖的单元格相对应的多个间隙:

// Row 5 -- the gaps correspond to cells with no content间隙对应于没有内容的单元格
[ , , "Direct Loans", , "Federal Family Education Loans (FFEL)", , "Perkins Loans", , "Total1" ]

Row 7 includes the data for FY2007:第7行包括2007财年的数据:

// Row 7 -- column B is covered by the merge合并包含B列
[ 2007, , 106.8, 7, 401.9, 22.6, 8.2, 2.8, 516, 28.3 ]

XLSX.utils.sheet_to_json will generate an array of arrays if the option header: 1 is specified如果指定了选项header: 1,将生成一个数组数组9:

const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header: 1});

Fill Merged Blocks填充合并的块

Cells A13:A16 are merged:单元格A13:A16合并:

Rows 13-16

The merged data only applies to the top-left cell (A13). 合并后的数据仅适用于左上角的单元格(A13)。The array of arrays will have holes in cells A14:A16 (written as null):数组的数组将在单元A14:A16(写为null)中具有孔:

// Row 13
[2013, "Q1", 508.7, 23.4, 444.9, 22.1, 8.2, 3, 961.9, 38.7]

// Row 14
[null, "Q2", 553, 24.1, 437, 21.6, 8.3, 3, 998.6, 38.9]

// Row 15
[null, "Q3", 569.2, 24.3, 429.5, 21.2, 8.2, 2.9, 1006.8, 38.7]

// Row 16
[null, "Q4", 609.1, 25.6, 423, 20.9, 8.1, 2.9, 1040.2, 39.6]
Live example (click to show)
Result
Loading...
Live EditorLive编辑器

The worksheet !merges property10 includes every merge range in the sheet. 工作表!merges属性10包括工作表中的每个合并范围。It is possible to loop through every merge block and fill cells, but in this case it is easier to post-process the raw data:可以循环遍历每个合并块并填充单元格,但在这种情况下,后处理原始数据更容易:

let last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));

JavaScript code can be extremely concise. The "Code Explanation" blocks explain the code in more detail.JavaScript代码可以非常简洁。“代码解释”块更详细地解释代码。

Code Explanation代码说明 (click to show)

Analyzing every row in the dataset分析数据集中的每一行

Array#forEach takes a function and calls it for every element in the array. Any modifications to objects affect the objects in the original array.获取一个函数并为数组中的每个元素调用它。对对象的任何修改都会影响原始数组中的对象。

For example, this loop will print out the first column in the arrays:例如,此循环将打印出数组中的第一列:

raw_data.forEach(r => {
console.log(r);
});

Tracking the last value seen in a column跟踪列中最后一个值

When looping over the array, Array#forEach can modify variables outside of the function body. For example, the following loop keeps track of the last value:在数组上循环时,array#forEach可以修改函数体之外的变量。例如,以下循环将跟踪最后一个值:

let last_value = null;
raw_data.forEach(r => {
if(r[0] != null) last_value = r[0];
});

Filling in data填写数据

Array#forEach can mutate objects. The following code will assign the last value to the first column if it is not specified:可以使对象发生变异。如果未指定最后一个值,则以下代码会将该值分配给第一列:

let last_value = null;
raw_data.forEach(r => {
if(r[0] != null) last_value = r[0];
else if(r[0] == null && last_value != null) r[0] = last_value;
});

Simplifying the code简化代码

When r[0] == null and last_value == null, assigning r[0] = last_value will not affect the result in the actual data rows:r[0] == nulllast_value == null时,分配r[0] = last_values不会影响实际数据行中的结果:

let last_value = null;
raw_data.forEach(r => {
if(r[0] != null) last_value = r[0];
else if(r[0] == null) r[0] = last_value;
});

For simple data rows, either r[0] == null or r[0] != null, so the if block can be rewritten as a ternary expression:对于简单数据行,r[0]==nullr[0]!=null,因此if块可以重写为三元表达式:

let last_value = null;
raw_data.forEach(r => {
(r[0] != null) ? (last_value = r[0]) : (r[0] = last_value);
});

Observing that r[0] must equal last_value, the inner statement can be rewritten to compute the final value and assign to both variables:注意到r[0]必须等于last_value,可以重写内部语句来计算最终值并分配给这两个变量:

let last_value = null;
raw_data.forEach(r => {
last_value = r[0] = (r[0] != null ? r[0] : last_value);
});

It is tempting to take advantage of implicit logical rules:利用隐含的逻辑规则是很诱人的:

let last_value = null;
raw_data.forEach(r => {
last_value = r[0] = (r[0] || last_value);
});

This is strongly discouraged since the value 0 is false. 强烈建议这样做,因为值0falseThe explicit null test distinguishes null and undefined from 0显式null测试将nullundefined0区分开来

After post-processing, the rows now have proper year fields:经过后处理后,这些行现在具有适当的年份字段:

// Row 13
[2013, "Q1", 508.7, 23.4, 444.9, 22.1, 8.2, 3, 961.9, 38.7]

// Row 14
[2013, "Q2", 553, 24.1, 437, 21.6, 8.3, 3, 998.6, 38.9]

// Row 15
[2013, "Q3", 569.2, 24.3, 429.5, 21.2, 8.2, 2.9, 1006.8, 38.7]

// Row 16
[2013, "Q4", 609.1, 25.6, 423, 20.9, 8.1, 2.9, 1040.2, 39.6]
Live example (click to show)
Result
Loading...
Live EditorLive编辑器

Select Data Rows选择数据行

At this point, every data row will have the year in column A. 此时,每个数据行的A列都将包含年份。Since this year is between 2007 and 2023, Array#filter can be used to select the rows:由于今年介于2007年和2023年之间,因此可以使用Array#filter来选择行:

const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2023);
Live example (click to show)
Result
Loading...
Live Editor

Generate Row Objects生成行对象

Looking at the headers:查看标题:

Rows 5-8

The desired data is in column I. The column index can be calculated using XLSX.utils.decode_col11.所需的数据在列I中。可以使用XLSX.utils.decode_col11计算列索引。

Column Index calculation (click to show)
Result
Loading...
Live Editor

The desired columns are:

ColumnDescription描述Property in Object对象中的属性
A / 0Fiscal Year财政年度FY
B / 1Fiscal Quarter (if applicable)财政季度(如适用)FQ
I / 8Total Dollars Outstanding未偿总美元total

An Array#map over the data can generate the desired row objects:数据上的Array#map可以生成所需的行对象:

const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));

This will generate an array of row objects. Each row object will look like the following row:这将生成一个行对象数组。每一行对象看起来都像下面的一行:

// 2016 Q1 - $1220.3 (billion)
{ "FY": 2016, "FQ": "Q1", "total": 1220.3 }
Live example现场例子 (click to show)
Result
Loading...
Live EditorLive编辑器

Present Data现有数据

At this point, objects is an array of objects.在这一点上,objects是一个对象数组。

ReactJS

The live demos in this example use ReactJS. In ReactJS, arrays of objects are best presented in simple HTML tables12:本例中的实时演示使用ReactJS。在ReactJS中,对象数组最好以简单的HTML表12表示:

<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody>
{objects.map((o,R) => ( <tr key={R}>
<td>{o.FY}</td>
<td>{o.FQ}</td>
<td>{o.total}</td>
</tr>))}
</tbody>
</table>

Vanilla JS

https://sheetjs.com/sl.html is a hosted version of this demo.是此演示的托管版本。

Without a framework, HTML table row elements can be programmatically created with document.createElement and added to the table body element. 在没有框架的情况下,可以使用document.createElement以编程方式创建HTML表行元素,并将其添加到表体元素中。For example, if the page has a stub table:例如,如果页面有一个存根表:

<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody id="tbody"></tbody>
</table>

TR elements can be added to the table body using appendChild:元素可以使用appendChild添加到表体:

/* add rows to table body */
objects.forEach(o => {
const row = document.createElement("TR");
row.innerHTML = `<td>${o.FY}</td><td>${o.FQ||""}</td><td>${o.total}</td>`;
tbody.appendChild(row);
});

Command-Line Tools命令行工具

In the command line, there are ways to display data in a table:在命令行中,有几种方法可以在表中显示数据:

 FY    FQ    Total
-- -- -----
2007 516
2013 Q1 961.9

For data pipelines, tab-separated rows are strongly recommended:对于数据管道,强烈建议使用制表符分隔的行:

  /* print header row*/
console.log(`FY\tFQ\tTotal`);
/* print tab-separated values */
objects.forEach(o => {
console.log(`${o.FY}\t${o.FQ||""}\t${o.total}`);
});

Live Demo现场演示

This demo runs in the web browser! It should automatically fetch the data file and display a table.此演示在web浏览器中运行!它应该自动获取数据文件并显示一个表。

This example includes a row count that can be increased or decreased此示例包括可以增加或减少的行计数

Result
Loading...
Live Editor

Run the Demo Locally在本地运行演示

Save the following script to SheetJSStandaloneDemo.html:将以下脚本保存到SheetJSStandaloneDemo.html

SheetJSStandaloneDemo.html
<body>
<table>
<thead><tr><th>Fiscal Year</th><th>Quarter</th><th>Total (in $B)</th></tr></thead>
<tbody id="tbody"></tbody>
</table>
<script src="https://cdn.sheetjs.com/xlsx-0.20.1/package/dist/xlsx.full.min.js"></script>
<script>
(async() => {
/* parse workbook */
const url = "https://sheetjs.com/data/PortfolioSummary.xls";
const workbook = XLSX.read(await (await fetch(url)).arrayBuffer());

/* get first worksheet */
const worksheet = workbook.Sheets[workbook.SheetNames[0]];
const raw_data = XLSX.utils.sheet_to_json(worksheet, {header:1});

/* fill years */
var last_year = 0;
raw_data.forEach(r => last_year = r[0] = (r[0] != null ? r[0] : last_year));

/* select data rows */
const rows = raw_data.filter(r => r[0] >= 2007 && r[0] <= 2023);

/* generate row objects */
const objects = rows.map(r => ({FY: r[0], FQ: r[1], total: r[8]}));

/* add rows to table body */
objects.forEach(o => {
const row = document.createElement("TR");
row.innerHTML = `<td>${o.FY}</td><td>${o.FQ||""}</td><td>${o.total}</td>`;
tbody.appendChild(row);
});
})();
</script>
</body>

After saving the file, run a local web server in the folder with the HTML file. For example, if NodeJS is installed:保存文件后,在包含HTML文件的文件夹中运行本地web服务器。例如,如果安装了NodeJS:

npx http-server .

The server process will display a URL (typically http://127.0.0.1:8080). 服务器进程将显示一个URL(通常http://127.0.0.1:8080)。Open http://127.0.0.1:8080/SheetJSStandaloneDemo.html in your browser.在您的浏览器中打开http://127.0.0.1:8080/SheetJSStandaloneDemo.html


  1. The dataset URL has changed many times over the years. 这些年来,数据集URL发生了多次更改。The current location for the CC0-licensed dataset can be found by searching for "National Student Loan Data System" on data.gov. CC0许可数据集的当前位置可以通过在data.gov上搜索“国家学生贷款数据系统”找到。PortfolioSummary.xls is the file name within the dataset.是数据集中的文件名。
  2. See read in "Reading Files"
  3. See "SheetJS Data Model"
  4. See "Workbook Object"
  5. See "Workbook Object"
  6. See "Sheet Objects"
  7. See sheet_to_html in "Utilities"
  8. See sheet_to_json in "Utilities"
  9. See sheet_to_json in "Utilities"
  10. See !merges in "Sheet Objects"
  11. See "Column Names" in "Addresses and Ranges"
  12. See "Array of Objects" in "ReactJS"
  13. See "Running on Device" in the React Native documentation for more details.