Skip to main content

Math and Statistics数学与统计学

Each valid number in Excel can be represented as an "IEEE754 double"1.Excel中的每个有效数字都可以表示为“IEEE754 double”1

With full support for IEEE754 doubles and singles, JavaScript is an excellent language for mathematics and statistical analysis. It has also proven to be a viable platform for machine learning.JavaScript完全支持IEEE754双精度和单精度,是数学和统计分析的优秀语言。它也被证明是一个可行的机器学习平台。

Demos演示文档

Demos for various libraries are included in separate pages:不同库的演示包含在不同的页面中:

Typed Arrays类型化数组

Modern JavaScript math and statistics libraries typically use Float64Array or Float32Array objects to efficiently store data variables.现代JavaScript数学和统计库通常使用Float64ArrayFloat32Array对象来有效地存储数据变量。

Technical details技术细节 (click to show)

Under the hood, ArrayBuffer objects represent raw binary data. 在后台,ArrayBuffer对象表示原始二进制数据。"Typed arrays" such as Float64Array and Float32Array are objects designed for efficient interpretation and mutation of ArrayBuffer data.Float64ArrayFloat32Array等“类型数组”是为有效解释和突变ArrayBuffer数据而设计的对象。

ArrayBuffer objects are roughly analogous to heap-allocated memory. Typed arrays behave like typed pointers.对象大致类似于堆分配的内存。类型化数组的行为类似于类型化游标。

JavaScript

const buf = new ArrayBuffer(16);
const dbl = new Float64Array(buf);
dbl[1] = 3.14159;
const u8 = new Uint8Array(buf);
for(let i = 0; i < 8; ++i)
console.log(u8[i+8]);

Equivalent C

void *const buf = malloc(16);
double *const dbl = (double *)buf;
dbl[1] = 3.14159;
uint8_t *const u8 = (uint8_t *)buf;
for(uint8_t i = 0; i < 8; ++i)
printf("%u\n", u8[i+8]);

Reading from Sheets从工作表中读取

Each typed array class has a from static method for converting data into a typed array. 每个类型化数组类都有一个from静态方法,用于将数据转换为类型化数组。Float64Array.from returns a double typed array (8 bytes per value) and Float32Array.from generates a float typed array (4 bytes).Float64Array.from返回一个double类型数组(每个值8个字节),Float32Array.from生成一个float类型数组(4个字节)。

const column_f32 = Float32Array.from(arr); // 4-byte floats
const column_f64 = Float64Array.from(arr); // 8-byte doubles

Values in the array will be coerced to the relevant data type. Unsupported entries will be converted to quiet NaN values.数组中的值将被强制为相关的数据类型。不支持的条目将转换为安静的NaN值。

Extracting Worksheet Data提取工作表数据

The SheetJS sheet_to_json2 method with the option header: 1 generates an array of arrays from a worksheet object. 具有选项header: 1的SheetJS sheet_to_json2方法从工作表对象生成一个数组数组。The result is in row-major order:结果按行主要顺序排列:

const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});

Categorical Variables范畴变量

Dichotomous variables are commonly represented as spreadsheet TRUE or FALSE. 二分变量通常表示为电子表格TRUEFALSEThe SheetJS sheet_to_json method will translate these values to true and false. SheetJS sheet_to_json方法将这些值转换为truefalseTyped array methods will interpret values as 1 and 0 respectively.类型化数组方法将把值分别解释为10

Polychotomous variables must be manually mapped to numeric values. For example, using the Iris dataset:多光子变量必须手动映射到数值。例如,使用Iris数据集:

Iris dataset

[
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]

Column E (class) is a polychotomous variable and must be manually translated:列E(class)是一个多分类变量,必须手动翻译:

const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});

/* index_to_class will be needed to recover the values later稍后将需要index_to_class来恢复值 */
const index_to_class = [];

/* map from class name to number从类名映射到编号 */
const class_to_index = new Map();

/* loop over the data在数据上循环 */
for(let R = 1; R < aoa.length; ++R) {
/* Column E = SheetJS row 4列E=SheetJS第4行 */
const category = aoa[R][4];
const val = class_to_index.get(category);
if(val == null) {
/* assign a new index分配新索引 */
class_to_index.set(category, index_to_class.length);
aoa[R][4] = index_to_class.length;
index_to_class.push(category);
} else aoa[R][4] = val;
}
Live Demo (click to show)

This example fetches and parses iris.xlsx. 此示例获取并解析iris.xlsxThe first worksheet is processed and the new data and mapping are printed.处理第一个工作表,并打印新的数据和映射。

Result
Loading...
Live Editor

One Variable per Column每列一个变量

It is common to store datasets where each row represents an observation and each column represents a variable:通常存储数据集,其中每行表示一个观测值,每列表示一个变量:

Iris dataset

var aoa = [
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]

An array map operation can pull data from an individual column. After mapping, a slice can remove the header label. For example, the following snippet pulls column C ("petal length") into a Float64Array:数组map操作可以从单个列中提取数据。映射后,slice可以移除标头标签。例如,以下片段将列C(“花瓣长度”)拉入Float64Array

const C = XLSX.utils.decode_col("C"); // Column "C" = SheetJS index 2
const petal_length = Float64Array.from(aoa.map(row => row[C]).slice(1));

One Variable per Row每行一个变量

Some datasets are stored in tables where each row represents a variable and each column represents an observation:一些数据集存储在表中,其中每行表示一个变量,每列表示一个观测值:

JavaScriptSpreadsheet电子表格
var aoa = [
["sepal length", 5.1, 4.9],
["sepal width", 3.5, 3],
["petal length", 1.4, 1.4],
["petal width", 0.2, 0.2],
["class", "setosa", "setosa"]
]

Single column of data

From the row-major array of arrays, each entry of the outer array is a row.从数组的行主数组来看,外部数组的每个条目都是一行。

Many sheets include header columns. 许多图纸都包含标题列。The slice method can remove the header. slice方法可以移除标头。After removing the header, Float64Array.from can generate a typed array. For example, this snippet pulls row 3 ("petal length") into a Float64Array:移除标头后,Float64Array.from可以生成一个类型化数组。例如,此片段将第3行(“花瓣长度”)拉入Float64Array

const petal_length = Float64Array.from(aoa[2].slice(1));

Writing to Sheets写入工作表

The SheetJS aoa_to_sheet3 method can generate a worksheet from an array of arrays. SheetJS aoa_to_sheet3方法可以从数组中生成工作表。Similarly, sheet_add_aoa4 can add an array of arrays of data into an existing worksheet object. 类似地,sheet_add_aoa4可以将数据数组数组添加到现有的工作表对象中。The origin option5 controls where data will be written in the worksheet.origin option5控制数据在工作表中的写入位置。

Neither method understands typed arrays, so data columns must be converted to arrays of arrays.两种方法都不理解类型化数组,因此数据列必须转换为数组的数组。

One Variable per Row每行一个变量

A single typed array can be converted to a pure JS array with Array.from:使用Array.from可以将单个类型的数组转换为纯JS数组:

const arr = Array.from(row);

An array of arrays can be created from the array:可以从阵列创建阵列阵列:

const aoa = [
arr // this array is the first element of the array literal此数组是数组文字的第一个元素
];

aoa_to_sheet and sheet_add_aoa treat this as one row. 将此视为一行。By default, data will be written to cells in the first row of the worksheet.默认情况下,数据将写入工作表第一行的单元格。

Titles can be added to data rows with an unshift operation, but it is more efficient to build up the worksheet with aoa_to_sheet:标题可以通过取消unshift操作添加到数据行,但使用aoa_to_sheet构建工作表更有效:

/* sample data样本数据 */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";

/* convert sample data to array将示例数据转换为数组 */
const arr = Array.from(data);
/* create worksheet from title (array of arrays)根据标题(数组数组)创建工作表 */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add data starting at B1从B1开始添加数据 */
XLSX.utils.sheet_add_aoa(ws, [ arr ], { origin: "B1" });

Typed Array to single row with title

Live Demo (click to hide)

In this example, two typed arrays are exported. aoa_to_sheet creates the worksheet and sheet_add_aoa will add the data to the sheet.在本例中,导出了两个类型化数组。aoa_to_sheet创建工作表,sheet_add_aoa将数据添加到工作表中。

Result
Loading...
Live Editor

One Variable per Column每列一个变量

A single typed array can be converted to a pure JS array with Array.from. 使用Array.from可以将单个类型的数组转换为纯JS数组。For columns, each value should be individually wrapped in an array:对于列,每个值都应单独封装在一个数组中:

JavaScriptSpreadsheet
var data = [
[54337.95],
[3.14159],
[2.718281828]
];

Single column of data

Array.from takes a second argument. If it is a function, the function will be called on each element and the value will be used in place of the original value (in effect, mapping over the data). 接受第二个论点。如果它是一个函数,则将对每个元素调用该函数,并使用该值来代替原始值(实际上,映射到数据上)。To generate a data column, each element must be wrapped in an array literal:若要生成数据列,必须将每个元素包装在数组文字中:

var arr = Array.from(column, (value) => ([ value ]));

aoa_to_sheet and sheet_add_aoa treat this as rows with one column of data per row. By default, data will be written to cells in column "A".aoa_to_sheetsheet_add_aoa将其视为每行有一列数据的行。默认情况下,数据将写入列“A”中的单元格。

Titles can be added to data columns with an unshift operation, but it is more efficient to build up the worksheet with aoa_to_sheet:标题可以通过取消unshift操作添加到数据列,但使用aoa_to_sheet构建工作表更有效:

/* sample data样本数据 */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";

/* convert sample data to array将示例数据转换为数组 */
const arr = Array.from(data, (value) => ([value]));
/* create worksheet from title (array of arrays)根据标题(数组数组)创建工作表 */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add data starting at B1从B1开始添加数据 */
XLSX.utils.sheet_add_aoa(ws, arr, { origin: "A2" });

Typed Array to single column with title

Live Demo (click to hide)

In this example, two typed arrays are exported. 在本例中,导出了两个类型化数组。aoa_to_sheet creates the worksheet and sheet_add_aoa will add the data to the sheet.aoa_to_sheet创建工作表,sheet_add_aoa将数据添加到工作表中。

Result
Loading...
Live Editor