struct: Interpret bytes as packed binary data

Source code: Lib/struct.py


This module performs conversions between Python values and C structs represented as Python bytes objects. This can be used in handling binary data stored in files or from network connections, among other sources. It uses Format Strings as compact descriptions of the layout of the C structs and the intended conversion to/from Python values.

Note

By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment: see Byte Order, Size, and Alignment for details.

Several struct functions (and methods of Struct) take a buffer argument. This refers to objects that implement the Buffer Protocol and provide either a readable or read-writable buffer. The most common types used for that purpose are bytes and bytearray, but many other types that can be viewed as an array of bytes implement the buffer protocol, so that they can be read/filled without additional copying from a bytes object.
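As a rough illustration of the buffer argument, the sketch below unpacks the same four bytes from several buffer-providing objects. The variable names are chosen only for this example, and an explicit '<' prefix is used so the result does not depend on the host platform.

```python
import struct
from array import array

# Four bytes holding two little-endian unsigned shorts.
raw = struct.pack('<HH', 1, 2)

# Each of these objects exposes the buffer protocol, so unpack() accepts
# them directly without first converting to bytes.
from_bytes = struct.unpack('<HH', raw)
from_bytearray = struct.unpack('<HH', bytearray(raw))
from_view = struct.unpack('<HH', memoryview(raw))
from_array = struct.unpack('<HH', array('B', raw))

print(from_bytes, from_bytearray, from_view, from_array)
```

All four calls should yield the same tuple, since only the container differs and not the underlying bytes.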

Functions and Exceptions

The module defines the following exception and functions:

exception struct.error

Exception raised on various occasions; argument is a string describing what is wrong.

struct.pack(format, v1, v2, ...)

Return a bytes object containing the values v1, v2, … packed according to the format string format. The arguments must match the values required by the format exactly.
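A minimal sketch of pack(), assuming little-endian standard sizes ('<') so that the byte string is the same on every platform. It also shows what happens when the number of values does not match the format.

```python
import struct

# Two shorts (2 bytes each) and one long (4 bytes in standard size).
packed = struct.pack('<hhl', 1, 2, 3)
print(packed)

# Supplying too few values raises struct.error.
message = None
try:
    struct.pack('<hhl', 1, 2)
except struct.error as exc:
    message = str(exc)
print(message)
```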

struct.pack_into(format, buffer, offset, v1, v2, ...)

Pack the values v1, v2, … according to the format string format and write the packed bytes into the writable buffer buffer starting at position offset. Note that offset is a required argument.
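The following sketch shows pack_into() overwriting part of an existing writable buffer. The buffer contents and the offset of 2 are arbitrary values picked for illustration.

```python
import struct

# An 8-byte writable buffer, pre-filled so the overwritten region is visible.
buf = bytearray(b'\xff' * 8)

# Write one big-endian unsigned short at offset 2; other bytes are untouched.
struct.pack_into('>H', buf, 2, 0x0102)
print(bytes(buf))
```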

struct.unpack(format, buffer)

Unpack from the buffer buffer (presumably packed by pack(format, ...)) according to the format string format. The result is a tuple even if it contains exactly one item. The buffer’s size in bytes must match the size required by the format, as reflected by calcsize().
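A short sketch of the two points above: the result is always a tuple, and the buffer length has to match the format exactly. The input bytes are made up for the example.

```python
import struct

# A single item still comes back as a 1-tuple, so unpack it explicitly.
(count,) = struct.unpack('<I', b'\x2a\x00\x00\x00')
print(count)

# A buffer of the wrong length (3 bytes where 4 are required) is rejected.
error_text = None
try:
    struct.unpack('<I', b'\x2a\x00\x00')
except struct.error as exc:
    error_text = str(exc)
print(error_text)
```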

struct.unpack_from(format, /, buffer, offset=0)

Unpack from buffer starting at position offset, according to the format string format. The result is a tuple even if it contains exactly one item. The buffer’s size in bytes, starting at position offset, must be at least the size required by the format, as reflected by calcsize().
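One possible use of unpack_from() is reading a fixed-size header that sits in the middle of a larger buffer. The layout below (two leading bytes, a 6-byte header, then trailing data) is hypothetical.

```python
import struct

# Two unrelated leading bytes, a '>HI' header, then extra trailing data.
data = b'\x00\x00' + struct.pack('>HI', 7, 1000) + b'trailing'

# Only calcsize('>HI') == 6 bytes starting at offset 2 are consumed;
# the extra bytes after the header are allowed.
kind, length = struct.unpack_from('>HI', data, 2)
print(kind, length)
```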

struct.iter_unpack(format, buffer)

Iteratively unpack from the buffer buffer according to the format string format. This function returns an iterator which will read equally-sized chunks from the buffer until all its contents have been consumed. The buffer’s size in bytes must be a multiple of the size required by the format, as reflected by calcsize().

Each iteration yields a tuple as specified by the format string.

New in version 3.4.
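As a sketch, iter_unpack() can walk a buffer that holds several records of the same shape. Here the buffer is assumed to contain three (x, y) pairs of little-endian shorts.

```python
import struct

# Twelve bytes: six shorts, read back as three 4-byte (x, y) records.
points = struct.pack('<6h', 1, 2, 3, 4, 5, 6)

pairs = list(struct.iter_unpack('<hh', points))
print(pairs)
```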

struct.calcsize(format)

Return the size of the struct (and hence of the bytes object produced by pack(format, ...)) corresponding to the format string format.
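A small sketch relating calcsize() to pack(). The standard-size result is fixed, while the native result depends on the platform and compiler, so no particular native value is assumed here.

```python
import struct

# Standard sizes: h=2, h=2, l=4, no padding.
standard = struct.calcsize('<hhl')

# Native size and alignment: platform-dependent (padding may be inserted
# and 'l' may be wider than 4 bytes).
native = struct.calcsize('@hhl')

# In both modes, calcsize() matches the length of the packed result.
packed_native = struct.pack('@hhl', 1, 2, 3)
print(standard, native, len(packed_native))
```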

Format Strings

Format strings are the mechanism used to specify the expected layout when packing and unpacking data. They are built up from Format Characters, which specify the type of data being packed/unpacked. In addition, there are special characters for controlling the Byte Order, Size, and Alignment.

Byte Order, Size, and Alignment

By default, C types are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data, according to the following table:

Character  Byte order              Size      Alignment
@          native                  native    native
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none

If the first character is not one of these, '@' is assumed.

Native byte order is big-endian or little-endian, depending on the host system. For example, Intel x86 and AMD64 (x86-64) are little-endian; Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature switchable endianness (bi-endian). Use sys.byteorder to check the endianness of your system.

Native size and alignment are determined using the C compiler’s sizeof expression. This is always combined with native byte order.

Standard size depends only on the format character; see the table in the Format Characters section.

Note the difference between '@' and '=': both use native byte order, but the size and alignment of the latter is standardized.

The form '!' represents the network byte order which is always big-endian as defined in IETF RFC 1700.

There is no way to indicate non-native byte order (force byte-swapping); use the appropriate choice of '<' or '>'.
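The effect of the byte order characters can be sketched by packing the same 4-byte integer with each prefix. The comparison against sys.byteorder is an illustration of how '=' follows the host, not a required idiom.

```python
import struct
import sys

little = struct.pack('<I', 1)    # least significant byte first
big = struct.pack('>I', 1)       # most significant byte first
network = struct.pack('!I', 1)   # network order, same as big-endian

# '=' uses the host's byte order with standard sizes.
native_std = struct.pack('=I', 1)
expected = little if sys.byteorder == 'little' else big

print(little, big, network, native_std == expected)
```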

Notes:

  1. Padding is only automatically added between successive structure members. No padding is added at the beginning or the end of the encoded struct.

  2. No padding is added when using non-native size and alignment, e.g. with '<', '>', '=', and '!'.

  3. To align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero. See Examples.
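The padding rules in these notes can be checked with calcsize(). This is a sketch only: the native values depend on the platform (8 is typical for 'ci' where int is 4 bytes and 4-byte aligned), so the code compares sizes rather than assuming specific native numbers.

```python
import struct

# Standard size and alignment: a 1-byte char followed directly by a 4-byte int.
std_size = struct.calcsize('=ci')

# Native alignment: pad bytes may be inserted between the char and the int.
native_size = struct.calcsize('@ci')

# No padding is added at the end unless requested with a zero repeat count.
unpadded_end = struct.calcsize('@ic')
padded_end = struct.calcsize('@ic0i')

print(std_size, native_size, unpadded_end, padded_end)
```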

Format Characters

Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The 'Standard size' column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.

Format  C Type              Python type        Standard size  Notes
x       pad byte            no value
c       char                bytes of length 1  1
b       signed char         integer            1              (1), (2)
B       unsigned char       integer            1              (2)
?       _Bool               bool               1              (1)
h       short               integer            2              (2)
H       unsigned short      integer            2              (2)
i       int                 integer            4              (2)
I       unsigned int        integer            4              (2)
l       long                integer            4              (2)
L       unsigned long       integer            4              (2)
q       long long           integer            8              (2)
Q       unsigned long long  integer            8              (2)
n       ssize_t             integer                           (3)
N       size_t              integer                           (3)
e       (6)                 float              2              (4)
f       float               float              4              (4)
d       double              float              8              (4)
s       char[]              bytes
p       char[]              bytes
P       void*               integer                           (5)

Changed in version 3.3: Added support for the 'n' and 'N' formats.

Changed in version 3.6: Added support for the 'e' format.

Notes:

  1. The '?' conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char. In standard mode, it is always represented by one byte.

  2. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a __index__() method then that method is called to convert the argument to an integer before packing.

    Changed in version 3.2: Added use of the __index__() method for non-integers.

  3. The 'n' and 'N' conversion codes are only available for the native size (selected as the default or with the '@' byte order character). For the standard size, you can use whichever of the other integer formats fits your application.

  4. For the 'f', 'd' and 'e' conversion codes, the packed representation uses the IEEE 754 binary32, binary64 or binary16 format (for 'f', 'd' or 'e' respectively), regardless of the floating-point format used by the platform.

  5. The 'P' format character is only available for the native byte ordering (selected as the default or with the '@' byte order character). The byte order character '=' chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the 'P' format is not available.

  6. The IEEE 754 binary16 “half precision” type was introduced in the 2008 revision of the IEEE 754 standard. It has a sign bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored), and can represent numbers between approximately 6.1e-05 and 6.5e+04 at full precision. This type is not widely supported by C compilers: on a typical machine, an unsigned short can be used for storage, but not for math operations. See the Wikipedia page on the half-precision floating-point format for more information.
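Notes 2 and 4 can be sketched as follows. The Register class is a hypothetical integer-like wrapper invented for this example; it is not part of the struct module.

```python
import struct


class Register:
    """Hypothetical integer-like wrapper, used only for this illustration."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called by struct when an integer format receives a non-integer.
        return self.value


# Note 2: the object is converted through __index__() before packing.
packed = struct.pack('<H', Register(513))   # 513 == 0x0201

# Note 4: 'e' stores an IEEE 754 binary16 value in two bytes.
half = struct.pack('<e', 1.5)
(restored,) = struct.unpack('<e', half)

print(packed, half, restored)
```

The value 1.5 is exactly representable in binary16, so it survives the round trip unchanged; values needing more than 11 bits of precision would be rounded.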

A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Whitespace characters between formats are ignored; a count and its format must not contain whitespace though.
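A brief sketch of repeat counts and whitespace handling, using a little-endian prefix so the bytes are platform-independent. The flag variable name is arbitrary.

```python
import struct

compact = struct.pack('<4h', 1, 2, 3, 4)
spelled_out = struct.pack('<hhhh', 1, 2, 3, 4)

# Whitespace between complete format items is ignored.
spaced = struct.pack('<2h 2h', 1, 2, 3, 4)

# Whitespace between a count and its format character is an error.
try:
    struct.pack('<4 h', 1, 2, 3, 4)
    split_count_ok = True
except struct.error:
    split_count_ok = False

print(compact == spelled_out == spaced, split_count_ok)
```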

For the 's' format character, the count is interpreted as the length of the bytes, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
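The behaviour of the 's' count can be sketched like this; the sample byte strings are arbitrary.

```python
import struct

padded = struct.pack('5s', b'ab')          # too short: padded with null bytes
truncated = struct.pack('2s', b'abcdef')   # too long: truncated to 2 bytes
default_count = struct.pack('s', b'xyz')   # no count: treated as '1s'

# '10s' yields one 10-byte item, while '10c' yields ten 1-byte items.
one_item = struct.unpack('10s', bytes(10))
ten_items = struct.unpack('10c', bytes(10))

print(padded, truncated, default_count, len(one_item), len(ten_items))
```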

When packing a value x using one of the integer formats ('b', 'B', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q'), if x is outside the valid range for that format then struct.error is raised.

Changed in version 3.1: Previously, some of the integer formats wrapped out-of-range values and raised DeprecationWarning instead of struct.error.
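A minimal sketch of the range check, using the unsigned char format 'B' (valid range 0 to 255) as an example:

```python
import struct

in_range = struct.pack('<B', 255)   # largest value that fits in 'B'

# 256 does not fit in an unsigned char, so struct.error is raised.
reason = None
try:
    struct.pack('<B', 256)
except struct.error as exc:
    reason = str(exc)

print(in_range, reason)
```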

The 'p' format character encodes a “Pascal string”, meaning a short variable-length string stored in a fixed number of bytes, given by the count. The first byte stored is the length of the string, or 255, whichever is smaller. The bytes of the string follow. If the string passed in to pack() is too long (longer than the count minus 1), only the leading count-1 bytes of the string are stored. If the string is shorter than count-1, it is padded with null bytes so that exactly count bytes in all are used. Note that for unpack(), the 'p' format character consumes count bytes, but that the string returned can never contain more than 255 bytes.
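The Pascal string layout described above might look like this in practice. The counts 6 and 4 are arbitrary choices for the illustration.

```python
import struct

# Count 6: one length byte, three data bytes, two null pad bytes.
short = struct.pack('6p', b'abc')

# Count 4: only the leading count-1 == 3 bytes of the input are stored.
clipped = struct.pack('4p', b'abcdef')

# Unpacking consumes all 6 bytes but returns only the stored string.
(decoded,) = struct.unpack('6p', short)

print(short, clipped, decoded)
```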

For the '?' format character, the return value is either True or False. When packing, the truth value of the argument object is used. Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be True when unpacking.
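A short sketch of the '?' format, assuming standard mode ('<') so that each bool occupies exactly one byte:

```python
import struct

# Packing uses the truth value of each argument, whatever its type.
packed = struct.pack('<???', True, 0, 'nonempty')

# Unpacking maps zero to False and any non-zero byte to True.
flags = struct.unpack('<??', b'\x00\x7f')

print(packed, flags)
```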

Examples

Note

All examples assume a native byte order, size, and alignment with a big-endian machine.

A basic example of packing/unpacking three integers:

>>> from struct import *
>>> pack('hhl', 1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('hhl')
8

Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple:

>>> record = b'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)
>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding needed to satisfy alignment requirements is different:

>>> pack('ci', b'*', 0x12131415)
b'*\x00\x00\x00\x12\x13\x14\x15'
>>> pack('ic', 0x12131415, b'*')
b'\x12\x13\x14\x15*'
>>> calcsize('ci')
8
>>> calcsize('ic')
5

The following format 'llh0l' specifies two pad bytes at the end, assuming longs are aligned on 4-byte boundaries:

>>> pack('llh0l', 1, 2, 3)
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and alignment does not enforce any alignment.

See also

Module array

Packed binary storage of homogeneous data.

Module xdrlib

Packing and unpacking of XDR data.

Classes

The struct module also defines the following type:

class struct.Struct(format)

Return a new Struct object which writes and reads binary data according to the format string format. Creating a Struct object once and calling its methods is more efficient than calling the struct functions with the same format since the format string only needs to be compiled once.

Note

The compiled versions of the most recent format strings passed to Struct and the module-level functions are cached, so programs that use only a few format strings needn’t worry about reusing a single Struct instance.

Compiled Struct objects support the following methods and attributes:

pack(v1, v2, ...)

Identical to the pack() function, using the compiled format. (len(result) will equal size.)

pack_into(buffer, offset, v1, v2, ...)

Identical to the pack_into() function, using the compiled format.

unpack(buffer)

Identical to the unpack() function, using the compiled format. The buffer’s size in bytes must equal size.

unpack_from(buffer, offset=0)

Identical to the unpack_from() function, using the compiled format. The buffer’s size in bytes, starting at position offset, must be at least size.

iter_unpack(buffer)

Identical to the iter_unpack() function, using the compiled format. The buffer’s size in bytes must be a multiple of size.

New in version 3.4.

format

The format string used to construct this Struct object.

Changed in version 3.7: The format string type is now str instead of bytes.

size

The calculated size of the struct (and hence of the bytes object produced by the pack() method) corresponding to format.
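The methods and attributes above can be exercised together in a short sketch. The record layout '<HHI' and the variable names are assumptions made for the example, and the check of the format attribute as a str assumes Python 3.7 or later.

```python
import struct

# Compile the format once and reuse the resulting Struct object.
record = struct.Struct('<HHI')

# Two consecutive 8-byte records in one buffer.
blob = record.pack(1, 2, 3) + record.pack(4, 5, 6)

first = record.unpack_from(blob)        # offset defaults to 0
rows = list(record.iter_unpack(blob))   # one tuple per record

print(record.format, record.size, first, rows)
```

Whether a precompiled Struct gives a measurable speedup depends on the workload, since recently used format strings are cached by the module-level functions as well.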