Binary Strings

5.4. Binary Strings

The bytea data type allows storage of binary strings; see Table 5-6.

Table 5-6. Binary String Types

Type Name   Storage                                 Description
bytea       4 bytes plus the actual binary string   Variable (not specifically limited) length binary string

A binary string is a sequence of octets (or bytes). Binary strings are distinguished from character strings by two characteristics: First, binary strings specifically allow storing octets of zero value and other "non-printable" octets. Second, operations on binary strings process the actual bytes, whereas the encoding and processing of character strings depends on locale settings.

When entering bytea values, octets of certain values must be escaped (but all octet values may be escaped) when used as part of a string literal in an SQL statement. In general, to escape an octet, it is converted into the three-digit octal number equivalent of its decimal octet value, and preceded by two backslashes. Some octet values have alternate escape sequences, as shown in Table 5-7.
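The input rule above can be sketched in client code. The following is a minimal illustration in Python, not part of PostgreSQL; the function name escape_bytea_literal is hypothetical. It assumes the escaping conventions of Table 5-7 and chooses to escape every octet outside the printable ASCII range, which the rule permits because all octet values may be escaped.

```python
def escape_bytea_literal(data: bytes) -> str:
    """Build a quoted SQL string literal for `data` in escaped-octet notation.

    Hypothetical helper: it follows the rules described in the text and
    is not a function provided by PostgreSQL or any client library.
    """
    parts = []
    for octet in data:
        if octet == 39:
            # Single quote: one backslash is enough (see Table 5-7).
            parts.append("\\'")
        elif octet == 92:
            # Backslash: four backslashes in the literal.
            parts.append("\\\\\\\\")
        elif 32 <= octet <= 126:
            # Other printable ASCII octets are passed through unchanged.
            parts.append(chr(octet))
        else:
            # Everything else: two backslashes plus a three-digit octal value.
            parts.append("\\\\%03o" % octet)
    return "'" + "".join(parts) + "'"


# Zero octet, single quote, backslash:
print(escape_bytea_literal(b"\x00'\\"))  # prints: '\\000\'\\\\'
```

The printed text is what would appear inside the SQL statement, for example followed by ::bytea.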

Table 5-7. bytea Literal Escaped Octets

Decimal Octet Value   Description    Input Escaped Representation   Example                  Printed Result
0                     zero octet     '\\000'                        SELECT '\\000'::bytea;   \000
39                    single quote   '\'' or '\\047'                SELECT '\''::bytea;      '
92                    backslash      '\\\\' or '\\134'              SELECT '\\\\'::bytea;    \\

Note that the result in each of the examples in Table 5-7 was exactly one octet in length, even though the output representations of the zero octet and the backslash are more than one character long. bytea octets are also escaped on output. In general, each "non-printable" octet is converted into its equivalent three-digit octal value, preceded by one backslash. Most "printable" octets are represented by their standard representation in the client character set. The octet with decimal value 92 (backslash) has a special alternate output representation. Details are in Table 5-8.

Table 5-8. bytea Output Escaped Octets

Decimal Octet Value      Description              Output Escaped Representation   Example                  Printed Result
92                       backslash                \\                              SELECT '\\134'::bytea;   \\
0 to 31 and 127 to 255   "non-printable" octets   \### (octal value)              SELECT '\\001'::bytea;   \001
32 to 126                "printable" octets       ASCII representation            SELECT '\\176'::bytea;   ~
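The output rules in Table 5-8 can be modelled in a few lines. The sketch below is an illustration in Python rather than the server's actual output routine; the name bytea_output is hypothetical, and it assumes the client character set represents octets 32 to 126 as ASCII.

```python
def bytea_output(data: bytes) -> str:
    """Render `data` the way Table 5-8 describes the escaped output.

    Hypothetical model for illustration, not PostgreSQL code.
    """
    parts = []
    for octet in data:
        if octet == 92:
            # Backslash has its own representation: two backslashes.
            parts.append("\\\\")
        elif 32 <= octet <= 126:
            # "Printable" octets: their ASCII representation.
            parts.append(chr(octet))
        else:
            # "Non-printable" octets: one backslash plus three octal digits.
            parts.append("\\%03o" % octet)
    return "".join(parts)


for value in (b"\\", b"\x01", b"~"):
    # One stored octet can take up to four characters on output.
    print(len(value), bytea_output(value))
```

Each value in the loop is a single octet, yet the printed representations are two, four, and one characters long respectively.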

To use the bytea escaped octet notation, string literals (input strings) must contain two backslashes because they must pass through two parsers in the PostgreSQL server. The first backslash is interpreted as an escape character by the string-literal parser, and therefore is consumed, leaving the characters that follow. The remaining backslash is recognized by the bytea input function as the prefix of a three-digit octal value. For example, a string literal passed to the backend as '\\001' becomes '\001' after passing through the string-literal parser. The '\001' is then sent to the bytea input function, where it is converted to a single octet with a decimal value of 1.

For a similar reason, a backslash must be input as '\\\\' (or '\\134'). The first and third backslashes are interpreted as escape characters by the string-literal parser, and therefore are consumed, leaving two backslashes in the string passed to the bytea input function, which interprets them as representing a single backslash. For example, a string literal passed to the server as '\\\\' becomes '\\' after passing through the string-literal parser. The '\\' is then sent to the bytea input function, where it is converted to a single octet with a decimal value of 92.

A single quote is a bit different in that it must be input as '\'' (or '\\047'), not as '\\''. This is because, while the literal parser interprets the single quote as a special character, and will consume the single backslash, the bytea input function does not recognize a single quote as a special octet. Therefore a string literal passed to the backend as '\'' becomes ''' after passing through the string-literal parser. The ''' is then sent to the bytea input function, where it retains its single-octet decimal value of 39.
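The two passes described in the preceding paragraphs can be traced with a toy model. The Python sketch below is a deliberate simplification and not the server's implementation: string_literal_parser only implements "a backslash escapes the next character", bytea_input only understands a doubled backslash and three-digit octal escapes, and all three function names are hypothetical. The argument is the text between the outer quotes of the literal.

```python
def string_literal_parser(body: str) -> str:
    """First pass (simplified): a backslash is consumed, the next character kept."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def bytea_input(text: str) -> bytes:
    """Second pass (simplified): decode a doubled backslash and \\ooo escapes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))  # ordinary character, assumed ASCII
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(92)  # two backslashes represent one backslash octet
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))  # three octal digits
            i += 4
    return bytes(out)


def decode_literal(body: str) -> bytes:
    """Run both passes in the order the text describes."""
    return bytea_input(string_literal_parser(body))


# Raw strings keep Python's own escaping out of the picture.
for body in (r"\\001", r"\\\\", r"\'", r"\\047"):
    print(body, "->", string_literal_parser(body), "->", list(decode_literal(body)))
```

In every case the final list holds exactly one octet: 1, 92, 39, and 39 respectively.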

Depending on the front end to PostgreSQL you use, you may have additional work to do in terms of escaping and unescaping bytea strings. For example, you may also have to escape line feeds and carriage returns if your interface automatically translates these. Or you may have to double up on backslashes if the parser for your language of choice also treats them as an escape character.
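Python is one example of a client language whose own parser treats the backslash as an escape character. The sketch below only builds the statement text and does not connect to a server; the variable names are illustrative. It shows the extra doubling needed to produce the zero-octet example from Table 5-7, and how a raw string literal avoids it.

```python
# Text the server must receive: SELECT '\\000'::bytea;
# Built with chr(92) so that no Python-level escaping is involved.
wanted_sql = "SELECT '" + chr(92) * 2 + "000'::bytea;"

# In an ordinary Python literal, each of the two backslashes is doubled again.
doubled = "SELECT '\\\\000'::bytea;"

# A raw string literal switches off Python's own backslash processing.
raw = r"SELECT '\\000'::bytea;"

assert doubled == raw == wanted_sql
print(raw)
```

All three spellings produce the same statement text, containing exactly two backslashes.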

The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. The input format is different compared to bytea, but the provided functions and operators are mostly the same.
