SY Foreign Key Encoding Approaches
Introduction
At present, the SY foreign key in the TK table is used for three
purposes:
- An SY foreign key with zero version and symbol reference components
and a tktyp field of '9xxx' indicates that the tktyp field
contains a repeated character string. This provides a mechanism
for compressed storage of a string containing repeated characters.
For example, a string of seven space characters can be compressed
with a single space character stored with a repeat count of seven.
- An SY foreign key with zero version and symbol reference components
and a tktyp field of '8xxx' indicates inline storage of a short
symbol. Short words [three characters or fewer] are stored directly
in the tktyp field.
- An SY foreign key with a non-zero symbol reference component
is a true foreign key pointer to an SY table row. In this case,
the tktyp field redundantly stores the foreign key symbol reference.
Areas for improvement were identified at meetings held on 2/27
and 3/20 with Dr. Lechner:
- TK table entries are created for single space characters between
words in text. Since space characters are the normal delimiters
in text fields supported by BDE, a less costly approach is required.
- Currently, short words are stored as data in the TK table entry.
Storing short words as inline text circumvents the symbol dictionary
aspect of the BDESYM architecture. These words should be treated
as symbols, with SY row entries created for them.
- The current method for supporting repeating strings requires
a data field in each TK table row [tktyp] to support the function.
Encoding the repeated string information into the TK table SY
foreign key field would enable the elimination of the tktyp data
field in the TK table.
We have identified several approaches for restructuring the TK
table SY foreign key to support these improvements.
SY Foreign Key Structure
SY xx hhhh : SY    table identifier
             xx    version
             hhhh  symbol reference [row number]
Each of the options retains the table identifier and version components
of the SY foreign key. The options present variations on encoding
information into the symbol reference component.
The row number component of the SY key is internally encoded as
a 16 bit unsigned value. Because a zero symbol reference marks a
key that does not point to an SY row, 65,535 distinct rows can
exist in the SY table [0001 through FFFF in hexadecimal
representation]. The encoding schemes detailed below restrict the
number of rows supported and encode information into reserved
portions of the key space.
Hexadecimal is a convenient format for representing SY table row
reference information. The C language supports it natively through
the hexadecimal conversion specifier [%x] in input and output
format lists.
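As an illustration of the %x support mentioned above, the following is a minimal sketch of formatting and parsing a key in the "SY xx hhhh" layout. The struct and function names are hypothetical, and the sketch assumes that the version component is written as two hexadecimal digits, which this document does not state.

```c
#include <stdio.h>

/* Hypothetical in-memory form of an SY foreign key (illustrative names). */
struct sy_key {
    unsigned version;  /* "xx" component; assumed to be two hex digits */
    unsigned symref;   /* "hhhh" component: 16 bit symbol reference */
};

/* Write the key as "SY xx hhhh".
   Returns the number of characters written, or -1 if buf is too small. */
int sy_key_format(const struct sy_key *key, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "SY %02X %04X",
                     key->version & 0xFFu, key->symref & 0xFFFFu);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Parse text of the form "SY xx hhhh".
   Returns 0 on success, -1 if both components could not be read. */
int sy_key_parse(const char *text, struct sy_key *key)
{
    unsigned version, symref;

    if (sscanf(text, "SY %2x %4x", &version, &symref) != 2)
        return -1;
    key->version = version;
    key->symref = symref;
    return 0;
}
```

Under these assumptions, a key with version 1 and row number A3 would be written as "SY 01 00A3" and read back into the same two values.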
Option #1 -- Hexadecimal Representation with Four Bit Repeat
Count and Eight Bit Character.
hhhh => FxCC      : F   indicates repeated character
                    x   repeat count [1-16 encoded 0-F]
                    CC  eight bit character [encoded in hexadecimal]
     => EFFF-0000 : indicates SY table reference with trailing space
                    character
This encoding approach supports 61,439 distinct SY table references
and supports compression of repeated character strings with lengths
of 1 to 16 characters.
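The Option #1 layout can be sketched in C as follows. This is an illustrative sketch only: the macro and function names are hypothetical, and it assumes the repeat count is stored as count minus one in the second hexadecimal digit, as the "1-16 encoded 0-F" note suggests.

```c
#include <stdint.h>

#define SY_OPT1_REPEAT_TAG 0xF000u  /* leading digit F marks a repeated character */
#define SY_OPT1_MAX_REF    0xEFFFu  /* largest value left for SY table references */

/* Pack a repeated character into the FxCC form.
   count must be 1..16; returns 0 (never a valid repeat code) otherwise. */
uint16_t sy_opt1_encode_repeat(unsigned count, unsigned char ch)
{
    if (count < 1 || count > 16)
        return 0;
    return (uint16_t)(SY_OPT1_REPEAT_TAG | ((count - 1u) << 8) | (ch & 0xFFu));
}

/* Nonzero when the symbol reference carries a repeated character. */
int sy_opt1_is_repeat(uint16_t symref)
{
    return (symref & 0xF000u) == SY_OPT1_REPEAT_TAG;
}

/* Repeat count [1-16] of a repeated-character code. */
unsigned sy_opt1_repeat_count(uint16_t symref)
{
    return ((symref >> 8) & 0x0Fu) + 1u;
}

/* Character of a repeated-character code. */
unsigned char sy_opt1_repeat_char(uint16_t symref)
{
    return (unsigned char)(symref & 0xFFu);
}
```

With this layout, the string of seven space characters used as an example in the Introduction would be encoded as F620 [F, count 7 stored as 6, ASCII space 20].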
Option #2 -- Hexadecimal Representation with Seven Bit Repeat
Count and Eight Bit Character.
hhhh => 1... .... CC  : Bit zero [high-order bit] of first byte on
                        indicates repeated character
                        Remaining seven bits of first byte are the
                        repeat count; a single character can be
                        repeated up to 127 times
                        CC  eight bit character [encoded in hexadecimal]
     => 0... .... ### : Bit zero [high-order bit] of first byte off
                        indicates true SY key
This encoding approach supports 32,767 distinct SY references
and repeated character strings of up to 127 characters.
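For comparison, a minimal C sketch of the Option #2 layout follows. The names are hypothetical, and the sketch assumes the seven bit field holds the repeat count directly [1-127, with zero unused], which this document does not spell out.

```c
#include <stdint.h>

#define SY_OPT2_REPEAT_FLAG 0x8000u  /* high-order bit on: repeated character */
#define SY_OPT2_MAX_REF     0x7FFFu  /* largest value left for true SY keys */

/* Pack a repeated character: flag bit, seven bit count, eight bit character.
   count must be 1..127; returns 0 (never a valid repeat code) otherwise. */
uint16_t sy_opt2_encode_repeat(unsigned count, unsigned char ch)
{
    if (count < 1 || count > 127)
        return 0;
    return (uint16_t)(SY_OPT2_REPEAT_FLAG | (count << 8) | (ch & 0xFFu));
}

/* Nonzero when the symbol reference carries a repeated character. */
int sy_opt2_is_repeat(uint16_t symref)
{
    return (symref & SY_OPT2_REPEAT_FLAG) != 0;
}

/* Repeat count [1-127] of a repeated-character code. */
unsigned sy_opt2_repeat_count(uint16_t symref)
{
    return (symref >> 8) & 0x7Fu;
}

/* Character of a repeated-character code. */
unsigned char sy_opt2_repeat_char(uint16_t symref)
{
    return (unsigned char)(symref & 0xFFu);
}
```

Under these assumptions, seven space characters would be encoded as 8720, and the longest run of spaces that fits in one key [127] as FF20.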
Recommended Approach.
We recommend proceeding with the approach outlined in the first
option presented. There are three reasons for this recommendation.
- It provides eight bit character support. Support for eight
bit characters increases in importance as BDE is ported to non-native
UNIX platforms.
- The reserved SY table key space is minimized by this approach.
Thus a larger symbol dictionary is supported.
- Compression of short repeated character strings [1-16 characters]
is supported. Although the ability to encode a 127 character string
of space characters in a single TK row is provided by the other
encoding approach, short repeated character strings are more likely
to be encountered in practice.
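The trade-off in the last point can be made concrete with a small calculation. The sketch below uses a hypothetical helper name and assumes that a run longer than the per-key maximum is simply split across several TK rows. It computes how many rows a run of identical characters would need under each option.

```c
/* Number of TK rows needed to hold a run of run_len identical characters
   when a single key can encode at most max_per_row repeats
   [16 for Option #1, 127 for Option #2]. max_per_row must be nonzero. */
unsigned tk_rows_for_run(unsigned run_len, unsigned max_per_row)
{
    /* Integer ceiling of run_len / max_per_row. */
    return (run_len + max_per_row - 1u) / max_per_row;
}
```

A run of seven spaces takes one row under either option. A run of 127 spaces takes one row under Option #2 and eight rows under Option #1, which is the cost accepted in exchange for the larger symbol dictionary.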
Documentation Details
- 96 Spring BDESYM Team 3/23/96 - Version 1.2
- Taken from the PDR