SY Foreign Key Encoding Approaches


Introduction

At present, the SY foreign key in the TK table is used for three purposes:

  1. An SY foreign key with zero version and symbol reference components and a tktyp field of '9xxx' indicates that the tktyp field contains a repeated character string. This provides compressed storage for strings of repeated characters. For example, a string of seven space characters can be stored as a single space character with a repeat count of seven.
  2. An SY foreign key with zero version and symbol reference components and a tktyp field of '8xxx' indicates inline storage of a short symbol. Short words [three characters or fewer] are stored directly in the tktyp field.
  3. An SY foreign key with a non-zero symbol reference component is a true foreign key pointer to an SY table row. In this case, the tktyp field redundantly stores the foreign key symbol reference.

Areas for improvement were identified at meetings held on 2/27 and 3/20 with Dr. Lechner.

We have identified several approaches for restructuring the TK table SY foreign key to support these improvements.

SY Foreign Key Structure

SY xx hhhh

  SY    SY table identifier
  xx    version
  hhhh  symbol reference [row number]

Each of the options retains the table identifier and version components of the SY foreign key. The options present variations on encoding information into the symbol reference component.

The row number component of the SY key is internally encoded as a 16-bit unsigned value. Thus, 65,535 distinct rows can exist in the SY table [hexadecimal 0001 through FFFF; a zero symbol reference never addresses a row]. The encoding schemes detailed below restrict the number of rows supported and encode information into reserved portions of the key space.

Hexadecimal is a convenient format for representing SY table row references. The C language supports it natively through the hexadecimal conversion specifier [%x] in formatted input and output.
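As an illustration of the %x support mentioned above, the following is a minimal sketch of reading and writing an SY key in its textual form. The struct and the function names sy_key_parse and sy_key_format are hypothetical, and the sketch assumes the key is written as the letters SY followed by two hexadecimal version digits and four hexadecimal row digits, with optional whitespace between components.

```c
#include <stdio.h>

/* Hypothetical in-memory form of an SY foreign key (not from the source). */
struct sy_key {
    unsigned version;  /* xx: two hexadecimal digits */
    unsigned symref;   /* hhhh: four hexadecimal digits [row number] */
};

/* Parse text of the assumed form "SY xx hhhh".
   Returns 1 on success and 0 if the text does not match. */
int sy_key_parse(const char *text, struct sy_key *out)
{
    unsigned version, symref;

    /* %2x and %4x limit each field to its digit count. */
    if (sscanf(text, "SY %2x %4x", &version, &symref) != 2)
        return 0;
    out->version = version;
    out->symref = symref;
    return 1;
}

/* Write the key into buf as "SY xx hhhh" using uppercase hexadecimal.
   Returns the value reported by snprintf. */
int sy_key_format(const struct sy_key *key, char *buf, size_t size)
{
    return snprintf(buf, size, "SY %02X %04X",
                    key->version & 0xFFu, key->symref & 0xFFFFu);
}
```

Under these assumptions, parsing "SY 01 00A3" yields version 01 and row 00A3, and formatting that key reproduces the same text.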

Option #1 -- Hexadecimal Representation with Four Bit Repeat Count and Eight Bit Character.

hhhh => FxCC : Indicates repeated character

          F   indicates repeated character
          x   repeat count [1-16 encoded 0-F]
          CC  eight bit character [encoded in hexadecimal]

     => EFFF-0000 : Indicates SY table reference with trailing space character

This encoding approach supports 61,439 distinct SY table references and supports compression of repeated character strings with lengths of 1 to 16 characters.
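The Option #1 layout can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the repeat count is assumed to be stored as count minus one so that 1-16 maps onto 0-F as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Highest symbol reference still usable as a true SY row under Option #1. */
#define SY_OPT1_MAX_ROW 0xEFFFu

/* A leading hexadecimal digit of F marks a repeated character. */
int sy_opt1_is_repeat(uint16_t symref)
{
    return (symref & 0xF000u) == 0xF000u;
}

/* Build FxCC from a repeat count of 1-16 and an eight bit character. */
uint16_t sy_opt1_encode_repeat(unsigned count, unsigned char ch)
{
    assert(count >= 1 && count <= 16);
    return (uint16_t)(0xF000u | ((count - 1u) << 8) | ch);
}

/* Recover the repeat count (1-16) from the x digit. */
unsigned sy_opt1_repeat_count(uint16_t symref)
{
    return ((symref >> 8) & 0xFu) + 1u;
}

/* Recover the repeated character from the CC digits. */
unsigned char sy_opt1_repeat_char(uint16_t symref)
{
    return (unsigned char)(symref & 0xFFu);
}
```

With this sketch, the string of seven space characters from the introduction encodes as F620: F for the repeat marker, 6 for a count of seven, and 20 for the space character.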

Option #2 -- Hexadecimal Representation with Seven Bit Repeat Count and Eight Bit Character.

hhhh => 1... .... CC : High-order bit [bit zero] of first byte on indicates repeating character

        Remaining seven bits of first byte are the repeat count;
        a single character can be repeated up to 127 times

        CC eight bit character [encoded in hexadecimal]

     => 0... .... ### : High-order bit [bit zero] of first byte off indicates true SY key

This encoding approach supports 32,767 distinct SY references and repeated character strings of up to 127 characters.
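A comparable sketch for Option #2 follows. The function names are hypothetical, and the sketch assumes the seven bit count is stored directly [1-127, with zero unused], which the description above implies but does not state outright.

```c
#include <assert.h>
#include <stdint.h>

/* Highest symbol reference still usable as a true SY row under Option #2. */
#define SY_OPT2_MAX_ROW 0x7FFFu

/* The high-order bit of the first byte marks a repeated character. */
int sy_opt2_is_repeat(uint16_t symref)
{
    return (symref & 0x8000u) != 0;
}

/* Build the key from a repeat count of 1-127 and an eight bit character.
   Storing the count directly (not count minus one) is an assumption. */
uint16_t sy_opt2_encode_repeat(unsigned count, unsigned char ch)
{
    assert(count >= 1 && count <= 127);
    return (uint16_t)(0x8000u | (count << 8) | ch);
}

/* Recover the repeat count from the low seven bits of the first byte. */
unsigned sy_opt2_repeat_count(uint16_t symref)
{
    return (symref >> 8) & 0x7Fu;
}

/* Recover the repeated character from the second byte. */
unsigned char sy_opt2_repeat_char(uint16_t symref)
{
    return (unsigned char)(symref & 0xFFu);
}
```

Under these assumptions, seven space characters encode as 8720 and the longest run, 127 space characters, encodes as FF20.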

Recommended Approach.

We recommend proceeding with the approach outlined in the first option presented. There are three reasons for this recommendation.

  1. It provides eight bit character support. Support for eight bit characters increases in importance as BDE is ported to non-native UNIX platforms.
  2. The reserved SY table key space is minimized by this approach. Thus a larger symbol dictionary is supported.
  3. Compression of short repeated character strings [1-16 characters] is supported. Although the ability to encode a 127 character string of space characters in a single TK row is provided by the other encoding approach, short repeated character strings are more likely to be encountered in practice.
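The key space figures behind the second reason can be checked with a short calculation. The sketch below is illustrative only: the names are hypothetical, and it assumes that row 0000 is never a valid row, that Option #1 reserves F000-FFFF, and that Option #2 reserves 8000-FFFF.

```c
/* Total number of 16-bit symbol reference values. */
#define SY_KEY_SPACE     0x10000UL

/* Assumed sizes of the reserved blocks: F000-FFFF and 8000-FFFF. */
#define SY_OPT1_RESERVED 0x1000UL
#define SY_OPT2_RESERVED 0x8000UL

/* Rows left for true SY references: the whole key space, less the
   reserved block, less row 0000 (assumed never to address a row). */
unsigned long sy_row_capacity(unsigned long reserved)
{
    return SY_KEY_SPACE - reserved - 1UL;
}
```

With no reserved block this gives 65,535 rows. Option #1 leaves 61,439 rows and Option #2 leaves 32,767, matching the figures quoted for each option.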


