Let
be a
finite alphabet with
. Given a binary string
(we also
assume that
starts with 1), Gurevich [Gur91] designed
a dynamic binary coding scheme
to code
such that the following statements hold.
.
(not coded) is distinguishable from each coded symbol.
In other words, no coded symbol can be a substring of
.
of a coded symbol
is a prefix of a coded symbol
, then
.
can be written as a unique concatenation of the following
fixed binary strings 1, 10, 000, 100, which
are not prefixes of any coded symbol.
The four strings 1, 10, 000, 100 are called base strings [Wan95a]. Gurevich [Gur91] originally used strings 1, 10, 00, and 000 as base strings, which, in general, does not form a unique concatenation for strings beginning with 1.
The construction of such a coding system can be done as follows.
Let
be the regular set
. Let
be the least even integer such that
. Hence,
.
The string
has at most
substrings
of length
. So we can select a set
of
-strings of length
such that
,
no string in
is a substring of
, and every string in
starts with
01. Moreover, from
it is straightforward to show that the third condition also holds
[Gur91].
To see that the fourth condition is satisfied, let
be a string
which does not start with 01 and is different from 0, then it is
straightforward to show by induction that
forms a unique
concatenation from the base strings.
Note that the last two conditions enable coding
without recoding the
0's and 1's in other already coded symbols.
So we can use
to code
by assigning one element in
to
represent one symbol in
, and this dynamic coding scheme satisfies
all of the above four conditions.