Instances can be sampled with certain probabilities. Ben-David, Chor, Goldreich, and Luby [BCGL92] gave the following definition.
Given a distribution
, if there is a deterministic algorithm
that on input
outputs
in time polynomial in
,
then
is p-samplable
[Gur89]
[BCGL92]
as follows. The sampling algorithm picks
a truncated real
uniformly at random. It then
finds, via binary search,
the unique string
with
.
It outputs
if
. Let
be the set of such
such that no string in
is a prefix
of a different string in
. Then the probability of sampling
is equal to
.
However, there are p-samplable distributions that
are not p-computable if certain
one-way functions exist [BCGL92].
A one-way function is, loosely speaking, a p-time computable function
that is hard to invert on most instances.
Ben-David et al. [BCGL92]
showed that all p-samplable distributions
can be effectively ``enumerated'' in a certain way to construct
a universal p-samplable distribution
such that for any p-samplable distribution
,
is p-time reducible to
.
They then showed that for any NP-complete problem
, if
is p-time reducible to
via a reduction that is
length-preserving within a polynomial (any natural NP-complete
problem satisfies this requirement), then
there is a p-samplable distribution
, obtained
from
, such that
is p-time reducible
to
. However,
depends heavily on an effective enumeration of all p-samplable
distributions and so it is not clear what impact it may have
on learning
whether an NP-complete problem is difficult on average under
a distribution encountered in practice. Moreover,
using such a distribution to sample an instance is less
convenient than simply picking one uniformly at random.
Remarkably, Impagliazzo and
Levin [IL90] showed that
uniformly picking instances at random is all that is needed
for generating hard instances. In particular, they showed that
any NP search problem with a p-samplable distribution is p-time
randomly reducible to an NP search problem with the default uniform
distribution.
So any complete distributional
NP search problem is also complete for NP search problems with
p-samplable distributions.
Blass and Gurevich [BG93] provided the
following nice exposition of the main idea of
the proof.
A sampling algorithm is length-preserving if it uses
exactly
coin tosses to produce a string of length
.
It can be shown that every search problem with a p-samplable
distribution is reducible to a search problem on instances
sampled by a length-preserving algorithm
[VL88]
[BCGL92].
There are two simple
but unsuccessful ways to attempt to reduce a source problem
with a p-samplable distribution
to a target problem with a uniform
distribution. Let
be a length-preserving sampling algorithm
for
. One may simply let the source problem be the target
problem and use the identity function as a reduction. The problem with
this approach is that
some instance
with
small uniform probability could
have large probability
, which could
cause domination to fail. This happens
when
is too large.
Another
way to obtain a target problem is to use a reduction that
reduces an instance
of the source problem to
an instance
of the target problem such that
.
This can be done by using a randomized reduction that, on input
,
randomly selects a string
of length
and then checks whether
.
This method meets the domination requirement but it
fails if
is too small.
The idea of the proof is to interpolate between
these two methods, leaning toward the first when
is small and toward to second when
is large.
Let
.
An instance
of the target problem should
be approximately
bits long for an instance
of the source problem,
consisting of
bits of
information about some
and
bits of information
about
.
Some difficulties occur when implementing this idea. First, there is no
efficient way to compute
from
. But one can randomly guess
, which
lies between 0 and
. The probability of a correct hit
is therefore the reciprocal of a polynomial, which is sufficient
to provide a nonrare good-input domain.
Second, we do not know how
to select the right bits of information about
and
. This can be
solved again by randomization. We randomly choose two
matrices
and
, over the two-element field
,
of appropriate size to hash
and
to
vectors
and
of lengths
and
, where
and
are also viewed as vectors. Randomly
selected hash matrices have a reasonable probability.
Thus, an instance of the target problem will be
with a witness
if
is a witness of
for the source problem.
Proof.
be an instance of
and
.
Let
be a length-preserving p-time sampling algorithm for
.
Let
and
.
Let
range over binary strings of length
, and let
range over
binary strings of length
.
We will also view them as
vectors.
Let
and
range over
and
matrices, respectively, over the field of two elements
.
We assume that these matrices are written in binary,
via writing down the first row of
, then the second row,
and so on. Denote by
(similarly
)
the binary string obtained by multiplying
and
.
Assume that integers are represented
in the shortest binary notation.
Denote by
the concatenation operation for strings.
For any binary string
, if there are
,
, and
such that
, and
has full rank
(i.e., rank
) with
,
, and
, then set
to be the set of
such
, where
;
otherwise, define
to be empty.
For any binary string
, define
to be true
if
, and false otherwise. Since
is an NP predicate
and
is p-time computable,
is an NP predicate as well.
We now define a randomized reduction from
to
. Let
be a set of
satisfying the following conditions:
;
; and
with
,
either
or
.
For all
, define
.
Define
by
. Then
is p-time computable and
.
Clearly,
is one-one (by Condition 3 above)
and p-time computable on
.
Note that
,
which is proportional to
, and
. Hence, the domination
condition is satisfied.
All we have left to show is that
is nonrare.
The rarity function of
is
.
Let

Let
.
For each
, let
be the number of
satisfying Condition 3.
Then
.
We claim that (1)
, and (2)
. By these two claims and
the fact that
, it follows that
, and so
is nonrare.
To show the first claim,
we first note that
for a full-rank matrix,
only the 0-row (meaning the row of 0's)
cannot serve as the first row;
given the first row, only the first row and the 0-row
cannot serve as the second row of a full-rank matrix;
given the first two rows, only four rows, namely, the four linear
combinations of the first two rows, cannot serve as the third
row, and so on. Thus, the number
of
full-rank matrices
is
.
Let
and
range over
and
let
.
Then
, and
.
Hence,
.
By the principle of inclusion and exclusion,
.
Let
. Then
.
Let
.
We want to show that
if
.
Note that
iff
, and
for each full-rank
with
, there exists
a unique
such that
. Thus,
is equal to the number of full-rank
with
. If
, then
for any full-rank
and so
. Assume
.
Note that for any nonzero binary vectors
of length
,
there is a nonsingular
matrix
such that
. Then
iff
, and
has full rank iff
does.
This means that
is equal to the number of full-rank
with
for any particular choice of nonzero
.
Let
. Then
is equal to the number of
full-rank
matrices where
the last column is 0. It follows that
is
equal to the number of
full-rank
matrices, which is equal to
.
So
.
Thus,
,
where
since
by definition. The minimum of this quadratic function on
is
.
Thus,
.
To show the second claim,
we note that for each fixed
, the number of
satisfying
is
. Let
.
For each
with
, there are exactly
matrices
satisfying the equation
.
This is true because if the
th bit of
is 1 (such a bit
exists since
), then
the value of the
th bit of a row
can be chosen,
depending on the values of the other
bits,
to make
.
Thus,
.
Directly applying the above proof to
decision problems is difficult since
does not seem to
be certifiable. We will show, in the next section, that
any distributional NP search problem with a p-time computable
distribution is p-time randomized truth-table
reducible to a distributional NP decision problem in DistNP. It
follows that Theorem 4.6 is true for
decision problems under p-time randomized truth-table reductions.