We use the binary alphabet
for encoding strings.
For a binary string
, we use
to denote its length.
Denote by
the standard lexicographical order on
.
A probability distribution
is a real-valued function from
to [0,1] such that
.
We assume that
on the empty string is always
.
We also use distribution, probability, weight,
and density to denote
probability distribution.
The distribution function of
, denoted by
, is
defined by
.
For a set
, we use
or
to denote
, and we use
to denote its
cardinal.
The conditional distribution of
on a set
is equal to
if
and
, and 0 otherwise.
For a function
, we use
to denote
for
and we use
to denote the inverse of
.
We use
to denote the positive reals and
to denote the natural numbers.
Expected polynomial time is an
obvious choice as a natural notion
of average-case efficiency of an algorithm.
An algorithm runs in expected polynomial time
over a probability distribution
if
there exist
such that
for all
,
,
where
is the running time of the algorithm on
,
and
is the conditional distribution of
on strings of
length
.
However, this definition is machine dependent and
so cannot be used to build a robust theory. There are
several subtle and important issues
regarding defining a robust notion of
average polynomial time, and we discuss them below.
These issues
were either mentioned explicitly or hinted at
by Levin [Lev86],
and
various aspects of the issues have been elaborated on
by Johnson [Joh84],
Gurevich [Gur91b][Gur91a][Gur89],
Venkatesan [Ven91], and
Impagliazzo [Imp95].
From this,
Levin's
definition of average polynomial time
(given here as Definition 2.1)
can be derived naturally and can be well-justified.
Model Independence. Let
.
Let
be a subset of
and
proportional to
.
Suppose an algorithm
runs in polynomial time on every
,
and runs in
time on every
.
Then it is easy
to see that the expected running time on strings of length
is bounded above by a polynomial in
.
However, the expected running time
will be exponential in a machine
model that is quadratically slower. If a problem is easy
on average in one model of computation, then it should be
easy on average in all polynomially equivalent models.
Even within the same model, if
is polynomial on average,
then for any
,
should also be polynomial on average.
Moreover, if a problem is polynomial on average under one encoding
method, then it should be polynomial on average under all
polynomially equivalent encodings.
Balancing. Let
be a subset of
and
proportional to
.
Suppose an algorithm solves a problem in
polynomial time on every
.
Then we still have no guarantee of an algorithm
that is fast on average,
since the algorithm may take exponential time to solve the instances in
.
A balance is therefore required between the portion of
hard instances and the hardness of these instances.
The portion of the
hard instances should be polynomially related
to the time needed to solve them.
Clearly, it is at least necessary
that only a sub-polynomial portion of instances should
require super-polynomial time, though this will not in general
be sufficient.
Distribution. An average could be taken on instances of fixed size or on all instances. One may tend to consider distributions defined on instances of a bounded size at any fixed point in time, but it is also enough to consider distributions defined on all instances by first selecting a length according to some appropriate distribution and then selecting strings of that length according to the given distribution on strings of bounded size.
Under the worst-case measurement, the running time of an algorithm
is measured as a function of the length of the input.
Under the average-case measurement,
an efficient algorithm with input distribution
is allowed to run a longer
time on instances with smaller weights.
Instances with smaller
weights are rarer and so one may use a function
to measure the ``rareness'' of instances-defined in such a way
that if the weight of an instance
is smaller, then the value
of its rareness
is larger.
As discussed above regarding the balancing issue,
the portion of inputs
with high values of rareness
should be small. Probably the most general way to satisfy this
requirement is to have, for some positive constants
and
and
for all
,
,
or equivalently,
for some
.
One may then measure the average-case running time
of an algorithm on input
as a function of
rather than
. This captures and formalizes the
idea that a longer time is allowed on inputs with smaller weights, and
it also guarantees model independence.
The running time
of an algorithm is said to be
polynomial on average
if there exists a
such that, for all
,
.
Hence,
. Let
.
Then
.
This gives rise to the following definition
due to Levin [Lev86].
Clearly, Definition 2.1 is model-independent and encoding-independent, and it satisfies the balancing requirement, as shown in footnote 3, based on which the following lemma is straightforward.

Further discussion of the balancing issue can be found
in [Gur91b][Gur89].
Regarding the distribution issue,
Impagliazzo [Imp95] observed that
Definition 2.1 is equivalent to taking the average on
instances up to length
since,
when
is sufficiently large,
is greater than a fixed positive constant.
If the average is taken over
strings of length exactly
, then Lemma 2.2 is still true under the
assumption that, for some fixed polynomial
and
for every
,
either
or
[Gur91a].
Clearly, every polynomial is polynomial on average with respect to
any distribution
.
If a function
is an expected polynomial over
a distribution
, namely, for some constants
and
and for all
,
,
then, with
,

It follows that
is polynomial on
-average.
If
and
are polynomial on
-average,
then so are
,
,
, and
for any positive
real number
[Gur91a].
To see the case of
,
for instance, it suffices to note that
.
A problem is solvable in average polynomial time with respect to
distribution
if it can be
solved by a deterministic
algorithm whose running time is polynomial
on
-average. A problem with an associated probability distribution
is called a distributional problem.
Other names such as
``random problems" [Lev86] and
``randomized problems" [Gur91a] have
also been used in the literature. The term ``distributional problem,"
which we use in this paper, was first used in [BCGL92].