[Coq-Club] On the extraction and parametricity of random generators as sets of values

Gabriel Scherer

2018-06-22 20:50:47 UTC

The problem is that the "sem" function only considers the specialization of the RNG at "stream", but the result of "bindGen'" must evaluate "g" parametrically in the RNG "s".

What if you defined sem to be wider?

Definition sem value (g : G value) : value -> Prop :=
fun x => (exists s `(RNG s) r, g s _ r = x).

(This corresponds to making the soundness result hold trivially by
making both sides equal.)

One solution I'm thinking of is to make G a sigma carrying proofs of instances of that theorem.

Indeed it works. Given some well-chosen `same_stream` coinductive (see
at the end of this email), you can define

Record ParamG value := {
get : G value;
param : (forall s `(RNG s) rs rstream,
same_stream (stream_of s _ rs) rstream -> get s _ rs = get stream
_ rstream);
}.

and then prove soundness for ParamG. Using a parametricity plugin
might automatically derive the "param" proofs for the (G value)
combinators that you implement in practice. (It would derive the
binary version, saying you get equal values from any two RNGs whose
streams are the same).

----

Class RNG (s : Type) := {
next : s -> bool * s;
}.

CoInductive stream := Next : bool -> stream -> stream.

CoFixpoint stream_of s `(RNG s) (st : s) : stream :=
let '(b, st_next) := next st in
Next b (stream_of s _ st_next).

CoInductive same_stream : stream -> stream -> Prop :=
| Same : forall b1 r1 b2 r2,
b1 = b2 ->
same_stream r1 r2 ->
same_stream (Next b1 r1) (Next b2 r2)
.

CoFixpoint same_stream_refl : forall st, same_stream st st.
destruct st. constructor; auto.
Qed.

Definition G value := forall s `(RNG s), s -> value.

Definition next_stream '(Next b s) := (b, s).
Instance rng_stream : RNG stream := { next := next_stream }.

Record ParamG value := {
get : G value;
param : (forall s `(RNG s) rs rstream,
same_stream (stream_of s _ rs) rstream -> get s _ rs = get stream
_ rstream);
}.

Lemma soundness_for_paramG :
forall s `{RNG s} value (g : ParamG value) x,
(exists (r : s), get _ g _ _ r = x) -> (exists (r : stream), get _
g _ _ r = x).
intros s rng value g x [st H].
exists (stream_of _ _ st).
symmetry. subst x.
apply param.
apply same_stream_refl.
Qed.

----

The definitions below are not necessary above, but they are (1)
trickier than I hoped and (2) necessary to connect the two ways to
state parametricity,
same_stream (stream_of s) st
and
same_stream (stream_of s1) (stream_of s2)
.

Definition open_stream '(Next b st) := Next b st.
Definition open st : (st = open_stream st) :=
let '(Next b st) := st in eq_refl.

CoFixpoint same_stream_of st : same_stream (stream_of stream rng_stream st) st.
destruct st.
match goal with [ |- same_stream ?S _ ] => rewrite (open S) end.
constructor; auto.
Qed.

Hello Coq-club,
Sorry for the length. I have a few questions about making code efficient for
extraction and parametricity in Coq.
In QuickChick, a random generator is a function of type "G value := seed ->
value", and to reason about it, we interpret it as the set of values it may
generate, i.e., its image as a function. This assumes the seed is a true
source of randomness, for example represented as an arbitrary stream of
bits: "G value := stream -> value" (or tree for splittable randomness). But
of course, for performance, we would rather evaluate the function with a
finite-state PRNG (of type "seed").
(QuickChick currently axiomatizes the first interpretation, and extracts
these axioms to the second.)
In order to support the two interpretations more faithfully, we can
generalize the type of generators to "G value := forall `(RNG s), s ->
value", for some appropriate class "RNG" that has instances for "seed" and
"stream".
Class RNG (s : Type) := {
next : s -> bool * s;
}.
Now we can run such a generator quickly with a "seed", and we can reason
about it as a set of values, via a semantic function "sem : G value -> set
value" where "x \in sem g = exists (r : stream), g _ _ r = x" (specializing
"g" at "stream").
1. Dictionary erasure by extraction
Definition G value := forall `(RNG s), s -> value.
This generalization of the type of generators means they must take an extra
parameter for the the RNG dictionary. But for extraction we really want to
specialize generators at type "seed".
- Can OCaml optimize away that constant parameter?
- Can we tell extraction to specialize that parameter?
- Is there a different representation which extracts to something more
efficient, and which also works well for formalization? (Module functors
seem too cumbersome to relate different instantiations as I do below.)
2. Shrinking the formalization gap by parametricity
One thing we can try to prove is that the set of values generated using any
RNG "s" is a subset of those generated using "stream". It seems feasible
case-by-case, but is that provable as a general theorem?
forall `{RNG s} (g : G value),
(exists (r : s), g _ _ r = x) -> x \in sem g.
That seems to require an argument of parametricity that is not internalized
in Coq AFAICT. This project https://github.com/parametricity-coq/paramcoq
seems relevant but I didn't have an opportunity to try it as it requires Coq
8.5. Are there up-to-date alternatives?
I hoped that gap would be only a theoretical question that could be ignored
in practice. However, there is at least one existing generator combinator
where that matters, a monadic bind function where the continuation also
forall (g : G A), (forall a, a \in sem g -> G B) -> G B.
The problem is that the "sem" function only considers the specialization of
the RNG at "stream", but the result of "bindGen'" must evaluate "g"
parametrically in the RNG "s". The "soundness" theorem above is precisely
what would relate those specializations.
How can we define "bindGen'", if that "soundness" theorem cannot be proved?
One solution I'm thinking of is to make G a sigma carrying proofs of
instances of that theorem.
Regards,
Li-yao