Li-yao Xia
2018-06-22 17:07:15 UTC
Hello Coq-club,
Sorry for the length. I have a few questions about making code efficient
for extraction and parametricity in Coq.
In QuickChick, a random generator is a function of type "G value := seed
-> value", and to reason about it, we interpret it as the set of values
it may generate, i.e., its image as a function. This assumes the seed is
a true source of randomness, for example represented as an arbitrary
stream of bits: "G value := stream -> value" (or tree for splittable
randomness). But of course, for performance, we would rather evaluate
the function with a finite-state PRNG (of type "seed").
(QuickChick currently axiomatizes the first interpretation, and extracts
these axioms to the second.)
In order to support the two interpretations more faithfully, we can
generalize the type of generators to "G value := forall `(RNG s), s ->
value", for some appropriate class "RNG" that has instances for "seed"
and "stream".
Class RNG (s : Type) := {
next : s -> bool * s;
}.
Now we can run such a generator quickly with a "seed", and we can reason
about it as a set of values, via a semantic function "sem : G value ->
set value" where "x \in sem g = exists (r : stream), g _ _ r = x"
(specializing "g" at "stream").
1. Dictionary erasure by extraction
Definition G value := forall `(RNG s), s -> value.
This generalization of the type of generators means they must take an
extra parameter for the the RNG dictionary. But for extraction we really
want to specialize generators at type "seed".
- Can OCaml optimize away that constant parameter?
- Can we tell extraction to specialize that parameter?
- Is there a different representation which extracts to something more
efficient, and which also works well for formalization? (Module functors
seem too cumbersome to relate different instantiations as I do below.)
2. Shrinking the formalization gap by parametricity
One thing we can try to prove is that the set of values generated using
any RNG "s" is a subset of those generated using "stream". It seems
feasible case-by-case, but is that provable as a general theorem?
Conjecture soundness :
forall `{RNG s} (g : G value),
(exists (r : s), g _ _ r = x) -> x \in sem g.
That seems to require an argument of parametricity that is not
internalized in Coq AFAICT. This project
https://github.com/parametricity-coq/paramcoq seems relevant but I
didn't have an opportunity to try it as it requires Coq 8.5. Are there
up-to-date alternatives?
I hoped that gap would be only a theoretical question that could be
ignored in practice. However, there is at least one existing generator
combinator where that matters, a monadic bind function where the
continuation also expects a proof that its input was generated from the
first argument "g":
Definition bindGen' :
forall (g : G A), (forall a, a \in sem g -> G B) -> G B.
The problem is that the "sem" function only considers the specialization
of the RNG at "stream", but the result of "bindGen'" must evaluate "g"
parametrically in the RNG "s". The "soundness" theorem above is
precisely what would relate those specializations.
How can we define "bindGen'", if that "soundness" theorem cannot be
proved? One solution I'm thinking of is to make G a sigma carrying
proofs of instances of that theorem.
Regards,
Li-yao
Sorry for the length. I have a few questions about making code efficient
for extraction and parametricity in Coq.
In QuickChick, a random generator is a function of type "G value := seed
-> value", and to reason about it, we interpret it as the set of values
it may generate, i.e., its image as a function. This assumes the seed is
a true source of randomness, for example represented as an arbitrary
stream of bits: "G value := stream -> value" (or tree for splittable
randomness). But of course, for performance, we would rather evaluate
the function with a finite-state PRNG (of type "seed").
(QuickChick currently axiomatizes the first interpretation, and extracts
these axioms to the second.)
In order to support the two interpretations more faithfully, we can
generalize the type of generators to "G value := forall `(RNG s), s ->
value", for some appropriate class "RNG" that has instances for "seed"
and "stream".
Class RNG (s : Type) := {
next : s -> bool * s;
}.
Now we can run such a generator quickly with a "seed", and we can reason
about it as a set of values, via a semantic function "sem : G value ->
set value" where "x \in sem g = exists (r : stream), g _ _ r = x"
(specializing "g" at "stream").
1. Dictionary erasure by extraction
Definition G value := forall `(RNG s), s -> value.
This generalization of the type of generators means they must take an
extra parameter for the the RNG dictionary. But for extraction we really
want to specialize generators at type "seed".
- Can OCaml optimize away that constant parameter?
- Can we tell extraction to specialize that parameter?
- Is there a different representation which extracts to something more
efficient, and which also works well for formalization? (Module functors
seem too cumbersome to relate different instantiations as I do below.)
2. Shrinking the formalization gap by parametricity
One thing we can try to prove is that the set of values generated using
any RNG "s" is a subset of those generated using "stream". It seems
feasible case-by-case, but is that provable as a general theorem?
Conjecture soundness :
forall `{RNG s} (g : G value),
(exists (r : s), g _ _ r = x) -> x \in sem g.
That seems to require an argument of parametricity that is not
internalized in Coq AFAICT. This project
https://github.com/parametricity-coq/paramcoq seems relevant but I
didn't have an opportunity to try it as it requires Coq 8.5. Are there
up-to-date alternatives?
I hoped that gap would be only a theoretical question that could be
ignored in practice. However, there is at least one existing generator
combinator where that matters, a monadic bind function where the
continuation also expects a proof that its input was generated from the
first argument "g":
Definition bindGen' :
forall (g : G A), (forall a, a \in sem g -> G B) -> G B.
The problem is that the "sem" function only considers the specialization
of the RNG at "stream", but the result of "bindGen'" must evaluate "g"
parametrically in the RNG "s". The "soundness" theorem above is
precisely what would relate those specializations.
How can we define "bindGen'", if that "soundness" theorem cannot be
proved? One solution I'm thinking of is to make G a sigma carrying
proofs of instances of that theorem.
Regards,
Li-yao