Fast constant-time gcd and modular inversion

This page tracks progress in formally verifying the correctness of this type of software. Major tasks specifically for inversion modulo a prime p:

• Verifying that, given x, the software outputs a particular formula related to divstep^N(delta_0,p,x mod p), for particular constants N, delta_0.
• Verifying that divstep^N(delta_0,p,x mod p) has third component 0 for these constants N, delta_0.
• Verifying that if divstep^N(delta_0,p,x mod p) has third component 0 then the formula mentioned above is the inverse of x modulo p (or 0 if x is a multiple of p; if x is limited to B bits and p is not far below B bits then there are few enough multiples of p to be comprehensively tested).

### Inversion formulas

Theorems in the safegcd paper express the inverse of x modulo p in terms of the divstep^N transition matrix if divstep^N(delta_0,p,x mod p) has third component 0. The proofs are short and should be easy to convert into input to theorem-proving systems such as HOL Light. It is important to actually do this so as to remove any possibility of error.

### Number of divsteps

Current speed records rely on convex-hull calculations that, for all pairs (delta_0,delta_i) of interest, convert a convex hull containing (f_0,g_0) into a convex hull containing (f_i,g_i) whenever divstep^i(delta_0,f_0,g_0) = (delta_i,f_i,g_i).

As a case study regarding the feasibility of fully verifying these calculations, `divsteps-40-20210131.ml.bz2` is a HOL Light script that ends with a theorem saying that divstep^114(1,f,g) has third component 0 for all integers f,g between 0 and 2^40.

To verify this theorem in HOL Light, inside a Debian Stretch VM, starting as `root`:

``````    apt install git make camlp5 -y
su - hol
wget https://gcd.cr.yp.to/verif/divsteps-40-20210131.ml.bz2
bunzip2 divsteps-40-20210131.ml.bz2
git clone https://github.com/jrh13/hol-light.git
cd hol-light; make
ocaml
#use "hol.ml";;
#use "../divsteps-40-20210131.ml";;
``````

This takes 20.2 hours on one 1.7GHz Broadwell CPU core, using 5 gigabytes of RAM, and ends by printing the following:

``````    val input_range : thm =
|- !f g.
input_range f g <=>
&0 <= f /\ f <= &1099511627776 /\ &0 <= g /\ g <= &1099511627776
val divstep : thm =
|- !delta1 delta f1 g1 g f.
divstep delta f g delta1 f1 g1 <=>
f rem &2 = &1 /\
(if g rem &2 = &0
then delta1 = &1 + delta /\ f1 = f /\ g1 = g div &2
else if delta <= &0
then delta1 = &1 + delta /\ f1 = f /\ g1 = (g + f) div &2
else delta1 = &1 - delta /\ f1 = g /\ g1 = (g - f) div &2)
val divsteps : thm =
|- divsteps n delta f g deltan fn gn <=>
(if n = 0
then deltan = delta /\ fn = f /\ f rem &2 = &1 /\ gn = g
else ?deltan1 fn1 gn1.
divsteps (n - 1) delta f g deltan1 fn1 gn1 /\
divstep deltan1 fn1 gn1 deltan fn gn)
val main_theorem : thm =
|- !f g.
f rem &2 = &1 /\ input_range f g
==> (?delta1 f1 g1. divsteps 114 (&1) f g delta1 f1 (&0))
``````

Faster scripts for smaller sizes: `divsteps-20-20210131.ml.bz2`, `divsteps-10-20210131.ml.bz2`.

The script generating these proofs should be able to work for any size, but further proof optimization is planned before a computation for 256-bit inputs.

2021.02.14 update: O'Connor and Poelstra have posted proofs verified by reflection in Coq that, for 256-bit inputs, 724 divsteps suffice starting from delta_0=1 and 590 divsteps suffice starting from delta_0=1/2. These proofs also verify inversion formulas.

Version: This is version 2021.02.14 of the "Verification" web page.