Fast constant-time gcd and modular inversion

The following software has not been verified!

Prerequisites: AVX2 (Intel Haswell and newer; AMD Excavator and newer).

Optimization target: Skylake. Also works well on Broadwell, Kaby Lake, Coffee Lake, etc. Somewhat worse on Haswell because of the slower CMOVs.

There are two software packages:

The inverse256 code is intended to allow any prime p between 2^255 and 2^256, with timings independent of p, after a small precomputation of a table of p data. The inverse25519 code is specific to p=2^255−19 and is slightly faster.

How to test inverse25519skylake:

    wget -m https://gcd.cr.yp.to/software/inverse25519skylake-latest-version.txt
    version=$(cat gcd.cr.yp.to/software/inverse25519skylake-latest-version.txt)
    wget -m https://gcd.cr.yp.to/software/inverse25519skylake-$version.tar.gz
    tar -xzf gcd.cr.yp.to/software/inverse25519skylake-$version.tar.gz
    cd inverse25519skylake-$version
    make
    ./test < /dev/urandom

How to test inverse256skylake:

    wget -m https://gcd.cr.yp.to/software/inverse256skylake-latest-version.txt
    version=$(cat gcd.cr.yp.to/software/inverse256skylake-latest-version.txt)
    wget -m https://gcd.cr.yp.to/software/inverse256skylake-$version.tar.gz
    tar -xzf gcd.cr.yp.to/software/inverse256skylake-$version.tar.gz
    cd inverse256skylake-$version
    make
    ./test < /dev/urandom

The ./test program reads 32-byte inputs, interpreted in little-endian form as integers between 0 and 2^256−1; uses inverse25519, inverse256_BTC_p, etc. to invert each input x modulo the applicable prime p; and uses GMP to check the following:

The program prints occasional output lines such as

    loop 1048576

indicating that it has tested 1048576 inputs. It also specifically checks integers p−1000 through p+999 (and thus also 0 through 999 and 2p−1000 through 2^256−1).

The ./test program also runs a few benchmarks showing the cycle counts of successive calls to the inversion function. Typical output on a 3GHz Skylake (samba) for inverse25519:

    nothing 85 41 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 43 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42
    cycles 5639 4280 4224 4227 4165 3951 3901 3856 3880 3873 3884 3846 3896 3898 3885 3882 3900 3860 3816 3894 3865 3867 3864 3870 3881 3794 3880 3881 3881 3872 3862 3870 3880 3899 3901 3902 3850 3863 3931 3806 3864 3844 3858 3897 3877 3897 3782 3868 3875 3876 3854 3865 3893 3799 3912 3882 3895 3918 3913 3910 3781 3849 3856
    sorted 3781 3782 3794 3799 3806 3816 3844 3846 3849 3850 3854 3856 3856 3858 3860 3862 3863 3864 3864 3865 3865 3867 3868 3870 3870 3872 3873 3875 3876 3877 3880 3880 3880 3881 3881 3881 3882 3882 3884 3885 3893 3894 3895 3896 3897 3897 3898 3899 3900 3901 3901 3902 3910 3912 3913 3918 3931 3951 4165 4224 4227 4280 5639

The nothing line shows the cost of the cycle counter. The cycles line shows the cost of the cycle counter plus inverse25519. The first cycle counts are larger because of cache effects.

Releases:


Version: This is version 2021.01.31 of the "Software" web page.