|
The computation of Gröbner bases of multivariate polynomial ideals by
the Faugere F4 algorithm can be parallelised when the base ring is
a finite field. These are the versions of the algorithm
for which parallelisation is currently supported:
- (i)
- Standard (non-dense) F4: For ideals defined over GF(p)
with p prime and 2<p<223.5 (POSIX threads for the
specialist mixed sparse/dense linear algebra).
- (ii)
- General Dense F4: For ideals with dense input polynomials
defined over GF(pk) with p<223.5 and k≥1 (POSIX threads
and GPUs, mostly mapping to the parallel linear algebra (above).
- (iii)
- Special HFE Dense F4: For ideals with dense input polynomials
defined over GF(2), in the `HFE' mode (where the hash table is not
used in the symbolic reduction); the whole reduction phase is
completely parallelised
with POSIX threads, which is faster than the distinct calls to matrix
multiplication in the general dense algorithm and significant
memory saving is also achieved (POSIX threads and GPUs) [new in V2.26]).
For each of the above cases, parallelisation with POSIX threads
is simply selected by calling the following procedure (before calling
the functions below):
SetNthreads(NT);
where NT is the number of threads desired. Note that (as for linear
algebra) the speedup will not necessarily be close to the number of threads,
and hyperthreading may not be beneficial.
For the dense versions (second and third cases above), the GPU mode is
automatically selected by default in the CUDA executable when running on
a computer with CUDA support; this can be disabled by first calling
SetGPU(false);. A hybrid mode (POSIX threads and GPU combined)
can be selected by also setting the number of threads while running the
CUDA executable; this is more likely to be beneficial in the Special HFE
Dense version.
Once parallelisation is selected by one of the above methods, the parallel
F4 algorithm can be called by the procedure Groebner or the function
GroebnerBasis and the other functions which take a multivariate
polynomial ideal defined over a finite field and implicitly create
a Gröbner basis first (such as Dimension and Variety).
The General Dense algorithm can be selected by passing the parameter
Dense to the Groebner or GroebnerBasis functions,
but this will also be automatically selected if the input polynomials
are considered dense, while passing Dense := false will force the
Standard (non-dense) algorithm to be used.
The Special HFE dense algorithm is selected by passing the parameters
Dense and HFE to the Groebner or GroebnerBasis
functions. (As before, Dense may be omitted if the input is
already dense.)
In this example we solve (n, k, r)-MinRank problems by first
using the General Dense F4 algorithm to compute a grevlex
Gröbner basis of the relevant ideal
(based on the minors of a symbolic matrix with linear forms)
and then using a block Wiedemann algorithm to compute the variety
of the ideal (see [Ste15] for more details).
After choosing a parameter r, a hard instance can be found
with by taking n=r + 3 and k=(n - r)2.
The runs are on a server with dual Intel 3.2GHz Gold 6146 CPUs with 24
cores total, and an NVIDIA V100 GPU.
> K := GF(PreviousPrime(2^16));
> r := 6; n := r + 3; k := (n - r)^2;
> n, k, r;
9 9 6
> time B := MinRankSystem(K, n, k, r);
Time: 160.030
> I := Ideal(B);
> I: Minimal;
Ideal of Polynomial ring of rank 9 over GF(65521)
> // Single thread
> SetNthreads(1);
> // DynamicStrategy handles large reduction to zero better:
> time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy);
Time: 2511.040
> QuotientDimension(I);
41580
> time V := Variety(I: Al := "Wiedemann");
Time: 2449.620
> V;
[ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ]
> // Check solution is correct:
> time { Evaluate(f, V[1]): f in B };
{ 0 }
Time: 1.320
> // 24 threads
> SetNthreads(24);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy);
Time: 548.030[r]
> time V := Variety(I: Al := "Wiedemann");
Time: 539.280[r]
> V;
[ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ]
> // GPU
> SetNthreads(1); SetGPU(true);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 2000, Dense, DynamicStrategy);
Time: 477.280
> time V := Variety(I: Al := "Wiedemann");
Time: 520.450
> V;
[ <40578, 37019, 22371, 45628, 40625, 2374, 42277, 33586, 51176> ]
Taking r=7 gives a much bigger system where the parallel
speedup is greater, especially for the Wiedemann stage which completely
depends on several large matrix multiplications.
> K := GF(PreviousPrime(2^16));
> r := 7; n := r + 3; k := (n - r)^2;
> n, k, r;
10 9 7
> time B := MinRankSystem(K, n, k, r);
Time: 305.080
> I := Ideal(B);
> I: Minimal;
Ideal of Polynomial ring of rank 9 over GF(65521)
> #B;
2025
> {* Length(f): f in B *};
{* 22283^^5, 22284^^83, 22285^^490, 22286^^1447 *}
With one thread only:
> SetNthreads(1);
> time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy);
Time: 16108.420
> QuotientDimension(I);
108900
> time V := Variety(I: Al := "Wiedemann");
Time: 35348.420
> V;
[ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]
> // Check solution is correct:
> time { Evaluate(f, V[1]): f in B };
{ 0 }
Time: 4.220
With 24 threads:
> SetNthreads(24);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy);
Time: 2582.500[r]
> time V := Variety(I: Al := "Wiedemann");
Time: 5317.040[r]
> V;
[ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]
With one thread and a GPU:
> SetNthreads(1); SetGPU(true);
> I := Ideal(B); // Recreate ideal
> time Groebner(I: PairsLimit := 5000, Dense, DynamicStrategy);
Time: 2127.830
> time V := Variety(I: Al := "Wiedemann");
Time: 4316.960
> V;
[ <38565, 61856, 56127, 27828, 31304, 30257, 38178, 28576, 24230> ]
In this example we solve some over-determined n-variable polynomial
systems generated by m random dense quadratic polynomials,
where the constant terms are adjusted so there is at least one
solution over the original field.
The hardest system in Table 1 of [MS17] has n=27, m=2.n=54.
> function RandomDenseSystem(K, n, m, d)
> P<[x]> := PolynomialRing(K, n, "grevlex");
> B := [
> &+[Random(K)*s: s in MonomialsOfDegree(P, k), k in [0 .. d]]:
> i in [1 .. m]
> ];
> sol := [Random(K): j in [1 .. n]];
> return [f - Evaluate(f, sol): f in B];
> end function;
> n := 27; B := RandomDenseSystem(GF(31), n, 2*n, 2);
> // Single thread only
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000);
Time: 2799.290
> G;
[
x[1] + 12, x[2] + 19, x[3] + 8, x[4] + 10, x[5] + 25, ... x[27] + 21
]
> Variety(Ideal(G));
[ <19, 12, 23, 21, 6, 27, 19, 14, 18, 17, 10, 14, 29, 25, 12, 0, 12,
13, 30, 13, 7, 17, 12, 30, 29, 22, 10> ]
> // 24 threads
> SetNthreads(24);
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000);
Time: 815.190[r]
> // GPU
> SetNthreads(1); SetGPU(true);
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 5000);
Time: 519.140
Taking n=32 gives a much harder system where the parallel
speedup is greater.
> n := 32; B := RandomDenseSystem(GF(31), n, 2*n, 2);
> // Single thread only
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000);
Time: 156326.220
> Basis(I);
[
x[1] + 12, x[2] + 19, x[3] + 8, x[4] + 10, x[5] + 25, ... x[27] + 21
]
> Variety(Ideal(G));
[ <19, 12, 23, 21, 6, 27, 19, 14, 18, 17, 10, 14, 29, 25, 12, 0, 12,
13, 30, 13, 7, 17, 12, 30, 29, 22, 10> ]
> // 24 threads
> SetNthreads(24);
> I := Ideal(B); // Recreate ideal
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000);
Time: 21921.790[r]
> // GPU
> SetNthreads(1); SetGPU(true);
> I := Ideal(B); // Recreate ideal
> time G := GroebnerBasis(B, 6: Dense, PairsLimit := 10000);
Time: 13335.090
In this example we compute the Gröbner basis of a 80-variable HFE
multivariate polynomial system over GF(2) with secret degree 100 to solve
the underlying cryptosystem. Since the secret degree is 100, it suffices
to compute pairs up to total degree 4. On the 3.6GHz Intel Core i7-7700,
using 4 threads gives a speedup of about 3.4 over using 1 thread.
> B := HFESystem(2, 80, 100);
> Universe(B);
Polynomial ring of rank 80 over GF(2)
Order: Graded Reverse Lexicographical
Variables: x[1], x[2], x[3], x[4], x[5], x[6], ...
> SetNthreads(1);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 1000);
Time: 77.010
> [Sprint(f): f in G];
[ x[1] + 1, x[2] + 1, x[3], x[4], x[5], x[6], x[7], x[8], ...
x[75] + 1, x[76] + 1, x[77] + 1, x[78], x[79] + 1, x[80] + 1 ]
> Variety(Ideal(G));
[ <1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1,
0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1> ]
> SetNthreads(4);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 1000);
Time: 26.990[r]
A much larger example has 200 variables with secret degree 100 again.
On the 3.6GHz Intel Core i7-7700, using 4 threads gives a speedup of about
3.4 over using 1 thread.
> B := HFESystem(2, 200, 100);
> Universe(B);
Polynomial ring of rank 200 over GF(2)
Order: Graded Reverse Lexicographical
Variables: x[1], x[2], x[3], x[4], x[5], x[6], ...
> SetNthreads(1);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 8997.120
> [Sprint(f): f in G];
[ x[1] + 1, x[2], x[3], x[4] + 1, x[5] + 1, x[6], x[7], x[8] + 1, ...
x[195], x[196] + 1, x[197] + 1, x[198] + 1, x[199] + 1, x[200] ]
> Variety(Ideal(G));
[ <1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, ...
0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0> ]
> SetNthreads(4);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 2618.300[r]
The same polynomial system is here solved on a server with dual Intel
3.2GHz Gold 6146 CPUs with 24 cores total, and a NVIDIA V100 GPU.
The GPU helps the echelonisation phase at the end of each step, so
gives a moderate improvement.
> B := HFESystem(2, 200, 100);
> SetNthreads(1);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 6227.490
> SetNthreads(24);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 786.670[r]
> SetNthreads(1); SetGPU(true);
> time G := GroebnerBasis(B, 4: HFE, ReductionHeuristic := 2500);
Time: 700.450[r]
[Next][Prev] [Right] [Left] [Up] [Index] [Root]
|