|
Magma provides an experimental implementation of the fastest general
purpose factoring algorithm known: the Number Field Sieve (NFS).
The implementation may be used both as a General Number Field Sieve and
a Special Number Field Sieve -- the only difference is in the selection
of a suitable polynomial.
In order to make use of the Magma NFS, the user should have some
knowledge of the algorithm. The Magma NFS implementation also
requires a significant amount of memory and disk space to be available
for the duration of the factorization. For example, factorization
of an 80-digit number may require at least 64 megabytes of RAM and
half a gigabyte of disk space.
Magma's NFS implementation uses one linear polynomial (the
"rational side") and one polynomial of higher degree (the
"algebraic side"). At the time of writing this is not a major
restriction, since the best methods for selecting polynomials for
factorization of numbers of more than 100 digits involve one linear
and one non-linear polynomial. Magma provides a number of functions
to assist in choosing a good algebraic-side polynomial for the
factorization of a particular number, following the ideas of Montgomery
and Murphy in [Mur99].
Magma provides two methods for using the NFS implementation.
The first is the one-step function NFS, which provides a
naive NFS factorization attempt using default algorithm parameters.
The second, more powerful method is to work with an NFS process object,
splitting the algorithm into four stages: Sieving, Auxiliary data, Linear
algebra and Final factorization. This approach allows greater control over
the algorithm, as the user may supply their own algorithm parameter values.
It also allows the user to distribute the computationally intensive sieving
and final factorization stages over several machines or processors.
Some functions are included to allow Magma users to co-operate in
factorization attempts using CWI tools.
A verbose flag may be set to obtain informative printing on progress
in the various stages of the NFS algorithm.
Set the verbose printing level for the NFS algorithms to the integer v.
Currently the legal values for v are 0, 1, 2 and 3.
If the level is 0, no verbose output is produced.
If the level is 1, NFS will produce basic information about its progress, and
will also print information on NFS algorithm parameters.
If the level is 2, NFS will provide more detailed information about progress
and parameters.
If the level is 3, NFS will print out extremely detailed information about
progress and data. This level will only be useful for experts and developers.
Magma's Number Field Sieve implementation provides a one-step black-box
function NFS. Here, the user provides the integer n to be factored,
a homogeneous bivariate integer polynomial F and integers m1 and m2
such that F(m1, m2) ≡ 0 mod n. Magma will attempt to factor n
using F, m1 and m2, automatically selecting the other parameters
(see below) for the algorithm.
The automatically chosen parameters are NOT optimal in general,
and therefore no conclusions should be drawn about the speed of
the implementation or the algorithm itself based on the use
of this function.
For example, note that the default algebraic factor base size
of NFS is chosen to be rather large to decrease the
likelihood of running out of useful relations. This slows the
algorithm considerably, since it increases the size of the matrix
to be reduced -- but it also means that the algorithm should
succeed in finding a factor unless one chooses a really bad polynomial.
NFS(n, F, m1, m2) : RngIntElt, RngMPolElt, RngIntElt, RngIntElt -> RngIntElt
Performs the factorization of an integer n using the
Number Field Sieve with algebraic polynomial F, where
the integers m1 and m2 satisfy F(m1, m2) ≡ 0 mod n.
Returns a nontrivial factor of n if one is found, or 0 otherwise.
An NFS Process (an object of category NFSProc) encapsulates the data
of a Magma NFS factorization.
It contains the number n to be factored, the algebraic polynomial
F and the integers m1 and m2. It also provides access to a number
of NFS algorithm parameters (such as approximate factor base sizes). These
parameters are attributes of the NFS process.
If any of the parameters are not set, sensible (but not necessarily optimal)
defaults will be provided by Magma.
The NFS algorithm is divided into four stages:
- 1.
- Sieving
- 2.
- Auxiliary data gathering
- 3.
- Linear algebra
- 4.
- Factorization
The stages are described in detail below.
After creating an NFS process for the factorization attempt, the user should
proceed through each of the four stages in the above order.
Given a (composite) integer n, a bivariate homogeneous integer polynomial F,
and nonzero integers m1 and m2 such that F(m1, m2) ≡ 0 mod n, this
function creates an NFS process object for an NFS factorization of n.
The attributes associated with an NFS process are:
> ListAttributes(NFSProc);
AlgebraicError OutputFilename
AlgebraicFBBound RationalError
AlgebraicLargePrimeBound RationalFBBound
CacheSize RationalLargePrimeBound
F m1
Firstb m2
Lastb n
Maximuma
OutputFilename is the base name for NFS-generated data files.
These files (and their actual names) are discussed below.
AlgebraicFBBound is the upper bound for smooth primes in the
algebraic-side factor base, and RationalFBBound, the upper bound
for smooth primes in rational-side factor base.
Maximuma bounds the sieve interval for a: NFS will sieve for
relations with |a| ≤ Maximuma.
Firstb is the first value of b to sieve on, and Lastb is
the last.
AlgebraicLargePrimeBound gives the upper bound for "large"
(non-smooth) primes in the algebraic-side factor base. Similarly,
RationalLargePrimeBound is the upper bound for the rational side.
AlgebraicError defines an "error" tolerance for logarithm arithmetic
on algebraic side. Similarly, RationalError defines an "error"
tolerance for the rational side.
CacheSize is a flag reflecting the computer cache memory size, for
optimisation.
As a guideline for the selection of attributes, we include here
a few examples of attributes that we have determined to be good
for the Magma NFS implementation.
The best choice for the factor base size depends on many variables, including
the average log size and the Murphy α parameter (defined in [Mur99]) for the polynomial F.
Our polynomials above are quite good: if the user does not know much
about determining the quality of polynomials, then he or she should use
much larger factor bases.
Sample attributes for a 70-digit number:
> n := 5235869680233366295366904510725458053043111241035678897933802235060927;
> R<X,Y> := PolynomialRing(Integers( ), 2);
> F := 2379600*X^4 - 12052850016*X^3*Y - 13804671642407*X^2*Y^2 +
> 11449640164912254*X*Y^3 + 7965530070546332840*Y^4 ;
> m1 := 6848906180202117;
> m2 := 1;
> P := NFSProcess(n,F,m1,m2);
> P`AlgebraicFBBound := 8*10^5;
> P`RationalFBBound := 6*10^5;
> P`OutputFilename := "/tmp/nfs_70_digit";
> P`Maximuma := 4194280;
> P`AlgebraicError := 16;
> P`RationalError := 14;
Sample attributes for an 80-digit number:
> n := 1871831866357686493451122722951040222063279350383738650253906933489072\
> 2483083589;
> P<X,Y> := PolynomialRing(Integers(),2);
> F := 15901200*X^4 + 9933631795*X^3*Y - 112425819157429*X^2*Y^2 -
> 231659214929438137*X*Y^3 - 73799500175565303965*Y^4;
> m1 := 1041619817688573426;
> m2 := 1;
> P := NFSProcess(n, F, m1, m2);
> P`AlgebraicFBBound := 8*10^5;
> P`RationalFBBound := 6*10^5;
> P`OutputFilename := "/tmp/nfs_80_dgit";
> P`Maximuma := 10485760;
> P`AlgebraicError := 16;
> P`RationalError := 14;
Sample attributes for an 87-digit number:
> n := 12118618732463427472219179104631767765107839384219612469780841876821498\
> 2402918637227743;
> P<X,Y> := PolynomialRing(Integers(),2);
> F := 190512000*X^4 - 450872401242*X^3*Y +
> 1869594915648551*X^2*Y^2 + 2568544235742498*X*Y^3 -
> 9322965583419801010104*Y^4;
> m1 := 28241170741195273211;
> m2 := 1;
> P := NFSProcess(n, F, m1, m2);
> P`AlgebraicFBBound := 16*10^5;
> P`RationalFBBound := 10^6;
> P`OutputFilename := "/tmp/nfs_87_digit";
> P`Maximuma := 2^24;
> P`AlgebraicError := 24;
> P`RationalError := 18;
Magma's NFS uses a "line-by-line" (or "classical")
sieving algorithm. Future versions may include lattice sieving.
The line-by-line siever sieves values of F(a, b) on the algebraic
side and corresponding values a .m2 - b .m1 on the
rational side.
This is done by fixing a value of b (beginning with the parameter
Firstb, if supplied), then sieving all
values of a between -a0 and a0, where a0 is
approximately equal to the parameter Maximuma (some rounding
off is done to make sure that the sieve interval length is divisible
by a high power of 2). When this is completed b is incremented, and
the next value of b is processed.
The sieving continues until either the maximum value of b (specified
by the parameter Lastb) has been reached, or until enough
relations are obtained to complete the factorization. If Lastb is
not defined, the sieve simply continues until enough relations are found.
The number of relations required may be determined by the function
NumberOfRelationsRequired.
"Cycles" among partial relations are counted after every 256 iterations.
The sieve implementation uses (rounded natural) logarithms of primes
to mark the sieve interval. Moreover, the implementation does not sieve
with prime powers. Therefore, we must allow for some error in scanning
the sieve arrays for useful relations; the acceptable sieve threshold
errors for each side are defined by the AlgebraicError and
RationalError parameters.
If, in addition, the user wants to take advantage of large prime
relations (recommended), then larger error terms should be used.
The implementation will keep relations having up to 2 large primes
on each side, but will only find such relations if the user selects
large enough sieve threshold error bounds.
The user should be cautious when sieving for (and subsequently using)
relations with large primes, as they greatly increase overall disk space
requirements.
Some experimentation may be required in order to determine the best
error bounds for speed or disk space optimization purposes.
The CacheSize parameter may be used to take advantage of the cache
memory size of the computer: a value of 1 indicates a small
cache size, 2 a medium cache size, and 3 a for large cache size.
The minimum number of relations required for an NFS factor attempt with
NFS process P.
Given an NFS process P for factoring an integer n, generates relations
to factor n with the Number Field Sieve algorithm.
Returns the number of full relations plus the number of cycles found.
In this stage of the algorithm, "cycles" [LD95]
are detected in the partial relations from the sieving stage,
and quadratic characters are calculated for the relations.
This greatly improves the efficiency of the NFS.
In a typical factorization, the user should call the procedures
CreateCycleFile and CreateCharacterFile in succession.
Creates a file with all the cycle information that the NFS algorithm
requires to complete the matrix reduction and final factorization stages
for the NFS process P.
Returns the number of cycles in the partial relations of the NFS process P.
This function is mainly intended for factoring with multiple processors.
Returns the number of cycles in the partial data file corresponding
to the base file name fn.
This function is mainly intended for factoring with multiple processors.
Creates a file with the quadratic character data for the full relations
and cycles in the NFS process P.
Creates a file with the quadratic character data for the full relations
and cycles in the NFS process P.
There are cc sets of 32 quadratic character columns created.
In this stage, the relations are collected together to form a matrix,
and then block Lanczos reduction is applied to find linear dependencies
among the relations. These dependencies become candidates for factorization.
Finds dependencies between relations in the NFS process P.
In this stage, number field square roots are extracted and we attempt
to factor the dependencies found in the linear algebra stage.
Try to factor with each dependency in the NFS process P until a proper
factor is found.
Returns the factor, or 0 if no factor is found.
Attempt to factor with the k-th dependency in the NFS process P.
Returns a proper factor if found, 0 otherwise.
Many data files are ued for an NFS factorization. The user can control
the names and location of the files by specifying the OutputFilename
parameter; then all output files will have names beginning with the
OutputFilename string, with a range of suffixes depending on their
purpose.
In general, all files are appended to rather than overwritten; so to avoid
inconsistencies (and to save disk space) the user should call
RemoveFiles after a successful factorization.
When distributing factorizations, or collecting results from sieving stages
that have been broken up into several runs for some reason (for example, if
a process has been interrupted), Magma provides the function
MergeFiles. This takes a sequence of base filenames (which are
treated as if they were the value for OutputFilename), and reads in the
corresponding relation and partial relation files; it then combines the
contents of these files, removing duplicates and corrupted lines of data, and
places the results into new relation and partial relation files.
Deletes any data files created by the NFS process P.
Merges the NFS relation files named in the sequence S
(and their associated partial relation files) into a pair of
new relation and partial relation files, while removing
duplicate and corrupted lines of data; returns the number of
relations and the number of partial relations in the new output
files.
The combined full relations are stored in a file named fn,
and the partial relations in a file named fn_partials.
Here we describe the files used in a typical Magma NFS factorization.
These files all use formats peculiar to Magma's NFS.
The first kind of file created by NFS stores the relations generated in the
sieving stage by the FindRelations procedure.
The name of the file is precisely the OutputFilename string.
NFS also stores partial relations generated in the sieving stage; these are
stored in a file named OutputFilename_partials.
Whenever cycles [LD95] are counted (for example, in
CycleCount, a file named OutputFilename_cycles is
created to store them in. Some other files are also created and
then deleted during the cycle counting process.
The quadratic characters calculated in CreateCharacterFile are
stored in a file named OutputFilename_cc.
The linear algebra stage creates a file named
OutputFilename_null_space, which lists relations making up null
space vectors for the NFS matrix.
Magma provides a number of tools for distributing the sieving and
final factoring stages over a number of computers.
To distribute the sieving stage, each processor should get a
unique range of b-values to sieve and unique data file names.
During the sieving, the user must manually check when the combined
data has enough relations to factor the number.
To do this, the data files must first be merged using MergeFiles,
and then the cycles can be counted with CycleCount.
If the combined number of full relations plus the number of
cycles exceeds the size of both factor bases combined,
then the user can proceed to the other stages of the factorization
attempt using the merged data file name.
To distribute the factorization stage, the user may choose a dependency
for each process to factor, then call Factor(P,k) where P is the
NFS process and k the number specifying the dependency to factor, with
a different value of k for each process.
Here we demonstrate a distributed NFS factorization (of a very small n)
over two processes, A and B -- which may be on different machines, or
different magma processes on the same machine, or even in the same magma
process.
We begin with process A:
> R<X,Y> := PolynomialRing(Integers( ),2);
> n := 70478782497479747987234958341;
> F := 814*X^4 + 3172*X^3*Y - 49218*X^2*Y^2 - 142775*X*Y^3
> - 65862*Y^4;
> m1 := 3050411;
> m2 := 1;
> A := NFSProcess(n,F,m1,m2);
> A`Firstb := 0;
> A`Lastb := 99;
> A`OutputFilename := "/tmp/nfs-distrib-A";
> FindRelations(A);
3852
Now, process B, with n, F, m1 and m2 as above:
> B := NFSProcess(n,F,m1,m2);
> B`Firstb := 99;
> B`Lastb := 199;
> B`OutputFilename := "/tmp/nfs-distrib-B";
> FindRelations(B);
2455
Then later, on a single machine,
> input_files := ["/tmp/nfs-distrib-A","/tmp/nfs-distrib-B"];
> P := NFSProcess(n,F,m1,m2);
> P`OutputFilename := "/tmp/nfs-distrib-all";
> MergeFiles(input_files, P`OutputFilename);
4162 25925
> CycleCount(P);
4368
> CreateCycleFile(P);
> CreateCharacterFile(P);
> FindDependencies(P);
Now, the final factorization stage may be distributed over more than one
processor also. We attempt to factor a relation on A:
> A`OutputFilename := "/tmp/nfs-distrib-all";
> Factor(A,9); // factor dependency 9
0
No factor was found on machine A, but meantime on B:
> B`OutputFilename := "/tmp/nfs-distrib-all";
> Factor(P,1); // factor dependency 1
94899629
> n mod $1, n div $1;
0 742666575624650207929
We have a successful factorisation.
At the time of writing, the record NFS factorizations were lead
by the CWI group and by people using CWI's or Arjen Lenstra's code.
The CWI tools use a different data file format to Magma's native
format, but Magma supplies some tools to allow users to assist in
CWI factorization attempts.
The user may generate relations in CWI relation format, rather than
Magma native format, by using FindRelationsInCWIFormat.
The user should note that relations in CWI format cannot at present
be used in the Auxiliary data, Linear algebra or Factorization
stages of the Magma NFS.
Alternatively, assuming some Magma NFS relations have already
been computed for a process, then the user may use the procedure
ConvertToCWIFormat to convert the relation data files
from Magma native format to CWI format.
The resulting data file is named OutputFilename_CWI_format,
and will contain both the full and partial relations of the process.
Given an NFS process P for factoring an integer n, generates relations
to factor n with the Number Field Sieve algorithm, in a file format
suitable for use with CWI's NFS tools. Returns the number of relations
found.
Converts the relation files of the NFS process P to CWI format,
storing primes only greater than or equal to the prime printing
bound pb. The resulting data file name will be named
P`OutputFilename_CWI_format.
Magma does not provide a function to select
an optimal polynomial for the factorization of a given number.
However, Magma does provide some functions that are useful for
the implementation of the polynomial selection algorithms developed
by Peter Montgomery and Brian Murphy in
[Mur99].
The functions BaseMPolynomial, MurphyAlphaApproximation,
OptimalSkewness, BestTranslation, PolynomialSieve,
and DickmanRho, will be useful for those wanting to implement
polynomial selection routines within the Magma interpreter language.
Given integers n, m and d, returns a homogeneous bivariate polynomial
F = ∑i=0d ciXiYd - i such that the coefficients ci give
a base m representation of n: that is, ∑i=0d cimi = n. The
coefficients also satisfy |ci| ≤m/2.
This polynomial F may be used to factorize n using the number field
sieve (with m1 := m and m2 := 1).
This function requires that d ≥2 and n ≥md.
Given a univariate or homogeneous bivariate polynomial F, return
an approximation of the α value of F,
using primes less than the positive integer bound b.
The α value of a polynomial is defined in [Mur99].
Since random sampling is used for primes dividing the discriminant,
successive calls to this function will give slightly different results.
Given a univariate or homogeneous bivariate polynomial F,
return its optimal skewness and corresponding average log size.
The optimal skewness and average log size values are defined in [Mur99].
This example illustrates an effective (though not optimal) method
for finding a "good" polynomial for use in NFS factorizations.
Here we search for a degree d=4 polynomial to use in factoring
a 52-digit integer n.
We define the rating of a polynomial to be the sum of the α value
and corresponding "average log size" (see [Mur99]).
We then proceed by
iterating over base m polynomials with successive leading coefficients
(with the values of m near ((m)1/d(m)^(1/(d + 1)))1/2, and
chosen to minimize the second-to-leading coefficient),
and choosing as a result the polynomial with the smallest rating.
> n := RandomPrime(90)*RandomPrime(90);
> n;
3596354707256253204076739374167770148715218949803889
> d := 4;
> approx_m := Iroot( Iroot( n, d+1 ) * Iroot( n, d ) , 2 );
> leading_coeff := n div approx_m^d;
> leading_coeff;
143082
> m := Iroot( n div leading_coeff, d );
> P<X,Y> := PolynomialRing( Integers(), 2 );
> F<X,Y> := BaseMPolynomial(n,m,d);
> F;
143082*X^4 + 463535*X^3*Y - 173869838910*X^2*Y^2 + 167201617413*X*Y^3 +
159859288415*Y^4
> skew, als := OptimalSkewness( F );
> alpha := MurphyAlphaApproximation( F, 2000 );
> rating := als + alpha;
> rating;
23.143714548914575193314917
>
> best_rating := rating;
> best_m := m;
> for i in [1..100] do
> leading_coeff := leading_coeff + 1;
> m := Iroot( n div leading_coeff, d );
> F<X,Y> := BaseMPolynomial(n,m,d);
> skew, als := OptimalSkewness( F );
> alpha := MurphyAlphaApproximation( F, 2000 );
> rating := als + alpha;
> if rating lt best_rating then
> best_rating := rating;
> best_m := m;
> end if;
> end for;
> best_rating;
20.899568473033257031950385
> best_m;
398116527578
> F<X,Y> := BaseMPolynomial(n,best_m,d);
> F;
143160*X^4 + 199085*X^3*Y - 9094377652*X^2*Y^2 - 93898749030*X*Y^3 -
169859083883*Y^4
> OptimalSkewness( F );
165.514255523681640625 20.969934467920612180646408
> MurphyAlphaApproximation( F, 2000 );
-0.0542716157630141449500150842
> time NFS( n, F, best_m, 1 );
...
Given a univariate or homogeneous bivariate polynomial F,
an integer m, and real value a (which should be the average
log size of F for some optimal skewness), returns a polynomial
G and an integer m' such that G(m') = F(m), together with
the average log size and optimal skewness of G. The translation
G is selected such that the average log size is a local minimum.
PrimeBound: RngIntElt Default: 1000
Given a homogeneous bivariate integer polynomial F of degree d,
together with integers m, J0 and J1 and a real value MaxAlpha,
returns a list of tuples, each of which contains a polynomial
G = F + j1x2yd - 2 - (|j0| + j1m)xyd - 1 + (j0m)yd,
where |j0| ≤J0 and |j1| ≤J1 such that the α
value (see [Mur99]) of G is "better" (that is, lower) than
MaxAlpha.
Each tuple contains the data
<average log size + α, skewness, α,
G, m, j0, j1 >.
If the optional parameter PrimeBound is set, it is used as an upper
bound for primes used to calculate α.
[Next][Prev] [Right] [Left] [Up] [Index] [Root]
|