kmx git

Commit ddddcf2e71e55796313d61661eb9666e9d52a5df

2017-05-09T14:01:11
move all documentation to doc/
diff --git a/bn.tex b/bn.tex
deleted file mode 100644
index 5804318..0000000
--- a/bn.tex
+++ /dev/null
@@ -1,1913 +0,0 @@
-\documentclass[synpaper]{book}
-\usepackage{hyperref}
-\usepackage{makeidx}
-\usepackage{amssymb}
-\usepackage{color}
-\usepackage{alltt}
-\usepackage{graphicx}
-\usepackage{layout}
-\def\union{\cup}
-\def\intersect{\cap}
-\def\getsrandom{\stackrel{\rm R}{\gets}}
-\def\cross{\times}
-\def\cat{\hspace{0.5em} \| \hspace{0.5em}}
-\def\catn{$\|$}
-\def\divides{\hspace{0.3em} | \hspace{0.3em}}
-\def\nequiv{\not\equiv}
-\def\approx{\raisebox{0.2ex}{\mbox{\small $\sim$}}}
-\def\lcm{{\rm lcm}}
-\def\gcd{{\rm gcd}}
-\def\log{{\rm log}}
-\def\ord{{\rm ord}}
-\def\abs{{\mathit abs}}
-\def\rep{{\mathit rep}}
-\def\mod{{\mathit\ mod\ }}
-\renewcommand{\pmod}[1]{\ ({\rm mod\ }{#1})}
-\newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}
-\newcommand{\ceil}[1]{\left\lceil{#1}\right\rceil}
-\def\Or{{\rm\ or\ }}
-\def\And{{\rm\ and\ }}
-\def\iff{\hspace{1em}\Longleftrightarrow\hspace{1em}}
-\def\implies{\Rightarrow}
-\def\undefined{{\rm ``undefined"}}
-\def\Proof{\vspace{1ex}\noindent {\bf Proof:}\hspace{1em}}
-\let\oldphi\phi
-\def\phi{\varphi}
-\def\Pr{{\rm Pr}}
-\newcommand{\str}[1]{{\mathbf{#1}}}
-\def\F{{\mathbb F}}
-\def\N{{\mathbb N}}
-\def\Z{{\mathbb Z}}
-\def\R{{\mathbb R}}
-\def\C{{\mathbb C}}
-\def\Q{{\mathbb Q}}
-\definecolor{DGray}{gray}{0.5}
-\newcommand{\emailaddr}[1]{\mbox{$<${#1}$>$}}
-\def\twiddle{\raisebox{0.3ex}{\mbox{\tiny $\sim$}}}
-\def\gap{\vspace{0.5ex}}
-\makeindex
-\begin{document}
-\frontmatter
-\pagestyle{empty}
-\title{LibTomMath User Manual \\ v1.0}
-\author{Tom St Denis \\ tstdenis82@gmail.com}
-\maketitle
-This text, the library and the accompanying textbook are all hereby placed in the public domain.  This book has been
-formatted for B5 [176x250] paper using the \LaTeX{} {\em book} macro package.
-
-\vspace{10cm}
-
-\begin{flushright}Open Source.  Open Academia.  Open Minds.
-
-\mbox{ }
-
-Tom St Denis,
-
-Ontario, Canada
-\end{flushright}
-
-\tableofcontents
-\listoffigures
-\mainmatter
-\pagestyle{headings}
-\chapter{Introduction}
-\section{What is LibTomMath?}
-LibTomMath is a library of source code which provides a series of efficient and carefully written functions for manipulating
-large integer numbers.  It was written in portable ISO C source code so that it will build on any platform with a conforming
-C compiler.
-
-In a nutshell the library was written from scratch with verbose comments to help instruct computer science students how
-to implement ``bignum'' math.  However, the resulting code has proven to be very useful.  It has been used by numerous
-universities, commercial and open source software developers.  It has been used on a variety of platforms ranging from
-Linux and Windows based x86 to ARM based Gameboys and PPC based MacOS machines.
-
-\section{License}
-As of the v0.25 the library source code has been placed in the public domain with every new release.  As of the v0.28
-release the textbook ``Implementing Multiple Precision Arithmetic'' has been placed in the public domain with every new
-release as well.  This textbook is meant to compliment the project by providing a more solid walkthrough of the development
-algorithms used in the library.
-
-Since both\footnote{Note that the MPI files under mtest/ are copyrighted by Michael Fromberger.  They are not required to use LibTomMath.} are in the
-public domain everyone is entitled to do with them as they see fit.
-
-\section{Building LibTomMath}
-
-LibTomMath is meant to be very ``GCC friendly'' as it comes with a makefile well suited for GCC.  However, the library will
-also build in MSVC, Borland C out of the box.  For any other ISO C compiler a makefile will have to be made by the end
-developer.
-
-\subsection{Static Libraries}
-To build as a static library for GCC issue the following
-\begin{alltt}
-make
-\end{alltt}
-
-command.  This will build the library and archive the object files in ``libtommath.a''.  Now you link against
-that and include ``tommath.h'' within your programs.  Alternatively to build with MSVC issue the following
-\begin{alltt}
-nmake -f makefile.msvc
-\end{alltt}
-
-This will build the library and archive the object files in ``tommath.lib''.  This has been tested with MSVC
-version 6.00 with service pack 5.
-
-\subsection{Shared Libraries}
-To build as a shared library for GCC issue the following
-\begin{alltt}
-make -f makefile.shared
-\end{alltt}
-This requires the ``libtool'' package (common on most Linux/BSD systems).  It will build LibTomMath as both shared
-and static then install (by default) into /usr/lib as well as install the header files in /usr/include.  The shared
-library (resource) will be called ``libtommath.la'' while the static library called ``libtommath.a''.  Generally
-you use libtool to link your application against the shared object.
-
-There is limited support for making a ``DLL'' in windows via the ``makefile.cygwin\_dll'' makefile.  It requires
-Cygwin to work with since it requires the auto-export/import functionality.  The resulting DLL and import library
-``libtommath.dll.a'' can be used to link LibTomMath dynamically to any Windows program using Cygwin.
-
-\subsection{Testing}
-To build the library and the test harness type
-
-\begin{alltt}
-make test
-\end{alltt}
-
-This will build the library, ``test'' and ``mtest/mtest''.  The ``test'' program will accept test vectors and verify the
-results.  ``mtest/mtest'' will generate test vectors using the MPI library by Michael Fromberger\footnote{A copy of MPI
-is included in the package}.  Simply pipe mtest into test using
-
-\begin{alltt}
-mtest/mtest | test
-\end{alltt}
-
-If you do not have a ``/dev/urandom'' style RNG source you will have to write your own PRNG and simply pipe that into
-mtest.  For example, if your PRNG program is called ``myprng'' simply invoke
-
-\begin{alltt}
-myprng | mtest/mtest | test
-\end{alltt}
-
-This will output a row of numbers that are increasing.  Each column is a different test (such as addition, multiplication, etc)
-that is being performed.  The numbers represent how many times the test was invoked.  If an error is detected the program
-will exit with a dump of the relevent numbers it was working with.
-
-\section{Build Configuration}
-LibTomMath can configured at build time in three phases we shall call ``depends'', ``tweaks'' and ``trims''.
-Each phase changes how the library is built and they are applied one after another respectively.
-
-To make the system more powerful you can tweak the build process.  Classes are defined in the file
-``tommath\_superclass.h''.  By default, the symbol ``LTM\_ALL'' shall be defined which simply
-instructs the system to build all of the functions.  This is how LibTomMath used to be packaged.  This will give you
-access to every function LibTomMath offers.
-
-However, there are cases where such a build is not optional.  For instance, you want to perform RSA operations.  You
-don't need the vast majority of the library to perform these operations.  Aside from LTM\_ALL there is
-another pre--defined class ``SC\_RSA\_1'' which works in conjunction with the RSA from LibTomCrypt.  Additional
-classes can be defined base on the need of the user.
-
-\subsection{Build Depends}
-In the file tommath\_class.h you will see a large list of C ``defines'' followed by a series of ``ifdefs''
-which further define symbols.  All of the symbols (technically they're macros $\ldots$) represent a given C source
-file.  For instance, BN\_MP\_ADD\_C represents the file ``bn\_mp\_add.c''.  When a define has been enabled the
-function in the respective file will be compiled and linked into the library.  Accordingly when the define
-is absent the file will not be compiled and not contribute any size to the library.
-
-You will also note that the header tommath\_class.h is actually recursively included (it includes itself twice).
-This is to help resolve as many dependencies as possible.  In the last pass the symbol LTM\_LAST will be defined.
-This is useful for ``trims''.
-
-\subsection{Build Tweaks}
-A tweak is an algorithm ``alternative''.  For example, to provide tradeoffs (usually between size and space).
-They can be enabled at any pass of the configuration phase.
-
-\begin{small}
-\begin{center}
-\begin{tabular}{|l|l|}
-\hline \textbf{Define} & \textbf{Purpose} \\
-\hline BN\_MP\_DIV\_SMALL & Enables a slower, smaller and equally \\
-                          & functional mp\_div() function \\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-
-\subsection{Build Trims}
-A trim is a manner of removing functionality from a function that is not required.  For instance, to perform
-RSA cryptography you only require exponentiation with odd moduli so even moduli support can be safely removed.
-Build trims are meant to be defined on the last pass of the configuration which means they are to be defined
-only if LTM\_LAST has been defined.
-
-\subsubsection{Moduli Related}
-\begin{small}
-\begin{center}
-\begin{tabular}{|l|l|}
-\hline \textbf{Restriction} & \textbf{Undefine} \\
-\hline Exponentiation with odd moduli only & BN\_S\_MP\_EXPTMOD\_C \\
-                                           & BN\_MP\_REDUCE\_C \\
-                                           & BN\_MP\_REDUCE\_SETUP\_C \\
-                                           & BN\_S\_MP\_MUL\_HIGH\_DIGS\_C \\
-                                           & BN\_FAST\_S\_MP\_MUL\_HIGH\_DIGS\_C \\
-\hline Exponentiation with random odd moduli & (The above plus the following) \\
-                                           & BN\_MP\_REDUCE\_2K\_C \\
-                                           & BN\_MP\_REDUCE\_2K\_SETUP\_C \\
-                                           & BN\_MP\_REDUCE\_IS\_2K\_C \\
-                                           & BN\_MP\_DR\_IS\_MODULUS\_C \\
-                                           & BN\_MP\_DR\_REDUCE\_C \\
-                                           & BN\_MP\_DR\_SETUP\_C \\
-\hline Modular inverse odd moduli only     & BN\_MP\_INVMOD\_SLOW\_C \\
-\hline Modular inverse (both, smaller/slower) & BN\_FAST\_MP\_INVMOD\_C \\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-
-\subsubsection{Operand Size Related}
-\begin{small}
-\begin{center}
-\begin{tabular}{|l|l|}
-\hline \textbf{Restriction} & \textbf{Undefine} \\
-\hline Moduli $\le 2560$ bits              & BN\_MP\_MONTGOMERY\_REDUCE\_C \\
-                                           & BN\_S\_MP\_MUL\_DIGS\_C \\
-                                           & BN\_S\_MP\_MUL\_HIGH\_DIGS\_C \\
-                                           & BN\_S\_MP\_SQR\_C \\
-\hline Polynomial Schmolynomial            & BN\_MP\_KARATSUBA\_MUL\_C \\
-                                           & BN\_MP\_KARATSUBA\_SQR\_C \\
-                                           & BN\_MP\_TOOM\_MUL\_C \\
-                                           & BN\_MP\_TOOM\_SQR\_C \\
-
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-
-
-\section{Purpose of LibTomMath}
-Unlike  GNU MP (GMP) Library, LIP, OpenSSL or various other commercial kits (Miracl), LibTomMath was not written with
-bleeding edge performance in mind.  First and foremost LibTomMath was written to be entirely open.  Not only is the
-source code public domain (unlike various other GPL/etc licensed code), not only is the code freely downloadable but the
-source code is also accessible for computer science students attempting to learn ``BigNum'' or multiple precision
-arithmetic techniques.
-
-LibTomMath was written to be an instructive collection of source code.  This is why there are many comments, only one
-function per source file and often I use a ``middle-road'' approach where I don't cut corners for an extra 2\% speed
-increase.
-
-Source code alone cannot really teach how the algorithms work which is why I also wrote a textbook that accompanies
-the library (beat that!).
-
-So you may be thinking ``should I use LibTomMath?'' and the answer is a definite maybe.  Let me tabulate what I think
-are the pros and cons of LibTomMath by comparing it to the math routines from GnuPG\footnote{GnuPG v1.2.3 versus LibTomMath v0.28}.
-
-\newpage\begin{figure}[here]
-\begin{small}
-\begin{center}
-\begin{tabular}{|l|c|c|l|}
-\hline \textbf{Criteria} & \textbf{Pro} & \textbf{Con} & \textbf{Notes} \\
-\hline Few lines of code per file & X & & GnuPG $ = 300.9$, LibTomMath  $ = 71.97$ \\
-\hline Commented function prototypes & X && GnuPG function names are cryptic. \\
-\hline Speed && X & LibTomMath is slower.  \\
-\hline Totally free & X & & GPL has unfavourable restrictions.\\
-\hline Large function base & X & & GnuPG is barebones. \\
-\hline Five modular reduction algorithms & X & & Faster modular exponentiation for a variety of moduli. \\
-\hline Portable & X & & GnuPG requires configuration to build. \\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-\caption{LibTomMath Valuation}
-\end{figure}
-
-It may seem odd to compare LibTomMath to GnuPG since the math in GnuPG is only a small portion of the entire application.
-However, LibTomMath was written with cryptography in mind.  It provides essentially all of the functions a cryptosystem
-would require when working with large integers.
-
-So it may feel tempting to just rip the math code out of GnuPG (or GnuMP where it was taken from originally) in your
-own application but I think there are reasons not to.  While LibTomMath is slower than libraries such as GnuMP it is
-not normally significantly slower.  On x86 machines the difference is normally a factor of two when performing modular
-exponentiations.  It depends largely on the processor, compiler and the moduli being used.
-
-Essentially the only time you wouldn't use LibTomMath is when blazing speed is the primary concern.  However,
-on the other side of the coin LibTomMath offers you a totally free (public domain) well structured math library
-that is very flexible, complete and performs well in resource contrained environments.  Fast RSA for example can
-be performed with as little as 8KB of ram for data (again depending on build options).
-
-\chapter{Getting Started with LibTomMath}
-\section{Building Programs}
-In order to use LibTomMath you must include ``tommath.h'' and link against the appropriate library file (typically
-libtommath.a).  There is no library initialization required and the entire library is thread safe.
-
-\section{Return Codes}
-There are three possible return codes a function may return.
-
-\index{MP\_OKAY}\index{MP\_YES}\index{MP\_NO}\index{MP\_VAL}\index{MP\_MEM}
-\begin{figure}[here!]
-\begin{center}
-\begin{small}
-\begin{tabular}{|l|l|}
-\hline \textbf{Code} & \textbf{Meaning} \\
-\hline MP\_OKAY & The function succeeded. \\
-\hline MP\_VAL  & The function input was invalid. \\
-\hline MP\_MEM  & Heap memory exhausted. \\
-\hline &\\
-\hline MP\_YES  & Response is yes. \\
-\hline MP\_NO   & Response is no. \\
-\hline
-\end{tabular}
-\end{small}
-\end{center}
-\caption{Return Codes}
-\end{figure}
-
-The last two codes listed are not actually ``return'ed'' by a function.  They are placed in an integer (the caller must
-provide the address of an integer it can store to) which the caller can access.  To convert one of the three return codes
-to a string use the following function.
-
-\index{mp\_error\_to\_string}
-\begin{alltt}
-char *mp_error_to_string(int code);
-\end{alltt}
-
-This will return a pointer to a string which describes the given error code.  It will not work for the return codes
-MP\_YES and MP\_NO.
-
-\section{Data Types}
-The basic ``multiple precision integer'' type is known as the ``mp\_int'' within LibTomMath.  This data type is used to
-organize all of the data required to manipulate the integer it represents.  Within LibTomMath it has been prototyped
-as the following.
-
-\index{mp\_int}
-\begin{alltt}
-typedef struct  \{
-    int used, alloc, sign;
-    mp_digit *dp;
-\} mp_int;
-\end{alltt}
-
-Where ``mp\_digit'' is a data type that represents individual digits of the integer.  By default, an mp\_digit is the
-ISO C ``unsigned long'' data type and each digit is $28-$bits long.  The mp\_digit type can be configured to suit other
-platforms by defining the appropriate macros.
-
-All LTM functions that use the mp\_int type will expect a pointer to mp\_int structure.  You must allocate memory to
-hold the structure itself by yourself (whether off stack or heap it doesn't matter).  The very first thing that must be
-done to use an mp\_int is that it must be initialized.
-
-\section{Function Organization}
-
-The arithmetic functions of the library are all organized to have the same style prototype.  That is source operands
-are passed on the left and the destination is on the right.  For instance,
-
-\begin{alltt}
-mp_add(&a, &b, &c);       /* c = a + b */
-mp_mul(&a, &a, &c);       /* c = a * a */
-mp_div(&a, &b, &c, &d);   /* c = [a/b], d = a mod b */
-\end{alltt}
-
-Another feature of the way the functions have been implemented is that source operands can be destination operands as well.
-For instance,
-
-\begin{alltt}
-mp_add(&a, &b, &b);       /* b = a + b */
-mp_div(&a, &b, &a, &c);   /* a = [a/b], c = a mod b */
-\end{alltt}
-
-This allows operands to be re-used which can make programming simpler.
-
-\section{Initialization}
-\subsection{Single Initialization}
-A single mp\_int can be initialized with the ``mp\_init'' function.
-
-\index{mp\_init}
-\begin{alltt}
-int mp_init (mp_int * a);
-\end{alltt}
-
-This function expects a pointer to an mp\_int structure and will initialize the members of the structure so the mp\_int
-represents the default integer which is zero.  If the functions returns MP\_OKAY then the mp\_int is ready to be used
-by the other LibTomMath functions.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* use the number */
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-\subsection{Single Free}
-When you are finished with an mp\_int it is ideal to return the heap it used back to the system.  The following function
-provides this functionality.
-
-\index{mp\_clear}
-\begin{alltt}
-void mp_clear (mp_int * a);
-\end{alltt}
-
-The function expects a pointer to a previously initialized mp\_int structure and frees the heap it uses.  It sets the
-pointer\footnote{The ``dp'' member.} within the mp\_int to \textbf{NULL} which is used to prevent double free situations.
-Is is legal to call mp\_clear() twice on the same mp\_int in a row.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* use the number */
-
-   /* We're done with it. */
-   mp_clear(&number);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-\subsection{Multiple Initializations}
-Certain algorithms require more than one large integer.  In these instances it is ideal to initialize all of the mp\_int
-variables in an ``all or nothing'' fashion.  That is, they are either all initialized successfully or they are all
-not initialized.
-
-The  mp\_init\_multi() function provides this functionality.
-
-\index{mp\_init\_multi} \index{mp\_clear\_multi}
-\begin{alltt}
-int mp_init_multi(mp_int *mp, ...);
-\end{alltt}
-
-It accepts a \textbf{NULL} terminated list of pointers to mp\_int structures.  It will attempt to initialize them all
-at once.  If the function returns MP\_OKAY then all of the mp\_int variables are ready to use, otherwise none of them
-are available for use.  A complementary mp\_clear\_multi() function allows multiple mp\_int variables to be free'd
-from the heap at the same time.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int num1, num2, num3;
-   int result;
-
-   if ((result = mp_init_multi(&num1,
-                               &num2,
-                               &num3, NULL)) != MP\_OKAY) \{
-      printf("Error initializing the numbers.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* use the numbers */
-
-   /* We're done with them. */
-   mp_clear_multi(&num1, &num2, &num3, NULL);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-\subsection{Other Initializers}
-To initialized and make a copy of an mp\_int the mp\_init\_copy() function has been provided.
-
-\index{mp\_init\_copy}
-\begin{alltt}
-int mp_init_copy (mp_int * a, mp_int * b);
-\end{alltt}
-
-This function will initialize $a$ and make it a copy of $b$ if all goes well.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int num1, num2;
-   int result;
-
-   /* initialize and do work on num1 ... */
-
-   /* We want a copy of num1 in num2 now */
-   if ((result = mp_init_copy(&num2, &num1)) != MP_OKAY) \{
-     printf("Error initializing the copy.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* now num2 is ready and contains a copy of num1 */
-
-   /* We're done with them. */
-   mp_clear_multi(&num1, &num2, NULL);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-Another less common initializer is mp\_init\_size() which allows the user to initialize an mp\_int with a given
-default number of digits.  By default, all initializers allocate \textbf{MP\_PREC} digits.  This function lets
-you override this behaviour.
-
-\index{mp\_init\_size}
-\begin{alltt}
-int mp_init_size (mp_int * a, int size);
-\end{alltt}
-
-The $size$ parameter must be greater than zero.  If the function succeeds the mp\_int $a$ will be initialized
-to have $size$ digits (which are all initially zero).
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   /* we need a 60-digit number */
-   if ((result = mp_init_size(&number, 60)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* use the number */
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-\section{Maintenance Functions}
-
-\subsection{Reducing Memory Usage}
-When an mp\_int is in a state where it won't be changed again\footnote{A Diffie-Hellman modulus for instance.} excess
-digits can be removed to return memory to the heap with the mp\_shrink() function.
-
-\index{mp\_shrink}
-\begin{alltt}
-int mp_shrink (mp_int * a);
-\end{alltt}
-
-This will remove excess digits of the mp\_int $a$.  If the operation fails the mp\_int should be intact without the
-excess digits being removed.  Note that you can use a shrunk mp\_int in further computations, however, such operations
-will require heap operations which can be slow.  It is not ideal to shrink mp\_int variables that you will further
-modify in the system (unless you are seriously low on memory).
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* use the number [e.g. pre-computation]  */
-
-   /* We're done with it for now. */
-   if ((result = mp_shrink(&number)) != MP_OKAY) \{
-      printf("Error shrinking the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* use it .... */
-
-
-   /* we're done with it. */
-   mp_clear(&number);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-\subsection{Adding additional digits}
-
-Within the mp\_int structure are two parameters which control the limitations of the array of digits that represent
-the integer the mp\_int is meant to equal.   The \textit{used} parameter dictates how many digits are significant, that is,
-contribute to the value of the mp\_int.  The \textit{alloc} parameter dictates how many digits are currently available in
-the array.  If you need to perform an operation that requires more digits you will have to mp\_grow() the mp\_int to
-your desired size.
-
-\index{mp\_grow}
-\begin{alltt}
-int mp_grow (mp_int * a, int size);
-\end{alltt}
-
-This will grow the array of digits of $a$ to $size$.  If the \textit{alloc} parameter is already bigger than
-$size$ the function will not do anything.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* use the number */
-
-   /* We need to add 20 digits to the number  */
-   if ((result = mp_grow(&number, number.alloc + 20)) != MP_OKAY) \{
-      printf("Error growing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-
-   /* use the number */
-
-   /* we're done with it. */
-   mp_clear(&number);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-\chapter{Basic Operations}
-\section{Small Constants}
-Setting mp\_ints to small constants is a relatively common operation.  To accomodate these instances there are two
-small constant assignment functions.  The first function is used to set a single digit constant while the second sets
-an ISO C style ``unsigned long'' constant.  The reason for both functions is efficiency.  Setting a single digit is quick but the
-domain of a digit can change (it's always at least $0 \ldots 127$).
-
-\subsection{Single Digit}
-
-Setting a single digit can be accomplished with the following function.
-
-\index{mp\_set}
-\begin{alltt}
-void mp_set (mp_int * a, mp_digit b);
-\end{alltt}
-
-This will zero the contents of $a$ and make it represent an integer equal to the value of $b$.  Note that this
-function has a return type of \textbf{void}.  It cannot cause an error so it is safe to assume the function
-succeeded.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* set the number to 5 */
-   mp_set(&number, 5);
-
-   /* we're done with it. */
-   mp_clear(&number);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-\subsection{Long Constants}
-
-To set a constant that is the size of an ISO C ``unsigned long'' and larger than a single digit the following function
-can be used.
-
-\index{mp\_set\_int}
-\begin{alltt}
-int mp_set_int (mp_int * a, unsigned long b);
-\end{alltt}
-
-This will assign the value of the 32-bit variable $b$ to the mp\_int $a$.  Unlike mp\_set() this function will always
-accept a 32-bit input regardless of the size of a single digit.  However, since the value may span several digits
-this function can fail if it runs out of heap memory.
-
-To get the ``unsigned long'' copy of an mp\_int the following function can be used.
-
-\index{mp\_get\_int}
-\begin{alltt}
-unsigned long mp_get_int (mp_int * a);
-\end{alltt}
-
-This will return the 32 least significant bits of the mp\_int $a$.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* set the number to 654321 (note this is bigger than 127) */
-   if ((result = mp_set_int(&number, 654321)) != MP_OKAY) \{
-      printf("Error setting the value of the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   printf("number == \%lu", mp_get_int(&number));
-
-   /* we're done with it. */
-   mp_clear(&number);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-This should output the following if the program succeeds.
-
-\begin{alltt}
-number == 654321
-\end{alltt}
-
-\subsection{Long Constants - platform dependant}
-
-\index{mp\_set\_long}
-\begin{alltt}
-int mp_set_long (mp_int * a, unsigned long b);
-\end{alltt}
-
-This will assign the value of the platform-dependant sized variable $b$ to the mp\_int $a$.
-
-To get the ``unsigned long'' copy of an mp\_int the following function can be used.
-
-\index{mp\_get\_long}
-\begin{alltt}
-unsigned long mp_get_long (mp_int * a);
-\end{alltt}
-
-This will return the least significant bits of the mp\_int $a$ that fit into an ``unsigned long''.
-
-\subsection{Long Long Constants}
-
-\index{mp\_set\_long\_long}
-\begin{alltt}
-int mp_set_long_long (mp_int * a, unsigned long long b);
-\end{alltt}
-
-This will assign the value of the 64-bit variable $b$ to the mp\_int $a$.
-
-To get the ``unsigned long long'' copy of an mp\_int the following function can be used.
-
-\index{mp\_get\_long\_long}
-\begin{alltt}
-unsigned long long mp_get_long_long (mp_int * a);
-\end{alltt}
-
-This will return the 64 least significant bits of the mp\_int $a$.
-
-\subsection{Initialize and Setting Constants}
-To both initialize and set small constants the following two functions are available.
-\index{mp\_init\_set} \index{mp\_init\_set\_int}
-\begin{alltt}
-int mp_init_set (mp_int * a, mp_digit b);
-int mp_init_set_int (mp_int * a, unsigned long b);
-\end{alltt}
-
-Both functions work like the previous counterparts except they first mp\_init $a$ before setting the values.
-
-\begin{alltt}
-int main(void)
-\{
-   mp_int number1, number2;
-   int    result;
-
-   /* initialize and set a single digit */
-   if ((result = mp_init_set(&number1, 100)) != MP_OKAY) \{
-      printf("Error setting number1: \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* initialize and set a long */
-   if ((result = mp_init_set_int(&number2, 1023)) != MP_OKAY) \{
-      printf("Error setting number2: \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* display */
-   printf("Number1, Number2 == \%lu, \%lu",
-          mp_get_int(&number1), mp_get_int(&number2));
-
-   /* clear */
-   mp_clear_multi(&number1, &number2, NULL);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt}
-
-If this program succeeds it shall output.
-\begin{alltt}
-Number1, Number2 == 100, 1023
-\end{alltt}
-
-\section{Comparisons}
-
-Comparisons in LibTomMath are always performed in a ``left to right'' fashion.  There are three possible return codes
-for any comparison.
-
-\index{MP\_GT} \index{MP\_EQ} \index{MP\_LT}
-\begin{figure}[here]
-\begin{center}
-\begin{tabular}{|c|c|}
-\hline \textbf{Result Code} & \textbf{Meaning} \\
-\hline MP\_GT & $a > b$ \\
-\hline MP\_EQ & $a = b$ \\
-\hline MP\_LT & $a < b$ \\
-\hline
-\end{tabular}
-\end{center}
-\caption{Comparison Codes for $a, b$}
-\label{fig:CMP}
-\end{figure}
-
-In figure \ref{fig:CMP} two integers $a$ and $b$ are being compared.  In this case $a$ is said to be ``to the left'' of
-$b$.
-
-\subsection{Unsigned comparison}
-
-An unsigned comparison considers only the digits themselves and not the associated \textit{sign} flag of the
-mp\_int structures.  This is analogous to an absolute comparison.  The function mp\_cmp\_mag() will compare two
-mp\_int variables based on their digits only.
-
-\index{mp\_cmp\_mag}
-\begin{alltt}
-int mp_cmp_mag(mp_int * a, mp_int * b);
-\end{alltt}
-This will compare $a$ to $b$ placing $a$ to the left of $b$.  This function cannot fail and will return one of the
-three compare codes listed in figure \ref{fig:CMP}.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number1, number2;
-   int result;
-
-   if ((result = mp_init_multi(&number1, &number2, NULL)) != MP_OKAY) \{
-      printf("Error initializing the numbers.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* set the number1 to 5 */
-   mp_set(&number1, 5);
-
-   /* set the number2 to -6 */
-   mp_set(&number2, 6);
-   if ((result = mp_neg(&number2, &number2)) != MP_OKAY) \{
-      printf("Error negating number2.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   switch(mp_cmp_mag(&number1, &number2)) \{
-       case MP_GT:  printf("|number1| > |number2|"); break;
-       case MP_EQ:  printf("|number1| = |number2|"); break;
-       case MP_LT:  printf("|number1| < |number2|"); break;
-   \}
-
-   /* we're done with it. */
-   mp_clear_multi(&number1, &number2, NULL);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-If this program\footnote{This function uses the mp\_neg() function which is discussed in section \ref{sec:NEG}.} completes
-successfully it should print the following.
-
-\begin{alltt}
-|number1| < |number2|
-\end{alltt}
-
-This is because $\vert -6 \vert = 6$ and obviously $5 < 6$.
-
-\subsection{Signed comparison}
-
-To compare two mp\_int variables based on their signed value the mp\_cmp() function is provided.
-
-\index{mp\_cmp}
-\begin{alltt}
-int mp_cmp(mp_int * a, mp_int * b);
-\end{alltt}
-
-This will compare $a$ to the left of $b$.  It will first compare the signs of the two mp\_int variables.  If they
-differ it will return immediately based on their signs.  If the signs are equal then it will compare the digits
-individually.  This function will return one of the compare conditions codes listed in figure \ref{fig:CMP}.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number1, number2;
-   int result;
-
-   if ((result = mp_init_multi(&number1, &number2, NULL)) != MP_OKAY) \{
-      printf("Error initializing the numbers.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* set the number1 to 5 */
-   mp_set(&number1, 5);
-
-   /* set the number2 to -6 */
-   mp_set(&number2, 6);
-   if ((result = mp_neg(&number2, &number2)) != MP_OKAY) \{
-      printf("Error negating number2.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   switch(mp_cmp(&number1, &number2)) \{
-       case MP_GT:  printf("number1 > number2"); break;
-       case MP_EQ:  printf("number1 = number2"); break;
-       case MP_LT:  printf("number1 < number2"); break;
-   \}
-
-   /* we're done with it. */
-   mp_clear_multi(&number1, &number2, NULL);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-If this program\footnote{This function uses the mp\_neg() function which is discussed in section \ref{sec:NEG}.} completes
-successfully it should print the following.
-
-\begin{alltt}
-number1 > number2
-\end{alltt}
-
-\subsection{Single Digit}
-
-To compare a single digit against an mp\_int the following function has been provided.
-
-\index{mp\_cmp\_d}
-\begin{alltt}
-int mp_cmp_d(mp_int * a, mp_digit b);
-\end{alltt}
-
-This will compare $a$ to the left of $b$ using a signed comparison.  Note that it will always treat $b$ as
-positive.  This function is rather handy when you have to compare against small values such as $1$ (which often
-comes up in cryptography).  The function cannot fail and will return one of the tree compare condition codes
-listed in figure \ref{fig:CMP}.
-
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* set the number to 5 */
-   mp_set(&number, 5);
-
-   switch(mp_cmp_d(&number, 7)) \{
-       case MP_GT:  printf("number > 7"); break;
-       case MP_EQ:  printf("number = 7"); break;
-       case MP_LT:  printf("number < 7"); break;
-   \}
-
-   /* we're done with it. */
-   mp_clear(&number);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-If this program functions properly it will print out the following.
-
-\begin{alltt}
-number < 7
-\end{alltt}
-
-\section{Logical Operations}
-
-Logical operations are operations that can be performed either with simple shifts or boolean operators such as
-AND, XOR and OR directly.  These operations are very quick.
-
-\subsection{Multiplication by two}
-
-Multiplications and divisions by any power of two can be performed with quick logical shifts either left or
-right depending on the operation.
-
-When multiplying or dividing by two a special case routine can be used which are as follows.
-\index{mp\_mul\_2} \index{mp\_div\_2}
-\begin{alltt}
-int mp_mul_2(mp_int * a, mp_int * b);
-int mp_div_2(mp_int * a, mp_int * b);
-\end{alltt}
-
-The former will assign twice $a$ to $b$ while the latter will assign half $a$ to $b$.  These functions are fast
-since the shift counts and maskes are hardcoded into the routines.
-
-\begin{small} \begin{alltt}
-int main(void)
-\{
-   mp_int number;
-   int result;
-
-   if ((result = mp_init(&number)) != MP_OKAY) \{
-      printf("Error initializing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* set the number to 5 */
-   mp_set(&number, 5);
-
-   /* multiply by two */
-   if ((result = mp\_mul\_2(&number, &number)) != MP_OKAY) \{
-      printf("Error multiplying the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-   switch(mp_cmp_d(&number, 7)) \{
-       case MP_GT:  printf("2*number > 7"); break;
-       case MP_EQ:  printf("2*number = 7"); break;
-       case MP_LT:  printf("2*number < 7"); break;
-   \}
-
-   /* now divide by two */
-   if ((result = mp\_div\_2(&number, &number)) != MP_OKAY) \{
-      printf("Error dividing the number.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-   switch(mp_cmp_d(&number, 7)) \{
-       case MP_GT:  printf("2*number/2 > 7"); break;
-       case MP_EQ:  printf("2*number/2 = 7"); break;
-       case MP_LT:  printf("2*number/2 < 7"); break;
-   \}
-
-   /* we're done with it. */
-   mp_clear(&number);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt} \end{small}
-
-If this program is successful it will print out the following text.
-
-\begin{alltt}
-2*number > 7
-2*number/2 < 7
-\end{alltt}
-
-Since $10 > 7$ and $5 < 7$.
-
-To multiply by a power of two the following function can be used.
-
-\index{mp\_mul\_2d}
-\begin{alltt}
-int mp_mul_2d(mp_int * a, int b, mp_int * c);
-\end{alltt}
-
-This will multiply $a$ by $2^b$ and store the result in ``c''.  If the value of $b$ is less than or equal to
-zero the function will copy $a$ to ``c'' without performing any further actions.  The multiplication itself
-is implemented as a right-shift operation of $a$ by $b$ bits.
-
-To divide by a power of two use the following.
-
-\index{mp\_div\_2d}
-\begin{alltt}
-int mp_div_2d (mp_int * a, int b, mp_int * c, mp_int * d);
-\end{alltt}
-Which will divide $a$ by $2^b$, store the quotient in ``c'' and the remainder in ``d'.  If $b \le 0$ then the
-function simply copies $a$ over to ``c'' and zeroes $d$.  The variable $d$ may be passed as a \textbf{NULL}
-value to signal that the remainder is not desired.  The division itself is implemented as a left-shift
-operation of $a$ by $b$ bits.
-
-\subsection{Polynomial Basis Operations}
-
-Strictly speaking the organization of the integers within the mp\_int structures is what is known as a
-``polynomial basis''.  This simply means a field element is stored by divisions of a radix.  For example, if
-$f(x) = \sum_{i=0}^{k} y_ix^k$ for any vector $\vec y$ then the array of digits in $\vec y$ are said to be
-the polynomial basis representation of $z$ if $f(\beta) = z$ for a given radix $\beta$.
-
-To multiply by the polynomial $g(x) = x$ all you have todo is shift the digits of the basis left one place.  The
-following function provides this operation.
-
-\index{mp\_lshd}
-\begin{alltt}
-int mp_lshd (mp_int * a, int b);
-\end{alltt}
-
-This will multiply $a$ in place by $x^b$ which is equivalent to shifting the digits left $b$ places and inserting zeroes
-in the least significant digits.  Similarly to divide by a power of $x$ the following function is provided.
-
-\index{mp\_rshd}
-\begin{alltt}
-void mp_rshd (mp_int * a, int b)
-\end{alltt}
-This will divide $a$ in place by $x^b$ and discard the remainder.  This function cannot fail as it performs the operations
-in place and no new digits are required to complete it.
-
-\subsection{AND, OR and XOR Operations}
-
-While AND, OR and XOR operations are not typical ``bignum functions'' they can be useful in several instances.  The
-three functions are prototyped as follows.
-
-\index{mp\_or} \index{mp\_and} \index{mp\_xor}
-\begin{alltt}
-int mp_or  (mp_int * a, mp_int * b, mp_int * c);
-int mp_and (mp_int * a, mp_int * b, mp_int * c);
-int mp_xor (mp_int * a, mp_int * b, mp_int * c);
-\end{alltt}
-
-Which compute $c = a \odot b$ where $\odot$ is one of OR, AND or XOR.
-
-\section{Addition and Subtraction}
-
-To compute an addition or subtraction the following two functions can be used.
-
-\index{mp\_add} \index{mp\_sub}
-\begin{alltt}
-int mp_add (mp_int * a, mp_int * b, mp_int * c);
-int mp_sub (mp_int * a, mp_int * b, mp_int * c)
-\end{alltt}
-
-Which perform $c = a \odot b$ where $\odot$ is one of signed addition or subtraction.  The operations are fully sign
-aware.
-
-\section{Sign Manipulation}
-\subsection{Negation}
-\label{sec:NEG}
-Simple integer negation can be performed with the following.
-
-\index{mp\_neg}
-\begin{alltt}
-int mp_neg (mp_int * a, mp_int * b);
-\end{alltt}
-
-Which assigns $-a$ to $b$.
-
-\subsection{Absolute}
-Simple integer absolutes can be performed with the following.
-
-\index{mp\_neg}
-\begin{alltt}
-int mp_abs (mp_int * a, mp_int * b);
-\end{alltt}
-
-Which assigns $\vert a \vert$ to $b$.
-
-\section{Integer Division and Remainder}
-To perform a complete and general integer division with remainder use the following function.
-
-\index{mp\_div}
-\begin{alltt}
-int mp_div (mp_int * a, mp_int * b, mp_int * c, mp_int * d);
-\end{alltt}
-
-This divides $a$ by $b$ and stores the quotient in $c$ and $d$.  The signed quotient is computed such that
-$bc + d = a$.  Note that either of $c$ or $d$ can be set to \textbf{NULL} if their value is not required.  If
-$b$ is zero the function returns \textbf{MP\_VAL}.
-
-
-\chapter{Multiplication and Squaring}
-\section{Multiplication}
-A full signed integer multiplication can be performed with the following.
-\index{mp\_mul}
-\begin{alltt}
-int mp_mul (mp_int * a, mp_int * b, mp_int * c);
-\end{alltt}
-Which assigns the full signed product $ab$ to $c$.  This function actually breaks into one of four cases which are
-specific multiplication routines optimized for given parameters.  First there are the Toom-Cook multiplications which
-should only be used with very large inputs.  This is followed by the Karatsuba multiplications which are for moderate
-sized inputs.  Then followed by the Comba and baseline multipliers.
-
-Fortunately for the developer you don't really need to know this unless you really want to fine tune the system.  mp\_mul()
-will determine on its own\footnote{Some tweaking may be required.} what routine to use automatically when it is called.
-
-\begin{alltt}
-int main(void)
-\{
-   mp_int number1, number2;
-   int result;
-
-   /* Initialize the numbers */
-   if ((result = mp_init_multi(&number1,
-                               &number2, NULL)) != MP_OKAY) \{
-      printf("Error initializing the numbers.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* set the terms */
-   if ((result = mp_set_int(&number, 257)) != MP_OKAY) \{
-      printf("Error setting number1.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   if ((result = mp_set_int(&number2, 1023)) != MP_OKAY) \{
-      printf("Error setting number2.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* multiply them */
-   if ((result = mp_mul(&number1, &number2,
-                        &number1)) != MP_OKAY) \{
-      printf("Error multiplying terms.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* display */
-   printf("number1 * number2 == \%lu", mp_get_int(&number1));
-
-   /* free terms and return */
-   mp_clear_multi(&number1, &number2, NULL);
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt}
-
-If this program succeeds it shall output the following.
-
-\begin{alltt}
-number1 * number2 == 262911
-\end{alltt}
-
-\section{Squaring}
-Since squaring can be performed faster than multiplication it is performed it's own function instead of just using
-mp\_mul().
-
-\index{mp\_sqr}
-\begin{alltt}
-int mp_sqr (mp_int * a, mp_int * b);
-\end{alltt}
-
-Will square $a$ and store it in $b$.  Like the case of multiplication there are four different squaring
-algorithms all which can be called from mp\_sqr().  It is ideal to use mp\_sqr over mp\_mul when squaring terms because
-of the speed difference.
-
-\section{Tuning Polynomial Basis Routines}
-
-Both of the Toom-Cook and Karatsuba multiplication algorithms are faster than the traditional $O(n^2)$ approach that
-the Comba and baseline algorithms use.  At $O(n^{1.464973})$ and $O(n^{1.584962})$ running times respectively they require
-considerably less work.  For example, a 10000-digit multiplication would take roughly 724,000 single precision
-multiplications with Toom-Cook or 100,000,000 single precision multiplications with the standard Comba (a factor
-of 138).
-
-So why not always use Karatsuba or Toom-Cook?   The simple answer is that they have so much overhead that they're not
-actually faster than Comba until you hit distinct  ``cutoff'' points.  For Karatsuba with the default configuration,
-GCC 3.3.1 and an Athlon XP processor the cutoff point is roughly 110 digits (about 70 for the Intel P4).  That is, at
-110 digits Karatsuba and Comba multiplications just about break even and for 110+ digits Karatsuba is faster.
-
-Toom-Cook has incredible overhead and is probably only useful for very large inputs.  So far no known cutoff points
-exist and for the most part I just set the cutoff points very high to make sure they're not called.
-
-A demo program in the ``etc/'' directory of the project called ``tune.c'' can be used to find the cutoff points.  This
-can be built with GCC as follows
-
-\begin{alltt}
-make XXX
-\end{alltt}
-Where ``XXX'' is one of the following entries from the table \ref{fig:tuning}.
-
-\begin{figure}[here]
-\begin{center}
-\begin{small}
-\begin{tabular}{|l|l|}
-\hline \textbf{Value of XXX} & \textbf{Meaning} \\
-\hline tune & Builds portable tuning application \\
-\hline tune86 & Builds x86 (pentium and up) program for COFF \\
-\hline tune86c & Builds x86 program for Cygwin \\
-\hline tune86l & Builds x86 program for Linux (ELF format) \\
-\hline
-\end{tabular}
-\end{small}
-\end{center}
-\caption{Build Names for Tuning Programs}
-\label{fig:tuning}
-\end{figure}
-
-When the program is running it will output a series of measurements for different cutoff points.  It will first find
-good Karatsuba squaring and multiplication points.  Then it proceeds to find Toom-Cook points.  Note that the Toom-Cook
-tuning takes a very long time as the cutoff points are likely to be very high.
-
-\chapter{Modular Reduction}
-
-Modular reduction is process of taking the remainder of one quantity divided by another.  Expressed
-as (\ref{eqn:mod}) the modular reduction is equivalent to the remainder of $b$ divided by $c$.
-
-\begin{equation}
-a \equiv b \mbox{ (mod }c\mbox{)}
-\label{eqn:mod}
-\end{equation}
-
-Of particular interest to cryptography are reductions where $b$ is limited to the range $0 \le b < c^2$ since particularly
-fast reduction algorithms can be written for the limited range.
-
-Note that one of the four optimized reduction algorithms are automatically chosen in the modular exponentiation
-algorithm mp\_exptmod when an appropriate modulus is detected.
-
-\section{Straight Division}
-In order to effect an arbitrary modular reduction the following algorithm is provided.
-
-\index{mp\_mod}
-\begin{alltt}
-int mp_mod(mp_int *a, mp_int *b, mp_int *c);
-\end{alltt}
-
-This reduces $a$ modulo $b$ and stores the result in $c$.  The sign of $c$ shall agree with the sign
-of $b$.  This algorithm accepts an input $a$ of any range and is not limited by $0 \le a < b^2$.
-
-\section{Barrett Reduction}
-
-Barrett reduction is a generic optimized reduction algorithm that requires pre--computation to achieve
-a decent speedup over straight division.  First a $\mu$ value must be precomputed with the following function.
-
-\index{mp\_reduce\_setup}
-\begin{alltt}
-int mp_reduce_setup(mp_int *a, mp_int *b);
-\end{alltt}
-
-Given a modulus in $b$ this produces the required $\mu$ value in $a$.  For any given modulus this only has to
-be computed once.  Modular reduction can now be performed with the following.
-
-\index{mp\_reduce}
-\begin{alltt}
-int mp_reduce(mp_int *a, mp_int *b, mp_int *c);
-\end{alltt}
-
-This will reduce $a$ in place modulo $b$ with the precomputed $\mu$ value in $c$.  $a$ must be in the range
-$0 \le a < b^2$.
-
-\begin{alltt}
-int main(void)
-\{
-   mp_int   a, b, c, mu;
-   int      result;
-
-   /* initialize a,b to desired values, mp_init mu,
-    * c and set c to 1...we want to compute a^3 mod b
-    */
-
-   /* get mu value */
-   if ((result = mp_reduce_setup(&mu, b)) != MP_OKAY) \{
-      printf("Error getting mu.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* square a to get c = a^2 */
-   if ((result = mp_sqr(&a, &c)) != MP_OKAY) \{
-      printf("Error squaring.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* now reduce `c' modulo b */
-   if ((result = mp_reduce(&c, &b, &mu)) != MP_OKAY) \{
-      printf("Error reducing.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* multiply a to get c = a^3 */
-   if ((result = mp_mul(&a, &c, &c)) != MP_OKAY) \{
-      printf("Error reducing.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* now reduce `c' modulo b  */
-   if ((result = mp_reduce(&c, &b, &mu)) != MP_OKAY) \{
-      printf("Error reducing.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* c now equals a^3 mod b */
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt}
-
-This program will calculate $a^3 \mbox{ mod }b$ if all the functions succeed.
-
-\section{Montgomery Reduction}
-
-Montgomery is a specialized reduction algorithm for any odd moduli.  Like Barrett reduction a pre--computation
-step is required.  This is accomplished with the following.
-
-\index{mp\_montgomery\_setup}
-\begin{alltt}
-int mp_montgomery_setup(mp_int *a, mp_digit *mp);
-\end{alltt}
-
-For the given odd moduli $a$ the precomputation value is placed in $mp$.  The reduction is computed with the
-following.
-
-\index{mp\_montgomery\_reduce}
-\begin{alltt}
-int mp_montgomery_reduce(mp_int *a, mp_int *m, mp_digit mp);
-\end{alltt}
-This reduces $a$ in place modulo $m$ with the pre--computed value $mp$.   $a$ must be in the range
-$0 \le a < b^2$.
-
-Montgomery reduction is faster than Barrett reduction for moduli smaller than the ``comba'' limit.  With the default
-setup for instance, the limit is $127$ digits ($3556$--bits).   Note that this function is not limited to
-$127$ digits just that it falls back to a baseline algorithm after that point.
-
-An important observation is that this reduction does not return $a \mbox{ mod }m$ but $aR^{-1} \mbox{ mod }m$
-where $R = \beta^n$, $n$ is the n number of digits in $m$ and $\beta$ is radix used (default is $2^{28}$).
-
-To quickly calculate $R$ the following function was provided.
-
-\index{mp\_montgomery\_calc\_normalization}
-\begin{alltt}
-int mp_montgomery_calc_normalization(mp_int *a, mp_int *b);
-\end{alltt}
-Which calculates $a = R$ for the odd moduli $b$ without using multiplication or division.
-
-The normal modus operandi for Montgomery reductions is to normalize the integers before entering the system.  For
-example, to calculate $a^3 \mbox { mod }b$ using Montgomery reduction the value of $a$ can be normalized by
-multiplying it by $R$.  Consider the following code snippet.
-
-\begin{alltt}
-int main(void)
-\{
-   mp_int   a, b, c, R;
-   mp_digit mp;
-   int      result;
-
-   /* initialize a,b to desired values,
-    * mp_init R, c and set c to 1....
-    */
-
-   /* get normalization */
-   if ((result = mp_montgomery_calc_normalization(&R, b)) != MP_OKAY) \{
-      printf("Error getting norm.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* get mp value */
-   if ((result = mp_montgomery_setup(&c, &mp)) != MP_OKAY) \{
-      printf("Error setting up montgomery.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* normalize `a' so now a is equal to aR */
-   if ((result = mp_mulmod(&a, &R, &b, &a)) != MP_OKAY) \{
-      printf("Error computing aR.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* square a to get c = a^2R^2 */
-   if ((result = mp_sqr(&a, &c)) != MP_OKAY) \{
-      printf("Error squaring.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* now reduce `c' back down to c = a^2R^2 * R^-1 == a^2R */
-   if ((result = mp_montgomery_reduce(&c, &b, mp)) != MP_OKAY) \{
-      printf("Error reducing.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* multiply a to get c = a^3R^2 */
-   if ((result = mp_mul(&a, &c, &c)) != MP_OKAY) \{
-      printf("Error reducing.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* now reduce `c' back down to c = a^3R^2 * R^-1 == a^3R */
-   if ((result = mp_montgomery_reduce(&c, &b, mp)) != MP_OKAY) \{
-      printf("Error reducing.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* now reduce (again) `c' back down to c = a^3R * R^-1 == a^3 */
-   if ((result = mp_montgomery_reduce(&c, &b, mp)) != MP_OKAY) \{
-      printf("Error reducing.  \%s",
-             mp_error_to_string(result));
-      return EXIT_FAILURE;
-   \}
-
-   /* c now equals a^3 mod b */
-
-   return EXIT_SUCCESS;
-\}
-\end{alltt}
-
-This particular example does not look too efficient but it demonstrates the point of the algorithm.  By
-normalizing the inputs the reduced results are always of the form $aR$ for some variable $a$.  This allows
-a single final reduction to correct for the normalization and the fast reduction used within the algorithm.
-
-For more details consider examining the file \textit{bn\_mp\_exptmod\_fast.c}.
-
-\section{Restricted Dimminished Radix}
-
-``Dimminished Radix'' reduction refers to reduction with respect to moduli that are ameniable to simple
-digit shifting and small multiplications.  In this case the ``restricted'' variant refers to moduli of the
-form $\beta^k - p$ for some $k \ge 0$ and $0 < p < \beta$ where $\beta$ is the radix (default to $2^{28}$).
-
-As in the case of Montgomery reduction there is a pre--computation phase required for a given modulus.
-
-\index{mp\_dr\_setup}
-\begin{alltt}
-void mp_dr_setup(mp_int *a, mp_digit *d);
-\end{alltt}
-
-This computes the value required for the modulus $a$ and stores it in $d$.  This function cannot fail
-and does not return any error codes.  After the pre--computation a reduction can be performed with the
-following.
-
-\index{mp\_dr\_reduce}
-\begin{alltt}
-int mp_dr_reduce(mp_int *a, mp_int *b, mp_digit mp);
-\end{alltt}
-
-This reduces $a$ in place modulo $b$ with the pre--computed value $mp$.  $b$ must be of a restricted
-dimminished radix form and $a$ must be in the range $0 \le a < b^2$.  Dimminished radix reductions are
-much faster than both Barrett and Montgomery reductions as they have a much lower asymtotic running time.
-
-Since the moduli are restricted this algorithm is not particularly useful for something like Rabin, RSA or
-BBS cryptographic purposes.  This reduction algorithm is useful for Diffie-Hellman and ECC where fixed
-primes are acceptable.
-
-Note that unlike Montgomery reduction there is no normalization process.  The result of this function is
-equal to the correct residue.
-
-\section{Unrestricted Dimminshed Radix}
-
-Unrestricted reductions work much like the restricted counterparts except in this case the moduli is of the
-form $2^k - p$ for $0 < p < \beta$.  In this sense the unrestricted reductions are more flexible as they
-can be applied to a wider range of numbers.
-
-\index{mp\_reduce\_2k\_setup}
-\begin{alltt}
-int mp_reduce_2k_setup(mp_int *a, mp_digit *d);
-\end{alltt}
-
-This will compute the required $d$ value for the given moduli $a$.
-
-\index{mp\_reduce\_2k}
-\begin{alltt}
-int mp_reduce_2k(mp_int *a, mp_int *n, mp_digit d);
-\end{alltt}
-
-This will reduce $a$ in place modulo $n$ with the pre--computed value $d$.  From my experience this routine is
-slower than mp\_dr\_reduce but faster for most moduli sizes than the Montgomery reduction.
-
-\chapter{Exponentiation}
-\section{Single Digit Exponentiation}
-\index{mp\_expt\_d\_ex}
-\begin{alltt}
-int mp_expt_d_ex (mp_int * a, mp_digit b, mp_int * c, int fast)
-\end{alltt}
-This function computes $c = a^b$.
-
-With parameter \textit{fast} set to $0$ the old version of the algorithm is used,
-when \textit{fast} is $1$, a faster but not statically timed version of the algorithm is used.
-
-The old version uses a simple binary left-to-right algorithm.
-It is faster than repeated multiplications by $a$ for all values of $b$ greater than three.
-
-The new version uses a binary right-to-left algorithm.
-
-The difference between the old and the new version is that the old version always
-executes $DIGIT\_BIT$ iterations. The new algorithm executes only $n$ iterations
-where $n$ is equal to the position of the highest bit that is set in $b$.
-
-\index{mp\_expt\_d}
-\begin{alltt}
-int mp_expt_d (mp_int * a, mp_digit b, mp_int * c)
-\end{alltt}
-mp\_expt\_d(a, b, c) is a wrapper function to mp\_expt\_d\_ex(a, b, c, 0).
-
-\section{Modular Exponentiation}
-\index{mp\_exptmod}
-\begin{alltt}
-int mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
-\end{alltt}
-This computes $Y \equiv G^X \mbox{ (mod }P\mbox{)}$ using a variable width sliding window algorithm.  This function
-will automatically detect the fastest modular reduction technique to use during the operation.  For negative values of
-$X$ the operation is performed as $Y \equiv (G^{-1} \mbox{ mod }P)^{\vert X \vert} \mbox{ (mod }P\mbox{)}$ provided that
-$gcd(G, P) = 1$.
-
-This function is actually a shell around the two internal exponentiation functions.  This routine will automatically
-detect when Barrett, Montgomery, Restricted and Unrestricted Dimminished Radix based exponentiation can be used.  Generally
-moduli of the a ``restricted dimminished radix'' form lead to the fastest modular exponentiations.  Followed by Montgomery
-and the other two algorithms.
-
-\section{Root Finding}
-\index{mp\_n\_root}
-\begin{alltt}
-int mp_n_root (mp_int * a, mp_digit b, mp_int * c)
-\end{alltt}
-This computes $c = a^{1/b}$ such that $c^b \le a$ and $(c+1)^b > a$.  The implementation of this function is not
-ideal for values of $b$ greater than three.  It will work but become very slow.  So unless you are working with very small
-numbers (less than 1000 bits) I'd avoid $b > 3$ situations.  Will return a positive root only for even roots and return
-a root with the sign of the input for odd roots.  For example, performing $4^{1/2}$ will return $2$ whereas $(-8)^{1/3}$
-will return $-2$.
-
-This algorithm uses the ``Newton Approximation'' method and will converge on the correct root fairly quickly.  Since
-the algorithm requires raising $a$ to the power of $b$ it is not ideal to attempt to find roots for large
-values of $b$.  If particularly large roots are required then a factor method could be used instead.  For example,
-$a^{1/16}$ is equivalent to $\left (a^{1/4} \right)^{1/4}$ or simply
-$\left ( \left ( \left ( a^{1/2} \right )^{1/2} \right )^{1/2} \right )^{1/2}$
-
-\chapter{Prime Numbers}
-\section{Trial Division}
-\index{mp\_prime\_is\_divisible}
-\begin{alltt}
-int mp_prime_is_divisible (mp_int * a, int *result)
-\end{alltt}
-This will attempt to evenly divide $a$ by a list of primes\footnote{Default is the first 256 primes.} and store the
-outcome in ``result''.  That is if $result = 0$ then $a$ is not divisible by the primes, otherwise it is.  Note that
-if the function does not return \textbf{MP\_OKAY} the value in ``result'' should be considered undefined\footnote{Currently
-the default is to set it to zero first.}.
-
-\section{Fermat Test}
-\index{mp\_prime\_fermat}
-\begin{alltt}
-int mp_prime_fermat (mp_int * a, mp_int * b, int *result)
-\end{alltt}
-Performs a Fermat primality test to the base $b$.  That is it computes $b^a \mbox{ mod }a$ and tests whether the value is
-equal to $b$ or not.  If the values are equal then $a$ is probably prime and $result$ is set to one.  Otherwise $result$
-is set to zero.
-
-\section{Miller-Rabin Test}
-\index{mp\_prime\_miller\_rabin}
-\begin{alltt}
-int mp_prime_miller_rabin (mp_int * a, mp_int * b, int *result)
-\end{alltt}
-Performs a Miller-Rabin test to the base $b$ of $a$.  This test is much stronger than the Fermat test and is very hard to
-fool (besides with Carmichael numbers).  If $a$ passes the test (therefore is probably prime) $result$ is set to one.
-Otherwise $result$ is set to zero.
-
-Note that is suggested that you use the Miller-Rabin test instead of the Fermat test since all of the failures of
-Miller-Rabin are a subset of the failures of the Fermat test.
-
-\subsection{Required Number of Tests}
-Generally to ensure a number is very likely to be prime you have to perform the Miller-Rabin with at least a half-dozen
-or so unique bases.  However, it has been proven that the probability of failure goes down as the size of the input goes up.
-This is why a simple function has been provided to help out.
-
-\index{mp\_prime\_rabin\_miller\_trials}
-\begin{alltt}
-int mp_prime_rabin_miller_trials(int size)
-\end{alltt}
-This returns the number of trials required for a $2^{-96}$ (or lower) probability of failure for a given ``size'' expressed
-in bits.  This comes in handy specially since larger numbers are slower to test.  For example, a 512-bit number would
-require ten tests whereas a 1024-bit number would only require four tests.
-
-You should always still perform a trial division before a Miller-Rabin test though.
-
-\section{Primality Testing}
-\index{mp\_prime\_is\_prime}
-\begin{alltt}
-int mp_prime_is_prime (mp_int * a, int t, int *result)
-\end{alltt}
-This will perform a trial division followed by $t$ rounds of Miller-Rabin tests on $a$ and store the result in $result$.
-If $a$ passes all of the tests $result$ is set to one, otherwise it is set to zero.  Note that $t$ is bounded by
-$1 \le t < PRIME\_SIZE$ where $PRIME\_SIZE$ is the number of primes in the prime number table (by default this is $256$).
-
-\section{Next Prime}
-\index{mp\_prime\_next\_prime}
-\begin{alltt}
-int mp_prime_next_prime(mp_int *a, int t, int bbs_style)
-\end{alltt}
-This finds the next prime after $a$ that passes mp\_prime\_is\_prime() with $t$ tests.  Set $bbs\_style$ to one if you
-want only the next prime congruent to $3 \mbox{ mod } 4$, otherwise set it to zero to find any next prime.
-
-\section{Random Primes}
-\index{mp\_prime\_random}
-\begin{alltt}
-int mp_prime_random(mp_int *a, int t, int size, int bbs,
-                    ltm_prime_callback cb, void *dat)
-\end{alltt}
-This will find a prime greater than $256^{size}$ which can be ``bbs\_style'' or not depending on $bbs$ and must pass
-$t$ rounds of tests.  The ``ltm\_prime\_callback'' is a typedef for
-
-\begin{alltt}
-typedef int ltm_prime_callback(unsigned char *dst, int len, void *dat);
-\end{alltt}
-
-Which is a function that must read $len$ bytes (and return the amount stored) into $dst$.  The $dat$ variable is simply
-copied from the original input.  It can be used to pass RNG context data to the callback.  The function
-mp\_prime\_random() is more suitable for generating primes which must be secret (as in the case of RSA) since there
-is no skew on the least significant bits.
-
-\textit{Note:}  As of v0.30 of the LibTomMath library this function has been deprecated.  It is still available
-but users are encouraged to use the new mp\_prime\_random\_ex() function instead.
-
-\subsection{Extended Generation}
-\index{mp\_prime\_random\_ex}
-\begin{alltt}
-int mp_prime_random_ex(mp_int *a,    int t,
-                       int     size, int flags,
-                       ltm_prime_callback cb, void *dat);
-\end{alltt}
-This will generate a prime in $a$ using $t$ tests of the primality testing algorithms.  The variable $size$
-specifies the bit length of the prime desired.  The variable $flags$ specifies one of several options available
-(see fig. \ref{fig:primeopts}) which can be OR'ed together.  The callback parameters are used as in
-mp\_prime\_random().
-
-\begin{figure}[here]
-\begin{center}
-\begin{small}
-\begin{tabular}{|r|l|}
-\hline \textbf{Flag}         & \textbf{Meaning} \\
-\hline LTM\_PRIME\_BBS       & Make the prime congruent to $3$ modulo $4$ \\
-\hline LTM\_PRIME\_SAFE      & Make a prime $p$ such that $(p - 1)/2$ is also prime. \\
-                             & This option implies LTM\_PRIME\_BBS as well. \\
-\hline LTM\_PRIME\_2MSB\_OFF & Makes sure that the bit adjacent to the most significant bit \\
-                             & Is forced to zero.  \\
-\hline LTM\_PRIME\_2MSB\_ON  & Makes sure that the bit adjacent to the most significant bit \\
-                             & Is forced to one. \\
-\hline
-\end{tabular}
-\end{small}
-\end{center}
-\caption{Primality Generation Options}
-\label{fig:primeopts}
-\end{figure}
-
-\chapter{Input and Output}
-\section{ASCII Conversions}
-\subsection{To ASCII}
-\index{mp\_toradix}
-\begin{alltt}
-int mp_toradix (mp_int * a, char *str, int radix);
-\end{alltt}
-This still store $a$ in ``str'' as a base-``radix'' string of ASCII chars.  This function appends a NUL character
-to terminate the string.  Valid values of ``radix'' line in the range $[2, 64]$.  To determine the size (exact) required
-by the conversion before storing any data use the following function.
-
-\index{mp\_radix\_size}
-\begin{alltt}
-int mp_radix_size (mp_int * a, int radix, int *size)
-\end{alltt}
-This stores in ``size'' the number of characters (including space for the NUL terminator) required.  Upon error this
-function returns an error code and ``size'' will be zero.
-
-\subsection{From ASCII}
-\index{mp\_read\_radix}
-\begin{alltt}
-int mp_read_radix (mp_int * a, char *str, int radix);
-\end{alltt}
-This will read the base-``radix'' NUL terminated string from ``str'' into $a$.  It will stop reading when it reads a
-character it does not recognize (which happens to include th NUL char... imagine that...).  A single leading $-$ sign
-can be used to denote a negative number.
-
-\section{Binary Conversions}
-
-Converting an mp\_int to and from binary is another keen idea.
-
-\index{mp\_unsigned\_bin\_size}
-\begin{alltt}
-int mp_unsigned_bin_size(mp_int *a);
-\end{alltt}
-
-This will return the number of bytes (octets) required to store the unsigned copy of the integer $a$.
-
-\index{mp\_to\_unsigned\_bin}
-\begin{alltt}
-int mp_to_unsigned_bin(mp_int *a, unsigned char *b);
-\end{alltt}
-This will store $a$ into the buffer $b$ in big--endian format.  Fortunately this is exactly what DER (or is it ASN?)
-requires.  It does not store the sign of the integer.
-
-\index{mp\_read\_unsigned\_bin}
-\begin{alltt}
-int mp_read_unsigned_bin(mp_int *a, unsigned char *b, int c);
-\end{alltt}
-This will read in an unsigned big--endian array of bytes (octets) from $b$ of length $c$ into $a$.  The resulting
-integer $a$ will always be positive.
-
-For those who acknowledge the existence of negative numbers (heretic!) there are ``signed'' versions of the
-previous functions.
-
-\begin{alltt}
-int mp_signed_bin_size(mp_int *a);
-int mp_read_signed_bin(mp_int *a, unsigned char *b, int c);
-int mp_to_signed_bin(mp_int *a, unsigned char *b);
-\end{alltt}
-They operate essentially the same as the unsigned copies except they prefix the data with zero or non--zero
-byte depending on the sign.  If the sign is zpos (e.g. not negative) the prefix is zero, otherwise the prefix
-is non--zero.
-
-\chapter{Algebraic Functions}
-\section{Extended Euclidean Algorithm}
-\index{mp\_exteuclid}
-\begin{alltt}
-int mp_exteuclid(mp_int *a, mp_int *b,
-                 mp_int *U1, mp_int *U2, mp_int *U3);
-\end{alltt}
-
-This finds the triple U1/U2/U3 using the Extended Euclidean algorithm such that the following equation holds.
-
-\begin{equation}
-a \cdot U1 + b \cdot U2 = U3
-\end{equation}
-
-Any of the U1/U2/U3 paramters can be set to \textbf{NULL} if they are not desired.
-
-\section{Greatest Common Divisor}
-\index{mp\_gcd}
-\begin{alltt}
-int mp_gcd (mp_int * a, mp_int * b, mp_int * c)
-\end{alltt}
-This will compute the greatest common divisor of $a$ and $b$ and store it in $c$.
-
-\section{Least Common Multiple}
-\index{mp\_lcm}
-\begin{alltt}
-int mp_lcm (mp_int * a, mp_int * b, mp_int * c)
-\end{alltt}
-This will compute the least common multiple of $a$ and $b$ and store it in $c$.
-
-\section{Jacobi Symbol}
-\index{mp\_jacobi}
-\begin{alltt}
-int mp_jacobi (mp_int * a, mp_int * p, int *c)
-\end{alltt}
-This will compute the Jacobi symbol for $a$ with respect to $p$.  If $p$ is prime this essentially computes the Legendre
-symbol.  The result is stored in $c$ and can take on one of three values $\lbrace -1, 0, 1 \rbrace$.  If $p$ is prime
-then the result will be $-1$ when $a$ is not a quadratic residue modulo $p$.  The result will be $0$ if $a$ divides $p$
-and the result will be $1$ if $a$ is a quadratic residue modulo $p$.
-
-\section{Modular square root}
-\index{mp\_sqrtmod\_prime}
-\begin{alltt}
-int mp_sqrtmod_prime(mp_int *n, mp_int *p, mp_int *r)
-\end{alltt}
-
-This will solve the modular equatioon $r^2 = n \mod p$ where $p$ is a prime number greater than 2 (odd prime).
-The result is returned in the third argument $r$, the function returns \textbf{MP\_OKAY} on success,
-other return values indicate failure.
-
-The implementation is split for two different cases:
-
-1. if $p \mod 4 == 3$ we apply \href{http://cacr.uwaterloo.ca/hac/}{Handbook of Applied Cryptography algorithm 3.36} and compute $r$ directly as
-$r = n^{(p+1)/4} \mod p$
-
-2. otherwise we use \href{https://en.wikipedia.org/wiki/Tonelli-Shanks_algorithm}{Tonelli-Shanks algorithm}
-
-The function does not check the primality of parameter $p$ thus it is up to the caller to assure that this parameter
-is a prime number. When $p$ is a composite the function behaviour is undefined, it may even return a false-positive
-\textbf{MP\_OKAY}.
-
-\section{Modular Inverse}
-\index{mp\_invmod}
-\begin{alltt}
-int mp_invmod (mp_int * a, mp_int * b, mp_int * c)
-\end{alltt}
-Computes the multiplicative inverse of $a$ modulo $b$ and stores the result in $c$ such that $ac \equiv 1 \mbox{ (mod }b\mbox{)}$.
-
-\section{Single Digit Functions}
-
-For those using small numbers (\textit{snicker snicker}) there are several ``helper'' functions
-
-\index{mp\_add\_d} \index{mp\_sub\_d} \index{mp\_mul\_d} \index{mp\_div\_d} \index{mp\_mod\_d}
-\begin{alltt}
-int mp_add_d(mp_int *a, mp_digit b, mp_int *c);
-int mp_sub_d(mp_int *a, mp_digit b, mp_int *c);
-int mp_mul_d(mp_int *a, mp_digit b, mp_int *c);
-int mp_div_d(mp_int *a, mp_digit b, mp_int *c, mp_digit *d);
-int mp_mod_d(mp_int *a, mp_digit b, mp_digit *c);
-\end{alltt}
-
-These work like the full mp\_int capable variants except the second parameter $b$ is a mp\_digit.  These
-functions fairly handy if you have to work with relatively small numbers since you will not have to allocate
-an entire mp\_int to store a number like $1$ or $2$.
-
-\input{bn.ind}
-
-\end{document}
diff --git a/doc/bn.tex b/doc/bn.tex
new file mode 100644
index 0000000..5804318
--- /dev/null
+++ b/doc/bn.tex
@@ -0,0 +1,1913 @@
+\documentclass[synpaper]{book}
+\usepackage{hyperref}
+\usepackage{makeidx}
+\usepackage{amssymb}
+\usepackage{color}
+\usepackage{alltt}
+\usepackage{graphicx}
+\usepackage{layout}
+\def\union{\cup}
+\def\intersect{\cap}
+\def\getsrandom{\stackrel{\rm R}{\gets}}
+\def\cross{\times}
+\def\cat{\hspace{0.5em} \| \hspace{0.5em}}
+\def\catn{$\|$}
+\def\divides{\hspace{0.3em} | \hspace{0.3em}}
+\def\nequiv{\not\equiv}
+\def\approx{\raisebox{0.2ex}{\mbox{\small $\sim$}}}
+\def\lcm{{\rm lcm}}
+\def\gcd{{\rm gcd}}
+\def\log{{\rm log}}
+\def\ord{{\rm ord}}
+\def\abs{{\mathit abs}}
+\def\rep{{\mathit rep}}
+\def\mod{{\mathit\ mod\ }}
+\renewcommand{\pmod}[1]{\ ({\rm mod\ }{#1})}
+\newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}
+\newcommand{\ceil}[1]{\left\lceil{#1}\right\rceil}
+\def\Or{{\rm\ or\ }}
+\def\And{{\rm\ and\ }}
+\def\iff{\hspace{1em}\Longleftrightarrow\hspace{1em}}
+\def\implies{\Rightarrow}
+\def\undefined{{\rm ``undefined"}}
+\def\Proof{\vspace{1ex}\noindent {\bf Proof:}\hspace{1em}}
+\let\oldphi\phi
+\def\phi{\varphi}
+\def\Pr{{\rm Pr}}
+\newcommand{\str}[1]{{\mathbf{#1}}}
+\def\F{{\mathbb F}}
+\def\N{{\mathbb N}}
+\def\Z{{\mathbb Z}}
+\def\R{{\mathbb R}}
+\def\C{{\mathbb C}}
+\def\Q{{\mathbb Q}}
+\definecolor{DGray}{gray}{0.5}
+\newcommand{\emailaddr}[1]{\mbox{$<${#1}$>$}}
+\def\twiddle{\raisebox{0.3ex}{\mbox{\tiny $\sim$}}}
+\def\gap{\vspace{0.5ex}}
+\makeindex
+\begin{document}
+\frontmatter
+\pagestyle{empty}
+\title{LibTomMath User Manual \\ v1.0}
+\author{Tom St Denis \\ tstdenis82@gmail.com}
+\maketitle
+This text, the library and the accompanying textbook are all hereby placed in the public domain.  This book has been
+formatted for B5 [176x250] paper using the \LaTeX{} {\em book} macro package.
+
+\vspace{10cm}
+
+\begin{flushright}Open Source.  Open Academia.  Open Minds.
+
+\mbox{ }
+
+Tom St Denis,
+
+Ontario, Canada
+\end{flushright}
+
+\tableofcontents
+\listoffigures
+\mainmatter
+\pagestyle{headings}
+\chapter{Introduction}
+\section{What is LibTomMath?}
+LibTomMath is a library of source code which provides a series of efficient and carefully written functions for manipulating
+large integer numbers.  It was written in portable ISO C source code so that it will build on any platform with a conforming
+C compiler.
+
+In a nutshell the library was written from scratch with verbose comments to help instruct computer science students how
+to implement ``bignum'' math.  However, the resulting code has proven to be very useful.  It has been used by numerous
+universities, commercial and open source software developers.  It has been used on a variety of platforms ranging from
+Linux and Windows based x86 to ARM based Gameboys and PPC based MacOS machines.
+
+\section{License}
+As of the v0.25 the library source code has been placed in the public domain with every new release.  As of the v0.28
+release the textbook ``Implementing Multiple Precision Arithmetic'' has been placed in the public domain with every new
+release as well.  This textbook is meant to compliment the project by providing a more solid walkthrough of the development
+algorithms used in the library.
+
+Since both\footnote{Note that the MPI files under mtest/ are copyrighted by Michael Fromberger.  They are not required to use LibTomMath.} are in the
+public domain everyone is entitled to do with them as they see fit.
+
+\section{Building LibTomMath}
+
+LibTomMath is meant to be very ``GCC friendly'' as it comes with a makefile well suited for GCC.  However, the library will
+also build in MSVC, Borland C out of the box.  For any other ISO C compiler a makefile will have to be made by the end
+developer.
+
+\subsection{Static Libraries}
+To build as a static library for GCC issue the following
+\begin{alltt}
+make
+\end{alltt}
+
+command.  This will build the library and archive the object files in ``libtommath.a''.  Now you link against
+that and include ``tommath.h'' within your programs.  Alternatively to build with MSVC issue the following
+\begin{alltt}
+nmake -f makefile.msvc
+\end{alltt}
+
+This will build the library and archive the object files in ``tommath.lib''.  This has been tested with MSVC
+version 6.00 with service pack 5.
+
+\subsection{Shared Libraries}
+To build as a shared library for GCC issue the following
+\begin{alltt}
+make -f makefile.shared
+\end{alltt}
+This requires the ``libtool'' package (common on most Linux/BSD systems).  It will build LibTomMath as both shared
+and static then install (by default) into /usr/lib as well as install the header files in /usr/include.  The shared
+library (resource) will be called ``libtommath.la'' while the static library called ``libtommath.a''.  Generally
+you use libtool to link your application against the shared object.
+
+There is limited support for making a ``DLL'' in windows via the ``makefile.cygwin\_dll'' makefile.  It requires
+Cygwin to work with since it requires the auto-export/import functionality.  The resulting DLL and import library
+``libtommath.dll.a'' can be used to link LibTomMath dynamically to any Windows program using Cygwin.
+
+\subsection{Testing}
+To build the library and the test harness type
+
+\begin{alltt}
+make test
+\end{alltt}
+
+This will build the library, ``test'' and ``mtest/mtest''.  The ``test'' program will accept test vectors and verify the
+results.  ``mtest/mtest'' will generate test vectors using the MPI library by Michael Fromberger\footnote{A copy of MPI
+is included in the package}.  Simply pipe mtest into test using
+
+\begin{alltt}
+mtest/mtest | test
+\end{alltt}
+
+If you do not have a ``/dev/urandom'' style RNG source you will have to write your own PRNG and simply pipe that into
+mtest.  For example, if your PRNG program is called ``myprng'' simply invoke
+
+\begin{alltt}
+myprng | mtest/mtest | test
+\end{alltt}
+
+This will output a row of numbers that are increasing.  Each column is a different test (such as addition, multiplication, etc)
+that is being performed.  The numbers represent how many times the test was invoked.  If an error is detected the program
+will exit with a dump of the relevent numbers it was working with.
+
+\section{Build Configuration}
+LibTomMath can configured at build time in three phases we shall call ``depends'', ``tweaks'' and ``trims''.
+Each phase changes how the library is built and they are applied one after another respectively.
+
+To make the system more powerful you can tweak the build process.  Classes are defined in the file
+``tommath\_superclass.h''.  By default, the symbol ``LTM\_ALL'' shall be defined which simply
+instructs the system to build all of the functions.  This is how LibTomMath used to be packaged.  This will give you
+access to every function LibTomMath offers.
+
+However, there are cases where such a build is not optional.  For instance, you want to perform RSA operations.  You
+don't need the vast majority of the library to perform these operations.  Aside from LTM\_ALL there is
+another pre--defined class ``SC\_RSA\_1'' which works in conjunction with the RSA from LibTomCrypt.  Additional
+classes can be defined base on the need of the user.
+
+\subsection{Build Depends}
+In the file tommath\_class.h you will see a large list of C ``defines'' followed by a series of ``ifdefs''
+which further define symbols.  All of the symbols (technically they're macros $\ldots$) represent a given C source
+file.  For instance, BN\_MP\_ADD\_C represents the file ``bn\_mp\_add.c''.  When a define has been enabled the
+function in the respective file will be compiled and linked into the library.  Accordingly when the define
+is absent the file will not be compiled and not contribute any size to the library.
+
+You will also note that the header tommath\_class.h is actually recursively included (it includes itself twice).
+This is to help resolve as many dependencies as possible.  In the last pass the symbol LTM\_LAST will be defined.
+This is useful for ``trims''.
+
+\subsection{Build Tweaks}
+A tweak is an algorithm ``alternative''.  For example, to provide tradeoffs (usually between size and space).
+They can be enabled at any pass of the configuration phase.
+
+\begin{small}
+\begin{center}
+\begin{tabular}{|l|l|}
+\hline \textbf{Define} & \textbf{Purpose} \\
+\hline BN\_MP\_DIV\_SMALL & Enables a slower, smaller and equally \\
+                          & functional mp\_div() function \\
+\hline
+\end{tabular}
+\end{center}
+\end{small}
+
+\subsection{Build Trims}
+A trim is a manner of removing functionality from a function that is not required.  For instance, to perform
+RSA cryptography you only require exponentiation with odd moduli so even moduli support can be safely removed.
+Build trims are meant to be defined on the last pass of the configuration which means they are to be defined
+only if LTM\_LAST has been defined.
+
+\subsubsection{Moduli Related}
+\begin{small}
+\begin{center}
+\begin{tabular}{|l|l|}
+\hline \textbf{Restriction} & \textbf{Undefine} \\
+\hline Exponentiation with odd moduli only & BN\_S\_MP\_EXPTMOD\_C \\
+                                           & BN\_MP\_REDUCE\_C \\
+                                           & BN\_MP\_REDUCE\_SETUP\_C \\
+                                           & BN\_S\_MP\_MUL\_HIGH\_DIGS\_C \\
+                                           & BN\_FAST\_S\_MP\_MUL\_HIGH\_DIGS\_C \\
+\hline Exponentiation with random odd moduli & (The above plus the following) \\
+                                           & BN\_MP\_REDUCE\_2K\_C \\
+                                           & BN\_MP\_REDUCE\_2K\_SETUP\_C \\
+                                           & BN\_MP\_REDUCE\_IS\_2K\_C \\
+                                           & BN\_MP\_DR\_IS\_MODULUS\_C \\
+                                           & BN\_MP\_DR\_REDUCE\_C \\
+                                           & BN\_MP\_DR\_SETUP\_C \\
+\hline Modular inverse odd moduli only     & BN\_MP\_INVMOD\_SLOW\_C \\
+\hline Modular inverse (both, smaller/slower) & BN\_FAST\_MP\_INVMOD\_C \\
+\hline
+\end{tabular}
+\end{center}
+\end{small}
+
+\subsubsection{Operand Size Related}
+\begin{small}
+\begin{center}
+\begin{tabular}{|l|l|}
+\hline \textbf{Restriction} & \textbf{Undefine} \\
+\hline Moduli $\le 2560$ bits              & BN\_MP\_MONTGOMERY\_REDUCE\_C \\
+                                           & BN\_S\_MP\_MUL\_DIGS\_C \\
+                                           & BN\_S\_MP\_MUL\_HIGH\_DIGS\_C \\
+                                           & BN\_S\_MP\_SQR\_C \\
+\hline Polynomial Schmolynomial            & BN\_MP\_KARATSUBA\_MUL\_C \\
+                                           & BN\_MP\_KARATSUBA\_SQR\_C \\
+                                           & BN\_MP\_TOOM\_MUL\_C \\
+                                           & BN\_MP\_TOOM\_SQR\_C \\
+
+\hline
+\end{tabular}
+\end{center}
+\end{small}
+
+
+\section{Purpose of LibTomMath}
+Unlike  GNU MP (GMP) Library, LIP, OpenSSL or various other commercial kits (Miracl), LibTomMath was not written with
+bleeding edge performance in mind.  First and foremost LibTomMath was written to be entirely open.  Not only is the
+source code public domain (unlike various other GPL/etc licensed code), not only is the code freely downloadable but the
+source code is also accessible for computer science students attempting to learn ``BigNum'' or multiple precision
+arithmetic techniques.
+
+LibTomMath was written to be an instructive collection of source code.  This is why there are many comments, only one
+function per source file and often I use a ``middle-road'' approach where I don't cut corners for an extra 2\% speed
+increase.
+
+Source code alone cannot really teach how the algorithms work which is why I also wrote a textbook that accompanies
+the library (beat that!).
+
+So you may be thinking ``should I use LibTomMath?'' and the answer is a definite maybe.  Let me tabulate what I think
+are the pros and cons of LibTomMath by comparing it to the math routines from GnuPG\footnote{GnuPG v1.2.3 versus LibTomMath v0.28}.
+
+\newpage\begin{figure}[here]
+\begin{small}
+\begin{center}
+\begin{tabular}{|l|c|c|l|}
+\hline \textbf{Criteria} & \textbf{Pro} & \textbf{Con} & \textbf{Notes} \\
+\hline Few lines of code per file & X & & GnuPG $ = 300.9$, LibTomMath  $ = 71.97$ \\
+\hline Commented function prototypes & X && GnuPG function names are cryptic. \\
+\hline Speed && X & LibTomMath is slower.  \\
+\hline Totally free & X & & GPL has unfavourable restrictions.\\
+\hline Large function base & X & & GnuPG is barebones. \\
+\hline Five modular reduction algorithms & X & & Faster modular exponentiation for a variety of moduli. \\
+\hline Portable & X & & GnuPG requires configuration to build. \\
+\hline
+\end{tabular}
+\end{center}
+\end{small}
+\caption{LibTomMath Valuation}
+\end{figure}
+
+It may seem odd to compare LibTomMath to GnuPG since the math in GnuPG is only a small portion of the entire application.
+However, LibTomMath was written with cryptography in mind.  It provides essentially all of the functions a cryptosystem
+would require when working with large integers.
+
+So it may feel tempting to just rip the math code out of GnuPG (or GnuMP where it was taken from originally) in your
+own application but I think there are reasons not to.  While LibTomMath is slower than libraries such as GnuMP it is
+not normally significantly slower.  On x86 machines the difference is normally a factor of two when performing modular
+exponentiations.  It depends largely on the processor, compiler and the moduli being used.
+
+Essentially the only time you wouldn't use LibTomMath is when blazing speed is the primary concern.  However,
+on the other side of the coin LibTomMath offers you a totally free (public domain) well structured math library
+that is very flexible, complete and performs well in resource contrained environments.  Fast RSA for example can
+be performed with as little as 8KB of ram for data (again depending on build options).
+
+\chapter{Getting Started with LibTomMath}
+\section{Building Programs}
+In order to use LibTomMath you must include ``tommath.h'' and link against the appropriate library file (typically
+libtommath.a).  There is no library initialization required and the entire library is thread safe.
+
+\section{Return Codes}
+There are three possible return codes a function may return.
+
+\index{MP\_OKAY}\index{MP\_YES}\index{MP\_NO}\index{MP\_VAL}\index{MP\_MEM}
+\begin{figure}[here!]
+\begin{center}
+\begin{small}
+\begin{tabular}{|l|l|}
+\hline \textbf{Code} & \textbf{Meaning} \\
+\hline MP\_OKAY & The function succeeded. \\
+\hline MP\_VAL  & The function input was invalid. \\
+\hline MP\_MEM  & Heap memory exhausted. \\
+\hline &\\
+\hline MP\_YES  & Response is yes. \\
+\hline MP\_NO   & Response is no. \\
+\hline
+\end{tabular}
+\end{small}
+\end{center}
+\caption{Return Codes}
+\end{figure}
+
+The last two codes listed are not actually ``return'ed'' by a function.  They are placed in an integer (the caller must
+provide the address of an integer it can store to) which the caller can access.  To convert one of the three return codes
+to a string use the following function.
+
+\index{mp\_error\_to\_string}
+\begin{alltt}
+char *mp_error_to_string(int code);
+\end{alltt}
+
+This will return a pointer to a string which describes the given error code.  It will not work for the return codes
+MP\_YES and MP\_NO.
+
+\section{Data Types}
+The basic ``multiple precision integer'' type is known as the ``mp\_int'' within LibTomMath.  This data type is used to
+organize all of the data required to manipulate the integer it represents.  Within LibTomMath it has been prototyped
+as the following.
+
+\index{mp\_int}
+\begin{alltt}
+typedef struct  \{
+    int used, alloc, sign;
+    mp_digit *dp;
+\} mp_int;
+\end{alltt}
+
+Where ``mp\_digit'' is a data type that represents individual digits of the integer.  By default, an mp\_digit is the
+ISO C ``unsigned long'' data type and each digit is $28-$bits long.  The mp\_digit type can be configured to suit other
+platforms by defining the appropriate macros.
+
+All LTM functions that use the mp\_int type will expect a pointer to mp\_int structure.  You must allocate memory to
+hold the structure itself by yourself (whether off stack or heap it doesn't matter).  The very first thing that must be
+done to use an mp\_int is that it must be initialized.
+
+\section{Function Organization}
+
+The arithmetic functions of the library are all organized to have the same style prototype.  That is source operands
+are passed on the left and the destination is on the right.  For instance,
+
+\begin{alltt}
+mp_add(&a, &b, &c);       /* c = a + b */
+mp_mul(&a, &a, &c);       /* c = a * a */
+mp_div(&a, &b, &c, &d);   /* c = [a/b], d = a mod b */
+\end{alltt}
+
+Another feature of the way the functions have been implemented is that source operands can be destination operands as well.
+For instance,
+
+\begin{alltt}
+mp_add(&a, &b, &b);       /* b = a + b */
+mp_div(&a, &b, &a, &c);   /* a = [a/b], c = a mod b */
+\end{alltt}
+
+This allows operands to be re-used which can make programming simpler.
+
+\section{Initialization}
+\subsection{Single Initialization}
+A single mp\_int can be initialized with the ``mp\_init'' function.
+
+\index{mp\_init}
+\begin{alltt}
+int mp_init (mp_int * a);
+\end{alltt}
+
+This function expects a pointer to an mp\_int structure and will initialize the members of the structure so the mp\_int
+represents the default integer which is zero.  If the functions returns MP\_OKAY then the mp\_int is ready to be used
+by the other LibTomMath functions.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* use the number */
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+\subsection{Single Free}
+When you are finished with an mp\_int it is ideal to return the heap it used back to the system.  The following function
+provides this functionality.
+
+\index{mp\_clear}
+\begin{alltt}
+void mp_clear (mp_int * a);
+\end{alltt}
+
+The function expects a pointer to a previously initialized mp\_int structure and frees the heap it uses.  It sets the
+pointer\footnote{The ``dp'' member.} within the mp\_int to \textbf{NULL} which is used to prevent double free situations.
+Is is legal to call mp\_clear() twice on the same mp\_int in a row.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* use the number */
+
+   /* We're done with it. */
+   mp_clear(&number);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+\subsection{Multiple Initializations}
+Certain algorithms require more than one large integer.  In these instances it is ideal to initialize all of the mp\_int
+variables in an ``all or nothing'' fashion.  That is, they are either all initialized successfully or they are all
+not initialized.
+
+The  mp\_init\_multi() function provides this functionality.
+
+\index{mp\_init\_multi} \index{mp\_clear\_multi}
+\begin{alltt}
+int mp_init_multi(mp_int *mp, ...);
+\end{alltt}
+
+It accepts a \textbf{NULL} terminated list of pointers to mp\_int structures.  It will attempt to initialize them all
+at once.  If the function returns MP\_OKAY then all of the mp\_int variables are ready to use, otherwise none of them
+are available for use.  A complementary mp\_clear\_multi() function allows multiple mp\_int variables to be free'd
+from the heap at the same time.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int num1, num2, num3;
+   int result;
+
+   if ((result = mp_init_multi(&num1,
+                               &num2,
+                               &num3, NULL)) != MP\_OKAY) \{
+      printf("Error initializing the numbers.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* use the numbers */
+
+   /* We're done with them. */
+   mp_clear_multi(&num1, &num2, &num3, NULL);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+\subsection{Other Initializers}
+To initialized and make a copy of an mp\_int the mp\_init\_copy() function has been provided.
+
+\index{mp\_init\_copy}
+\begin{alltt}
+int mp_init_copy (mp_int * a, mp_int * b);
+\end{alltt}
+
+This function will initialize $a$ and make it a copy of $b$ if all goes well.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int num1, num2;
+   int result;
+
+   /* initialize and do work on num1 ... */
+
+   /* We want a copy of num1 in num2 now */
+   if ((result = mp_init_copy(&num2, &num1)) != MP_OKAY) \{
+     printf("Error initializing the copy.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* now num2 is ready and contains a copy of num1 */
+
+   /* We're done with them. */
+   mp_clear_multi(&num1, &num2, NULL);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+Another less common initializer is mp\_init\_size() which allows the user to initialize an mp\_int with a given
+default number of digits.  By default, all initializers allocate \textbf{MP\_PREC} digits.  This function lets
+you override this behaviour.
+
+\index{mp\_init\_size}
+\begin{alltt}
+int mp_init_size (mp_int * a, int size);
+\end{alltt}
+
+The $size$ parameter must be greater than zero.  If the function succeeds the mp\_int $a$ will be initialized
+to have $size$ digits (which are all initially zero).
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   /* we need a 60-digit number */
+   if ((result = mp_init_size(&number, 60)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* use the number */
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+\section{Maintenance Functions}
+
+\subsection{Reducing Memory Usage}
+When an mp\_int is in a state where it won't be changed again\footnote{A Diffie-Hellman modulus for instance.} excess
+digits can be removed to return memory to the heap with the mp\_shrink() function.
+
+\index{mp\_shrink}
+\begin{alltt}
+int mp_shrink (mp_int * a);
+\end{alltt}
+
+This will remove excess digits of the mp\_int $a$.  If the operation fails the mp\_int should be intact without the
+excess digits being removed.  Note that you can use a shrunk mp\_int in further computations, however, such operations
+will require heap operations which can be slow.  It is not ideal to shrink mp\_int variables that you will further
+modify in the system (unless you are seriously low on memory).
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* use the number [e.g. pre-computation]  */
+
+   /* We're done with it for now. */
+   if ((result = mp_shrink(&number)) != MP_OKAY) \{
+      printf("Error shrinking the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* use it .... */
+
+
+   /* we're done with it. */
+   mp_clear(&number);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+\subsection{Adding additional digits}
+
+Within the mp\_int structure are two parameters which control the limitations of the array of digits that represent
+the integer the mp\_int is meant to equal.   The \textit{used} parameter dictates how many digits are significant, that is,
+contribute to the value of the mp\_int.  The \textit{alloc} parameter dictates how many digits are currently available in
+the array.  If you need to perform an operation that requires more digits you will have to mp\_grow() the mp\_int to
+your desired size.
+
+\index{mp\_grow}
+\begin{alltt}
+int mp_grow (mp_int * a, int size);
+\end{alltt}
+
+This will grow the array of digits of $a$ to $size$.  If the \textit{alloc} parameter is already bigger than
+$size$ the function will not do anything.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* use the number */
+
+   /* We need to add 20 digits to the number  */
+   if ((result = mp_grow(&number, number.alloc + 20)) != MP_OKAY) \{
+      printf("Error growing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+
+   /* use the number */
+
+   /* we're done with it. */
+   mp_clear(&number);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+\chapter{Basic Operations}
+\section{Small Constants}
+Setting mp\_ints to small constants is a relatively common operation.  To accomodate these instances there are two
+small constant assignment functions.  The first function is used to set a single digit constant while the second sets
+an ISO C style ``unsigned long'' constant.  The reason for both functions is efficiency.  Setting a single digit is quick but the
+domain of a digit can change (it's always at least $0 \ldots 127$).
+
+\subsection{Single Digit}
+
+Setting a single digit can be accomplished with the following function.
+
+\index{mp\_set}
+\begin{alltt}
+void mp_set (mp_int * a, mp_digit b);
+\end{alltt}
+
+This will zero the contents of $a$ and make it represent an integer equal to the value of $b$.  Note that this
+function has a return type of \textbf{void}.  It cannot cause an error so it is safe to assume the function
+succeeded.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* set the number to 5 */
+   mp_set(&number, 5);
+
+   /* we're done with it. */
+   mp_clear(&number);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+\subsection{Long Constants}
+
+To set a constant that is the size of an ISO C ``unsigned long'' and larger than a single digit the following function
+can be used.
+
+\index{mp\_set\_int}
+\begin{alltt}
+int mp_set_int (mp_int * a, unsigned long b);
+\end{alltt}
+
+This will assign the value of the 32-bit variable $b$ to the mp\_int $a$.  Unlike mp\_set() this function will always
+accept a 32-bit input regardless of the size of a single digit.  However, since the value may span several digits
+this function can fail if it runs out of heap memory.
+
+To get the ``unsigned long'' copy of an mp\_int the following function can be used.
+
+\index{mp\_get\_int}
+\begin{alltt}
+unsigned long mp_get_int (mp_int * a);
+\end{alltt}
+
+This will return the 32 least significant bits of the mp\_int $a$.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* set the number to 654321 (note this is bigger than 127) */
+   if ((result = mp_set_int(&number, 654321)) != MP_OKAY) \{
+      printf("Error setting the value of the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   printf("number == \%lu", mp_get_int(&number));
+
+   /* we're done with it. */
+   mp_clear(&number);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+This should output the following if the program succeeds.
+
+\begin{alltt}
+number == 654321
+\end{alltt}
+
+\subsection{Long Constants - platform dependant}
+
+\index{mp\_set\_long}
+\begin{alltt}
+int mp_set_long (mp_int * a, unsigned long b);
+\end{alltt}
+
+This will assign the value of the platform-dependant sized variable $b$ to the mp\_int $a$.
+
+To get the ``unsigned long'' copy of an mp\_int the following function can be used.
+
+\index{mp\_get\_long}
+\begin{alltt}
+unsigned long mp_get_long (mp_int * a);
+\end{alltt}
+
+This will return the least significant bits of the mp\_int $a$ that fit into an ``unsigned long''.
+
+\subsection{Long Long Constants}
+
+\index{mp\_set\_long\_long}
+\begin{alltt}
+int mp_set_long_long (mp_int * a, unsigned long long b);
+\end{alltt}
+
+This will assign the value of the 64-bit variable $b$ to the mp\_int $a$.
+
+To get the ``unsigned long long'' copy of an mp\_int the following function can be used.
+
+\index{mp\_get\_long\_long}
+\begin{alltt}
+unsigned long long mp_get_long_long (mp_int * a);
+\end{alltt}
+
+This will return the 64 least significant bits of the mp\_int $a$.
+
+\subsection{Initialize and Setting Constants}
+To both initialize and set small constants the following two functions are available.
+\index{mp\_init\_set} \index{mp\_init\_set\_int}
+\begin{alltt}
+int mp_init_set (mp_int * a, mp_digit b);
+int mp_init_set_int (mp_int * a, unsigned long b);
+\end{alltt}
+
+Both functions work like the previous counterparts except they first mp\_init $a$ before setting the values.
+
+\begin{alltt}
+int main(void)
+\{
+   mp_int number1, number2;
+   int    result;
+
+   /* initialize and set a single digit */
+   if ((result = mp_init_set(&number1, 100)) != MP_OKAY) \{
+      printf("Error setting number1: \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* initialize and set a long */
+   if ((result = mp_init_set_int(&number2, 1023)) != MP_OKAY) \{
+      printf("Error setting number2: \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* display */
+   printf("Number1, Number2 == \%lu, \%lu",
+          mp_get_int(&number1), mp_get_int(&number2));
+
+   /* clear */
+   mp_clear_multi(&number1, &number2, NULL);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt}
+
+If this program succeeds it shall output.
+\begin{alltt}
+Number1, Number2 == 100, 1023
+\end{alltt}
+
+\section{Comparisons}
+
+Comparisons in LibTomMath are always performed in a ``left to right'' fashion.  There are three possible return codes
+for any comparison.
+
+\index{MP\_GT} \index{MP\_EQ} \index{MP\_LT}
+\begin{figure}[here]
+\begin{center}
+\begin{tabular}{|c|c|}
+\hline \textbf{Result Code} & \textbf{Meaning} \\
+\hline MP\_GT & $a > b$ \\
+\hline MP\_EQ & $a = b$ \\
+\hline MP\_LT & $a < b$ \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Comparison Codes for $a, b$}
+\label{fig:CMP}
+\end{figure}
+
+In figure \ref{fig:CMP} two integers $a$ and $b$ are being compared.  In this case $a$ is said to be ``to the left'' of
+$b$.
+
+\subsection{Unsigned comparison}
+
+An unsigned comparison considers only the digits themselves and not the associated \textit{sign} flag of the
+mp\_int structures.  This is analogous to an absolute comparison.  The function mp\_cmp\_mag() will compare two
+mp\_int variables based on their digits only.
+
+\index{mp\_cmp\_mag}
+\begin{alltt}
+int mp_cmp_mag(mp_int * a, mp_int * b);
+\end{alltt}
+This will compare $a$ to $b$ placing $a$ to the left of $b$.  This function cannot fail and will return one of the
+three compare codes listed in figure \ref{fig:CMP}.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number1, number2;
+   int result;
+
+   if ((result = mp_init_multi(&number1, &number2, NULL)) != MP_OKAY) \{
+      printf("Error initializing the numbers.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* set the number1 to 5 */
+   mp_set(&number1, 5);
+
+   /* set the number2 to -6 */
+   mp_set(&number2, 6);
+   if ((result = mp_neg(&number2, &number2)) != MP_OKAY) \{
+      printf("Error negating number2.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   switch(mp_cmp_mag(&number1, &number2)) \{
+       case MP_GT:  printf("|number1| > |number2|"); break;
+       case MP_EQ:  printf("|number1| = |number2|"); break;
+       case MP_LT:  printf("|number1| < |number2|"); break;
+   \}
+
+   /* we're done with it. */
+   mp_clear_multi(&number1, &number2, NULL);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+If this program\footnote{This function uses the mp\_neg() function which is discussed in section \ref{sec:NEG}.} completes
+successfully it should print the following.
+
+\begin{alltt}
+|number1| < |number2|
+\end{alltt}
+
+This is because $\vert -6 \vert = 6$ and obviously $5 < 6$.
+
+\subsection{Signed comparison}
+
+To compare two mp\_int variables based on their signed value the mp\_cmp() function is provided.
+
+\index{mp\_cmp}
+\begin{alltt}
+int mp_cmp(mp_int * a, mp_int * b);
+\end{alltt}
+
+This will compare $a$ to the left of $b$.  It will first compare the signs of the two mp\_int variables.  If they
+differ it will return immediately based on their signs.  If the signs are equal then it will compare the digits
+individually.  This function will return one of the compare conditions codes listed in figure \ref{fig:CMP}.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number1, number2;
+   int result;
+
+   if ((result = mp_init_multi(&number1, &number2, NULL)) != MP_OKAY) \{
+      printf("Error initializing the numbers.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* set the number1 to 5 */
+   mp_set(&number1, 5);
+
+   /* set the number2 to -6 */
+   mp_set(&number2, 6);
+   if ((result = mp_neg(&number2, &number2)) != MP_OKAY) \{
+      printf("Error negating number2.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   switch(mp_cmp(&number1, &number2)) \{
+       case MP_GT:  printf("number1 > number2"); break;
+       case MP_EQ:  printf("number1 = number2"); break;
+       case MP_LT:  printf("number1 < number2"); break;
+   \}
+
+   /* we're done with it. */
+   mp_clear_multi(&number1, &number2, NULL);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+If this program\footnote{This function uses the mp\_neg() function which is discussed in section \ref{sec:NEG}.} completes
+successfully it should print the following.
+
+\begin{alltt}
+number1 > number2
+\end{alltt}
+
+\subsection{Single Digit}
+
+To compare a single digit against an mp\_int the following function has been provided.
+
+\index{mp\_cmp\_d}
+\begin{alltt}
+int mp_cmp_d(mp_int * a, mp_digit b);
+\end{alltt}
+
+This will compare $a$ to the left of $b$ using a signed comparison.  Note that it will always treat $b$ as
+positive.  This function is rather handy when you have to compare against small values such as $1$ (which often
+comes up in cryptography).  The function cannot fail and will return one of the tree compare condition codes
+listed in figure \ref{fig:CMP}.
+
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* set the number to 5 */
+   mp_set(&number, 5);
+
+   switch(mp_cmp_d(&number, 7)) \{
+       case MP_GT:  printf("number > 7"); break;
+       case MP_EQ:  printf("number = 7"); break;
+       case MP_LT:  printf("number < 7"); break;
+   \}
+
+   /* we're done with it. */
+   mp_clear(&number);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+If this program functions properly it will print out the following.
+
+\begin{alltt}
+number < 7
+\end{alltt}
+
+\section{Logical Operations}
+
+Logical operations are operations that can be performed either with simple shifts or boolean operators such as
+AND, XOR and OR directly.  These operations are very quick.
+
+\subsection{Multiplication by two}
+
+Multiplications and divisions by any power of two can be performed with quick logical shifts either left or
+right depending on the operation.
+
+When multiplying or dividing by two a special case routine can be used which are as follows.
+\index{mp\_mul\_2} \index{mp\_div\_2}
+\begin{alltt}
+int mp_mul_2(mp_int * a, mp_int * b);
+int mp_div_2(mp_int * a, mp_int * b);
+\end{alltt}
+
+The former will assign twice $a$ to $b$ while the latter will assign half $a$ to $b$.  These functions are fast
+since the shift counts and maskes are hardcoded into the routines.
+
+\begin{small} \begin{alltt}
+int main(void)
+\{
+   mp_int number;
+   int result;
+
+   if ((result = mp_init(&number)) != MP_OKAY) \{
+      printf("Error initializing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* set the number to 5 */
+   mp_set(&number, 5);
+
+   /* multiply by two */
+   if ((result = mp\_mul\_2(&number, &number)) != MP_OKAY) \{
+      printf("Error multiplying the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+   switch(mp_cmp_d(&number, 7)) \{
+       case MP_GT:  printf("2*number > 7"); break;
+       case MP_EQ:  printf("2*number = 7"); break;
+       case MP_LT:  printf("2*number < 7"); break;
+   \}
+
+   /* now divide by two */
+   if ((result = mp\_div\_2(&number, &number)) != MP_OKAY) \{
+      printf("Error dividing the number.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+   switch(mp_cmp_d(&number, 7)) \{
+       case MP_GT:  printf("2*number/2 > 7"); break;
+       case MP_EQ:  printf("2*number/2 = 7"); break;
+       case MP_LT:  printf("2*number/2 < 7"); break;
+   \}
+
+   /* we're done with it. */
+   mp_clear(&number);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt} \end{small}
+
+If this program is successful it will print out the following text.
+
+\begin{alltt}
+2*number > 7
+2*number/2 < 7
+\end{alltt}
+
+Since $10 > 7$ and $5 < 7$.
+
+To multiply by a power of two the following function can be used.
+
+\index{mp\_mul\_2d}
+\begin{alltt}
+int mp_mul_2d(mp_int * a, int b, mp_int * c);
+\end{alltt}
+
+This will multiply $a$ by $2^b$ and store the result in ``c''.  If the value of $b$ is less than or equal to
+zero the function will copy $a$ to ``c'' without performing any further actions.  The multiplication itself
+is implemented as a right-shift operation of $a$ by $b$ bits.
+
+To divide by a power of two use the following.
+
+\index{mp\_div\_2d}
+\begin{alltt}
+int mp_div_2d (mp_int * a, int b, mp_int * c, mp_int * d);
+\end{alltt}
+Which will divide $a$ by $2^b$, store the quotient in ``c'' and the remainder in ``d'.  If $b \le 0$ then the
+function simply copies $a$ over to ``c'' and zeroes $d$.  The variable $d$ may be passed as a \textbf{NULL}
+value to signal that the remainder is not desired.  The division itself is implemented as a left-shift
+operation of $a$ by $b$ bits.
+
+\subsection{Polynomial Basis Operations}
+
+Strictly speaking the organization of the integers within the mp\_int structures is what is known as a
+``polynomial basis''.  This simply means a field element is stored by divisions of a radix.  For example, if
+$f(x) = \sum_{i=0}^{k} y_ix^k$ for any vector $\vec y$ then the array of digits in $\vec y$ are said to be
+the polynomial basis representation of $z$ if $f(\beta) = z$ for a given radix $\beta$.
+
+To multiply by the polynomial $g(x) = x$ all you have todo is shift the digits of the basis left one place.  The
+following function provides this operation.
+
+\index{mp\_lshd}
+\begin{alltt}
+int mp_lshd (mp_int * a, int b);
+\end{alltt}
+
+This will multiply $a$ in place by $x^b$ which is equivalent to shifting the digits left $b$ places and inserting zeroes
+in the least significant digits.  Similarly to divide by a power of $x$ the following function is provided.
+
+\index{mp\_rshd}
+\begin{alltt}
+void mp_rshd (mp_int * a, int b)
+\end{alltt}
+This will divide $a$ in place by $x^b$ and discard the remainder.  This function cannot fail as it performs the operations
+in place and no new digits are required to complete it.
+
+\subsection{AND, OR and XOR Operations}
+
+While AND, OR and XOR operations are not typical ``bignum functions'' they can be useful in several instances.  The
+three functions are prototyped as follows.
+
+\index{mp\_or} \index{mp\_and} \index{mp\_xor}
+\begin{alltt}
+int mp_or  (mp_int * a, mp_int * b, mp_int * c);
+int mp_and (mp_int * a, mp_int * b, mp_int * c);
+int mp_xor (mp_int * a, mp_int * b, mp_int * c);
+\end{alltt}
+
+Which compute $c = a \odot b$ where $\odot$ is one of OR, AND or XOR.
+
+\section{Addition and Subtraction}
+
+To compute an addition or subtraction the following two functions can be used.
+
+\index{mp\_add} \index{mp\_sub}
+\begin{alltt}
+int mp_add (mp_int * a, mp_int * b, mp_int * c);
+int mp_sub (mp_int * a, mp_int * b, mp_int * c)
+\end{alltt}
+
+Which perform $c = a \odot b$ where $\odot$ is one of signed addition or subtraction.  The operations are fully sign
+aware.
+
+\section{Sign Manipulation}
+\subsection{Negation}
+\label{sec:NEG}
+Simple integer negation can be performed with the following.
+
+\index{mp\_neg}
+\begin{alltt}
+int mp_neg (mp_int * a, mp_int * b);
+\end{alltt}
+
+Which assigns $-a$ to $b$.
+
+\subsection{Absolute}
+Simple integer absolutes can be performed with the following.
+
+\index{mp\_neg}
+\begin{alltt}
+int mp_abs (mp_int * a, mp_int * b);
+\end{alltt}
+
+Which assigns $\vert a \vert$ to $b$.
+
+\section{Integer Division and Remainder}
+To perform a complete and general integer division with remainder use the following function.
+
+\index{mp\_div}
+\begin{alltt}
+int mp_div (mp_int * a, mp_int * b, mp_int * c, mp_int * d);
+\end{alltt}
+
+This divides $a$ by $b$ and stores the quotient in $c$ and $d$.  The signed quotient is computed such that
+$bc + d = a$.  Note that either of $c$ or $d$ can be set to \textbf{NULL} if their value is not required.  If
+$b$ is zero the function returns \textbf{MP\_VAL}.
+
+
+\chapter{Multiplication and Squaring}
+\section{Multiplication}
+A full signed integer multiplication can be performed with the following.
+\index{mp\_mul}
+\begin{alltt}
+int mp_mul (mp_int * a, mp_int * b, mp_int * c);
+\end{alltt}
+Which assigns the full signed product $ab$ to $c$.  This function actually breaks into one of four cases which are
+specific multiplication routines optimized for given parameters.  First there are the Toom-Cook multiplications which
+should only be used with very large inputs.  This is followed by the Karatsuba multiplications which are for moderate
+sized inputs.  Then followed by the Comba and baseline multipliers.
+
+Fortunately for the developer you don't really need to know this unless you really want to fine tune the system.  mp\_mul()
+will determine on its own\footnote{Some tweaking may be required.} what routine to use automatically when it is called.
+
+\begin{alltt}
+int main(void)
+\{
+   mp_int number1, number2;
+   int result;
+
+   /* Initialize the numbers */
+   if ((result = mp_init_multi(&number1,
+                               &number2, NULL)) != MP_OKAY) \{
+      printf("Error initializing the numbers.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* set the terms */
+   if ((result = mp_set_int(&number, 257)) != MP_OKAY) \{
+      printf("Error setting number1.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   if ((result = mp_set_int(&number2, 1023)) != MP_OKAY) \{
+      printf("Error setting number2.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* multiply them */
+   if ((result = mp_mul(&number1, &number2,
+                        &number1)) != MP_OKAY) \{
+      printf("Error multiplying terms.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* display */
+   printf("number1 * number2 == \%lu", mp_get_int(&number1));
+
+   /* free terms and return */
+   mp_clear_multi(&number1, &number2, NULL);
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt}
+
+If this program succeeds it shall output the following.
+
+\begin{alltt}
+number1 * number2 == 262911
+\end{alltt}
+
+\section{Squaring}
+Since squaring can be performed faster than multiplication it is performed it's own function instead of just using
+mp\_mul().
+
+\index{mp\_sqr}
+\begin{alltt}
+int mp_sqr (mp_int * a, mp_int * b);
+\end{alltt}
+
+Will square $a$ and store it in $b$.  Like the case of multiplication there are four different squaring
+algorithms all which can be called from mp\_sqr().  It is ideal to use mp\_sqr over mp\_mul when squaring terms because
+of the speed difference.
+
+\section{Tuning Polynomial Basis Routines}
+
+Both of the Toom-Cook and Karatsuba multiplication algorithms are faster than the traditional $O(n^2)$ approach that
+the Comba and baseline algorithms use.  At $O(n^{1.464973})$ and $O(n^{1.584962})$ running times respectively they require
+considerably less work.  For example, a 10000-digit multiplication would take roughly 724,000 single precision
+multiplications with Toom-Cook or 100,000,000 single precision multiplications with the standard Comba (a factor
+of 138).
+
+So why not always use Karatsuba or Toom-Cook?   The simple answer is that they have so much overhead that they're not
+actually faster than Comba until you hit distinct  ``cutoff'' points.  For Karatsuba with the default configuration,
+GCC 3.3.1 and an Athlon XP processor the cutoff point is roughly 110 digits (about 70 for the Intel P4).  That is, at
+110 digits Karatsuba and Comba multiplications just about break even and for 110+ digits Karatsuba is faster.
+
+Toom-Cook has incredible overhead and is probably only useful for very large inputs.  So far no known cutoff points
+exist and for the most part I just set the cutoff points very high to make sure they're not called.
+
+A demo program in the ``etc/'' directory of the project called ``tune.c'' can be used to find the cutoff points.  This
+can be built with GCC as follows
+
+\begin{alltt}
+make XXX
+\end{alltt}
+Where ``XXX'' is one of the following entries from the table \ref{fig:tuning}.
+
+\begin{figure}[here]
+\begin{center}
+\begin{small}
+\begin{tabular}{|l|l|}
+\hline \textbf{Value of XXX} & \textbf{Meaning} \\
+\hline tune & Builds portable tuning application \\
+\hline tune86 & Builds x86 (pentium and up) program for COFF \\
+\hline tune86c & Builds x86 program for Cygwin \\
+\hline tune86l & Builds x86 program for Linux (ELF format) \\
+\hline
+\end{tabular}
+\end{small}
+\end{center}
+\caption{Build Names for Tuning Programs}
+\label{fig:tuning}
+\end{figure}
+
+When the program is running it will output a series of measurements for different cutoff points.  It will first find
+good Karatsuba squaring and multiplication points.  Then it proceeds to find Toom-Cook points.  Note that the Toom-Cook
+tuning takes a very long time as the cutoff points are likely to be very high.
+
+\chapter{Modular Reduction}
+
+Modular reduction is process of taking the remainder of one quantity divided by another.  Expressed
+as (\ref{eqn:mod}) the modular reduction is equivalent to the remainder of $b$ divided by $c$.
+
+\begin{equation}
+a \equiv b \mbox{ (mod }c\mbox{)}
+\label{eqn:mod}
+\end{equation}
+
+Of particular interest to cryptography are reductions where $b$ is limited to the range $0 \le b < c^2$ since particularly
+fast reduction algorithms can be written for the limited range.
+
+Note that one of the four optimized reduction algorithms are automatically chosen in the modular exponentiation
+algorithm mp\_exptmod when an appropriate modulus is detected.
+
+\section{Straight Division}
+In order to effect an arbitrary modular reduction the following algorithm is provided.
+
+\index{mp\_mod}
+\begin{alltt}
+int mp_mod(mp_int *a, mp_int *b, mp_int *c);
+\end{alltt}
+
+This reduces $a$ modulo $b$ and stores the result in $c$.  The sign of $c$ shall agree with the sign
+of $b$.  This algorithm accepts an input $a$ of any range and is not limited by $0 \le a < b^2$.
+
+\section{Barrett Reduction}
+
+Barrett reduction is a generic optimized reduction algorithm that requires pre--computation to achieve
+a decent speedup over straight division.  First a $\mu$ value must be precomputed with the following function.
+
+\index{mp\_reduce\_setup}
+\begin{alltt}
+int mp_reduce_setup(mp_int *a, mp_int *b);
+\end{alltt}
+
+Given a modulus in $b$ this produces the required $\mu$ value in $a$.  For any given modulus this only has to
+be computed once.  Modular reduction can now be performed with the following.
+
+\index{mp\_reduce}
+\begin{alltt}
+int mp_reduce(mp_int *a, mp_int *b, mp_int *c);
+\end{alltt}
+
+This will reduce $a$ in place modulo $b$ with the precomputed $\mu$ value in $c$.  $a$ must be in the range
+$0 \le a < b^2$.
+
+\begin{alltt}
+int main(void)
+\{
+   mp_int   a, b, c, mu;
+   int      result;
+
+   /* initialize a,b to desired values, mp_init mu,
+    * c and set c to 1...we want to compute a^3 mod b
+    */
+
+   /* get mu value */
+   if ((result = mp_reduce_setup(&mu, b)) != MP_OKAY) \{
+      printf("Error getting mu.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* square a to get c = a^2 */
+   if ((result = mp_sqr(&a, &c)) != MP_OKAY) \{
+      printf("Error squaring.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* now reduce `c' modulo b */
+   if ((result = mp_reduce(&c, &b, &mu)) != MP_OKAY) \{
+      printf("Error reducing.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* multiply a to get c = a^3 */
+   if ((result = mp_mul(&a, &c, &c)) != MP_OKAY) \{
+      printf("Error reducing.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* now reduce `c' modulo b  */
+   if ((result = mp_reduce(&c, &b, &mu)) != MP_OKAY) \{
+      printf("Error reducing.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* c now equals a^3 mod b */
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt}
+
+This program will calculate $a^3 \mbox{ mod }b$ if all the functions succeed.
+
+\section{Montgomery Reduction}
+
+Montgomery is a specialized reduction algorithm for any odd moduli.  Like Barrett reduction a pre--computation
+step is required.  This is accomplished with the following.
+
+\index{mp\_montgomery\_setup}
+\begin{alltt}
+int mp_montgomery_setup(mp_int *a, mp_digit *mp);
+\end{alltt}
+
+For the given odd moduli $a$ the precomputation value is placed in $mp$.  The reduction is computed with the
+following.
+
+\index{mp\_montgomery\_reduce}
+\begin{alltt}
+int mp_montgomery_reduce(mp_int *a, mp_int *m, mp_digit mp);
+\end{alltt}
+This reduces $a$ in place modulo $m$ with the pre--computed value $mp$.   $a$ must be in the range
+$0 \le a < b^2$.
+
+Montgomery reduction is faster than Barrett reduction for moduli smaller than the ``comba'' limit.  With the default
+setup for instance, the limit is $127$ digits ($3556$--bits).   Note that this function is not limited to
+$127$ digits just that it falls back to a baseline algorithm after that point.
+
+An important observation is that this reduction does not return $a \mbox{ mod }m$ but $aR^{-1} \mbox{ mod }m$
+where $R = \beta^n$, $n$ is the n number of digits in $m$ and $\beta$ is radix used (default is $2^{28}$).
+
+To quickly calculate $R$ the following function was provided.
+
+\index{mp\_montgomery\_calc\_normalization}
+\begin{alltt}
+int mp_montgomery_calc_normalization(mp_int *a, mp_int *b);
+\end{alltt}
+Which calculates $a = R$ for the odd moduli $b$ without using multiplication or division.
+
+The normal modus operandi for Montgomery reductions is to normalize the integers before entering the system.  For
+example, to calculate $a^3 \mbox { mod }b$ using Montgomery reduction the value of $a$ can be normalized by
+multiplying it by $R$.  Consider the following code snippet.
+
+\begin{alltt}
+int main(void)
+\{
+   mp_int   a, b, c, R;
+   mp_digit mp;
+   int      result;
+
+   /* initialize a,b to desired values,
+    * mp_init R, c and set c to 1....
+    */
+
+   /* get normalization */
+   if ((result = mp_montgomery_calc_normalization(&R, b)) != MP_OKAY) \{
+      printf("Error getting norm.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* get mp value */
+   if ((result = mp_montgomery_setup(&c, &mp)) != MP_OKAY) \{
+      printf("Error setting up montgomery.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* normalize `a' so now a is equal to aR */
+   if ((result = mp_mulmod(&a, &R, &b, &a)) != MP_OKAY) \{
+      printf("Error computing aR.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* square a to get c = a^2R^2 */
+   if ((result = mp_sqr(&a, &c)) != MP_OKAY) \{
+      printf("Error squaring.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* now reduce `c' back down to c = a^2R^2 * R^-1 == a^2R */
+   if ((result = mp_montgomery_reduce(&c, &b, mp)) != MP_OKAY) \{
+      printf("Error reducing.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* multiply a to get c = a^3R^2 */
+   if ((result = mp_mul(&a, &c, &c)) != MP_OKAY) \{
+      printf("Error reducing.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* now reduce `c' back down to c = a^3R^2 * R^-1 == a^3R */
+   if ((result = mp_montgomery_reduce(&c, &b, mp)) != MP_OKAY) \{
+      printf("Error reducing.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* now reduce (again) `c' back down to c = a^3R * R^-1 == a^3 */
+   if ((result = mp_montgomery_reduce(&c, &b, mp)) != MP_OKAY) \{
+      printf("Error reducing.  \%s",
+             mp_error_to_string(result));
+      return EXIT_FAILURE;
+   \}
+
+   /* c now equals a^3 mod b */
+
+   return EXIT_SUCCESS;
+\}
+\end{alltt}
+
+This particular example does not look too efficient but it demonstrates the point of the algorithm.  By
+normalizing the inputs the reduced results are always of the form $aR$ for some variable $a$.  This allows
+a single final reduction to correct for the normalization and the fast reduction used within the algorithm.
+
+For more details consider examining the file \textit{bn\_mp\_exptmod\_fast.c}.
+
+\section{Restricted Dimminished Radix}
+
+``Dimminished Radix'' reduction refers to reduction with respect to moduli that are ameniable to simple
+digit shifting and small multiplications.  In this case the ``restricted'' variant refers to moduli of the
+form $\beta^k - p$ for some $k \ge 0$ and $0 < p < \beta$ where $\beta$ is the radix (default to $2^{28}$).
+
+As in the case of Montgomery reduction there is a pre--computation phase required for a given modulus.
+
+\index{mp\_dr\_setup}
+\begin{alltt}
+void mp_dr_setup(mp_int *a, mp_digit *d);
+\end{alltt}
+
+This computes the value required for the modulus $a$ and stores it in $d$.  This function cannot fail
+and does not return any error codes.  After the pre--computation a reduction can be performed with the
+following.
+
+\index{mp\_dr\_reduce}
+\begin{alltt}
+int mp_dr_reduce(mp_int *a, mp_int *b, mp_digit mp);
+\end{alltt}
+
+This reduces $a$ in place modulo $b$ with the pre--computed value $mp$.  $b$ must be of a restricted
+dimminished radix form and $a$ must be in the range $0 \le a < b^2$.  Dimminished radix reductions are
+much faster than both Barrett and Montgomery reductions as they have a much lower asymtotic running time.
+
+Since the moduli are restricted this algorithm is not particularly useful for something like Rabin, RSA or
+BBS cryptographic purposes.  This reduction algorithm is useful for Diffie-Hellman and ECC where fixed
+primes are acceptable.
+
+Note that unlike Montgomery reduction there is no normalization process.  The result of this function is
+equal to the correct residue.
+
+\section{Unrestricted Dimminshed Radix}
+
+Unrestricted reductions work much like the restricted counterparts except in this case the moduli is of the
+form $2^k - p$ for $0 < p < \beta$.  In this sense the unrestricted reductions are more flexible as they
+can be applied to a wider range of numbers.
+
+\index{mp\_reduce\_2k\_setup}
+\begin{alltt}
+int mp_reduce_2k_setup(mp_int *a, mp_digit *d);
+\end{alltt}
+
+This will compute the required $d$ value for the given moduli $a$.
+
+\index{mp\_reduce\_2k}
+\begin{alltt}
+int mp_reduce_2k(mp_int *a, mp_int *n, mp_digit d);
+\end{alltt}
+
+This will reduce $a$ in place modulo $n$ with the pre--computed value $d$.  From my experience this routine is
+slower than mp\_dr\_reduce but faster for most moduli sizes than the Montgomery reduction.
+
+\chapter{Exponentiation}
+\section{Single Digit Exponentiation}
+\index{mp\_expt\_d\_ex}
+\begin{alltt}
+int mp_expt_d_ex (mp_int * a, mp_digit b, mp_int * c, int fast)
+\end{alltt}
+This function computes $c = a^b$.
+
+With parameter \textit{fast} set to $0$ the old version of the algorithm is used,
+when \textit{fast} is $1$, a faster but not statically timed version of the algorithm is used.
+
+The old version uses a simple binary left-to-right algorithm.
+It is faster than repeated multiplications by $a$ for all values of $b$ greater than three.
+
+The new version uses a binary right-to-left algorithm.
+
+The difference between the old and the new version is that the old version always
+executes $DIGIT\_BIT$ iterations. The new algorithm executes only $n$ iterations
+where $n$ is equal to the position of the highest bit that is set in $b$.
+
+\index{mp\_expt\_d}
+\begin{alltt}
+int mp_expt_d (mp_int * a, mp_digit b, mp_int * c)
+\end{alltt}
+mp\_expt\_d(a, b, c) is a wrapper function to mp\_expt\_d\_ex(a, b, c, 0).
+
+\section{Modular Exponentiation}
+\index{mp\_exptmod}
+\begin{alltt}
+int mp_exptmod (mp_int * G, mp_int * X, mp_int * P, mp_int * Y)
+\end{alltt}
+This computes $Y \equiv G^X \mbox{ (mod }P\mbox{)}$ using a variable width sliding window algorithm.  This function
+will automatically detect the fastest modular reduction technique to use during the operation.  For negative values of
+$X$ the operation is performed as $Y \equiv (G^{-1} \mbox{ mod }P)^{\vert X \vert} \mbox{ (mod }P\mbox{)}$ provided that
+$gcd(G, P) = 1$.
+
+This function is actually a shell around the two internal exponentiation functions.  This routine will automatically
+detect when Barrett, Montgomery, Restricted and Unrestricted Dimminished Radix based exponentiation can be used.  Generally
+moduli of the a ``restricted dimminished radix'' form lead to the fastest modular exponentiations.  Followed by Montgomery
+and the other two algorithms.
+
+\section{Root Finding}
+\index{mp\_n\_root}
+\begin{alltt}
+int mp_n_root (mp_int * a, mp_digit b, mp_int * c)
+\end{alltt}
+This computes $c = a^{1/b}$ such that $c^b \le a$ and $(c+1)^b > a$.  The implementation of this function is not
+ideal for values of $b$ greater than three.  It will work but become very slow.  So unless you are working with very small
+numbers (less than 1000 bits) I'd avoid $b > 3$ situations.  Will return a positive root only for even roots and return
+a root with the sign of the input for odd roots.  For example, performing $4^{1/2}$ will return $2$ whereas $(-8)^{1/3}$
+will return $-2$.
+
+This algorithm uses the ``Newton Approximation'' method and will converge on the correct root fairly quickly.  Since
+the algorithm requires raising $a$ to the power of $b$ it is not ideal to attempt to find roots for large
+values of $b$.  If particularly large roots are required then a factor method could be used instead.  For example,
+$a^{1/16}$ is equivalent to $\left (a^{1/4} \right)^{1/4}$ or simply
+$\left ( \left ( \left ( a^{1/2} \right )^{1/2} \right )^{1/2} \right )^{1/2}$
+
+\chapter{Prime Numbers}
+\section{Trial Division}
+\index{mp\_prime\_is\_divisible}
+\begin{alltt}
+int mp_prime_is_divisible (mp_int * a, int *result)
+\end{alltt}
+This will attempt to evenly divide $a$ by a list of primes\footnote{Default is the first 256 primes.} and store the
+outcome in ``result''.  That is if $result = 0$ then $a$ is not divisible by the primes, otherwise it is.  Note that
+if the function does not return \textbf{MP\_OKAY} the value in ``result'' should be considered undefined\footnote{Currently
+the default is to set it to zero first.}.
+
+\section{Fermat Test}
+\index{mp\_prime\_fermat}
+\begin{alltt}
+int mp_prime_fermat (mp_int * a, mp_int * b, int *result)
+\end{alltt}
+Performs a Fermat primality test to the base $b$.  That is it computes $b^a \mbox{ mod }a$ and tests whether the value is
+equal to $b$ or not.  If the values are equal then $a$ is probably prime and $result$ is set to one.  Otherwise $result$
+is set to zero.
+
+\section{Miller-Rabin Test}
+\index{mp\_prime\_miller\_rabin}
+\begin{alltt}
+int mp_prime_miller_rabin (mp_int * a, mp_int * b, int *result)
+\end{alltt}
+Performs a Miller-Rabin test to the base $b$ of $a$.  This test is much stronger than the Fermat test and is very hard to
+fool (besides with Carmichael numbers).  If $a$ passes the test (therefore is probably prime) $result$ is set to one.
+Otherwise $result$ is set to zero.
+
+Note that is suggested that you use the Miller-Rabin test instead of the Fermat test since all of the failures of
+Miller-Rabin are a subset of the failures of the Fermat test.
+
+\subsection{Required Number of Tests}
+Generally to ensure a number is very likely to be prime you have to perform the Miller-Rabin with at least a half-dozen
+or so unique bases.  However, it has been proven that the probability of failure goes down as the size of the input goes up.
+This is why a simple function has been provided to help out.
+
+\index{mp\_prime\_rabin\_miller\_trials}
+\begin{alltt}
+int mp_prime_rabin_miller_trials(int size)
+\end{alltt}
+This returns the number of trials required for a $2^{-96}$ (or lower) probability of failure for a given ``size'' expressed
+in bits.  This comes in handy specially since larger numbers are slower to test.  For example, a 512-bit number would
+require ten tests whereas a 1024-bit number would only require four tests.
+
+You should always still perform a trial division before a Miller-Rabin test though.
+
+\section{Primality Testing}
+\index{mp\_prime\_is\_prime}
+\begin{alltt}
+int mp_prime_is_prime (mp_int * a, int t, int *result)
+\end{alltt}
+This will perform a trial division followed by $t$ rounds of Miller-Rabin tests on $a$ and store the result in $result$.
+If $a$ passes all of the tests $result$ is set to one, otherwise it is set to zero.  Note that $t$ is bounded by
+$1 \le t < PRIME\_SIZE$ where $PRIME\_SIZE$ is the number of primes in the prime number table (by default this is $256$).
+
+\section{Next Prime}
+\index{mp\_prime\_next\_prime}
+\begin{alltt}
+int mp_prime_next_prime(mp_int *a, int t, int bbs_style)
+\end{alltt}
+This finds the next prime after $a$ that passes mp\_prime\_is\_prime() with $t$ tests.  Set $bbs\_style$ to one if you
+want only the next prime congruent to $3 \mbox{ mod } 4$, otherwise set it to zero to find any next prime.
+
+\section{Random Primes}
+\index{mp\_prime\_random}
+\begin{alltt}
+int mp_prime_random(mp_int *a, int t, int size, int bbs,
+                    ltm_prime_callback cb, void *dat)
+\end{alltt}
+This will find a prime greater than $256^{size}$ which can be ``bbs\_style'' or not depending on $bbs$ and must pass
+$t$ rounds of tests.  The ``ltm\_prime\_callback'' is a typedef for
+
+\begin{alltt}
+typedef int ltm_prime_callback(unsigned char *dst, int len, void *dat);
+\end{alltt}
+
+Which is a function that must read $len$ bytes (and return the amount stored) into $dst$.  The $dat$ variable is simply
+copied from the original input.  It can be used to pass RNG context data to the callback.  The function
+mp\_prime\_random() is more suitable for generating primes which must be secret (as in the case of RSA) since there
+is no skew on the least significant bits.
+
+\textit{Note:}  As of v0.30 of the LibTomMath library this function has been deprecated.  It is still available
+but users are encouraged to use the new mp\_prime\_random\_ex() function instead.
+
+\subsection{Extended Generation}
+\index{mp\_prime\_random\_ex}
+\begin{alltt}
+int mp_prime_random_ex(mp_int *a,    int t,
+                       int     size, int flags,
+                       ltm_prime_callback cb, void *dat);
+\end{alltt}
+This will generate a prime in $a$ using $t$ tests of the primality testing algorithms.  The variable $size$
+specifies the bit length of the prime desired.  The variable $flags$ specifies one of several options available
+(see fig. \ref{fig:primeopts}) which can be OR'ed together.  The callback parameters are used as in
+mp\_prime\_random().
+
+\begin{figure}[here]
+\begin{center}
+\begin{small}
+\begin{tabular}{|r|l|}
+\hline \textbf{Flag}         & \textbf{Meaning} \\
+\hline LTM\_PRIME\_BBS       & Make the prime congruent to $3$ modulo $4$ \\
+\hline LTM\_PRIME\_SAFE      & Make a prime $p$ such that $(p - 1)/2$ is also prime. \\
+                             & This option implies LTM\_PRIME\_BBS as well. \\
+\hline LTM\_PRIME\_2MSB\_OFF & Makes sure that the bit adjacent to the most significant bit \\
+                             & Is forced to zero.  \\
+\hline LTM\_PRIME\_2MSB\_ON  & Makes sure that the bit adjacent to the most significant bit \\
+                             & Is forced to one. \\
+\hline
+\end{tabular}
+\end{small}
+\end{center}
+\caption{Primality Generation Options}
+\label{fig:primeopts}
+\end{figure}
+
+\chapter{Input and Output}
+\section{ASCII Conversions}
+\subsection{To ASCII}
+\index{mp\_toradix}
+\begin{alltt}
+int mp_toradix (mp_int * a, char *str, int radix);
+\end{alltt}
+This still store $a$ in ``str'' as a base-``radix'' string of ASCII chars.  This function appends a NUL character
+to terminate the string.  Valid values of ``radix'' line in the range $[2, 64]$.  To determine the size (exact) required
+by the conversion before storing any data use the following function.
+
+\index{mp\_radix\_size}
+\begin{alltt}
+int mp_radix_size (mp_int * a, int radix, int *size)
+\end{alltt}
+This stores in ``size'' the number of characters (including space for the NUL terminator) required.  Upon error this
+function returns an error code and ``size'' will be zero.
+
+\subsection{From ASCII}
+\index{mp\_read\_radix}
+\begin{alltt}
+int mp_read_radix (mp_int * a, char *str, int radix);
+\end{alltt}
+This will read the base-``radix'' NUL terminated string from ``str'' into $a$.  It will stop reading when it reads a
+character it does not recognize (which happens to include th NUL char... imagine that...).  A single leading $-$ sign
+can be used to denote a negative number.
+
+\section{Binary Conversions}
+
+Converting an mp\_int to and from binary is another keen idea.
+
+\index{mp\_unsigned\_bin\_size}
+\begin{alltt}
+int mp_unsigned_bin_size(mp_int *a);
+\end{alltt}
+
+This will return the number of bytes (octets) required to store the unsigned copy of the integer $a$.
+
+\index{mp\_to\_unsigned\_bin}
+\begin{alltt}
+int mp_to_unsigned_bin(mp_int *a, unsigned char *b);
+\end{alltt}
+This will store $a$ into the buffer $b$ in big--endian format.  Fortunately this is exactly what DER (or is it ASN?)
+requires.  It does not store the sign of the integer.
+
+\index{mp\_read\_unsigned\_bin}
+\begin{alltt}
+int mp_read_unsigned_bin(mp_int *a, unsigned char *b, int c);
+\end{alltt}
+This will read in an unsigned big--endian array of bytes (octets) from $b$ of length $c$ into $a$.  The resulting
+integer $a$ will always be positive.
+
+For those who acknowledge the existence of negative numbers (heretic!) there are ``signed'' versions of the
+previous functions.
+
+\begin{alltt}
+int mp_signed_bin_size(mp_int *a);
+int mp_read_signed_bin(mp_int *a, unsigned char *b, int c);
+int mp_to_signed_bin(mp_int *a, unsigned char *b);
+\end{alltt}
+They operate essentially the same as the unsigned copies except they prefix the data with zero or non--zero
+byte depending on the sign.  If the sign is zpos (e.g. not negative) the prefix is zero, otherwise the prefix
+is non--zero.
+
+\chapter{Algebraic Functions}
+\section{Extended Euclidean Algorithm}
+\index{mp\_exteuclid}
+\begin{alltt}
+int mp_exteuclid(mp_int *a, mp_int *b,
+                 mp_int *U1, mp_int *U2, mp_int *U3);
+\end{alltt}
+
+This finds the triple U1/U2/U3 using the Extended Euclidean algorithm such that the following equation holds.
+
+\begin{equation}
+a \cdot U1 + b \cdot U2 = U3
+\end{equation}
+
+Any of the U1/U2/U3 paramters can be set to \textbf{NULL} if they are not desired.
+
+\section{Greatest Common Divisor}
+\index{mp\_gcd}
+\begin{alltt}
+int mp_gcd (mp_int * a, mp_int * b, mp_int * c)
+\end{alltt}
+This will compute the greatest common divisor of $a$ and $b$ and store it in $c$.
+
+\section{Least Common Multiple}
+\index{mp\_lcm}
+\begin{alltt}
+int mp_lcm (mp_int * a, mp_int * b, mp_int * c)
+\end{alltt}
+This will compute the least common multiple of $a$ and $b$ and store it in $c$.
+
+\section{Jacobi Symbol}
+\index{mp\_jacobi}
+\begin{alltt}
+int mp_jacobi (mp_int * a, mp_int * p, int *c)
+\end{alltt}
+This will compute the Jacobi symbol for $a$ with respect to $p$.  If $p$ is prime this essentially computes the Legendre
+symbol.  The result is stored in $c$ and can take on one of three values $\lbrace -1, 0, 1 \rbrace$.  If $p$ is prime
+then the result will be $-1$ when $a$ is not a quadratic residue modulo $p$.  The result will be $0$ if $a$ divides $p$
+and the result will be $1$ if $a$ is a quadratic residue modulo $p$.
+
+\section{Modular square root}
+\index{mp\_sqrtmod\_prime}
+\begin{alltt}
+int mp_sqrtmod_prime(mp_int *n, mp_int *p, mp_int *r)
+\end{alltt}
+
+This will solve the modular equatioon $r^2 = n \mod p$ where $p$ is a prime number greater than 2 (odd prime).
+The result is returned in the third argument $r$, the function returns \textbf{MP\_OKAY} on success,
+other return values indicate failure.
+
+The implementation is split for two different cases:
+
+1. if $p \mod 4 == 3$ we apply \href{http://cacr.uwaterloo.ca/hac/}{Handbook of Applied Cryptography algorithm 3.36} and compute $r$ directly as
+$r = n^{(p+1)/4} \mod p$
+
+2. otherwise we use \href{https://en.wikipedia.org/wiki/Tonelli-Shanks_algorithm}{Tonelli-Shanks algorithm}
+
+The function does not check the primality of parameter $p$ thus it is up to the caller to assure that this parameter
+is a prime number. When $p$ is a composite the function behaviour is undefined, it may even return a false-positive
+\textbf{MP\_OKAY}.
+
+\section{Modular Inverse}
+\index{mp\_invmod}
+\begin{alltt}
+int mp_invmod (mp_int * a, mp_int * b, mp_int * c)
+\end{alltt}
+Computes the multiplicative inverse of $a$ modulo $b$ and stores the result in $c$ such that $ac \equiv 1 \mbox{ (mod }b\mbox{)}$.
+
+\section{Single Digit Functions}
+
+For those using small numbers (\textit{snicker snicker}) there are several ``helper'' functions
+
+\index{mp\_add\_d} \index{mp\_sub\_d} \index{mp\_mul\_d} \index{mp\_div\_d} \index{mp\_mod\_d}
+\begin{alltt}
+int mp_add_d(mp_int *a, mp_digit b, mp_int *c);
+int mp_sub_d(mp_int *a, mp_digit b, mp_int *c);
+int mp_mul_d(mp_int *a, mp_digit b, mp_int *c);
+int mp_div_d(mp_int *a, mp_digit b, mp_int *c, mp_digit *d);
+int mp_mod_d(mp_int *a, mp_digit b, mp_digit *c);
+\end{alltt}
+
+These work like the full mp\_int capable variants except the second parameter $b$ is a mp\_digit.  These
+functions fairly handy if you have to work with relatively small numbers since you will not have to allocate
+an entire mp\_int to store a number like $1$ or $2$.
+
+\input{bn.ind}
+
+\end{document}
diff --git a/doc/makefile b/doc/makefile
new file mode 100644
index 0000000..fcc47c9
--- /dev/null
+++ b/doc/makefile
@@ -0,0 +1,69 @@
+ifeq ($V,1)
+silent_stdout=
+else
+silent_stdout= > /dev/null
+endif
+
+PLATFORM := $(shell uname | sed -e 's/_.*//')
+ifeq ($(PLATFORM), Darwin)
+err:
+	$(error Docs can't be built on Mac)
+
+docdvi poster docs mandvi manual: err
+endif
+
+# makes the LTM book DVI file, requires tetex, perl and makeindex [part of tetex I think]
+docdvi: tommath.src
+	cd pics ; MAKE=${MAKE} ${MAKE}
+	echo "hello" ${silent_stdout}
+	perl booker.pl
+	latex tommath ${silent_stdout}
+	latex tommath ${silent_stdout}
+	makeindex tommath
+	latex tommath ${silent_stdout}
+
+# poster, makes the single page PDF poster
+poster: poster.tex
+	cp poster.tex poster.bak
+	touch --reference=poster.tex poster.bak
+	(printf "%s" "\def\fixedpdfdate{"; date +'D:%Y%m%d%H%M%S%:z' -d @$$(stat --format=%Y poster.tex) | sed "s/:\([0-9][0-9]\)$$/'\1'}/g") > poster-deterministic.tex
+	printf "%s\n" "\pdfinfo{" >> poster-deterministic.tex
+	printf "%s\n" "  /CreationDate (\fixedpdfdate)" >> poster-deterministic.tex
+	printf "%s\n}\n" "  /ModDate (\fixedpdfdate)" >> poster-deterministic.tex
+	cat poster.tex >> poster-deterministic.tex
+	mv poster-deterministic.tex poster.tex
+	touch --reference=poster.bak poster.tex
+	pdflatex poster
+	sed -b -i 's,^/ID \[.*\]$$,/ID [<0> <0>],g' poster.pdf
+	mv poster.bak poster.tex
+	rm -f poster.aux poster.log poster.out
+
+# makes the LTM book PDF file, requires tetex, cleans up the LaTeX temp files
+docs: docdvi
+	dvipdf tommath
+	rm -f tommath.log tommath.aux tommath.dvi tommath.idx tommath.toc tommath.lof tommath.ind tommath.ilg
+	cd pics ; MAKE=${MAKE} ${MAKE} clean
+
+#LTM user manual
+mandvi: bn.tex
+	cp bn.tex bn.bak
+	touch --reference=bn.tex bn.bak
+	(printf "%s" "\def\fixedpdfdate{"; date +'D:%Y%m%d%H%M%S%:z' -d @$$(stat --format=%Y bn.tex) | sed "s/:\([0-9][0-9]\)$$/'\1'}/g") > bn-deterministic.tex
+	printf "%s\n" "\pdfinfo{" >> bn-deterministic.tex
+	printf "%s\n" "  /CreationDate (\fixedpdfdate)" >> bn-deterministic.tex
+	printf "%s\n}\n" "  /ModDate (\fixedpdfdate)" >> bn-deterministic.tex
+	cat bn.tex >> bn-deterministic.tex
+	mv bn-deterministic.tex bn.tex
+	touch --reference=bn.bak bn.tex
+	echo "hello" > bn.ind
+	latex bn ${silent_stdout}
+	latex bn ${silent_stdout}
+	makeindex bn
+	latex bn ${silent_stdout}
+
+#LTM user manual [pdf]
+manual:	mandvi
+	pdflatex bn >/dev/null
+	sed -b -i 's,^/ID \[.*\]$$,/ID [<0> <0>],g' bn.pdf
+	mv bn.bak bn.tex
+	rm -f bn.aux bn.dvi bn.log bn.idx bn.lof bn.out bn.toc
diff --git a/doc/pics/design_process.sxd b/doc/pics/design_process.sxd
new file mode 100644
index 0000000..7414dbb
Binary files /dev/null and b/doc/pics/design_process.sxd differ
diff --git a/doc/pics/design_process.tif b/doc/pics/design_process.tif
new file mode 100644
index 0000000..4a0c012
Binary files /dev/null and b/doc/pics/design_process.tif differ
diff --git a/doc/pics/expt_state.sxd b/doc/pics/expt_state.sxd
new file mode 100644
index 0000000..6518404
Binary files /dev/null and b/doc/pics/expt_state.sxd differ
diff --git a/doc/pics/expt_state.tif b/doc/pics/expt_state.tif
new file mode 100644
index 0000000..cb06e8e
Binary files /dev/null and b/doc/pics/expt_state.tif differ
diff --git a/doc/pics/makefile b/doc/pics/makefile
new file mode 100644
index 0000000..3ecb02f
--- /dev/null
+++ b/doc/pics/makefile
@@ -0,0 +1,35 @@
+# makes the images... yeah
+
+default:  pses
+
+design_process.ps: design_process.tif
+	tiff2ps -s -e design_process.tif > design_process.ps
+
+sliding_window.ps: sliding_window.tif
+	tiff2ps -s -e sliding_window.tif > sliding_window.ps
+	
+expt_state.ps: expt_state.tif
+	tiff2ps -s -e expt_state.tif > expt_state.ps
+
+primality.ps: primality.tif
+	tiff2ps -s -e primality.tif > primality.ps
+
+design_process.pdf: design_process.ps
+	epstopdf design_process.ps
+
+sliding_window.pdf: sliding_window.ps
+	epstopdf sliding_window.ps
+	
+expt_state.pdf: expt_state.ps
+	epstopdf expt_state.ps
+
+primality.pdf: primality.ps
+	epstopdf primality.ps
+
+
+pses: sliding_window.ps expt_state.ps primality.ps design_process.ps
+pdfes: sliding_window.pdf expt_state.pdf primality.pdf design_process.pdf
+
+clean:
+	rm -rf *.ps *.pdf .xvpics
+   
\ No newline at end of file
diff --git a/doc/pics/primality.tif b/doc/pics/primality.tif
new file mode 100644
index 0000000..76d6be3
Binary files /dev/null and b/doc/pics/primality.tif differ
diff --git a/doc/pics/radix.sxd b/doc/pics/radix.sxd
new file mode 100644
index 0000000..b9eb9a0
Binary files /dev/null and b/doc/pics/radix.sxd differ
diff --git a/doc/pics/sliding_window.sxd b/doc/pics/sliding_window.sxd
new file mode 100644
index 0000000..91e7c0d
Binary files /dev/null and b/doc/pics/sliding_window.sxd differ
diff --git a/doc/pics/sliding_window.tif b/doc/pics/sliding_window.tif
new file mode 100644
index 0000000..bb4cb96
Binary files /dev/null and b/doc/pics/sliding_window.tif differ
diff --git a/makefile b/makefile
index 96578ee..3fd47d2 100644
--- a/makefile
+++ b/makefile
@@ -108,61 +108,8 @@ timing: $(LIBNAME)
 coveralls: coverage
 	cpp-coveralls
 
-# makes the LTM book DVI file, requires tetex, perl and makeindex [part of tetex I think]
-docdvi: tommath.src
-	cd pics ; MAKE=${MAKE} ${MAKE}
-	echo "hello" > tommath.ind
-	perl booker.pl
-	latex tommath > /dev/null
-	latex tommath > /dev/null
-	makeindex tommath
-	latex tommath > /dev/null
-
-# poster, makes the single page PDF poster
-poster: poster.tex
-	cp poster.tex poster.bak
-	touch --reference=poster.tex poster.bak
-	(printf "%s" "\def\fixedpdfdate{"; date +'D:%Y%m%d%H%M%S%:z' -d @$$(stat --format=%Y poster.tex) | sed "s/:\([0-9][0-9]\)$$/'\1'}/g") > poster-deterministic.tex
-	printf "%s\n" "\pdfinfo{" >> poster-deterministic.tex
-	printf "%s\n" "  /CreationDate (\fixedpdfdate)" >> poster-deterministic.tex
-	printf "%s\n}\n" "  /ModDate (\fixedpdfdate)" >> poster-deterministic.tex
-	cat poster.tex >> poster-deterministic.tex
-	mv poster-deterministic.tex poster.tex
-	touch --reference=poster.bak poster.tex
-	pdflatex poster
-	sed -b -i 's,^/ID \[.*\]$$,/ID [<0> <0>],g' poster.pdf
-	mv poster.bak poster.tex
-	rm -f poster.aux poster.log poster.out
-
-# makes the LTM book PDF file, requires tetex, cleans up the LaTeX temp files
-docs:   docdvi
-	dvipdf tommath
-	rm -f tommath.log tommath.aux tommath.dvi tommath.idx tommath.toc tommath.lof tommath.ind tommath.ilg
-	cd pics ; MAKE=${MAKE} ${MAKE} clean
-
-#LTM user manual
-mandvi: bn.tex
-	cp bn.tex bn.bak
-	touch --reference=bn.tex bn.bak
-	(printf "%s" "\def\fixedpdfdate{"; date +'D:%Y%m%d%H%M%S%:z' -d @$$(stat --format=%Y bn.tex) | sed "s/:\([0-9][0-9]\)$$/'\1'}/g") > bn-deterministic.tex
-	printf "%s\n" "\pdfinfo{" >> bn-deterministic.tex
-	printf "%s\n" "  /CreationDate (\fixedpdfdate)" >> bn-deterministic.tex
-	printf "%s\n}\n" "  /ModDate (\fixedpdfdate)" >> bn-deterministic.tex
-	cat bn.tex >> bn-deterministic.tex
-	mv bn-deterministic.tex bn.tex
-	touch --reference=bn.bak bn.tex
-	echo "hello" > bn.ind
-	latex bn > /dev/null
-	latex bn > /dev/null
-	makeindex bn
-	latex bn > /dev/null
-
-#LTM user manual [pdf]
-manual:	mandvi
-	pdflatex bn >/dev/null
-	sed -b -i 's,^/ID \[.*\]$$,/ID [<0> <0>],g' bn.pdf
-	mv bn.bak bn.tex
-	rm -f bn.aux bn.dvi bn.log bn.idx bn.lof bn.out bn.toc
+docdvi poster docs mandvi manual:
+	$(MAKE) -C doc/ $@ V=$(V)
 
 pretty:
 	perl pretty.build
diff --git a/pics/design_process.sxd b/pics/design_process.sxd
deleted file mode 100644
index 7414dbb..0000000
Binary files a/pics/design_process.sxd and /dev/null differ
diff --git a/pics/design_process.tif b/pics/design_process.tif
deleted file mode 100644
index 4a0c012..0000000
Binary files a/pics/design_process.tif and /dev/null differ
diff --git a/pics/expt_state.sxd b/pics/expt_state.sxd
deleted file mode 100644
index 6518404..0000000
Binary files a/pics/expt_state.sxd and /dev/null differ
diff --git a/pics/expt_state.tif b/pics/expt_state.tif
deleted file mode 100644
index cb06e8e..0000000
Binary files a/pics/expt_state.tif and /dev/null differ
diff --git a/pics/makefile b/pics/makefile
deleted file mode 100644
index 3ecb02f..0000000
--- a/pics/makefile
+++ /dev/null
@@ -1,35 +0,0 @@
-# makes the images... yeah
-
-default:  pses
-
-design_process.ps: design_process.tif
-	tiff2ps -s -e design_process.tif > design_process.ps
-
-sliding_window.ps: sliding_window.tif
-	tiff2ps -s -e sliding_window.tif > sliding_window.ps
-	
-expt_state.ps: expt_state.tif
-	tiff2ps -s -e expt_state.tif > expt_state.ps
-
-primality.ps: primality.tif
-	tiff2ps -s -e primality.tif > primality.ps
-
-design_process.pdf: design_process.ps
-	epstopdf design_process.ps
-
-sliding_window.pdf: sliding_window.ps
-	epstopdf sliding_window.ps
-	
-expt_state.pdf: expt_state.ps
-	epstopdf expt_state.ps
-
-primality.pdf: primality.ps
-	epstopdf primality.ps
-
-
-pses: sliding_window.ps expt_state.ps primality.ps design_process.ps
-pdfes: sliding_window.pdf expt_state.pdf primality.pdf design_process.pdf
-
-clean:
-	rm -rf *.ps *.pdf .xvpics
-   
\ No newline at end of file
diff --git a/pics/primality.tif b/pics/primality.tif
deleted file mode 100644
index 76d6be3..0000000
Binary files a/pics/primality.tif and /dev/null differ
diff --git a/pics/radix.sxd b/pics/radix.sxd
deleted file mode 100644
index b9eb9a0..0000000
Binary files a/pics/radix.sxd and /dev/null differ
diff --git a/pics/sliding_window.sxd b/pics/sliding_window.sxd
deleted file mode 100644
index 91e7c0d..0000000
Binary files a/pics/sliding_window.sxd and /dev/null differ
diff --git a/pics/sliding_window.tif b/pics/sliding_window.tif
deleted file mode 100644
index bb4cb96..0000000
Binary files a/pics/sliding_window.tif and /dev/null differ
kc3-lang/libtommath

Commit ddddcf2e71e55796313d61661eb9666e9d52a5df