As outlined in the introduction of this chapter, an in-house module is provided which consists of an extensible interface to three in-house solvers and to two external (third-party) solver modules. In combination with the assembly module (see Chapter 4), this module is currently employed by MINIMOS-NT and FEDOS. Figure 5.1 gives an overview of how these modules are integrated into the simulation flow of these simulators.
In the upper left corner the Newton iteration control of the simulator is shown. Depending on the kind of simulation, various model implementation functions are selected and subsequently called. They add their contributions to the matrix structures in the assembly module [229].
After assembly has been completed, the simulator requests the solution of the equation system by starting the solving process, which is preceded by the compilation, pre-elimination, sorting, and scaling steps discussed in Chapter 4. Eventually, a solver module is called which actually calculates the solution vector. After the chosen solver module has returned the solution vector, all transformations are reverted. MINIMOS-NT uses the solution to calculate the update for the Newton approximation or terminates the iteration if a specific criterion is fulfilled.
There are different approaches available to calculate the solution vector. The in-house solver module provides one direct Gaussian solver as well as two iterative solvers, namely BICGSTAB [49] and GMRES(M) [179], in combination with an ILU preconditioner. In addition, an interface is provided to employ recently developed external modules, which provide highly optimized mathematical code and allow a significant speed-up of the solving process.
The choice of the solver basically depends on the considerations discussed below.
As can be seen from the MINIMOS-NT examples given below, the iterative solvers also have performance advantages for problems with smaller dimensions, whereas the literature often reports the opposite [182]. The reason for this fundamental difference is the set of transformations applied in the assembly module, which results in well-conditioned inner equation systems. Of course, the overall effort could be lower if all transformations were skipped and a direct solver employed instead.
Direct solution techniques refer to Gaussian elimination using a factorization of the system matrix $\mathbf{A}$ into a lower and an upper triangular matrix as $\mathbf{A} = \mathbf{L}\mathbf{U}$. The linear equation system $\mathbf{A}\mathbf{x} = \mathbf{b}$ can thus be written as $\mathbf{L}\mathbf{U}\mathbf{x} = \mathbf{b}$. Therefore, the following three steps, which determine the Gaussian algorithm, can be specified:

1. Factorization of the system matrix: $\mathbf{A} = \mathbf{L}\mathbf{U}$.
2. Forward substitution: solve $\mathbf{L}\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$.
3. Backward substitution: solve $\mathbf{U}\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.
If the linear equation system has to be solved for multiple right-hand-side vectors $\mathbf{b}_i$, only steps two and three have to be repeated, so the factorization has to be done only once. The effort of the factorization is $\mathcal{O}(n^3)$ in the general case, resulting in a prohibitively high computational effort for problems with higher dimensions, for example three-dimensional device simulations. The time and memory consumption of the in-house LU factorization is indeed very high; this holds for the preparatory phases that determine the space and time requirements as well as for the factorization itself. For the sake of completeness it should be noted that the Cholesky method [52] has not been implemented, since the assembled matrices are normally not symmetric and positive definite.
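To make the factor-once, solve-many pattern concrete, the following sketch uses SciPy's dense LU routines (not the in-house solver) on a small made-up matrix; only steps two and three are repeated for the second right-hand side.

```python
# Factor once (step 1), then reuse the factors for several right-hand
# sides (steps 2 and 3). Matrix values are invented for illustration.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lu, piv = lu_factor(A)          # step 1: factorization A = LU (done once)

for b in (np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])):
    x = lu_solve((lu, piv), b)  # steps 2 and 3: forward/backward substitution
    print(x, np.allclose(A @ x, b))
```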
Iterative methods refer to techniques which use successive approximations to obtain more accurate solutions at each step [17]. The iteration is stopped when the error is sufficiently reduced or a maximum number of iterations has been reached. There are two types of iterative methods: stationary methods, such as Jacobi, Gauss-Seidel, and successive overrelaxation, and non-stationary (Krylov subspace) methods, to which the solvers discussed below belong [17].
The Biconjugate Gradient Stabilized method (BICGSTAB) [49] is applicable to non-symmetric matrices. It avoids the irregular convergence patterns of the alternative Conjugate Gradient Squared method (CGS) while showing almost the same convergence speed [17]. The underlying Conjugate Gradient method (CG) calculates a sequence of conjugate vectors, which are the residuals of the iterates; the minimization of these residuals is equivalent to solving the linear equation system [17]. The BICG method calculates two such sequences of vectors: one based on the system matrix and one based on the transposed system matrix. CGS and BICGSTAB are variants of BICG with modifications regarding the updating operations of the sequences. BICGSTAB uses different updates for the sequence of the transposed matrix and thereby obtains smoother convergence than CGS.
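As an illustration only, the following snippet solves a small non-symmetric system with SciPy's BICGSTAB implementation; the matrix values are invented for the example and have nothing to do with assembled device matrices.

```python
# BiCGSTAB on a small non-symmetric sparse system; info == 0 on convergence.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import bicgstab

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))   # non-symmetric
b = np.array([1.0, 0.0, 2.0])

x, info = bicgstab(A, b)
print(x, info, np.linalg.norm(A @ x - b))     # residual check
```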
The Generalized Minimal Residual method (GMRES) is also applicable to non-symmetric matrices and leads to the smallest residual for a fixed number of iteration steps, although these steps require increasingly more computational resources. Strictly speaking, GMRES is not an iterative solver, since the exact solution of an equation system with rank $n$ is obtained in at most $n$ steps (if exact arithmetic is assumed). The solution procedure is based on orthogonal basis vectors, which are combined by a least-squares solve and update.
The number of orthogonal basis vectors increases with each step. Since this growing memory consumption is the drawback of the method, a restarted version of GMRES, normally referred to as GMRES(M), can be used instead. After M steps the iteration is terminated and the intermediate solution is used as the initial guess for the next restart cycle.
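A minimal sketch of the restarted method, assuming SciPy's GMRES implementation as a stand-in for the internal solver module; its `restart` argument plays the role of the restart factor M.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import gmres

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

# After `restart` inner steps the Krylov basis is discarded and the current
# iterate becomes the initial guess of the next cycle, bounding memory use.
x, info = gmres(A, b, restart=20)
print(x, info)
```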
The choice of an optimum restart factor M is not trivial and "requires skill and experience" [17]. In [96], a comparatively low value is suggested for device simulation, but a significantly higher value seems to be necessary for more ill-conditioned problems. In [183], M was kept small to avoid too high memory consumption. In view of the system memory of the average computer, this parameter can be set to higher values, at least in the area of classical device simulation. However, a default value for M had to be found, and for that reason an empirical investigation was performed (see Section 5.5.3).
In order to provide an alternative solver, a restarted version of the Generalized Minimal Residual method [179] was implemented for the internal solver module, based on the templates provided by the Netlib repository [17,152]. During the implementation it was crucial to retain the existing structure of the solver module and to reuse all already implemented capabilities.
The convergence rate of iterative methods depends on the spectral properties of the system matrix $\mathbf{A}$ [17]. For that reason, preconditioning is used to transform the linear equation system into an equivalent one which has the same solution but better spectral properties. Thus, by using a preconditioner $\mathbf{M}$, the original system $\mathbf{A}\mathbf{x} = \mathbf{b}$ is transformed to

$$\mathbf{M}^{-1}\mathbf{A}\,\mathbf{x} = \mathbf{M}^{-1}\mathbf{b} \qquad (5.1)$$
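In SciPy's iterative solvers, the preconditioner is passed as an operator that applies an approximation of $\mathbf{A}^{-1}$, which corresponds to applying $\mathbf{M}^{-1}$ in Equation 5.1. The sketch below uses a simple Jacobi (diagonal) preconditioner as a placeholder for the ILU preconditioner discussed next.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, gmres

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

d = A.diagonal()
M = LinearOperator(A.shape, matvec=lambda v: v / d)  # applies diag(A)^{-1}

x, info = gmres(A, b, M=M)   # solves the preconditioned system of Eq. 5.1
print(x, info)
```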
There are many approaches to derive the preconditioner $\mathbf{M}$. One class of them is the incomplete LU factorization (ILU), which approximates the matrix $\mathbf{A}$. Basically, a factorization $\tilde{\mathbf{L}}\tilde{\mathbf{U}}$ is incomplete if not all necessary fill elements are added to $\tilde{\mathbf{L}}$ or $\tilde{\mathbf{U}}$. The respective preconditioner has the form

$$\mathbf{M} = \tilde{\mathbf{L}}\tilde{\mathbf{U}} \approx \mathbf{A} \qquad (5.2)$$
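A hedged sketch of Equation 5.2 in practice: SciPy's `spilu` computes incomplete factors and exposes a solve routine that applies $(\tilde{\mathbf{L}}\tilde{\mathbf{U}})^{-1}$, which can be wrapped as a preconditioner for the iterative solver. This is an analogue of the module's ILU preconditioner, not its implementation.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spilu, LinearOperator, bicgstab

A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

ilu = spilu(A, drop_tol=1e-4)                  # incomplete factors L~, U~
M = LinearOperator(A.shape, matvec=ilu.solve)  # applies (L~ U~)^{-1}

x, info = bicgstab(A, b, M=M)
print(x, info)
```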
Adding a preconditioner to an iterative solver causes extra cost, so the resulting trade-off between construction/application effort and gain in convergence speed must be considered. As outlined in [65], a hierarchical concept is used to minimize the necessary computational time of this system. This time is mainly influenced by the fill-in and threshold (tolerance) parameters of the incomplete LU factorization process. These parameters are stored by the parameter administration and automatically adapted after each solver call. See Figure 5.2 for an illustration of the hierarchical concept. This form of the incomplete LU factorization is sometimes referred to as ILUT($\tau$,$p$) [183], with $\tau$ standing for the threshold and $p$ for the fill-in parameter, which specifies that only the $p$ largest elements are kept per row.
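The following sketch illustrates the hierarchical idea under stated assumptions: SciPy's `drop_tol` and `fill_factor` serve as rough analogues of the threshold and fill-in parameters, and the adaptation rule (strengthen the preconditioner after a failed solver call) is a hypothetical simplification of the scheme in Figure 5.2, not the module's actual policy.

```python
from scipy.sparse.linalg import spilu, LinearOperator, gmres

def solve_adaptive(A, b, drop_tol=1e-2, fill_factor=5.0, tries=4):
    # A is assumed to be a scipy.sparse csc_matrix; parameters are
    # illustrative defaults, adapted after each unsuccessful call.
    for _ in range(tries):
        ilu = spilu(A, drop_tol=drop_tol, fill_factor=fill_factor)
        M = LinearOperator(A.shape, matvec=ilu.solve)
        x, info = gmres(A, b, M=M, restart=30)
        if info == 0:
            return x, drop_tol, fill_factor
        drop_tol *= 0.1       # keep more elements: stronger preconditioner
        fill_factor *= 2.0
    raise RuntimeError("no convergence within the given parameter range")
```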
Alternatives are the ILU(0) and ILU($p$) methods. The former has less complexity and simply keeps those elements whose corresponding entries in the system matrix are non-zero, that is, it preserves the sparsity pattern of $\mathbf{A}$. The ILU($p$) method also drops elements, but takes only the structure of the linear equation system into account. However, the incomplete LU factorization faces the same problems as the Gaussian elimination. For example, due to zero or negative pivots, incomplete factorizations may break down or result in indefinite matrices, respectively, even if the full factorization exists and yields a positive definite matrix [17].
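For illustration, a dense, didactic ILU(0) in the IKJ formulation: only positions that are non-zero in the system matrix are updated, so no fill-in is created. This is a teaching sketch, not the module's factorization code.

```python
import numpy as np

def ilu0(A):
    n = A.shape[0]
    LU = A.astype(float).copy()
    nz = A != 0                   # sparsity pattern of the system matrix
    for i in range(1, n):
        for k in range(i):
            if nz[i, k]:
                LU[i, k] /= LU[k, k]              # multiplier, stored in L~
                for j in range(k + 1, n):
                    if nz[i, j]:                  # update only pattern entries
                        LU[i, j] -= LU[i, k] * LU[k, j]
    return LU   # L~ (unit lower, below diagonal) and U~ stored together
```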
An alternative to these methods are the so-called approximate inverse methods, which approximate the inverse of $\mathbf{A}$ directly. The major advantage of these approaches is that the application of the resulting preconditioner requires matrix-vector products only and no solving step [183]. Examples are the SPAI [92] and the AINV [19] methods.
For the synchronization between preconditioner and solver, the concept of counting mathematical operations is used. In contrast to the system time, which was used in the former version, this count remains constant for identical simulations, and thus subsequent passes of the optimization loop are deterministic even in a multitasking environment.
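The determinism argument can be illustrated by counting matrix-vector products instead of measuring wall-clock time; the wrapper below is a hypothetical stand-in for the module's bookkeeping, using SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, bicgstab

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

count = {"matvec": 0}
def matvec(v):
    count["matvec"] += 1          # operation count, not elapsed time
    return A @ v

x, info = bicgstab(LinearOperator(A.shape, matvec=matvec), b)
print(info, count["matvec"])      # identical on every run of the same problem
```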
The dual drop-off strategy employed in the incomplete factorization of the internal solver module works as follows: elements in $\tilde{\mathbf{L}}$ and $\tilde{\mathbf{U}}$ whose size is less than the threshold $\tau$ (taken relative to the average element size of the current row in $\mathbf{A}$) are dropped; with a zero threshold, all elements are kept at this stage. Furthermore, only a specific number of the remaining elements are kept. This number is determined by the fill-in parameter $p$, and the selection is done by size, so the largest elements remain in the matrix. One can thus use a zero threshold to obtain a strategy based purely on keeping the $p$ largest elements in each row of $\tilde{\mathbf{L}}$ and $\tilde{\mathbf{U}}$. Or, by choosing an appropriate threshold but setting the fill-in parameter to $n$, the usual threshold strategy will be applied, although the number of fill-ins is then unpredictable.
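A sketch of the dual drop-off rule applied to a single row, assuming the threshold is taken relative to the row's average magnitude as described above; the helper `drop_off` and its arguments are illustrative names, not the module's API.

```python
import numpy as np

def drop_off(row, threshold, p):
    mag = np.abs(row)
    # stage 1: drop entries below the threshold, taken relative to the
    # average magnitude of the row (threshold == 0 keeps everything here)
    row = np.where(mag >= threshold * mag.mean(), row, 0.0)
    # stage 2: keep only the p largest surviving entries by magnitude
    # (p >= len(row) reduces this to the pure threshold strategy)
    if p < len(row):
        row[np.argsort(np.abs(row))[:-p]] = 0.0
    return row

print(drop_off(np.array([5.0, 0.1, -3.0, 0.02, 1.0]), 0.1, 2))
# -> [ 5.  0. -3.  0.  0.]: small entries dropped, then the 2 largest kept
```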