As outlined in the introduction of this chapter, an in-house module is provided which consists of an extensible interface to three in-house solvers and to two external (third-party) solver modules. In combination with the assembly module (see Chapter 4), this module is currently employed by MINIMOS-NT and FEDOS. Figure 5.1 gives an overview of how these modules are integrated into the simulation flow of these simulators.
In the upper left corner the Newton iteration control of the simulator is shown. Depending on the kind of simulation, various model implementation functions are selected and subsequently called. They add their contributions to the matrix structures in the assembly module [229].
After assembly has been completed, the simulator requests the solution of the equation system by starting the solving process, which is preceded by the compilation, pre-elimination, sorting, and scaling steps discussed in Chapter 4. Eventually, a solver module is called which actually calculates the solution vector. After the chosen solver module has returned the solution vector, all transformations are reverted. MINIMOS-NT uses the solution to calculate the update for the Newton approximation or terminates the iteration if a specific criterion is fulfilled.
There are different approaches available to calculate the solution vector. The in-house solver module provides one direct Gaussian solver as well as two iterative solvers, namely BICGSTAB [49] and GMRES(M) [179], in combination with an ILU preconditioner. In addition, an interface is provided to employ recently developed external modules, which provide highly optimized mathematical code and allow a significant speed-up of the solving process.
The choice of the solver basically depends on the considerations discussed below.
As can be seen from the MINIMOS-NT examples given below, the iterative solvers also have performance advantages for problems with smaller dimensions, whereas the literature often reports the opposite [182]. The reason for this fundamental difference is the set of transformations applied in the assembly module, which results in well-conditioned inner equation systems. Of course, the overall effort could be lower if all transformations were skipped and a direct solver employed instead.
Direct solution techniques refer to Gaussian elimination using a factorization of the system matrix $\mathbf{A}$ into a lower and an upper triangular matrix as $\mathbf{A} = \mathbf{L}\mathbf{U}$. The linear equation system $\mathbf{A}\mathbf{x} = \mathbf{b}$ can thus be written as $\mathbf{L}\mathbf{U}\mathbf{x} = \mathbf{b}$. Therefore, the following three steps, which determine the Gaussian algorithm, can be specified:

1. Factorization of the system matrix: $\mathbf{A} = \mathbf{L}\mathbf{U}$.
2. Forward substitution: solve $\mathbf{L}\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$.
3. Backward substitution: solve $\mathbf{U}\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.
If the linear equation system has to be solved for multiple right-hand-side vectors $\mathbf{b}_i$, only steps two and three have to be repeated, so the factorization has to be done only once. The effort of the factorization is $\mathcal{O}(n^3)$ in the general case, resulting in a prohibitively high computational effort for problems with higher dimensions, for example three-dimensional device simulations. The time and memory consumption of the in-house LU factorization is indeed very high; this holds for the preparatory phases that determine the space and time requirements as well as for the factorization itself. For the sake of completeness it should be noted that the Cholesky method [52] has not been implemented, since the assembled matrices are normally not symmetric and positive definite.
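To make the factor-once, solve-many pattern concrete, the following sketch uses SciPy's dense LU routines (not the in-house solver) on a small made-up matrix; only steps two and three are repeated for the second right-hand side.

```python
# Factor once (step 1), then reuse the factors for several right-hand
# sides (steps 2 and 3). Matrix values are invented for illustration.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lu, piv = lu_factor(A)          # step 1: factorization A = LU (done once)

for b in (np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])):
    x = lu_solve((lu, piv), b)  # steps 2 and 3: forward/backward substitution
    print(x, np.allclose(A @ x, b))
```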
Iterative methods refer to techniques which use successive approximations to obtain more accurate solutions at each step [17]. The iteration is stopped when the error is sufficiently reduced or a maximum number of iterations has been reached. There are two types of iterative methods: stationary methods, such as Jacobi, Gauss-Seidel, and successive overrelaxation, and non-stationary (Krylov subspace) methods, to which the solvers discussed below belong [17].
The Biconjugate Gradient Stabilized method (BICGSTAB) [49] is applicable to non-symmetric matrices. It avoids the irregular convergence patterns of the alternative Conjugate Gradient Squared method (CGS) while showing almost the same convergence speed [17]. The underlying Conjugate Gradient method (CG) calculates a sequence of conjugate vectors, which are the residuals of the iterates; the minimization of these residuals is equivalent to solving the linear equation system [17]. The BICG method calculates two such sequences of vectors: one based on the system matrix and one based on the transposed system matrix. CGS and BICGSTAB are variants of BICG with modifications regarding the updating operations of the sequences. BICGSTAB uses different updates for the sequence of the transposed matrix and thereby obtains smoother convergence than CGS.
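As an illustration only, the following snippet solves a small non-symmetric system with SciPy's BICGSTAB implementation; the matrix values are invented for the example and have nothing to do with assembled device matrices.

```python
# BiCGSTAB on a small non-symmetric sparse system; info == 0 on convergence.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import bicgstab

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))   # non-symmetric
b = np.array([1.0, 0.0, 2.0])

x, info = bicgstab(A, b)
print(x, info, np.linalg.norm(A @ x - b))     # residual check
```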
The Generalized Minimal Residual method (GMRES) is also applicable to non-symmetric matrices and leads to the smallest residual for a fixed number of iteration steps, although these steps require increasingly more computational resources. Strictly speaking, GMRES is not an iterative solver, since the exact solution of an equation system with rank $n$ is obtained in at most $n$ steps (if exact arithmetic is assumed). The solution procedure is based on orthogonal basis vectors, which are combined by a least-squares solve and update.
The number of orthogonal basis vectors increases with each step. Since this growing memory consumption is the drawback of the method, a restarted version of GMRES, normally referred to as GMRES(M), can be used instead. After M steps the iteration is terminated and the intermediate solution is used as the initial guess for the next restart cycle.
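A minimal sketch of the restarted method, assuming SciPy's GMRES implementation as a stand-in for the internal solver module; its `restart` argument plays the role of the restart factor M.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import gmres

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

# After `restart` inner steps the Krylov basis is discarded and the current
# iterate becomes the initial guess of the next cycle, bounding memory use.
x, info = gmres(A, b, restart=20)
print(x, info)
```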
The choice of an optimum restart factor M is not trivial and "requires skill and experience" [17]. In [96], a comparatively low value is suggested for device simulation, but a significantly higher value seems to be necessary for more ill-conditioned problems. In [183], M was kept small to avoid too high memory consumption. In view of the system memory of the average computer, this parameter can be set to higher values, at least in the area of classical device simulation. However, a default value for M had to be found, and for that reason an empirical investigation was performed (see Section 5.5.3).
In order to provide an alternative solver, a restarted version of the Generalized Minimal Residual method [179] was implemented for the internal solver module, based on the templates provided by the Netlib repository [17,152]. During the implementation it was crucial to retain the existing structure of the solver module and to reuse all already implemented capabilities.
The convergence rate of iterative methods depends on the spectral properties of the system matrix $\mathbf{A}$ [17]. For that reason, preconditioning is used to transform the linear equation system into an equivalent one which has the same solution but better spectral properties. Thus, by using a preconditioner $\mathbf{M}$, the original system $\mathbf{A}\mathbf{x} = \mathbf{b}$ is transformed to

$$\mathbf{M}^{-1}\mathbf{A}\,\mathbf{x} = \mathbf{M}^{-1}\mathbf{b} \qquad (5.1)$$
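In SciPy's iterative solvers, the preconditioner is passed as an operator that applies an approximation of $\mathbf{A}^{-1}$, which corresponds to applying $\mathbf{M}^{-1}$ in Equation 5.1. The sketch below uses a simple Jacobi (diagonal) preconditioner as a placeholder for the ILU preconditioner discussed next.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, gmres

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

d = A.diagonal()
M = LinearOperator(A.shape, matvec=lambda v: v / d)  # applies diag(A)^{-1}

x, info = gmres(A, b, M=M)   # solves the preconditioned system of Eq. 5.1
print(x, info)
```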
There are many approaches to derive the preconditioner $\mathbf{M}$. One class of them is the incomplete LU factorization (ILU), which approximates the matrix $\mathbf{A}$. Basically, a factorization $\tilde{\mathbf{L}}\tilde{\mathbf{U}}$ is incomplete if not all necessary fill elements are added to $\tilde{\mathbf{L}}$ or $\tilde{\mathbf{U}}$. The respective preconditioner has the form

$$\mathbf{M} = \tilde{\mathbf{L}}\tilde{\mathbf{U}} \approx \mathbf{A} \qquad (5.2)$$
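A hedged sketch of Equation 5.2 in practice: SciPy's `spilu` computes incomplete factors and exposes a solve routine that applies $(\tilde{\mathbf{L}}\tilde{\mathbf{U}})^{-1}$, which can be wrapped as a preconditioner for the iterative solver. This is an analogue of the module's ILU preconditioner, not its implementation.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spilu, LinearOperator, bicgstab

A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

ilu = spilu(A, drop_tol=1e-4)                  # incomplete factors L~, U~
M = LinearOperator(A.shape, matvec=ilu.solve)  # applies (L~ U~)^{-1}

x, info = bicgstab(A, b, M=M)
print(x, info)
```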
Adding a preconditioner to an iterative solver causes extra cost, so the resulting trade-off between construction/application effort and gain in convergence speed must be considered. As outlined in [65], a hierarchical concept is used to minimize the necessary computational time of this system. This time is mainly influenced by the fill-in and threshold (tolerance) parameters of the incomplete LU factorization process. These parameters are stored by the parameter administration and automatically adapted after each solver call. See Figure 5.2 for an illustration of the hierarchical concept. This form of the incomplete LU factorization is sometimes referred to as ILUT($\tau$,$p$) [183], with $\tau$ standing for the threshold and $p$ for the fill-in parameter, which specifies that only the $p$ largest elements are kept per row.
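The following sketch illustrates the hierarchical idea under stated assumptions: SciPy's `drop_tol` and `fill_factor` serve as rough analogues of the threshold and fill-in parameters, and the adaptation rule (strengthen the preconditioner after a failed solver call) is a hypothetical simplification of the scheme in Figure 5.2, not the module's actual policy.

```python
from scipy.sparse.linalg import spilu, LinearOperator, gmres

def solve_adaptive(A, b, drop_tol=1e-2, fill_factor=5.0, tries=4):
    # A is assumed to be a scipy.sparse csc_matrix; parameters are
    # illustrative defaults, adapted after each unsuccessful call.
    for _ in range(tries):
        ilu = spilu(A, drop_tol=drop_tol, fill_factor=fill_factor)
        M = LinearOperator(A.shape, matvec=ilu.solve)
        x, info = gmres(A, b, M=M, restart=30)
        if info == 0:
            return x, drop_tol, fill_factor
        drop_tol *= 0.1       # keep more elements: stronger preconditioner
        fill_factor *= 2.0
    raise RuntimeError("no convergence within the given parameter range")
```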
Alternatives are the ILU(0) and ILU($p$) methods. The former has less complexity and simply keeps those elements whose corresponding entries in the system matrix are non-zero, that is, it preserves the sparsity pattern of $\mathbf{A}$. The ILU($p$) method also drops elements, but takes only the structure of the linear equation system into account. However, the incomplete LU factorization faces the same problems as the Gaussian elimination. For example, due to zero or negative pivots, incomplete factorizations may break down or result in indefinite matrices, respectively, even if the full factorization exists and yields a positive definite matrix [17].
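For illustration, a dense, didactic ILU(0) in the IKJ formulation: only positions that are non-zero in the system matrix are updated, so no fill-in is created. This is a teaching sketch, not the module's factorization code.

```python
import numpy as np

def ilu0(A):
    n = A.shape[0]
    LU = A.astype(float).copy()
    nz = A != 0                   # sparsity pattern of the system matrix
    for i in range(1, n):
        for k in range(i):
            if nz[i, k]:
                LU[i, k] /= LU[k, k]              # multiplier, stored in L~
                for j in range(k + 1, n):
                    if nz[i, j]:                  # update only pattern entries
                        LU[i, j] -= LU[i, k] * LU[k, j]
    return LU   # L~ (unit lower, below diagonal) and U~ stored together
```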
An alternative to these methods are the so-called approximate inverse methods, which approximate the inverse of $\mathbf{A}$ directly. The major advantage of these approaches is that the application of the resulting preconditioner requires matrix-vector products only and no solving step [183]. Examples are the SPAI [92] and the AINV [19] methods.
For the synchronization between preconditioner and solver, the concept of counting mathematical operations is used. In contrast to the system time, which was used in the former version, this count remains constant for identical simulations, and thus subsequent passes of the optimization loop are deterministic even in a multitasking environment.
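The determinism argument can be illustrated by counting matrix-vector products instead of measuring wall-clock time; the wrapper below is a hypothetical stand-in for the module's bookkeeping, using SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, bicgstab

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 1.0, 3.0]]))
b = np.array([1.0, 0.0, 2.0])

count = {"matvec": 0}
def matvec(v):
    count["matvec"] += 1          # operation count, not elapsed time
    return A @ v

x, info = bicgstab(LinearOperator(A.shape, matvec=matvec), b)
print(info, count["matvec"])      # identical on every run of the same problem
```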
The dual drop-off strategy employed in the incomplete factorization of the internal solver module works as follows: elements in $\tilde{\mathbf{L}}$ and $\tilde{\mathbf{U}}$ whose size is less than the threshold $\tau$ (taken relative to the average element size of the current row in $\mathbf{A}$) are dropped; with a zero threshold, all elements are kept at this stage. Furthermore, only a specific number of the remaining elements are kept. This number is determined by the fill-in parameter $p$, and the selection is done by size, so the largest elements remain in the matrix. One can thus use a zero threshold to obtain a strategy based purely on keeping the $p$ largest elements in each row of $\tilde{\mathbf{L}}$ and $\tilde{\mathbf{U}}$. Or, by choosing an appropriate threshold but setting the fill-in parameter to $n$, the usual threshold strategy will be applied, although the number of fill-ins is then unpredictable.
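A sketch of the dual drop-off rule applied to a single row, assuming the threshold is taken relative to the row's average magnitude as described above; the helper `drop_off` and its arguments are illustrative names, not the module's API.

```python
import numpy as np

def drop_off(row, threshold, p):
    mag = np.abs(row)
    # stage 1: drop entries below the threshold, taken relative to the
    # average magnitude of the row (threshold == 0 keeps everything here)
    row = np.where(mag >= threshold * mag.mean(), row, 0.0)
    # stage 2: keep only the p largest surviving entries by magnitude
    # (p >= len(row) reduces this to the pure threshold strategy)
    if p < len(row):
        row[np.argsort(np.abs(row))[:-p]] = 0.0
    return row

print(drop_off(np.array([5.0, 0.1, -3.0, 0.02, 1.0]), 0.1, 2))
# -> [ 5.  0. -3.  0.  0.]: small entries dropped, then the 2 largest kept
```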