4.6 Assembling the Complete Linear Equation System

The assembly module can be used to assemble arbitrary linear equation systems $\ensuremath{\mathbf{A}} \ensuremath{\mathbf{x}} = \ensuremath{\mathbf{b}}$ independently of the concept the simulator is based on. This fulfills the major key demand of general applicability. However, the row transformation feature necessitates to continue the discussion with a more specific field of application. Although not mandatory, this feature can be well applied for the finite volume (or box integration) discretization method. For that reason it shall be used as example during the following discussion.

A semiconductor device which is going to be simulated is normally divided into several segments that are geometrical regions employing a distinct set of models. The implementation of each model is completely independent from other models and each model is basically allowed to enter its contributions to the linear equation system. All boundary and interface issues are completely separated from the general segment models represented by assembly structures for the boundary system which are independent from the segment ones.

Thus, the system matrix $\mathbf{A}$ will be assembled from two parts, namely the direct part $\ensuremath{\mathbf{A}}_{\mathrm{b}}$ (boundary models) and the transformed part $\ensuremath{\mathbf{A}}_{\mathrm{s}}$ (segment models). The latter is multiplied by the row transformation matrix $\ensuremath{\mathbf{T}}_{\mathrm{b}}$ from the left before contributing to the system matrix $\mathbf{A}$ . The right-hand-side vector $\mathbf{b}$ is treated the same way:

$\displaystyle \ensuremath{\mathbf{A}} = \ensuremath{\ensuremath{\mathbf{A}}_{\m... ...h{\mathbf{T}}_{\mathrm{b}}}\ensuremath{\ensuremath{\mathbf{A}}_{\mathrm{s}}}\ ,$	(4.6)
$\displaystyle \ensuremath{\mathbf{b}} = \ensuremath{\ensuremath{\mathbf{b}}_{\m... ...h{\mathbf{T}}_{\mathrm{b}}}\ensuremath{\ensuremath{\mathbf{b}}_{\mathrm{s}}}\ ,$	(4.7)
$\displaystyle \ensuremath{\mathbf{A}} \ensuremath{\mathbf{x}} = \ensuremath{\mathbf{b}} \ .$	(4.8)

Although in principle every model is allowed to add entries to all components, the assembly module checks two prerequisites before actually entering the value: first, the quantity the value belongs to must be marked to be solved (the user may request only a subset of all provided models), and second the priority of the model has to be high enough to modify the row transformation properties. As stated before, the row transformation is used to complete missing fluxes in boundary boxes. Since a grid point can be part of more than two segments, a ranking using a priority has been introduced. For example contact models have usually the highest priority and thus their contributions are always used for completion.

All three matrices $\ensuremath{\ensuremath{\mathbf{A}}_{\mathrm{b}}}$ , $\ensuremath{\ensuremath{\mathbf{A}}_{\mathrm{s}}}$ , and $\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{b}}}$ and the two vectors $\ensuremath{\ensuremath{\mathbf{b}}_{\mathrm{b}}}$ and $\ensuremath{\ensuremath{\mathbf{b}}_{\mathrm{s}}}$ may be assembled simultaneously, so no assembly sequence must be adhered to. In addition, a fourth matrix $\ensuremath{\ensuremath{\mathbf{T}}_{\mathrm{v}}}$ is assembled which contains information for an additional variable transformation (see Section 4.7.2).

The assembly module offers an additional feature for quantity administration. The simulator is able to use this feature to store and obtain the information about quantity indices and other properties from the assembly module. This has the specific advantage that multi-level solvers can directly be provided with the required connectivity information between the equations (see Section 5.3.3).

During the assembling process, all contributions are added to values stored in a flexible sparse matrix structure based on a balanced binary tree. The advantage of this structure is that new entries can be easily added at any place and any time. The purpose of this matrix format is for data entry only, so no mathematical operations are defined for this structure. Since diagonal entries are always required to be assembled (zero diagonals are not allowed), they are stored in an array allowing very fast access. So the dimension of the linear equation system must be known in advance before the structure can be allocated. In the format sorted by row indices, all off-diagonal entries are stored in a balanced binary tree for each row. This allows one to delete complete rows very efficiently. If complete columns have to be deleted, for example required for the transformation matrix, it is faster to store the off-diagonal entries sorted by columns, which can be specified on construction.

After the assembling has been finished, that is after the flexible sparse matrix structure is completely constructed, these structures are converted to the sparse matrix format MCSR, which stands for Modified Compressed Sparse Row [178]. The analogous, column-oriented Modified Compressed Sparse Column format MCSC is used to speed up column deleting. See Appendix E.1 for a detailed description of these formats. The advantages of using compressed structures can be summarized as follows [54]: if $\mathbf{A}$ is considered as a dense matrix with a dimension of

, it requires

storage. The so-called big-

notation

[158] is a theoretical measure usually for the time or memory required by an algorithm, given for the problem size

, which is normally the number of items processed. Thus, a matrix with

stored with double precision numbers requires $100,000^2 \times 8\,$ bytes. These roughly $75\,$ GB are a huge chunk of memory even for today's computers. Since typically sparse systems are solved with similar or even higher dimensions, dense formats and algorithms require prohibitively high amounts of memory and time. Thus, the objective of sparse matrix formats and algorithms is solving linear equation systems with time and space proportional to $O(n) + O(\ensuremath{n_{\mathrm{non-zero}}})$ , with $n_{\mathrm{non-zero}}$ as the number of non-zeros.

During the Newton iterations the structural configuration of these matrices is not modified very often. A structural reconfiguration can be triggered by a change of iteration schemes, for example for enabling more derivatives in the Jacobian (see Section 3.6.4). The assembly module is designed to take such considerations into account. If the structure remains unchanged, the balanced binary trees can be skipped and the variables may be entered directly in the already existing MCSR structures. Hence, the effort for deleting, tree assembling, reallocating and converting can be saved which drastically speeds up the assembly process. The so-called Newton adjustment addresses not only the assembly matrices, but also the resulting structures of the compilation and pre-elimination process. The performance impact of these features has been analyzed and is discussed in Section 5.5.2. See Section 4.12 for more details on the Newton adjustment levels.

After the four compressed sparse matrix structures have been completely constructed, the following steps discussed in the respective sections are performed: