The SHE equations furnish a number of interesting structural properties, which are the topic of this chapter. All properties are discussed on the continuous level and result from certain symmetries of the underlying physical processes. Consequently, no additional assumptions about a particular discretization method need to be imposed and the same structural properties hold true for the system of discretized linear equations as outlined in Chap. 5.
In Sec. 4.1 the coupling between the SHE equations is investigated and a scheme for the lossless compression of the system matrix is proposed, which greatly reduces the memory consumption of the SHE method at high expansion orders. The different boundary conditions used for the SHE method are discussed in Sec. 4.4 and extensions to the system matrix compression scheme are proposed in order to handle them. This chapter closes with some numerical results demonstrating the benefits of the proposed compression scheme.
As already noted in Sec. 2.3, the coupling terms $j_{l,m}^{l',m'}$ and $\Gamma_{l,m}^{l',m'}$ as well as the projected scattering operator determine the coupling among the projected equations (2.30). If all coupling coefficients were multiples of the Kronecker delta, the SHE equations would decouple and could be solved individually. Conversely, if all coupling coefficients were nonzero, all projected equations would be coupled with each other. In analogy to systems of linear equations described by matrices, one is typically interested in weak coupling, which accelerates the solution process and leads to lower memory requirements. A closer inspection of the coupling structure for spherically symmetric dispersion relations is carried out in this section.
For general band structures, the symmetry of the underlying processes leads to the following result [53]:
Theorem 1 (Jungemann et al.). For a spherical harmonics expansion up to order $L$, there holds for all admissible indices $i$, $i'$, $m$ and $m'$:
\[
 j_{2i,m}^{2i',m'} = j_{2i+1,m}^{2i'+1,m'} = 0 \,, \qquad \Gamma_{2i,m}^{2i',m'} = \Gamma_{2i+1,m}^{2i'+1,m'} = 0 \,.
\]
The essence of this theorem is that all nonzero coupling coefficients possess different parities in the leading indices. This structural information about the coupling was already used in a preprocessing step for the solution of the discretized equations in [53].
Under the assumption of spherical energy bands, i.e. $\varepsilon(\mathbf{k}) = \varepsilon(|\mathbf{k}|)$, the velocity $v$, the modulus of the wave vector $|\mathbf{k}|$ and the generalized density of states $Z$ depend only on the energy $\varepsilon$, but not on the angles $\theta$ and $\varphi$. Consequently, (2.31), (2.32) and (2.33) can be rewritten in the factorized form
\[
 j_{l,m}^{l',m'} = v(\varepsilon)\, Z(\varepsilon)\, a_{l,m}^{l',m'} \,, \qquad (4.2)
\]
\[
 \Gamma_{l,m}^{l',m'} = \Gamma(\varepsilon)\, b_{l,m}^{l',m'} \,, \qquad (4.3)
\]
where the coefficients $a_{l,m}^{l',m'}$ and $b_{l,m}^{l',m'}$ are purely angular integrals over products of spherical harmonics and do not depend on energy.
The coupling between index pairs $(l,m)$ and $(l',m')$ is thus determined by the integral terms $a_{l,m}^{l',m'}$ and $b_{l,m}^{l',m'}$ only. As shown by the author, it turns out that the coupling is rather weak:
Theorem 2. For spherical energy bands and index pairs $(l,m)$ and $(l',m')$, the coupling coefficients $a_{l,m}^{l',m'}$ and $b_{l,m}^{l',m'}$ vanish unless $l' = l \pm 1$ and $m'$ takes one of at most five values determined by $m$.
A rather lengthy and technical proof is given in [87]; it makes use of recurrence relations and orthogonality properties of trigonometric functions and associated Legendre functions.
Thm. 2 is very important for large expansion orders $L$: the total number of unknown expansion coefficients per point of the simulation domain is $(L+1)^2$, but according to the result of the theorem, each coefficient is directly coupled with at most ten other coefficients. It should also be noted that the weak coupling stated in Thm. 2 has already been observed for less general situations in earlier publications [78, 37].
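As an illustrative count: for $L = 9$ there are $(9+1)^2 = 100$ expansion coefficients per point of the simulation domain, so a dense coupling pattern would tie each projected equation to all $100$ coefficients, whereas Thm. 2 guarantees that at most ten of these couplings are actually nonzero.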
Next, the structure of the SHE equations (2.30) for spherical energy bands is investigated in depth. Unlike the presentation in [87], the structure is investigated at the continuous level already, which has the advantage of covering a wider class of discretization schemes with less notational clutter. Due to the equivalence of (2.30) and (2.38) for spherical energy bands, the results in this section apply equally to an expansion of the distribution function and to an expansion of the generalized distribution function.
A motivation for a first-order expansion is given first; then the scheme is extended to general expansion orders. Ignoring the stabilization using MEDS in a first step, the four SHE equations (with Einstein summation convention over $l'$ and $m'$) read
\begin{align*}
 \frac{\partial f_{0,0}}{\partial t} + vZ\, a_{0,0}^{l',m'} \cdot \nabla_x f_{l',m'} + b_{0,0}^{l',m'} \cdot \mathbf{F}\, f_{l',m'} + \sum \eta\, f_{l',m'} &= 0 \,, \\
 \frac{\partial f_{1,-1}}{\partial t} + vZ\, a_{1,-1}^{l',m'} \cdot \nabla_x f_{l',m'} + b_{1,-1}^{l',m'} \cdot \mathbf{F}\, f_{l',m'} + \sum \eta\, f_{l',m'} &= 0 \,, \\
 \frac{\partial f_{1,0}}{\partial t} + vZ\, a_{1,0}^{l',m'} \cdot \nabla_x f_{l',m'} + b_{1,0}^{l',m'} \cdot \mathbf{F}\, f_{l',m'} + \sum \eta\, f_{l',m'} &= 0 \,, \\
 \frac{\partial f_{1,1}}{\partial t} + vZ\, a_{1,1}^{l',m'} \cdot \nabla_x f_{l',m'} + b_{1,1}^{l',m'} \cdot \mathbf{F}\, f_{l',m'} + \sum \eta\, f_{l',m'} &= 0 \,.
\end{align*}
This continuous linear system of partial differential equations can be written in operator matrix form as
\[
 \left( \mathsf{M}_t\, \frac{\partial}{\partial t} + \mathsf{M}_x \cdot \nabla_x + \mathsf{M}_F \cdot \mathbf{F} + \mathsf{M}_s \right) \mathbf{f} = 0 \,, \qquad (4.4)
\]
with $\mathbf{f} = (f_{0,0}, f_{1,-1}, f_{1,0}, f_{1,1})^{\mathrm{T}}$ and the coefficient matrices denoted generically by $\mathsf{M}_t$, $\mathsf{M}_x$, $\mathsf{M}_F$ and $\mathsf{M}_s$,
where the gradient and the time derivative act as operators on the expansion coefficients, not on the coefficient matrices. The crucial observation now is that all entries in each of the matrices are independent of the spatial variable $\mathbf{x}$ as well as of the total energy $H$. Writing $\mathbf{F}$ and $\mathbf{x}$ in components,
\[
 \mathbf{F}(\mathbf{x}) = (F_1, F_2, F_3)^{\mathrm{T}} \,, \qquad \mathbf{x} = (x_1, x_2, x_3)^{\mathrm{T}} \,,
\]
the operator matrix splits into a sum of products of constant matrices with the scalar operators $\partial/\partial t$ and $\partial/\partial x_i$ and the scalar multipliers $F_i$ (4.6).
For the case of a general expansion order $L$, an enumeration of the index pairs $(l,m)$ is required. Let $\iota(l,m)$ denote such a mapping, for instance $\iota(l,m) = l(l+1) + m$. Repeating the calculation for the general case, one obtains that the operator matrix for the SHE equations can again be written as such a sum of products of constant coupling matrices with scalar operators.
The decomposition of the continuous case is preserved after spatial discretization. To see this, let $\mathsf{D}_1, \ldots, \mathsf{D}_k$ denote the matrices arising from an arbitrary discretization of the scalar operator terms. In most cases, a finite volume or a finite element method will be employed, but also less widespread techniques such as wavelet methods could be used. Then, the discrete system matrix for the SHE equations can be written as a sum of Kronecker products of the small, constant coupling matrices with the large discretization matrices $\mathsf{D}_i$ (4.21).
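To make the Kronecker structure concrete, the following sketch (Python/NumPy; all dimensions, term counts and matrix entries are invented for illustration and are not taken from the SHE discretization itself) assembles such a system matrix explicitly and contrasts the storage of the full matrix with the storage of its factors:

```python
import numpy as np

rng = np.random.default_rng(42)

L = 3                                   # expansion order
n_h = (L + 1) ** 2                      # number of harmonics (l, m)
n_g = 50                                # toy number of grid points

# One small coupling matrix and one large discretization matrix per term
# of the PDE system (counts and entries are purely illustrative).
couplings = [rng.standard_normal((n_h, n_h)) for _ in range(3)]
discretizations = [rng.standard_normal((n_g, n_g)) for _ in range(3)]

# Uncompressed assembly: explicit sum of Kronecker products.
S = sum(np.kron(A, B) for A, B in zip(couplings, discretizations))
print(S.shape)                          # (n_h * n_g, n_h * n_g)

# The compressed representation stores only the factors.
entries_full = S.size
entries_compressed = sum(A.size + B.size for A, B in zip(couplings, discretizations))
print(entries_full, entries_compressed)
```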
In light of the memory requirements for the system matrix it is also worthwhile to point out the importance of Thm. 2: without the sparsity of the coupling coefficients and without the use of a Kronecker representation, the full coupling pattern of all $(L+1)^2$ projected equations has to be stored at every grid point. With the sparsity of the coupling coefficients, only the at most ten couplings per coefficient need to be stored for a full representation of the system matrix, and the memory is reduced further by the Kronecker product representation, which stores each coupling matrix only once instead of at every grid point. Since typical values of $L$ are in the range three to seven, the memory savings due to Thm. 2 combined with (4.21) can be orders of magnitude.
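As an illustrative count for $L = 7$: there are $(L+1)^2 = 64$ projected equations per grid point, hence up to $64^2 = 4096$ coupling entries per grid point for a dense coupling pattern, but at most $10 \cdot 64 = 640$ entries when the sparsity of Thm. 2 is exploited; the Kronecker representation then removes the per-grid-point factor for the coupling matrices altogether.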
The matrix compression described in the previous section relies on the factorizations (4.2) and (4.3) of the coupling terms $j_{l,m}^{l',m'}$ and $\Gamma_{l,m}^{l',m'}$, such that the factors depend either on energy (and possibly the spatial location) or on the indices $l$, $m$, $l'$ and $m'$, but not on both. Moreover, the derivation requires that the density of states does not show an angular dependence. In the case of nonspherical energy bands, however, the velocity $v$, the factor $\Gamma$ and the generalized density of states $Z$ depend on energy and on the angles.
In order to decouple the radial (energy) contributions from the angular ones for nonspherical energy band models, a spherical projection up to order $L''$ of these angle-dependent quantities can be performed by approximating [53]
\begin{align*}
 v_{l'',m''}(\varepsilon) &= \int_\Omega v(\varepsilon,\theta,\varphi)\, Y_{l'',m''}(\theta,\varphi)\, \mathrm{d}\Omega \,, \\
 \Gamma_{l'',m''}(\varepsilon) &= \int_\Omega \Gamma(\varepsilon,\theta,\varphi)\, Y_{l'',m''}(\theta,\varphi)\, \mathrm{d}\Omega \,, \\
 Z_{l'',m''}(\varepsilon) &= \int_\Omega Z(\varepsilon,\theta,\varphi)\, Y_{l'',m''}(\theta,\varphi)\, \mathrm{d}\Omega \,.
\end{align*}
The expansion order of the generalized density of states is implicitly coupled to the expansion order $L$ of the distribution function by the scattering operator (2.23). Thus, even if $Z$ is expanded up to a higher order, only expansion terms up to the order induced by the scattering operator can contribute. For this reason, a correspondingly bounded expansion order $L''$ is assumed in the following.
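As an illustration of such a spherical projection, the following sketch (Python/SciPy; the anisotropic integrand and all parameters are made up, and SciPy's complex-valued harmonics stand in for the real-valued ones used in the text) computes projection coefficients by numerical quadrature:

```python
import numpy as np
from scipy.special import sph_harm  # deprecated alias of sph_harm_y in recent SciPy

def project(f, l_max, n_theta=64, n_phi=128):
    """Coefficients c_{l,m} = integral of f(theta, phi) * conj(Y_{l,m}) over the sphere.

    Midpoint-rule quadrature on a regular (theta, phi) grid.
    """
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta        # polar angle
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi        # azimuth
    T, P = np.meshgrid(theta, phi, indexing="ij")
    w = np.sin(T) * (np.pi / n_theta) * (2.0 * np.pi / n_phi)   # dOmega weights
    vals = f(T, P)
    return {(l, m): np.sum(vals * np.conj(sph_harm(m, l, P, T)) * w)
            for l in range(l_max + 1) for m in range(-l, l + 1)}

# Toy anisotropic "generalized density of states" at a fixed energy.
Z = lambda theta, phi: 1.0 + 0.1 * np.cos(theta) ** 2
coeffs = project(Z, 2)
print(coeffs[(0, 0)].real)   # approx. 3.66 = (4*pi + 0.4*pi/3) / sqrt(4*pi)
```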
Substitution of the expansions (4.23) and (4.24) into (4.2) and (4.3) yields
\begin{align*}
 j_{l,m}^{l',m'} &= \sum_{l'',m''} v_{l'',m''}(\varepsilon) \int_\Omega Y_{l,m}\, Y_{l',m'}\, Y_{l'',m''}\, \mathrm{d}\Omega =: \sum_{l'',m''} v_{l'',m''}(\varepsilon)\, a_{l,m}^{l',m';l'',m''} \,, \\
 \Gamma_{l,m}^{l',m'} &= \sum_{l'',m''} \Gamma_{l'',m''}(\varepsilon) \int_\Omega \left( \frac{\partial Y_{l,m}}{\partial \theta}\, \mathbf{e}_\theta + \frac{1}{\sin\theta} \frac{\partial Y_{l,m}}{\partial \varphi}\, \mathbf{e}_\varphi \right) Y_{l',m'}\, Y_{l'',m''}\, \mathrm{d}\Omega =: \sum_{l'',m''} \Gamma_{l'',m''}(\varepsilon)\, b_{l,m}^{l',m';l'',m''} \,,
\end{align*}
The stabilization schemes discussed in Sec. 2.3 lead to different equations for even-order and odd-order projected equations. Since MEDS essentially exchanges the summation indices and flips the sign of the coupling term, the representations (4.6) and (4.26) can be extended to this setting with separate terms for the even-order and odd-order equations.
For the discretization it is assumed that even-order harmonics can be discretized in a different manner than odd-order harmonics. If all even-order unknowns are enumerated first, the system matrix has a block structure of the form
\[
 \mathsf{S} = \begin{pmatrix} \mathsf{S}_{\mathrm{ee}} & \mathsf{S}_{\mathrm{eo}} \\ \mathsf{S}_{\mathrm{oe}} & \mathsf{S}_{\mathrm{oo}} \end{pmatrix} \,, \qquad (4.29)
\]
where the subscripts refer to the coupling among and between the even-order and odd-order unknowns.
Since the coupling structure of the scattering operator is explicitly given by (2.23), the structure of the even-order and odd-order blocks is as follows:
Theorem 3. The following statements hold true for a system matrix (4.29) obtained from a discretization of the stabilized SHE equations in steady-state with velocity-randomizing scattering processes only:
This structural information is very important for the construction of solution schemes in Sec. 4.5.
So far, only the discretization of the resulting system of partial differential equations in the interior of the simulation domain has been considered in an abstract fashion. At the boundary, suitable conditions need to be imposed and incorporated into the proposed compression scheme.
At all noncontact boundaries, homogeneous Neumann boundary conditions are imposed [78, 53, 42], which can be directly included in the proposed compression scheme, because no additional boundary terms appear. At the contact boundaries, two different types of boundary conditions are typically imposed. The first type are Dirichlet boundary conditions [78], where the distribution function is set to a Maxwell distribution. Hence, the expansion coefficient $f_{0,0}$ is set according to (2.1), while $f_{l,m}$ is set to zero for $l > 0$. This is either enforced by replacing the corresponding matrix rows with unity on the diagonal and zeros elsewhere and setting the appropriate values in the right-hand side vector, or by directly absorbing the known values into the load vector. For the proposed compression scheme, the second approach is advantageous, because boundary conditions then do not alter the matrix structure.
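The absorption of known Dirichlet values into the load vector can be sketched as follows (NumPy; the matrix, the partition into interior and boundary unknowns, and the prescribed values are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
A = 4.0 * np.eye(n) + rng.standard_normal((n, n))   # toy, well-conditioned matrix
b = rng.standard_normal(n)

d = np.array([0, 1])                  # boundary unknowns with prescribed values
i = np.setdiff1d(np.arange(n), d)     # remaining interior unknowns
f_d = np.ones(len(d))                 # prescribed (e.g., Maxwellian) values

A_ii = A[np.ix_(i, i)]                # matrix structure stays untouched
b_i = b[i] - A[np.ix_(i, d)] @ f_d    # known values absorbed into the load vector
x_i = np.linalg.solve(A_ii, b_i)
```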
The second type of contact boundary conditions is a Robin-type generation/recombination process [53], driven by the surface generation term
\[
 \Gamma_s = \bigl[ f^{\mathrm{eq}}(\mathbf{k}')\, \vartheta(\mathbf{v} \cdot \mathbf{n}) + f(\mathbf{k}')\, \vartheta(-\mathbf{v} \cdot \mathbf{n}) \bigr]\, \mathbf{v} \cdot \mathbf{n} \,,
\]
where $\mathbf{n}$ denotes the outward-pointing unit normal at the contact, $\vartheta$ the Heaviside step function and $f^{\mathrm{eq}}$ the equilibrium (Maxwell) distribution. Projection onto the spherical harmonics yields boundary coupling coefficients $\gamma_{l,m}(\varepsilon)$ for the projected equations.
As can easily be verified, the resulting boundary contribution can also be written as a sum of Kronecker products. Consequently, the following discussions are based on the case of Dirichlet boundary conditions, but they are equally applicable to the more general case including Robin-type boundary conditions, which merely adds further summands.
The system matrix representation introduced in the previous sections is of use only if the resulting system can be solved without recovering the full matrix again. Such a reconstruction is, in principle, necessary if direct solvers such as the Gauss algorithm are used, because the factorization alters the matrix in a way that destroys the Kronecker structure. In contrast, for many popular iterative solvers from the family of Krylov methods it is sufficient to provide matrix-vector multiplications. Therefore, methods to compute the matrix-vector product for a given vector are discussed first.
Given a system matrix of the form (4.21) or (4.27), a row-by-row reconstruction of the compressed matrix for the matrix-vector product is known to be inefficient. Therefore, the vector is decomposed into blocks, one block of spatial values per spherical harmonic, and the product is evaluated blockwise by means of the Kronecker structure.
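A minimal matrix-free evaluation along these lines, again as an illustrative NumPy sketch with invented names: interpreting the vector as one spatial block per harmonic, block $i$ of $(\mathsf{A} \otimes \mathsf{B})x$ equals $\sum_j \mathsf{A}_{ij}\, \mathsf{B} x_j$, so the product reduces to small dense matrix products:

```python
import numpy as np

def kron_matvec(couplings, discretizations, x):
    """Evaluate (sum_i kron(A_i, B_i)) @ x without forming the full matrix.

    x is viewed as n_h blocks of length n_g; block i of the result is
    sum over terms and j of A[i, j] * (B @ x_j).
    """
    n_h = couplings[0].shape[0]
    n_g = discretizations[0].shape[0]
    X = x.reshape(n_h, n_g)                      # row j holds block x_j
    Y = sum(A @ (X @ B.T) for A, B in zip(couplings, discretizations))
    return Y.reshape(-1)

# Self-check against the explicit Kronecker sum on a toy problem.
rng = np.random.default_rng(7)
n_h, n_g = 16, 30
As = [rng.standard_normal((n_h, n_h)) for _ in range(3)]
Bs = [rng.standard_normal((n_g, n_g)) for _ in range(3)]
x = rng.standard_normal(n_h * n_g)
S = sum(np.kron(A, B) for A, B in zip(As, Bs))
assert np.allclose(S @ x, kron_matvec(As, Bs, x))
```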
In the case of spherical energy bands, it can be shown that the matrix-vector multiplication requires slightly less computational effort than in the uncompressed case [87]. As discussed in Sec. 4.2, nonspherical bands lead to a larger number of summands and thus to a higher computational effort for the matrix-vector multiplications compared to the uncompressed case. Nevertheless, the computational effort increases only moderately, while the memory requirements can be reduced significantly.
Recent publications report the elimination of odd-order unknowns in a preprocessing step [53, 42]. Moreover, it has been shown that for a first-order expansion the system matrix after elimination of odd-order unknowns is an $M$-matrix [42]. Furthermore, numerical experiments indicate a considerable improvement in the convergence of iterative solvers. However, for a matrix structure as given by (4.29), a direct elimination of odd-order unknowns would destroy the representation of the system matrix as a sum of Kronecker products. Writing the system as
\[
 \begin{pmatrix} \mathsf{S}_{\mathrm{ee}} & \mathsf{S}_{\mathrm{eo}} \\ \mathsf{S}_{\mathrm{oe}} & \mathsf{S}_{\mathrm{oo}} \end{pmatrix} \begin{pmatrix} \mathbf{f}_{\mathrm{e}} \\ \mathbf{f}_{\mathrm{o}} \end{pmatrix} = \begin{pmatrix} \mathbf{b}_{\mathrm{e}} \\ \mathbf{b}_{\mathrm{o}} \end{pmatrix} \,,
\]
the odd-order unknowns can instead be eliminated implicitly via $\mathbf{f}_{\mathrm{o}} = \mathsf{S}_{\mathrm{oo}}^{-1}(\mathbf{b}_{\mathrm{o}} - \mathsf{S}_{\mathrm{oe}} \mathbf{f}_{\mathrm{e}})$, leading to the condensed (Schur complement) system
\[
 \bigl( \mathsf{S}_{\mathrm{ee}} - \mathsf{S}_{\mathrm{eo}} \mathsf{S}_{\mathrm{oo}}^{-1} \mathsf{S}_{\mathrm{oe}} \bigr)\, \mathbf{f}_{\mathrm{e}} = \mathbf{b}_{\mathrm{e}} - \mathsf{S}_{\mathrm{eo}} \mathsf{S}_{\mathrm{oo}}^{-1} \mathbf{b}_{\mathrm{o}} \,,
\]
whose matrix-vector products can be evaluated using the Kronecker representations of the individual blocks.
In contrast to a matrix-vector multiplication with the full system matrix, for which the proposed matrix compression scheme requires approximately the same computational effort as the uncompressed matrix, a matrix-vector multiplication with the compressed condensed matrix is more expensive than one with a fully set up condensed matrix. The penalty is quantified in Sec. 4.6.
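A sketch of such an implicit elimination (Python/SciPy; all names and dimensions are invented, and the odd-odd block is assumed diagonal here purely to keep the example simple):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, bicgstab

rng = np.random.default_rng(3)
n_e, n_o = 40, 60                        # toy numbers of even/odd unknowns

S_ee = np.eye(n_e) + 0.05 * rng.standard_normal((n_e, n_e))
S_eo = 0.05 * rng.standard_normal((n_e, n_o))
S_oe = 0.05 * rng.standard_normal((n_o, n_e))
S_oo = 1.0 + 0.1 * rng.random(n_o)       # assumed diagonal, stored as a vector

def condensed_matvec(x):
    # (S_ee - S_eo @ inv(S_oo) @ S_oe) @ x without forming the Schur
    # complement; inv(S_oo) is a cheap division for a diagonal block.
    return S_ee @ x - S_eo @ ((S_oe @ x) / S_oo)

op = LinearOperator((n_e, n_e), matvec=condensed_matvec)
b_e = rng.standard_normal(n_e)
b_o = rng.standard_normal(n_o)
rhs = b_e - S_eo @ (b_o / S_oo)          # condensed right-hand side
f_e, info = bicgstab(op, rhs)
f_o = (b_o - S_oe @ f_e) / S_oo          # back-substitution for the odd unknowns
```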
With a system matrix representation of the form (4.21), the total memory needed for the SHE equations is essentially given by the memory required for the unknowns, which adds another perspective on the selection of the iterative solver. The system matrix is not symmetric, since the coupling coefficients are not symmetric in the index pairs $(l,m)$ and $(l',m')$. Moreover, numerical experiments indicate that the matrix is indefinite, thus many popular solvers cannot be used. A popular solver for indefinite problems is GMRES [89, 112], which is typically restarted after, say, $m$ steps and then denoted by GMRES($m$). Since GMRES has been used in recent publications on SHE simulations [53, 42], it deserves a closer inspection. For a system with $N$ unknowns, the memory required by GMRES($m$) during the solution process amounts to about $m+1$ auxiliary vectors of length $N$. In typical applications, in which the system matrix is stored uncompressed, this additional memory approximately equals the amount of memory needed for the storage of the system matrix, hence it is not a major concern. However, with the system matrix representation (4.6) the memory needed for the unknowns is dominant, thus the additional memory for GMRES($m$) directly pushes the overall memory requirements up by roughly a factor of $m$. The number of restart steps cannot be chosen too small, as smaller values may lead to lower convergence rates, or the solver may even fail to converge within a reasonable number of iterations. Hence, it is concluded that GMRES($m$) is too expensive for SHE simulations when using the representation (4.6). Instead, iterative solvers with smaller memory consumption such as BiCGStab [102] should be used.
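For illustration, both solvers can be attached to the compressed representation through a matrix-free interface; the following SciPy sketch (with an invented, well-conditioned toy operator in place of the SHE system matrix) shows the pattern:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres, bicgstab

rng = np.random.default_rng(0)
n = 400
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # toy, well-conditioned matrix

op = LinearOperator((n, n), matvec=lambda x: A @ x) # matrix-free interface
b = rng.standard_normal(n)

x_g, info_g = gmres(op, b, restart=30)   # stores about restart+1 basis vectors
x_b, info_b = bicgstab(op, b)            # fixed, small number of work vectors
print(info_g, info_b)                    # 0 on successful convergence
```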
In this section, execution times and memory requirements of the Kronecker product representation (4.6), measured on a single core of a machine with a Core 2 Quad 9550 CPU, are reported. To simplify wording, the representation (4.6) will be referred to as the matrix compression scheme. All simulations were carried out for a stationary two-dimensional device on a regular staggered grid in $(\mathbf{x}, H)$-space for various expansion orders. Spherical energy bands were assumed, and the $H$-transform and MEDS were used for stabilization. A fixed potential distribution was applied to the device in order to obtain comparable results. For self-consistent solutions coupled to the Poisson equation via a Newton scheme, similar results can in principle be obtained by applying the matrix compression scheme to the Jacobian.
| $L$ | Uncompressed | Compressed | Unknowns |
| 1 | 3.7 MB | 4.7 MB | 0.2 MB |
| 3 | 28.4 MB | 4.7 MB | 1.4 MB |
| 5 | 83.1 MB | 4.7 MB | 3.5 MB |
| 7 | 168 MB | 4.8 MB | 6.6 MB |
| 9 | 263 MB | 4.8 MB | 10.7 MB |
| 11 | 470 MB | 4.8 MB | 15.7 MB |
| 13 | 709 MB | 4.9 MB | 21.6 MB |
First, the memory requirements for the storage of the system matrix are compared. The total number of entries stored in the matrix was extracted and multiplied by three in order to account for the value as well as its row and column index, with eight bytes per entry for double precision. In this way, the influence of different sparse matrix storage schemes is eliminated. The results in Fig. 4.1 and Fig. 4.2 clearly demonstrate the asymptotic advantage of (4.6): while no savings are observed at $L=1$, memory savings of a factor of about six are obtained already at an expansion order of $L=3$. At $L=13$, this factor grows to more than one hundred. In particular, the memory requirement for the matrix compression scheme shows only a weak dependence on $L$ and is determined mainly by the additional memory needed for the coupling matrices in (4.14)-(4.20). With increasing expansion order $L$, the additional memory requirements for the compressed scheme grow quadratically with $L$, because the number of spherical harmonics of degree smaller than or equal to $L$ is $(L+1)^2$. Even at $L=13$ the additional memory compared to $L=1$ is less than one megabyte. Consequently, the memory used for the unknowns dominates already at moderate values of $L$, cf. Fig. 4.1 and Fig. 4.2.
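As a concrete instance of this metric, one million stored entries account for $3 \cdot 10^6 \cdot 8\,\mathrm{B} = 24\,\mathrm{MB}$.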
| $L$ | Full system, w/o compr. | Full system, with compr. | Condensed, w/o compr. | Condensed, with compr. |
| 1 | 3.9 | 7.4 | 0.2 | 9.2 |
| 3 | 28.4 | 19.3 | 4.0 | 17.9 |
| 5 | 73.9 | 53.2 | 15.7 | 48.9 |
| 7 | 134.8 | 98.3 | 36.5 | 92.2 |
| 9 | 228.1 | 160.7 | 68.2 | 149.8 |
Execution times for matrix-vector multiplications are compared in Fig. 4.3 for the case of a full system matrix and for the system matrix with odd expansion orders eliminated. For the lowest expansion order $L=1$, the matrix compression does not pay off and execution times are larger by a factor of two than for the standard storage scheme. This is due to the additional structural overhead of the compressed scheme at expansion order $L=1$, where no compression effect occurs. However, for larger values of $L$, the matrix compression scheme leads to faster matrix-vector multiplications with the full system of linear equations; an additional performance gain of about ten percent is observed. Comparing execution times for the condensed system, where odd-order unknowns have been eliminated in a preprocessing step, the runtime penalty for the matrix-vector multiplication is largest at $L=1$, but in this case there is no compression effect anyway. At $L=5$, the runtime penalty is only a factor of three and drops to slightly above two at $L=9$. In conclusion, the matrix compression scheme allows for a trade-off between memory consumption and execution time, where the memory reduction is much larger than the penalty on execution times.
L | GMRES(50) | GMRES(30) | GMRES(10) | BiCGStab | Unknowns |
1 | 10.2 MB | 6.2 MB | 2.2 MB | 1.2 MB | 0.2 MB |
3 | 71.4 MB | 43.4 MB | 15.4 MB | 8.4 MB | 1.4 MB |
5 | 178.5 MB | 108.5 MB | 38.5 MB | 21.0 MB | 3.5 MB |
7 | 336.6 MB | 204.7 MB | 72.6 MB | 39.6 MB | 6.6 MB |
9 | 545.7 MB | 331.7 MB | 117.7 MB | 64.2 MB | 10.7 MB |
11 | 800.7 MB | 486.7 MB | 172.7 MB | 93.5 MB | 15.7 MB |
13 | 1101.6 MB | 669.6 MB | 237.6 MB | 129.6 MB | 21.6 MB |
A comparison of the additional memory required by GMRES(50), GMRES(30), GMRES(10) and BiCGStab is shown in Tab. 4.1 and Fig. 4.4. As expected, GMRES leads to higher memory requirements than many other Krylov methods such as BiCGStab. For GMRES($m$), $m+1$ auxiliary vectors of the same length as the vector of unknowns are used, while BiCGStab uses six auxiliary vectors of that size. It can clearly be seen that the memory required by GMRES(50) is one order of magnitude larger than the memory needed for the compressed system (i.e. the second and third columns in Fig. 4.1) and by BiCGStab. On the other hand, without system matrix compression, the additional memory needed by GMRES(50) is comparable to the memory needed for the system matrix and is thus less of a concern.