The SHE equations furnish a number of interesting structural properties, which are the topic of this chapter. All properties are discussed on the continuous level and result from certain symmetries of the underlying physical processes. Consequently, no additional assumptions about a particular discretization method need to be imposed and the same structural properties hold true for the system of discretized linear equations as outlined in Chap. 5.
In Sec. 4.1 the coupling between the SHE equations is investigated and a scheme for the lossless compression of the system matrix is proposed, which greatly reduces the memory consumption of the SHE method at high expansion orders. The different boundary conditions used for the SHE method are discussed in Sec. 4.4 and extensions to the system matrix compression scheme are proposed in order to handle them. This chapter closes with some numerical results demonstrating the benefits of the proposed compression scheme.
As already noted in Sec. 2.3, the coupling terms $j_{l,m}^{l',m'}$ and $\Gamma_{l,m}^{l',m'}$ determine the coupling among the projected equations (2.30). If all coupling coefficients were multiples of the Kronecker delta, the SHE equations would decouple and could be solved individually. Conversely, if all coupling coefficients were nonzero, all projected equations would be coupled with each other. In analogy to systems of linear equations described by matrices, one is typically interested in weak coupling, which accelerates the solution process and leads to lower memory requirements. A closer inspection of the coupling structure for spherically symmetric dispersion relations is carried out in this section.
For general band structures, the symmetry of the underlying processes leads to the following result [53]:

Theorem 1 (Jungemann et al.). For a spherical harmonics expansion up to order $L$, there holds for all admissible indices $i$, $i'$, $m$ and $m'$:
$$ j_{2i,m}^{2i',m'} \;=\; j_{2i+1,m}^{2i'+1,m'} \;=\; 0 , \qquad \Gamma_{2i,m}^{2i',m'} \;=\; \Gamma_{2i+1,m}^{2i'+1,m'} \;=\; 0 . $$
The essence of this theorem is that all nonzero coupling coefficients possess different parities in the leading indices. This structural information about the coupling was already used in a preprocessing step for the solution of the discretized equations in [53].
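In matrix form, Thm. 1 means that, if the harmonics are grouped into an even-order and an odd-order set, every coupling matrix assembled from $j_{l,m}^{l',m'}$ or $\Gamma_{l,m}^{l',m'}$ has vanishing diagonal blocks. A sketch of this consequence (the block labels $J^{\mathrm{eo}}$ and $J^{\mathrm{oe}}$ are introduced here for illustration only):
$$ \big( j_{l,m}^{l',m'} \big)_{(l,m),(l',m')} = \begin{pmatrix} 0 & J^{\mathrm{eo}} \\ J^{\mathrm{oe}} & 0 \end{pmatrix} , $$
where the first block row and column collect even $l$ and the second collect odd $l$. This two-by-two block pattern is precisely what is exploited by the block structure (4.29) in Sec. 4.5.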
Under the assumption of spherical energy bands, i.e. $\varepsilon(\mathbf{k}) = \varepsilon(|\mathbf{k}|)$, the velocity, the modulus of the wave vector and the generalized density of states depend only on the energy $\varepsilon$, but not on the angles $\theta$ and $\varphi$. Consequently, (2.31), (2.32) and (2.33) can be rewritten such that each coupling coefficient factors into an energy-dependent term and a purely angular integral, cf. the factorizations (4.2) and (4.3). The coupling between index pairs $(l,m)$ and $(l',m')$ is thus determined by the angular integral terms only. As shown by the author, it turns out that the coupling is rather weak:
Theorem 2. For spherical energy bands and each index pair $(l,m)$ with $l \le L$, the coupling coefficients $j_{l,m}^{l',m'}$ and $\Gamma_{l,m}^{l',m'}$ vanish for all but at most ten index pairs $(l',m')$.
A rather lengthy and technical proof is given in [87]; it makes use of recurrence relations and orthogonality properties of trigonometric functions and associated Legendre functions.
Thm. 2 is very important for large expansion orders $L$: the total number of unknown expansion coefficients per point in $(\mathbf{x}, H)$-space is $(L+1)^2$, but according to the result of the theorem, each coefficient $f_{l,m}$ is directly coupled with at most ten other coefficients. It should also be noted that the weak coupling stated in Thm. 2 has already been observed for less general situations in earlier publications [78, 37].
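To put the bound of Thm. 2 into perspective, a short worked comparison (the orders are chosen here for illustration):
$$ L = 7:\quad (L+1)^2 = 64 \ \text{coefficients}, \quad \tfrac{10}{64} \approx 16\,\% \ \text{of all possible partners}; \qquad L = 13:\quad (L+1)^2 = 196, \quad \tfrac{10}{196} \approx 5\,\% . $$
Hence the block of coupling coefficients becomes increasingly sparse as $L$ grows.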
Next, the structure of the SHE equations (2.30) for spherical energy bands is investigated in depth. Unlike the presentation in [87], the structure is investigated already at the continuous level, which has the advantage of covering a wider class of discretization schemes with less notational clutter. Due to the equivalence of (2.30) and (2.38) for spherical energy bands, the results in this section apply equally to an expansion of the distribution function and to an expansion of the generalized distribution function.
A motivation for a first-order expansion is given first; the scheme is then extended to general expansion orders. Ignoring the stabilization using MEDS in a first step, a first-order expansion leads to four coupled partial differential equations for the coefficients $f_{0,0}$, $f_{1,-1}$, $f_{1,0}$ and $f_{1,1}$, in which the coupling enters through the coefficients $j_{l,m}^{l',m'}$ and $\Gamma_{l,m}^{l',m'}$ only (Einstein summation convention).
This continuous linear system of partial differential equations can be written in operator matrix form (4.4), where the gradient and the time derivative act as operators on the expansion coefficients, not on the coefficient matrices. The crucial observation now is that all entries in each of the coefficient matrices are independent of the spatial variable $\mathbf{x}$ as well as of the total energy $H$. Writing the vector of unknowns in components,
$$ F(\mathbf{x}) = \big( f_{0,0},\ f_{1,-1},\ f_{1,0},\ f_{1,1} \big)^{\mathrm{T}} , $$
the operator matrix acts on $F$ through these constant coefficient matrices.
For the case of a general expansion order $L$, an enumeration of the index pairs $(l,m)$ is required. Let $\iota(l,m)$ denote such a mapping, for instance $\iota(l,m) = l^2 + l + m + 1$. Repeating the calculation for the general case, one obtains that the operator matrix for the SHE equations can be written as a sum of Kronecker products of scalar differential operators with constant coupling matrices (4.6).
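For illustration, a minimal Python sketch of such an enumeration and its inverse, assuming the example mapping $\iota(l,m) = l^2 + l + m + 1$ given above (any other bijection would serve equally well):

```python
import math

def iota(l: int, m: int) -> int:
    """Enumerate the index pair (l, m), -l <= m <= l, starting at 1."""
    assert -l <= m <= l
    return l * l + l + m + 1

def iota_inv(i: int) -> tuple[int, int]:
    """Recover (l, m) from the 1-based index i."""
    l = math.isqrt(i - 1)        # largest l with l*l <= i-1
    m = (i - 1) - l * l - l      # remainder determines m
    return l, m

# All (L+1)^2 index pairs up to order L = 3 are hit exactly once:
L = 3
assert sorted(iota(l, m) for l in range(L + 1)
              for m in range(-l, l + 1)) == list(range(1, (L + 1) ** 2 + 1))
assert all(iota_inv(iota(l, m)) == (l, m)
           for l in range(L + 1) for m in range(-l, l + 1))
```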
The decomposition of the continuous case is preserved after spatial discretization. To see this, let $D_k$ denote the matrices arising from an arbitrary discretization of the scalar differential operators. In most cases, a finite volume or a finite element method will be employed, but also less widespread techniques such as wavelet methods could be used. Then, the discrete system matrix $S$ for the SHE equations can be written as the sum of Kronecker products
$$ S = \sum_{k} D_k \otimes C_k , \qquad (4.21) $$
where the $C_k$ denote the constant coupling matrices.
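A minimal Python sketch of such a compressed representation, storing only the factor pairs $(D_k, C_k)$ of (4.21) instead of assembling $S$ explicitly (all matrix sizes and densities are illustrative assumptions, not the thesis' data):

```python
import scipy.sparse as sp

n_xh = 1000           # points in (x, H)-space (assumed)
L = 7
n_sph = (L + 1) ** 2  # spherical harmonics up to order L

# One (D_k, C_k) pair per term in (4.21): D_k discretizes a scalar
# operator on the (x, H)-grid, C_k is a small constant coupling matrix
# that is sparse by Thm. 2 (at most ~10 entries per row).
kron_terms = [
    (sp.random(n_xh, n_xh, density=5.0 / n_xh, random_state=k, format="csr"),
     sp.random(n_sph, n_sph, density=10.0 / n_sph, random_state=k, format="csr"))
    for k in range(4)
]

# Compare stored nonzeros: factor pairs vs. assembled Kronecker sum.
nnz_compressed = sum(D.nnz + C.nnz for D, C in kron_terms)
S = sum(sp.kron(D, C, format="csr") for D, C in kron_terms)
print(f"nonzeros (compressed factors): {nnz_compressed}")
print(f"nonzeros (assembled matrix):   {S.nnz}")
```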
In light of the memory requirements for the system matrix it is also worthwhile to point out the importance of Thm. 2. Denoting the number of discrete points in $(\mathbf{x}, H)$-space with $N$, a representation without the sparsity of coupling coefficients and without the use of a Kronecker representation requires $\mathcal{O}(N(L+1)^4)$ memory. With the sparsity of coupling coefficients, only $\mathcal{O}(N(L+1)^2)$ memory is required for a full representation of $S$, which is further reduced to $\mathcal{O}(N)$ when using a Kronecker product representation. Since typical values of $L$ are in the range three to seven, the memory savings due to Thm. 2 combined with (4.21) can be orders of magnitude.
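As a rough worked example with assumed (not thesis-reported) sizes $N = 10^5$ and $L = 7$, counting stored matrix entries:
$$ N(L+1)^4 \approx 4.1\times 10^{8} , \qquad 10\,N(L+1)^2 \approx 6.4\times 10^{7} , \qquad \mathcal{O}(N) \sim \text{a few } 10^{5} , $$
i.e. roughly three orders of magnitude between a dense-coupling representation and the Kronecker form.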
The matrix compression described in the previous section relies on the factorizations (4.2) and (4.3) of the coupling terms $j_{l,m}^{l',m'}$ and $\Gamma_{l,m}^{l',m'}$, such that the factors depend either on the energy (and possibly the spatial location) or on the indices $l$, $m$, $l'$ and $m'$, but not on both. Moreover, the derivation requires that the density of states does not show an angular dependence. However, in the case of nonspherical energy bands, the velocity, the modulus of the wave vector and the generalized density of states depend on the energy as well as on the angles.
In order to decouple the radial (energy) contributions from the angular ones for nonspherical energy band models, a spherical projection up to order $L''$ of the coupling terms can be performed by approximating [53]
$$ v_{l'',m''}(\varepsilon) = \int_{\Omega} v(\varepsilon,\theta,\varphi)\, Y_{l'',m''}(\theta,\varphi)\, \mathrm{d}\Omega , $$
$$ k_{l'',m''}(\varepsilon) = \int_{\Omega} k(\varepsilon,\theta,\varphi)\, Y_{l'',m''}(\theta,\varphi)\, \mathrm{d}\Omega , $$
$$ Z_{l'',m''}(\varepsilon) = \int_{\Omega} Z(\varepsilon,\theta,\varphi)\, Y_{l'',m''}(\theta,\varphi)\, \mathrm{d}\Omega , $$
where $k$ denotes the modulus of the wave vector.
The expansion order $L''$ of the generalized density of states is implicitly coupled to the expansion order $L$ of the distribution function by the scattering operator (2.23). Thus, even if the generalized density of states is expanded up to an order larger than necessary, only expansion terms up to an order bounded in terms of $L$ can be considered. For this reason, a fixed finite projection order $L''$ is assumed in the following.
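A minimal numerical sketch of such a projection at a fixed energy, assuming the complex-valued spherical harmonics provided by SciPy and a toy angular profile `v_toy` in place of the actual band-structure quantity (all names and grid sizes here are illustrative assumptions):

```python
import numpy as np
from scipy.special import sph_harm

def project_onto_sph_harm(f, l, m, n_theta=64, n_phi=128):
    """Approximate c_{l,m} = int_Omega f(theta, phi) conj(Y_{l,m}) dOmega
    by Gauss-Legendre quadrature in cos(theta) and a uniform rule in phi."""
    x, w = np.polynomial.legendre.leggauss(n_theta)   # nodes in cos(theta)
    theta = np.arccos(x)                              # polar angle
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    Th, Ph = np.meshgrid(theta, phi, indexing="ij")
    # NOTE SciPy's argument convention: sph_harm(m, l, azimuth, polar).
    Y = sph_harm(m, l, Ph, Th)
    vals = f(Th, Ph) * np.conj(Y)
    # Weights: Gauss-Legendre in cos(theta), uniform 2*pi/n_phi in phi.
    return (w[:, None] * vals).sum() * (2.0 * np.pi / n_phi)

# Toy angular profile with a mild anisotropy (illustrative only):
v_toy = lambda th, ph: 1.0 + 0.3 * np.cos(th) ** 2

c00 = project_onto_sph_harm(v_toy, 0, 0)
c20 = project_onto_sph_harm(v_toy, 2, 0)
print(c00, c20)   # dominant isotropic part, small l = 2 correction
```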
Substitution of the expansions (4.23) and (4.24) into (4.2) and (4.3) yields representations of $j_{l,m}^{l',m'}$ and $\Gamma_{l,m}^{l',m'}$ as finite sums over $(l'',m'')$, in which each summand is the product of an energy-dependent projection coefficient and a purely angular integral over products of spherical harmonics. Each summand thus again separates the dependence on the energy from the dependence on the indices.
The stabilization schemes discussed in Sec. 2.3 lead to different equations for even-order and odd-order projected equations. Since MEDS essentially exchanges the summation index and the sign of the coupling term $\Gamma_{l,m}^{l',m'}$, the representations (4.6) and (4.26) can be extended accordingly to a representation (4.27) of the same Kronecker form, with separate terms for the even-order and the odd-order equations.
For the discretization it is assumed that even-order harmonics can be discretized in a different manner than odd-order harmonics. If all even-order unknowns are enumerated first, the system matrix has a block structure of the form
$$ S = \begin{pmatrix} S^{\mathrm{ee}} & S^{\mathrm{eo}} \\ S^{\mathrm{oe}} & S^{\mathrm{oo}} \end{pmatrix} , \qquad (4.29) $$
where the superscripts refer to coupling among even-order (e) and odd-order (o) unknowns.
Since the coupling structure of the scattering operator is explicitly given by (2.23), the structure of $S^{\mathrm{ee}}$ and $S^{\mathrm{oo}}$ can be characterized as follows:

Theorem 3. For a system matrix (4.29) obtained from a discretization of the stabilized SHE equations in steady-state with velocity-randomizing scattering processes only, the couplings within $S^{\mathrm{ee}}$ and within $S^{\mathrm{oo}}$ stem from the scattering operator alone; in particular, the block $S^{\mathrm{oo}}$ is diagonal.
This structural information is very important for the construction of solution schemes in Sec. 4.5.
Only the discretization of the resulting system of partial differential equations in the interior of the simulation domain has been considered in an abstract fashion so far. At the boundary, suitable conditions need to be imposed and incorporated into the proposed compression scheme.
At all noncontact boundaries, homogeneous Neumann boundary conditions are imposed [78, 53, 42], which can be directly included in the proposed compression scheme, because no additional boundary terms appear. At the contact boundaries, two different types of boundary conditions are typically imposed. The first type are Dirichlet boundary conditions [78], where the distribution function is set to a Maxwell distribution. Hence, the expansion coefficient $f_{0,0}$ is set according to (2.1), while $f_{l,m}$ is set to zero for $l > 0$. This is either enforced by replacing the corresponding matrix row with unity in the diagonal and zeros elsewhere as well as setting the appropriate value in the right hand side vector, or by directly absorbing the known values into the load vector. For the proposed compression scheme, the second way is advantageous, because then boundary conditions do not alter the matrix structure.
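A minimal sketch of the second approach, absorbing known Dirichlet values into the load vector so that the (compressed) matrix itself is left untouched; the index set `dirichlet`, the values and the toy system are illustrative assumptions:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def absorb_dirichlet(S, b, dirichlet, values):
    """Move known boundary unknowns x[dirichlet] = values to the load vector.

    S is only read, never modified, so a compressed (Kronecker)
    representation of S remains intact; afterwards only the interior
    sub-system has to be solved.
    """
    interior = np.setdiff1d(np.arange(S.shape[0]), dirichlet)
    b_reduced = b[interior] - (S[:, dirichlet] @ values)[interior]
    return interior, b_reduced

# Tiny illustration on an assumed 5x5 sparse system:
S = (sp.random(5, 5, density=0.5, random_state=0) + 2.0 * sp.eye(5)).tocsr()
b = np.ones(5)
interior, b_red = absorb_dirichlet(S, b, dirichlet=np.array([0, 4]),
                                   values=np.array([1.0, 0.0]))
x_int = spsolve(S[interior][:, interior].tocsc(), b_red)
```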
The second type of contact boundary conditions is of Robin type and models a generation/recombination process at the contact [53]. It involves the projections $\gamma_{l,m}(\varepsilon)$ of the interface generation term
$$ \Gamma_{\mathrm{s}} = \big[ f^{\mathrm{eq}}(\mathbf{k}')\, \vartheta(\mathbf{v}\cdot\mathbf{n}) + f(\mathbf{k}')\, \vartheta(-\mathbf{v}\cdot\mathbf{n}) \big]\, \mathbf{v}\cdot\mathbf{n} , $$
where $\vartheta$ denotes the step function and $\mathbf{n}$ the outward normal vector. As can easily be verified, the resulting boundary contribution can also be written as a sum of Kronecker products. Consequently, the following discussions are based on the case of Dirichlet boundary conditions, but are also applicable to the more general case including Robin-type boundary conditions, which only add additional summands.
The system matrix representation introduced in the previous sections is of use only if the resulting system can be solved without recovering the full matrix again. Such a reconstruction is, in principle, necessary if direct solvers such as Gaussian elimination are used, because the matrix structure is altered in a way that destroys the block structure. In contrast, for many popular iterative solvers from the family of Krylov methods, it is sufficient to provide matrix-vector multiplications. Therefore, methods to compute the matrix-vector product $Sx$ for a given vector $x$ are discussed first.
Given a system matrix of the form (4.21) or (4.27), it is well known that a row-by-row reconstruction of the compressed matrix for the matrix-vector product is not efficient. Therefore, the vector $x$ is decomposed into $N$ blocks $x_1, \ldots, x_N$ of size $(L+1)^2$ each, one per discrete point in $(\mathbf{x}, H)$-space. The product $y = Sx$ then follows blockwise from
$$ y_i = \sum_{k} \sum_{j=1}^{N} (D_k)_{ij}\, C_k\, x_j , $$
so that only the small coupling matrices $C_k$ and the sparse matrices $D_k$ are accessed.
In the case of spherical energy bands, it can be shown that the matrix-vector multiplication requires slightly less computational effort than the uncompressed case [87]. As discussed in Sec. 4.2, nonspherical bands lead to a larger number of summands, thus leading to a higher computational effort for the matrix-vector multiplications compared to the uncompressed case. Nevertheless, the additional computational effort is increased only moderately, while the memory requirements can be reduced significantly.
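A minimal sketch of this blockwise product in Python, using the reshaping identity behind $(D \otimes C)\,x$ for the row-major block layout assumed here (matrix sizes and densities are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def kron_matvec(kron_terms, x, n_xh, n_sph):
    """Compute y = (sum_k D_k kron C_k) x without assembling the matrix.

    x is interpreted as n_xh blocks of size n_sph (row-major layout),
    i.e. one block per discrete point in (x, H)-space.
    """
    X = x.reshape(n_xh, n_sph)
    Y = np.zeros_like(X)
    for D_k, C_k in kron_terms:
        # Row j of (C_k @ X.T).T is C_k x_j; D_k then mixes the blocks.
        Y += D_k @ (C_k @ X.T).T
    return Y.reshape(-1)

# Consistency check against the assembled matrix on a small example:
n_xh, n_sph = 50, 16
terms = [(sp.random(n_xh, n_xh, density=0.1, random_state=k, format="csr"),
          sp.random(n_sph, n_sph, density=0.3, random_state=k, format="csr"))
         for k in range(3)]
x = np.random.default_rng(1).standard_normal(n_xh * n_sph)
S = sum(sp.kron(D, C, format="csr") for D, C in terms)
assert np.allclose(kron_matvec(terms, x, n_xh, n_sph), S @ x)
```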
Recent publications report the elimination of odd-order unknowns in a preprocessing step [53, 42]. Moreover, it has been shown that for a first-order expansion the system matrix after elimination of odd-order unknowns is an M-matrix [42]. Furthermore, numerical experiments indicate a considerable improvement in the convergence of iterative solvers. However, for a matrix structure as given by (4.29), a direct elimination of odd-order unknowns would destroy the representation of the system matrix $S$ as a sum of Kronecker products. Writing the system as
$$ \begin{pmatrix} S^{\mathrm{ee}} & S^{\mathrm{eo}} \\ S^{\mathrm{oe}} & S^{\mathrm{oo}} \end{pmatrix} \begin{pmatrix} f^{\mathrm{e}} \\ f^{\mathrm{o}} \end{pmatrix} = \begin{pmatrix} b^{\mathrm{e}} \\ b^{\mathrm{o}} \end{pmatrix} , $$
the condensed system for the even-order unknowns is obtained from the Schur complement:
$$ \big( S^{\mathrm{ee}} - S^{\mathrm{eo}} (S^{\mathrm{oo}})^{-1} S^{\mathrm{oe}} \big)\, f^{\mathrm{e}} = b^{\mathrm{e}} - S^{\mathrm{eo}} (S^{\mathrm{oo}})^{-1} b^{\mathrm{o}} . $$
The matrix-vector products required by an iterative solver can thus be carried out with the blocks of (4.29) directly, preserving the Kronecker representation.
In contrast to a matrix-vector multiplication with the full system matrix $S$, where the proposed matrix compression scheme requires approximately the same computational effort, a matrix-vector multiplication with the condensed matrix $S^{\mathrm{ee}} - S^{\mathrm{eo}} (S^{\mathrm{oo}})^{-1} S^{\mathrm{oe}}$ in compressed form is more expensive than a matrix-vector multiplication with a fully set up condensed matrix. The penalty is quantified in Sec. 4.6.
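A minimal sketch of a matrix-vector product with the condensed operator, assuming a diagonal (or at least cheaply factorizable) block $S^{\mathrm{oo}}$ as suggested by Thm. 3; the block names, sizes and random entries are illustrative:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def make_condensed_matvec(S_ee, S_eo, S_oe, S_oo):
    """Return x -> (S_ee - S_eo S_oo^{-1} S_oe) x without forming the product.

    S_oo is factorized once; each matvec then costs three sparse products
    plus one (cheap) solve.
    """
    lu = splu(S_oo.tocsc())
    def matvec(x):
        return S_ee @ x - S_eo @ lu.solve(S_oe @ x)
    return matvec

# Illustration with random blocks; S_oo diagonal as in Thm. 3:
n_e, n_o = 40, 30
rng = np.random.default_rng(0)
S_ee = sp.random(n_e, n_e, density=0.2, random_state=1, format="csr")
S_eo = sp.random(n_e, n_o, density=0.2, random_state=2, format="csr")
S_oe = sp.random(n_o, n_e, density=0.2, random_state=3, format="csr")
S_oo = sp.diags(rng.uniform(1.0, 2.0, n_o)).tocsr()

mv = make_condensed_matvec(S_ee, S_eo, S_oe, S_oo)
x = rng.standard_normal(n_e)
dense = S_ee.toarray() - S_eo.toarray() @ np.linalg.inv(S_oo.toarray()) @ S_oe.toarray()
assert np.allclose(mv(x), dense @ x)
```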
With a system matrix representation of the form (4.21), the total memory needed for the SHE equations is essentially given by the memory required for the unknowns, which adds another perspective on the selection of the iterative solver. Since the coupling coefficients are not symmetric, the system matrix $S$ is not symmetric. Moreover, numerical experiments indicate that the matrix $S$ is indefinite, thus many popular solvers cannot be used. A popular solver for indefinite problems is GMRES [89, 112], which is typically restarted after, say, $m$ steps and is then denoted by GMRES($m$). Since GMRES has been used in recent publications on SHE simulations [53, 42], it deserves a closer inspection. For a system with $n$ unknowns, the memory required by GMRES($m$) during the solution process is $\mathcal{O}(mn)$. In typical applications, in which the system matrix is uncompressed, this additional memory approximately equals the amount of memory needed for the storage of the system matrix, hence it is not a major concern. However, with the system matrix representation (4.6) the memory needed for the unknowns is dominant, thus the additional memory for GMRES($m$) directly pushes the overall memory requirements from $\mathcal{O}(n)$ to $\mathcal{O}(mn)$. The number of steps $m$ is typically chosen in the range of ten to fifty, since smaller values may lead to smaller convergence rates or the solver may even fail to converge within a reasonable number of iterations. Hence, it is concluded that GMRES($m$) is too expensive for SHE simulations when using the representation (4.6). Instead, iterative solvers with smaller memory consumption such as BiCGStab [102] should be used.
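A minimal sketch of how a compressed representation plugs into a memory-lean Krylov solver through SciPy's matrix-free interface; the blockwise product is the one sketched above, and all sizes, densities and the diagonal shift (added so the toy system is well conditioned) are illustrative assumptions:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicgstab

n_xh, n_sph = 200, 16
terms = [(0.1 * sp.random(n_xh, n_xh, density=0.05, random_state=k, format="csr"),
          sp.random(n_sph, n_sph, density=0.3, random_state=k, format="csr"))
         for k in range(3)]
# Diagonal shift so the toy system is diagonally dominant (illustrative only).
terms.append((sp.eye(n_xh, format="csr"), 10.0 * sp.eye(n_sph, format="csr")))

def matvec(x):
    """Blockwise y = (sum_k D_k kron C_k) x; the matrix is never assembled."""
    X = x.reshape(n_xh, n_sph)
    Y = np.zeros_like(X)
    for D_k, C_k in terms:
        Y += D_k @ (C_k @ X.T).T
    return Y.reshape(-1)

n = n_xh * n_sph
A = LinearOperator((n, n), matvec=matvec)
b = np.ones(n)
x, info = bicgstab(A, b)   # BiCGStab keeps only a handful of work vectors
assert info == 0
```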
In this section, execution times and memory requirements of the Kronecker product representation (4.6) using a single core of a machine with a Core 2 Quad 9550 CPU are reported. To simplify wording, the representation (4.6) will be referred to as matrix compression scheme. All simulations were carried out for a stationary two-dimensional device on a regular staggered grid in $(\mathbf{x}, H)$-space for various expansion orders. Spherical energy bands were assumed, and the $H$-transform and MEDS were used for stabilization. A fixed potential distribution was applied to the device to obtain comparable results. For self-consistent simulations coupled with the Poisson equation via a Newton scheme, similar results can in principle be obtained by applying the matrix compression scheme to the Jacobian matrix.
| $L$ | Uncompressed | Compressed | Unknowns |
| 1 | 3.7 MB | 4.7 MB | 0.2 MB |
| 3 | 28.4 MB | 4.7 MB | 1.4 MB |
| 5 | 83.1 MB | 4.7 MB | 3.5 MB |
| 7 | 168 MB | 4.8 MB | 6.6 MB |
| 9 | 263 MB | 4.8 MB | 10.7 MB |
| 11 | 470 MB | 4.8 MB | 15.7 MB |
| 13 | 709 MB | 4.9 MB | 21.6 MB |
First, the memory requirements for the storage of the system matrix are compared. The total number of entries stored in the matrix was extracted and multiplied by three to account for the value as well as the row and column indices, with eight bytes per number (double precision). In this way, the influence of different sparse matrix storage schemes is eliminated. The results in Fig. 4.1 and Fig. 4.2 clearly demonstrate the asymptotic advantage of (4.6): While no savings are observed at $L=1$, memory savings of a factor of six are obtained already at an expansion order of $L=3$. At $L=13$, this factor grows to about $145$. In particular, the memory requirement for the matrix compression scheme shows only a weak dependence on $L$ and is determined mostly by the additional memory needed for the coupling matrices in (4.14)-(4.20). With increasing expansion order $L$, the additional memory requirements for the compressed scheme grow quadratically with $L$, because the number of spherical harmonics of degree smaller than or equal to $L$ is $(L+1)^2$. Even at $L=13$, the additional memory compared to $L=1$ is less than one megabyte. Consequently, the memory used for the unknowns dominates already at moderate values of $L$, cf. Fig. 4.1 and Fig. 4.2.
| | Full system | | Condensed | |
| $L$ | uncompr. | compr. | uncompr. | compr. |
| 1 | 3.9 | 7.4 | 0.2 | 9.2 |
| 3 | 28.4 | 19.3 | 4.0 | 17.9 |
| 5 | 73.9 | 53.2 | 15.7 | 48.9 |
| 7 | 134.8 | 98.3 | 36.5 | 92.2 |
| 9 | 228.1 | 160.7 | 68.2 | 149.8 |
Execution times for matrix-vector multiplications are compared in Fig. 4.3 for the case of a full system matrix and the system matrix with odd expansion orders eliminated. For the lowest expansion order $L=1$, the matrix compression does not pay off and execution times are by a factor of two larger than for the standard storage scheme. This is due to the additional structural overhead of the compressed scheme at expansion order $L=1$, where no compression effect occurs. However, for larger values of $L$, the matrix compression scheme leads to faster matrix-vector multiplications with the full system of linear equations; an additional performance gain of about ten percent is observed. Comparing execution times for the condensed system, where odd-order unknowns have been eliminated in a preprocessing step, the runtime penalty for matrix-vector multiplication is roughly a factor of $45$ at $L=1$, but in this case there is no compression effect anyway. At $L=5$, the runtime penalty is only a factor of three, and it drops to slightly above two at $L=9$. In conclusion, the matrix compression scheme allows for a trade-off between memory consumption and execution time, where the memory reduction is much larger than the penalty on execution times.
L | GMRES(50) | GMRES(30) | GMRES(10) | BiCGStab | Unknowns |
1 | 10.2 MB | 6.2 MB | 2.2 MB | 1.2 MB | 0.2 MB |
3 | 71.4 MB | 43.4 MB | 15.4 MB | 8.4 MB | 1.4 MB |
5 | 178.5 MB | 108.5 MB | 38.5 MB | 21.0 MB | 3.5 MB |
7 | 336.6 MB | 204.7 MB | 72.6 MB | 39.6 MB | 6.6 MB |
9 | 545.7 MB | 331.7 MB | 117.7 MB | 64.2 MB | 10.7 MB |
11 | 800.7 MB | 486.7 MB | 172.7 MB | 93.5 MB | 15.7 MB |
13 | 1101.6 MB | 669.6 MB | 237.6 MB | 129.6 MB | 21.6 MB |
A comparison of the additional memory required by GMRES(50), GMRES(30), GMRES(10) and BiCGStab is shown in Tab. 4.1 and Fig. 4.4. As expected, GMRES leads to higher memory requirements than many other Krylov methods such as BiCGStab. For GMRES($m$), $m+1$ auxiliary vectors of the same length as the vector of unknowns are used, while BiCGStab uses six auxiliary vectors of that size. It can clearly be seen that the memory required by GMRES(50) is one order of magnitude larger than the memory needed for the compressed system (i.e. the second and third columns in Fig. 4.1) and for BiCGStab. On the other hand, without system matrix compression, the additional memory needed by GMRES(50) is comparable to the memory needed for the system matrix and is thus less of a concern.