Chapter 7 Bottom-Up Correction for Re-Distancing
The previous chapter presented an algorithm used in the Re-Distancing step in a top-down manner (processing the levels of a hierarchical grid from coarsest to finest). In contrast, this chapter proposes a bottom-up correction algorithm to increase the accuracy of the signed-distance field on coarser levels of the hierarchical grid. This is important to couple the levels of the hierarchical grid together, because Re-Gridding may remove higher levels from the hierarchical grid. An additional goal is to keep the computational overhead low with respect to the always necessary top-down re-distancing algorithm. The developed algorithm does not only increase the accuracy in regions covered by blocks on a higher level, as is possible through straightforward interpolation, but also in regions not covered by blocks on higher levels.
First, the idea and the algorithmic implementation of the proposed bottom-up correction algorithm, which is the core contribution of this chapter [100], is discussed (Section 7.1). Then the proposed bottom-up correction algorithm is evaluated, by two criteria (Section 7.2):
-
• Accuracy: Computing the errors (difference to an exact solution) of the signed-distance field.
-
• Performance: Measuring the run-time compared to Re-Distancing without the correction algorithm.
The evaluation examples consider interfaces representing typical challenges in process TCAD simulations, i.e., corners and trenches. The examples are prepared and chosen so as to allow for the exact computation of a signed-distance field as explicit interface representations (e.g., triangles) are available.
7.1 Algorithmic Implementation
The core idea of the developed bottom-up correction algorithm is to process the levels of a hierarchical grid in reversed order. The reversed order allows to incorporate the solution from spatially fine resolved regions into coarse resolved regions.
The algorithm (cf. Algorithm 10) operates in a bottom-up manner, starting from the second highest level: The highest level may not be corrected, because there is no higher level from which to derive a correction. First, the signed-distance field is interpolated based on the solution on the higher level of the hierarchical grid and then the FMM is used to improve the accuracy of the signed-distance field. The usage of the FMM allows to improve the accuracy in regions which are not covered by blocks on higher levels.
1 procedure Correction():
2
3 while
/* Initialization Phase */
4 foreach Block on Level
5 foreach ChildBlock on Level
6 InterpolatedLists
7 end foreach
9 initializeFMMCorrection()
10 end foreach
11 Wait // Synchronization barrier
/* Marching Phase */
12 foreach Block on Level
13 Marching() // Create task
14 end foreach
15 Wait // Synchronization barrier
17 end while
18 end procedure
Algorithm 10: The bottom-up correction algorithm processes the hierarchical grid in a bottom-up manner. First, the signed-distance field on a level is interpolated based on the solution of the next higher level. Then the FMM is initialized keeping the flags from the previous (top-down) re-distancing. Finally, an unmodified marching (from the FMM) is used to correct the solution also in unrefined regions.
In a practical setting, the overhead of the bottom-up correction algorithm is reduced by reusing the flags (adapting the initialization of the FMM) used by the presented (top-down) algorithm in Section 5.3. On every level the developed algorithm is split into two phases (same phases as the FMM):
-
• Initialization: Interpolation of the signed-distance values based on the values on higher levels and collect the corresponding grid points in block-specific lists. Those lists are then used to initialize the Band data structure (Heap).
-
• Marching: Execute the core multi-block FMM. For efficiency purposes grid points keep their signed-distance value and flag from the previous top-down re-distancing algorithm.
Initialization
The initialization phase starts in Algorithm 10 Line: 4. All blocks on the current level (Level
The
The grid points A, E, and I are not covered by a block on a higher level and, therefore, not interpolated. The grid points G, H, J, K, and L have a neighboring grid point with opposing sign. Thus, they are Close Points and must not be modified. The grid point B is also not interpolated as it is not fully covered by the computed values of the narrow-band on the higher level. The remaining grid points C, D, and F are interpolated.
In case of parallelization, there are no restrictions with respect to synchronizations (race conditions) because the nesting criteria of the hierarchical grid enforce that grid points are uniquely refined, i.e., each grid point on the lower level is interpolated by a dedicated set of grid points on the higher level. The interpolated points are first collected in a separate list: One list per block on the higher level. The lists are then merged through a reduction operation, so that for each block on the current level a single list is created containing all interpolated points on the block (Algorithm 10 Line: 8).
Afterwards the merged list of interpolated grid points is used to initialize the multi-block FMM [62]. The initialization is different
compared to the multi-block FMM, because all grid points keep their current
1 For the level-set method it is sufficient to solve the level-set equation only in a narrow-band around the interface (on grid points where the absolute distance to the interface is less than a given threshold); this is done for efficiency purposes
Marching
Subsequently, the unmodified marching of the FMM is performed until all the heaps are empty (cf. Algorithm 10 Line: 12). The specialized initialization avoids the re-computation of all grid points, because a re-computation of all grid points would result in an approximately doubled run-time per level compared to the Re-Distancing step without correction algorithm. The main disadvantage of this approach is that grid points for which the distance to the interface has been under-estimated are not corrected. This is inherent to the FMM because grid points are only processed, if their newly computed distance is lower than their current one. However, this is only a minor issue because the FMM tends to over-estimates the distance [169].
After a global synchronization barrier (ensuring all heaps are empty), the algorithm eventually moves to the next level by decreasing the level counter
7.2 Benchmark Examples and Analyses
The effect of the bottom-up correction algorithm on the accuracy is first analyzed on 2D examples. Additionally, a 3D example is used to study the performance impact.
The accuracy is measured by computing the error (difference of the signed-distance field to an exact solution) in three discrete norms
The domain for all benchmark examples is chosen as
7.2.1 Corner
The Corner and Sharp corner examples have a corner located near the center of the domain. The corner as well as the sides of the angle are purposely not aligned to the computational grid to account for the generic case of a level-set simulation. The angle of the Corner example is 110° and for the Sharp corner example 50°. Figure 7.2 shows the level-set values together with several iso-lines extracted form Level 0 and the block placement (yellow rectangles) for both corner examples. On each level of the hierarchical grid only a single block is present, which is located around the corner near the center of the domain. In regions with low curvature only the black iso-line is visible, because all three solutions match and, therefore, their iso-lines overlap.
The symmetric boundary conditions which are applied to the lateral domain dimensions create additional corners at the domain boundary. Around these corners purposely no refinement is made (no block placed), therefore, the signed-distance field is not corrected around those corners: This allows for evaluating the effect precisely for a single corner. Generally speaking, the first-order approximation computed by the FMM over-estimates the distance to the interface for rarefaction fans (reflex angle side) and under-estimates the distance to the interface for shock waves (obtuse and acute angle side). The bottom-up correction algorithm increases the accuracy of the signed-distance field on rarefaction fans even outside the refined regions, due to the marching.
In Table 7.1 and Table 7.2 the error norms and the
reduction by the proposed bottom-up correction algorithm are shown. For the Corner example the
Level |
|
L1-reduc. |
|
L2-reduc. |
|
inf-reduc. |
|
| |||||||
0 |
5.437e-3 |
3.260e-4 |
4.785e-2 |
||||
0 corrected |
2.491e-3 |
2.2 |
1.550e-4 |
2.1 |
3.079e-2 |
1.6 |
|
1 |
1.122e-3 |
5.101e-5 |
1.393e-2 |
||||
1 corrected |
6.035e-4 |
1.9 |
2.792e-5 |
1.8 |
8.541e-3 |
1.6 |
|
2 |
5.126e-4 |
1.819e-5 |
3.757e-3 |
||||
| |||||||
Level |
|
L1-reduc. |
|
L2-reduc. |
|
inf-reduc. |
|
| |||||||
0 |
9.110e-3 |
4.823e-4 |
6.212e-2 |
||||
0 corrected |
3.388e-3 |
2.7 |
1.812e-4 |
2.7 |
2.707e-2 |
2.3 |
|
1 |
1.894e-3 |
7.264e-5 |
1.753e-2 |
||||
1 corrected |
8.957e-4 |
2.1 |
3.484e-5 |
2.1 |
9.546e-3 |
1.8 |
|
2 |
8.866e-4 |
2.569e-5 |
4.661e-3 |
||||
| |||||||
7.2.2 Two-Dimensional Trench
The 2D Trench is an axis-aligned thin trench on an otherwise flat surface. The interesting fact about this trench is the small width of only 0.001, because the Level 0 grid has only a spatial resolution of 0.05, thus is not able to resolve the trench, because no grid points with opposing sign exist along the trench. There are four blocks on Level 1 and six on Level 2, covering the trench completely and, therefore, enable the interface representation.
Such high aspect ratio trenches, are common in semiconductor manufacturing [191, 120] and thus by extension also in process TCAD simulations.
As shown in Figure 7.3 the Re-distancing step without the bottom-up correction algorithm shows only a small dent of the level-set on
Level 0, while with the bottom-up correction algorithm the trench is well-resolved and present. This yields a significant reduction in the measured error norms (cf. Table 7.3): A reduction of 15.3 and 14.4, respectively for the
Level |
|
L1-reduc. |
|
L2-reduc. |
|
inf-reduc. |
|
| |||||||
0 |
9.101e-2 |
4.539e-3 |
4.839e-1 |
||||
0 corrected |
5.941e-3 |
15.3 |
3.148e-4 |
14.4 |
4.005e-2 |
12.1 |
|
1 |
6.732e-4 |
5.222e-5 |
1.262e-2 |
||||
1 corrected |
4.129e-4 |
1.6 |
3.339e-5 |
1.6 |
8.398e-3 |
1.5 |
|
2 |
6.709e-5 |
4.236e-6 |
3.400e-3 |
||||
| |||||||
7.2.3 Three-Dimensional Trench
The 3D example consists of (i) a step diagonal through the simulation domain and (ii) a thin trench from the center of the domain to one corner. In Figure 7.4 a rendering of the interface representing the trench is shown from the bottom (viewing it from the top would only show the step and a thin line unable to visually grasp the trench). Note that from the bottom perspective the trench looks like a thin fin instead of a thin trench because the difference between a fin and trench is just the viewpoint. The trench, as in the 2D Trench example, has a width smaller than the spatial resolution of Level 0. Additionally, the trench has a slight dent to avoid an alignment with the computational grid, to provide a more challenging test case. The dent is shown in Figure 7.4b, by a close up on the trench (also shown from the bottom). The example is also inspired by the common high aspect ratio geometries in microelectronic devices [192, 193].
Compared to the 2D examples there are significantly more blocks on Level 1 and Level 2, 107 and 591, respectively. Their placement is visualized in Figure 7.5, by showing their individual contributions to the interface. All blocks on Level 1 have combined
Accuracy Evaluation
The measured error norms are shown in Table 7.4. On Level 0 the reduction for any error norm is at least a factor of 4.2,
because the trench is not resolved on this level (cf. Figure 7.5a, where the extracted interface is based on Level 0 only). The
reductions in the error norm on Level 1 are similar to the 2D Trench examples for the
Level |
|
L1-reduc. |
|
L2-reduc. |
|
inf-reduc. |
|
| |||||||
0 |
1.853e-2 |
1.356e-4 |
4.593e-1 |
||||
0 corrected |
4.422e-3 |
4.2 |
2.644e-5 |
5.1 |
3.478e-2 |
13.2 |
|
1 |
2.426e-4 |
1.259e-6 |
1.418e-2 |
||||
1 corrected |
1.588e-4 |
1.5 |
8.429e-7 |
1.5 |
1.418e-2 |
1.0 |
|
2 |
3.165e-5 |
7.340e-8 |
3.017e-3 |
||||
| |||||||
Performance Evaluation
The 3D Trench example has a high computational load compared to the previously considered 2D examples. This allows to evaluate the performance impact of the bottom-up correction algorithm by measuring the run-time of the top-down re-distancing algorithm and bottom-up correction algorithm on the compute system VSC3. The block-decomposition proposed in Chapter 6 is not employed, because the focus is on the effects of the proposed bottom-up correction algorithm. Figure 7.6 shows the run-time and parallel speedup for the top-down re-distancing algorithm (Re-Distancing), the bottom-up correction algorithm (Correction), and both together (Total) for all three levels of the hierarchical grid combined. For a single thread the run-time introduced by the additional correction algorithm is increased by 4 % and for 16 threads (full utilization of the compute system) by 10 %. The overhead introduced by the by the additional correction algorithm is acceptable, because it allows a proper signed-distance field for thin trenches.
A parallel speedup of 9.3 is achieved with 16 threads for the Re-Distancing with the bottom-up correction algorithm, which is slightly below the parallel speedup for the Re-Distancing without the bottom-up correction algorithm (parallel speedup 9.8). The inferior parallel speedup of 4.1 of the bottom-up correction algorithm alone is caused by the non-parallelized marching on Level 0 (it contains only a single block). Therefore, a level-by-level comparison is necessary to properly evaluate the performance of the bottom-up correction algorithm.
Level-by-Level Comparison
Figure 7.7 shows the level-by-level comparison, except for Level 2, because this is the last level and, as previously mentioned, no correction is performed. The run-time and parallel speedup depend strongly on the level, because the number of blocks and the number of grid points differ significantly.
Starting the investigation on Level 1, the bottom-up correction algorithm has a parallel speedup of 7.5, which is similar to the top-down re-distancing algorithm with a parallel speedup of 7.8. The reason for the slightly reduced parallel speedup is the reduced computational load (not all grid points are re-computed). This leads to a larger overhead of the synchronization tasks. Also, for more than eight threads (fully utilizing one of the two processors of the single utilized node of VSC3) the initialization shows no additional parallel speedup because of NUMA limitations (interpolation between blocks occurs across memory domains).
Finalizing the investigation on Level 0 shows that practically no parallel speedup is observed, because this level only contains a single block. However, the block-decomposition proposed in Chapter 6 solves this issue but is not further considered here. Only the initialization benefits slightly from the additional threads, because the interpolation is parallelized on the blocks on Level 1. The minimum run-time is reached using eight threads. For more than eight threads NUMA effects increase the run-time again.
7.3 Summary
In this chapter a bottom-up correction algorithm for the computational step Re-distancing has been presented. This correction algorithm is tailored towards the FMM and hierarchical grids. The signed-distance field is corrected by interpolation from higher levels of the hierarchical grid and a specialized restarted FMM allows for the bottom-up correction of the signed-distance field even in regions not covered by blocks on higher levels.
The advantageous effect on the accuracy has been evaluated on 2D examples as well on a 3D example. The accuracy of the signed-distance field is significantly increased for a corner by a factor of up to 2.7 (depending on the angel enclosing the corner). In a case where a feature is too small to be represented on a coarse level, i.e., a trench thinner than the grid resolution, the correction algorithm ensures a proper representation. The feature is now represented in the signed-distance field on the coarser level. For the 3D example the impact of the bottom-up correction algorithm on the computational performance has been evaluated, yielding a run-time overhead compared to the top-down re-distancing algorithm presented in Chapter 6 between 4 % and 10 %, which is an acceptable trade off for the proper representation for thin trenches. The evaluation of the parallelization showed a parallel speedup of 9.3 for 16 threads for all levels combined and a similar parallel speedup as for Re-Distancing without the bottom-up correction algorithm for a level-by-level comparison. The block decomposition presented in Chapter 6, may also be employed for the bottom-up correction algorithm to enable a better performance.