
Chapter 4 Topography Simulation and Simulation Platforms

The fabrication of a semiconductor device consists of many process steps. In some of them, materials are stacked on top of the wafer or on other materials (i.e., deposition); others selectively remove parts of the materials on the wafer (i.e., etching) [78]. Furthermore, there are processing steps that do not change the topography of the semiconductor device, yet change its electrical properties (e.g., ion implantation). When such fabrication processes are simulated they are abstracted into a process model. These process models describe how reactants (e.g., molecules or ions) generated inside a reactor interact with the wafer surface.

Fabricating semiconductor devices typically involves a plethora of materials. Consequently, the fabrication process models tend to be material dependent and each material on the simulated wafer has to be uniquely represented. The movement of the materials when they are affected by the process model is described by the level-set equation (see Equation 4.2), a time-dependent PDE.

This chapter gives an overview of the numerical methods used during topography simulations. These methods are subsequently consolidated into the general simulation flow of a topography simulation. Additionally, the computing systems and software tools used in this thesis are introduced.

4.1 Evolution of Surfaces (Advection)

Consider a given process model which describes the movement of the wafer surface at each surface point. Furthermore, assume that the process model has already been evaluated. Consequently, a so-called velocity value, based on physical inputs, is produced by the process model for every point on the wafer surface, which expresses the movement of the surface points in time. The set of all velocity values is referred to as the velocity field (\(V(\mathbf {x}, t)\)). Depending on the sign of the velocity field, an etching (positive sign) or a deposition (negative sign) process is modeled. The objective is to move all surface points according to this velocity field. This can be accomplished by solving the ordinary differential equation [15]

\begin{align} \frac {d \mathbf {x}}{d{t}} = V(\mathbf {x}, t) \end{align} for each point on the surface. This description of the surface’s movement is referred to as the Lagrangian formulation. The Lagrangian formulation for the evolution of the surface requires an explicit representation of the surface, since it describes the movement of individual points on the surface. In addition to the already discussed problems with the deformation of explicit surface representations (see Section 2.2), the Lagrangian formulation also has problems with instabilities and requires topological repairs of the surface [15].

A more stable description, which avoids the aforementioned issues when the surface evolves in time, is an implicit representation of the surface (see Section 2.3). An implicit surface \(\phi (\mathbf {x})\) of dimension \(n\) is embedded into an \(n+1\) dimensional space to describe its evolution in time. The evolution of the surface is accomplished by solving the so-called level-set equation [54]

\begin{align} \label {eq:LevelSetEquation} \frac {\partial \phi (\mathbf {x},t)}{\partial t} + V(\mathbf {x},t)\|\nabla \phi (\mathbf {x},t)\| = 0, \end{align} which is also referred to in the literature as the convection or advection equation. This method of tracking the evolution of the zero level-set (i.e., the surface) is commonly referred to as an Eulerian formulation. When using an Eulerian formulation the velocity values have to be known in the entire domain instead of only on the surface. Section 4.1.3 discusses how the velocity values are extended from the surface into the domain \(\Omega \).

4.1.1 Solving the Level-Set Equation

Substituting \(\mathcal {H}(\mathbf {x}, \phi (\mathbf {x},t), \nabla \phi (\mathbf {x},t),t)\) for \(V(\mathbf {x},t)\|\nabla \phi (\mathbf {x},t)\|\) in Equation 4.2 yields

\begin{align} \frac {\partial \phi (\mathbf {x},t)}{\partial t} + \mathcal {H}(\mathbf {x}, \phi (\mathbf {x},t), \nabla \phi (\mathbf {x},t),t) = 0, \end{align} which shows that the level-set equation belongs to the class of Hamilton-Jacobi equations; \(\mathcal {H}\) is also referred to as the Hamiltonian. The arguments to \(V\) and \(\phi \) are omitted for the remainder of this section. Hamilton-Jacobi equations are utilized in classical mechanics to describe mechanical systems, thus there has been substantial research into numerically solving these types of equations [79, 80, 81, 82, 83]. The solution schemes used in this work are discussed in the remainder of this section.

Upwind Schemes

Assuming that \(V\) is known in the entire domain, a first-order accurate scheme to solve Equation 4.2 is the so-called forward Euler method [84]

\begin{align} \frac {\phi (t_{n+1})-\phi (t_n)}{\Delta t} + \mathcal {H}(\mathbf {x},\phi (t_n), \nabla \phi (t_n),t_n) = 0, \end{align} where \(t_n\) is a point in time and \(t_{n+1} = t_n + \Delta t\) is the point after one time step. However, simply using this scheme to solve the level-set equation will not result in accurate solutions, as a consequence of numerical instabilities [15, 54]. To improve the solution of the level-set equation Engquist and Osher proposed a modified scheme [81]

\begin{align} \mathcal {H} \approx \max (V, 0)\mathcal {D}^+(\phi ) + \min (V, 0)\mathcal {D}^-(\phi ), \end{align} with the first-order accurate approximations \(\mathcal {D}^+(\phi ) = \sqrt {\sum _{l=1}^n \left (D^+_l(\phi )\right )^2}\) and \(\mathcal {D}^-(\phi ) = \sqrt {\sum _{l=1}^n \left (D^-_l(\phi )\right )^2}\). This scheme is referred to as the Engquist-Osher scheme. The idea is to look at the values of the velocity field and the characteristics that flow out of the zero level-set. This information is subsequently used to bias the one-sided finite difference approximation depending on the direction in which the information flows [81]. The drawback of this method is that it only works for convex Hamiltonians (i.e., if the Hessian matrix of the Hamiltonian \(\frac {\partial ^2 \mathcal {H}}{\partial q_i \partial q_j}\) is positive semi-definite) [8].
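To make the scheme concrete, the following minimal C++ sketch evaluates the Engquist-Osher Hamiltonian of Equation 4.5 at a single grid point in 3D; the one-sided differences are assumed to be precomputed, and all names are illustrative rather than taken from an actual simulator:

```cpp
#include <algorithm>
#include <cmath>

// Sketch of the Engquist-Osher scheme (Equation 4.5) at one grid point.
// dPlus and dMinus hold the one-sided forward and backward differences
// D+_l and D-_l per axis; V is the velocity at the point.
double engquistOsher(const double dPlus[3], const double dMinus[3],
                     double V) {
  double gradPlus = 0.0, gradMinus = 0.0;
  for (int l = 0; l < 3; ++l) {
    gradPlus  += dPlus[l] * dPlus[l];    // contribution to D+(phi)
    gradMinus += dMinus[l] * dMinus[l];  // contribution to D-(phi)
  }
  // Bias the one-sided approximations by the direction of the
  // information flow, i.e., the sign of the velocity.
  return std::max(V, 0.0) * std::sqrt(gradPlus) +
         std::min(V, 0.0) * std::sqrt(gradMinus);
}
```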

Lax-Friedrichs Schemes

In topography simulations the Hamiltonian is not necessarily always convex [85]. Thus, a different scheme is required to calculate the discretization of the Hamiltonian. The Lax-Friedrichs scheme uses first-order central finite difference approximations (see Equation 3.18) to calculate the Hamiltonian. Using central differences to solve Equation 4.2 leads to numerical instabilities, like strong oscillations in the solution, that have to be handled [78]. The Lax-Friedrichs scheme is defined as follows [82]

\begin{align} \mathcal {H} \approx V \sqrt {\sum _{l=1}^n D_l(\phi )^2} - \underbrace {\sum _{l=1}^n \alpha _l \left ( \frac {D^+_l(\phi ) - D^-_l(\phi )}{2}\right )}_{DT}, \end{align} with the dissipation coefficients \(\alpha _l\) and the dissipation term \(DT\). The dissipation coefficients have to fulfill

\begin{align} \max \left | \frac {\partial \mathcal {H}}{\partial q_l}\right | \leq \alpha _l, \end{align} with \(q_l = \frac {\partial \phi }{\partial x_l}\), for a stable propagation of the solution [82]. The choice of the dissipation coefficients has to be handled with great care. When the dissipation term is chosen too large it over-smooths the solution of Equation 4.2. This over-smoothing then leads to inconsistently rounded corners with respect to the process model. On the other hand, if it is chosen too small the solution is not stable [79, 86]. To address these problems several strategies have been introduced [13, 15]. In this work a so-called Stencil Lax-Friedrichs scheme is used [87]. A Stencil Lax-Friedrichs scheme only considers a subset of the domain close to the point in which the numerical Hamiltonian is solved. Thus, the dissipation coefficients are only calculated in a local domain around the point, which prevents over-smoothing of the solution. Toifl et al. proposed the following calculation of the dissipation coefficients for a grid point \(\mathbf {x}\) in a \(9\) grid point (2D) or \(27\) grid point (3D) stencil (i.e., \(\eta _B(\mathbf {x})\)) for selective epitaxy and wet etching [87]

\begin{align} \alpha _l = \max _{\mathbf {p} \in \eta (\mathbf {x})} \left | V n_i + \frac {\partial V}{\partial n_i} \left ( \frac {\phi _j^2 + \phi _k^2}{\| \nabla \phi \|^2 }\right ) - \frac {\partial V}{\partial n_j} \left ( \frac {\phi _j \phi _i}{\| \nabla \phi \|^2 }\right ) - \frac {\partial V}{\partial n_k} \left ( \frac {\phi _k \phi _i}{\| \nabla \phi \|^2 }\right ) \right |, \end{align} where \(n_i\) stands for the \(i\)-th coordinate of the normal vector and \(\frac {\partial V}{\partial n_l}\) is defined as follows

\begin{align} \frac {\partial V}{\partial n_i} \approx \frac {V(n_i + \Delta n, n_j, n_k , t) - V(n_i-\Delta n, n_j, n_k , t)}{2 \Delta n}, \end{align} with \(\Delta n = \epsilon ^\frac {1}{3} V\), where \(\epsilon \) stands for the floating point accuracy.
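As a rough illustration of the scheme, the following sketch evaluates a Lax-Friedrichs Hamiltonian reduced to one dimension; the dissipation coefficient alpha is assumed to be given (e.g., computed according to Equation 4.9), and the function name and signature are illustrative:

```cpp
#include <cmath>

// Sketch of the Lax-Friedrichs Hamiltonian (Equation 4.8) in 1D: the
// gradient is approximated by a central difference, and the dissipation
// term DT = alpha * (D+ - D-) / 2 smooths out the oscillations.
// phiM, phi0, phiP are the phi-values at three neighboring grid points.
double laxFriedrichs1D(double phiM, double phi0, double phiP, double dx,
                       double V, double alpha) {
  double dPlus   = (phiP - phi0) / dx;      // forward difference
  double dMinus  = (phi0 - phiM) / dx;      // backward difference
  double central = 0.5 * (dPlus + dMinus);  // central difference
  return V * std::abs(central) - alpha * 0.5 * (dPlus - dMinus);
}
```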

Courant-Friedrichs-Lewy Condition

When numerically solving time-dependent PDEs (e.g., Equation 4.2) the stability of the solution has to be considered as it evolves in time. A solution to a PDE is called stable if small errors in the numerical approximation are not magnified as the solution moves forward in time [88]. This can be achieved by enforcing the Courant-Friedrichs-Lewy (CFL) condition [89]. For the forward Euler method the CFL condition can be expressed as [88]

\begin{align} \label {eq:CFLCondition} \Delta t < \frac {\Delta x}{\max (|V|)}. \end{align} The maximal time step \(\Delta t\) is thus calculated as follows

\begin{align} \label {eq:CFLMaxTimeStep} \Delta t=\alpha \frac {\Delta x}{\max (|V|)}, \end{align} with \(\alpha \in (0, 1)\). In the context of the level-set method the CFL condition restricts how far the zero level-set can move during a single time step with respect to the grid resolution \(\Delta x\) on a regular grid.
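In a simulator, Equation 4.12 typically translates into a small helper along the following lines; this sketch assumes the velocity values have already been computed for all surface points, and all names are illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Maximal stable time step according to Equation 4.12:
// dt = alpha * dx / max|V|, with alpha in (0, 1).
double cflTimeStep(const std::vector<double>& velocities, double dx,
                   double alpha = 0.5) {
  double vMax = 0.0;
  for (double v : velocities) vMax = std::max(vMax, std::abs(v));
  return alpha * dx / vMax;  // assumes at least one non-zero velocity
}
```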

4.1.2 Reconstructing the Signed Distance Function (Re-Distancing)

During simulations using the level-set method the zero level-set is moved through the simulation domain. Thus, the description of the zero level-set (i.e., the surface described by the level-set function) changes in every time step. The quality of the \(\phi \)-values away from the zero level-set deteriorates with each time step (i.e., the level-sets of the level-set function move closer together or spread out) [90]. To reduce numerical errors in the propagation of the surface, the signed distance function has to be reconstructed. To that end, depending on the normalization used to describe the level-set function, different strategies must be applied.

Euclidean Normalization

The level-set function is constructed in such a way that it fulfills the Eikonal equation (Equation 2.8 revisited)

\begin{align*} \begin{split} \|\nabla \phi (\mathbf {x})\| &= F(\mathbf {x}), \; \forall \mathbf {x} \in \Omega \\ \phi (\mathbf {x}) &= G(\mathbf {x}), \; \forall \mathbf {x} \in S, \end {split} \end{align*} when the Euclidean normalization is used.

To construct a signed distance function originating from the zero level-set \(S\) with a departure time of \(0\) and a constant speed of \(1\) the following Re-Distancing PDE has to be solved

\begin{align} \label {eq:RedistancingEQ} \begin{split} \|\nabla \phi (\mathbf {x})\| &= 1, \; \forall \mathbf {x} \in \Omega \\ \phi (\mathbf {x}) &= 0, \; \forall \mathbf {x} \in S. \end {split} \end{align} This method only calculates positive distance values (i.e., values on the outside of the volume, \(\Omega ^+\)). Thus, the distance values calculated for points inside the volume (i.e., \(\Omega ^-\)) have to be multiplied by \(-1\).

Constructing the Signed Distance Function (Fast Marching Method)

The construction of the signed distance function described above is achieved by using the fast marching method (FMM) [91]. The FMM is a numerical method that solves the Eikonal equation on a grid. The numerical solution requires first-order forward and backward finite difference approximations (see Equations 3.26 and 3.27). Thus, the Eikonal equation can be discretized as follows [92]

\begin{align} \label {eq:EikonalEqDiscr} \begin{split} \max (-D^+_x(\phi _{i,j,k}), D^-_x(\phi _{i,j,k}), 0)^2 &+ \\ \max (-D^+_y(\phi _{i,j,k}), D^-_y(\phi _{i,j,k}), 0)^2 &+ \\ \max (-D^+_z(\phi _{i,j,k}), D^-_z(\phi _{i,j,k}), 0)^2 &= \frac {1}{F_{i,j,k}^2}. \end {split} \end{align} When the following substitutions are used

\begin{align*} V &= \phi _{i,j,k}, \\ V_1 &= \min (\phi _{i-1,j,k}, \phi _{i+1,j,k}),\\ V_2 &= \min (\phi _{i,j-1,k}, \phi _{i,j+1,k}),\\ V_3 &= \min (\phi _{i,j,k-1}, \phi _{i,j,k+1}), \end{align*} Equation 4.13 can be simplified to

\begin{align} \max \left ( \frac {V-V_1}{\Delta x}, 0 \right )^2 + \max \left ( \frac {V-V_2}{\Delta y}, 0 \right )^2 + \max \left ( \frac {V-V_3}{\Delta z}, 0 \right )^2 = \frac {1}{F_{i,j,k}^2}. \end{align} Furthermore, since it is assumed that the speed of the waves emerging from the front (i.e., the zero level-set) is positive, \(V-V_i\) must be greater than \(0\). Therefore, the problem can further be simplified to the quadratic equation

\begin{align} \left ( \frac {V-V_1}{\Delta x}\right )^2 +\left ( \frac {V-V_2}{\Delta y} \right )^2 + \left ( \frac {V-V_3}{\Delta z}\right )^2 = \frac {1}{F_{i,j,k}^2}. \end{align} When the Eikonal equation is solved on a regular grid, the \(n\)-dimensional solution to the quadratic equation can be expressed as follows

\begin{align} \label {eq:EikonalSolution} V = \frac {1}{n} \sum _{l=1}^n V_l + \frac {1}{n} \sqrt {\left (\sum _{l=1}^n V_l \right )^2 - n \left ( \sum _{l=1}^n V_l^2 - \frac {\Delta x^2}{F_{i,j,k}^2} \right )}. \end{align} It is possible that during this calculation not all grid points needed to solve Equation 4.13 have a finite value or that Equation 4.16 does not have a real solution. In this case a so-called lower dimensional update has to be performed. To that end the largest value \(V_{\max } = \max _{l \in {1 \dots n}}(V_l)\) is removed and Equation 4.16 is solved with one less value. Note that this process always produces a valid solution for the Eikonal equation since a one-dimensional update is \(\min _{l \in {1 \dots n}}(V_l) + \frac {\Delta x}{F_{i,j,k}}\), which always results in a finite value. The grid points associated with \(\phi \)-values \(V_i\) used to solve Equation 2.8 are also referred to as the upwind neighbors.
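A possible implementation of this update, including the lower-dimensional fallback, is sketched below; the function assumes a regular grid with equal spacing in all directions, and all names are illustrative:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Sketch of the FMM update at one grid point (Equation 4.16). 'upwind'
// holds the minimal known neighbor value per grid axis (V_1 ... V_n),
// dx is the grid spacing, and F the speed at the point.
double solveEikonal(std::vector<double> upwind, double dx, double F) {
  // Keep only finite values and sort ascending, so the largest value
  // V_max can be dropped from the back for lower-dimensional updates.
  upwind.erase(std::remove_if(upwind.begin(), upwind.end(),
                              [](double v) { return !std::isfinite(v); }),
               upwind.end());
  std::sort(upwind.begin(), upwind.end());

  while (!upwind.empty()) {
    double n = static_cast<double>(upwind.size());
    double sum = 0.0, sumSq = 0.0;
    for (double v : upwind) { sum += v; sumSq += v * v; }
    double disc = sum * sum - n * (sumSq - (dx * dx) / (F * F));
    // The solution is only valid if it is real and lies above all
    // contributing neighbor values (the upwind condition).
    if (disc >= 0.0) {
      double V = (sum + std::sqrt(disc)) / n;
      if (V >= upwind.back()) return V;
    }
    upwind.pop_back();  // drop V_max and retry with one dimension less
  }
  return std::numeric_limits<double>::infinity();  // no neighbor known
}
```

For a single remaining value the discriminant reduces to \(\Delta x^2 / F^2\), so the function returns \(V_1 + \Delta x / F\), which is exactly the one-dimensional update mentioned above.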

During the FMM a grid point can have one of three states: accepted, calculated, and far. The FMM starts by assigning all grid points neighboring the zero level-set \(S\) the state accepted and all other points the state far. Afterwards, all grid points adjacent to the accepted points are collected and their distances are calculated using Equation 4.16. These grid points are marked as calculated and stored in a minimum heap. Next, the head of the minimum heap is removed, the respective grid point is marked as accepted, and new values for the grid points in a star stencil (i.e., \(\eta _S\)) that are not yet accepted are calculated with Equation 4.16. This process continues until the heap is empty and all points in the domain have a distance value.
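The following sketch outlines the described loop on a 2D grid, reusing the solveEikonal() function from the previous sketch; the grid layout, the seeding of the accepted band, and all names are illustrative assumptions. Since a std::priority_queue does not support decreasing keys, outdated heap entries are simply skipped when they are popped:

```cpp
#include <algorithm>
#include <limits>
#include <queue>
#include <vector>

// From the previous sketch:
double solveEikonal(std::vector<double> upwind, double dx, double F);

enum class State { Far, Calculated, Accepted };

struct Entry {
  double value;
  int idx;
  bool operator>(const Entry& o) const { return value > o.value; }
};

// Gather the minimal accepted neighbor per axis and solve Equation 4.16.
double updateValue(const std::vector<double>& phi,
                   const std::vector<State>& state, int i, int nx,
                   double dx, double F) {
  int x = i % nx, y = i / nx;
  std::vector<double> upwind;
  auto axisMin = [&](int a, int b) {
    double m = std::numeric_limits<double>::infinity();
    if (a >= 0 && state[a] == State::Accepted) m = phi[a];
    if (b >= 0 && state[b] == State::Accepted) m = std::min(m, phi[b]);
    upwind.push_back(m);  // infinities are filtered by solveEikonal()
  };
  axisMin(x > 0 ? i - 1 : -1, x + 1 < nx ? i + 1 : -1);   // x axis
  axisMin(y > 0 ? i - nx : -1, i + nx);                   // y axis
  return solveEikonal(upwind, dx, F);
}

void fastMarch(std::vector<double>& phi, std::vector<State>& state,
               int nx, int ny, double dx, double F) {
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  auto pushNeighbors = [&](int i) {
    int x = i % nx, y = i / nx;
    int nbr[4] = {x > 0 ? i - 1 : -1, x + 1 < nx ? i + 1 : -1,
                  y > 0 ? i - nx : -1, y + 1 < ny ? i + nx : -1};
    for (int n : nbr) {
      if (n < 0 || state[n] == State::Accepted) continue;
      state[n] = State::Calculated;
      heap.push({updateValue(phi, state, n, nx, dx, F), n});
    }
  };
  // Seed: the points next to the zero level-set are already accepted.
  for (int i = 0; i < nx * ny; ++i)
    if (state[i] == State::Accepted) pushNeighbors(i);

  while (!heap.empty()) {
    Entry e = heap.top();
    heap.pop();
    if (state[e.idx] == State::Accepted) continue;  // outdated entry
    state[e.idx] = State::Accepted;
    phi[e.idx] = e.value;
    pushNeighbors(e.idx);  // recompute not-yet-accepted neighbors
  }
}
```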

The FMM is not the only method for numerically solving Equation 2.8 on a grid [93]. However, it is to this day a widely used method, particularly due to its numerical stability [94].

Manhattan Normalization

In the sparse field method the level-set function is considered in layers. The layer \(\mathcal {L}_0\) represents the zero level-set and the layers \(\mathcal {L}_j\) with \(j \in \mathbb {Z} \backslash \{0\}\) represent the iso-contours further away from the zero level-set. Calculating the next layers \(\mathcal {L}_{i+1}\) relative to an already known layer \(\mathcal {L}_i\) is achieved by updating all points in the new layer as follows [55, 58]

\begin{align} \label {eq:NewLayerManh} \phi ^{\mathcal {L}_i}(\mathbf {x}) = \begin{cases} \min _{\mathbf {y} \in \eta _S (\mathbf {x}) \cap \mathcal {L}_{i-1}} \phi ^{\mathcal {L}_{i-1}}(\mathbf {y}) + 1 & \text {if } i>0 \\ \max _{\mathbf {y} \in \eta _S (\mathbf {x}) \cap \mathcal {L}_{i+1}} \phi ^{\mathcal {L}_{i+1}}(\mathbf {y}) - 1 & \text {if } i<0 \end {cases}, \end{align} where \(\eta _S (\mathbf {x})\) describes a star stencil in the point \(\mathbf {x}\) and \(\phi ^{\mathcal {L}_i}(\mathbf {x})\) the \(\phi \)-value of the level-set function on the layer \(\mathcal {L}_i\) in the grid point \(\mathbf {x}\).

Constructing the signed distance function with this approach is computationally much more efficient than using the FMM, since the computational overhead of managing the heap and solving the quadratic Equation 4.16 is replaced by determining the minimum or maximum in \(\eta _S (\mathbf {x})\) and a summation.
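A minimal sketch of the layer update in Equation 4.17, assuming the \(\phi \)-values of the stencil neighbors belonging to the previous layer have already been collected (names are illustrative):

```cpp
#include <algorithm>
#include <vector>

// Sketch of Equation 4.17 for one grid point of a new layer: the
// phi-value is the extremum over the star-stencil neighbors belonging
// to the previous layer, shifted by one grid spacing. Assumes
// 'previousLayerValues' is non-empty.
double nextLayerValue(const std::vector<double>& previousLayerValues,
                      bool positiveSide) {
  if (positiveSide)  // layers above the zero level-set (i > 0)
    return *std::min_element(previousLayerValues.begin(),
                             previousLayerValues.end()) + 1.0;
  // layers below the zero level-set (i < 0)
  return *std::max_element(previousLayerValues.begin(),
                           previousLayerValues.end()) - 1.0;
}
```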

4.1.3 Velocity Extension

In Section 4.1 methods to numerically solve Equation 4.2 have been discussed. Until now, it has been assumed that the values of the velocity field \(V(\mathbf {x},t)\) are known in the entire domain. However, the process models used to describe the propagation of the zero level-set in topography simulations can only be meaningfully defined directly on the surface [78, 95, 96, 97]. In order to solve Equation 4.2, the values of \(V(\mathbf {x},t)\) have to be known in the entire domain. Depending on the level-set normalization, different strategies have to be employed to propagate the surface information as far into the domain as necessary.

Euclidean Normalization

Equation 2.8 describes the waves that propagate outwards from the zero level-set in the normal direction with a given speed of \(\frac {1}{F(x)}\). Thus, any information defined on the zero level-set can be propagated into the entire domain using this method. This propagation of information (e.g., velocity values) is described by the so-called velocity extension equation [95]

\begin{align} \label {eq:VelocityExtension} \begin{alignedat}{2} \nabla \phi (\mathbf {x}) \cdot \nabla V(\mathbf {x}) &= 0 \quad & \mathbf {x} \in \Omega \\ V(\mathbf {x}) &= V_S(\mathbf {x}) \quad & \mathbf {x} \in S, \end {alignedat} \end{align} where \(V_S\) stands for the calculated velocity values at the crossing points of the zero level-set [95].

The solution of Equation 4.18 can be calculated using the FMM. Yet, during this simulation step (see Figure 4.7) it can be assumed that the FMM has already been performed to construct the signed distance function [15]. Thus, the direction of the information propagating outwards, orthogonal to the zero level-set, is already known. This available information can be reused during the velocity extension by constructing the heap in such a way that the new velocity values are calculated only once for each grid point. This is possible since all the upwind neighbors for each grid point of the level-set function have already been calculated. Furthermore, this process can be parallelized by considering the problem in the context of graph theory [98].
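A common first-order discretization of Equation 4.18 expresses the extended velocity at a grid point as a weighted average of the velocities of its upwind neighbors, with weights given by the one-sided \(\phi \)-differences. The following sketch illustrates one such update; the discretization choice and all names are assumptions for illustration:

```cpp
#include <vector>

// Sketch of one velocity-extension update for a grid point whose upwind
// neighbors are already known from the preceding FMM pass. phiUpwind and
// velUpwind hold the phi-values and velocities of the upwind neighbors
// (one per axis); phiCenter is the phi-value at the point itself.
double extendVelocity(double phiCenter,
                      const std::vector<double>& phiUpwind,
                      const std::vector<double>& velUpwind) {
  if (phiUpwind.empty()) return 0.0;  // no upwind information available
  double num = 0.0, den = 0.0;
  for (std::size_t l = 0; l < phiUpwind.size(); ++l) {
    double w = phiCenter - phiUpwind[l];  // >= 0 for upwind neighbors
    num += w * velUpwind[l];
    den += w;
  }
  return den > 0.0 ? num / den : velUpwind.front();  // degenerate case
}
```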

Manhattan Normalization

The previously described extension process is not necessary for Manhattan normalized level-set functions. In this case, these level-set functions directly describe the crossing points of the zero level-set (i.e., \(\mathcal {L}_0\)) on which the models are defined. Nevertheless, to achieve the movement of the layer \(\mathcal {L}_0\) a rebuilding step is required [55]. This rebuilding step is in essence the same as the construction of the signed distance function discussed in Section 4.1.2 (see Equation 4.17) combined with the pruning of larger values (i.e., \(\phi (\mathbf {x}) \geq 0.5\); see discussion in Section 2.3.2). The velocity values are calculated for all surface points (i.e., \(\mathcal {L}_0\)) and stored in this layer. Subsequently, the layers \(\mathcal {L}_1\) and \(\mathcal {L}_{-1}\) are calculated. Note that only these two layers are needed, since the maximal distance the zero level-set can move during one time step is bounded by the CFL condition. To reconstruct the layer \(\mathcal {L}_0\) all values of the signed distance function that are larger than \(0.5\) are removed (since, in the case considered here, the \(\phi \)-values are normalized between \(-0.5\) and \(0.5\); see Section 2.3.2). This results in the zero level-set (i.e., layer \(\mathcal {L}_0\)) after one time step.

4.2 Surface Flux Calculation

During the fabrication of a semiconductor device the wafer is placed inside a reactor that produces reactants which interact with the wafer surface. These interactions can remove material from (i.e., etching) or add material to (i.e., deposition) the wafer surface [78, 96, 97, 99, 100, 101]. In a topography simulation the above discussed surface interactions are captured by the process model. The two main parts that have to be considered in the process model are:

  • Reactor Scale Transport Model: The chamber in which the reactants for the process are created is modeled. This model has to take into account the gases or plasma put into the chamber in addition to the temperature, pressure, chemical reactions, and the geometry of the chamber [102]. The output is the concentration, energy distribution, and the velocities of different reactant species.

  • Feature Scale Transport Model: The transport through the feature geometry until a reactant species interacts with the wafer surface is modeled. Furthermore, the actual interaction of the different reactant species with the wafer surface is described.

In topography simulations the results of the reactor scale model are used as inputs for the feature scale model. This is achieved through a virtual plane, the so-called source plane (\(\mathcal {P}\)), which allows modeling the distribution of the reactant species created during the reactor scale simulation. The source plane separates the reactor scale from the feature scale. The reactant (source) fluxes and their energy and angular distributions are fixed on this plane. Feature scale transport is modeled by combining several reactants of the same species into particles. Furthermore, a particle may also represent the aggregation of multiple species with similar chemical behavior. This is a necessary abstraction, since there could be on the order of \(10^{20}\) reactants present in the reactor, easily overwhelming available computing resources and thus not allowing for practically relevant simulations. Additionally, it is assumed that the surface stays constant during the modeling of the particle transport and that the particles are distributed according to the reactor scale transport model in the source plane. With this approach a surface coverage (\(\Theta \)) of the reactants on the wafer surface is calculated, which is subsequently used to determine the velocity field \(V(\mathbf {x},t)\) according to the process model. The surface coverage is calculated by estimating the surface fluxes \(\Gamma (\mathbf {x})\) for the particles according to the process model. The surface flux a point \(\mathbf {x}\) on the surface receives is modeled by the following equation [103]

\begin{align} \label {eq:SurfFlux} \begin{split} \Gamma (\bm {x}) &= \int _{\Omega } \Gamma _{in}(\bm {x}, \bm {\omega }_{d\Omega }) d\Omega \\ &= \int _{\mathcal {P}_{vis}} \frac {\bm {\omega }_{\bm {x}_{src}} \cdot \bm {n_x}}{\|\bm {x} - \bm {x}_{src} \|^2} \left ( \Gamma _{src}(\bm {x}_{src}, -\bm {\omega }_{\bm {x}_{src}}) \right ) d \bm {x}_{src} \\ &+ \int _{\mathcal {S}_{vis}} \frac {\bm {\omega }_{\bm {x}_{re}} \cdot \bm {n_x}}{\|\bm {x} - \bm {x}_{re} \|^2} \left ( \Gamma _{re}(\bm {x}_{re}, -\bm {\omega }_{\bm {x}_{re}}) \right ) d \bm {x}_{re}, \end {split} \end{align} where \(\mathcal {P}_{vis}\) stands for the visible part of the source plane and \(\mathcal {S}_{vis}\) stands for the visible part of the surface.

(image)

Figure 4.1: Illustration of the surface flux calculation. A point \(\mathbf {x}\) on the surface is affected by two types of fluxes: the flux it receives from the source \(\Gamma _{src}\) and the flux that is reflected from the geometry \(\Gamma _{re}\).

Equation 4.19 describes the flux on the entire surface by integrating over the amount of flux each point on the surface receives (see Figure 4.1). A point \(\mathbf {x}\) on the surface receives fluxes from two sources: first, the direct flux \(\Gamma _{src}\) received from the source and, second, the flux \(\Gamma _{re}\) from reflections of particles in the domain. The amount of flux a point on the surface receives is determined by how much of the source plane (points \(\bm {x}_{src}\)) and how many reflecting surface points (\(\bm {x}_{re}\)) are visible from it.

Since not the entire wafer is simulated, some particles will leave the simulation domain. To avoid losing the information of these particles so-called boundary conditions are employed. There are two kinds of boundary conditions: periodic and reflective. Periodic boundary conditions check where a particle leaves the domain and re-emit this particle on the opposite side of the domain, with the same trajectory. Reflective boundary conditions re-emit the particles that hit the boundary back into the domain at the point of impact.
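The following sketch illustrates both boundary condition types for one coordinate of a particle position; the domain \([0, L)\) and all names are illustrative:

```cpp
#include <cmath>

// Periodic boundary: re-emit on the opposite side of the domain [0, L)
// with the same trajectory (the direction stays unchanged).
void applyPeriodic(double& x, double L) {
  x = std::fmod(std::fmod(x, L) + L, L);  // handles negative x as well
}

// Reflective boundary: re-emit back into the domain at the point of
// impact by mirroring the position and flipping the direction component.
void applyReflective(double& x, double& dir, double L) {
  if (x < 0.0)     { x = -x;          dir = -dir; }
  else if (x >= L) { x = 2.0 * L - x; dir = -dir; }
}
```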

For computational reasons, it is not feasible to calculate the surface flux directly from Equation 4.19, thus, the surface flux has to be estimated. The following sections discuss different approaches to estimate the surface flux (i.e., Equation 4.19): constant, bottom-up, and top-down [101].

4.2.1 Constant Approach

The straightforward way to approximate the surface flux on the wafer surface is to assume that it is constant. In this case it is assumed that each point on the surface receives the same amount of flux from \(\mathcal {P}\). However, the simulated structure still has to be checked for possible voids that are not exposed to \(\mathcal {P}\) [104].

This simplistic surface flux approach is valid in some cases. For example, in anisotropic wet etching the surface is exposed to enough reactants that the assumption of a constant flux holds [105, 106]. However, this assumption does not hold in general, and more sophisticated approaches for calculating the surface flux are required (e.g., for processes where a flux distribution is observed on the wafer surface) [101].

4.2.2 Bottom-Up Approach

The constant flux approach does not take shadowing and other geometry-dependent effects into consideration, e.g., points on the surface that receive fewer particles due to parts of the geometry blocking a direct path to the source plane \(\mathcal {P}\) (see Figure 4.2).

(image)

Figure 4.2: Illustration of the bottom-up flux calculation approach. Not all surface points receive the same amount of flux. The flux each of the illustrated surface points receives is indicated by the red cones.

To alleviate this problem, a bottom-up approach is used. In a bottom-up approach each point on the wafer surface (i.e., bottom) is considered, and it is calculated how much of the source plane \(\mathcal {P}\) is visible (i.e., up) from this point. Computationally, the performance of this approach can be improved by adaptively sampling the hemisphere and only calculating the flux on certain surface elements [107]. These calculations can be performed directly on the level-set function \(\phi (\mathbf {x},t)\). This approach lends itself to modeling processes in which the reactants interact with the surface with little to no re-sputtering [6, 108].
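A bottom-up estimate for a single surface point could be sketched as follows; the occlusion test isVisible(), the directional source flux sourceFlux(), the fixed +z surface normal, and the cosine-weighted hemisphere sampling are all assumptions for illustration, not part of the cited methods:

```cpp
#include <array>
#include <cmath>
#include <functional>
#include <random>

// Sketch of a bottom-up flux estimate for one surface point: sample
// directions over the hemisphere above the point and accumulate the
// source flux for all directions with an unobstructed path to the
// source plane.
double bottomUpFlux(
    const std::array<double, 3>& surfacePoint,
    const std::function<bool(const std::array<double, 3>&,
                             const std::array<double, 3>&)>& isVisible,
    const std::function<double(const std::array<double, 3>&)>& sourceFlux,
    int samples) {
  const double kPi = 3.14159265358979323846;
  std::mt19937 rng(42);
  std::uniform_real_distribution<double> uni(0.0, 1.0);
  double flux = 0.0;
  for (int s = 0; s < samples; ++s) {
    // Cosine-weighted direction on the hemisphere (normal = +z here).
    double a = 2.0 * kPi * uni(rng);
    double r = std::sqrt(uni(rng));
    std::array<double, 3> dir = {r * std::cos(a), r * std::sin(a),
                                 std::sqrt(1.0 - r * r)};
    // Only unshadowed directions contribute to the flux at the point.
    if (isVisible(surfacePoint, dir)) flux += sourceFlux(dir);
  }
  return flux / samples;
}
```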

4.2.3 Top-Down Approach (Monte Carlo Ray Tracing)

The bottom-up approach does not take the reflection of individual particles from the geometry back into the domain into consideration. Although it is possible to model these reflections with a bottom-up approach, doing so would require multiple iterations of bottom-up flux calculations on the entire surface, which is unfeasible due to the required computational resources. To efficiently model the reflections of particles from the surface back into the domain, so-called ray tracing is utilized, a Monte Carlo based strategy to model the surface flux [103, 109]. During a ray tracing simulation, the source plane \(\mathcal {P}\) is split into equally sized areas. The particles to be emitted are grouped into packages, which are subsequently emitted into the domain with a direction \(\vec {t}\) given by the distribution of the particle source. The particles are traced from the source plane (i.e., top) until they interact with the wafer surface (i.e., down). When a particle interacts with the wafer surface, a part of its flux payload is absorbed by the surface and, depending on the remaining flux payload, the particle is subsequently re-emitted into the domain. The contributions absorbed by a surface element from all particles are summed up and yield the flux at the respective surface element. Figure 4.3 shows an illustration of this process.

(image)

Figure 4.3: Illustration of the top-down flux calculation approach. In this approach, a particle can be re-emitted from the surface, and thus specular effects can be simulated.

To efficiently compute the top-down ray tracing an explicit representation of the geometry of the wafer surface (e.g., a surface mesh or a point cloud) is preferable [110]. Furthermore, since ray tracing is a Monte Carlo process, numerical noise is introduced into the resulting flux values, which has no physical meaning. The magnitude of the numerical noise can be reduced by increasing the number of simulated particles. However, the numerical noise can never be entirely removed, and the more particles are simulated, the more computational resources are required.
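The following skeleton illustrates the described particle loop; the ray tracing steps emitAndTraceToSurface() and reemitAndTrace(), the constant sticking probability, and the payload cut-off are illustrative assumptions:

```cpp
#include <functional>
#include <vector>

// Skeleton of the top-down Monte Carlo loop: particles start on the
// source plane with a full flux payload, deposit a fraction of it at
// every surface hit, and are re-emitted until the remaining payload is
// negligible. emitAndTraceToSurface() traces a new particle from the
// source plane and reemitAndTrace() re-emits it from a surface element;
// both return the index of the hit surface element, or -1 if the
// particle leaves through the source plane.
void topDownFlux(std::vector<double>& surfaceFlux, int numParticles,
                 double sticking,
                 const std::function<int()>& emitAndTraceToSurface,
                 const std::function<int(int)>& reemitAndTrace) {
  for (int p = 0; p < numParticles; ++p) {
    double payload = 1.0;               // initial flux payload
    int hit = emitAndTraceToSurface();  // trace from the source plane
    while (hit >= 0 && payload > 1e-4) {
      double absorbed = sticking * payload;
      surfaceFlux[hit] += absorbed;     // accumulate absorbed flux
      payload -= absorbed;
      hit = reemitAndTrace(hit);        // re-emit into the domain
    }
  }
}
```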

4.3 Multi-Material Simulations

A typical topography simulation consists of several materials that are stacked on top of a wafer. To model different interactions of the process model with different materials each of these materials has to be represented by its own level-set function. The most natural approach would be to define a level-set function that envelopes each material (see Figure 4.4a).

However, this approach has some disadvantages concerning the interfaces between materials. Figure 4.5 illustrates the problems that can occur with such a description of the material layers. In Figure 4.5a an overlap of the two material layers is shown. These non-physical overlaps can form due to small numerical errors in the discretization of the level-set function. The same problem can also lead to non-physical voids, shown in Figure 4.5b. A special kind of void is shown in Figure 4.5c (a triple junction). This void occurs because of the discretization of the level-set function, which leads to rounded corners. Refining the grid (i.e., reducing the grid spacing) reduces this rounding effect.

(image)

(a) Level-set functions representing materials.
   

(image)

(b) Additive level-set functions (wrapping approach).
Figure 4.4: Illustration of two strategies of representing three material layers with level-set functions.

(image)

(a) Overlap
   

(image)

(b) Void
   

(image)

(c) Triple Junction
Figure 4.5: Illustration of problems that occur when enveloping material layers with level-set functions.

Most of the research on circumventing the formation of the previously discussed voids or overlaps originates from the study of multiphase flow [36, 111, 112, 113]. The strategies employed during such simulations are, on the one hand, to artificially keep the level-set functions together by changing the velocities of the velocity field. On the other hand, Boolean operations are employed to prevent the level-set functions from overlapping.

In multi-material topography simulations consisting of \(M\) materials usually \(M-1\) level-set functions are required, since the vacuum or gas above the simulated device is usually not represented explicitly. The most common approach to handle multiple materials in topography simulations is the so-called additive or wrapping approach [114]. In the additive approach, each level-set function \(\phi _i\) describing a material \(M_i\) is designed in such a way that it represents the union of the material and all underlying materials

\begin{align} \hat {M}_i = \bigcup _{j=1}^{i} M_j, \end{align} where \(\hat {M}_i\) denotes the region described by \(\phi _i\). Figure 4.4b shows an illustration of this approach. The additive approach fixes another problem that often occurs in topography simulations (see Figure 4.6): When a material layer is stacked on top of another material, the rounding that occurs at the edges of the material interfaces leads to the formation of non-physical voids. Figure 4.6b illustrates how this problem is resolved by using the wrapping approach.

(image)

(a) Without the additive approach non-physical voids occur at the border of the materials.

(image)

(b) Using the additive approach avoids the formation of these non-physical voids.
Figure 4.6: Illustration of a thin material layer on top of another material layer.

4.4 Application of Surface Representations in Topography Simulations

During a topography simulation it can be advantageous, or in some cases even essential, to utilize different surface representations. As discussed in Section 4.1, an implicit surface representation is preferred for the evolution of the surface during a topography simulation. An implicit surface representation intrinsically handles topographical changes of the surface, such as the merging of two fronts due to merging materials, which often occurs during topography simulations [17]. In Chapter 7 a strategy for simulating etching processes with Boolean operations is discussed, where again an implicit surface representation is advantageous (see Section 2.3.3). Furthermore, it is easier to perform CSG on implicit surfaces, so the generation of initial geometries is handled with implicit surfaces.

On the other hand, some flux calculation methods require an explicit surface representation (e.g., Monte Carlo ray tracing) [18]. Moreover, the visualization of surfaces, especially 3D surfaces, is much simpler with an explicit surface representation than with an implicit surface representation. Therefore, for flux calculations and the visualization of the surface the implicit surface is converted into an explicit surface as discussed in Section 2.4.

4.5 Topography Simulation Workflow

The effect of the process models on the different materials on the wafer surface during a topography simulation is governed by the level-set equation (see Equation 4.2). In a discrete setting, a time-dependent PDE like the level-set equation is solved by calculating small time steps that allow for a stable propagation of the solution according to the CFL condition [88]; this approach is from here on referred to as the level-set method. A simulation based on the level-set method is discretized in time as well as in space.

An example of a topography simulation workflow based on the level-set method is shown in Figure 4.7; the simulation steps this thesis focuses on are highlighted in red. The general flow of a simulation is independent of the chosen level-set normalization.

(image)

Figure 4.7: Flowchart describing the typical topography simulation workflow of a simulation based on the level-set method. The steps marked in red are discussed in this work.

The simulation starts with the initialization of the geometry. The initial geometry usually originates from a previous simulation step or is created by CSG using Boolean operations of simple level-set functions (e.g., planes and boxes). After the initialization the main time-loop starts, and repeats the following three steps until the process model concludes.

Flux Calculation

Depending on the process model used during the simulation, certain pre-processing steps have to be performed (e.g., surface flux calculations). The surface flux obtained from this pre-processing step is then used during the evaluation of the process model. The extraction of the surface has been discussed in Section 2.4; a pre-processing step to improve the performance of Monte Carlo ray-tracing based flux calculation is discussed in Chapter 8. This step is not necessary for process models where a constant flux can be assumed.

Advection

The advection step is the core part of a level-set based topography simulation. During the advection step the process model is evaluated by taking into account the values generated by the flux calculation. The level-set equation is solved, and the velocities calculated by the process model are extended into the domain to propagate the surface. In a level-set framework based on the Manhattan normalization the velocity extension step is not required.

Re-Distancing

This step is required to prevent the propagation of numerical errors which emerge during the advection step. In Chapter 6 this step is further discussed to allow for simulations utilizing hierarchical grids.

4.6 Computational Hardware

The computational performance results discussed in the following chapters are computed on commercially available computer systems. Therefore, the performance of the presented experiments is bound by the limitations of these computer systems. To get a more precise understanding of these limitations, the relevant concepts of modern computer systems are discussed in this section.

Modern microprocessors are schematically composed of three parts, see Figure 4.8 [115].

(image)

Figure 4.8: Illustration of the basic theoretical parts of a digital computer and their interactions.

The central processing unit (CPU) handles all computations and interactions with data stored in the main memory, the main memory holds all information required for a program to run, and input/output allows for interactions between the program and the user. A core is the combination of the control unit and the arithmetic logic unit. Modern CPUs consist of several cores on a single chip, which allows for the parallel execution of instructions. Furthermore, modern CPUs offer simultaneous multithreading, thereby providing two logical cores per physical core and further increasing parallel computing capabilities. In turn, compute clusters are composed of nodes, each offering at least one CPU and potentially also additional accelerators, such as general purpose graphics processing units.

The speed at which the CPU can fetch information from the main memory is the primary limiting factor for performance, also known as the von Neumann bottleneck. To partially overcome this issue, so-called caches have been introduced on the CPU chip. Caches are small memories with very fast read and write speeds. They store small amounts of information and act as a buffer between the CPU and the main memory. Figure 4.9 shows an illustration of a modern computer system, which, conceptually, also represents one form of the previously mentioned node.

(image)

Figure 4.9: Illustration of a modern computer system. It is made up of a single CPU with three cores, three different cache levels, and the main memory.

This illustration shows an extended version of the basic principle described by Alan Turing in the 1930s [115]. The use of different cache levels is a historical development, since the CPU clock speed (i.e., the number of instructions the CPU can process during a certain time interval) grew much faster than the memory bandwidth (i.e., the speed at which data can be moved from the main memory to the CPU). This discrepancy is also commonly referred to as the DRAM gap [116].

4.6.1 Caches

As can be seen in Figure 4.9 there are different kinds of caches; usually, the closer a cache is to the CPU registers, the faster it is [115]. Caches have different responsibilities: the instruction cache (i.e., L1 I) stores the order of instructions the CPU has to perform, while the data cache (i.e., L1 D) stores the data on which the CPU wants to perform its instructions. If the data is not present in the cache it has to be fetched from a higher level cache (e.g., L2 or L3), or, in the worst case, from the main memory. Some caches are exclusive to one core (e.g., L1 and L2), others are shared between all cores (e.g., L3).

As discussed in the previous section, when the CPU requests data from the memory, it is first checked if the data resides in the cache. If the requested data is present in a cache it is called a cache hit; otherwise it is called a cache miss and the data has to be fetched from the main memory.

Cache memory is built with SRAM cells, which take up a large amount of physical space on a CPU. Furthermore, the fabrication of SRAM cells is more expensive than that of DRAM cells; thus, it is not feasible to create huge caches [117].

In a scientific framework the simulated geometries are large (i.e., large in the sense that they do not fit into the cache of a CPU). Thus, much of the run-time required for a topography simulation is dedicated to fetching data from the main memory or from higher cache levels. When the run-time of a program is dominated by memory accesses rather than computations, the problem is called memory bound.

4.6.2 Parallelization

In the mid 2000s CPUs hit the so-called heat barrier, which describes the effect that an increase in clock speed yields a disproportionate amount of heat that has to be dissipated [115]. Thus, different strategies had to be developed to increase the performance of CPUs. Since Moore's law is still observable, the size of semiconductor devices is still shrinking, which frees up space on the wafer and ultimately on the individual chips. To utilize this additional space, vendors started introducing additional cores to the CPU. However, the speedup that can be expected by adding \(n\) cores to a CPU is in general smaller than \(n\). This can have several reasons, ranging from limited memory bandwidth over non-optimal cache use up to load balancing issues (i.e., when a problem cannot efficiently be split into \(n\) parts).

The efficiency of a parallel program can be measured by putting the serial (\(s\)) and the parallel (\(p\)) parts into relation [115]

\begin{align} \label {eq:serialParalell} s+p=1. \end{align} First, assume that there is one worker (e.g., a core) that works on a task (e.g., a program); the time it takes to finish the task can be expressed as

\begin{align} T_s = s+p. \end{align} Now assume that in the perfect scenario the run-time of a parallel task \(T\) is reduced to \(\frac {T}{n}\) when using \(n\) workers. As previously mentioned, there are limitations to the speedup a parallel program running on \(n\) cores can achieve. The ideal run-time achievable through parallelization (also known as strong scaling) can be expressed as

\begin{align} T_p(n) = s+\frac {p}{n}. \end{align} Conversely, in the case of weak scaling the number of processes and the problem size are increased together, leading to a constant workload per process.

To define a measurement for the speedup of a program when it is parallelized a metric for the performance (i.e., the work that is done during the run-time of a program) is required. The performance of a serial program can be defined as [115]

\begin{align} P^s=\frac {s+p}{T_s} = 1, \end{align} and the performance of a parallel program as

\begin{align} P^p = \frac {s+p}{T_p(n)} = \frac {1}{s + \frac {1-s}{n}}. \end{align} Combining these two performance metrics leads to the speedup

\begin{align} \label {eq:amdahlsLaw} S=\frac {P^p}{P^s} = \frac {1}{s + \frac {1-s}{n}} \end{align} of a parallel program. Equation 4.26 is also known as Amdahl's Law, which describes an upper limit of the speedup for a problem of fixed size [118]. For \(n \rightarrow \infty \) it is easily seen that the speedup that can be achieved through parallelization is bound by the run-time of the serial parts of the program. Furthermore, there is Gustafson's law, which estimates the parallel speedup of weak scaling programs [119].
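As a quick numerical illustration of Amdahl's Law (Equation 4.26), the following sketch evaluates the speedup for a program with an assumed serial fraction of 5%; the speedup saturates at \(S = 1/s = 20\) regardless of the core count:

```cpp
#include <cstdio>

// Amdahl's Law (Equation 4.26): speedup of a program with serial
// fraction s when running on n cores.
double amdahlSpeedup(double s, int n) {
  return 1.0 / (s + (1.0 - s) / n);
}

int main() {
  // Even with only 5% serial code the speedup is bounded by 20.
  for (int n : {1, 2, 8, 64, 1024})
    std::printf("n = %4d  S = %6.2f\n", n, amdahlSpeedup(0.05, n));
}
```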

4.6.3 Benchmark Systems

To evaluate the run-times of the algorithms presented in this work, the benchmarking systems used are introduced in this section. One benchmarking system is a single node of the Vienna Scientific Cluster (VSC). The VSC is a collaboration of several Austrian universities that provides supercomputer resources and corresponding services to their users [120]. Furthermore, an industrial computer system (ICS) and two workstations (Workstation 1 and Workstation 2) are used. Table 4.1 summarizes the key metrics of the four benchmark systems.

                        Workstation 1   Workstation 2   VSC4         ICS
Frequency (GHz)         4.4             4.6             3.1          2.8
Sockets                 1               1               2            2
Cores per CPU           4               12              24           10
Logical cores per CPU   8               24              48           20
L1i cache               32 KByte        384 KByte       32 KByte     32 KByte
L1d cache               32 KByte        384 KByte       32 KByte     32 KByte
L2 cache                256 KByte       6 MByte         1024 KByte   256 KByte
L3 cache                8 MByte         64 MByte        33 MByte     26 MByte
Main memory             32 GByte        32 GByte        96 GByte     226 GByte
Table 4.1: Benchmarking systems used to run simulations in this work.

4.7 Software Tools

The research results presented in this work have been produced by the use of several open- and closed-source software tools. In this section all used tools are introduced and a brief description of the relevant features is given.

Victory Process

Victory Process is a commercial process TCAD simulation tool developed by Silvaco [121]. It features several etching, deposition, and oxidation models for semiconductor fabrication simulations as well as an interface to implement user-specific process models. The modeling of the processes is based on the level-set method. Victory Process is written in C++; parallelization is achieved with pThreads and OpenMP. The results presented in Chapter 6 and Chapter 7 were produced with Victory Process. Victory Process uses a hierarchical grid data structure to store the level-set functions representing the different material layers in a topography simulation.

ViennaLS

ViennaLS is an open source level-set library for process TCAD simulations, developed at the Institute for Microelectronics, TU Wien [13]. It is developed in C++, uses OpenMP for parallelization, and is accessible as a Python library. The results presented in Chapter 5 were produced with ViennaLS. ViennaLS uses a Hierarchical Run-Length Encoding (HRLE) data structure to store the level-set function.

Visualization Toolkit (VTK)

The Visualization Toolkit (VTK) is an open source collection of algorithms to manipulate and visualize scientific data [122]. It is developed in C++; however, bindings are available for several other languages, such as Java and Python. In this work, VTK is used to store and visualize data. Furthermore, it is used to calculate the Hausdorff distance between two meshes in Chapter 8.

Computational Geometry Algorithms Library (CGAL)

The Computational Geometry Algorithms Library (CGAL) is a high performance library for geometric algorithms developed in C++ [123]. The surface mesh simplification algorithm presented in Chapter 8 is based on the implementation of the edge-collapse algorithm of CGAL. CGAL uses a half-edge data structure to store meshes. It is a high performance data structure that allows for fast access to neighboring vertices, edges, and faces, and provides an efficient method to iterate over unstructured mesh data sets.

ViennaMesh

ViennaMesh is a meshing framework developed at the Institute for Microelectronics, TU Wien [124]. It is developed in C++ and uses a modular approach to combine multiple external mesh generation and adaptation tools (e.g., CGAL or VTK). The surface simplification algorithm presented in Chapter 8 has been developed with ViennaMesh.

Embree

Embree is an open source high performance CPU based ray tracing kernel library [41, 125]. It is used in Chapter 8 to perform Monte Carlo ray tracing.