[ Home ]
This chapter describes the specific software implementation of the numerical methods presented in Chapter 2, implemented within the scope of this work. The C++ code for all numerical algorithms was implemented in smaller self-consistent libraries depending on the specific functionality of the respective implementation. Five individual C++ libraries were created, incorporating all functionality required for a process simulator with their dependency hierarchy shown in Fig. 4.1:
• ViennaHRLE [163]: This library provides the underlying sparse data storage for the level set and cell set representations. The data type stored is not limited to numeric types, so the data container can also be used for the storage of volume data.
• ViennaLS [164]: All level set specific algorithms, including conversions to other surface descriptions, advection algorithms, and geometric properties are part of this library which uses the data structure provided by ViennaHRLE to store the level set values.
• ViennaCS: This library contains the cell-based data structure described in Section 2.3.3.2. It uses the data structure of ViennaHRLE to store custom volumetric properties of the simulated materials.
• ViennaRay [165]: A high-performance ray tracing library which is used to model molecular transport in the feature scale region, supporting custom particle types in order to model different sources, reflection types, and particle properties.
• ViennaPS [166]: This library ties together all other software implementations by providing a modelling framework which allows for the straight-forward creation of process models. It provides the necessary interfaces between all other libraries and makes them accessible in a simple fashion.
All the software libraries depicted in Fig. 4.1 are discussed in detail in this chapter starting with ViennaHRLE and then moving on to the dependent libraries. Finally, the interplay of all these software elements within ViennaPS will be discussed by considering the program flow of a typical microelectronic device fabrication simulation, highlighting important interfaces between the different libraries.
ViennaHRLE provides a storage container for sparse spatial data, as required for the description of a sparse field level set (LS). Due to the specific requirements of level set algorithms, a specialised data structure is needed for storing the LS values. Algorithms operating on the LS do not require random access, but rather fast sequential access as they operate on the entire surface in one pass. Furthermore, all values are defined on a rectilinear grid. However, not all points on the grid are required for the surface description but only a small set of grid points. A data structure which is optimised for this type of data access is the hierarchical run-length encoded (HRLE) data structure [167].
In this data container, the grid is segmented in each direction and run-length encoded. This means that certain neighbouring points can be described as one entity if they have similar values. Such a collection of points in an HRLE structure is referred to as a run. In a sparse field LS stored in this structure, grid points with a signed distance value larger than 0.5 can be represented by a large negative or large positive number, depending on whether their location is inside or outside of the material volume, respectively. Since their exact LS value is not known, these grid points, or runs, are referred to as undefined points or undefined runs, respectively.
The way this data structure is constructed is illustrated in Fig. 4.2. The simple triangle in Fig. 4.2a is represented using a sparse field LS. In order to store the LS values efficiently, the grid is segmented into defined and undefined runs, as shown in Fig. 4.2b. Blue areas in the segmentation represent negative undefined runs, in which all points have no oppositely signed neighbour. Therefore, these points do not contribute additional information to the surface description and thus their exact value is not important. However, the sign of these points may still be required in simulations and can be found straight-forwardly by using the sign of the undefined value referenced by the run. The same applies to the red areas in Fig. 4.2b, which refer to positive undefined runs and therefore incorporate all grid points with positive values and no oppositely signed neighbours. Once the grid containing the LS is segmented, it can be run-length encoded and stored in the HRLE structure.
In order to generate the data structure shown in Fig. 4.3, the segmented grid has to be projected in each dimension. In the case of the segmentation shown in Fig. 4.2b, the grid is first projected into the \(y\) direction. This is done by considering one index on the \(y\)-axis and marking it as a defined run if there is a defined run for any x-value in this grid line. Taking Fig. 4.2b as an example, the grid line at \(y=-2\) corresponds to an undefined positive run in the y-direction, as there are no defined runs at \(y=-2\). At \(y=-1\) however, there are defined runs in this grid line, meaning \(y=-1\) will be a defined run. For the structure shown in this figure, the projection onto the y-axis thus results in defined runs from \(y=-1\) to \(y=5\) and positive undefined runs everywhere else. This projection is then run-length encoded to give three separate runs: One undefined positive run from \(y=-3\) to \(y=-2\), one defined run from \(y=-1\) to \(y=5\) and another undefined positive run from \(y=6\) to \(y=7\). This gives the set of run types for the \(y\) dimension shown in Fig. 4.3. The defined run is given an identifier (ID) denoting which value is the first of this run. Since there is still a dimension below the current one, this refers to the start index for the lower dimension at which this run starts. The first defined run therefore always has the ID \(0\).
In Fig. 4.3, directly below the run types, the corresponding run breaks are shown. These values store the coordinate at which the next run starts, meaning the defined run starts at \(y=-1\) and the last undefined run starts at \(y=6\), as discussed above. The beginning of the first run is defined as the lowest grid extent and the end of the last run is the upper grid extent, meaning that these runs extend to the edges of the simulation domain.
The start index, shown above the run types, defines which run is the first in the next grid line. As the entire structure was projected onto the \(y\) direction first, there is only one grid line for this dimension and thus only one start index, namely \(0\). Since all indices start at \(0\) by convention, the first start index must always be \(0\).
Now the next lower dimension \(x\) is considered and the structure is filled accordingly. Since the only defined run in the \(y\) dimension extends over several grid lines, the first run of each grid line must be saved in the start indices. The first start index thus refers to the first run in the first grid line, which must always have the run type \(0\). The nth start index then refers to the first run in the nth grid line. The run types are then filled as before, but since \(x\) is the lowest dimension, the grid is not projected but run-length encoded as it is.
For example, the grid line at \(y=-1\) in Fig. 4.2b is encoded first, resulting in three separate runs: Undefined positive from the beginning of the domain to \(x=-5\), one defined run from \(x=-4\) to \(x=2\) and one undefined positive run from \(x=3\) to the edge of the domain. There are seven defined runs in this grid line with the run type denoting the first of these. Since each defined run has its own run type ID, they are described by the IDs from \(0\) to \(6\). Hence, the next defined run will start with the run type ID of \(7\). In the lowest dimension, the run type IDs are simply the indices into the array which stores the defined LS value for each grid point. Therefore, all defined points are close to each other in storage, allowing for fast memory access. Especially during iterative advection, where the location of the LS values is not needed, the scalar velocity field can be applied to the array directly, allowing for highly efficient advection. There are three separate runs in the grid line at \(y=-1\), which means that the next grid line must start with the 4th run, corresponding to a start index of \(3\). This is repeated for each grid line corresponding to a defined run in the \(y\) dimension, resulting in a full description of the structure.
In order to describe a LS, only two undefined values are required: negative and positive. However, for the description of volume information, it is useful to define several additional background values. Hence, the HRLE structure allows for any number of undefined values. These are values, which are shared by several grid points and are therefore perfectly suited to be used for the storage of background values.
The implementation of the HRLE data structure in ViennaHRLE also includes the possibility for straight-forward parallelisation by automatically splitting the data structure according to the number of central processing unit (CPU) threads available for computation. This is performed by marking specific segmentation run types which signify the start and end of a parallel segment. The splitting is achieved through the dynamically balanced approach presented in [168], so that the optimal parallelisation can be achieved straight-forwardly.
First, the simulation domain is defined by setting its extent and boundary conditions. The most efficient way to fill this domain with the LS values describing a surface is to use a helper function provided by the ViennaHRLE library.
template <class T, int D> void hrleFillDomainWithSignedDistance( hrleDomain<T, D> &newDomain, std::vector<std::pair<hrleVectorType<int, D>, T>> pointData, const T &negValue, const T &posValue, const bool sortPointList = true)
This function takes a list of points with their respective LS values, sorts them in lexicographical order and then inserts them into the HRLE structure. The sorting is necessary to reach the optimal performance by inserting values only at the end of all the arrays used to store the internal HRLE representation. Internally, the helper function shown in Listing 4.1 uses the two methods provided by hrleDomain to insert the points in the most efficient manner.
template <class V> void insertNextDefinedPoint( const V &point, hrleValueType distance) template <class V> void insertNextUndefinedPoint( const V &startPoint, const V &endPoint, hrleValueType value)
The first of the two functions provided in Listing 4.2 is used to insert a defined run into the structure by copying the value distance into the defined values array and building the required start indices and run breaks. The second function is used to insert undefined runs by specifying their start and end coordinates, as well as the value they represent. If this value is already stored somewhere in the undefined values array, there is no need to store it a second time, so the undefined run type of the first value will simply be used to represent this undefined run.
Due to the complexity of the structure, random access is not efficient, as a value has to be found by traversing through each dimension and identifying the correct run by the run breaks. Therefore, finding one value takes on average \(\mathcal {O}(\log {N})\), where \(N\) is the number of LS values stored. Traversing the entire structure would thus require \(\mathcal {O}(N\log {N})\) operations. However, since many algorithms used in level sets can be performed by iterating over the entire structure once, efficient sequential data access can be used. Since all defined points are simply elements in an array, sequential access to each element is constant in time and the entire structure can be traversed in \(\mathcal {O}(N)\) [169]. For ease of use, iterators are implemented in the library, which allow the user to access the current coordinate, the LS value, and several additional container-specific properties such as the current run. The ViennaHRLE library provides several types of iterators for accessing the values stored at the grid points:
1. hrleSparseIterator: Stops once at every defined and undefined run.
2. hrleDenseIterator: Stops once at every grid point.
3. hrleSparseMultiIterator: Iterates over several domains at the same time and stops at every run. If there are two defined runs at the same coordinate in multiple domains, it only stops once.
The last of the above iterators is useful for combining several hrleDomain structures into one according to a specific metric.
In order to find numerical derivatives, often required for topography simulations in a LS framework, fast neighbour access is required. However, finding the neighbouring grid points of any point requires random access and is therefore inefficient. A more efficient approach is to use several sequential iterators, each separated by one grid point and advancing them collectively. Therefore, the entire structure can be traversed in optimal time. ViennaHRLE provides fast neighbour access through specialised iterators. The neighbours to which these iterators provide access is given in terms of the spatial indices \(i, j, k \in [x, y, z], i \neq j \neq k\) for a point at index \((x, y, z)\) in the following. The order \(n\) of each iterator describes the distance of the farthest neighbour which can be accessed.
1. hrleSparseStarIterator: Access to all neighbours within \((i \pm n, j, k)\)
2. hrleCartesianPlaneIterator: Access to all neighbours within \((i \pm n, j \pm n, k)\)
3. hrleSparseBoxIterator: Access to all neighbours within \((i \pm n, j \pm n, k \pm n)\)
4. hrleSparseCellIterator: Access to all neighbours within \((i + 1, j + 1, k + 1)\)
Since access to each neighbour requires its own additional iterator, it is important to choose the most efficient one for the specific application. If only neighbours in grid directions are required, the hrleSparseStarIterator is sufficient, while grid diagonal neighbour access requires the hrleSparseBoxIterator. The hrleSparseCellIterator is specialised for efficient access to a unit cell of the rectilinear grid, which is important for the conversion of LSs to explicit surfaces using the marching cubes algorithm [170].