Dissertation

D I S S E R T A T I O N

Parallel Velocity Extension and Load-Balanced Re-Distancing
on Hierarchical Grids for
High Performance Process TCAD

ausgeführt zum Zwecke der Erlangung des akademischen Grades
eines Doktors der technischen Wissenschaften

unter der Betreuung von
Assistant Prof. Privatdoz. Dipl.-Ing. Dr.techn. Josef Weinbub, BSc
O.Univ.Prof. Dipl.-Ing. Dr.techn. Dr.h.c. Siegfried Selberherr

eingereicht an der Technischen Universität Wien
Fakultät für Elektrotechnik und Informationstechnik
von

Dipl.-Ing. Michael Quell, BSc

Matrikelnummer 01226394

Wien, im November 2021

Abstract

The continuous developments and miniaturization of manufacturing processes for semiconductor devices require physical simulations to reduce the number of costly conventional experiments involved in the design and production processes. Most prominent are physical simulations which model individual physical processing steps like etching or deposition. These topography-changing simulations are commonly based on the level-set method, because of its capability to efficiently represent complex three-dimensional device structures. High accuracy demands of those simulations require the application of complex and, therefore, computationally expensive physical models.

In this work, three parallel algorithms belonging to two computational steps of the level-set method are introduced. The algorithms significantly reduce overall run-time and improve accuracy. The algorithms are tailored to simulations using adaptive discretizations with hierarchical grids to efficiently handle sharp features, e.g., corners and edges. The focus of the presented research is to efficiently utilize shared-memory parallel computing systems to stem the increasingly demanding level-set based physical simulations.

The first algorithm belongs to the computational step Velocity Extension which extends the velocity describing the deformation of an arbitrary structure from the structure’s surface to the entire computational domain. The developed velocity extension algorithm is based on the fast marching method. The fast marching method allows to extend the velocity in a single pass through the computational domain by means of a strict ordering of the computations. The key advantage of the developed velocity extension algorithm is a relaxed ordering of the computations. This not only reduces the computational complexity but also enables parallelism. Different stages of the developed algorithm are evaluated by comparing run-times measured on representative computing systems. A run-time reduction by a factor of 18.5 using 10 threads has been achieved.

The second algorithm belongs to the computational step Re-Distancing which creates or restores a numerically stable representation of the structure by computing the signed-distance field relative to the surface of the structure. This algorithm is also based on the fast marching method, but because of self-referred data dependencies a different parallelization strategy was developed. A domain decomposition is introduced to increase the granularity of the parallel tasks. This enables a better implicit load-balancing compared to the native decomposition provided by the given hierarchical grid. A speedup of more than 17.4 has been achieved when using 24 threads.

Finally, a bottom-up correction algorithm was developed, also belonging to the computational step Re-Distancing, which increases the accuracy of the signed-distance field computed by the second algorithm. This correction algorithm utilizes the signed-distance field on higher resolved regions of hierarchical grids to also reduce the error in lower resolved regions. The developed algorithm adds negligible computational overhead to the second algorithm, yet reduces the error around corners by a factor of up to 2.7.

Combining all developed algorithms, it is shown that the run-time of a representative physical simulation is more than halved whilst the accuracy is further improved.

Kurzfassung

Die kontinuierlichen Entwicklungen und die Miniaturisierung der Herstellungsprozesse für Halbleiterbauelemente erfordern physikalische Simulationen, um die Zahl der kostspieligen konventionellen Experimente in den Entwurfs- und Produktionsprozessen zu verringern. Am bekanntesten sind physikalische Simulationen, die einzelne physikalische Prozessschritte wie Ätzen oder Abscheiden modellieren. Diese topografieverändernden Simulationen basieren gewöhnlich auf der Level-Set-Methode, da sie komplexe dreidimensionale Bauelementstrukturen effizient darstellen kann. Die hohen Genauigkeitsanforderungen dieser Simulationen erfordern die Anwendung komplexer und daher rechenintensiver physikalischer Modelle.

In dieser Arbeit werden drei parallele Algorithmen eingeführt, die zu zwei Rechenschritten der Level-Set-Methode gehören. Die Algorithmen verringern die Gesamtlaufzeit erheblich und verbessern die Genauigkeit. Die Algorithmen sind an Simulationen angepasst, die adaptive Diskretisierungen mit hierarchischen Gittern verwenden, um spitze Geometrien, z.B. Ecken und Kanten, effizient zu behandeln. Der Schwerpunkt der hier vorgestellten Forschung ist die effiziente Nutzung paralleler Rechensysteme mit gemeinsamem Speicher, um die immer anspruchsvolleren Level-Set-basierten physikalischen Simulationen zu bewältigen.

Der erste Algorithmus gehört zum Rechenschritt Velocity Extension, der die Geschwindigkeit, die die Verformung einer beliebigen Struktur beschreibt, von der Oberfläche der Struktur auf das gesamte Simulationsgebiet ausdehnt. Der entwickelte Algorithmus zur Geschwindigkeitserweiterung basiert auf der Fast-Marching-Methode. Die Fast-Marching-Methode ermöglicht es, die Geschwindigkeit in einem einzigen Durchgang durch das Simulationsgebiet zu berechnen, indem die Berechnungen in einer strengen Reihenfolge durchgeführt werden. Der Hauptvorteil des entwickelten Algorithmus ist eine relaxierte Reihenfolge der Berechnungen. Diese reduziert nicht nur die Komplexität der Berechnungen, sondern ermöglicht auch Parallelität. Verschiedenen Entwicklungsstufen des Algorithmus werden durch den Vergleich der auf repräsentativen Rechensystemen gemessenen Laufzeiten bewertet. Eine Laufzeitverkürzung um den Faktor 18.5 wurde bei der Verwendung von 10 Threads erreicht.

Der zweite Algorithmus gehört zum Rechenschritt Re-Distancing, der eine numerisch stabile Repräsentation der Struktur durch Berechnung des vorzeichenbehafteten Abstandsfeldes relativ zur Oberfläche der Struktur erzeugt oder wiederherstellt. Dieser Algorithmus basiert ebenfalls auf der Fast-Marching-Methode, aber wegen der selbstbezogenen Datenabhängigkeiten wurde eine andere Parallelisierungsstrategie entwickelt. Es wird eine Gebietszerlegung eingeführt, um die Granularität der parallelen Aufgaben zu erhöhen. Dies ermöglicht einen besseren impliziten Lastausgleich im Vergleich zur nativen Gebietszerlegung, die durch das gegebene hierarchische Gitter bereitgestellt wird. Eine Geschwindigkeitssteigerung von mehr als 17.4 wurde bei der Verwendung von 24 Threads erreicht.

Schließlich wurde ein Bottom-up-Korrekturalgorithmus entwickelt, der ebenfalls zum Rechenschritt Re-Distancing gehört und die Genauigkeit des vom zweiten Algorithmus berechneten vorzeichenbehafteten Abstandsfeldes erhöht. Dieser Korrekturalgorithmus benutzt das vorzeichenbehaftete Abstandsfeld in höher aufgelösten Gebieten des hierarchischen Gitters, um auch den Fehler in niedriger aufgelösten Gebieten zu reduzieren. Der entwickelte Algorithmus fügt dem zweiten Algorithmus einen vernachlässigbaren Rechenaufwand hinzu, reduziert aber den Fehler bei Ecken um einen Faktor von bis zu 2.7.

Durch die Kombination aller entwickelten Algorithmen wird gezeigt, dass sich die Gesamtlaufzeit einer repräsentativen physikalischen Simulation mehr als halbiert, während die Genauigkeit weiter verbessert wird.

Acknowledgement

First, I want to thank the whole Institute for Microelectronics and all its members, for their welcoming support, when I started my journey in 2018.

Especially, I want to thank my supervisor Josef Weinbub, who is also the head of the Christian Doppler Laboratory for High Performance Technology Computer-Aided Design, for his continuous support and encouragement for my scientific path. He not only provided necessary guidance, but also excelled with personal wisdom.

I also want to thank my secondary supervisor Siegfried Selberherr, for providing excellent feedback content wise and grammatical. His eyes neither missed a single punctuation mark, nor a wrongly sized white space.

Additional thanks is directed to Andreas Hössinger from Silvaco Europe Ltd. who fueled the research with interesting questions and practical issues stemming from real world problems. I am grateful for his insights providing feedback and the valuable discussions.

From my colleagues, I would like to thank Alexander Toifl, for answering and explaining questions related to semiconductor devices to great detail, which lead to fruitful scientific collaborations. Also Paul Manstetten deserves my gratitude, because he provided me with high quality discussions and precise feedback especially related to high performance computations and presenting scientific results. I want to thank Luiz Felipe Aguinsky and Christoph Lenz, because our shared office lead to many insightful discussions on the chalkboard, leading to full blown research ideas and papers.

My thank is also directed towards the remaining present and former members of the Christian Doppler Laboratory for High Performance TCAD and the Institute for Microelectronics, Vito Simonka, Xaver Klemenschits, Georgios Diamantopoulos, Lukas Gnam, Alexander Scharinger, and Francio Rodrigues. The discussions with them during lunch widened my view on almost all topics concerning mankind.

Finally, I would like to thank my parents and siblings for their unconditional support during my journey, their nice words and encouragement on my goals. Only their altruistic deeds enabled me to pursue my career as a scientist.

Contents