A second STI task is to encapsulate the typical deficiencies of a distributed computing environment. Because VISTA/SFC is designed to distribute simulation tasks across a number of hosts connected by a network, with different tools executing on different machines and accessing data files on local and remote disks, the stable operation of the network layer is of the greatest concern. As shown in Figure 4.19, on UNIX systems remote executables are started using the remote procedure call (RPC) protocol, while data files are accessed across the network using the network file system (NFS). Because simulation data files are large (typically several hundred kilobytes), reading data from or writing data to a remote host can take a noticeable and variable amount of time, depending on the bandwidth and the load of the network connection.
Figure 4.19: Remote tool call and local file: Propagation delay differences between remote job execution and network file data transfer cause loss of synchronization between job termination and file update.
If two tools are run one after the other, with the second reading the output of the first, situations can occur in which the RPC for the first tool has already terminated but its output file has not yet been written completely via NFS, or file locking is not synchronized. Starting the second tool immediately after the first then results in a read error because the file data are incomplete. The STI schedules the submission of system jobs appropriately to avoid such network-induced malfunctions.
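The text does not describe how the STI detects that an output file is complete, so the following is only a minimal sketch of one conceivable guard, not the VISTA/SFC implementation. It assumes that a file whose size has stopped changing over several polls has been fully written. The function name `wait_for_stable_file` and its parameters are hypothetical and chosen purely for illustration.

```python
import os
import time


def wait_for_stable_file(path, settle_checks=3, poll_interval=0.5, timeout=60.0):
    """Heuristic completeness check (illustrative assumption, not the STI's method).

    Returns True once `path` exists and its size has been identical in
    `settle_checks` consecutive comparisons between polls; returns False
    if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1   # -1 means "file not seen yet"
    stable = 0       # number of consecutive polls with an unchanged size

    while time.monotonic() < deadline:
        try:
            size = os.stat(path).st_size
        except FileNotFoundError:
            size = -1  # file not visible yet on this host

        if size >= 0 and size == last_size:
            stable += 1
            if stable >= settle_checks:
                return True
        else:
            stable = 0  # size changed or file missing: start counting again

        last_size = size
        time.sleep(poll_interval)

    return False
```

A scheduler could call such a guard between the termination of the first tool and the submission of the second, and report an error instead of starting the second tool if the guard returns False. Size stability is only a heuristic: NFS clients may cache file attributes, so a poll interval that is too short can report a stale size, and a more robust scheme would have the producing tool write an explicit end-of-data marker or rename the file once it is complete.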