A computational framework for regional seismic simulation of buildings with multiple fidelity models
Zhen Xu a, Xinzheng Lu b,*, Kincho H. Law c
a School of Civil and Environmental Engineering, University of Science and Technology Beijing, Beijing, 100083, P.R. China;
b Key Laboratory of Civil Engineering Safety and Durability of China Education Ministry, Department of Civil Engineering, Tsinghua University, Beijing 100084, P.R. China.
c Department of Civil and Environmental Engineering, Stanford University, Stanford, CA, 94305-4020, USA.
Abstract: Regional seismic damage simulation of buildings can potentially reveal possible consequences that are important for disaster mitigation and decision making. However, such a simulation involving all the buildings in a region can be computationally intensive. In this study, a computational framework using a network of distributed computers, each equipped with graphics processing units (GPUs), is proposed. The computational framework includes two types of structural fidelity models. For high-fidelity models, which are employed to analyze complex and/or important buildings, an efficient GPU-based linear equation solver is developed and incorporated in OpenSees, an open source computational platform commonly used for structural and earthquake engineering simulations of buildings and civil infrastructures. To handle the large number of computationally intensive high-fidelity structural models in a region, a dynamic load balancing strategy is designed to distribute the computational tasks among the available resources. For moderate-fidelity models, which are used to model regular building structures, a GPU-based tool is developed to accelerate the simulation. A static load balancing strategy is designed to distribute the computational tasks among the GPUs. To demonstrate the potential for a cost-effective and flexible computing paradigm for regional seismic simulation, the computational framework is applied to perform seismic simulation of a virtual city with 50 high-fidelity structural models and 100,000 moderate-fidelity building models.
Key words: Seismic damage simulation; multi-fidelity models; distributed computing; GPU; city-wide scale.
1. Introduction
Cities are generally densely populated, with many buildings and civil infrastructures. A strong earthquake occurring in a city can have devastating consequences, with many casualties and significant financial losses. For example, the 2011 Christchurch earthquake caused 185 deaths and a loss of US$11–15 billion, and significantly impacted the economy of New Zealand. Regional seismic damage simulation can potentially provide valuable information that can facilitate decision making, enhance planning for disaster mitigation, and reduce human and economic losses.
Various models with different levels of fidelity have been proposed for regional seismic damage simulation of buildings [2–11]. These models can be divided into three levels: high-, moderate-, and low-fidelity.
• The structural models that involve detailed modeling of every beam, column and wall in a building are referred to as “high-fidelity models”, such as the finite element (FE) tall building models using fiber beam elements and multi-layer shell elements proposed by Lu et al. For buildings with important functions (e.g., hospitals and bridges) or with special structural arrangements (e.g., large-span stadiums and stations, super-tall and irregular buildings), high-fidelity models are necessary to provide accurate and detailed seismic damage simulation that is essential for further seismic loss analyses (e.g., repair costs and downtime) [13–15].
• The structural models using a single-degree-of-freedom (SDOF) model or other simple “equivalent” models are referred to as “low-fidelity models”. Examples of such models include the advanced engineering building modules (AEBM) proposed by the Federal Emergency Management Agency (FEMA) of the United States [16–17], and the damage probability matrix method. As previously discussed by Lu et al., low-fidelity models, although computationally efficient, are not able to simulate many important damage features such as localized damage at the story level.
• Moderate-fidelity structural models are simplified models that are sufficient to determine important structural damage, such as the detection of softened stories. One example is the multi-degree-of-freedom (MDOF) concentrated mass shear (MCS) model, which can fully represent the nonlinear characteristics of soft-story failure at the story level of a building. The MCS model requires the specification of only five general characteristics of a building (i.e., structural type, construction year, area, building height, and number of stories), which greatly simplifies the modeling of a large number of regular buildings in a region.
In this study, we propose using high-fidelity models for the simulation of selected important and specialty buildings to evaluate their seismic performance in detail. Moderate-fidelity models, on the other hand, are employed to assess the damage levels of each of the thousands of regular buildings in a city.
Clearly, seismic simulation involving hundreds and thousands of buildings in a city is a computationally challenging problem. Yamashita et al. employed supercomputer capability to perform regional seismic simulation with multi-fidelity models. Instead of supercomputers, which are expensive to acquire and costly to maintain, distributed computing involving a cluster of networked computers, each equipped with graphics processing units (GPUs), can be a cost-effective alternative that is affordable even for engineering offices. A GPU allows parallel computations to accelerate algorithmic calculations on a standalone computer. Furthermore, distributed computing can be employed to handle a large number of buildings in a city by distributing the computational tasks to the networked computers with GPUs.
Because of its powerful parallel computing capability and low cost, GPU technology has been widely employed in many science and engineering fields, including biology, electromagnetism, geography, and others [19–23]. For seismic damage simulation of high-fidelity models, GPUs can be used to accelerate many of the matrix calculations in a finite element program, such as solutions of systems of equations and eigenvalue problems [24–25]. For the seismic damage simulation of a large number of moderate-fidelity models, Lu et al. used a standalone computer equipped with a GPU to perform time history analysis (THA) using the MCS models and obtained a speedup of 39 times over the same simulation utilizing a non-GPU computer of the same price.
Distributed computing is a flexible and easily reconfigurable platform that takes full advantage of networked computers by dynamically adjusting the computational resources according to the computational loads. With adequate computational resources, distributed computing can be highly efficient in handling many large-scale scientific problems [27–32]. For instance, Wijerathne et al. used a cluster of workstations to simulate the seismic damage of buildings in Tokyo. Many distributed computing platforms, such as BOINC, Hadoop and HTCondor, are now available. This study employs HTCondor for implementation because HTCondor is open-source and supports GPUs for distributed computing [37–38].
This study explores the use of both GPU computing and distributed computing for regional seismic damage simulation. A computational framework using a network of distributed computers, each equipped with a GPU, is proposed. The investigation includes developing distributed load balancing strategies as well as improving computational throughputs. For high-fidelity models, OpenSees (Open System for Earthquake Engineering Simulation) is employed . Specifically, a linear equation solver that takes full advantage of the parallel computing capability of GPUs is incorporated in OpenSees to expedite the solution process. Furthermore, a greedy, dynamic load balancing strategy is designed for distributing the high-fidelity building models among the available resources. For moderate-fidelity models, a GPU-based implementation of the MCS models is employed in this study. A static load balancing strategy to handle the large number of regular building structures is proposed. To demonstrate the potential of a cost-effective and flexible computing paradigm for simulating seismic damage to buildings in a regional scale, the computational framework is applied to seismic simulation of a virtual city with 50 high-fidelity structural models and 100,000 moderate-fidelity building models.
2. Overview of the computational framework
Figure 1 shows the overall computational framework for regional seismic simulation with multiple fidelity models. Simulations conducted in this study include two types of fidelity models: (1) high-fidelity models employing refined FE models with multi-layer shell elements and fiber beam elements for important or special buildings; and (2) moderate-fidelity models using the MCS model for regular buildings. The simulation of one building model is considered as one computational task. The host computer manages the assignment of the building models to a cluster of networked slave computers, each of which is designated for simulations of either high- or moderate-fidelity models.
In the host computer, load balancing strategies are implemented to assign the computational tasks according to the capabilities of the slave computers. For high-fidelity models, OpenSees  is selected as the analysis platform. Since the complexity of each high-fidelity model varies significantly, the computational loads are difficult to estimate. Hence, a greedy load balancing strategy that assigns the computational task dynamically according to the actual load status of a slave computer is used. For moderate-fidelity models, the computational load can be easily estimated. Hence, a static load balancing strategy, for which all the computational tasks are assigned to the slave computers prior to the simulation, is proposed. The above two load balancing strategies are both implemented by HTCondor. However, it should be noted that these strategies are not limited to HTCondor and their implementation on other distributed computing platforms is also feasible.
The simulations take full advantage of the GPUs installed in the slave computers. For the high-fidelity simulation with OpenSees, a GPU-based linear equation solver is incorporated to accelerate the nonlinear THAs. For the moderate-fidelity simulation, the computational tasks of the building models are performed as threads executed in parallel on the GPUs. In this study, CUDA, a widely used parallel computing platform, is employed for the software development on the GPUs.
3. Software implementation
In this study, the regional seismic simulation involves two different fidelity models. Each uses a different computational tool and requires a different load balancing strategy. The following describes in detail the key software implementation efforts for the simulation models.
3.1 Computer simulation with high-fidelity building models
For high-fidelity models, simulations of the building structures are performed using OpenSees. In general, high-fidelity models involve detailed modeling of a structure and result in large (stiffness and mass) matrices. Accurate simulations of such models using implicit nonlinear THAs spend a significant amount of computational time on solving the linear systems of equations. Figure 2 shows the implementation of the linear equation solver on a slave computer equipped with a GPU. Denote the linear system as Ax = b, where A is the system matrix and b and x are, respectively, the right-hand-side vector and the solution vector. Since reading data from the main memory of a computer is far slower than reading data within the GPU, the first step is to copy the assembled system matrix A and vector b to the GPU so that the entire solution process, including preconditioning, is performed within the GPU. The solution vector x is then returned to the computer.
For ease of parallelization, iterative schemes such as conjugate gradient (CG), Bi-CG and GMRES are commonly adopted for GPU computing. In this study, CUSP, an open source sparse linear algebra and graph computation library built on CUDA, is employed. Since the CUSP library provides GPU-based implementations of CG, Bi-CG and GMRES, a solver in the form of an independent dynamic link library (DLL) is developed based on CUSP for solving the equations. In this way, the solver itself can be easily modified or updated. In this study, diagonal preconditioning is employed with the iterative solvers. By replacing the solver in the function setLinearSOE() of the class LinearSOESolver in OpenSees with the developed solver, the linear equations of the high-fidelity models are solved in parallel on the GPU.
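The idea of a diagonally preconditioned CG iteration can be illustrated with a minimal sketch. The code below is a plain-Python, CPU-only stand-in and is not the CUSP-based DLL described above; the function name `jacobi_pcg`, the dense list-of-lists matrix storage, and the tolerance are assumptions chosen only for illustration. It assumes a symmetric positive definite matrix with a nonzero diagonal.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product (illustrative; a real FE solver uses sparse storage)."""
    return [dot(row, v) for row in A]


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by conjugate gradients with diagonal (Jacobi) preconditioning.

    Hypothetical helper for illustration only. Returns (x, iterations_used).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # preconditioner M^-1 = diag(A)^-1
    x = [0.0] * n
    r = list(b)                                    # residual r = b - A x, with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:       # relative residual test
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive definite example system (made-up values).
A_demo = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b_demo = [1.0, 2.0, 3.0]
x_demo, iters_demo = jacobi_pcg(A_demo, b_demo)
```

Every operation in the loop is a vector update, an inner product, or a matrix-vector product, which is why schemes of this kind map well onto GPU threads.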
To distribute the large number of high-fidelity building models among the slave computers, a greedy dynamic load balancing strategy is proposed. We assume that the communication time between the host computer and the slave computer is negligible compared with the lengthy computational time required for the simulation of high-fidelity models. In this case, the load status of a slave computer is evaluated by the load rate of the CPU. The strategy is to select the unassigned task that is computationally most demanding and assign it to the least loaded slave computer. To do so, two heaps are built for the purpose of task assignment: one for tracking the computational tasks and the other for the load status of the slave computers. Before every assignment, the load status heap is updated and checked to determine whether any of the slave computers is available to accept a new task. If so, the task in the task heap that requires the most computation is assigned to the slave computer with the lowest load; this is repeated until the task heap is empty. As shown in Figure 3, the dynamic load balancing strategy includes four basic steps:
(1) Preparation: Prior to assigning the tasks to the slave computers, a task heap consisting of all the high-fidelity building models to be simulated is constructed. The tasks are sorted by their estimated computational loads according to the number of degrees of freedom (DOFs) of the models, since the number of DOFs is in general a good estimate of the computational requirement for the analysis of a building model. The tasks in the heap are sorted in descending order, with the top element of the heap being the most computationally demanding task. Additionally, a load status heap is built to record the load status (i.e., the load rate of the CPU) of all the slave computers available for the simulation of the high-fidelity building models. Note that a computer with no task assignment would normally still have some background system operations (e.g., from the operating system). The load status heap is initialized with the background load of each slave computer and is sorted in ascending order, with the least loaded slave computer as the top element of the heap.
(2) Updating: For the simulation of high-fidelity models using OpenSees, each computational task would be uploaded to a GPU. Since each slave computer generally has only one GPU, each slave computer can handle only one task at a time. When a slave computer is assigned to execute a task, the computer becomes unavailable for accepting a new task. The load status heap is continually updated for the task assignment when the load status of any slave computer changes.
(3) Assignment: Among the unassigned tasks in the task heap, the heaviest task (i.e., the task with the most DOFs) is assigned to the slave computer with the lowest load status (i.e., the lowest CPU load rate). Once assigned, the computational task is removed from the task heap.
(4) Completion: The dynamic task assignment strategy continues until all the tasks are assigned and all the simulations are completed.
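The four steps above can be sketched as a small simulation with two heaps. This is a minimal sketch under stated assumptions, not the authors' HTCondor implementation: the function name `greedy_assign`, the cost model (run time = DOFs / speed) and all input values are hypothetical, and the time at which a slave becomes free stands in for the load status that would be polled from the real machines.

```python
import heapq


def greedy_assign(task_dofs, slaves):
    """Greedy dynamic assignment: heaviest remaining task to the first free, least loaded slave.

    task_dofs: {task_name: number_of_dofs}
    slaves:    {slave_name: (background_cpu_load, relative_speed)}  # hypothetical inputs
    Returns (schedule, makespan); schedule holds (task, slave, start, finish) tuples.
    """
    # Step 1 (preparation): max-heap of tasks keyed by DOFs (negated for heapq's min-heap).
    task_heap = [(-dofs, name) for name, dofs in task_dofs.items()]
    heapq.heapify(task_heap)
    # Load status heap: ordered by the time the slave is free, then by background load.
    status_heap = [(0.0, load, name) for name, (load, _speed) in slaves.items()]
    heapq.heapify(status_heap)

    schedule, makespan = [], 0.0
    while task_heap:                                    # Step 4: stop when no task is left
        free_at, load, slave = heapq.heappop(status_heap)   # Step 2: next available slave
        neg_dofs, task = heapq.heappop(task_heap)           # Step 3: heaviest unassigned task
        finish = free_at + (-neg_dofs) / slaves[slave][1]   # assumed cost model
        schedule.append((task, slave, free_at, finish))
        makespan = max(makespan, finish)
        # One task per slave at a time: the slave is unavailable until `finish`.
        heapq.heappush(status_heap, (finish, load, slave))
    return schedule, makespan


# Made-up example: four models and two slaves of different speed.
demo_tasks = {"A": 100, "B": 60, "C": 40, "D": 20}
demo_slaves = {"s1": (0.05, 2.0), "s2": (0.10, 1.0)}
demo_schedule, demo_makespan = greedy_assign(demo_tasks, demo_slaves)
```

Both heaps give logarithmic-time access to the heaviest task and the least loaded slave, so the assignment overhead stays negligible next to the simulations themselves.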
The dynamic load balancing strategy for task assignment of the high-fidelity building models is implemented using the commands defined in HTCondor. As shown in Table 1, the dynamic strategy requires information about the load status and task queue of the slave computers and the ability to assign specific tasks to the slave computers. The load status and task queue of each slave computer can be obtained with the HTCondor commands condor_status and condor_q, respectively. Note that the task queue is used to indicate whether a slave computer is available. Task assignment is issued using the Input and the Requirements commands in HTCondor. Using the Input command, different structural models are defined as inputs of the corresponding tasks to establish the relationships between the structural models and the tasks. Using the Requirements command, the name of the computer corresponding to an assigned task is defined, so that the tasks can be submitted to the specific slave computer. In addition, the file transfer between the host computer and the slave computers is performed by the commands transfer_input_files and transfer_output_files. Specifically, the transfer_input_files command is used to transfer the model files to the slave computers, while the transfer_output_files command is used to transfer the simulation results back to the host computer.
Table 1. Implementation of the proposed strategy using HTCondor
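To make the use of these commands concrete, the following sketch renders an HTCondor submit description that pins one task to one named slave computer. It is an illustration only and not the authors' submit file: the helper name `render_submit`, the executable name, the host name and all file names are hypothetical placeholders.

```python
def render_submit(executable, model_file, slave_name, input_files, output_files):
    """Build the text of an HTCondor submit description for a single task.

    All argument values are supplied by the caller; the ones used in the demo
    below are hypothetical and only illustrate the layout.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # Input: ties the structural model to this task (fed to the job as standard input).
        f"input = {model_file}",
        # Requirements: restricts the task to the chosen slave computer.
        f'requirements = (Machine == "{slave_name}")',
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        # Files sent to the slave before the run and returned to the host afterwards.
        f"transfer_input_files = {', '.join(input_files)}",
        f"transfer_output_files = {', '.join(output_files)}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


demo_submit = render_submit(
    executable="OpenSees.exe",             # hypothetical executable name
    model_file="building_01.tcl",          # hypothetical model script
    slave_name="slave1.example.org",       # hypothetical host name
    input_files=["building_01.tcl", "ground_motion.txt"],
    output_files=["roof_displacement.out"],
)
```

The resulting text would typically be written to a file and passed to condor_submit on the host computer.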
3.2 Computer simulation with moderate-fidelity MCS models
Figure 4 shows the parallel computational framework for the simulation of the moderate-fidelity MCS models on a GPU. Specifically, for the data flow of the simulation, the input data include the parameters of the MCS models (e.g., the structural height, the number of stories, the mass of each story, the inter-story hysteretic relations, the damping ratio, etc.) and the ground motion records, while the output data include the damage states and the seismic responses of the structure.
To initiate the simulation (see Figure 4), the input data are first copied from the computer to the GPU. Each MCS model is assigned to a GPU thread. Nonlinear THAs of all the MCS models are executed in parallel by the GPU threads. The output data are returned from the GPU to the computer for post-processing. As mentioned before, CUDA is used as the GPU computing platform. In CUDA, a program is executed by blocks of threads. Each block can have multiple threads, which in this case correspond to the THAs of the MCS models. The number of threads per block is chosen as a multiple of 32 and cannot exceed 1024. Typically, the number of threads per block is set to 256 [40, 43]. The total number of threads (i.e., the product of the number of blocks and the number of threads per block) must be larger than or equal to the number of tasks, to make sure that each task has a GPU thread for performing a THA. For the parallel THA of the MCS models, the computing time required by each thread is different, because different threads correspond to MCS models of buildings with different numbers of stories and therefore have different workloads. Hence, to avoid conflicts when executing the blocks on a GPU, a synchronization function (i.e., __syncthreads() in CUDA) is employed for managing the threads in a block, so that the next block will not start until all the threads in the previous block are finished. In addition, an explicit (central difference) time integration scheme is adopted for the THA of the MCS models. The explicit method is particularly suitable for parallel computing and, for the MCS models, poses no convergence issues.
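The relation between the number of tasks and the launch configuration described above amounts to a ceiling division. A minimal sketch follows; the helper name `launch_config` is hypothetical, and the limits checked are the ones quoted in the text.

```python
def launch_config(num_models, threads_per_block=256):
    """Return (number_of_blocks, total_threads) so that every model gets one thread."""
    if threads_per_block % 32 != 0 or not 0 < threads_per_block <= 1024:
        raise ValueError("threads per block should be a multiple of 32, at most 1024")
    # Ceiling division: the last block may be only partly used.
    blocks = (num_models + threads_per_block - 1) // threads_per_block
    return blocks, blocks * threads_per_block


demo_blocks, demo_threads = launch_config(100_000)   # -> (391, 100096)
```

For 100,000 models and 256 threads per block this gives 391 blocks and 100,096 threads, so 96 threads in the last block have no model to process and would simply return without doing any work.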
Since the input data of an MCS model are very small, with approximately 0.3 Kbytes of data per story, and the typical local network speed is on the order of 100 Mbps, the time for data communication between the host and the slave computers is negligible compared with the simulation time of an MCS model. For the case study to be discussed in Section 4, transferring 100,000 MCS models (with a total size of approximately 12 Mbytes) between the host computer and the slave computers takes only 0.96 s, while the simulation of the models on a powerful GPU takes 731 s. Hence, the basic principle is to assign the computational tasks to as many GPUs as possible to maximize the throughput of parallel execution. In addition, more tasks should be assigned to the slave computer with a more powerful GPU. A simple strategy for distributing the tasks to the GPUs is determined a priori, before the parallel execution of the MCS models.
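The quoted transfer time can be reproduced with a one-line estimate. The sketch below assumes 1 Mbyte = 8 Mbit and ignores protocol overhead and latency; the function name `transfer_time_s` is a hypothetical helper.

```python
def transfer_time_s(size_mbytes, link_mbps):
    """Idealized transfer time in seconds: payload in megabits divided by link speed."""
    return size_mbytes * 8.0 / link_mbps


demo_transfer = transfer_time_s(12.0, 100.0)    # 12 Mbytes over a 100 Mbps link
demo_share = demo_transfer / 731.0              # relative to the 731 s GPU simulation
```

Under these assumptions the transfer takes 0.96 s, which is roughly 0.13% of the 731 s simulation time and supports treating communication as negligible in the static strategy.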
For the MCS model, the response of a building is computed story-by-story and the computations for each story are identical [7, 18]. Therefore, the number of stories can be used to determine the computational workload of an MCS model. The computational capability of a GPU is measured by the FLOPS (floating-point operations per second) of the GPU. Denoting $S_i$ as the number of stories of the $i$th building and $C_j$ as the computational capability of the GPU in the $j$th slave computer, a ratio $k$ relating the total number of stories to be simulated for all the $n$ buildings to the total computational capability of the simulation platform with $m$ slave computers can be calculated as:
$$k = \frac{\sum_{i=1}^{n} S_i}{\sum_{j=1}^{m} C_j}$$
Therefore, the ideal load $L_j^{Slave}$ (i.e., the number of stories) for the $j$th slave computer can be estimated as:
$$L_j^{Slave} = k \, C_j$$
Subsequently, the MCS models are assigned to the slave computers one by one according to the ideal load $L_j^{Slave}$. When the total number of stories of the models assigned to the $j$th slave computer exceeds $L_j^{Slave}$, the remaining models are assigned to the $(j+1)$th slave computer, and so on.
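The static strategy described above can be sketched in a few lines: the ratio of total stories to total GPU capability gives each slave an ideal load in stories, and the models are then handed out in order. The function name `static_assign` and the example numbers are hypothetical; moving to the next slave as soon as the ideal load is reached (not only exceeded) is an assumption of this sketch.

```python
def static_assign(stories, capabilities):
    """Assign models to slaves in proportion to GPU capability.

    stories:      list with the number of stories of each building model
    capabilities: list with the computational capability (e.g., FLOPS) of each slave GPU
    Returns (k, ideal_loads, assignment), where assignment[j] lists model indices for slave j.
    """
    k = sum(stories) / sum(capabilities)          # stories per unit of capability
    ideal_loads = [k * c for c in capabilities]   # ideal number of stories per slave
    assignment = [[] for _ in capabilities]
    j, load = 0, 0
    for i, s in enumerate(stories):
        # Move on once the current slave has reached its ideal load;
        # the last slave takes whatever remains.
        if load >= ideal_loads[j] and j < len(capabilities) - 1:
            j += 1
            load = 0
        assignment[j].append(i)
        load += s
    return k, ideal_loads, assignment


# Made-up example: six buildings, two GPUs with a 3:1 capability ratio.
demo_k, demo_ideal, demo_assignment = static_assign([2, 3, 5, 1, 4, 5], [3.0, 1.0])
```

In this example the 20 stories are split 15 to 5, matching the 3:1 capability ratio, so both slaves would be expected to finish at about the same time.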
The simple static assignment strategy for the MCS models is implemented using HTCondor. HTCondor only needs to submit the assigned tasks to the corresponding slave computers. This can be easily achieved by specifying (1) the MCS models to be submitted using the Input command, and (2) the slave computer to be used for the THA using the Requirements command. As in the dynamic assignment strategy discussed earlier, the two commands transfer_input_files and transfer_output_files are used to transfer files between the host computer and the slave computers.
4. Case study
To assess the performance of the computational framework for regional seismic damage simulation, a virtual city consisting of building types similar to those in the cities of Xi'an and Taiyuan, China, is constructed as a case study. Information about the buildings, including structural types, construction years, building heights and numbers of stories, is obtained from the survey data of the local government departments (i.e., the Bureau of Housing and Urban Development) of the two cities. Altogether, the virtual city consists of 100,000 regular buildings and 50 important or special buildings, such as tall residential buildings and large office buildings. The important buildings are modeled using the high-fidelity models and simulated using OpenSees based on the real design data obtained from the actual buildings. The multi-story regular buildings are modeled using the MCS models. The El-Centro ground motion record with a scaled peak ground acceleration (PGA) of 400 cm/s^2 is selected for the seismic damage simulation of the virtual city. Although the ground motion input for each building in the city may vary and should take into account the site characteristics and the distance from the epicenter, this study focuses only on the computational efficiency of the simulation framework and therefore assumes the same ground motion for all the buildings.
The computational system employed in this study includes one host computer and seven slave computers, which are connected over a local network with the communication speed of 100 Mbps. The hardware configurations for the computers, each equipped with a GPU, are shown in Table 2. Among the seven slave computers, slave computers 1 and 2 are the most powerful in terms of both the CPU and the GPU. Specifically, the CPU and GPU of the slave computer 1 are, respectively, 4.5 times and 5.9 times faster than the weakest slave computer 7 in terms of FLOPS performance. The average price for a slave computer including a GPU is approximately US$1700 when purchased in 2014. It should be noted that the system platform is flexible and allows new hardware to be added easily.
Table 2 Hardware configuration of the computational framework
4.1 Simulation of high-fidelity building models
We first evaluate the effectiveness of the (diagonally) preconditioned CG solver provided by the CUSP library as implemented with OpenSees. As shown in Figure 5, a 43-story, 141.8-meter-tall building, consisting of 23,024 fiber beam elements and 16,032 multi-layer shell elements, with a total of 23,945 nodes and 143,549 degrees of freedom, is employed for the simulation in OpenSees. There are seven linear equation solvers currently available in OpenSees, with SparseSYM having the highest efficiency as a direct solver for this case. As shown in Figure 6, the displacement time histories at the top floor obtained by the iterative and direct solvers are in good agreement. On slave computer 1, the simulation with the GPU-based iterative solver requires only 11 hours and is 15 times faster than the simulation with the direct solver SparseSYM (with RCM ordering for bandwidth minimization), which requires 168 hours running entirely on the CPU of the same machine.
The distributed computational framework is then applied to simulate the seismic performance of the 50 high-fidelity building models, with the number of DOFs ranging from 5,672 to 143,549. Figure 7 shows the total simulation times required for all 50 building models using different numbers of slave computers. The total simulation time for all 50 models using one slave computer (Slave 1) is about 168 hours with the GPU-based solver. As the number of slave computers increases, the simulation time decreases. Since slave computers 1 to 3 are relatively powerful, significant improvements can be observed when the computations are distributed among these three computers. When slave computers 4 to 7 (among which slave computer 7 is the weakest) are included, the total simulation times are shortened further, but the improvements in computational efficiency are not as dramatic. When all seven computers are employed, the total simulation time decreases to about 48 hours, a speedup of 3.5 times compared with the simulation on the fastest slave computer alone.
With the dynamic load balancing strategy, the computational tasks are assigned based on the capabilities of the computational resources. As shown in Figure 8, the computational times of all the slave computers employed for the simulation of the 50 buildings are about the same. In contrast, as illustrated in Figure 9, if the computational tasks are assigned randomly (without load balancing), the total computational time can vary widely; in the 10 simulation studies shown in Figure 9, the total simulation time varies from 58 to 280 hours. This example demonstrates the benefits and the effectiveness of the dynamic load balancing strategy for the distributed assignment of high-fidelity simulation models within the cost-effective GPU-based computational framework.
4.2 Simulation of moderate-fidelity building models
For the simulation of the 100,000 regular buildings, ranging from 1 to 22 stories, the moderate-fidelity MCS models are employed. First, the effectiveness of the proposed GPU-based computational strategy for MCS models is evaluated. Using slave computer 1, Figure 10 compares the efficiency of the simulations of 10,000 to 100,000 buildings on the CPU and on the GPU of the computer. The simulation of the 100,000 buildings on slave computer 1 using the GPU needs only 731 s, a speedup of 54.5 times compared with the 39,480 s (approximately 11 hours) required for the simulation using the CPU of the same computer. This result clearly shows the benefits of GPU computing.
With the static load balancing strategy, the 100,000 MCS models are assigned to the 7 slave computers according to the FLOPS performance values of the GPUs. The total computing time for the distributed simulation of the 100,000 buildings is only 123 seconds. Compared with the computing time of 731 seconds on slave computer 1 with the fastest GPU, the distributed simulation yields a further improvement of almost sixfold. In the process of task assignment, the workloads (i.e., the numbers of stories) are assigned in proportion to the capability of each GPU. As shown in Figure 11, more stories are assigned to the slave computers with more powerful GPUs. Consequently, the computational times of all the slave computers are approximately the same. This result shows the effectiveness of the proposed static load balancing strategy for the distributed assignment of moderate-fidelity models.
4.3 Simulation results
Using the simulation framework as described, it is now possible to simulate the nonlinear THA of hundreds of thousands of building models within reasonable time. For example, a virtual city consisting of 50 important and special buildings and 100,000 regular buildings can be simulated in about 48 hours. Without distributed computing and GPU computing, even using the most powerful slave computer 1, the total simulation time would have taken over 2500 h. The distributed GPU computing framework expedites the simulation of this case study of a relatively big city by 52 times. If additional hardware is incorporated for the multi-fidelity simulation, the computational times can be further reduced.
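The speedups quoted for the distributed runs follow directly from the reported timings, as the short check below illustrates. Only figures stated in the text are used; the dictionary keys and the helper name `speedup` are hypothetical labels, and since the single-computer total is given only as "over 2500 h", the last value is a lower-bound estimate.

```python
def speedup(baseline, accelerated):
    """Ratio of baseline time to accelerated time (same units for both)."""
    return baseline / accelerated


reported_times = {
    "high-fidelity: Slave 1 alone vs seven slaves (h)": (168.0, 48.0),
    "moderate-fidelity: Slave 1 GPU vs seven GPUs (s)": (731.0, 123.0),
    "whole city: single computer without GPU vs framework (h)": (2500.0, 48.0),
}

demo_speedups = {name: speedup(*times) for name, times in reported_times.items()}
for name, value in demo_speedups.items():
    print(f"{name}: {value:.1f}x")
```

The three ratios come out at about 3.5, 5.9 and 52, in line with the values stated in the case study.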
The result of the region-wide simulation of the buildings in the virtual city is shown in Figure 12. The simulation produces the distribution of building damage in the region and the specific damage locations (i.e., the damage state of each story) of each building, which provide important guidance for the seismic preparation of a city. From the simulation results, the number of buildings in each damage state can be counted and the economic losses of the buildings can be estimated. The building damage information provides a valuable reference for disaster planning and for earthquake insurance considerations. In addition, a visual earthquake scenario (see Figure 12) is created based on the simulation results, which can be used to heighten policy makers' awareness of earthquake consequences for emergency preparedness.
Fig. 12 Seismic damage states in a virtual city (different colors represent different damage
5. Summary and discussion
In this study, a computational framework combining distributed computing and GPU computing for regional seismic simulation is presented. For large-scale regional seismic simulation of a virtual city with thousands of buildings, multiple fidelity models are employed for modeling the buildings according to their level of significance. The results of this study are summarized as follows:
(1) For seismic simulation of high-fidelity models, a GPU-based solver for linear equation is developed and a dynamic load balancing strategy is proposed. For moderate-fidelity models, a GPU-based parallel strategy is proposed for the simulation of thousands of buildings and an efficient static load balancing strategy is designed for assigning the tasks among the GPU-based computers.
(2) For illustration, a case study of a virtual city consisting of 50 high-fidelity models and 100,000 moderate-fidelity models has been conducted. The seismic damage simulation can be completed within 48 hours using the distributed and parallel computational framework, a 52-fold improvement over the computing time required for the same simulation on a single computer. Furthermore, if additional hardware is incorporated for the multi-fidelity simulation, the computational times can be further reduced.
(3) Lastly, this study has demonstrated the distributed and parallel computational framework is cost-effective, computationally efficient and flexible for regional seismic damage simulation with multi-fidelity models.
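The difference between the static and dynamic load balancing ideas summarized in item (1) can be sketched as follows. This is only an illustration under stated assumptions: the greedy largest-task-first heuristic, the function names `static_assign` and `dynamic_makespan`, the relative worker speeds and the task costs are hypothetical and are not necessarily the strategies or values used in this study.

```python
import heapq


def static_assign(task_costs, speeds):
    """Static balancing: fix the whole plan before any task starts.

    Tasks are taken largest first and each one goes to the worker whose
    estimated finish time after accepting it is smallest (a common greedy
    heuristic, assumed here for illustration).
    """
    finish = [0.0] * len(speeds)
    plan = [[] for _ in speeds]
    for tid in sorted(range(len(task_costs)), key=lambda i: -task_costs[i]):
        w = min(range(len(speeds)),
                key=lambda j: finish[j] + task_costs[tid] / speeds[j])
        finish[w] += task_costs[tid] / speeds[w]
        plan[w].append(tid)
    return plan, max(finish)


def dynamic_makespan(task_costs, speeds):
    """Dynamic balancing: tasks wait in a queue and the next task is handed
    to whichever worker becomes idle first. Returns the simulated makespan."""
    idle_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle_at)
    for cost in task_costs:
        t, w = heapq.heappop(idle_at)          # earliest idle worker
        heapq.heappush(idle_at, (t + cost / speeds[w], w))
    return max(t for t, _ in idle_at)


# Hypothetical task costs (arbitrary time units) and two equal-speed workers.
costs = [8, 4, 4, 2, 2]
speeds = [1.0, 1.0]
plan, static_time = static_assign(costs, speeds)
dynamic_time = dynamic_makespan(costs, speeds)
```

A static plan needs reasonable cost estimates in advance, which is plausible when the tasks are many and similar; a dynamic queue needs no estimates and adapts to tasks whose running times are hard to predict.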
The distributed computing and GPU computing framework for regional seismic simulation presented in this paper uses the high-fidelity models available in the OpenSees platform for super-tall and important structures [12, 45–46] and a highly versatile moderate-fidelity concentrated mass shear (MCS) model for regular multi-story buildings. For the time-consuming high-fidelity models, this study has incorporated a parallel GPU-based iterative solver to expedite the solution process. Other approaches can be used to further reduce the computational time. For instance, the macro-element approach [47–49], which has been shown to lower the computational cost, especially for the analysis of masonry and monumental structures, can be adopted. Other parallel strategies, such as the method of dual partition super elements, have also been proposed for large-scale simulations [50–51]. However, implementing these methods would likely require significant modification of the current open source OpenSees platform. Nevertheless, it should be noted that efforts have been made to parallelize the OpenSees code on computers with multiple CPUs, for example, using domain partition and distributed solvers [45–46]. Our effort has focused primarily on developing efficient GPU-based methods to accelerate the simulation of high-fidelity models with minimum changes to the single-CPU OpenSees software commonly employed by the earthquake engineering research community. Integrating the GPU-based solution strategies with parallel software development (including domain decomposition or super-element strategies) could be worth pursuing as future work.
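To make the notion of an iterative linear equation solver concrete, a minimal conjugate gradient sketch in plain Python is given below. The choice of conjugate gradient, the dense list-of-rows storage and the function name `conjugate_gradient` are assumptions made for illustration; this section does not specify the method, preconditioner or sparse format of the GPU solver, and the sketch runs on the CPU only.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    A is a list of rows and b a list of floats. The matrix-vector product
    and the dot products inside the loop are the operations that lend
    themselves to data-parallel execution.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm below tolerance
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Small symmetric positive definite test system (illustrative values).
A_example = [[4.0, 1.0], [1.0, 3.0]]
b_example = [1.0, 2.0]
x_example = conjugate_gradient(A_example, b_example)
```

Because the solver only touches the matrix through products with a vector, it can in principle be swapped in behind an existing solver interface without changing how the structural model is assembled, which is consistent with the stated goal of minimum changes to the host software.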
Another challenging and time-consuming task is the construction of the structural models for a city region with hundreds and thousands of buildings and facilities. The moderate-fidelity MDOF concentrated mass shear model, which requires only a small number of characteristics of each building, can be easily automated and constructed in a few minutes. However, constructing a high-fidelity model that includes detailed modeling of each member and member type is a time-consuming process. For structural design models that are readily available in analysis and design software (such as ETABS or SAP2000), the modelling work can be automated by transforming the design models into analysis models for OpenSees. However, if the structural design models are not available, the high-fidelity analysis models need to be built manually. For the case study with 50 high-fidelity models (32 of which had design models available) and 100,000 moderate-fidelity models, the modelling task for the entire region took about 2 months, with most of the effort spent on the 18 high-fidelity models without prior design models. Future work to facilitate model construction (e.g., using graphical user interfaces or wizards) will help reduce the total time required for conducting regional seismic analysis.
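The reason a concentrated mass shear model is easy to automate can be seen from a minimal sketch that assembles its elastic matrices from one mass and one inter-story stiffness per story. The function name `shear_model_matrices` and the numerical values are hypothetical, and the sketch covers only the linear elastic mass and stiffness matrices; the nonlinear inter-story behavior needed for damage simulation is not represented.

```python
def shear_model_matrices(story_masses, story_stiffnesses):
    """Assemble lumped mass and stiffness matrices of a shear building.

    Story i (0-based) is a spring between floor i - 1 (the ground for
    i = 0) and floor i, so the stiffness matrix is tridiagonal.
    """
    n = len(story_masses)
    if len(story_stiffnesses) != n:
        raise ValueError("one stiffness value is required per story")
    M = [[story_masses[i] if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    K = [[0.0] * n for _ in range(n)]
    for i, k in enumerate(story_stiffnesses):
        K[i][i] += k                 # spring acts on the floor above it
        if i > 0:                    # and couples it to the floor below
            K[i - 1][i - 1] += k
            K[i - 1][i] -= k
            K[i][i - 1] -= k
    return M, K


# Hypothetical three-story building (consistent but arbitrary units).
M_example, K_example = shear_model_matrices([2.0, 2.0, 1.0], [3.0, 2.0, 1.0])
```

Since the inputs are a short list of numbers per building, such models can be generated in a loop over a building inventory, whereas a high-fidelity model requires member-level geometry and properties.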
Lastly, a cloud computing plan based on the proposed computational framework will be developed. In this plan, users can simply upload building models of different fidelities through a web interface. The proposed framework will perform the simulation of the multi-fidelity models and return the results to the users for download. The cloud computing plan could potentially provide a more convenient, flexible and extensible environment than the current framework.
The first two authors are grateful for the financial support received from the National Key Technology R&D Program (No. 2015BAK14B02), the National Natural Science Foundation of China (No. 51578320), the National Non-profit Institute Research Grant of IGP-CEA (Grant No. DQJB14C01) and the European Community's Seventh Framework Programme, Marie Curie International Research Staff Exchange Scheme (IRSES) under grant agreement n° 612607. The third author would like to acknowledge the chair (visiting) professorship sponsored by Tsinghua University, which greatly helped facilitate this research during his visit to the university.
 Reyners M. Lessons from the destructive Mw 6.3 Christchurch, New Zealand, earthquake. Seismol Res Lett 2011; 82(3): 371–372.
 Tyagunov S, Grunthal G, Wahlstrom R, Stempniewski L, Zschau J. Seismic risk mapping for Germany. Nat Hazard Earth Sys 2006; 6: 573–586.
 Korkmaz KA. Seismic safety assessment of unreinforced masonry low-rise buildings in Pakistan and its neighborhood. Nat Hazard Earth Sys 2009; 9: 1021–1031.
 Steelman J, Hajjar J. Influence of inelastic seismic response modeling on regional loss estimation. Eng Struct 2009; 31: 2976–2987.
 Yamashita T, Kajiwara K, Hori M. Petascale computation for earthquake engineering. Comput Sci Eng 2011; 13: 44–49.
 Xiong C, Lu XZ, Lin XC, Xu Z, Ye LP. Parameter determination and damage assessment for THA-based regional seismic damage prediction of multi-story buildings. J Earthq Eng 2016. DOI: 10.1080/13632469.2016.1160009.
 Xu Z, Lu XZ, Guan H, Tian Y, Ren AZ. Simulation of earthquake-induced hazards of falling exterior non-structural components and its application to emergency shelter design. Nat Hazards 2016; 80(2): 935–950.
 FEMA. Multi-hazard loss estimation methodology, earthquake model, HAZUS–MH 2.1 technical manual, Washington, DC, 2012.
 FEMA. Multi-hazard loss estimation methodology HAZUS–MH 2.1 advanced engineering building module (AEBM) technical and user's manual, Washington, DC, 2012.
 Lu XZ, Han B, Hori M, Xiong C, Xu Z. A coarse-grained parallel approach for seismic damage simulations of urban areas based on refined models and GPU/CPU cooperative computing. Adv Eng Softw 2014; 70: 90–103.
 Wen-Mei WH. GPU computing gems emerald edition. Morgan Kaufmann, Burlington, MA, 2011.
 Stone JE, Phillips JC, Freddolino PL, Hardy DJ, Trabuco LG, Schulten K. Accelerating molecular modeling applications with graphics processors. J Comput Chem 2007; 28: 2618–40.
 Weldon M, Maxwell L, Cyca D, Hughes M, Whelan C, Okoniewski M. A practical look at GPU-accelerated FDTD performance. Appl Comput Electrom 2010; 25: 315–22.
 Zheng J, An X, Huang M. GPU-based parallel algorithm for particle contact detection and its application in self-compacting concrete flow simulations. Comput Struct 2012; 112–113: 193–204.
 Cai Y, Wang GP, Li GY, Wang H. A high performance crashworthiness simulation system based on GPU. Adv Eng Softw 2015; 86: 29–38.
 Buatois L, Caumon G, Levy B. Concurrent number cruncher: a GPU implementation of a general sparse linear solver. International Journal of Parallel, Emergent and Distributed Systems 2009; 24(3): 205–223.
 Coulouris G, Dollimore J, Kindberg T, Blair G. Distributed systems: concepts and design (5th edition). Addison-Wesley, Boston, MA, 2011.
 Xian W, Takayuki A. Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster. Parallel Comput 2011; 37(9): 521–535.
 Chen Y, Cui X, Mei H. Large-scale FFT on GPU clusters. Proceedings of the 24th ACM International Conference on Supercomputing, ACM, New York, NY, 2010.
 Cevahir A, Nukada A, Matsuoka S. High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning. Comput Sci Res Dev 2010; 25(1-2): 83–91.
 Hassan AH, Fluke CJ, Barnes DG. A distributed GPU-based framework for real-time 3D volume rendering of large astronomical data cubes. Publ Astron Soc Aust 2012; 29(3): 340–351.
 Okamoto T, Takenaka H, Nakamura T, Aoki T. Large-scale simulation of seismic-wave propagation of the 2011 Tohoku-Oki M9 earthquake. Proceedings of the International Symposium on Engineering Lessons Learned from the 2011 Great East Japan Earthquake, Tokyo, Japan, 2012.
 Wang LZ, Laszewski G, Kunze M, Tao J, Dayal J. Provide virtual distributed environments for grid computing on demand. Adv Eng Softw 2010; 41(2): 213–219.
 Wijerathne M, Hori M, Kabeyazawa T, Ichimura T. Strengthening of parallel computation performance of integrated earthquake simulation. J Comput Civil Eng 2013; 27: 570–573.
 BOINC. BOINC project governance, http://boinc.berkeley.edu/trac/wiki/Project-Governance, 2014.
 Hadoop. What is Apache Hadoop, http://hadoop.apache.org/, 2014.
 HTCondor. http://research.cs.wisc.edu/htcondor/, 2014.
 Yang CF, Li HJ, Rezgui Y, Petri I, Yuce B, Chen BS, Jayan B. High throughput computing based distributed genetic algorithm for building energy consumption optimization. Energ Buildings 2014; 76: 92–101.
 Guba S, Őry M, Szeberényi I. Harnessing wasted computing power for scientific computing. Large-Scale Scientific Computing, Springer Berlin Heidelberg, Berlin, Germany, 2014.
 OpenSees. http://opensees.berkeley.edu/, 2014.
 NVIDIA. CUDA Programming Guide. http://docs.nvidia.com/cuda, 2014.
 CuSPSolver, http://opensees.berkeley.edu/wiki/index.php/Cusp, 2014.
 Zheng GB. Achieving high performance on extremely large parallel machines: performance prediction and load balancing. University of Illinois at Urbana-Champaign, 2005.
 Nickolls J, Buck I, Garland M, Skadron K. Scalable parallel programming with CUDA. Queue 2008; 6(2): 40–53.
 McKenna F. OpenSees: a framework for earthquake engineering simulation. Comput Sci Eng 2011; 13(4): 58–64.
 McKenna F. Introduction to OpenSees, http://www.slideshare.net/openseesdays/d1-001-005
mckennaosdpt2014, Sep 10, 2014.
 Marques R, Lourenço PB. Unreinforced and confined masonry buildings in seismic regions: validation of macro-element models and cost analysis. Eng Struct 2014; 64: 52–67.
 Caliò I, Marletta M, Pantò B. A new discrete element model for the evaluation of the seismic behaviour of unreinforced masonry buildings. Eng Struct 2012; 40: 327–338.
 Caliò I, Pantò B. A macro-element modelling approach of infilled frame structures. Comput Struct 2014; 143: 91–107.
 Jokhio GA, Izzuddin BA. A dual super-element domain decomposition approach for parallel nonlinear finite element analysis. International Journal of Computational Science and Engineering 2015; 16(3): 188–212.
 Jokhio GA, Izzuddin BA. Parallelisation of nonlinear structural analysis using dual partition super elements. Adv Eng Softw 2013; 60–61: 81–88.
Fig. 1 Framework for simulating regional seismic damage with multi-fidelity models
Fig. 2 Method of linear equations solving based on GPUs
Fig. 3 Flow chart of the proposed dynamic load balancing strategy
Fig. 4 A GPU-based parallel simulation method for the MCS models
Fig. 5 High-fidelity model of a tall building
Fig. 6 Comparison of the time-history top displacements in the two different platforms
Fig. 7 Relationship between computing time and number of Slaves
Fig. 8 Computing time for each Slave in the high-fidelity simulation
Fig. 9 Comparisons between the proposed strategy and the ten random assignments
Fig. 10 Comparison between the proposed GPU computing and CPU computing
Fig. 11 Computing time for each Slave in the moderate-fidelity simulation
Fig. 12 Seismic damage states in a virtual city (different colors represent different damage states at different stories)
Table 1 Implementation of the proposed strategy using HTCondor
Table 2 Hardware configuration of the computational framework
* Corresponding author. Tel.: 86-10-62795364; E-mail address: email@example.com.