Parallel computing

During my research in physics I have been involved in simulations on parallel computer systems. Parallel processes are quite natural in nature, but early computers were built with a Single Instruction, Single Data (SISD) architecture.

Fig. 1. SISD architecture
When the amount of data is huge and the same operations have to be performed on all of it, the Single Instruction, Multiple Data (SIMD) architecture is more effective. Monte Carlo simulations and vision-like data processing are typical problems of this kind.

Fig. 2. SIMD architecture
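
The SIMD idea can be illustrated even on an ordinary CPU at the level of bits. The following is a minimal sketch, not code from any of the machines discussed here: it assumes a one-dimensional lattice of N binary sites packed into a single integer, and uses the deterministic "rule 90" cellular automaton as a stand-in update rule. The names `rule90_step` and `rule90_step_serial` and the lattice size are hypothetical.

```python
# Bit-level SIMD sketch: one Python integer holds N lattice sites (one bit
# each), so a single shift/XOR expression updates every site at once.

N = 16  # number of lattice sites (hypothetical size chosen for illustration)


def rotate_left(word, n=N):
    """Cyclic shift towards higher bits; gives periodic boundaries."""
    return ((word << 1) | (word >> (n - 1))) & ((1 << n) - 1)


def rotate_right(word, n=N):
    """Cyclic shift towards lower bits; gives periodic boundaries."""
    return ((word >> 1) | (word << (n - 1))) & ((1 << n) - 1)


def rule90_step(word, n=N):
    """Data-parallel update: every site becomes the XOR of its two neighbours."""
    return rotate_left(word, n) ^ rotate_right(word, n)


def rule90_step_serial(word, n=N):
    """Reference version: the same rule applied one site at a time (SISD style)."""
    out = 0
    for i in range(n):
        upper = (word >> ((i + 1) % n)) & 1
        lower = (word >> ((i - 1) % n)) & 1
        out |= (upper ^ lower) << i
    return out


if __name__ == "__main__":
    lattice = 1 << (N // 2)  # a single occupied site in the middle
    for _ in range(8):
        print(format(lattice, "0{}b".format(N)))
        lattice = rule90_step(lattice)
```

Both functions compute the same result; the data-parallel version needs a fixed number of word operations regardless of the lattice size that fits into the word, while the serial version loops over the sites.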
Massive parallelism appears nowadays mainly at the level of Multiple Instruction, Multiple Data (MIMD) processor clusters, owing to commercially available, cheap building blocks with ever-increasing clock speeds. However, CPU clock speed cannot be increased without limit, and memory access speeds lag well behind. Therefore high-performance architectures have to be redesigned from time to time. One such direction is the intelligent memory (IRAM) or processor-on-memory projects. By putting more and more fast memory on the silicon surface of the processors (caches), or processors at the edges of the memory matrices, one can avoid huge losses (a factor of about 1000) on the connection hardware, the buses.

Fig. 3. Losses at CPU memory communications
The Massively Parallel Processing Collaboration (MPPC) started research and development of conceptually similar architectures in the early nineties, with the goal of processing large quantities of parallel data on-line. The basic architecture was low-level MIMD and high-level SIMD, to fit the requirements best. While the development stopped at the prototype stage (ASTRA-2) in the physics research collaboration, the founding engineering company ASPEX continued developing the Associative String Processing (ASP) architecture to produce a special "co-processor" (System-V) for workstations that enhances image processing capabilities.

Fig. 4. ASP architecture
During my Associateship at CERN in 1990-1991 I worked in the MPPC collaboration [7]; later I showed that this architecture can be used for effective Monte Carlo simulation of statistical physical systems [r5, r3]. One can easily map one-dimensional, stochastic cellular-automaton-like systems onto it, provided that an appropriate random number generator is available. Since the processing elements are numerous but very small, a one-site-to-one-processor mapping is possible if the random number generator fits into the 64-bit memory of a processing element (APE) and still has a long cycle [r5,r3,17,21,22,24,25].

Fig. 5. Associative Processing Element (APE)
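
The one-site-to-one-processor mapping can be sketched as follows. This is only an illustration under stated assumptions, not the generator or the model of the cited works: Marsaglia's 32-bit xorshift generator (period 2**32 - 1 for a non-zero seed) stands in for a generator small enough to share 64 bits with the site variable, and the update rule is a toy one in which a site becomes active with probability p if at least one neighbour is active. The names `make_lattice` and `step` and the seeding scheme are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """One step of Marsaglia's 32-bit xorshift generator (shifts 13, 17, 5).

    The state must be non-zero; the sequence then has period 2**32 - 1.
    """
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def make_lattice(n_sites, seed=12345):
    """Create one activity bit and one private 32-bit generator state per site.

    Assumption: the private states are seeded from Python's own generator,
    purely for illustration; 32 + 1 bits per site would fit into 64 bits.
    """
    seeder = random.Random(seed)
    rng = [seeder.getrandbits(32) or 1 for _ in range(n_sites)]  # avoid zero
    active = [1] * n_sites
    return active, rng


def step(active, rng, p):
    """Synchronous update with periodic boundaries.

    The loop is serial here; on SIMD hardware every site would execute the
    same instructions at the same time on its own local data.
    """
    n = len(active)
    threshold = int(p * 2**32)
    new_active = [0] * n
    new_rng = [0] * n
    for i in range(n):
        r = xorshift32(rng[i])
        new_rng[i] = r
        neighbour_on = active[(i - 1) % n] | active[(i + 1) % n]
        new_active[i] = 1 if (neighbour_on and r < threshold) else 0
    return new_active, new_rng


if __name__ == "__main__":
    active, rng = make_lattice(64)
    for t in range(20):
        print("".join("#" if a else "." for a in active))
        active, rng = step(active, rng, p=0.7)
```

Each site consumes only its own generator state, so no random numbers have to be communicated between processing elements; only the nearest-neighbour activity bits are exchanged.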
I have also developed effective simulation algorithms for Transputers [4], for the Connection Machine 5 [9,10,14] and for Fujitsu's AP1000 and AP3000 series parallel computers [15-19,21-24]. On the CM5 the parallel architecture was hidden from the users: high-level parallel Fortran and C compilers mapped the problems onto the set of SuperSPARC processors (interconnected in a fat-tree architecture). On the Transputers and the Fujitsu machines, by contrast, users had to develop their own parallel programs, including the inter-processor communication (here the processors were arranged on a two-dimensional torus grid). Currently we run Condor and MPI jobs on the parallel clusters and supercomputers of NIIF and HPC-Europa. See a recent presentation.
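
When the communication has to be programmed by hand on a two-dimensional torus, the first step is to work out which processors are neighbours. The sketch below shows the index arithmetic only; the row-by-row rank numbering and the function name `torus_neighbours` are assumptions made for illustration, not the conventions of any particular machine or library.

```python
def torus_neighbours(rank, nx, ny):
    """Return the ranks of the four nearest neighbours on an nx-by-ny torus.

    Assumed convention: ranks are numbered row by row, rank = y * nx + x.
    The modulo operations wrap the grid around in both directions.
    """
    if not 0 <= rank < nx * ny:
        raise ValueError("rank outside the processor grid")
    x, y = rank % nx, rank // nx
    return {
        "west": y * nx + (x - 1) % nx,
        "east": y * nx + (x + 1) % nx,
        "north": ((y - 1) % ny) * nx + x,
        "south": ((y + 1) % ny) * nx + x,
    }


if __name__ == "__main__":
    # Neighbours of the corner processor on a 4 x 3 torus.
    print(torus_neighbours(0, nx=4, ny=3))
```

In a lattice simulation with domain decomposition, each processor would exchange the boundary rows and columns of its sub-lattice with these four neighbours after every update sweep.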

Owing to the continual improvement in the price-performance ratio of commodity PCs and internet connections, a new paradigm has been emerging: meta-computing that embraces the whole globe (GRID). It aims to provide standard access to heterogeneous computing resources, much as the web does to information. The Globus project is developing the fundamental technologies needed to build computational grids. Grids are persistent environments that enable software applications to integrate instruments, displays, and computational and information resources that are managed by diverse organizations in widespread locations. In 2000-2001 I was involved in the Grid project of CERN. My application "UC-Explorer" was selected and has been running on the desktopgrid.

Recently we have been using GPUs for Monte Carlo simulations, supported by the NVIDIA Academic Partnership Program. Our team studied the performance scaling of algorithms [58], [65] and attacked challenging problems of statistical physics [60], [63], [70] and nanotechnology [61], [66]; the results were presented at the GTC2013 Conference in San Jose (see recent talks).

Apr 1, 2014