Tuesday, April 2, 2019
Shared Memory MIMD Architectures
Introduction to MIMD Architectures

MIMD: a type of multiprocessor architecture in which several instruction cycles may be active at any given time, each independently fetching instructions and operands into multiple processing units and operating on them concurrently. The name is an acronym for multiple-instruction-stream, multiple-data-stream.

MIMD (Multiple Instruction stream, Multiple Data stream): a computer that can process two or more independent sets of instructions simultaneously on two or more sets of data. Computers with multiple CPUs, or single CPUs with dual cores, are examples of MIMD architecture. Hyperthreading also provides a certain degree of MIMD performance. Contrast with SIMD.

In computing, MIMD (Multiple Instruction stream, Multiple Data stream) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data.
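As a rough illustration of different processors executing different instructions on different data, the Python sketch below hands three unrelated functions, each with its own input, to separate worker threads. The names total, longest, reverse and run_mimd are made up for this example. CPython threads are only a stand-in for real processors: they show independent, asynchronous instruction streams, not true hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def total(values):
    """Instruction stream 1: add up a list of numbers."""
    return sum(values)


def longest(words):
    """Instruction stream 2: pick the longest word."""
    return max(words, key=len)


def reverse(text):
    """Instruction stream 3: reverse a string."""
    return text[::-1]


def run_mimd(tasks):
    """Run each (function, data) pair on its own worker; return results in task order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(func, data) for func, data in tasks]
        return [future.result() for future in futures]


if __name__ == "__main__":
    tasks = [
        (total, [1, 2, 3, 4]),
        (longest, ["bus", "crossbar", "mesh"]),
        (reverse, "MIMD"),
    ]
    print(run_mimd(tasks))
```

Each worker runs to completion at its own pace, and the only synchronization point is collecting the results at the end.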
MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches. MIMD machines can be of either shared memory or distributed memory categories. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.

Multiple Instruction Multiple Data

MIMD architectures have multiple processors that each execute an independent stream (sequence) of machine instructions. The processors execute these instructions by using any accessible data rather than being forced to operate upon a single, shared data stream. Hence, at any given time, an MIMD system can be using as many different instruction streams and data streams as there are processors.

Although software processes executing on MIMD architectures can be synchronized by passing data among processors through an interconnection network, or by having processors examine data in a shared memory, the processors' autonomous execution makes MIMD architectures asynchronous machines.

Shared Memory Bus-based

MIMD machines with shared memory have processors which share a common, central memory. In the simplest form, all processors are attached to a bus which connects them to memory. This setup is called bus-based shared memory. Bus-based machines may have another bus that enables them to communicate directly with one another. This additional bus is used for synchronization among the processors. Bus-based shared memory MIMD machines can support only a small number of processors, because the processors contend for access to shared memory.
These machines may be incrementally expanded up to the point where there is too much contention on the bus.

Shared Memory Extended

MIMD machines with extended shared memory attempt to avoid or reduce the contention among processors for shared memory by subdividing the memory into a number of independent memory units. These memory units are connected to the processors by an interconnection network. The memory units are treated as a unified central memory. One type of interconnection network for this type of architecture is a crossbar switching network. In this scheme, N processors are linked to M memory units, which requires N times M switches; for example, 16 processors and 16 memory units need 256 switches. This is not an economically feasible setup for connecting a large number of processors.

Shared Memory Hierarchical

MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory. Processors on different boards may communicate through internodal buses. Buses support communication between boards. With this type of architecture, the machine may support over a thousand processors.

In computing, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Depending on context, programs may run on a single processor or on multiple separate processors.
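The idea of memory that several programs can access in order to communicate can be sketched with Python's standard multiprocessing.shared_memory module (available from Python 3.8). The helper names write_message, read_message and demo are invented for this illustration. For brevity both sides run in one process here; in a real setup the reader would be a separate program that attaches to the block by its name.

```python
from multiprocessing import shared_memory


def write_message(message: bytes) -> shared_memory.SharedMemory:
    """Producer side: create a shared block and copy the message into it."""
    block = shared_memory.SharedMemory(create=True, size=len(message))
    block.buf[:len(message)] = message
    return block


def read_message(name: str, length: int) -> bytes:
    """Consumer side: attach to an existing block by name and copy the bytes out."""
    view = shared_memory.SharedMemory(name=name)
    try:
        # The block may be larger than requested, so read only `length` bytes.
        return bytes(view.buf[:length])
    finally:
        view.close()


def demo() -> bytes:
    message = b"hello from shared memory"
    block = write_message(message)
    try:
        return read_message(block.name, len(message))
    finally:
        block.close()    # drop this handle
        block.unlink()   # ask the operating system to free the block


if __name__ == "__main__":
    print(demo())
```

No data is sent between the two sides: the reader sees the bytes because both handles map the same region of memory.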
Using memory for communication inside a single program, for example among its multiple threads, is generally not referred to as shared memory.

In hardware

In computer hardware, shared memory refers to a (typically) large block of random access memory that can be accessed by several different central processing units (CPUs) in a multiple-processor computer system.

A shared memory system is relatively easy to program since all processors share a single view of data, and communication between processors can be as fast as memory accesses to the same location.

The issue with shared memory systems is that many CPUs need fast access to memory and will likely cache memory, which has two complications:

CPU-to-memory connection becomes a bottleneck. Shared memory computers cannot scale very well. Most of them have ten or fewer processors.

Cache coherence: whenever one cache is updated with information that may be used by other processors, the change needs to be reflected to the other processors, otherwise the different processors will be working with incoherent data (see cache coherence and memory coherence). Such coherence protocols can, when they work well, provide extremely high-performance access to shared information between multiple processors. On the other hand, they can sometimes become overloaded and become a bottleneck to performance.

The alternatives to shared memory are distributed memory and distributed shared memory, each having a similar set of issues. See also Non-Uniform Memory Access.

In software

In computer software, shared memory is either:

A method of inter-process communication (IPC), i.e. a way of exchanging data between programs running at the same time.
One process will create an area in RAM which other processes can access, or

A method of conserving memory space by directing accesses to what would ordinarily be copies of a piece of data to a single instance instead, by using virtual memory mappings or with explicit support of the program in question. This is most often used for shared libraries and for Execute in Place.

Shared Memory MIMD Architectures

The distinguishing feature of shared memory systems is that no matter how many memory blocks are used in them and how these memory blocks are connected to the processors, the address spaces of these memory blocks are unified into a global address space which is completely visible to all processors of the shared memory system. Issuing a certain memory address by any processor will access the same memory block location. However, according to the physical organization of the logically shared memory, two main types of shared memory system can be distinguished:

Physically shared memory systems
Virtual (or distributed) shared memory systems

In physically shared memory systems all memory blocks can be accessed uniformly by all processors. In distributed shared memory systems the memory blocks are physically distributed among the processors as local memory units.

The three main design issues in increasing the scalability of shared memory systems are:

Organization of memory
Design of interconnection networks
Design of cache coherent protocols

Cache Coherence

Cache memories are introduced into computers in order to bring data closer to the processor and hence to reduce memory latency. Caches are widely accepted and employed in uniprocessor systems.
However, in multiprocessor machines several processors may require a copy of the same memory block. The maintenance of consistency among these copies raises the so-called cache coherence problem, which has three causes:

Sharing of writable data
Process migration
I/O activity

From the point of view of cache coherence, data structures can be divided into three classes:

Read-only data structures, which never cause any cache coherence problem. They can be replicated and placed in any number of cache memory blocks without any problem.
Shared writable data structures, which are the main source of cache coherence problems.
Private writable data structures, which pose cache coherence problems only in the case of process migration.

There are several techniques to maintain cache coherence for the critical case, that is, shared writable data structures. The applied methods can be divided into two classes:

Hardware-based protocols
Software-based protocols

Software-based schemes usually introduce some restrictions on the cachability of data in order to prevent cache coherence problems.

Hardware-based Protocols

Hardware-based protocols provide general solutions to the problems of cache coherence without any restrictions on the cachability of data. The price of this approach is that shared memory systems must be extended with sophisticated hardware mechanisms to support cache coherence. Hardware-based protocols can be classified according to their memory update policy, cache coherence policy, and interconnection scheme. Two types of memory update policy are applied in multiprocessors: write-through and write-back. Cache coherence policy is divided into write-update policy and write-invalidate policy.

Hardware-based protocols can be further classified into three basic classes depending on the nature of the interconnection network applied in the shared memory system. If the network efficiently supports broadcasting, the so-called snoopy cache protocol can be advantageously exploited.
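To make the policy names above more concrete, here is a toy Python model of a snoopy cache that combines a write-through memory update policy with a write-invalidate coherence policy. It is only a sketch under simplifying assumptions: the Bus and Cache classes, their method names, and the one-value-per-address cache lines are all invented for illustration and do not describe any particular machine.

```python
class Bus:
    """Shared bus: owns main memory and broadcasts invalidations to attached caches."""

    def __init__(self):
        self.memory = {}   # address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, address, sender):
        # Every cache except the writer snoops the invalidate command.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(address)


class Cache:
    """Private cache of one processor; holds only valid lines."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> value
        self.misses = 0
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:          # read miss: fetch from memory
            self.misses += 1
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.memory[address] = value       # write-through: memory updated at once
        self.bus.broadcast_invalidate(address, sender=self)

    def snoop_invalidate(self, address):
        self.lines.pop(address, None)          # drop the stale copy, if any


def demo():
    bus = Bus()
    a, b = Cache(bus), Cache(bus)
    a.write(0x10, 1)
    first = b.read(0x10)     # miss, loads the value written by a
    a.write(0x10, 2)         # invalidates b's copy
    second = b.read(0x10)    # miss again, so b never sees the stale value
    return first, second, b.misses


if __name__ == "__main__":
    print(demo())
```

The second read by cache b misses precisely because its copy was invalidated, which is the kind of invalidation miss that coherence protocols try to keep rare.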
The snoopy scheme is typically used in single bus-based shared memory systems where consistency commands (invalidate or update commands) are broadcast via the bus and each cache snoops on the bus for incoming consistency commands.

Large interconnection networks like multistage networks cannot support broadcasting efficiently, and therefore a mechanism is needed that can directly forward consistency commands to those caches that contain a copy of the updated data structure. For this purpose a directory must be maintained for each block of the shared memory to administer the actual location of blocks in the possible caches. This approach is called the directory scheme.

The third approach tries to avoid the application of the costly directory scheme but still provide high scalability. It proposes multiple-bus networks with the application of hierarchical cache coherence protocols that are generalized or extended versions of the single bus-based snoopy cache protocol.

In describing a cache coherence protocol the following definitions must be given:

Definition of possible states of blocks in caches, memories and directories.
Definition of commands to be performed at various read/write hit/miss actions.
Definition of state transitions in caches, memories and directories according to the commands.
Definition of transmission routes of commands among processors, caches, memories and directories.

Software-based Protocols

Although hardware-based protocols offer the fastest mechanism for maintaining cache consistency, they introduce a significant extra hardware complexity, particularly in scalable multiprocessors. Software-based approaches represent a good and competitive compromise since they require nearly negligible hardware support and they can lead to the same small number of invalidation misses as the hardware-based protocols.
All the software-based protocols rely on compiler assistance. The compiler analyses the program and classifies the variables into four classes:

1. Read-only
2. Read-only for any number of processes and read-write for one process
3. Read-write for one process
4. Read-write for any number of processes

Read-only variables can be cached without restrictions. Type 2 variables can be cached only for the processor where the read-write process runs. Since only one process uses type 3 variables, it is sufficient to cache them only for that process. Type 4 variables must not be cached in software-based schemes. Variables demonstrate different behavior in different program sections, and hence the program is usually divided into sections by the compiler and the variables are categorized independently in each section. Moreover, the compiler generates instructions that control the cache or access the cache explicitly, based on the classification of variables and code segmentation. Typically, at the end of each program section the caches must be invalidated to ensure that the variables are in a consistent state before starting a new section.

Shared memory systems can be divided into four main classes:

Uniform Memory Access (UMA) Machines

Contemporary uniform memory access machines are small-size single bus multiprocessors. Large UMA machines with hundreds of processors and a switching network were typical in the early design of scalable shared memory systems. Famous representatives of that class of multiprocessors are the Denelcor HEP and the NYU Ultracomputer. They introduced many innovative features in their design, some of which even today represent a significant milestone in parallel computer architectures.
However, these early systems did not contain either cache memory or local main memory, which turned out to be necessary to achieve high performance in scalable shared memory systems.

Non-Uniform Memory Access (NUMA) Machines

Non-uniform memory access (NUMA) machines were designed to avoid the memory access bottleneck of UMA machines. The logically shared memory is physically distributed among the processing nodes of NUMA machines, leading to distributed shared memory architectures. On one hand these parallel computers became highly scalable, but on the other hand they are very sensitive to data allocation in local memories. Accessing a local memory segment of a node is much faster than accessing a remote memory segment. Not by chance, the structure and design of these machines resemble in many ways that of distributed memory multicomputers. The main difference is in the organization of the address space. In multiprocessors, a global address space is applied that is uniformly visible from each processor; that is, all processors can transparently access all memory locations. In multicomputers, the address space is replicated in the local memories of the processing elements. This difference in the address space of the memory is also reflected at the software level: distributed memory multicomputers are programmed on the basis of the message-passing paradigm, while NUMA machines are programmed on the basis of the global address space (shared memory) principle.

The problem of cache coherency does not appear in distributed memory multicomputers since the message-passing paradigm explicitly handles different copies of the same data structure in the form of independent messages. In the shared memory paradigm, multiple accesses to the same global data structure are possible and can be accelerated if local copies of the global data structure are maintained in local caches.
However, hardware-supported cache consistency schemes are not introduced into NUMA machines. These systems can cache read-only code and data, as well as local data, but not shared modifiable data. This is the distinguishing feature between NUMA and CC-NUMA multiprocessors. Accordingly, NUMA machines are closer to multicomputers than to other shared memory multiprocessors, while CC-NUMA machines look like real shared memory systems.

In NUMA machines, like in multicomputers, the main design issues are the organization of processor nodes, the interconnection network, and the possible techniques to reduce remote memory accesses. Two examples of NUMA machines are the Hector and the Cray T3D multiprocessor.

Sources used

www.wikipedia.com
http://www.developers.net/tsearch?searchkeys=MIMD+architecture
http://carbon.cudenver.edu/galaghba/mimd.html
http://www.docstoc.com/docs/2685241/Computer-Architecture-Introduction-to-MIMD-architectures