If you have any suggestions about our website or study materials, please feel free to leave a comment to help us improve. Alternatively, you can send us a message through the Contact Us page. Thank you for helping us make Free Study Hub better!

CSE-211_Part_8


Topics 

1. Symmetric Multiprocessing (SMP)

  • DefinitionSMP i.e. symmetric multiprocessing, refers to the computer architecture where multiple identical processors are interconnected to a single shared main memory, with full accessibility to all the I/O devices, unlike asymmetric MP. In other words, all the processors have common shared(common) memory and same data path or I/O bus as shown in the figure(Credit: Geeks for geek).
  • Characteristics of SMP

    • Identical: All the processors are treated equally i.e. all are identical.
    • Communication: Shared memory is the mode of communication among processors.
    • Complexity: Are complex in design, as all units share same memory and data bus.
    • Expensive: They are costlier in nature.
    • Unlike asymmetric where a task is done only by Master processor, here tasks of the operating system are handled individually by processors.

    Applications

    This concept finds its application in parallel processing, where time-sharing systems(TSS) have assigned tasks to different processors running in parallel to each other, also in TSS that uses multithreading i.e. multiple threads running simultaneously.

    Advantages

    • Throughput: Since tasks can be run by all the processors unlike in asymmetric, hence increased degree of throughput(processes executed in unit time).
    • Reliability: Failing a processor doesn’t fail whole system, as all are equally capable processors, though throughput do fail a little.

    Disadvantages

    • Complex design: Since all the processors are treated equally by OS, so designing and management of such OS become difficult.
    • Costlier: As all the processors share the common main memory, on account of which size of memory required is larger implying more expensive.

2. Components of SMP Systems

  • Processors:
    • Multiple processors or cores, each capable of executing tasks simultaneously.
    • Equal access to the shared memory and I/O systems.
  • Memory:
    • Shared physical memory accessed by all processors.
    • Memory is typically connected via a memory bus or crossbar switch.
  • Inter-Processor Communication:
    • Shared memory or message-passing mechanisms for communication.
    • Cache Coherency Protocols like MESI ensure consistency across caches.
  • I/O System:
    • Shared I/O buses or controllers allowing processors to access I/O devices independently.

 Use Cases for SMP

  • Database Servers: Handling concurrent queries or transactions.
  • High-Performance Computing (HPC): Scientific simulations and complex calculations.
  • Web Servers: Parallel processing of requests for scalability and high throughput.

 SMP System Challenges

  • Managing Cache Coherency: Ensuring all processors see a consistent view of memory.
  • Memory Bandwidth Limitations: Ensuring memory access does not become a bottleneck.
  • Scaling Beyond a Certain Number of Processors: Performance gain diminishes as the number of processors increases.

2.Producer-Consumer Problem:

  • Problem: A producer generates data and puts it into a shared buffer, while a consumer takes data from this buffer. The challenge arises when both the producer and consumer operate concurrently, which can cause synchronization issues:
    • The consumer may attempt to consume data when the producer hasn't produced any yet, resulting in underflow (attempting to consume from an empty buffer).
    • The producer may attempt to add data when the consumer hasn't consumed enough, leading to overflow (adding to a full buffer).
  • Solution: Synchronization mechanisms like semaphores, mutexes, or condition variables ensure that the producer and consumer don't conflict with each other. The consumer waits for the producer to produce data, and the producer waits for space in the buffer if it's full.

2. Mutual Exclusion:

  • Problem: Mutual exclusion ensures that only one process can access a critical section or shared resource at any given time. Without this, multiple processes might corrupt the resource or lead to inconsistent results.
  • Example: In the producer-consumer scenario, the shared buffer is a resource that both processes access. If two processes attempt to modify the buffer simultaneously (e.g., both adding or removing items), it could lead to data corruption.
  • Solution: Mutual exclusion mechanisms like mutexes or locks are used to prevent multiple processes from accessing the shared resource simultaneously. One process locks the resource, performs its task, and then releases the lock for the next process.

Example in a Producer-Consumer Setup:

  • Shared Resource: A buffer where the producer places data, and the consumer takes data.
  • Process 1 (Producer): Adds data to the buffer.
  • Process 2 (Consumer): Takes data from the buffer.

Without synchronization:

  • The consumer may attempt to consume data from an empty buffer (leading to an error or crash).
  • The producer may attempt to add data when the buffer is full (leading to buffer overflow).

With synchronization:

  • The producer waits if the buffer is full (using a semaphore or condition variable).
  • The consumer waits if the buffer is empty.
  • Both processes access the buffer in an orderly manner, avoiding conflicts.

Visualization of the Producer-Consumer Interaction:

scss
Producer Buffer Consumer | | | v v v [Add item to buffer] --> [Buffer] <-- [Remove item from buffer] (wait if full) (wait if empty)

3.Sequential Consistency in Memory Models

Sequential Consistency is a key concept in memory models that ensures all processes in a system operate in a way that results in a consistent view of memory. Specifically, it guarantees that the execution of operations appears as though they were executed in a sequential order, while still preserving the order of operations for each individual processor.

Definition:

As stated by Leslie Lamport, a system is sequentially consistent if the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each processor are executed in the order specified by the program. This means:

  • All processes see the same order of operations.
  • Each individual process’s operations occur in the order written in its program.

Example: Sequential Consistency with Tasks (T1, T2) and Shared Variables (X, Y)

Given:

  • Two concurrent tasks: T1 and T2.
  • Two shared variables: X and Y with initial values X = 0 and Y = 10.

T1 (Task 1):

  • Store 1 to X (i.e., X = 1).
  • Store 11 to Y (i.e., Y = 11).

T2 (Task 2):

  • Load Y into register R1.
  • Store R1 to Y'.
  • Load X into register R2.
  • Store R2 to X'.

The operations and their effects:

  • T1 performs Store 1 to X (setting X = 1) and Store 11 to Y (setting Y = 11).
  • T2 loads Y into R1, stores R1 back to Y', loads X into R2, and stores R2 back to X'.

The possible values of X' and Y':

Given that sequential consistency requires operations to be executed in some sequential order while respecting the order of operations within each task, we can deduce the following possible results for X' and Y':

The legitimate answers for (X', Y') are:

  • (1, 11): This is a consistent result because T1 stores X = 1 and Y = 11 before T2 accesses these values.
  • (0, 10): This could happen if T2 accesses the initial values of X = 0 and Y = 10 before T1 performs its stores.
  • (1, 10): This is possible if T1 stores X = 1 but T2 accesses Y = 10 before T1 stores Y = 11.
  • (0, 11): This is not possible under sequential consistency because if Y = 11, then X cannot remain 0. This would violate the constraint that memory operations from each task must appear in order.

Thus, the legitimate answers for (X', Y') are (1, 11), (0, 10), and (1, 10).

Additional Sequential Consistency Requirements:

In the example above, sequential consistency imposes additional memory ordering constraints beyond the basic uniprocessor program dependencies:

  • In a uniprocessor system, each processor's operations are naturally ordered. However, in a system with multiple processors, these operations can be interleaved in any order. Sequential consistency ensures that while the operations can be interleaved, the final outcome must respect the logical ordering of operations within each processor’s task.

For the example:

  • T1 stores X = 1 and Y = 11. In sequential consistency, the operations must respect this order, meaning that T2 cannot load X or Y before T1 performs its stores.

Systems with Caches or Out-of-Order Execution:

A system with caches or out-of-order execution capability can still provide a sequentially consistent view of memory, but enforcing this view may require additional synchronization mechanisms. In modern systems, caches and out-of-order execution can result in operations being executed out of order, which can violate the expected sequential order of memory accesses. To maintain sequential consistency in such systems, hardware or software mechanisms (like memory barriers or fences) are used to enforce the correct order of memory operations.

In a system with caches, the cache coherence protocol ensures that each processor's local cache is consistent with the main memory. However, maintaining sequential consistency might require additional checks to prevent out-of-order operations from being observed by other processors in an inconsistent manner.

4.Locks or Semaphores

E. W. Dijkstra, 1965

A semaphore is a non-negative integer, with the following operations:

  • P(s): If s > 0, decrement s by 1, otherwise wait (also called "try to reduce", or proberen te verlagen).
  • V(s): Increment s by 1 and wake up one of the waiting processes (also called "increase", or verhogen).

Atomicity:

  • P’s and V’s must be executed atomically, i.e., without interruptions or interleaved accesses to s by other processors.

Initial Value of s:

  • The initial value of s determines the maximum number of processes in the critical section.

Example of Process i:

c

P(s) <critical section> V(s)

Implementation of Semaphores

  • Semaphores (for mutual exclusion) can be implemented using ordinary Load and Store instructions in the Sequential Consistency memory model.
  • However, protocols for mutual exclusion are difficult to design, so a simpler solution is to use atomic read-modify-write instructions.

Examples of Atomic Instructions:

  • Test&Set (m), R:

    c

    R ← M[m]; if R == 0 then M[m] ← 1;
  • Swap (m), R:

    c

    Rt ← M[m]; M[m] ← R; R ← Rt;
  • Fetch&Add (m), RV, R:

    c

    R ← M[m]; M[m] ← R + RV;

Where m is a memory location, and R is a register.

Critical Section Example Using Test&Set Instruction

  • P (Critical Section Entry):

    c

    Test&Set (mutex), Rtemp; if (Rtemp != 0) goto P; Load Rhead, (head); spin: Load Rtail, (tail); if Rhead == Rtail goto spin; Load R, (Rhead); Rhead = Rhead + 1; Store Rhead, (head);
  • V (Critical Section Exit):

    c
    Store 0, (mutex); process(R);

Multiple Consumers Example Using Test & Set Instruction

  • Other atomic read-modify-write instructions (such as Swap, Fetch & Add, etc.) can also implement P’s and V’s.

Non blocking Synchronization

  • Compare & Swap(m), Rt, Rs:

    lua
    if (Rt == M[m]) then M[m] = Rs; Rs = Rt; status ← success; else status ← fail;
  • try:

    bash
    Load Rhead, (head)
  • spin:

    scss
    Load R tail, (tail) if Rhead == Rtail goto spin Load R, (Rhead) Rnewhead = Rhead + 1 Compare & Swap(head), Rhead, Rnewhead if (status == fail) goto try process(R)

Status is an implicit argument.


Load-link & Store-conditional (aka Load-reserve, Load-Locked)

  • Special register(s) to hold reservation flag and address, and the outcome of store-conditional:

    try:

    bash
    Load-link Rhead, (head)

    spin:

    scss
    Load Rtail, (tail) if Rhead == Rtail goto spin Load R, (Rhead) Rhead = Rhead + 1 Store-conditional Rhead, (head) if (status == fail) goto try process(R)
  • Load-link R, (m):

    css
    <flag, adr> ← <1, m>; R ← M[m];
  • Store-conditional (m), R:

    vbnet
    if <flag, adr> == <1, m> then cancel other procs' reservation on m; M[m] ← R; status ← succeed; else status ← fail;

Performance of Locks

  • Blocking atomic read-modify-write instructions:

    • e.g., Test & Set, Fetch & Add, Swap
  • Non-blocking atomic read-modify-write instructions:

    • e.g., Compare & Swap, Load-link/Store-conditional
  • Protocols based on ordinary Loads and Stores

  • Performance depends on several interacting factors:

    • Degree of contention
    • Caches
    • Out-of-order execution of Loads and Stores

Issues in Implementing Sequential Consistency

  • Implementation of SC is complicated by two issues:
    • Out-of-order execution capability:
      • Load(a); Load(b) yes
      • Load(a); Store(b) yes if a ≠ b
      • Store(a); Load(b) yes if a ≠ b
      • Store(a); Store(b) yes if a ≠ b
  • Caches:
    • Caches can prevent the effect of a store from being seen by other processors

Memory Fences

  • Instructions to sequentialize memory accesses.

  • Processors with relaxed or weak memory models permit Loads and Stores to different addresses to be reordered, removing some/all extra dependencies imposed by SC.

    Examples of relaxed memory models:

    • Total Store Order: LL, LS, SS, enforce SL with fence
    • Partial Store Order: LL, LS, enforce SL, SS with fences
    • Weak Ordering: enforce LL, LS, SL, SS with fences
  • Memory fences are expensive operations—mem instructions wait for all relevant instructions in-flight to complete (including stores to retire—need store acks). However, the cost of serialization only when it is required!


Using Memory Fences

  • Producer posting Item x:

    scss

    Load Rtail, (tail) Store x, (Rtail) MFenceSS Rtail = Rtail + 1 Store Rtail, (tail)
  • Consumer:

    bash

    Load Rhead, (head) spin: Load Rtail, (tail) if Rhead == Rtail goto spin MFenceLL Load R, (Rhead) Rhead = Rhead + 1 Store Rhead, (head) process(R)
  • Producer Consumer:

    • Tail and head pointers ensure synchronization in producer-consumer problems.

Mutual Exclusion Using Load/Store

  • A protocol based on two shared variables c1 and c2.
    • Initially, both c1 and c2 are 0 (not busy).
    • What is wrong?
      • Process 1:

        makefile

        c1 = 1; L: if c2 == 1 then go to L <critical section> c1 = 0;
      • Process 2:

        makefile

        c2 = 1; L: if c1 == 1 then go to L <critical section> c2 = 0;
      • Result: Deadlock


Mutual Exclusion: Second Attempt

  • To avoid deadlock, let a process give up the reservation (i.e., Process 1 sets c1 to 0) while waiting.

    • Deadlock is not possible but with a low probability, a livelock may occur.
    • An unlucky process may never get to enter the critical section → starvation.
  • Process 1:

    css

    L: c1 = 1; if c2 == 1 then { c1 = 0; go to L } <critical section> c1 = 0
  • Process 2:

    css

    L: c2 = 1; if c1 == 1 then { c2 = 0; go to L } <critical section> c2 = 0

A Protocol for Mutual Exclusion (T. Dekker, 1966)

  • A protocol based on 3 shared variables c1, c2, and turn.

    • Initially, both c1 and c2 are 0 (not busy).
    • turn == i ensures that only process i can wait.
    • Variables c1 and c2 ensure mutual exclusion.
  • Process 1:

    makefile

    c1 = 1; turn = 1; L: if c2 == 1 && turn == 1 then go to L <critical section> c1 = 0;
  • Process 2:

    makefile

    c2 = 1; turn = 2; L: if c1 == 1 && turn == 2 then go to L <critical section> c2 = 0;

N-process Mutual Exclusion (Lamport’s Bakery Algorithm)

  • Process i:

    css

    choosing[i] = 1; num[i] = max(num[0], …, num[N-1]) + 1; choosing[i] = 0; for (j = 0; j < N; j++) { while (choosing[j]); while (num[j] && ((num[j] < num[i]) || (num[j] == num[i] && j < i))); } num[i] = 0;
  • Initially, num[j] = 0, for all j.


Symmetric Multiprocessors

  • Memory and I/O:
    • Symmetric multiprocessors have equally distant memory from all processors and I/O that can be managed by any processor.
    • Includes architecture such as memory controllers, buses, and processors.

Memory Coherence in SMPs

  • Write-back: Memory and cache-2 have stale values.

  • Write-through: Cache-2 has a stale value.

  • Does this matter?

    • CPU-1 updates A to 200. Write-back ensures memory is up-to-date, whereas write-through ensures cache is updated.

Write-back Caches & SC

  • Program Execution:
    • T1:
      vbnet

      LD Y, R1 ST R1, Y' LD X, R2 ST R2, X'
  • Program T2:

    ST 1, X ST 11, Y

Write-through Caches & SC

  • Memory and Cache Inconsistencies:

    • Write-through caches do not preserve sequential consistency either.
  • Execution sequence:

    • T1 executed first, then T2 with write-through cache updates.

Cache Coherence vs. Memory Consistency

  • Cache coherence: Ensures that writes by one processor are visible to others for the same memory address.

  • Memory consistency model: Specifies when a write by one processor can be seen by another processor.

  • Combination of protocols:

    • Cache coherence protocols ensure that sequential consistency is achieved.

Parallel I/O

  • DMA (Direct Memory Access):
    • DMA allows I/O devices to autonomously transfer data to/from memory without CPU intervention.

Problems with Parallel I/O

  • Memory and Cache Issues:
    • When memory is dirty or stale, problems arise between memory and cache states.
    • The cache may hold stale data not visible to memory writes.

Snoopy Cache

  • Cache Snooping Mechanism:
    • A technique used to maintain memory coherence by having caches "snoop" on the memory bus to check if updates are necessary.

Snoopy Cache & Scalability

  • As the number of processors increases, managing cache coherence through snooping becomes more difficult due to bandwidth limitations and the complexities involved in monitoring bus traffic.

False Sharing

  • Definition: False sharing occurs when processors modify variables in the same cache line, causing unnecessary cache invalidations and performance degradation.

Post a Comment

Cookie Consent
We serve cookies on this site to analyze traffic, remember your preferences, and optimize your experience.
Oops!
It seems there is something wrong with your internet connection. Please connect to the internet and start browsing again.
AdBlock Detected!
We have detected that you are using adblocking plugin in your browser.
The revenue we earn by the advertisements is used to manage this website, we request you to whitelist our website in your adblocking plugin.
Site is Blocked
Sorry! This site is not available in your country.