1. Symmetric Multiprocessing (SMP)
- Definition: SMP (symmetric multiprocessing) refers to a computer architecture in which multiple identical processors are connected to a single shared main memory, with full access to all the I/O devices, unlike asymmetric multiprocessing. In other words, all the processors share a common memory and the same data path or I/O bus.
Characteristics of SMP
- Identical: All processors are identical and treated equally.
- Communication: Processors communicate with one another through the shared memory.
- Complexity: SMP systems are complex to design, since all units share the same memory and data bus.
- Expensive: SMP systems are costlier to build.
- Unlike asymmetric multiprocessing, where tasks are handled only by a master processor, here operating-system tasks are handled individually by each processor.
Applications
SMP is used in parallel processing, for example in time-sharing systems (TSS) that assign tasks to different processors running in parallel, and in TSS that use multithreading, i.e., multiple threads running simultaneously.
Advantages
- Throughput: Since any processor can run any task (unlike in asymmetric multiprocessing), more processes are executed per unit time.
- Reliability: The failure of one processor does not bring the whole system down, since all processors are equally capable, though throughput degrades somewhat.
Disadvantages
- Complex design: Since the OS must treat all processors equally, designing and managing such an operating system is difficult.
- Costlier: All processors share the common main memory, so a larger memory is required, making the system more expensive.
2. Components of SMP Systems
- Processors:
- Multiple processors or cores, each capable of executing tasks simultaneously.
- Equal access to the shared memory and I/O systems.
- Memory:
- Shared physical memory accessed by all processors.
- Memory is typically connected via a memory bus or crossbar switch.
- Inter-Processor Communication:
- Shared memory or message-passing mechanisms for communication.
- Cache Coherency Protocols like MESI ensure consistency across caches.
- I/O System:
- Shared I/O buses or controllers allowing processors to access I/O devices independently.
Use Cases for SMP
- Database Servers: Handling concurrent queries or transactions.
- High-Performance Computing (HPC): Scientific simulations and complex calculations.
- Web Servers: Parallel processing of requests for scalability and high throughput.
SMP System Challenges
- Managing Cache Coherency: Ensuring all processors see a consistent view of memory.
- Memory Bandwidth Limitations: Ensuring memory access does not become a bottleneck.
- Scaling Beyond a Certain Number of Processors: Performance gain diminishes as the number of processors increases.
3. Producer-Consumer Problem:
- Problem: A producer generates data and puts it into a shared buffer, while a consumer takes data from this buffer. The challenge arises when both the producer and consumer operate concurrently, which can cause synchronization issues:
- The consumer may attempt to consume data when the producer hasn't produced any yet, resulting in underflow (attempting to consume from an empty buffer).
- The producer may attempt to add data when the consumer hasn't consumed enough, leading to overflow (adding to a full buffer).
- Solution: Synchronization mechanisms like semaphores, mutexes, or condition variables ensure that the producer and consumer don't conflict with each other. The consumer waits for the producer to produce data, and the producer waits for space in the buffer if it's full.
4. Mutual Exclusion:
- Problem: Mutual exclusion ensures that only one process can access a critical section or shared resource at any given time. Without this, multiple processes might corrupt the resource or lead to inconsistent results.
- Example: In the producer-consumer scenario, the shared buffer is a resource that both processes access. If two processes attempt to modify the buffer simultaneously (e.g., both adding or removing items), it could lead to data corruption.
- Solution: Mutual exclusion mechanisms like mutexes or locks are used to prevent multiple processes from accessing the shared resource simultaneously. One process locks the resource, performs its task, and then releases the lock for the next process.
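As a minimal sketch, mutual exclusion with a POSIX mutex might look like this (the `shared_count` and `worker` names are illustrative, not from the original notes):

```c
#include <pthread.h>

int shared_count = 0;                                  // the shared resource
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    pthread_mutex_lock(&lock);     // acquire: blocks while another thread holds it
    shared_count++;                // critical section: exclusive access
    pthread_mutex_unlock(&lock);   // release the lock for the next thread
    return NULL;
}
```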
Example in a Producer-Consumer Setup:
- Shared Resource: A buffer where the producer places data, and the consumer takes data.
- Process 1 (Producer): Adds data to the buffer.
- Process 2 (Consumer): Takes data from the buffer.
Without synchronization:
- The consumer may attempt to consume data from an empty buffer (leading to an error or crash).
- The producer may attempt to add data when the buffer is full (leading to buffer overflow).
With synchronization:
- The producer waits if the buffer is full (using a semaphore or condition variable).
- The consumer waits if the buffer is empty.
- Both processes access the buffer in an orderly manner, avoiding conflicts.
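A compact sketch of this orderly access, assuming POSIX counting semaphores for the empty/full slot counts plus a mutex for the buffer itself (the buffer size and all names are illustrative):

```c
#include <pthread.h>
#include <semaphore.h>

#define N 8
int buffer[N];
int in = 0, out = 0;                 // producer / consumer indices

sem_t empty_slots;                   // initialized to N: free slots available
sem_t full_slots;                    // initialized to 0: items available
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void produce(int item) {
    sem_wait(&empty_slots);          // wait if the buffer is full
    pthread_mutex_lock(&m);
    buffer[in] = item;               // add the item
    in = (in + 1) % N;
    pthread_mutex_unlock(&m);
    sem_post(&full_slots);           // signal: one more item
}

int consume(void) {
    sem_wait(&full_slots);           // wait if the buffer is empty
    pthread_mutex_lock(&m);
    int item = buffer[out];          // take the item
    out = (out + 1) % N;
    pthread_mutex_unlock(&m);
    sem_post(&empty_slots);          // signal: one more free slot
    return item;
}
// setup (e.g., in main): sem_init(&empty_slots, 0, N); sem_init(&full_slots, 0, 0);
```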
5. Sequential Consistency in Memory Models
Sequential Consistency is a key concept in memory models that ensures all processes in a system operate in a way that results in a consistent view of memory. Specifically, it guarantees that the execution of operations appears as though they were executed in a sequential order, while still preserving the order of operations for each individual processor.
Definition:
As stated by Leslie Lamport, a system is sequentially consistent if the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each processor are executed in the order specified by the program. This means:
- All processes see the same order of operations.
- Each individual process’s operations occur in the order written in its program.
Example: Sequential Consistency with Tasks (T1, T2) and Shared Variables (X, Y)
Given:
- Two concurrent tasks: T1 and T2.
- Two shared variables `X` and `Y`, with initial values `X = 0` and `Y = 10`.

T1 (Task 1):
- Store 1 to `X` (i.e., `X = 1`).
- Store 11 to `Y` (i.e., `Y = 11`).

T2 (Task 2):
- Load `Y` into register `R1`.
- Store `R1` to `Y'`.
- Load `X` into register `R2`.
- Store `R2` to `X'`.

The operations and their effects:
- T1 performs `Store 1` to `X` (setting `X = 1`) and then `Store 11` to `Y` (setting `Y = 11`).
- T2 loads `Y` into `R1`, stores `R1` to `Y'`, then loads `X` into `R2` and stores `R2` to `X'`.

The possible values of `X'` and `Y'`:

Sequential consistency requires the operations to execute in some sequential order that respects the program order within each task. The legitimate answers for `(X', Y')` are:

- (1, 11): A consistent result; T1 stores `X = 1` and `Y = 11` before T2 accesses either value.
- (0, 10): Possible if T2 reads the initial values `X = 0` and `Y = 10` before T1 performs its stores.
- (1, 10): Possible if T1 stores `X = 1`, but T2 reads `Y = 10` before T1 stores `Y = 11`.
- (0, 11): Not possible under sequential consistency. T2 reads `Y` before `X`, and T1 writes `X` before `Y`; so if T2 sees `Y = 11`, T1's earlier store `X = 1` must also be visible, and `X'` cannot remain `0`. This outcome would violate the constraint that memory operations from each task appear in program order.

Thus, the legitimate answers for `(X', Y')` are (1, 11), (0, 10), and (1, 10).
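To make the example concrete, here is a minimal C sketch of the two tasks using C11 `<threads.h>` (pthreads would work equally well) and sequentially consistent atomics, the C11 default ordering. Under this model the program can print (1, 11), (0, 10), or (1, 10), but never (0, 11). The names mirror the example above; this is an illustration, not code from the original notes.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int X = 0, Y = 10;        // shared variables from the example
int Xp, Yp;                      // X' and Y'

int t1(void *arg) {
    (void)arg;
    atomic_store(&X, 1);         // Store 1 to X
    atomic_store(&Y, 11);        // Store 11 to Y
    return 0;
}

int t2(void *arg) {
    (void)arg;
    Yp = atomic_load(&Y);        // R1 <- Y;  Y' <- R1
    Xp = atomic_load(&X);        // R2 <- X;  X' <- R2
    return 0;
}

int main(void) {
    thrd_t a, b;
    thrd_create(&a, t1, NULL);
    thrd_create(&b, t2, NULL);
    thrd_join(a, NULL);
    thrd_join(b, NULL);
    printf("(X', Y') = (%d, %d)\n", Xp, Yp);  // never prints (0, 11)
    return 0;
}
```

On a machine with a weaker memory model, replacing the sequentially consistent atomics with plain variables would re-admit the (0, 11) outcome.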
Additional Sequential Consistency Requirements:
In the example above, sequential consistency imposes additional memory ordering constraints beyond the basic uniprocessor program dependencies:
- In a uniprocessor system, each processor's operations are naturally ordered. However, in a system with multiple processors, these operations can be interleaved in any order. Sequential consistency ensures that while the operations can be interleaved, the final outcome must respect the logical ordering of operations within each processor’s task.
For the example:
- T1 stores `X = 1` and then `Y = 11`. Sequential consistency requires these stores to become visible in that order, so T2 cannot observe the new value of `Y` while still observing the old value of `X`.
Systems with Caches or Out-of-Order Execution:
A system with caches or out-of-order execution capability can still provide a sequentially consistent view of memory, but enforcing this view may require additional synchronization mechanisms. In modern systems, caches and out-of-order execution can result in operations being executed out of order, which can violate the expected sequential order of memory accesses. To maintain sequential consistency in such systems, hardware or software mechanisms (like memory barriers or fences) are used to enforce the correct order of memory operations.
In a system with caches, the cache coherence protocol ensures that each processor's local cache is consistent with the main memory. However, maintaining sequential consistency might require additional checks to prevent out-of-order operations from being observed by other processors in an inconsistent manner.
6. Locks or Semaphores
E. W. Dijkstra, 1965
A semaphore is a non-negative integer `s`, with the following operations:
- P(s): If `s > 0`, decrement `s` by 1; otherwise wait (also called "try to reduce", from the Dutch proberen te verlagen).
- V(s): Increment `s` by 1 and wake up one of the waiting processes (also called "increase", from the Dutch verhogen).

Atomicity:
- P's and V's must be executed atomically, i.e., without interruptions or interleaved accesses to `s` by other processors.

Initial value of `s`:
- The initial value of `s` determines the maximum number of processes in the critical section.
Example of Process i:
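The original listing is not preserved; a minimal sketch using POSIX semaphores (`sem_wait`/`sem_post` play the roles of P and V) might be:

```c
#include <semaphore.h>

sem_t s;   // initialize elsewhere: sem_init(&s, 0, k) allows k processes inside

void process_i(void) {
    for (;;) {
        sem_wait(&s);        // P(s): wait until s > 0, then decrement
        /* critical section */
        sem_post(&s);        // V(s): increment and wake a waiter
        /* non-critical work */
    }
}
```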
Implementation of Semaphores
- Semaphores (for mutual exclusion) can be implemented using ordinary Load and Store instructions in the Sequential Consistency memory model.
- However, protocols for mutual exclusion are difficult to design, so a simpler solution is to use atomic read-modify-write instructions.
Examples of Atomic Instructions:
- Test&Set (m), R
- Swap (m), R
- Fetch&Add (m), RV, R

where `m` is a memory location and `R`, `RV` are registers.
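The notes do not preserve the definitions of these operations. Their commonly used register-transfer semantics, sketched here as C functions (on real hardware each executes as a single indivisible instruction; plain C only documents the effect), are roughly:

```c
// Sketch of the usual semantics; each routine must be indivisible in hardware.
int test_and_set(int *m) {           // Test&Set (m), R
    int R = *m;                      // R <- M[m]
    *m = 1;                          // M[m] <- 1
    return R;
}

int swap(int *m, int R) {            // Swap (m), R
    int tmp = *m;                    // exchange register and memory
    *m = R;
    return tmp;                      // R <- old M[m]
}

int fetch_and_add(int *m, int RV) {  // Fetch&Add (m), RV, R
    int R = *m;                      // R <- M[m]
    *m = R + RV;                     // M[m] <- M[m] + RV
    return R;
}
```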
Critical Section Example Using Test&Set Instruction
P (Critical Section Entry):
V (Critical Section Exit):
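The original listings are not preserved; a typical realization of this spin-lock entry/exit, using C11's `atomic_flag` (whose test-and-set corresponds to the instruction above), might be:

```c
#include <stdatomic.h>

atomic_flag mutex = ATOMIC_FLAG_INIT;     // clear = free, set = held

void P(void) {                            // critical-section entry
    while (atomic_flag_test_and_set(&mutex))
        ;                                 // spin while the lock is held
}

void V(void) {                            // critical-section exit
    atomic_flag_clear(&mutex);            // M[mutex] <- 0
}
```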
Multiple Consumers Example Using Test & Set Instruction
- Other atomic read-modify-write instructions (such as Swap, Fetch & Add, etc.) can also implement P’s and V’s.
Non-blocking Synchronization
Compare&Swap (m), Rt, Rs:
- Status is an implicit argument reporting whether the swap succeeded; the `try:`/`spin:` labels in the original listing mark the retry loop, sketched below.
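A sketch of the try/spin retry pattern using C11's compare-exchange as a stand-in for Compare&Swap (the boolean return plays the role of the implicit status):

```c
#include <stdatomic.h>

// Non-blocking update: read, compute, then attempt to commit; retry if
// another thread changed the location in between (status == fail).
void nonblocking_increment(atomic_int *m) {
    int Rt, Rs;
    do {
        Rt = atomic_load(m);      // try: Rt <- M[m]
        Rs = Rt + 1;              // compute the new value
    } while (!atomic_compare_exchange_weak(m, &Rt, Rs));  // spin on failure
}
```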
Load-link & Store-conditional (aka Load-reserve, Load-locked)
Special register(s) hold the reservation flag and address, and the outcome of the store-conditional. The listing follows the same `try:`/`spin:` retry pattern, built from two instructions:
- Load-link R, (m)
- Store-conditional (m), R
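The listing itself is not preserved; below is a hedged C model. Real load-link/store-conditional hardware (e.g., ARM's LDREX/STREX or RISC-V's LR/SC) tracks a reservation on the address and fails the store-conditional if any other write intervenes; modeling it with a compare-exchange, as here, approximates that behavior but misses ABA-style races that a true reservation would catch.

```c
#include <stdatomic.h>
#include <stdbool.h>

_Thread_local int reservation;            // models the per-processor reservation

int load_link(atomic_int *m) {
    reservation = atomic_load(m);         // load and set the reservation
    return reservation;
}

bool store_conditional(atomic_int *m, int R) {
    // succeeds only if *m still holds the value seen at the load-link
    return atomic_compare_exchange_strong(m, &reservation, R);
}

void atomic_increment(atomic_int *m) {
    int R;
    do {
        R = load_link(m) + 1;             // try: read and compute
    } while (!store_conditional(m, R));   // spin: retry if reservation lost
}
```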
Performance of Locks
Blocking atomic read-modify-write instructions:
- e.g., Test & Set, Fetch & Add, Swap
Non-blocking atomic read-modify-write instructions:
- e.g., Compare & Swap, Load-link/Store-conditional
Protocols based on ordinary Loads and Stores
Performance depends on several interacting factors:
- Degree of contention
- Caches
- Out-of-order execution of Loads and Stores
Issues in Implementing Sequential Consistency
- Implementation of SC is complicated by two issues:
  - Out-of-order execution capability — pairs of accesses that hardware may reorder without violating single-thread semantics:
    - Load(a); Load(b): yes
    - Load(a); Store(b): yes, if a ≠ b
    - Store(a); Load(b): yes, if a ≠ b
    - Store(a); Store(b): yes, if a ≠ b
  - Caches:
    - Caches can prevent the effect of a store from being seen by other processors.
Memory Fences
Instructions to sequentialize memory accesses.
Processors with relaxed or weak memory models permit Loads and Stores to different addresses to be reordered, removing some/all extra dependencies imposed by SC.
Examples of relaxed memory models:
- Total Store Order (TSO): preserves LL, LS, and SS ordering (Load→Load, Load→Store, Store→Store); SL (Store→Load) must be enforced with a fence.
- Partial Store Order (PSO): preserves LL and LS ordering; SL and SS must be enforced with fences.
- Weak Ordering (WO): LL, LS, SL, and SS must all be enforced with fences.
Memory fences are expensive operations: a memory instruction after a fence must wait for all relevant in-flight instructions to complete (including stores retiring, which requires store acknowledgments). However, the cost of this serialization is paid only when it is actually required.
Using Memory Fences
Producer posting Item x:
Consumer:
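The original listings are not preserved here; below is a minimal single-producer/single-consumer sketch in C11, with explicit fences standing in for the Membar-style instructions such listings typically use. The buffer size and the omission of a full-buffer check are simplifications.

```c
#include <stdatomic.h>

#define N 1024
int buffer[N];
atomic_int head, tail;          // consumer reads at head, producer writes at tail

void produce(int x) {
    int t = atomic_load_explicit(&tail, memory_order_relaxed);
    buffer[t % N] = x;                              // write the item
    atomic_thread_fence(memory_order_release);      // fence: item visible before tail bump
    atomic_store_explicit(&tail, t + 1, memory_order_relaxed);
}

int consume(void) {
    int h = atomic_load_explicit(&head, memory_order_relaxed);
    while (atomic_load_explicit(&tail, memory_order_relaxed) == h)
        ;                                           // spin: buffer empty
    atomic_thread_fence(memory_order_acquire);      // fence: read tail before item
    int x = buffer[h % N];
    atomic_store_explicit(&head, h + 1, memory_order_relaxed);
    return x;
}
```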
- Tail and head pointers ensure synchronization in producer-consumer problems.
Mutual Exclusion Using Load/Store
- A protocol based on two shared variables `c1` and `c2`.
- Initially, both `c1` and `c2` are 0 (not busy).
- What is wrong? (See the sketch below.)
Process 1:
Process 2:
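The process bodies are not preserved in these notes; the standard reconstruction of this broken protocol, written as plain C under the notes' sequentially consistent model (a real compiler would need atomics or volatile to prevent reordering), is:

```c
// Each process marks itself busy, then waits for the other. If both set
// their flags before either tests the other's, both spin forever.
int c1 = 0, c2 = 0;   // shared; 0 = not busy

void process1(void) {
    c1 = 1;                   // announce intent
    while (c2 == 1) ;         // wait until process 2 is not busy
    /* critical section */
    c1 = 0;                   // release
}

void process2(void) {         // symmetric
    c2 = 1;
    while (c1 == 1) ;
    /* critical section */
    c2 = 0;
}
```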
Result: Deadlock
Mutual Exclusion: Second Attempt
- To avoid deadlock, let a process give up its reservation while waiting (i.e., Process 1 sets `c1` back to 0, then retries).
- Deadlock is no longer possible, but with low probability a livelock may occur.
- An unlucky process may never get to enter the critical section → starvation.
Process 1:
Process 2:
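Again, the listings are not preserved; a sketch of the retrying version (same sequential-consistency caveat as above):

```c
// Back off by clearing the flag while the other process is busy. No
// deadlock, but both processes can set and clear their flags in lockstep
// indefinitely (livelock).
int c1 = 0, c2 = 0;

void process1(void) {
    for (;;) {
        c1 = 1;               // announce intent
        if (c2 == 0) break;   // other not busy: enter
        c1 = 0;               // give up the reservation and retry
    }
    /* critical section */
    c1 = 0;
}
// process2 is symmetric, with c1 and c2 swapped.
```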
A Protocol for Mutual Exclusion (T. Dekker, 1966)
A protocol based on 3 shared variables: `c1`, `c2`, and `turn`.
- Initially, both `c1` and `c2` are 0 (not busy).
- `turn == i` ensures that only process `i` can wait.
- Variables `c1` and `c2` ensure mutual exclusion.
Process 1:
Process 2:
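A hedged sketch of Process 1 under this protocol (Process 2 is symmetric, with the roles of `c1`/`c2` and the `turn` values swapped); as with the earlier sketches, plain loads and stores assume the sequentially consistent model used in the notes:

```c
int c1 = 0, c2 = 0;   // busy flags
int turn = 1;         // which process must wait when both are busy

void process1(void) {
    c1 = 1;                          // announce intent
    turn = 1;                        // volunteer to be the one who waits
    while (c2 == 1 && turn == 1) ;   // wait only while it is our turn to wait
    /* critical section */
    c1 = 0;                          // release
}
```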
N-process Mutual Exclusion (Lamport’s Bakery Algorithm)
Process i:
Initially, `num[j] = 0` for all `j`.
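Only the initialization survives in the notes; the standard formulation of the algorithm (shared arrays `num[]` and `choosing[]`, with ticket numbers compared and process id as tie-breaker), assuming sequentially consistent memory, can be sketched as:

```c
#define N 8
int choosing[N];      // process j is currently picking a ticket
int num[N];           // ticket numbers; 0 = not competing

void lock(int i) {
    choosing[i] = 1;
    int max = 0;                          // take a ticket larger than any seen
    for (int j = 0; j < N; j++)
        if (num[j] > max) max = num[j];
    num[i] = max + 1;
    choosing[i] = 0;
    for (int j = 0; j < N; j++) {
        while (choosing[j]) ;             // wait while j is picking a ticket
        while (num[j] != 0 &&             // wait while j holds a smaller ticket
               (num[j] < num[i] || (num[j] == num[i] && j < i))) ;
    }
}

void unlock(int i) { num[i] = 0; }
```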
Symmetric Multiprocessors
- Memory and I/O:
- Memory is equidistant from all processors (uniform access time), and I/O can be managed by any processor.
- The architecture comprises the processors, memory controllers, and buses.
Memory Coherence in SMPs
Suppose each CPU has its own cache and CPU-1 updates A to 200 in its cache:
- Write-back: both memory and cache-2 hold stale values of A until the dirty line is written back.
- Write-through: memory is updated immediately, but cache-2 still holds a stale value.
Does this matter? It does as soon as CPU-2 reads A: without a coherence mechanism, it may read the stale copy.
Write-back Caches & SC
- Program Execution: suppose T1 and T2 from the earlier example run on different processors with write-back caches. T1's stores (`X = 1`, then `Y = 11`) may sit in CPU-1's cache while memory still holds the old values, so T2 can read a mix of old and new values that no sequential interleaving allows. Write-back caches therefore do not preserve sequential consistency by themselves.
Write-through Caches & SC
- Memory and Cache Inconsistencies: write-through caches do not preserve sequential consistency either. Even if T1 executes first and each of its stores updates memory immediately, CPU-2's cache may still hold stale copies of `X` or `Y`, so T2 can observe the stores out of order.
Cache Coherence vs. Memory Consistency
Cache coherence: ensures that writes by one processor to a given memory address become visible to other processors, and that writes to that address are seen in a consistent order.
Memory consistency model: specifies when a write by one processor can be seen by another processor, across all addresses.
Combination of protocols:
- A cache coherence protocol alone does not guarantee sequential consistency; combined with the ordering rules the processor enforces (e.g., via fences), it makes sequentially consistent execution achievable.
Parallel I/O
- DMA (Direct Memory Access):
- DMA allows I/O devices to autonomously transfer data to/from memory without CPU intervention.
Problems with Parallel I/O
- Memory and Cache Issues:
  - A DMA write to memory is not seen by a cache that still holds a stale copy of the affected line.
  - A DMA read from memory can miss dirty data that exists only in a write-back cache.
Snoopy Cache
- Cache Snooping Mechanism:
- A technique used to maintain memory coherence by having caches "snoop" on the memory bus to check if updates are necessary.
Snoopy Cache & Scalability
- As the number of processors increases, managing cache coherence through snooping becomes more difficult due to bandwidth limitations and the complexities involved in monitoring bus traffic.
False Sharing
- Definition: False sharing occurs when processors modify different variables that happen to reside in the same cache line; the line ping-pongs between caches, causing unnecessary invalidations and performance degradation even though no data is actually shared.
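A hedged illustration: the two counters below are logically independent, but without padding they would typically share one cache line, so two threads updating them would invalidate each other's caches. Padding to an assumed (common) 64-byte line size separates them.

```c
// Each counter padded to its own cache line; 64 bytes is an assumption,
// not a universal constant.
struct padded { long value; char pad[64 - sizeof(long)]; };

struct padded counters[2];      // one per thread; each on its own line

void worker(int id, long iters) {
    for (long i = 0; i < iters; i++)
        counters[id].value++;   // no false sharing thanks to the padding
}
```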