V3nom's
  • Welcome
  • Getting Started
    • CEH v13
    • Basics of Networking
      • Network Models
        • Application Layer in OSI ->
        • Presentation Layer in OSI ->
          • Comprehensive list of character encoding formats
        • Session Layer in OSI ->
        • Transport Layer in OSI ->
        • Network Layer in OSI ->
        • Data Link Layer in OSI ->
        • Physical Layer ->
    • Arch Linux Installation Guide
    • How to add VBoxLinuxAdditions.run in Debian Based Linux Distros
    • C# Programming Language
  • Research Papers
    • Word Embedding for Anomaly Detection
    • Build your own Redis
    • Blockchain Technology
    • Interactive blocks
    • OpenAPI
    • Integrations
  • Risk Analysis & Mitigation Notes
    • Risk Analysis & Mitigation
      • Unit 1: An Introduction to Risk Management
      • Unit 2: The Threat Assessment Process
      • Unit 3: Vulnerability Issues
      • Unit 4 ( Risk Analysis & Mitigation )
      • Unit 5 ( Risk Analysis & Mitigation )
  • Ethical Hacking
    • Ethical Hacking Syllabus
      • Unit I: Introduction ( English )
      • Unit I: Introduction ( Hinglish )
      • Unit II: The Business Perspective ( English )
      • Unit II: The Business Perspective ( Hinglish )
      • Unit III: Preparing for a Hack ( English )
      • Unit III: Preparing for a Hack ( Hinglish )
      • Unit IV: Enumeration ( English )
      • Unit IV: Enumeration ( Hinglish )
      • Unit V: Deliverables ( English )
      • Unit V: Deliverables ( Hinglish )
  • .NET Framework Notes
    • .NET Framework Syllabus
      • Unit - I ( Hinglish Version )
      • Unit - I ( English - Version for exams )
      • Unit - II ( Hinglish Version - For Understanding )
      • Unit - II (English Version - for papers)
      • Unit - III ( Hinghlish Version )
      • Unit - III ( English - Version )
      • Unit - IV ( Hinglish Version )
      • Unit - IV ( English Version )
      • Unit - V ( Hinglish Version )
      • Unit - V ( English Version )
  • IOT
    • unit 1
    • unit 2
    • unit 3
    • unit 4
    • unit 5
  • AD-Hoc and Wireless Networks
    • Unit 1 ( Hinglish )
    • unit 2 Hinglish
    • All assignments answers with questions
    • Mind Maps for All Questions
    • Page
  • Distributed Systems
    • Unit 1
    • Unit 2
    • Unit 3
    • Unit 4
    • Unit 5
  • Group 1
    • 1’s and 2’s Complement
    • Direct Memory Access
    • Register Transfer Level
    • Interrupt-Based Input/Output (I/O)
    • Memory and CPU Design
    • Instruction Cycle
    • Addressing Modes
    • Pipelining
    • Three Types of Hazards
    • All Types of Differences Tables
    • Parallel Processing
    • Addition/Subtraction Conversion
    • Data Representation
    • Page 1
Powered by GitBook
On this page
  1. Distributed Systems

Unit 5

Unit 5: Fault Tolerance and Consensus - Complete Notes


1. Introduction to Fault Tolerance

Definition:

  • Fault Tolerance is the ability of a system to continue functioning correctly even in the presence of faults (e.g., hardware failures, software bugs, network issues).

Why It’s Important:

  • Ensures system reliability and availability.

  • Example: A banking system must continue working even if a server fails.

Key Concepts:

  1. Fault:

    • A defect or failure in a system component.

    • Example: A crashed server or a broken network link.

  2. Error:

    • A deviation from correct behavior due to a fault.

    • Example: Incorrect data due to a faulty sensor.

  3. Failure:

    • The system’s inability to perform its intended function.

    • Example: A website going offline due to a server crash.

Techniques for Fault Tolerance:

  1. Redundancy:

    • Adding extra components to ensure system functionality.

    • Example: Replicating data across multiple servers.

  2. Checkpointing:

    • Periodically saving the system state to recover from failures.

    • Example: Database systems saving transaction logs.

  3. Recovery:

    • Restoring the system to a correct state after a failure.

    • Example: Rebooting a server after a crash.


Mind Map for Introduction to Fault Tolerance:

Fault Tolerance  
├── Definition  
│   └── System functioning despite faults  
├── Key Concepts  
│   ├── Fault (Example: Crashed server)  
│   ├── Error (Example: Incorrect data)  
│   └── Failure (Example: Website offline)  
└── Techniques  
    ├── Redundancy (Example: Data replication)  
    ├── Checkpointing (Example: Transaction logs)  
    └── Recovery (Example: Server reboot)

2. Distributed Commit Protocols

Definition:

  • Distributed Commit Protocols ensure that a distributed transaction is either fully committed or fully rolled back across all nodes.

Why It’s Important:

  • Ensures data consistency in distributed systems.

  • Example: A distributed database must ensure that all nodes agree on whether to commit or abort a transaction.

Types of Commit Protocols:

  1. Two-Phase Commit (2PC):

    • A coordinator ensures all participants agree to commit or abort.

    • Phases:

      1. Prepare Phase: Coordinator asks participants if they are ready to commit.

      2. Commit Phase: If all participants agree, the coordinator sends a commit request.

    • Example: Distributed databases like MySQL Cluster.

  2. Three-Phase Commit (3PC):

    • Adds a pre-commit phase to reduce blocking in case of failures.

    • Phases:

      1. Prepare Phase: Same as 2PC.

      2. Pre-Commit Phase: Coordinator ensures all participants are ready to commit.

      3. Commit Phase: Coordinator sends the final commit request.

    • Example: High-availability systems like Google Spanner.

Comparison:

Feature

Two-Phase Commit (2PC)

Three-Phase Commit (3PC)

Phases

2 (Prepare, Commit)

3 (Prepare, Pre-Commit, Commit)

Blocking

Can block if coordinator fails

Non-blocking due to pre-commit phase

Use Case

Simple distributed systems

High-availability systems


Mind Map for Distributed Commit Protocols:

Distributed Commit Protocols  
├── Two-Phase Commit (2PC)  
│   ├── Phases: Prepare, Commit  
│   └── Example: MySQL Cluster  
├── Three-Phase Commit (3PC)  
│   ├── Phases: Prepare, Pre-Commit, Commit  
│   └── Example: Google Spanner  
└── Comparison  
    ├── Phases, Blocking, Use Case

3. Byzantine Fault Tolerance

Definition:

  • Byzantine Fault Tolerance (BFT) is the ability of a system to tolerate arbitrary (malicious) faults, where nodes may behave unpredictably or maliciously.

Why It’s Important:

  • Ensures system reliability even in the presence of malicious nodes.

  • Example: Blockchain systems must tolerate malicious nodes trying to disrupt the network.

How It Works:

  1. Consensus Algorithms:

    • Nodes agree on a single value despite malicious behavior.

    • Example: Practical Byzantine Fault Tolerance (PBFT).

  2. Redundancy:

    • Multiple nodes verify and agree on the correctness of data.

    • Example: Replicating transactions across multiple nodes in a blockchain.

Real-World Example:

  • Blockchain Networks: Bitcoin and Ethereum use BFT to ensure consensus among nodes, even if some nodes are malicious.


Mind Map for Byzantine Fault Tolerance:

Byzantine Fault Tolerance (BFT)  
├── Definition  
│   └── Tolerates arbitrary/malicious faults  
├── How It Works  
│   ├── Consensus Algorithms (Example: PBFT)  
│   └── Redundancy (Example: Blockchain)  
└── Real-World Example  
    └── Blockchain Networks (Bitcoin, Ethereum)

4. Impossibilities in Fault Tolerance

Definition:

  • Impossibilities in Fault Tolerance refer to theoretical limits on what can be achieved in fault-tolerant systems.

Key Impossibility Results:

  1. FLP Impossibility:

    • It is impossible to achieve consensus in an asynchronous distributed system with even a single faulty process.

    • Implication: No algorithm can guarantee consensus in all cases.

    • Example: In a network with delays, consensus may not always be reached.

  2. CAP Theorem:

    • A distributed system cannot simultaneously provide Consistency, Availability, and Partition Tolerance.

    • Implication: Systems must trade off between these properties.

    • Example:

      • CP Systems: Prioritize Consistency and Partition Tolerance (e.g., distributed databases).

      • AP Systems: Prioritize Availability and Partition Tolerance (e.g., NoSQL databases).

Real-World Example:

  • Distributed Databases: MongoDB (AP system) prioritizes availability, while PostgreSQL (CP system) prioritizes consistency.


Mind Map for Impossibilities in Fault Tolerance:

Impossibilities in Fault Tolerance  
├── FLP Impossibility  
│   ├── No consensus in asynchronous systems with faults  
│   └── Example: Network delays  
├── CAP Theorem  
│   ├── Trade-off between Consistency, Availability, Partition Tolerance  
│   └── Example: MongoDB (AP), PostgreSQL (CP)  
└── Real-World Example  
    └── Distributed Databases

Summary of Unit 5: Fault Tolerance and Consensus

  • Fault Tolerance ensures system reliability despite faults, using techniques like redundancy and recovery.

  • Distributed Commit Protocols (2PC, 3PC) ensure atomicity in distributed transactions.

  • Byzantine Fault Tolerance handles malicious behavior in distributed systems.

  • Impossibility Results (FLP, CAP) define theoretical limits in fault-tolerant systems.


Final Mind Map for Unit 5:

Unit 5: Fault Tolerance and Consensus  
├── Fault Tolerance  
│   ├── Definition, Key Concepts (Fault, Error, Failure)  
│   └── Techniques (Redundancy, Checkpointing, Recovery)  
├── Distributed Commit Protocols  
│   ├── Two-Phase Commit (2PC)  
│   ├── Three-Phase Commit (3PC)  
│   └── Comparison (Phases, Blocking, Use Case)  
├── Byzantine Fault Tolerance (BFT)  
│   ├── Definition, How It Works (Consensus, Redundancy)  
│   └── Real-World Example (Blockchain Networks)  
└── Impossibilities in Fault Tolerance  
    ├── FLP Impossibility (No consensus in asynchronous systems)  
    ├── CAP Theorem (Trade-off: Consistency, Availability, Partition Tolerance)  
    └── Real-World Example (Distributed Databases)

PreviousUnit 4Next1’s and 2’s Complement

Last updated 3 months ago