Replication Mechanism
This document provides a deep dive into HarmonyLite's replication process, explaining how changes propagate between nodes, how conflicts are resolved, and how to optimize the system for different workloads.
Core Principles
HarmonyLite implements a leaderless, eventually consistent replication system with the following design principles:
- Any node can accept writes without coordination
- Changes propagate asynchronously to all nodes
- Last-writer-wins for conflict resolution
- Eventual consistency rather than strong consistency
- Parallel processing through sharding
This approach prioritizes availability and partition tolerance over immediate consistency, making HarmonyLite an "AP" system in terms of the CAP theorem.
Change Data Capture (CDC)
Trigger Mechanism
HarmonyLite captures database changes using SQLite triggers.
The system creates three triggers for each monitored table:
- INSERT Trigger: Captures new row data
- UPDATE Trigger: Captures modified row data
- DELETE Trigger: Captures deleted row IDs
Here's a simplified example of an INSERT trigger:
CREATE TRIGGER __harmonylite__users_change_log_on_insert
AFTER INSERT ON users
BEGIN
    INSERT INTO __harmonylite__users_change_log(
        val_id, val_name, val_email,
        type, created_at, state
    ) VALUES(
        NEW.id, NEW.name, NEW.email,
        'insert',
        CAST((strftime('%s','now') || substr(strftime('%f','now'),4)) AS INT),
        0 -- Pending state
    );
    INSERT INTO __harmonylite___global_change_log (change_table_id, table_name)
    VALUES (
        last_insert_rowid(),
        'users'
    );
END;
Change Log Tables
Two types of tables track changes:
- Per-Table Change Logs: Store specific changes to each table
  - Named __harmonylite__<table_name>_change_log
  - Contain columns for all original table fields (prefixed with val_)
  - Include metadata like operation type, timestamp, and state
- Global Change Log: Tracks the sequence of all changes
  - Named __harmonylite___global_change_log
  - References the specific change log entries
  - Provides a chronological record of all operations
Schema Example
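HarmonyLite generates these tables automatically at startup. The definitions below are an illustrative sketch inferred from the trigger above; the exact column names and types may differ between versions.

-- Illustrative per-table change log for the users table
CREATE TABLE IF NOT EXISTS __harmonylite__users_change_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT, -- change log entry ID
    val_id     INTEGER,  -- copy of users.id
    val_name   TEXT,     -- copy of users.name
    val_email  TEXT,     -- copy of users.email
    type       TEXT,     -- 'insert', 'update', or 'delete'
    created_at INTEGER,  -- timestamp recorded by the trigger
    state      INTEGER   -- 0 = pending, 1 = published
);

-- Illustrative global change log referencing the per-table entries
CREATE TABLE IF NOT EXISTS __harmonylite___global_change_log (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    change_table_id INTEGER, -- row ID in the per-table change log
    table_name      TEXT     -- e.g. 'users'
);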
Change Detection and Publishing
Polling Mechanism
HarmonyLite periodically checks for changes using a polling mechanism:
- File Monitoring: Watches the SQLite database and WAL files for changes
- Change Detection: When changes are detected, queries the change log tables
- State Filtering: Retrieves only changes in the "pending" state (state = 0)
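Conceptually, the per-table poll is equivalent to a query like the one below; this is an illustrative sketch, not the exact SQL HarmonyLite runs internally.

-- Fetch pending changes for the users table in commit order
SELECT id, val_id, val_name, val_email, type, created_at
FROM __harmonylite__users_change_log
WHERE state = 0 -- pending
ORDER BY id;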
Hash-Based Sharding
For efficient parallel processing, changes are distributed across multiple streams based on a hash:
- Hash Calculation: Computed from table name and primary key values
- Shard Selection: The hash determines which stream receives the change
- Message Distribution: Changes to the same row always go to the same stream
This ensures that changes to the same row are processed in order, while allowing changes to different rows to be processed in parallel.
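The Go sketch below illustrates the idea; the specific hash function, stream naming, and shard numbering are assumptions for the example rather than HarmonyLite's exact implementation.

package main

import (
    "fmt"
    "hash/fnv"
)

// shardFor hashes the table name plus primary-key values so that all
// changes to the same row map to the same stream and stay ordered,
// while changes to different rows spread across shards for parallelism.
func shardFor(tableName string, pkValues []string, shards uint64) uint64 {
    h := fnv.New64a()
    h.Write([]byte(tableName))
    for _, v := range pkValues {
        h.Write([]byte(v))
    }
    return h.Sum64() % shards
}

func main() {
    shard := shardFor("users", []string{"456"}, 4)
    fmt.Printf("row users/456 always maps to shard %d\n", shard)
}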
Change Serialization
Changes are serialized for transmission using CBOR (Concise Binary Object Representation):
{
  "FromNodeId": 1,
  "Payload": {
    "Id": 123,
    "Type": "insert",
    "TableName": "users",
    "Row": {
      "id": 456,
      "name": "John Doe",
      "email": "john@example.com"
    }
  }
}
Optional compression using zstd can be enabled to reduce bandwidth usage.
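As a rough illustration, the Go snippet below serializes a change with CBOR and compresses it with zstd; the library choices (fxamacker/cbor and klauspost/compress) and the struct layout are assumptions for the example, not necessarily what HarmonyLite uses internally.

package main

import (
    "fmt"

    "github.com/fxamacker/cbor/v2"
    "github.com/klauspost/compress/zstd"
)

// ChangeEvent mirrors the logical structure shown above.
type ChangeEvent struct {
    FromNodeId uint64
    Payload    struct {
        Id        int64
        Type      string
        TableName string
        Row       map[string]any
    }
}

func main() {
    var ev ChangeEvent
    ev.FromNodeId = 1
    ev.Payload.Id = 123
    ev.Payload.Type = "insert"
    ev.Payload.TableName = "users"
    ev.Payload.Row = map[string]any{"id": 456, "name": "John Doe", "email": "john@example.com"}

    // Encode the change as CBOR...
    raw, err := cbor.Marshal(ev)
    if err != nil {
        panic(err)
    }

    // ...and optionally compress it with zstd before publishing.
    enc, _ := zstd.NewWriter(nil)
    compressed := enc.EncodeAll(raw, nil)
    fmt.Printf("cbor: %d bytes, zstd: %d bytes\n", len(raw), len(compressed))
}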
NATS JetStream Integration
HarmonyLite uses NATS JetStream for reliable message delivery:
- Stream Creation: Creates persistent streams for each shard
- Publication: Publishes changes to the appropriate stream
- Acknowledgment: Waits for confirmation from NATS
- State Update: Marks changes as published in the database (state = 1)
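The sketch below shows this flow with the nats.go client; the stream name, subject, and limits are placeholders chosen to echo the configuration options covered later, not HarmonyLite's exact values.

package main

import (
    "github.com/nats-io/nats.go"
)

func main() {
    // Connect to a NATS server (URL is a placeholder).
    nc, err := nats.Connect("nats://localhost:4222")
    if err != nil {
        panic(err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        panic(err)
    }

    // Ensure a persistent stream exists for this shard.
    if _, err := js.AddStream(&nats.StreamConfig{
        Name:     "harmonylite-changes-1",
        Subjects: []string{"harmonylite-changes-1"},
        Replicas: 3,
        MaxMsgs:  1024,
    }); err != nil {
        panic(err)
    }

    // Publish a serialized change; the returned ack is the signal to
    // mark the change as published (state = 1) in the local database.
    if _, err := js.Publish("harmonylite-changes-1", []byte("cbor-encoded change")); err != nil {
        panic(err)
    }
}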
Change Application
Message Flow
The complete flow from one node to another: the originating node's triggers record the change in the change log tables, the poller picks up the pending entry, serializes it (CBOR, optionally zstd-compressed), and publishes it to the JetStream shard selected by its hash; every other node consumes the message from that stream and applies it locally, as detailed below.
Processing on the Receiving Node
When a node receives a change message:
- Origin Check: Verifies the change didn't originate from this node (prevents loops)
- Duplicate Check: Ensures the change hasn't been applied already
- Table Existence: Confirms the target table exists
- Operation Parsing: Determines the operation type (INSERT, UPDATE, DELETE)
- SQL Generation: Constructs appropriate SQL statements
- Statement Execution: Applies the change to the local database
- Sequence Tracking: Updates the sequence map for recovery purposes
SQL Statement Generation
HarmonyLite generates SQL statements based on the operation type:
- INSERT: INSERT INTO table (col1, col2) VALUES (?, ?)
- UPDATE: UPDATE table SET col1 = ?, col2 = ? WHERE id = ?
- DELETE: DELETE FROM table WHERE id = ?
These statements are executed using SQLite's native APIs.
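For illustration, the Go sketch below applies a replicated insert with a parameterized statement via database/sql and the mattn/go-sqlite3 driver; the driver choice, column ordering, and use of INSERT OR REPLACE are assumptions for the example rather than HarmonyLite's exact code.

package main

import (
    "database/sql"
    "fmt"
    "sort"
    "strings"

    _ "github.com/mattn/go-sqlite3" // illustrative driver choice
)

// applyInsert builds a parameterized INSERT from a replicated row map
// and executes it against the local database.
func applyInsert(db *sql.DB, table string, row map[string]any) error {
    cols := make([]string, 0, len(row))
    for c := range row {
        cols = append(cols, c)
    }
    sort.Strings(cols) // deterministic column order

    ph := make([]string, len(cols))
    args := make([]any, len(cols))
    for i, c := range cols {
        ph[i] = "?"
        args[i] = row[c]
    }

    query := fmt.Sprintf("INSERT OR REPLACE INTO %s (%s) VALUES (%s)",
        table, strings.Join(cols, ", "), strings.Join(ph, ", "))
    _, err := db.Exec(query, args...)
    return err
}

func main() {
    db, _ := sql.Open("sqlite3", ":memory:")
    defer db.Close()
    db.Exec(`CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)`)

    err := applyInsert(db, "users", map[string]any{
        "id": 456, "name": "John Doe", "email": "john@example.com",
    })
    fmt.Println("applied:", err == nil)
}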
Conflict Resolution
Last-Writer-Wins Strategy
HarmonyLite uses a timestamp-based last-writer-wins (LWW) strategy for conflict resolution:
- Timestamp Comparison: Changes include microsecond-precision timestamps
- Latest Wins: When two nodes modify the same row, the later change prevails
- Node ID Tiebreaker: If timestamps are identical, higher node ID wins
This approach ensures eventual consistency but may result in lost updates in high-conflict scenarios.
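A minimal Go sketch of the decision logic, assuming the timestamp and node ID travel with each change:

package main

import "fmt"

// Change carries the metadata used for last-writer-wins resolution.
type Change struct {
    NodeID    uint64
    Timestamp int64 // timestamp recorded in the change log
}

// wins reports whether incoming should replace current: the later
// timestamp wins, and on an exact tie the higher node ID wins.
func wins(incoming, current Change) bool {
    if incoming.Timestamp != current.Timestamp {
        return incoming.Timestamp > current.Timestamp
    }
    return incoming.NodeID > current.NodeID
}

func main() {
    a := Change{NodeID: 1, Timestamp: 1000}
    b := Change{NodeID: 2, Timestamp: 1050}
    fmt.Println(wins(b, a)) // true: b is later, so it prevails
}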
Conflict Scenarios
Consider this example scenario:
- Node 1 updates a user's email to "john@example.com" at timestamp 1000
- Node 2 updates the same user's email to "john@company.com" at timestamp 1050
- Both changes replicate to all nodes
- The final state will be "john@company.com" (from Node 2) because it has a later timestamp
Multi-Column Conflicts
If different nodes update different columns of the same row:
- Node 1 updates a user's email at timestamp 1000
- Node 2 updates the same user's phone at timestamp 1050
- Column-level merging does not occur
- The entire row from Node 2 replaces the row from Node 1
- The email update from Node 1 is lost
Replication Configuration
Sharding Configuration
[replication_log]
# Number of shards for parallel processing
shards = 4
# Maximum entries per stream before cleanup
max_entries = 1024
# Number of stream replicas for fault tolerance
replicas = 3
# Enable compression for network efficiency
compress = true
Increasing shards improves parallel processing but requires more NATS resources.
Write-Only and Read-Only Nodes
HarmonyLite supports specialized node configurations:
# Write-only node (doesn't apply incoming changes)
publish = true
replicate = false
# Read-only node (doesn't publish changes)
publish = false
replicate = true
These settings enable custom topologies like hub-and-spoke or primary-replica.
Performance Considerations
Optimizing Replication Performance
To improve replication performance:
- Increase Shards for parallel processing:
  [replication_log]
  shards = 8 # More shards for higher throughput
- Enable Compression for bandwidth efficiency:
  [replication_log]
  compress = true
- Tune Cleanup Interval for reduced overhead:
  # Time in milliseconds between change log cleanup runs
  cleanup_interval = 60000 # 1 minute
Transaction Batching
When making multiple changes, use transactions to reduce overhead:
BEGIN TRANSACTION;
INSERT INTO users (name, email) VALUES ('John', 'john@example.com');
INSERT INTO users (name, email) VALUES ('Jane', 'jane@example.com');
COMMIT;
This reduces trigger overhead and improves replication efficiency.
Network Considerations
For optimal performance:
- Low-Latency Connections: Keep network latency below 50ms when possible
- Sufficient Bandwidth: Ensure adequate bandwidth for your change volume
- Reliable Network: Use retry mechanisms for unreliable connections
- Regional Clustering: Group nodes geographically for better performance
Replication Guarantees and Limitations
What HarmonyLite Guarantees
HarmonyLite provides these guarantees:
- Eventual Consistency: All nodes eventually reach the same state
- Change Preservation: Changes are not lost if a node is temporarily offline
- Order Preservation: Changes to the same row are applied in order
- Fault Tolerance: The system continues to function despite node failures
Limitations
Important limitations to be aware of:
- No Strong Consistency: There's a window where nodes have different states
- No Cross-Row Transactions: Transactions spanning multiple rows or tables may not be applied atomically across nodes
- Last-Writer-Wins only: No custom conflict resolution strategies
- No Quorum Enforcement: Updates proceed without waiting for acknowledgment from other nodes
Advanced Replication Patterns
Multi-Region Deployment
For global deployments, configure NATS gateways between regions:
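At the NATS layer this is typically done with gateway connections between per-region clusters; the snippet below is a minimal illustration with placeholder names, ports, and URLs (see the NATS documentation for the full syntax).

# region-a.conf (placeholder values)
server_name: nats-region-a
gateway {
  name: region-a
  port: 7222
  gateways: [
    { name: region-b, urls: ["nats://nats.region-b.example.com:7222"] }
  ]
}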
This setup enables efficient cross-region replication with local write capability.
Hub-and-Spoke Pattern
For edge computing scenarios:
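One way to wire this at the NATS layer is with leaf nodes, where each edge site runs a local server that connects upstream to a central hub, while the publish/replicate settings shown earlier control which nodes originate or apply changes. The snippet below is an illustrative sketch with placeholder values.

# edge-site.conf (placeholder values): the spoke's local NATS server
# connects to the central hub as a leaf node
leafnodes {
  remotes: [
    { url: "nats://hub.example.com:7422" }
  ]
}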
Read Scaling Pattern
For read-heavy workloads:
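A common arrangement pairs one or more writable nodes with a pool of read-only replicas, using the node-role settings shown earlier; the values below are an illustrative sketch.

# Writable node(s): accept application writes and publish changes
publish = true
replicate = true

# Read replicas: apply incoming changes but never publish their own
publish = false
replicate = true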
Conclusion
HarmonyLite's replication mechanism provides a flexible, efficient way to maintain database consistency across distributed nodes. By understanding these internals, you can better optimize, monitor, and troubleshoot your deployment.
For more information on related topics, see:
- Snapshots - How database state is preserved and restored
- NATS Configuration - Details on messaging configuration
- Production Deployment - Best practices for production