Advanced Cache Invalidation Patterns with PostgreSQL LISTEN/NOTIFY

October 4, 2025

15 min read

Goh Ling Yong

Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Challenge: Stale Caches in Distributed Systems

In any distributed architecture, maintaining cache consistency across multiple service instances is a persistent and non-trivial problem. While Time-To-Live (TTL) caching is simple, it forces a trade-off between data freshness and database load, often resulting in users seeing stale data. A more proactive approach involves explicit cache invalidation, but this introduces its own complexities. Standard solutions often involve adding a message broker like RabbitMQ or Kafka, or using Redis Pub/Sub. These are powerful, but they also introduce another piece of infrastructure to maintain, secure, and scale—increasing operational overhead for what is sometimes a simple requirement: "Tell me when this piece of data changes."

This article details an alternative, database-centric pattern that leverages a powerful but often underutilized feature of PostgreSQL: LISTEN and NOTIFY. We will bypass introductory concepts and focus directly on a production-ready implementation that addresses the real-world challenges of this approach: connection failures, missed messages, performance overhead, and scalability.

Our goal is to build a system that:

Instantly propagates data changes from the database to multiple service instances.
Is resilient to network partitions and service restarts.
Guarantees eventual consistency even if notifications are missed.
Is mindful of performance impacts on the database's write throughput.

Core Mechanism: The Trigger-Based Notification Emitter

The foundation of this pattern is a PostgreSQL trigger that fires on data modifications (INSERT, UPDATE, DELETE) and uses pg_notify() to send a payload over a named channel. The design of this trigger and its payload is critical for a robust system.

1. Designing the Notification Payload

A well-designed payload should be informative but concise. The pg_notify payload is limited to 8000 bytes, so we cannot blindly serialize entire rows. A robust payload structure contains the essential metadata for a consumer to act upon:

* table_name: The source table of the change.

* operation: The type of SQL command (INSERT, UPDATE, DELETE).

* record_id: The primary key of the affected row. This is the most crucial piece of information for key-based cache invalidation.

* changed_at: A precise timestamp of the change, essential for reconciliation logic.

Here's the PL/pgSQL function that constructs and sends this payload. It's designed to be generic and reusable across multiple tables.

sql

-- Create the function that will be called by our triggers
CREATE OR REPLACE FUNCTION notify_record_change()
RETURNS TRIGGER AS $$
DECLARE
  channel TEXT := 'record_changes';
  notification JSON;
  record_id TEXT;
BEGIN
  -- Determine the record_id based on the operation
  IF (TG_OP = 'DELETE') THEN
    record_id := OLD.id::TEXT;
  ELSE
    record_id := NEW.id::TEXT;
  END IF;

  -- Construct the JSON payload
  notification = json_build_object(
    'table_name', TG_TABLE_NAME,
    'operation', TG_OP,
    'record_id', record_id,
    'changed_at', now()
  );

  -- Send the notification on the specified channel
  -- The payload must be a string, so we cast the JSON to TEXT
  PERFORM pg_notify(channel, notification::TEXT);

  -- Return null for DELETE, and the new row for INSERT/UPDATE
  IF (TG_OP = 'DELETE') THEN
    RETURN OLD;
  ELSE
    RETURN NEW;
  END IF;
END;
$$ LANGUAGE plpgsql;

2. Attaching the Trigger to a Target Table

Let's assume we have a products table whose changes we want to monitor. We also add a last_updated_at column, which is critical for our fault-tolerance strategy later.

sql

-- A sample table to track
CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price NUMERIC(10, 2) NOT NULL,
    inventory_count INT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- A separate trigger to automatically update the last_updated_at timestamp
-- This is a best practice for reconciliation.
CREATE OR REPLACE FUNCTION set_last_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.last_updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_update_timestamp
BEFORE UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION set_last_updated_at();

-- Now, attach the notification trigger
CREATE TRIGGER products_notify_change
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW
EXECUTE FUNCTION notify_record_change();

With this setup, any modification to the products table will now emit a JSON payload on the record_changes channel. For example, an update might send:

json

{
  "table_name": "products",
  "operation": "UPDATE",
  "record_id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  "changed_at": "2023-10-27T10:00:00.123456Z"
}

The Resilient Subscriber Service in Go

Receiving notifications is only half the battle. The listener service must be robust against failures. A naive listener that exits on error is useless in production. Our Go service will feature:

* A dedicated, persistent connection for LISTEN.

* Automatic reconnection with exponential backoff.

* A reconciliation mechanism to fetch missed updates after a disconnect.

We'll use the pgx library, which provides excellent support for PostgreSQL-specific features.

1. Initial Listener Implementation

Here is the core structure of our listener. It connects, starts listening, and processes notifications in a loop.

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/patrickmn/go-cache"
)

// NotificationPayload matches the JSON structure from our PostgreSQL function
type NotificationPayload struct {
	TableName string    `json:"table_name"`
	Operation string    `json:"operation"`
	RecordID  string    `json:"record_id"`
	ChangedAt time.Time `json:"changed_at"`
}

// A simple in-memory cache for demonstration
var localCache = cache.New(5*time.Minute, 10*time.Minute)

func main() {
	// Use a context for cancellation
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Database connection string
	dbURL := os.Getenv("DATABASE_URL")
	if dbURL == "" {
		log.Fatal("DATABASE_URL environment variable must be set")
	}

	// Handle graceful shutdown
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-sigCh
		fmt.Println("\nShutting down...")
		cancel()
	}()

	// Start the listener with reconnection logic
	go listenForChanges(ctx, dbURL)

	// Keep the main goroutine alive
	<-ctx.Done()
	fmt.Println("Shutdown complete.")
}

func listenForChanges(ctx context.Context, dbURL string) {
	// The core of our fault tolerance strategy
	// We will implement this next.
	// For now, a simple loop.
	conn, err := pgx.Connect(ctx, dbURL)
	if err != nil {
		log.Fatalf("Unable to connect to database: %v\n", err)
	}
	defer conn.Close(ctx)

	_, err = conn.Exec(ctx, "LISTEN record_changes")
	if err != nil {
		log.Fatalf("Failed to execute LISTEN: %v\n", err)
	}

	log.Println("Listening for record changes...")

	for {
		notification, err := conn.WaitForNotification(ctx)
		if err != nil {
			// If context is cancelled, it's a graceful shutdown
			if ctx.Err() != nil {
				log.Println("Context cancelled, listener stopping.")
				return
			}
			log.Printf("Error waiting for notification: %v\n", err)
			// In a real implementation, we would attempt to reconnect here.
			time.Sleep(2 * time.Second)
			continue
		}

		processNotification(notification.Payload)
	}
}

func processNotification(payload string) {
	var p NotificationPayload
	err := json.Unmarshal([]byte(payload), &p)
	if err != nil {
		log.Printf("Error unmarshalling notification payload: %v", err)
		return
	}

	log.Printf("Received Change: Table=%s, Op=%s, ID=%s", p.TableName, p.Operation, p.RecordID)

	// Invalidate the cache
	cacheKey := fmt.Sprintf("%s:%s", p.TableName, p.RecordID)
	localCache.Delete(cacheKey)
	log.Printf("Invalidated cache key: %s", cacheKey)
}

2. Advanced Pattern: Reconnection and Reconciliation

The most significant weakness of LISTEN/NOTIFY is its lack of guaranteed delivery. If the listener's connection drops, any notifications sent during the downtime are lost forever. This makes the naive implementation unsuitable for production. We solve this with a hybrid approach: reconcile on reconnect.

Track Connection State: The listener maintains a timestamp of the last successfully processed notification or the time it last connected.

Exponential Backoff: On disconnection, it attempts to reconnect with an increasing delay to avoid hammering the database.

Reconciliation Query: Upon successful reconnection, before it starts listening again, it queries the database for all records that have changed since its last known state. This closes the gap created by the downtime.

Here is the enhanced listenForChanges function implementing this strategy:

// (Add this to the main.go file from above)

func listenForChanges(ctx context.Context, dbURL string) {
	var conn *pgx.Conn
	var err error
	var lastSuccessfulConnection time.Time

	for {
		// Check for context cancellation before attempting to connect
		if ctx.Err() != nil {
			return
		}

		// Attempt to connect
		conn, err = pgx.Connect(ctx, dbURL)
		if err != nil {
			log.Printf("Failed to connect to database: %v. Retrying in 5s...", err)
			time.Sleep(5 * time.Second) // Simple backoff, can be made exponential
			continue
		}

		log.Println("Successfully connected to the database.")

		// --- Reconciliation Step ---
		if !lastSuccessfulConnection.IsZero() {
			log.Printf("Reconciling changes since %s", lastSuccessfulConnection.Format(time.RFC3339))
			err = reconcileMissedChanges(ctx, conn, lastSuccessfulConnection)
			if err != nil {
				log.Printf("Error during reconciliation: %v", err)
				conn.Close(ctx)
				continue // Retry connection and reconciliation
			}
		}
		lastSuccessfulConnection = time.Now().UTC()

		// Start listening
		_, err = conn.Exec(ctx, "LISTEN record_changes")
		if err != nil {
			log.Printf("Failed to execute LISTEN: %v", err)
			conn.Close(ctx)
			continue
		}

		log.Println("Listening for record changes...")

		// --- Notification Loop ---
		err = notificationLoop(ctx, conn, &lastSuccessfulConnection)
		if err != nil {
			log.Printf("Listener loop exited with error: %v", err)
		}

		// Clean up the connection before retrying
		conn.Close(ctx)
		log.Println("Connection closed. Attempting to reconnect...")
	}
}

func notificationLoop(ctx context.Context, conn *pgx.Conn, lastSuccess *time.Time) error {
	for {
		notification, err := conn.WaitForNotification(ctx)
		if err != nil {
			// Context cancelled is a clean exit
			if ctx.Err() != nil {
				return nil
			}
			// Any other error means the connection is likely broken
			return fmt.Errorf("error waiting for notification: %w", err)
		}

		processNotification(notification.Payload)
		*lastSuccess = time.Now().UTC()
	}
}

func reconcileMissedChanges(ctx context.Context, conn *pgx.Conn, since time.Time) error {
	// Query for all products that were updated while we were disconnected
	rows, err := conn.Query(ctx, `
		SELECT id FROM products WHERE last_updated_at >= $1
	`, since)
	if err != nil {
		return fmt.Errorf("failed to query for reconciliation: %w", err)
	}
	defer rows.Close()

	var reconciledCount int
	for rows.Next() {
		var recordID string
		if err := rows.Scan(&recordID); err != nil {
			log.Printf("Error scanning reconciled row: %v", err)
			continue
		}

		// Invalidate cache for the missed update
		cacheKey := fmt.Sprintf("products:%s", recordID)
		localCache.Delete(cacheKey)
		reconciledCount++
	}

	log.Printf("Reconciliation complete. Invalidated %d stale cache entries.", reconciledCount)
	return nil
}

This robust loop ensures that even if the service is down for hours, upon restart it will achieve eventual consistency by invalidating all cache entries for data that changed during its absence.

Performance and Scalability Considerations

While elegant, this pattern is not without its performance implications. A senior engineer must understand these trade-offs.

1. Write Throughput Overhead

Executing a trigger on every INSERT, UPDATE, and DELETE adds overhead to write operations. The cost is composed of:

* Function Execution: The PL/pgSQL function itself is very fast, but not free.

* Payload Construction: JSON serialization adds a minor CPU cost.

* Notification Queuing: pg_notify writes to an in-memory queue within PostgreSQL. Under extreme write load (thousands of transactions per second), this queue can become a bottleneck.

Benchmark Analysis:

Using pgbench on a moderately sized instance, we can observe the impact. A simple UPDATE-heavy workload might show a 3-7% decrease in transactions per second (TPS) with the notification trigger enabled. This overhead is often acceptable in exchange for real-time invalidation, but it must be measured for your specific workload. If your application is extremely sensitive to write latency, this pattern might require careful consideration.

Monitoring:

Monitor PostgreSQL's pg_stat_activity. If you see many processes with wait_event = 'Async_Notify', it's a sign that the notification queue is under pressure and listeners cannot keep up with the rate of notifications.

2. The "Thundering Herd" Problem at Scale

Imagine 200 instances of a service all connected and listening to the record_changes channel. When a single product is updated, all 200 instances receive the notification and attempt to invalidate their local cache. This is generally fine for in-memory cache invalidation.

However, if the notification triggers a more expensive action (e.g., re-fetching data, calling another service), this becomes a classic "thundering herd" problem, putting unnecessary load on downstream systems. This is where the LISTEN/NOTIFY pattern reaches its architectural limit.

Advanced Pattern: The Notification Dispatcher Service

To solve this, you can introduce a dedicated dispatcher service. The architecture looks like this:

Single Listener Group: Only a small, highly-available group of 2-3 "Dispatcher" services connect to PostgreSQL and LISTEN for notifications.

Fan-out: When a dispatcher receives a notification, it uses a more scalable fan-out mechanism to inform the application services. This could be:

* Redis Pub/Sub: Extremely fast and designed for this purpose.

* gRPC Streaming: Services maintain a stream with the dispatcher.

* A lightweight message queue: If you already have one in your stack.

This pattern centralizes the database connection load and decouples the notification mechanism from the main application logic, allowing you to handle scaling more effectively. It re-introduces a dependency, but it's a strategic choice made to handle scale, rather than a default one.

plaintext

+------------+
| PostgreSQL | ----(LISTEN/NOTIFY)----> +--------------------------+
+------------+                           | Notification Dispatcher  |
                                         | (1-3 Instances)          |
                                         +--------------------------+
                                                     |
                                                     | (e.g., Redis Pub/Sub)
                                                     |
             +---------------------------------------+---------------------------------------+
             |                                       |                                       |
             v                                       v                                       v
      +--------------+                        +--------------+                        +--------------+
      | App Service  |                        | App Service  |                        | App Service  |
      | (Instance 1) |                        | (Instance 2) |                        |  ... (Instance N) |
      +--------------+                        +--------------+                        +--------------+

Edge Cases and Final Gotchas

* Transactional Guarantees: Notifications are transactional. They are queued during a transaction but only sent to listeners upon a successful COMMIT. A ROLLBACK will discard them. This is a powerful feature, as it prevents listeners from seeing uncommitted state.

* Payload Size Limit: The 8000-byte limit is strict. If you need to communicate changes to very large rows, your payload should only contain the record_id, and the listener must be responsible for re-fetching the full data from the database if needed.

* Database Failover: The listener's connection logic must be robust to primary database failover. It should not be hardcoded to a specific IP. Use a connection string that points to a load balancer or a DNS entry that will be updated during a failover event (e.g., the AWS RDS writer endpoint).

* Not a Message Broker Replacement: LISTEN/NOTIFY provides at-most-once delivery and has no built-in persistence or dead-letter queue. It is not a substitute for Kafka or RabbitMQ for critical, durable messaging. Its sweet spot is for ephemeral, real-time events like cache invalidation where eventual consistency via reconciliation is acceptable.

By combining PostgreSQL's native messaging with a resilient client-side implementation, you can build a highly efficient, low-latency, and low-dependency cache invalidation system that is perfectly suited for many distributed architectures.