Diagnosing PostgreSQL Lock Contention with pg_locks & pg_stat_activity

19 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Production Scenario: When Everything Grinds to a Halt

You're on call. Application performance metrics have fallen off a cliff. Transaction latency is through the roof, CPU usage on the database is surprisingly low, and connection pools are exhausted. The application logs are a sea of timeout errors. This isn't a simple slow query; this is a concurrency crisis. The prime suspect: lock contention.

For senior engineers managing high-throughput PostgreSQL databases, this scenario is all too familiar. The challenge isn't knowing what locks are, but having a precise, surgical methodology to identify the exact session, query, and lock type that is the root cause of a cascading failure. Basic queries showing a single blocked process are insufficient. In a complex system, you're often dealing with chains of blockers: Process A blocks B, which blocks C and D, which in turn block a dozen other processes. Your task is to find Process A, the "head blocker," and understand precisely what it's doing and why.

This guide provides a production-ready playbook for dissecting these complex lock scenarios using PostgreSQL's indispensable system catalog views: pg_locks and pg_stat_activity. We will forego introductory concepts and dive straight into the advanced queries and analysis techniques required to solve real-world contention bottlenecks.

Core Diagnostic Views: A Quick Refresher on Key Columns

While we assume familiarity, a brief review of the critical columns we'll be using is essential for context. Our analysis hinges on the precise interplay of data from these two views.

pg_stat_activity - The "Who" and "What"

This view provides a snapshot of all active processes (backends) connected to the database server.

  • pid: The process ID of the backend. Our primary key for joining with pg_locks.
  • query: The full text of the currently executing query. Absolutely critical.
  • state: The current status of the backend. We're especially interested in active and the notorious idle in transaction.
  • wait_event_type & wait_event: The most important columns for diagnosing contention. When a process is blocked, these will tell you why. For our purposes, we're looking for wait_event_type = 'Lock'. The wait_event will specify the type of lock (e.g., relation, transactionid, tuple).
  • xact_start: The timestamp when the current transaction started. Crucial for identifying long-running and idle in transaction sessions.
  • pg_locks - The "Why" and "How"

    This view shows detailed information about the locks held by active processes.

  • pid: The process ID holding or waiting for the lock.
  • locktype: The type of object being locked (e.g., relation, tuple, transactionid).
  • relation: The OID of the table/index being locked. We join this to pg_class to get the table name.
  • page, tuple: Identifies the specific page or tuple (row) being locked, providing granularity.
  • mode: The type of lock being held or requested (e.g., AccessExclusiveLock, RowExclusiveLock).
  • granted: A boolean indicating if the lock has been acquired (true) or if the process is waiting for it (false). This is the cornerstone of our diagnostic queries.
  • The Foundational Query: Identifying Blockers and Waiters

    Our investigation starts with a query that joins these two views to create a clear picture of who is blocking whom. This isn't a simple join; it's a self-join on pg_locks correlated with pg_stat_activity to link waiting processes with the processes that hold the locks they need.

    sql
    SELECT
        -- Information about the blocked process
        blocked_locks.pid     AS blocked_pid,
        blocked_activity.usename  AS blocked_user,
        blocked_activity.query    AS blocked_query,
        blocked_activity.state    AS blocked_state,
        blocked_activity.wait_event_type AS blocked_wait_type,
        blocked_activity.wait_event AS blocked_wait_event,
    
        -- Information about the blocking process
        blocking_locks.pid     AS blocking_pid,
        blocking_activity.usename  AS blocking_user,
        blocking_activity.query    AS blocking_query,
        blocking_activity.state    AS blocking_state,
        
        -- Details about the lock itself
        now() - blocking_activity.xact_start AS blocking_duration,
        now() - blocked_activity.query_start AS blocked_duration,
        blocking_locks.locktype AS lock_type,
        blocking_locks.mode AS lock_mode,
        blocking_locks.relation::regclass AS locked_relation
    FROM pg_catalog.pg_locks blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity 
        ON blocked_locks.pid = blocked_activity.pid
    JOIN pg_catalog.pg_locks blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity 
        ON blocking_locks.pid = blocking_activity.pid
    WHERE NOT blocked_locks.granted;

    Deconstructing the Join Logic

    The complexity here lies in the JOIN condition between blocking_locks and blocked_locks. We are looking for a row in pg_locks where granted is false (the waiter) and finding another row in pg_locks where granted is true (the holder) for the exact same locked object. The series of IS NOT DISTINCT FROM clauses ensures we match on the specific object, whether it's a relation, a tuple, a transaction ID, or another lockable resource.

    This query is your first line of defense. It will immediately show you direct, one-to-one blocking scenarios. But in a real crisis, the output of this query might show multiple blockers and waiters, forming a confusing graph. This is where we need a more powerful tool.

    Advanced Scenario 1: Unraveling Lock Chains with Recursive CTEs

    When Process C is blocked by B, which is blocked by A, the foundational query will show that B blocks C, and A blocks B. It doesn't give you a clear, hierarchical view of the entire dependency chain or easily identify Process A as the ultimate "head blocker." To solve this, we use a recursive Common Table Expression (CTE).

    This query traverses the lock graph, starting from waiting processes and working its way up to find the root cause.

    sql
    WITH RECURSIVE lock_waits AS (
        -- Base case: Select all sessions that are waiting for a lock.
        -- These are the starting points of our lock chains.
        SELECT
            pid,
            locktype,
            relation,
            page,
            tuple,
            virtualxid,
            transactionid,
            classid,
            objid,
            objsubid,
            granted
        FROM pg_locks
        WHERE NOT granted
    
        UNION ALL
    
        -- Recursive step: Find the session that is holding the lock
        -- that the previous level in the recursion is waiting for.
        SELECT
            l.pid,
            l.locktype,
            l.relation,
            l.page,
            l.tuple,
            l.virtualxid,
            l.transactionid,
            l.classid,
            l.objid,
            l.objsubid,
            l.granted
        FROM pg_locks l, lock_waits w
        WHERE
            l.granted AND
            l.locktype = w.locktype AND
            l.database IS NOT DISTINCT FROM (SELECT datid FROM pg_database WHERE datname = current_database()) AND
            l.relation IS NOT DISTINCT FROM w.relation AND
            l.page IS NOT DISTINCT FROM w.page AND
            l.tuple IS NOT DISTINCT FROM w.tuple AND
            l.virtualxid IS NOT DISTINCT FROM w.virtualxid AND
            l.transactionid IS NOT DISTINCT FROM w.transactionid AND
            l.classid IS NOT DISTINCT FROM w.classid AND
            l.objid IS NOT DISTINCT FROM w.objid AND
            l.objsubid IS NOT DISTINCT FROM w.objsubid
    )
    SELECT
        lw.pid,
        sa.usename,
        sa.application_name,
        sa.state,
        -- Use a CASE statement to clearly identify waiters vs. holders
        CASE WHEN lw.granted THEN 'HOLDER' ELSE 'WAITER' END AS lock_status,
        -- Show the chain of PIDs
        (SELECT array_agg(pid) FROM pg_catalog.pg_locks l WHERE NOT l.granted AND l.locktype = lw.locktype AND l.relation = lw.relation) AS waiting_pids,
        sa.query,
        lw.locktype,
        lw.relation::regclass,
        now() - sa.xact_start AS transaction_duration
    FROM lock_waits lw
    JOIN pg_stat_activity sa ON lw.pid = sa.pid
    -- Filter out the waiters from the final result to only show the holders
    -- or adjust as needed to see the full chain.
    WHERE lw.granted
    ORDER BY transaction_duration DESC;

    How This Recursive Query Works

  • Base Case: The first part of the UNION ALL selects all locks that are not granted (WHERE NOT granted). These are the leaf nodes of our inverted dependency tree—the processes at the end of the wait chain.
  • Recursive Step: The second part joins pg_locks back to the lock_waits CTE. It looks for a lock that is granted (l.granted) and matches the exact object that a process from the previous step is waiting for. It effectively walks one step up the chain from a waiter to its blocker.
  • Termination: The recursion stops when it finds a process that holds a lock but is not itself waiting for another lock (the head blocker).
  • Final SELECT: We then join the results back to pg_stat_activity to get the rich context (username, query text, state, etc.) and present a clear picture. The ORDER BY transaction_duration DESC is crucial, as the head blocker is often a long-running transaction.
  • Running this query during a contention event will produce a list of PIDs that are holding locks wanted by other processes. The one at the top, with the longest transaction duration, is almost always your culprit.

    Advanced Scenario 2: The Phantom Blocker - `idle in transaction`

    One of the most insidious causes of lock contention is the idle in transaction state. This occurs when an application begins a transaction (BEGIN;), performs some database operations (acquiring locks), and then, instead of issuing a COMMIT or ROLLBACK, it goes off to do other work, like calling a slow external API or waiting for user input. While it's idle, it continues to hold its locks, blocking any other process that needs them.

    These are particularly hard to debug because pg_stat_activity will show the process state as idle in transaction, and the query column will be empty or show the last completed query, not what's currently running (which is nothing). The key is to correlate this state with long transaction durations.

    Here's a modified query to specifically hunt for these phantom blockers:

    sql
    SELECT
        sa.pid,
        sa.usename,
        sa.application_name,
        sa.client_addr,
        now() - sa.xact_start as transaction_duration,
        sa.state,
        sa.query AS last_query_in_transaction,
        -- Count how many other processes this PID is blocking
        (SELECT count(*) FROM pg_stat_activity WHERE pid IN (pg_blocking_pids(sa.pid))) as blocked_process_count
    FROM pg_stat_activity sa
    WHERE sa.state = 'idle in transaction'
      AND sa.xact_start IS NOT NULL
      -- Filter for transactions idle for more than a minute. Adjust as needed.
      AND now() - sa.xact_start > '1 minute'::interval
      -- And is actually blocking someone
      AND EXISTS (
          SELECT 1
          FROM pg_locks l1
          JOIN pg_locks l2 ON l1.locktype = l2.locktype 
                            AND l1.database = l2.database 
                            AND l1.relation = l2.relation
          WHERE l1.pid = sa.pid AND NOT l2.granted
      )
    ORDER BY transaction_duration DESC;

    Analysis and Remediation

  • PostgreSQL 14+ pg_blocking_pids(): This query uses the pg_blocking_pids() function (available in PG14+), which is a more efficient way to determine if a PID is blocking others. For older versions, the EXISTS subquery with a join on pg_locks achieves the same goal.
  • Identifying the Source: The application_name, usename, and client_addr columns are your best friends here. They help you trace the offending session back to a specific application server, code module, or user.
  • Remediation:
  • 1. Immediate: If the situation is critical, you may need to terminate the backend process using SELECT pg_terminate_backend(pid);. This will force a rollback of the idle transaction, releasing its locks and allowing waiting processes to proceed. Use this with caution, as it abruptly ends the session.

    2. Long-term: The root cause is always in the application code. You must find the code path that opens a transaction and fails to close it promptly. Common culprits include: missing try...finally blocks for closing connections/transactions, ORMs that abstract transaction management poorly, or application logic that performs slow, non-database work within a transaction boundary.

    Advanced Scenario 3: Differentiating Lock Granularity (Row vs. Page vs. Relation)

    Not all locks are created equal. Understanding the granularity of a lock is key to resolving the underlying performance issue. A RowExclusiveLock on a single tuple is far less impactful than an AccessExclusiveLock on an entire relation (table).

    pg_locks gives us the clues we need:

  • Relation Lock: locktype = 'relation' and relation is not null. page and tuple will be null.
  • Page Lock: locktype = 'page' and relation, page are not null. tuple will be null.
  • Tuple (Row) Lock: locktype = 'tuple' and relation, page, tuple are not null.
  • Consider a scenario where a seemingly simple UPDATE causes widespread blocking. Why?

    sql
    -- Table structure
    CREATE TABLE products (
        id SERIAL PRIMARY KEY,
        sku VARCHAR(255) UNIQUE,
        status VARCHAR(20) NOT NULL, -- e.g., 'active', 'archived'
        last_updated TIMESTAMPOFFSETZ
    );
    
    -- A background job archives old products
    -- The 'status' column is NOT indexed
    BEGIN;
    UPDATE products SET status = 'archived' WHERE last_updated < now() - '90 days'::interval;
    -- ... some other work ...
    COMMIT;

    Because last_updated is not indexed, this UPDATE will perform a sequential scan on the products table. PostgreSQL may acquire a RowExclusiveLock on every row it updates, but more problematically, if the update affects a significant portion of the table, it might escalate to a full ShareUpdateExclusiveLock on the relation to prevent schema changes during the scan. This can block other sessions trying to acquire more aggressive locks, like AccessExclusiveLock for a VACUUM FULL or ALTER TABLE.

    Our diagnostic query can be enhanced to show this granularity:

    sql
    -- A simplified version of the foundational query focusing on lock details
    SELECT
        blocking_activity.pid AS blocking_pid,
        blocking_activity.query AS blocking_query,
        blocked_activity.pid AS blocked_pid,
        blocked_activity.query AS blocked_query,
        blocking_locks.locktype,
        blocking_locks.mode AS blocking_mode,
        blocking_locks.relation::regclass,
        blocking_locks.page,
        blocking_locks.tuple
    FROM pg_catalog.pg_locks blocked_locks
    -- ... (full join logic from foundational query) ...
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_locks.pid = blocking_activity.pid
    WHERE NOT blocked_locks.granted;

    If the output of this query shows the UPDATE query holding a lock on the products relation (locktype = 'relation') and blocking other processes, your smoking gun is the lack of an index. The solution is clear:

    sql
    CREATE INDEX idx_products_last_updated ON products (last_updated);

    With this index, the UPDATE can perform a much faster index scan, locking only the specific tuples it needs to modify. This dramatically reduces the duration and scope of the locks held, resolving the contention.

    Proactive Architectural Patterns to Minimize Contention

    Debugging locks is a reactive process. Senior engineers should also focus on proactive design patterns that prevent contention from occurring in the first place.

  • Use SELECT ... FOR UPDATE SKIP LOCKED for Job Queues:
  • A common source of contention is multiple workers trying to grab the same job from a queue table. A naive SELECT ... FOR UPDATE LIMIT 1 will cause workers to block each other.

    sql
        -- Naive approach - causes contention
        BEGIN;
        SELECT id, payload FROM job_queue WHERE status = 'pending' ORDER BY created_at ASC LIMIT 1 FOR UPDATE;
        -- This worker is now blocked if another worker already has the lock on the first job.
        
        -- Advanced, contention-free approach
        BEGIN;
        SELECT id, payload FROM job_queue WHERE status = 'pending' ORDER BY created_at ASC LIMIT 1 FOR UPDATE SKIP LOCKED;
        -- If the first job is locked, this query immediately skips it and locks the next available one.

    SKIP LOCKED is a powerful feature for building highly concurrent, scalable queueing systems directly in PostgreSQL.

  • Keep Transactions Short and Focused:
  • Never perform slow, external operations within a database transaction. The idle in transaction problem is a direct result of violating this rule. The correct pattern is:

    javascript
        // ANTI-PATTERN: Slow operation inside a transaction
        async function processOrder(orderId) {
          const client = await pool.connect();
          try {
            await client.query('BEGIN');
            const order = await client.query('SELECT * FROM orders WHERE id = $1 FOR UPDATE', [orderId]);
            // This API call can take seconds, holding the row lock the entire time!
            await callSlowShippingAPI(order.rows[0]); 
            await client.query('UPDATE orders SET status = \'shipped\' WHERE id = $1', [orderId]);
            await client.query('COMMIT');
          } finally {
            client.release();
          }
        }
    
        // CORRECT PATTERN: Perform work outside the transaction
        async function processOrder(orderId) {
          const client = await pool.connect();
          let orderData;
          try {
            // First, fetch the data needed without a lock if possible, or in a very short transaction.
            const result = await client.query('SELECT * FROM orders WHERE id = $1', [orderId]);
            orderData = result.rows[0];
          } finally {
            client.release();
          }
    
          // Perform the slow operation outside any transaction.
          await callSlowShippingAPI(orderData);
    
          const client2 = await pool.connect();
          try {
            // Start a new, short transaction just for the final update.
            await client2.query('BEGIN');
            await client2.query('UPDATE orders SET status = \'shipped\' WHERE id = $1', [orderId]);
            await client2.query('COMMIT');
          } finally {
            client2.release();
          }
        }
  • Use Advisory Locks for Application-Level Locking:
  • Sometimes you need to lock a conceptual resource that doesn't map directly to a table row (e.g., preventing concurrent processing for a specific customer ID across multiple tables). Instead of locking a dummy row in a semaphore table (which can cause bloat and contention), use advisory locks.

    sql
        -- Worker 1 tries to get a lock for customer 123
        -- pg_try_advisory_xact_lock is session-scoped and released at transaction end.
        SELECT pg_try_advisory_xact_lock(123);
        -- Returns 't' (true), it got the lock.
        -- ... do complex processing for customer 123 ...
        COMMIT; -- Lock is automatically released.
    
        -- Worker 2, at the same time, tries for the same customer
        SELECT pg_try_advisory_xact_lock(123);
        -- Returns 'f' (false), lock is held by someone else.
        -- Worker 2 can now either wait or work on a different customer.

    Advisory locks are a lightweight, high-performance mechanism for implementing distributed locks when the application and database are tightly coupled.

    Conclusion

    Mastering the art of diagnosing lock contention in PostgreSQL is a rite of passage for engineers moving from intermediate to senior levels. It requires moving beyond simplistic monitoring and embracing the rich, detailed data within the database's own system catalogs. By combining pg_locks and pg_stat_activity with the analytical power of recursive CTEs, you can dissect even the most tangled lock chains and pinpoint their root cause with confidence. The true mark of an expert, however, is not just in debugging the crisis but in designing systems and application patterns that proactively minimize contention, ensuring your database remains a source of robust, scalable performance rather than a bottleneck.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles