Zanzibar-style ReBAC: A Deep Dive into Fine-Grained Authorization

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

Beyond RBAC: The Inevitable Shift to Relationship-Based Access Control

For years, Role-Based Access Control (RBAC) has been the bedrock of authorization. We define roles (admin, editor, viewer) and assign them to users. It's simple, predictable, and sufficient for a vast number of applications. However, as systems evolve towards collaborative, multi-tenant, and deeply nested resource hierarchies, RBAC's limitations become glaringly apparent.

Consider a Google Docs-style application. The authorization questions we need to answer are not as simple as "Is Alice an editor?" They are inherently relational:

* Can Bob edit document 'Q4-report' because he is a member of the 'Finance' group, which was granted editor access to the 'Reports' folder, which contains the 'Q4-report' document?

Modeling this with traditional RBAC is a nightmare of join tables, recursive CTEs in SQL, and application-level logic that is brittle, difficult to audit, and impossible to scale. The core problem is that permission is an emergent property of the relationships between entities, not a static attribute assigned to a user.

This is the problem Google solved with its Zanzibar paper, introducing a scalable, global authorization system based on Relationship-Based Access Control (ReBAC). This article is a deep dive into the practical implementation of Zanzibar-style ReBAC. We will not cover the basics. We assume you understand why you need this and are ready to explore the architectural and implementation details required to build or integrate such a system in a production environment.

We will focus on:

  • Core Primitives: Deconstructing relation tuples and userset rewrites.
  • Schema Design: Modeling complex permissions as a formal schema.
  • The Check Algorithm: Understanding the graph traversal at the heart of ReBAC.
  • Performance & Consistency: Tackling latency, fan-out, and stale data with Zookies.
  • Production Implementation: A complete, runnable example using OpenFGA.

    1. The Core Primitives: Tuples and Userset Rewrites

    A Zanzibar-style system externalizes authorization logic by storing and processing relationships. All authorization data is modeled as a single, simple primitive: the relation tuple.

    A relation tuple is an atomic fact in the format: object#relation@user.

    * object: The resource. Formatted as type:id (e.g., document:roadmap).

    * relation: The relationship the user has with the object (e.g., owner, viewer).

    * user: The subject. This can be a user ID (user:alice) or, critically, a reference to another set of users—a userset.

    Let's model our document sharing scenario:

    text
    // Alice is the owner of the roadmap document
    document:roadmap#owner@user:alice
    
    // Bob and Charlie are members of the 'eng' (engineering) group
    group:eng#member@user:bob
    group:eng#member@user:charlie
    
    // The 'product' folder is the parent of the roadmap document
    document:roadmap#parent@folder:product // Note: This is a resource-to-resource relation
    
    // Members of the 'engineering' group can view the 'product' folder
    folder:product#viewer@group:eng#member

    The last tuple is the most important concept to grasp. The user portion is group:eng#member. This is not a specific user but a userset. It means "the set of users who have the member relation on the group:eng object." The system now has a rule to resolve this dynamically.

    Userset Rewrites: The Engine of Inference

    Static tuples are not enough. We need to define rules for how relations are computed or inherited. This is done through a formal authorization model or schema, which defines userset rewrites.

    Here's a snippet from a Zanzibar-style modeling language (we'll use the OpenFGA syntax):

    fsharp
    model
      schema 1.1
    
    type user
    
    type group
      relations
        define member: [user]
    
    type folder
      relations
        define editor: [user, group#member]
        define viewer: [user, group#member] or editor
    
    type document
      relations
        define parent: [folder]
        define owner: [user]
        define editor: [user, group#member] or owner
        define viewer: [user, group#member] or editor or viewer from parent

    Let's break down the document type definition, specifically the viewer relation:

    define viewer: [user, group#member] or editor or viewer from parent

    This single line defines the logic for determining if someone is a viewer. A user is a viewer of a document if:

  • Direct Membership ([user, group#member]): They have been granted direct viewer access, either as an individual user or as a member of a group that was granted viewer access.
  • Computed Union (or editor): They are an editor of the document. This is a union operation. The set of viewers includes the set of editors.
  • Tuple-to-Userset (TTU) Rewrite (or viewer from parent): This is the most powerful rewrite. It tells the system: "To find the viewers of this document, find the object(s) related to this document via the parent relation, and then find the viewers of that object." This is how we model permission inheritance through resource hierarchies.
    With this schema, the system can now answer our original question: "Can Bob view the roadmap document?"


    2. The `Check` Algorithm: A Distributed Graph Traversal

    The fundamental API call in a ReBAC system is Check(user, relation, object). It returns a simple boolean. The implementation of Check, however, is a recursive graph traversal that expands the userset rewrites defined in the schema.

    Let's trace Check(user:bob, viewer, document:roadmap):

  • Goal: Is user:bob in the set document:roadmap#viewer?
  • Expand document:roadmap#viewer: The schema says this is a union of three possibilities:

    * Possibility A (Direct): Check for a direct tuple document:roadmap#viewer@user:bob. Let's assume this doesn't exist.

    * Possibility B (Inherited from editor): Is user:bob in the set document:roadmap#editor? This triggers a sub-problem.

        * Expand document:roadmap#editor. The schema says this is [user, group#member] or owner.

        * Check for a direct editor tuple. Assume none.

        * Is user:bob an owner? Check for document:roadmap#owner@user:bob. Assume none.

        * Sub-problem fails. user:bob is not an editor.

    * Possibility C (Inherited from parent): The schema says viewer from parent. This triggers a TTU rewrite.

        * Step C.1 (Find Parents): Find all objects related to document:roadmap via the parent relation. We need to query our tuple store for document:roadmap#parent@folder:?. Let's say we find the tuple document:roadmap#parent@folder:product.

        * Step C.2 (Recurse): For each parent found, we now have a new sub-problem: Check(user:bob, viewer, folder:product). Is user:bob in the set folder:product#viewer?

            * Expand folder:product#viewer. Its directly assignable types are [user, group#member]. No folder:product#editor tuples exist, so only the direct tuples matter here.

            * Check for a direct tuple folder:product#viewer@user:bob. Assume none.

            * Check for a userset tuple. Is there a tuple like folder:product#viewer@group:?#member? Yes, we have folder:product#viewer@group:eng#member.

            * This creates a final sub-problem: Is user:bob in the set group:eng#member?

                * Expand group:eng#member. The schema says [user]. This means we only need to look for direct tuples.

                * Check for group:eng#member@user:bob. This tuple exists!

    * This sub-problem returns true. The recursion unwinds, and the final answer is true.

    Pseudo-code for a Naive `Check` Implementation

    go
    // Simplified, non-concurrent, no-caching implementation
    func Check(user, relation, object) -> bool {
        // 1. Check for a direct relationship tuple
        if tupleStore.Exists(object, relation, user) {
            return true
        }
    
        // 2. Resolve the relation's definition from the schema
        relationDefinition := schema.GetRelation(object.Type, relation)
    
        // 3. Handle Union ('or')
        for each subRelation in relationDefinition.Union {
            if Check(user, subRelation, object) {
                return true
            }
        }
    
        // 4. Handle Intersection ('and') - not shown in our schema, but possible
        // ... requires all sub-checks to be true
    
        // 5. Handle Exclusion ('but not') - not shown, but possible
        // ... requires first to be true and second to be false
    
        // 6. Handle TTU ('relation from other_relation')
        for each ttu in relationDefinition.TTU {
            // Find all objects linked by the ttu.sourceRelation
            // e.g., for 'viewer from parent', sourceRelation is 'parent'
            linkedObjects := tupleStore.Query(object, ttu.sourceRelation)
            
            for each linkedObject in linkedObjects {
                // Recurse with the ttu.targetRelation
                // e.g., for 'viewer from parent', targetRelation is 'viewer'
                if Check(user, ttu.targetRelation, linkedObject) {
                    return true
                }
            }
        }
    
        // 7. Handle direct usersets (e.g., relation points to group#member)
        usersetTuples := tupleStore.QueryUsersets(object, relation)
        for each usersetTuple in usersetTuples {
             // usersetTuple.User is something like 'group:eng#member'
             if Check(user, usersetTuple.User.Relation, usersetTuple.User.Object) {
                 return true
             }
        }
    
        return false
    }

    This naive implementation highlights the recursive nature but also reveals potential performance bottlenecks. A single Check can fan out into dozens or even hundreds of sub-problems and database lookups.
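    The traversal can be made runnable for this article's model with an in-memory sketch. Everything here is a stand-in: the schema map hand-encodes the rewrites (for the folder type it follows the JSON model in section 5, where viewers include editors), the tuples map replaces a real tuple store, and maxDepth is an assumed bound. Intersection and exclusion are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// ttu models "<computed> from <tupleset>", e.g. "viewer from parent".
type ttu struct{ tupleset, computed string }

// rule is the rewrite for one relation: a union of direct tuples,
// computed usersets on the same object, and tuple-to-userset hops.
type rule struct {
	direct   bool
	computed []string
	ttus     []ttu
}

// schema hand-encodes the article's model, keyed by "type#relation".
var schema = map[string]rule{
	"group#member":    {direct: true},
	"folder#editor":   {direct: true},
	"folder#viewer":   {direct: true, computed: []string{"editor"}},
	"document#parent": {direct: true},
	"document#owner":  {direct: true},
	"document#editor": {direct: true, computed: []string{"owner"}},
	"document#viewer": {direct: true, computed: []string{"editor"}, ttus: []ttu{{"parent", "viewer"}}},
}

// tuples maps "object#relation" to its subjects (hypothetical sample data).
var tuples = map[string][]string{
	"document:roadmap#owner":  {"user:alice"},
	"group:eng#member":        {"user:bob", "user:charlie"},
	"document:roadmap#parent": {"folder:product"},
	"folder:product#viewer":   {"group:eng#member"},
}

const maxDepth = 25 // assumed bound; past it the check is denied

func check(user, relation, object string, depth int) bool {
	if depth > maxDepth {
		return false
	}
	objType, _, _ := strings.Cut(object, ":")
	r, ok := schema[objType+"#"+relation]
	if !ok {
		return false // unknown relation: deny
	}
	if r.direct {
		for _, subject := range tuples[object+"#"+relation] {
			if subject == user {
				return true
			}
			// A userset subject such as group:eng#member: recurse into it.
			if so, sr, isSet := strings.Cut(subject, "#"); isSet && check(user, sr, so, depth+1) {
				return true
			}
		}
	}
	// Computed usersets on the same object, e.g. "or editor".
	for _, c := range r.computed {
		if check(user, c, object, depth+1) {
			return true
		}
	}
	// Tuple-to-userset: follow the tupleset relation, then check there.
	for _, t := range r.ttus {
		for _, linked := range tuples[object+"#"+t.tupleset] {
			if check(user, t.computed, linked, depth+1) {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, u := range []string{"user:bob", "user:alice", "user:dave"} {
		fmt.Printf("%s viewer document:roadmap -> %v\n", u, check(u, "viewer", "document:roadmap", 0))
	}
}
```

    Bob is allowed via parent, folder viewer, and group membership. Alice is allowed via owner and editor. Dave matches no path. Each recursive call maps to at least one store lookup, which is where the fan-out cost comes from.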


    3. Production Hardening: Performance and Consistency

    Running the naive algorithm in production would be disastrous. A Zanzibar-style system must be both low-latency and highly available (the Zanzibar paper reports 95th-percentile latency under 10 ms and availability above 99.999%). This is achieved through aggressive caching, consistency guarantees, and protection against unbounded computation.

    The "New Enemy" Problem and Zookies

    Authorization checks must respect the order in which users change permissions and content. Suppose Alice removes Bob from a folder's ACL and then asks Charlie to add new documents to that folder. If a Check is evaluated against stale relationship data, Bob can still see the new documents. Zanzibar calls this the "new enemy" problem: a stale read either ignores the ordering of ACL updates or applies an old ACL to new content.

    Zanzibar's solution is a consistency token called a zookie. A zookie is an opaque token that represents a snapshot of the database at a particular point in time. When a client performs a write (adding/deleting a tuple), the server returns a zookie corresponding to the state after the write. When the client subsequently performs a read (Check), it can provide that zookie to ensure the read is at least as fresh as its last write.

    Snapshots also make caching tractable. Granting viewer access on a top-level folder to the all_employees group could otherwise invalidate millions of cached Check results at once. Because every read names a snapshot, the system can serve reads from slightly stale but consistent snapshots whenever the request's zookie permits it. A Check result for (user, relation, object) at zookie Z never changes, so it can stay cached for as long as that snapshot is served. If a request comes in with a newer zookie Z', the system performs the check against a snapshot at least as fresh as Z' and caches that result separately. This avoids mass cache invalidation.
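    A small Go sketch of the snapshot-keyed cache idea follows. It assumes a zookie can be reduced to a comparable snapshot number, which is a simplification: real tokens are opaque. The CheckCache type and its methods are hypothetical and not safe for concurrent use.

```go
package main

import "fmt"

// cacheKey identifies one Check result at one snapshot.
// Snapshot stands in for the timestamp a zookie encodes (an assumption).
type cacheKey struct {
	User, Relation, Object string
	Snapshot               int64
}

// CheckCache stores results per snapshot, so a write never has to
// invalidate entries computed at older snapshots.
type CheckCache struct {
	entries map[cacheKey]bool
}

// NewCheckCache returns an empty cache.
func NewCheckCache() *CheckCache {
	return &CheckCache{entries: make(map[cacheKey]bool)}
}

// Get returns the cached decision and whether an entry was found.
func (c *CheckCache) Get(user, relation, object string, snapshot int64) (allowed, found bool) {
	allowed, found = c.entries[cacheKey{user, relation, object, snapshot}]
	return allowed, found
}

// Put records the decision computed at the given snapshot.
func (c *CheckCache) Put(user, relation, object string, snapshot int64, allowed bool) {
	c.entries[cacheKey{user, relation, object, snapshot}] = allowed
}

func main() {
	cache := NewCheckCache()

	// At snapshot 100, Bob is not yet a viewer.
	cache.Put("user:bob", "viewer", "document:roadmap", 100, false)

	// A grant is written, producing snapshot 101. The old entry stays valid
	// for snapshot 100; the new result is cached under its own key.
	cache.Put("user:bob", "viewer", "document:roadmap", 101, true)

	for _, snap := range []int64{100, 101, 102} {
		allowed, found := cache.Get("user:bob", "viewer", "document:roadmap", snap)
		fmt.Printf("snapshot=%d found=%v allowed=%v\n", snap, found, allowed)
	}
}
```

    A lookup at snapshot 102 misses and forces a fresh evaluation, while entries for 100 and 101 remain untouched. A production cache would also need eviction and would typically round snapshots to coarser intervals to raise hit rates.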

    Bounding Latency

    Deeply nested hierarchies or circular dependencies in the schema can lead to unbounded recursion. Production systems must implement several safeguards:

    * Schema Validation: Disallow circular dependencies during schema definition (e.g., A inherits from B, and B inherits from A).

    * Request Timeouts: Enforce a strict upper bound on total request time (e.g., 50ms).

    * Bounded Traversal Depth: Limit the recursion depth of the Check algorithm. If the limit is exceeded, the request fails closed (access is denied) and logs an error.

    * Concurrency Limits: Limit the number of concurrent sub-problems a single Check request can fan out to. This prevents a single complex query from overwhelming the system.

    Datastore Considerations

    The performance of the tuple store is paramount. It must support fast lookups of tuples and reverse lookups (finding objects a user has a relation on). While you can build this on a standard RDBMS like PostgreSQL or CockroachDB (as SpiceDB does), the query patterns are very specific. Indexes must be carefully designed on (object_type, object_id, relation) and (user_type, user_id, relation).
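    The two index shapes can be illustrated with an in-memory sketch in Go. TupleStore and its methods are hypothetical. The two maps play the role of the (object, relation) and (user, relation) indexes, and only direct tuples are considered, with no rewrites.

```go
package main

import "fmt"

// Tuple is one stored relationship.
type Tuple struct{ Object, Relation, User string }

// TupleStore keeps a forward and a reverse index over the same tuples.
type TupleStore struct {
	byObject map[[2]string][]string // (object, relation) -> users
	byUser   map[[2]string][]string // (user, relation)   -> objects
}

// NewTupleStore returns an empty store.
func NewTupleStore() *TupleStore {
	return &TupleStore{
		byObject: make(map[[2]string][]string),
		byUser:   make(map[[2]string][]string),
	}
}

// Write adds the tuple to both indexes.
func (s *TupleStore) Write(t Tuple) {
	fk := [2]string{t.Object, t.Relation}
	rk := [2]string{t.User, t.Relation}
	s.byObject[fk] = append(s.byObject[fk], t.User)
	s.byUser[rk] = append(s.byUser[rk], t.Object)
}

// Users answers the forward lookup: who has this relation on the object?
func (s *TupleStore) Users(object, relation string) []string {
	return s.byObject[[2]string{object, relation}]
}

// Objects answers the reverse lookup: on which objects does the user have this relation?
func (s *TupleStore) Objects(user, relation string) []string {
	return s.byUser[[2]string{user, relation}]
}

func main() {
	store := NewTupleStore()
	store.Write(Tuple{"group:eng", "member", "user:bob"})
	store.Write(Tuple{"group:eng", "member", "user:charlie"})
	store.Write(Tuple{"document:roadmap", "owner", "user:alice"})

	fmt.Println(store.Users("group:eng", "member"))  // forward lookup
	fmt.Println(store.Objects("user:bob", "member")) // reverse lookup
}
```

    Every write touches both indexes, which is the price of making both query directions cheap. In a relational store the same trade-off shows up as two composite indexes on the tuple table.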

    For global-scale deployments, a distributed database like Google Spanner is the ideal foundation, as it provides the external consistency required for Zookies to work correctly across data centers.


    4. Advanced Pattern: Contextual Tuples and Caveats

    Pure ReBAC is powerful, but real-world authorization often has contextual constraints. For example: "Alice can approve a deployment only if she has provided a valid MFA token within the last 5 minutes" or "Bob can access sensitive documents only from an IP address within the corporate VPN range."

    This is where caveats (or conditions) come in. A caveat is a parameterized expression associated with a relationship tuple that must evaluate to true for the relationship to be valid at request time.

    Let's extend our model. We want to restrict document editing to work hours (9am-5pm).

    First, we update our schema using a more advanced language feature:

    fsharp
    model
      schema 1.1
    
    type user
    
    type document
      relations
        # ... other relations
        define editor: [user with is_work_hours]
    
    # OpenFGA calls a caveat a "condition": a named, typed expression
    condition is_work_hours(request_time: timestamp) {
      request_time.getHours() >= 9 && request_time.getHours() < 17
    }

    Now, when we write the relationship tuple, we associate it with the caveat:

    document:financials#editor@user:dave with is_work_hours

    When the application makes a Check request, it must provide the context needed to evaluate the caveat:

    json
    {
      "tuple_key": {
        "user": "user:dave",
        "relation": "editor",
        "object": "document:financials"
      },
      "context": {
        "request_time": "2023-10-27T14:30:00Z"
      }
    }

    The authorization system will now perform the graph traversal as before. When it encounters the caveated tuple, it will execute the is_work_hours logic with the provided context. If the logic returns true, the relationship holds. If it returns false, that path of the graph traversal is terminated.
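    To show what that evaluation step amounts to, here is a plain Go sketch of the same rule. It does not use any authorization engine: isWorkHours and evalCaveat are hypothetical helpers, the context is a simple string map, and hours are interpreted in UTC, which is an assumption about how the expression treats timestamps.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// isWorkHours mirrors the is_work_hours expression: 09:00 <= hour < 17:00.
// Assumption: the hour is taken in UTC.
func isWorkHours(requestTime time.Time) bool {
	h := requestTime.UTC().Hour()
	return h >= 9 && h < 17
}

// evalCaveat reads "request_time" from the request context and evaluates the rule.
// A missing or malformed value is an error, which callers should treat as a denial.
func evalCaveat(reqContext map[string]string) (bool, error) {
	raw, ok := reqContext["request_time"]
	if !ok {
		return false, errors.New("context is missing request_time")
	}
	t, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return false, fmt.Errorf("invalid request_time: %w", err)
	}
	return isWorkHours(t), nil
}

func main() {
	for _, ts := range []string{"2023-10-27T14:30:00Z", "2023-10-27T20:00:00Z"} {
		ok, err := evalCaveat(map[string]string{"request_time": ts})
		fmt.Printf("request_time=%s holds=%v err=%v\n", ts, ok, err)
	}
}
```

    The first timestamp falls inside the window and the second does not, so the same stored tuple yields different answers depending on the request context. That is also why the result cannot be cached without the context as part of the key.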

    Caveats are an incredibly powerful tool for handling attribute-based access control (ABAC) concerns within a ReBAC framework, but they come with significant complexity:

    * Performance: Caveat evaluation happens at read time and cannot be easily cached. Complex caveats can become a performance bottleneck.

    * Distribution: The evaluation engine and the contextual data must be available to every node processing Check requests.

    * Security: The caveat evaluation language must be sandboxed to prevent arbitrary code execution.


    5. Production Implementation with OpenFGA

    Building a Zanzibar-style system from scratch is a massive undertaking. Fortunately, open-source implementations like OpenFGA (a CNCF project based on Auth0's Sandcastle) and SpiceDB (from Authzed) provide production-ready engines.

    Let's build a small Go microservice that uses OpenFGA to protect its endpoints. This example demonstrates the full lifecycle: defining a model, writing tuples, and checking permissions.

    Step 1: Setup OpenFGA

    First, run an OpenFGA instance using Docker:

    bash
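    # Create a docker network
    docker network create openfga
    
    # Start a Postgres instance for OpenFGA to use for storage
    docker run --name postgres -p 5432:5432 --network openfga \
      -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password -d postgres
    
    # Wait a few seconds for Postgres to accept connections
    sleep 5
    
    # Create the OpenFGA tables; the server refuses to start on an unmigrated database
    docker run --rm --network openfga openfga/openfga:latest migrate \
      --datastore-engine postgres \
      --datastore-uri 'postgres://postgres:password@postgres:5432/postgres?sslmode=disable'
    
    # Run the latest OpenFGA (HTTP API on 8080, gRPC on 8081, Playground on 3000)
    docker run --name openfga --network openfga -p 8080:8080 -p 8081:8081 -p 3000:3000 \
      -e OPENFGA_DATASTORE_ENGINE=postgres \
      -e OPENFGA_DATASTORE_URI='postgres://postgres:password@postgres:5432/postgres?sslmode=disable' \
      -d openfga/openfga:latest run
    
    # Wait a few seconds for OpenFGA to initialize
    sleep 5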

    Step 2: Define and Write the Authorization Model

    We'll use our document schema. Save this as model.fga.json:

    json
    {
      "schema_version": "1.1",
      "type_definitions": [
        {"type": "user"},
        {
          "type": "group",
          "relations": {
            "member": {"this": {}}
          },
          "metadata": {
            "relations": {
              "member": {"directly_related_user_types": [{"type": "user"}]}
            }
          }
        },
        {
          "type": "folder",
          "relations": {
            "viewer": {
              "union": {
                "child": [
                  {"this": {}},
                  {"computed_userset": {"object": "", "relation": "editor"}}
                ]
              }
            },
            "editor": {"this": {}}
          },
          "metadata": {
            "relations": {
              "viewer": {"directly_related_user_types": [{"type": "user"}, {"type": "group", "relation": "member"}]},
              "editor": {"directly_related_user_types": [{"type": "user"}, {"type": "group", "relation": "member"}]}
            }
          }
        },
        {
          "type": "document",
          "relations": {
            "parent": {"this": {}},
            "owner": {"this": {}},
            "editor": {
              "union": {
                "child": [{"this": {}}, {"computed_userset": {"object": "", "relation": "owner"}}]
              }
            },
            "viewer": {
              "union": {
                "child": [
                  {"this": {}},
                  {"computed_userset": {"object": "", "relation": "editor"}},
                  {"tuple_to_userset": {"tupleset": {"relation": "parent"}, "computed_userset": {"object": "", "relation": "viewer"}}}
                ]
              }
            }
          },
          "metadata": {
            "relations": {
              "parent": {"directly_related_user_types": [{"type": "folder"}]},
              "owner": {"directly_related_user_types": [{"type": "user"}]},
              "editor": {"directly_related_user_types": [{"type": "user"}, {"type": "group", "relation": "member"}]},
              "viewer": {"directly_related_user_types": [{"type": "user"}, {"type": "group", "relation": "member"}]}
            }
          }
        }
      ]
    }

    Now, let's create a Go program (setup.go) to create a store and write this model.

    go
    // setup.go
    package main
    
    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"log"
    	"os"
    
    	// The high-level client (NewSdkClient, ClientConfiguration) lives in the
    	// SDK's client package. Type names follow recent go-sdk releases; check
    	// them against the version you pin.
    	"github.com/openfga/go-sdk/client"
    )
    
    func main() {
    	ctx := context.Background()
    	fgaClient, err := client.NewSdkClient(&client.ClientConfiguration{
    		ApiUrl: "http://localhost:8080",
    	})
    	if err != nil {
    		log.Fatalf("Failed to create FGA client: %v", err)
    	}
    
    	// 1. Create a new store
    	resp, err := fgaClient.CreateStore(ctx).Body(client.ClientCreateStoreRequest{Name: "DocStore"}).Execute()
    	if err != nil {
    		log.Fatalf("Failed to create store: %v", err)
    	}
    	storeID := resp.GetId()
    	fmt.Printf("Created Store with ID: %s\n", storeID)
    	fgaClient.SetStoreId(storeID)
    
    	// 2. Read and write the authorization model
    	modelBytes, err := os.ReadFile("model.fga.json")
    	if err != nil {
    		log.Fatalf("Failed to read model file: %v", err)
    	}
    
    	var body client.ClientWriteAuthorizationModelRequest
    	if err := json.Unmarshal(modelBytes, &body); err != nil {
    		log.Fatalf("Failed to unmarshal model: %v", err)
    	}
    
    	modelResp, err := fgaClient.WriteAuthorizationModel(ctx).Body(body).Execute()
    	if err != nil {
    		log.Fatalf("Failed to write model: %v", err)
    	}
    	authModelID := modelResp.GetAuthorizationModelId()
    	fmt.Printf("Wrote Authorization Model with ID: %s\n", authModelID)
    
    	// 3. Write relationship tuples
    	tuples := []client.ClientTupleKey{
    		{User: "user:alice", Relation: "owner", Object: "document:roadmap"},
    		{User: "user:bob", Relation: "member", Object: "group:eng"},
    		// The folder is the "user" side: folder:product is the parent of document:roadmap.
    		{User: "folder:product", Relation: "parent", Object: "document:roadmap"},
    		{User: "group:eng#member", Relation: "viewer", Object: "folder:product"},
    	}
    
    	_, err = fgaClient.Write(ctx).Body(client.ClientWriteRequest{Writes: tuples}).Execute()
    	if err != nil {
    		log.Fatalf("Failed to write tuples: %v", err)
    	}
    	fmt.Println("Successfully wrote initial tuples.")
    }
    
    

    Run go mod init example && go mod tidy and then go run setup.go. This will configure your OpenFGA instance.

    Step 3: Create a Protected HTTP Endpoint

    Now, let's create a simple web server with an endpoint that checks permissions before returning data.

    go
    // main.go
    package main
    
    import (
    	"fmt"
    	"log"
    	"net/http"
    
    	"github.com/openfga/go-sdk/client"
    )
    
    // A global FGA client for simplicity
    var fgaClient *client.OpenFgaClient
    
    // Middleware to check authorization
    func checkAuth(next http.HandlerFunc, relation string, object string) http.HandlerFunc {
    	return func(w http.ResponseWriter, r *http.Request) {
    		// Demo only: a real service must take the user from a verified
    		// credential (session, JWT), never from a client-supplied header.
    		user := r.Header.Get("X-User-ID")
    		if user == "" {
    			http.Error(w, "Unauthorized: X-User-ID header missing", http.StatusUnauthorized)
    			return
    		}
    
    		// Use the request context so the check is cancelled if the client disconnects.
    		result, err := fgaClient.Check(r.Context()).Body(client.ClientCheckRequest{
    			User:     user,
    			Relation: relation,
    			Object:   object,
    		}).Execute()
    
    		if err != nil {
    			// Fail closed: an error from the authorization service never grants access.
    			http.Error(w, "Error checking permission", http.StatusInternalServerError)
    			log.Printf("FGA Check Error: %v", err)
    			return
    		}
    
    		if !result.GetAllowed() {
    			http.Error(w, "Forbidden", http.StatusForbidden)
    			return
    		}
    
    		next.ServeHTTP(w, r)
    	}
    }
    
    func getRoadmapDocument(w http.ResponseWriter, r *http.Request) {
    	fmt.Fprintln(w, "Here is the Q4 Roadmap document content...")
    }
    
    func main() {
    	// Assume store ID is known from setup or config
    	storeID := "YOUR_STORE_ID_FROM_SETUP"
    
    	var err error
    	fgaClient, err = client.NewSdkClient(&client.ClientConfiguration{
    		ApiUrl:  "http://localhost:8080",
    		StoreId: storeID,
    	})
    	if err != nil {
    		log.Fatalf("Failed to create FGA client: %v", err)
    	}
    
    	http.HandleFunc("/documents/roadmap", checkAuth(getRoadmapDocument, "viewer", "document:roadmap"))
    
    	log.Println("Server starting on :8000")
    	if err := http.ListenAndServe(":8000", nil); err != nil {
    		log.Fatal(err)
    	}
    }

    Replace YOUR_STORE_ID_FROM_SETUP with the ID printed by the setup script. Now run the server: go run main.go.

    Test it with curl:

    bash
    # Test with Bob, who should have access through the group
    # This should return 200 OK and the document content
    curl -H "X-User-ID: user:bob" http://localhost:8000/documents/roadmap
    
    # Test with Alice, the owner, who is also a viewer
    # This should also return 200 OK
    curl -H "X-User-ID: user:alice" http://localhost:8000/documents/roadmap
    
    # Test with a random user, Dave
    # This should return 403 Forbidden
    curl -i -H "X-User-ID: user:dave" http://localhost:8000/documents/roadmap

    This working example demonstrates how authorization logic is completely decoupled from the application. The microservice only needs to ask the FGA service a simple question, and the complex graph traversal is handled externally.

    Conclusion

    Relationship-Based Access Control is not a silver bullet. For simple applications, RBAC remains a perfectly valid and simpler choice. However, for systems defined by their complex interconnections—social graphs, collaborative platforms, cloud infrastructure—ReBAC provides a scalable and semantically rich way to manage authorization.

    By externalizing authorization into a dedicated service that reasons over relationship tuples, you gain clarity, auditability, and performance. The key takeaways for any senior engineer considering this architecture are:

    * Model First: The authorization model is your constitution. Time spent designing a clean, non-circular, and understandable schema will pay immense dividends.

    * Embrace the Graph: Think of permissions not as static flags but as paths through a graph of relationships. This mental model is crucial for debugging and design.

    * Performance is a Feature: Naive implementations will fail. A production-ready system must address caching, consistency (Zookies), and bounded evaluation to meet latency requirements.

    * Leverage Open Source: The problem space is complex. Projects like OpenFGA and SpiceDB have already solved the hardest parts of building a distributed, high-performance authorization engine, allowing you to focus on modeling your domain correctly.
