ADR 0002: Database Schema per Module

Status: Accepted
Date: 2024-12-30
Deciders: Architecture Team

Context

The Spryx Backend is a modular monolith with multiple bounded contexts (modules):

- RAG - Retrieval-Augmented Generation
- Agent - AI Agent conversations
- Billing - Subscription and usage tracking

We need to decide how to organize database tables across these modules, considering:

1. Current monolithic deployment
2. Future potential extraction to microservices
3. Clear ownership and boundaries
4. Development team autonomy

Options Considered

Single schema (public) - All tables in one schema
Schema per module - Each module has its own PostgreSQL schema
Separate databases - Each module has its own database

Decision

We will use a separate PostgreSQL schema for each module.

Rules:

- Schema name = module name (e.g., rag, agent, billing)
- No foreign keys between schemas
- No direct writes to another module's schema
- Cross-module access only via contracts/ports
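
The "no foreign keys between schemas" rule can be checked mechanically, for example in CI. The sketch below is one possible approach, not part of the decision itself: the query reads PostgreSQL's `pg_constraint` catalog, and `find_cross_schema_fks` is a hypothetical helper that filters the resulting rows. Executing the query against a live database is left to whatever driver the project uses.

```python
from typing import Iterable, NamedTuple

# Lists every foreign key together with the schema of its source and
# target table, using the standard PostgreSQL system catalogs.
FOREIGN_KEYS_SQL = """
SELECT con.conname,
       src_ns.nspname AS source_schema,
       tgt_ns.nspname AS target_schema
FROM pg_constraint AS con
JOIN pg_class AS src ON src.oid = con.conrelid
JOIN pg_namespace AS src_ns ON src_ns.oid = src.relnamespace
JOIN pg_class AS tgt ON tgt.oid = con.confrelid
JOIN pg_namespace AS tgt_ns ON tgt_ns.oid = tgt.relnamespace
WHERE con.contype = 'f'
"""


class ForeignKey(NamedTuple):
    name: str
    source_schema: str
    target_schema: str


def find_cross_schema_fks(rows: Iterable[tuple[str, str, str]]) -> list[ForeignKey]:
    """Return the foreign keys whose source and target schemas differ."""
    fks = (ForeignKey(*row) for row in rows)
    return [fk for fk in fks if fk.source_schema != fk.target_schema]
```

A CI job could fail the build whenever the returned list is non-empty.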

Consequences

Positive

Clear ownership: Each module owns its schema completely
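Independent evolution: Schema changes inside a module don't require coordination with other modules
Microservice ready: Easy to extract a module's schema to a separate database, since nothing outside the module references it directly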
Testing isolation: Modules can be tested independently
Performance isolation: No cross-module locks or FK cascades

Negative

No referential integrity: Cross-module references can become stale
Eventual consistency: Need to handle cross-module data sync
Query complexity: Can't JOIN across modules directly
Data duplication: May need to denormalize some data

Mitigations

Stale references: Application-level validation + graceful handling of references whose target no longer exists
Consistency: Event-driven sync for critical data
Queries: Query services for read-only cross-module views
Duplication: Accept strategic denormalization
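
As a rough illustration of event-driven sync combined with denormalization, the sketch below has the Agent module keep its own copy of knowledge base names and update it whenever RAG publishes a change. Everything here is hypothetical: `EventBus`, `KnowledgeBaseRenamed`, and `AgentKbNameCache` are illustrative names, and the bus is a synchronous in-process stand-in for whatever messaging mechanism the project actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class KnowledgeBaseRenamed:
    """Event published by the RAG module (hypothetical)."""
    kb_id: str
    new_name: str


class EventBus:
    """Minimal synchronous in-process publish/subscribe."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Deliver the event to every handler registered for its exact type
        for handler in self._handlers[type(event)]:
            handler(event)


class AgentKbNameCache:
    """Agent-owned denormalized copy of knowledge base names."""

    def __init__(self, bus: EventBus) -> None:
        self.names: dict[str, str] = {}
        bus.subscribe(KnowledgeBaseRenamed, self._on_renamed)

    def _on_renamed(self, event: KnowledgeBaseRenamed) -> None:
        self.names[event.kb_id] = event.new_name


bus = EventBus()
cache = AgentKbNameCache(bus)
bus.publish(KnowledgeBaseRenamed(kb_id="kb1", new_name="Product docs"))
```

With a real, asynchronous transport the copy lags behind the source for a short time, which is the eventual consistency trade-off listed under the negative consequences.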

Implementation

Schema Creation

-- Each module creates its own schema
CREATE SCHEMA IF NOT EXISTS rag;
CREATE SCHEMA IF NOT EXISTS agent;
CREATE SCHEMA IF NOT EXISTS billing;
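
If the list of modules lives in application code, the statements above could be generated rather than hand-written. A minimal sketch, assuming a hypothetical `MODULES` tuple and a hypothetical helper `create_schema_statements`. The name check is a deliberately strict assumption (lowercase letters, digits, underscores) so that names can be interpolated into DDL without quoting.

```python
import re

# Hypothetical registry of module names; schema name = module name.
MODULES = ("rag", "agent", "billing")

_SAFE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def create_schema_statements(modules: tuple[str, ...] = MODULES) -> list[str]:
    """Build one idempotent CREATE SCHEMA statement per module."""
    statements = []
    for name in modules:
        # Reject anything that is not a plain lowercase identifier
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe schema name: {name!r}")
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {name};")
    return statements
```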

Cross-Module Access Pattern

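# Module A needs data from Module B:
# go through a contract/port, never direct DB access.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class KnowledgeBase:  # placeholder; the real type is owned by the RAG module
    id: str


class NotFoundError(Exception):  # placeholder for the project's own error type
    pass


# Contract published by the RAG module
class RagContract(Protocol):
    async def get_knowledge_base(self, kb_id: str) -> KnowledgeBase | None:
        ...


# The Agent module depends only on the contract, not on RAG's tables
class AgentService:
    def __init__(self, rag: RagContract):
        self.rag = rag

    async def create_conversation(self, kb_id: str):
        kb = await self.rag.get_knowledge_base(kb_id)
        if kb is None:
            raise NotFoundError("Knowledge base not found")
        # ...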