Under the hood

How AuditGuardX Works

No black box. Here is exactly how AuditGuardX processes your documents, analyzes them for compliance, and generates audit-ready output, from upload to report, in under 2 minutes.

Document Intelligence Pipeline

Every document passes through five stages, each designed for accuracy, speed, and traceability.

Document Upload & Extraction

Documents are uploaded via the web interface or API and stored encrypted in Google Cloud Storage.

  • Supported formats: PDF, DOCX, XLSX, images (OCR via Tesseract)
  • Max file size: 50 MB per document
  • Text extraction with layout preservation
  • Automatic content type detection and metadata extraction
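Content type detection of this kind is commonly done by inspecting a file's leading bytes rather than trusting its extension. The Python sketch below is a minimal illustration under that assumption, not AuditGuardX's actual code: `classify_upload` and `MAX_BYTES` are hypothetical names, only the 50 MB limit comes from the list above, and DOCX and XLSX (both ZIP containers) are told apart by their internal `word/` and `xl/` part folders.

```python
import io
import zipfile

# Hypothetical sketch: names are illustrative; only the 50 MB limit is from the text.
MAX_BYTES = 50 * 1024 * 1024


def classify_upload(data: bytes) -> str:
    """Reject oversized files, then guess the type from leading "magic" bytes."""
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds the 50 MB per-document limit")
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"  # image: would be routed to OCR
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"  # image: would be routed to OCR
    if data.startswith(b"PK\x03\x04"):
        # DOCX and XLSX are both ZIP containers; their part folders differ.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = zf.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        if any(name.startswith("word/") for name in names):
            return "docx"
        if any(name.startswith("xl/") for name in names):
            return "xlsx"
    return "unknown"
```

Returning "unknown" lets the caller reject unsupported formats explicitly instead of guessing from the filename.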

Semantic Chunking & Embedding

Extracted text is broken into semantically meaningful sections and converted to vector embeddings.

  • Context-preserving semantic chunking (not naive fixed-length splitting)
  • 384-dimensional embeddings from the all-MiniLM-L6-v2 model
  • Vectors stored in PostgreSQL with the pgvector extension
  • Enables hybrid search: semantic similarity + keyword matching
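The list does not say how the semantic and keyword rankings are combined. One common approach is reciprocal rank fusion (RRF), sketched below in Python as an assumption rather than the documented method. The toy 3-dimensional vectors stand in for 384-dimensional all-MiniLM-L6-v2 embeddings, the word-overlap `keyword_score` stands in for a real full-text ranker, and `hybrid_search` and `k=60` are illustrative choices. In PostgreSQL, the vector ranking would typically come from pgvector's `<=>` cosine-distance operator.

```python
import math
import re

# Hypothetical sketch: reciprocal rank fusion (RRF) is an assumed fusion method.


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words found in the text (stand-in for a full-text rank)."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(query, query_vec, chunks, k=60):
    """chunks: list of (chunk_id, text, embedding). Returns chunk ids, best first."""
    by_vector = sorted(chunks, key=lambda c: cosine(query_vec, c[2]), reverse=True)
    by_keyword = sorted(chunks, key=lambda c: keyword_score(query, c[1]), reverse=True)
    fused: dict[str, float] = {}
    for ranking in (by_vector, by_keyword):
        for rank, (chunk_id, _text, _vec) in enumerate(ranking, start=1):
            # RRF: each ranking contributes 1 / (k + rank) to a chunk's score.
            fused[chunk_id] = fused.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


chunks = [
    ("a", "Passwords must follow the rotation policy", [0.9, 0.1, 0.0]),
    ("b", "Encryption keys are rotated yearly", [0.2, 0.9, 0.1]),
    ("c", "Password rotation policy for admins", [0.0, 0.2, 0.9]),
]
print(hybrid_search("password rotation policy", [1.0, 0.0, 0.0], chunks))  # → ['a', 'c', 'b']
```

Chunk "c" matches every query word but has an unrelated embedding, and chunk "a" is strong on both signals; fusion ranks "a" first and still lifts "c" above "b", which is the point of combining the two rankings.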

AI Compliance Analysis

Multi-provider AI maps document content to regulatory controls, scores compliance, and identifies gaps.

  • Multi-provider AI routing across Vertex AI, Groq, and Cerebras, with automatic fallbacks
  • 3,485+ controls across 39 frameworks evaluated per document
  • Confidence scoring with evidence citations for each finding
  • Severity classification: critical, high, medium, low, informational
  • AI-generated remediation suggestions with corrected clause text
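Automatic fallback across providers can be as simple as trying each one in priority order and returning the first success. In this minimal sketch, each provider is a hypothetical callable standing in for a Vertex AI, Groq, or Cerebras client; no real SDK calls are shown, and the ordering and error handling are assumptions.

```python
from typing import Callable, Sequence

# Hypothetical sketch: each provider is a stand-in callable, not a real SDK client.
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. timeout, rate limit, server error
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def vertex_stub(prompt: str) -> str:
    raise TimeoutError("deadline exceeded")  # simulate an unavailable provider


def groq_stub(prompt: str) -> str:
    return f"mapped controls for: {prompt}"


print(complete_with_fallback("access control policy", [("vertex", vertex_stub), ("groq", groq_stub)]))
# → ('groq', 'mapped controls for: access control policy')
```

Collecting every provider's error before giving up keeps the failure traceable; a production router would likely also add per-provider timeouts and separate retryable errors from permanent ones.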

Conversational Voice AI Interface

Natural-language voice interaction, with Whisper speech-to-text for input and multi-provider text-to-speech for responses.

  • Multi-provider TTS: Groq Orpheus → Gemini TTS → local Piper (sherpa-onnx) fallback chain
  • Whisper speech-to-text (whisper-large-v3-turbo) for voice input transcription
  • Streaming TTS with sentence prefetching for near-zero playback gaps
  • Natural language access to compliance, document, and knowledge base tools via voice
  • Two input modes: push-to-talk (spacebar) and voice activation (hands-free)
  • Silero VAD v5 voice activity detection, with barge-in support (speak to interrupt playback) for hands-free operation
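Sentence prefetching means the next sentence is synthesized while the current one plays, so playback rarely waits on the TTS provider. The sketch below assumes a hypothetical `synthesize` callable (any provider in the fallback chain could sit behind it) and a single background thread; the regex sentence splitter is deliberately simple. A consumer that stops iterating, for example on barge-in, simply stops receiving audio.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator

# Hypothetical sketch: `synthesize` stands in for any TTS provider call.


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def stream_tts(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio sentence by sentence, synthesizing sentence i+1 while i plays."""
    sentences = split_sentences(text)
    if not sentences:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(synthesize, sentences[0])
        for next_sentence in sentences[1:]:
            audio = pending.result()  # wait for the current sentence's audio
            pending = pool.submit(synthesize, next_sentence)  # prefetch the next one
            yield audio  # caller plays this while the next sentence synthesizes
        yield pending.result()


for clip in stream_tts("Two gaps found. Both are high severity.", lambda s: s.encode()):
    print(clip)
```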

Report Generation & Output

AI generates structured compliance reports with executive summaries, gap analyses, and remediation roadmaps.

  • Executive summary with AI-generated narrative
  • Control-by-control assessment with pass/fail/partial status
  • Gap analysis per framework with severity-weighted prioritization
  • Remediation suggestions with corrected policy language
  • PDF export in audit-ready formatting
  • Compliance score trends and historical tracking
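A pass/fail/partial status per control and a severity per gap are enough to compute both a headline score and a prioritized gap list. The weights below are hypothetical, since the actual scoring formula is not published here; `ControlResult` and the control IDs in the example are illustrative.

```python
from dataclasses import dataclass

# Hypothetical sketch: these weights and credits are illustrative assumptions.
SEVERITY_WEIGHT = {"critical": 10, "high": 5, "medium": 3, "low": 1, "informational": 0}
STATUS_CREDIT = {"pass": 1.0, "partial": 0.5, "fail": 0.0}


@dataclass
class ControlResult:
    control_id: str
    status: str    # "pass", "partial" or "fail"
    severity: str  # one of the five severity levels


def compliance_score(results: list[ControlResult]) -> float:
    """Percentage of available credit earned; a partial counts as half a pass."""
    if not results:
        return 0.0
    earned = sum(STATUS_CREDIT[r.status] for r in results)
    return round(100 * earned / len(results), 1)


def prioritized_gaps(results: list[ControlResult]) -> list[ControlResult]:
    """Non-passing controls, ordered by severity weight times the missing credit."""
    gaps = [r for r in results if r.status != "pass"]
    return sorted(
        gaps,
        key=lambda r: SEVERITY_WEIGHT[r.severity] * (1 - STATUS_CREDIT[r.status]),
        reverse=True,
    )


results = [
    ControlResult("AC-1", "pass", "high"),
    ControlResult("AC-2", "partial", "critical"),
    ControlResult("IA-5", "fail", "medium"),
    ControlResult("SC-8", "fail", "critical"),
]
print(compliance_score(results))                          # → 37.5
print([r.control_id for r in prioritized_gaps(results)])  # → ['SC-8', 'AC-2', 'IA-5']
```

Multiplying by the missing credit means a full failure outranks a partial at the same severity, while a partial critical gap can still outrank a failed medium one.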

Real Processing Timeline

Actual timestamps from the analysis of a 42-page policy document.

  • 00:00  Policy document uploaded (PDF)
  • 00:10  Document parsed, text extracted, and chunked into sections
  • 00:16  Controls mapped and evaluated
  • 00:48  Compliance gaps identified and classified by severity
  • 01:10  Remediation suggestions generated with corrected clause text
  • 01:30  Audit-ready report generated (PDF export)

Infrastructure

An enterprise-grade stack on Google Cloud Platform, managed with Terraform for reproducibility and auditability.

Compute

Google Cloud Run

Fully managed, auto-scaling containers, deployed with Docker and Terraform.

Database

Cloud SQL (PostgreSQL 16)

With the pgvector extension for embedding storage and hybrid search.

Cache & Queues

Memorystore (Redis)

Session management and BullMQ job queues for asynchronous document processing.

Storage

Google Cloud Storage

Persistent storage for documents and user-generated content, encrypted at rest with AES-256.

Authentication

Session-based + WorkOS SSO

Passwords hashed with bcrypt; SAML 2.0 and OIDC single sign-on; SCIM directory sync.

Payment

Stripe

PCI DSS Level 1 compliant payment processing and subscription management.
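The Memorystore entry above describes BullMQ job queues for asynchronous document processing. BullMQ is a Node.js library, so the Python sketch below does not use its API; it uses an in-process `asyncio.Queue` as a stand-in to show the same pattern: jobs are enqueued, a small pool of workers processes them, and failed jobs are retried with exponential backoff. `run_pipeline`, `process`, and the retry limits are hypothetical. BullMQ exposes comparable per-job retry settings (`attempts` and `backoff`).

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical sketch: an in-process asyncio.Queue stands in for a Redis-backed BullMQ queue.
Process = Callable[[str], Awaitable[str]]


async def worker(queue: asyncio.Queue, process: Process, results: dict, max_attempts: int = 3) -> None:
    """Pull (doc_id, attempt) jobs forever; retry failures with exponential backoff."""
    while True:
        doc_id, attempt = await queue.get()
        try:
            results[doc_id] = await process(doc_id)
        except Exception as exc:
            if attempt < max_attempts:
                await asyncio.sleep(0.01 * 2 ** attempt)  # backoff doubles per attempt
                await queue.put((doc_id, attempt + 1))  # re-enqueue before task_done()
            else:
                results[doc_id] = f"failed: {exc}"
        finally:
            queue.task_done()


async def run_pipeline(doc_ids: list[str], process: Process, concurrency: int = 2) -> dict:
    """Enqueue every document, run a small worker pool, and wait until all jobs settle."""
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    for doc_id in doc_ids:
        queue.put_nowait((doc_id, 1))
    workers = [asyncio.create_task(worker(queue, process, results)) for _ in range(concurrency)]
    await queue.join()  # returns once every put() has a matching task_done()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Because a retry is re-enqueued before `task_done()` is called for the failed attempt, `queue.join()` cannot return while a retry is still pending.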

See it in action

Try the full pipeline on a sample policy document with no signup required, or start a 14-day Professional trial and upload your own policies.