Under the hood
How AuditGuardX Works
No black box. Here is exactly how AuditGuardX processes your documents, analyzes compliance, and generates audit-ready output, going from upload to report in under 2 minutes.
Document Intelligence Pipeline
Every document passes through 5 stages. Each stage is designed for accuracy, speed, and traceability.
Document Upload & Extraction
Documents are uploaded via the web interface or API and stored encrypted in Google Cloud Storage.
- Supported formats: PDF, DOCX, XLSX, images (OCR via Tesseract)
- Max file size: 50MB per document
- Text extraction with layout preservation
- Automatic content type detection and metadata extraction
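As a rough illustration of the intake rules above, the sketch below checks an upload against the format list and size limit, then picks an extraction route. It is not the AuditGuardX implementation: the function name `plan_extraction`, the route labels, the specific image extensions, and reading "50MB" as 50 × 1024 × 1024 bytes are all assumptions made for the example.

```python
from pathlib import Path

# Assumption: "50MB" is interpreted as 50 MiB for this sketch.
MAX_FILE_BYTES = 50 * 1024 * 1024

# Extension -> extraction route. The image extensions are assumed;
# the page only says "images" are handled through OCR.
EXTRACTION_ROUTES = {
    ".pdf": "text",
    ".docx": "text",
    ".xlsx": "text",
    ".png": "ocr",
    ".jpg": "ocr",
    ".jpeg": "ocr",
}


def plan_extraction(filename: str, size_bytes: int) -> str:
    """Return the extraction route for an upload, or raise ValueError."""
    if size_bytes <= 0:
        raise ValueError("empty file")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 50MB limit")
    suffix = Path(filename).suffix.lower()
    route = EXTRACTION_ROUTES.get(suffix)
    if route is None:
        raise ValueError(f"unsupported format: {suffix or 'no extension'}")
    return route
```

Rejecting oversized or unsupported files before any parsing starts keeps the later stages from spending time on documents they cannot process.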
Semantic Chunking & Embedding
Extracted text is broken into semantically meaningful sections and converted to vector embeddings.
- Context-preserving semantic chunking (not naive splitting)
- 384-dimensional embeddings via all-MiniLM-L6-v2 model
- Vectors stored in PostgreSQL with pgvector extension
- Enables hybrid search: semantic similarity + keyword matching
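To make the idea of hybrid search concrete, here is a minimal sketch that blends cosine similarity between embeddings with a simple keyword-overlap score. The blending weight `alpha`, the whitespace tokenizer, and the function names are assumptions for illustration; in production the similarity ranking would typically run inside PostgreSQL via pgvector rather than in application code, and the vectors would have 384 dimensions rather than the tiny ones used in the usage example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the chunk text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def hybrid_rank(query, query_vec, chunks, alpha=0.7):
    """Rank (text, vector) chunks by a weighted mix of both signals.

    alpha is an assumed weight: 1.0 means purely semantic, 0.0 purely keyword.
    Returns a list of (score, text) pairs, best match first.
    """
    scored = [
        (alpha * cosine_similarity(query_vec, vec)
         + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in chunks
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Usage with toy 3-dimensional vectors
chunks = [
    ("passwords must rotate every 90 days", [1.0, 0.0, 0.0]),
    ("visitor badge policy", [0.0, 1.0, 0.0]),
]
ranked = hybrid_rank("rotate passwords", [1.0, 0.0, 0.0], chunks)
```

The keyword term helps with exact identifiers, such as control numbers or clause references, that embeddings alone can blur together.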
AI Compliance Analysis
Multi-provider AI maps document content to regulatory controls, scores compliance, and identifies gaps.
- Multi-provider AI routing: Vertex AI, Groq, Cerebras with automatic fallbacks
- 3,485+ controls across 39 frameworks evaluated per document
- Confidence scoring with evidence citations for each finding
- Severity classification: critical, high, medium, low, informational
- AI-generated remediation suggestions with corrected clause text
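The automatic fallback behavior described above can be sketched as trying providers in order until one succeeds. This is an assumed, simplified pattern, not the actual routing logic: the `route_with_fallback` helper and the callables standing in for Vertex AI, Groq, and Cerebras clients are hypothetical, and a real router would likely also consider timeouts, rate limits, and cost.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def route_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response).

    Any exception from a provider triggers a fallback to the next one.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Usage with hypothetical stand-ins for real provider clients
def failing_provider(prompt: str) -> str:
    raise TimeoutError("upstream timeout")


def working_provider(prompt: str) -> str:
    return f"analysis of: {prompt}"


used, answer = route_with_fallback(
    "Does section 4.2 require MFA?",
    [("vertex-ai", failing_provider), ("groq", working_provider)],
)
```

Returning the provider name along with the response makes it possible to record which model produced each finding, which supports traceability.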
Conversational Voice AI Interface
Natural language voice interaction, with Whisper speech-to-text for input and multi-provider text-to-speech for output.
- Multi-provider TTS: Groq Orpheus → Gemini TTS → local Piper (sherpa-onnx) fallback chain
- Whisper speech-to-text (whisper-large-v3-turbo) for voice input transcription
- Streaming TTS with sentence prefetching for near-zero playback gaps
- Natural language access to compliance, document, and knowledge base tools via voice
- 2 input modes: push-to-talk (spacebar) and voice-activation (hands-free)
- Silero VAD v5 (Voice Activity Detection) with barge-in support for hands-free operation
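Sentence prefetching, mentioned above, can be illustrated with a small generator that starts synthesizing the next sentence while the current one is handed off for playback. This is a sketch under assumptions: the `synthesize` callable is a hypothetical stand-in for a TTS provider call, the regex sentence splitter is deliberately naive, and the real pipeline may prefetch more than one sentence ahead.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_with_prefetch(
    text: str, synthesize: Callable[[str], bytes]
) -> Iterator[bytes]:
    """Yield audio per sentence, synthesizing one sentence ahead in the background."""
    sentences = split_sentences(text)
    if not sentences:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(synthesize, sentences[0])
        for upcoming in sentences[1:]:
            audio = pending.result()
            # Kick off the next sentence before handing the current audio to the caller.
            pending = pool.submit(synthesize, upcoming)
            yield audio
        yield pending.result()


# Usage with a fake synthesizer that just encodes the text
def fake_tts(sentence: str) -> bytes:
    return sentence.upper().encode()


clips = list(stream_with_prefetch("Two gaps found. One is critical! Review now?", fake_tts))
```

Because the next clip is already being generated while the current one plays, the gap between sentences shrinks toward the time it takes to hand over a finished buffer.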
Report Generation & Output
AI generates structured compliance reports with executive summaries, gap analyses, and remediation roadmaps.
- Executive summary with AI-generated narrative
- Control-by-control assessment with pass/fail/partial status
- Gap analysis per framework with severity-weighted prioritization
- Remediation suggestions with corrected policy language
- PDF export in audit-ready formatting
- Compliance score trends and historical tracking
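The relationship between control statuses, severities, and prioritization can be sketched as follows. The numeric weights, the half credit for "partial", the scoring formula, and the control IDs are all assumptions chosen for illustration; the page does not publish the actual scoring model.

```python
from dataclasses import dataclass
from typing import List

# Assumed weights; the real model may weight severities differently.
SEVERITY_WEIGHT = {"critical": 5, "high": 4, "medium": 3, "low": 2, "informational": 1}
# Assumed credit per status; "partial" earning half credit is a guess.
STATUS_CREDIT = {"pass": 1.0, "partial": 0.5, "fail": 0.0}


@dataclass
class Finding:
    control_id: str  # hypothetical IDs in the usage example below
    status: str      # "pass", "partial", or "fail"
    severity: str    # one of the keys in SEVERITY_WEIGHT


def compliance_score(findings: List[Finding]) -> float:
    """Severity-weighted percentage of earned credit, rounded to one decimal."""
    total = sum(SEVERITY_WEIGHT[f.severity] for f in findings)
    if total == 0:
        return 0.0
    earned = sum(SEVERITY_WEIGHT[f.severity] * STATUS_CREDIT[f.status] for f in findings)
    return round(100 * earned / total, 1)


def prioritized_gaps(findings: List[Finding]) -> List[Finding]:
    """Non-passing findings, most severe first; outright failures before partials."""
    gaps = [f for f in findings if f.status != "pass"]
    return sorted(gaps, key=lambda f: (-SEVERITY_WEIGHT[f.severity], STATUS_CREDIT[f.status]))


# Usage
findings = [
    Finding("AC-1", "pass", "high"),
    Finding("AC-2", "fail", "critical"),
    Finding("AC-3", "partial", "low"),
    Finding("AC-4", "fail", "medium"),
]
score = compliance_score(findings)
order = [f.control_id for f in prioritized_gaps(findings)]
```

Weighting by severity means a single failed critical control lowers the score more than several low-severity gaps, which matches how a remediation roadmap would usually be ordered.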
Real Processing Timeline
Actual timestamps from a 42-page policy document analysis.
Upload policy document (PDF format)
Document parsed, text extracted, chunked into sections
Controls mapped and evaluated
Compliance gaps identified with severity classification
Remediation suggestions generated with corrected clause text
Audit-ready report generated (PDF export)
Infrastructure
Enterprise-grade stack on Google Cloud Platform. Managed via Terraform for reproducibility and auditability.
Compute
Google Cloud Run
Fully managed, auto-scaling containers. Deployed via Docker + Terraform.
Database
Cloud SQL (PostgreSQL 16)
With pgvector extension for embedding storage and hybrid search.
Cache & Queues
Memorystore (Redis)
Session management and BullMQ job queues for asynchronous document processing.
Storage
Google Cloud Storage
Persistent storage for documents and user-generated content, encrypted at rest with AES-256.
Authentication
Session-based + WorkOS SSO
Bcrypt password hashing, SAML 2.0, OIDC, SCIM directory sync.
Payment
Stripe
PCI DSS Level 1 compliant payment processing and subscription management.
See it in action
Try the full pipeline on a sample policy document, no signup required. Or start a 14-day Professional trial and upload your own policies.