Security - DataQI

1. Introduction

Generative AI introduces a novel and complex threat landscape that traditional cybersecurity perimeters are ill-equipped to handle. For organizations operating within critical infrastructure, manufacturing, healthcare, and finance, the deployment of AI is not merely a question of productivity enhancement but of existential risk management.

DataQI is an enterprise-grade AI assistant platform that provides the operational efficiencies of Large Language Models (LLMs) while adhering to the strict governance, risk, and compliance (GRC) standards.

This Security Overview will cover the core security of DataQI. It provides an exhaustive analysis of DataQI’s security features, mapping them directly to the inquiries typical of a corporate NSRB. The objective is to equip sales engineers and compliance officers with a granular "Security Story" that covers:

Identity & Access Management (IAM): Proving that only authorized personnel can access the system.
Data Protection: validating encryption standards for data at rest and in transit.
Agentic Safety: Demonstrating proactive defenses against the OWASP Agentic Top 10 risks.
Observability: ensuring integration with corporate SIEM (Security Information and Event Management) ecosystems.
Lifecycle Security: Validating secure software development and patch management processes.

2. Platform Architecture and Deployment Models

2.1 The "Secure by Design" Philosophy

The fundamental differentiator for DataQI in a crowded market of AI wrappers is its architectural commitment to data isolation. Unlike multi-tenant SaaS solutions where customer data may be commingled or used to retrain foundation models, DataQI supports deployment models that prioritize data sovereignty. This architecture directly addresses the primary fear of NSRBs: Intellectual Property (IP) leakage.

DataQI offers Private Cloud, On-Premises, and Hybrid deployment options. This flexibility serves as the first line of defense.

On-Premises Deployment: For highly regulated sectors like defense and manufacturing, DataQI can reside entirely within the customer’s firewall. In this configuration, no data traverses the public internet, neutralizing a vast array of network-based attacks and resolving data residency concerns immediately.
Private Cloud (VPC): Deployments within a Virtual Private Cloud (e.g., Azure, AWS) allow for the scalability of cloud resources while maintaining network isolation through VPC peering and Private Link connections. This ensures that the AI infrastructure operates as an extension of the corporate network rather than an external dependency.

2.2 Knowledge Mode and Data Sovereignty

DataQI’s has a concept of "Knowledge Mode". This feature allows organizations to strictly control the boundaries of the AI's information retrieval.

Mechanism: Administrators can restrict assistants to use only the organization's ingested documents ("Ground Truth") and ignore the global parametric knowledge of the underlying LLM.
Security Implication: This mitigates the risk of Hallucination (fabrication of facts) and ensures that the AI does not inadvertently leak public internet data into corporate workflows or vice versa.

Furthermore, DataQI enforces a strict Zero-Training Policy: customer data is used solely for Retrieval-Augmented Generation (RAG) context injection and is never used to retrain or fine-tune the base foundation models. This distinction is critical for passing legal review.

3. Identity and Access Management (IAM)

Identity is the new perimeter, and DataQI utilizes a robust Identity and Access Management (IAM) framework to enforce this boundary.

3.1 Authentication Protocols and User Lifecycle

DataQI’s authentication mechanisms are designed to integrate with enterprise standards rather than reinvent them.

Single Sign-On (SSO) and Federation: To support enterprise scale, DataQI supports integration with centralized Identity Providers (IdPs) such as Azure Active Directory (Entra ID), Okta, and PingIdentity via SAML 2.0 and OIDC (OpenID Connect) protocols. This ensures that identity lifecycle events are synchronized; when an employee is terminated in the corporate directory, their access to DataQI is instantaneously revoked, closing a common vulnerability gap known as "Zombie Accounts."

3.2 Role-Based Access Control (RBAC) Architecture

DataQI employs a granular Role-Based Access Control (RBAC) model to enforce the Principle of Least Privilege. The following role hierarchy is supported or recommended for configuration:

Role Hierarchy and Definitions
Role Definition	Scope of Access	Intended User Persona	Security Justification
System Administrator	Full read/write access to system configurations, security settings, user management, and billing.	IT Directors, System Owners	"Keys to the Kingdom." Deffered to SSO/OIDC.
Knowledge Manager	Authorized to ingest documents, manage data connectors, and configure vector indices. No access to system security settings.	Department Leads, Data Stewards	Separates the duty of managing data from managing the system.
Assistant Builder/Operator	Can create and customize AI Assistants using existing data sources. Can define system prompts.	Power Users, Business Analysts	Allows operational flexibility without risking data integrity or system stability.
Standard User	Read-only access to interact with assigned Assistants. Cannot upload documents or change configurations.	General Employees	Minimizes the attack surface by preventing the majority of users from altering system state.
Security/Auditor	Read-only access to Audit Logs, Insights Dashboards, and User Lists. No access to proprietary business data (documents) or chat history.	Compliance Officers, InfoSec Analysts	Critical for Review Boards. Allows oversight without violating data privacy (Need-to-Know).

3.2.1 Group Management

To facilitate management at scale, DataQI supports Group-based entitlements. Administrators can map Active Directory groups (e.g., "Finance_Team") directly to DataQI roles. This ensures that permissions are dynamic and role-centric rather than user-centric, reducing administrative overhead and the likelihood of permission creep.

3.3 Session Management and Token Security

Proof of permission is maintained via secure session tokens (typically JWTs - JSON Web Tokens).

Token Security: Tokens are signed using strong cryptographic algorithms (RS256) to prevent tampering.
Timeouts: Configurable session timeouts ensure that unattended workstations do not become security liabilities. Re-authentication is required after periods of inactivity.
Contextual Access: Every API call validates the token’s claims against the requested resource, ensuring that a valid user cannot access data they are not authorized to see simply by manipulating the URL or API endpoint (mitigating IDOR - Insecure Direct Object Reference vulnerabilities).

4. Data Protection and Cryptography

4.1 Encryption Standards: Data in Transit

Data in transit refers to information moving between the client (browser) and the server, or between internal microservices.

TLS Enforcement: DataQI mandates the use of Transport Layer Security (TLS) 1.2 or 1.3 for all ingress traffic. Support for obsolete and vulnerable protocols (SSL v3, TLS 1.0, TLS 1.1) is disabled by default to prevent downgrade attacks like POODLE or BEAST.
Cipher Strength: The platform utilizes high-entropy cipher suites (e.g., ECDHE-RSA-AES256-GCM-SHA384) that support Forward Secrecy. This ensures that even if the server’s private key were compromised in the future, past sessions could not be decrypted.
Internal mTLS: In distributed architectures, internal communication between components (e.g., the application server and the vector database) is secured via Mutual TLS (mTLS), ensuring that lateral movement within the application network is authenticated and encrypted.

4.2 Encryption Standards: Data at Rest

Data "at rest" resides in databases, file systems, and backups.

Advanced Encryption Standard (AES): All persistent storage, including SQL databases holding user metadata and the Vector Stores holding the RAG knowledge base, is encrypted using AES-256. This is the industry gold standard and is FIPS 197 compliant.
Vector Embedding Protection: A unique aspect of GenAI security is the protection of vector embeddings. Embeddings can sometimes be inverted to reconstruct the original text. DataQI protects these vectors with the same rigour as raw text, ensuring that the "mathematical representation" of corporate secrets is not exposed.
PKI for Sensitive Imports: As highlighted in technical documents, DataQI utilizes PKI (Public Key Infrastructure) encryption for specific high-value import files (e.g., UB, 837, XML financial/healthcare records). This provides an additional layer of application-level encryption during the data ingestion phase, ensuring that files are protected even before they are written to the database.

4.3 Data Sovereignty

Data Residency: DataQI’s architecture supports data residency requirements by allowing storage volumes to be pinned to specific geographic regions (e.g., US-East, EU-Frankfurt), ensuring compliance with GDPR and CCPA regulations regarding cross-border data transfers.

5. Security Monitoring, Logging, and Observability

Transparency is the antidote to risk.

5.1 Comprehensive Audit Architecture

DataQI creates an immutable "Flight Recorder" of system activity. The logging architecture is designed to capture the "Who, What, Where, When, and Outcome" of every material event.

Identity Events: Successful logins, failed login attempts (crucial for detecting brute force), MFA challenges, and password resets.
Administrative Actions: User creation/deletion, role modification, system setting changes.
Data Access & Usage: Detailed logs of which user accessed which Assistant, and importantly, which documents were retrieved to answer a query. This granularity is essential for IP audits.
Log Immutability: Logs are written to a write-only appended storage medium to prevent tampering by attackers attempting to cover their tracks.

5.2 SIEM Integration and Centralized Monitoring

Security teams cannot monitor disparate dashboards for every tool. DataQI supports integration with the corporate ecosystem via Security Information and Event Management (SIEM) tools.

Log Formats: Logs are generated in structured JSON format. This is the industry-standard for interoperability, allowing easy ingestion by tools like Splunk, Datadog, QRadar, and Sumo Logic.
Transport Mechanisms: DataQI supports log forwarding via Syslog (UDP/TCP), HTTP Event Collector (HEC), or delivery to secure S3 buckets.

Alerting: The system can be configured to trigger real-time alerts for high-severity anomalies, such as:

A spike in file access or mass export attempts.
Detection of "Prompt Injection" attacks via the guardrails.
Access from unusual IP addresses (Geo-velocity checks).

5.3 Performance and Health Telemetry

DataQI exposes health metrics to allow IT operations to distinguish between a system failure and a Denial of Service (DoS) attack.

Prometheus Endpoints: The application exposes a /metrics endpoint compatible with Prometheus and Grafana.

Key Metrics:

CPU/Memory Usage: To detect resource exhaustion or crypto-mining malware.
Latency: Tracking LLM inference times.
Error Rates: Spikes in 4xx/5xx errors which may indicate fuzzing attacks.
Token Consumption: Monitoring for unexpected spikes in API usage which could indicate account compromise or a "Denial of Wallet" attack.

6. Threat Management: Addressing the OWASP Top 10 for Agentic Applications

The NSRB is increasingly aware that GenAI introduces new attack vectors. DataQI creates a powerful competitive advantage by proactively addressing the OWASP Top 10 for Agentic Applications (2026). This section demonstrates that DataQI is future-proofed against the next generation of threats.

6.1 Defense Against Goal Hijacking (ASI01)

The Threat: An attacker uses malicious prompts ("Ignore previous instructions...") to redirect the agent's objective, forcing it to exfiltrate data or perform unauthorized actions.
DataQI Mitigation: System Prompt Hardening. DataQI utilizes immutable "System Instructions" that are architecturally separated from user input. The LLM is instructed to prioritize these system commands over user prompts. Furthermore, input sanitization layers inspect incoming prompts for known "jailbreak" patterns before they are processed by the model.

6.2 Preventing Tool Misuse (ASI02) & Privilege Abuse (ASI03)

The Threat: An agent is tricked into using a tool (e.g., an email connector or database query) to damage the system, or an agent inherits the user's "God Mode" permissions to access restricted data.
DataQI Mitigation:
- Least Privilege Service Accounts: When an Agent executes a tool, it does so using a scoped Service Account with strictly defined permissions, not the user’s full credentials.
- Human-in-the-Loop (HITL): For high-impact actions (e.g., deleting a file, initiating a transaction), DataQI enforces an approval workflow. The Agent proposes the action, but a human must explicitly click "Approve" to execute it. This breaks the kill chain for autonomous attacks.

6.3 Combating Context Poisoning and Hallucination (ASI06) & (ASI09)

The Threat: Attackers poison the RAG knowledge base with malicious documents to manipulate the AI's output (Context Poisoning), or the AI confidently states falsehoods that users blindly trust (Trust Exploitation).
DataQI Mitigation: Hallucination Guardrail. This advanced feature provides real-time response validation.
- Mechanism: It compares the AI's generated response against the retrieved source text. If the AI hallucinates a fact not present in the source, the response is flagged or blocked.
- Visual Indicators: Users are presented with visual warnings (banners, highlights) when a response has low confidence or potential inaccuracies, preventing over-reliance.
- Source Citations: Every claim is backed by a clickable citation to the source document, ensuring complete traceability and verification.

6.4 Supply Chain Security (ASI04)

The Threat: Compromise of third-party libraries or the foundation model itself.
DataQI Mitigation:
- SCA (Software Composition Analysis): Automated scanning of all open-source dependencies during the build process to identify and patch known vulnerabilities (CVEs) in libraries (e.g., Log4j).
- Model Provenance: DataQI utilizes vetted, enterprise-grade foundation models rather than unverified open-source checkpoints, reducing the risk of "Model Backdoors."

7. Application Security Lifecycle and Hardening

Security is not a feature; it is a continuous process.

7.1 Vulnerability Management Program

DataQI integrates security into the CI/CD (Continuous Integration/Continuous Deployment) pipeline.

SAST (Static Application Security Testing): Source code is scanned for insecure coding patterns (e.g., SQL injection, hardcoded secrets) before compilation.
DAST (Dynamic Application Security Testing): The running application is subjected to simulated attacks to identify runtime vulnerabilities.
Container Security: Docker images are scanned for vulnerabilities in the OS layer before deployment.

7.2 Patch Management SLA

DataQI adheres to a strict Service Level Agreement (SLA) for security patches:

Critical Severity (CVSS 9.0+): Patch released within 24-48 hours.
High Severity: Patch released within 7 days.
Medium/Low: addressed in the next scheduled maintenance release.

Notification: Administrators are notified via secure channels regarding security advisories, ensuring they can schedule maintenance windows promptly.

7.3 Infrastructure Hardening

CIS Benchmarks: The underlying infrastructure (whether cloud instances or on-prem servers) is hardened according to Center for Internet Security (CIS) Benchmarks. This involves disabling unused ports, removing unnecessary services, and enforcing strict kernel parameters to minimize the attack surface.

Penetration Testing: DataQI engages third-party security firms to conduct regular (annual) penetration testing.

DataQI Security Overview