TRUST & SECURITY IN GENERATIVE AI

Our Testers and AI Engineers work collaboratively with our Client's team including ethicists and domain specialists to build and maintain responsible and trustworthy Generative AI systems.

OWASP TOP 10 FOR LLM RISKS

The OWASP Top 10 for Large Language Model (LLM) Risks provides a framework for identifying and mitigating security risks associated with the use of LLMs. Below are the best practices aligned with these risks:

LLM01. PROMPT INJECTION
Risk: Malicious input designed to manipulate the LLM into executing unintended actions.
Prevention & Mitigation Strategies:
  • Use strict input validation and sanitization for prompts.
  • Limit the model's ability to interpret and execute injected code by sandboxing critical operations.
  • Design prompts with guardrails, such as predefined templates and structured contexts.

    LLM02. SENSITIVE INFORMATION DISCLOSURE
    Risk: Sensitive or proprietary data unintentionally exposed via model outputs (data leakage).
    Prevention & Mitigation Strategies:

  • Avoid using sensitive or proprietary data in untrusted environments.
  • Implement access controls and monitor outputs for potential leaks.
  • Use encryption for data both in transit and at rest.
  • Test models for unintentional reproduction of training data.

    LLM03. SUPPLY CHAIN
    Risk: Unsafe outputs such as injection attacks (e.g., SQL injection, XSS).
    Prevention & Mitigation Strategies:

  • Treat model outputs as untrusted and sanitize them before use in downstream systems.
  • Use output validation to ensure compliance with expected formats or constraints.
  • Log and monitor outputs for anomalies.

    LLM04. DATA & MODEL POISONING
    Risk: Adversaries insert malicious data into training datasets.
    Prevention & Mitigation Strategies:

  • Vet and curate training datasets rigorously to detect and remove malicious data.
  • Use differential privacy techniques to minimize data sensitivity.
  • Continuously audit datasets for potential manipulations.

    LLM05. IMPROPER OUTPUT HANDLING
    Risk: Unauthorized access or theft of proprietary models.
    Prevention & Mitigation Strategies:

  • Use strong authentication and access controls to protect models.
  • Encrypt models at rest and during transfer.
  • Employ model watermarking or fingerprinting to detect unauthorized use.

    LLM06. EXCESSIVE AGENCY
    Risk: Abuse of LLM capabilities by unauthorized users.
    Prevention & Mitigation Strategies:

  • Implement role-based access controls and API rate limiting.
  • Monitor usage patterns for anomalies that may indicate abuse.
  • Enforce usage policies through contractual agreements and technical controls.

    LLM07. SYSTEM PROMPT WEAKNESS
    Risk: Adversarially crafted inputs that exploit model weaknesses.
    Prevention & Mitigation Strategies:

  • Regularly test models against adversarial examples to improve resilience.
  • Deploy runtime defenses such as anomaly detection for inputs.
  • Limit the scope of LLM-generated actions to prevent cascading effects.

    LLM08. VECTOR & EMBEDDING WEAKNESS
    Risk: Blind trust in model outputs leading to inaccurate or harmful actions.
    Prevention & Mitigation Strategies:

  • Require human review for high-stakes decisions.
  • Clearly label LLM outputs as AI-generated to reduce over-trust.
  • Validate and cross-check critical outputs against trusted sources.

    LLM09. MODEL MISUSE
    Risk: Using LLMs for malicious or unintended purposes (e.g., generating disinformation).
    Prevention & Mitigation Strategies::

  • Establish acceptable use policies (AUPs) and enforce them via technical and legal means.
  • Monitor usage patterns to identify and block misuse.
  • Collaborate with stakeholders to mitigate broader societal risks.

    LLM10. UNBOUNDED CONSUMPTION
    Risk: A Large Language Model (LLM) application allows users to conduct excessive and uncontrolled inferences, leading to risks such as denial of service (DoS), economic losses, model theft, and service degradation.
    Prevention & Mitigation Strategies:

  • Input validation
  • Rate limiting
  • Timeouts & throtlling
  • Sandbox techniques
  • Comprehensive logging, monitoring and anomaly detection
  • Watermarking

    The OWASP Top 10 for Large Language Model Applications Project aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language Models (LLMs) and Generative AI applications. The project provides a range of resources. Most notably the OWASP Top 10 list for LLM applications listing the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence in real-world applications. Visit OWASP.org

  • RED TEAMING & EVALUATIONS
    Red-teaming in the context of generative AI refers to the practice of rigorously testing and probing AI models to identify vulnerabilities, biases, risks, and potential misuse cases. The goal is to improve the safety, robustness, and ethical performance of these systems by simulating adversarial or challenging scenarios.

    Key Components of Red-Teaming in Generative AI:
    1. Adversarial Testing:
  • Deliberately crafting prompts or inputs designed to exploit weaknesses in the model, such as generating harmful, biased, or incorrect outputs.
  • Example: Testing whether a generative AI model produces hate speech or misinformation when given specific prompts.
    2. Bias and Fairness Analysis:
  • Evaluating whether the model produces outputs that are biased against particular groups based on race, gender, religion, etc.
  • Example: Checking if the model stereotypes certain professions or demographics in its responses.
    3. Safety and Harm Prevention:
  • Ensuring the AI does not produce harmful outputs, such as promoting violence, self-harm, or other dangerous behaviors.
  • Example: Preventing the generation of instructions for illegal or unethical activities.
    4. Misuse Scenarios:
  • Exploring how the AI might be misused in real-world scenarios, such as creating deepfakes, phishing scams, or propaganda.
  • Example: Testing if the model can generate realistic-sounding but fake news articles.
    5. Robustness Testing:
  • Assessing how the AI behaves under unusual or ambiguous conditions, such as contradictory instructions or nonsensical inputs.
  • Example: Checking if the model can handle inputs with conflicting information without producing incoherent responses.
    6.Iterative Feedback and Improvement:
  • Using the findings from red-teaming to iteratively refine the model, adjust its training data, or implement safety mitigations.

    Why Is Red-Teaming Important?

  • Safety: To prevent the AI from causing harm or being used maliciously.
  • Ethics: To ensure the AI aligns with societal values and norms.
  • Trust: To build user confidence in the AI's reliability and integrity.
  • Regulatory Compliance: To adhere to laws and guidelines around AI usage.

    Red Teaming is often performed by dedicated teams of experts, which may include ethicists, domain specialists, adversarial testers, and AI engineers. The insights gained are critical for building responsible and trustworthy AI systems.

  • Our Amazon Solution Architects and Amazon Bedrock Specialists implement, manage, and optimize generative AI solutions using the Amazon Bedrock.

    GUARDRAILS ON AMAZON BEDROCK

    Amazon Bedrock Guardrails provides configurable safeguards to help safely build generative AI applications at scale. With a consistent and standard approach used across all supported foundation models (FMs), Guardrails delivers industry-leading safety protections:
  • Uses Automated Reasoning to help prevent factual errors from hallucinations – being the first and only generative AI safeguard to do so
  • Blocks up to 85% more undesirable and harmful content
  • Filters over 75% hallucinated responses from models for Retrieval Augmented Generation (RAG) and summarization use cases

    With Amazon Bedrock, you can:

  • Build responsible AI applications with Guardrails
  • Bring a consistent level of safety across gen AI applications
  • Detect hallucinations in model responses using contextual grounding checks
  • Automated Reasoning checks help prevent factual errors from hallucinations and offer verifiable accuracy
  • Block undesirable topics in gen AI applications
  • Filter harmful multimodal content based on your responsible AI policies
  • Redact sensitive information such as PII to protect privacy

    Visit AWS