The distinction between OpenTelemetry “Native” and “Supported” implementations represents a fundamental architectural choice with significant implications for vendor lock-in, data fidelity, operational flexibility, and long-term total cost of ownership.
Choosing a proprietary instrumentation approach can result in 18-24 months of engineering effort to migrate between vendors, while OpenTelemetry-native solutions enable platform switches in days to weeks. For C-suite leaders evaluating multi-year observability investments, this difference represents millions in avoided switching costs and preserved strategic flexibility.
The 5-year TCO difference between OTEL-native ($340K) and proprietary SDK ($1.1M) approaches is driven almost entirely by migration costs when switching platforms.
Defining the Spectrum
OpenTelemetry Native
Platform built from the ground up on OpenTelemetry standards. Internal data model directly mirrors OTLP (OpenTelemetry Protocol). GenAI Semantic Conventions stored without translation. Query interfaces understand OTLP attribute semantics natively.
OpenTelemetry Supported
Platform accepts OTLP data via ingestion endpoint. Internal data model differs from OTLP (translation layer required). GenAI attributes mapped to proprietary schema. Query interfaces may expose OTLP concepts but with limitations.
Proprietary SDK
Platform requires vendor-specific instrumentation library. No OTLP ingestion capability. Complete vendor lock-in for instrumented code. Switching vendors requires re-instrumenting entire codebase.
Architecture Patterns
OpenTelemetry Native Architecture
Application (OpenTelemetry SDK; GenAI Semantic Conventions)
→ OTLP Collector (standard processing; no translation needed)
→ Native Storage (OTLP schema preserved; semantic conventions preserved)
→ Query/UI Layer (native attribute semantics; context-aware filtering)
Key Advantages
- Zero Translation Loss: All GenAI semantic convention attributes preserved exactly as specified
- Attribute Semantics: UI understands attribute types and provides contextual actions (e.g., gen_ai.usage.input_tokens is treated as a numeric value that can drive cost calculations, not just text)
- Resource-Centric Navigation: Natural filtering by service.name, deployment.environment using OpenTelemetry resource concepts
- Future-Proof: Automatic support for new semantic convention versions without vendor updates
OpenTelemetry Supported Architecture
Application (OpenTelemetry SDK; GenAI Semantic Conventions)
→ OTLP Endpoint (receives OTLP; translation begins)
→ Internal Format (vendor-specific schema; attribute mapping)
→ Proprietary Storage (internal data model; semantic loss possible)
Translation Layer Implications
- Attribute Mapping: GenAI semantic conventions mapped to vendor’s internal attribute names (potential inconsistencies)
- Type Coercion: Structured attributes may be flattened to strings (e.g., gen_ai.request.temperature from float to text)
- Semantic Loss: Vendor UI shows key-value pairs without understanding attribute meaning
- Version Lag: Support for new semantic convention versions depends on vendor update cycles
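The first three implications can be illustrated with a deliberately simplified, hypothetical translation layer that stores everything as strings (a common internal representation); the attribute values are illustrative:

```python
otlp_attributes = {
    "gen_ai.request.temperature": 0.7,           # float in OTLP
    "gen_ai.usage.input_tokens": 412,            # int in OTLP
    "gen_ai.response.finish_reasons": ["stop"],  # string array in OTLP
}

def translate_to_vendor_schema(attrs):
    # Hypothetical string-only key-value store: types are coerced away,
    # so numeric filtering and aggregation on raw values no longer work.
    return {key: str(value) for key, value in attrs.items()}

vendor_record = translate_to_vendor_schema(otlp_attributes)
print(vendor_record["gen_ai.request.temperature"])  # '0.7' as text, not a float
```

Once temperature is a string, a query like "temperature > 0.5" requires re-parsing on every read, and the UI can no longer offer numeric histograms or range filters for it.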
Proprietary SDK Architecture
Application (Vendor SDK; proprietary decorators/tracers)
→ Vendor Endpoint (custom protocol; vendor-specific format)
→ Backend (optimized for vendor format; no OTLP compatibility)
Vendor Lock-in Realities
- Re-instrumentation Required: Switching vendors requires replacing all instrumentation decorators/SDK calls
- Framework Coverage: Limited to languages/frameworks the vendor supports
- Migration Cost: 18-24 months for large codebases with thousands of instrumentation points
- Strategic Risk: Vendor pricing changes or feature deprecation creates operational crisis
Technical Feature Comparison
GenAI Semantic Conventions
The OpenTelemetry GenAI Semantic Conventions (v1.38.0+) define standardized attributes for LLM and agent operations. Native platforms leverage these directly; supported platforms must translate them.
Model Operations
Identify what LLM operation occurred and which provider/model was used
- gen_ai.operation.name
- gen_ai.request.model
- gen_ai.response.model
- gen_ai.system
Token Metrics
Cost tracking, usage analytics, and budget monitoring
- gen_ai.usage.input_tokens
- gen_ai.usage.output_tokens
- gen_ai.usage.total_tokens
- gen_ai.token.type
Agent Operations
Multi-step agent tracing and tool-calling visibility
- gen_ai.agent.id
- gen_ai.agent.name
- gen_ai.tool.call.id
- gen_ai.tool.call.name
Response Metadata
Debugging, reproducibility, and quality analysis
- gen_ai.response.id
- gen_ai.response.finish_reasons
- gen_ai.request.temperature
- gen_ai.request.max_tokens
Platform Implementation Analysis
OpenTelemetry Native Platforms
Arize AI / Phoenix
Architecture (Native): Built on OpenInference semantic conventions (an OpenTelemetry extension). The entire stack is designed around OTLP from day one.
Data Flow: Application → OTLP Collector → Phoenix (no translation) → ClickHouse with OTLP schema
- OpenInference span attributes map directly to UI elements
- Resource explorer shows Kubernetes/service hierarchies using OTEL resource attributes
- Embedding analysis for RAG uses OTLP embedding spans
- Phoenix OSS runs locally or self-hosted with identical schema
Migration Path: Change OTLP endpoint URL. No code changes required.
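That migration path can be made concrete: with OTLP, the backend address is ordinary configuration, conventionally carried by the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable. A minimal sketch (the URLs are illustrative placeholders):

```python
import os

# Point every OTLP exporter at the new backend; instrumentation code,
# attribute names, and semantic conventions are untouched.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://phoenix.internal:6006"

def exporter_endpoint(default="http://localhost:4318"):
    # Standard variable defined by the OTLP exporter specification;
    # OTLP exporters read it when no endpoint is passed explicitly.
    return os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", default)

print(exporter_endpoint())  # the new backend; no code changes shipped
```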
Langfuse
Architecture (Native): OTEL-native SDK v3, released June 2025. Complies with the GenAI semantic conventions, with a langfuse.* namespace for extensions.
Data Flow: Application → OTLP SDK → /api/public/otel endpoint → Direct storage with semantic preservation
- langfuse.* namespace attributes for Langfuse-specific features
- OTLP context propagation enables automatic integration with HTTP frameworks, databases
- Property mapping maintains GenAI convention compliance
- MIT-licensed, full self-hosting support
Language Support: Immediate support for all OpenTelemetry SDK languages via OpenLLMetry/OpenLIT
Datadog LLM Observability
Architecture (Native): Native support for the GenAI Semantic Conventions (v1.37+), with the Datadog Agent acting as an OTLP collector.
Data Flow: Application → OTLP → Datadog Agent → Datadog backend (preserves OTLP structure)
- Full-stack integration: correlates LLM traces with APM, RUM, logs
- Out-of-box evaluators use GenAI convention attributes directly
- Cost tracking leverages gen_ai.usage.* attributes without translation
- Agent-based collection enables unified observability strategy
Enterprise Value: Organizations already on Datadog gain LLM observability without adding a new vendor relationship
OpenTelemetry Supported Platforms
LangSmith
Architecture (Supported): The primary SDK is LangChain-native; an OTLP endpoint was added for interoperability. A translation layer converts OTLP to LangSmith's internal trace format.
- OTLP exporter available for non-LangChain applications
- Best experience with native LangChain/LangGraph integration
- Some semantic conventions may lose fidelity in translation
Best Use Case: Teams heavily invested in LangChain ecosystem who occasionally need OTLP interop
W&B Weave
Architecture (Supported): Accepts OTLP via standard endpoints but stores data in Weave's internal format, optimized for ML experiment-tracking lineage.
- One-line MCP agent auto-logging uses OTLP under the hood
- Translation maintains most GenAI conventions
- Best integration with broader W&B MLOps ecosystem
Best Use Case: Organizations using W&B for ML tracking seeking to add LLM observability
Proprietary SDK Platforms
Galileo AI
Architecture (Proprietary): Purpose-built evaluation platform with a proprietary SDK and data format.
Strategic Rationale:
- Luna-2 SLMs are optimized for the proprietary trace format (not OTLP)
- Agent-specific optimizations difficult to standardize
- Insights Engine requires vendor-specific schema
Lock-in Mitigation: Enterprise export APIs for extracting evaluation data
Value Proposition: 97% cost reduction vs GPT-4-as-judge justifies proprietary approach for cost-sensitive deployments
Braintrust
Architecture (Proprietary): SDK-based, with the Brainstore database optimized for AI application logs.
- Loop AI agent requires tight coupling to evaluation loop
- Dataset version control and diffing built into proprietary format
- Fast evaluation execution via custom parallel processing
Migration Consideration: CI/CD GitHub Actions tightly integrated; switching requires rebuilding evaluation pipelines
Decision Framework
Choose OpenTelemetry Native When:
1. Multi-Vendor Strategy Required
Enterprise architecture committee mandates the ability to switch observability vendors with fewer than 90 days' notice. Regulatory or compliance requirements prevent vendor lock-in.
2. Polyglot Architecture
Application stack spans Python, Java, Go, Rust, Node.js. No single vendor provides native SDKs for all languages. Need consistent observability across diverse technology stack.
3. Existing OpenTelemetry Investment
Already instrumented infrastructure and backend services with OpenTelemetry. Want to extend to LLM/agent applications with same standard.
4. Cost Optimization Priority
Need collector-level control over sampling, filtering, aggregation. Want to optimize ingestion costs independently of vendor pricing.
5. Future-Proofing Critical
Building multi-year strategic platform. Need confidence that instrumentation will work with future vendors/tools. Want automatic support for emerging semantic conventions.
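The collector-level control in point 4 above (Cost Optimization Priority) amounts to filtering spans before they reach any paid backend. A toy sketch of that idea, with span records modeled as plain dicts standing in for an OpenTelemetry Collector filter processor:

```python
# Drop low-value spans before ingestion to cut per-span/per-GB costs.
def keep_span(span):
    if span.get("name", "").startswith("GET /healthz"):
        return False  # health checks add cost but little insight
    if span.get("gen_ai.usage.total_tokens", 1) == 0:
        return False  # LLM spans that consumed no tokens
    return True

spans = [
    {"name": "chat gpt-4o", "gen_ai.usage.total_tokens": 540},
    {"name": "GET /healthz"},
    {"name": "chat gpt-4o", "gen_ai.usage.total_tokens": 0},
]
billable = [s for s in spans if keep_span(s)]
print(len(billable))  # 1
```

Because this logic lives in your own pipeline rather than in a vendor's ingest tier, it keeps working unchanged when the backend behind the collector is swapped.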
Choose OpenTelemetry Supported When:
1. Existing Vendor Relationship
Already using vendor for traditional observability. Want to add LLM observability without introducing new vendor. Unified billing and support valuable.
2. Framework-Specific Optimization
Heavily invested in specific framework (LangChain → LangSmith). Native integration provides richer features than generic OTLP.
3. Gradual Migration Path
Starting with proprietary SDK for speed, planning OTLP migration later. Need bridge between legacy instrumentation and modern standards.
Accept Proprietary SDK When:
1. Unique Differentiation Required
Vendor provides capabilities impossible with standard OTLP (e.g., Galileo Luna-2 cost savings). Feature set justifies lock-in risk.
2. Single-Language, Single-Framework Shop
Monolithic Python application using single framework. No plans for polyglot expansion. Vendor SDK coverage matches technology stack completely.
3. Proof of Concept / MVP
Short-term project where speed to value exceeds long-term concerns. Explicit plan to migrate before production scale.
Total Cost of Ownership Analysis
The $760K difference ($1.1M vs $340K) is driven almost entirely by migration cost. OTEL-native platforms require only collector endpoint configuration changes, while proprietary SDKs require rewriting all instrumentation code.
TCO Methodology & Assumptions
Scope: This analysis models a mid-to-large enterprise with significant instrumentation footprint (1,000+ instrumentation points) over a 5-year planning horizon.
Cost Basis
- Engineering labor: $75-100/hour fully-loaded cost
- Implementation team: 2-3 senior engineers for initial instrumentation
- Operations allocation: 0.25-0.5 FTE for ongoing monitoring, upgrades, configuration
OTEL-Native Calculations
- Implementation ($50K-$100K): ~10 weeks × 2.5 engineers × 40 hrs/week × $85/hr = ~$85K
- Annual Operations ($20K-$40K): 0.25 FTE × $120K/yr fully-loaded = $30K mid-point
- Migration ($10K-$20K): Configuration change + validation testing; no code rewrites
- 5-Year Total: $340K (assumes 2 platform switches)
Proprietary SDK Calculations
- Implementation ($30K-$50K): Faster initial setup with vendor “quick start” SDKs
- Annual Operations ($30K-$60K): Higher due to vendor-specific SDK version management
- Migration ($500K-$1M): Rewriting instrumentation across thousands of call sites over 18-24 months
- 5-Year Total: $1.1M (assumes 1 migration event)
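The two 5-year totals can be reproduced from the ranges above under explicit assumptions: the OTEL-native figures use the upper ends of the implementation and operations ranges, and the proprietary migration uses the $750K mid-point of its $500K-$1M range. A quick sketch:

```python
def five_year_tco(implementation, annual_ops, migration_cost, migrations):
    # One-time implementation, five years of operations, plus migration events.
    return implementation + 5 * annual_ops + migrations * migration_cost

otel_native = five_year_tco(100_000, 40_000, 20_000, migrations=2)   # $340K
proprietary = five_year_tco(50_000, 60_000, 750_000, migrations=1)   # $1.1M
print(otel_native, proprietary, proprietary - otel_native)
# 340000 1100000 760000
```

Note that the $760K gap persists even though the proprietary SDK is cheaper to implement initially; migration cost dominates everything else.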
What’s NOT Included
- Platform subscription/licensing fees (varies widely by vendor)
- Infrastructure costs (collectors, storage, compute)
- Productivity gains/losses from platform-specific features
- Training costs beyond initial implementation
Strategic Recommendations
Primary Recommendation: Default to OpenTelemetry Native
Unless you have compelling reasons to choose otherwise, OpenTelemetry Native platforms should be your default. The benefits of vendor-agnostic instrumentation, reduced switching costs, and future-proofing outweigh the slightly higher initial complexity.
Strategic Flexibility
- Preserve M&A optionality (switches in weeks not years)
- Maintain strong vendor negotiation position
- Enable multi-vendor strategy for different use cases
Operational Excellence
- Unified observability across all services
- Consistent instrumentation approach
- Reduced training and onboarding complexity
Innovation Velocity
- Support new languages immediately
- Experiment with emerging frameworks
- Adopt new semantic conventions automatically
Cost Management
- Collector-level cost optimization
- Avoid vendor-specific pricing tiers
- Minimize switching costs
Acceptable Exceptions for Proprietary SDKs
- Demonstrable ROI: Galileo’s 97% cost reduction vs GPT-4-as-judge provides clear financial justification
- Narrow Use Case: Evaluation-only scenarios where production monitoring not required
- Framework Optimization: LangSmith for LangChain if native integration provides critical features unavailable via OTLP
- Temporary Solution: Proof of concept with explicit plan to migrate to OTLP before production scale
The $760K difference represents the “vendor lock-in premium” that compounds over multi-year enterprise relationships: a $10K-$20K collector endpoint reconfiguration versus a $500K-$1M rewrite of all instrumentation code.
