    Big Data

    Technical Approaches and Practical Tradeoffs

    By big tee tech hub | July 16, 2025 | 10 Mins Read

    (Chuysang/Shutterstock)

    In the world of monitoring software, how you process telemetry data can significantly impact your ability to derive insights, troubleshoot issues, and manage costs.

    There are two primary use cases for how telemetry data is leveraged:

    • Radar (monitoring of systems) usually falls into the bucket of known knowns and known unknowns. This leads to scenarios where some data is almost ‘pre-determined’ to be processed and plotted in a certain way, because we know what we are looking for.
    • Blackbox (debugging, root cause analysis, etc.), on the other hand, is about unknown unknowns: the things we don’t yet know and may need to hunt for to build an understanding of the system.

    Understanding Telemetry Data Challenges

    Before diving into processing approaches, it’s important to understand the unique challenges of telemetry data:

    • Volume: Modern systems generate enormous amounts of telemetry data
    • Velocity: Data arrives in continuous, high-throughput streams
    • Variety: Multiple formats across metrics, logs, traces, profiles and events
    • Time-sensitivity: Value often decreases with age
    • Correlation needs: Data from different sources must be linked together

    These characteristics create special considerations when choosing between ETL and ELT approaches.

     

    ETL for Telemetry: Transform-First Architecture

    Technical Architecture

    In an ETL approach, telemetry data undergoes transformation before reaching its final destination:

    Fig. 1 — ETL for Telemetry

    A typical implementation stack might include:

    • Collection: OpenTelemetry, Prometheus, Fluent Bit
    • Transport: Kafka, Kinesis, or an in-memory buffering layer
    • Transformation: Stream processing
    • Storage: Time-series databases (Prometheus), specialized indices, or object storage (S3)

    Key Technical Components

    1. Aggregation Techniques

    Pre-aggregation significantly reduces data volume and query complexity. A typical pre-aggregation flow looks like this:

    Fig. 2 — Aggregation Techniques

    This transformation condenses raw data into 5-minute summaries, dramatically reducing storage requirements and improving query performance.

    Example: For a gaming application handling millions of requests per day, raw request latency metrics (potentially billions of data points) can be grouped by service and endpoint, then aggregated into 5-minute (or 1-minute) windows. A single API call that generates 100 latency data points per second (8.64 million per day) is reduced to just 288 aggregated entries per day (one per 5-minute window), while still preserving critical p50/p90/p99 percentiles needed for SLA monitoring.
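    To make that pre-aggregation step concrete, here is a minimal sketch in Python that buckets raw latency samples into 5-minute windows per service and endpoint and keeps only the count and p50/p90/p99 summaries. The record shape and field names are assumptions for illustration, not the article's actual pipeline.

        import math
        from collections import defaultdict

        WINDOW_SECONDS = 300  # 5-minute aggregation window

        def pctl(values, p):
            """Nearest-rank percentile; adequate for an illustration."""
            ordered = sorted(values)
            idx = max(0, math.ceil(p / 100 * len(ordered)) - 1)
            return ordered[idx]

        def pre_aggregate(samples):
            """Collapse raw latency samples into per-window summaries.

            Each sample is assumed to look like:
              {"ts": 1721120405, "service": "api", "endpoint": "/checkout", "latency_ms": 42.0}
            """
            buckets = defaultdict(list)
            for s in samples:
                window_start = s["ts"] - (s["ts"] % WINDOW_SECONDS)
                buckets[(window_start, s["service"], s["endpoint"])].append(s["latency_ms"])

            return [
                {
                    "window_start": start, "service": svc, "endpoint": ep, "count": len(vals),
                    "p50_ms": pctl(vals, 50), "p90_ms": pctl(vals, 90), "p99_ms": pctl(vals, 99),
                }
                for (start, svc, ep), vals in buckets.items()
            ]

    Each window now stores a handful of numbers instead of thousands of raw points, which is where the storage and query savings come from.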

    2. Cardinality Management

    High-cardinality metrics can break time-series databases. The cardinality management process follows this pattern:

    Fig. 3 — Cardinality Management

    Effective strategies include:

    • Label filtering and normalization
    • Strategic aggregation of specific dimensions
    • Hashing techniques for high-cardinality values while preserving query patterns

    Example: A microservice tracking HTTP requests includes user IDs and request paths in its metrics. With 50,000 daily active users and thousands of unique URL paths, this creates millions of unique label combinations. The cardinality management system filters out user IDs entirely (configurable, since their cardinality is too high), normalizes URL paths by replacing dynamic segments with placeholders (e.g., /users/123/profile becomes /users/{id}/profile), and applies consistent categorization to errors. This reduces unique time series from millions to hundreds, allowing the time-series database to function efficiently.
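    A minimal sketch of that normalization step, assuming a pipeline where labels arrive as a plain dictionary; the drop list and URL rules below are illustrative, not a real library's configuration.

        import re

        DROP_LABELS = {"user_id"}  # assumed too high-cardinality to keep

        # Illustrative rules for templating dynamic URL segments
        PATH_RULES = [
            (re.compile(r"/users/\d+"), "/users/{id}"),
            (re.compile(r"/orders/[0-9a-f\-]+"), "/orders/{id}"),
        ]

        def normalize_labels(labels):
            """Reduce label cardinality: drop user IDs, template dynamic path segments."""
            cleaned = {k: v for k, v in labels.items() if k not in DROP_LABELS}
            path = cleaned.get("path")
            if path:
                for pattern, replacement in PATH_RULES:
                    path = pattern.sub(replacement, path)
                cleaned["path"] = path
            return cleaned

        # {"path": "/users/123/profile", "user_id": "123", "status": "500"}
        #   -> {"path": "/users/{id}/profile", "status": "500"}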

    3. Real-time Enrichment

    Adding context to metrics during the transformation phase involves integrating external data sources:

    Fig. 4 — Real-time Enrichment

    This process adds critical business and operational context to raw telemetry data, enabling more meaningful analysis and alerting based on service importance, customer impact, and other factors beyond pure technical metrics.

    Example: A payment processing service emits basic metrics like request counts, latencies, and error rates. The enrichment pipeline joins this telemetry with service registry data to add metadata about the service tier (critical), SLO targets (99.99% availability), and team ownership (payments-team). It then incorporates business context to tag transactions with their type (subscription renewal, one-time purchase, refund) and estimated revenue impact. When an incident occurs, alerts are automatically prioritized based on business impact rather than just technical severity, and routed to the appropriate team with rich context.
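    A rough sketch of that enrichment join, assuming the service registry has already been loaded into memory; the registry fields (tier, SLO target, owning team) and the paging rule are illustrative assumptions, not the article's actual implementation.

        # Hypothetical service registry snapshot; in practice this would be fetched
        # from a catalog/CMDB and refreshed periodically.
        SERVICE_REGISTRY = {
            "payments-api": {"tier": "critical", "slo_availability": 0.9999, "team": "payments-team"},
        }

        def enrich(event, registry=SERVICE_REGISTRY):
            """Attach business and operational context to a raw telemetry event."""
            meta = registry.get(event.get("service"), {})
            enriched = dict(event)
            enriched["service_tier"] = meta.get("tier", "unknown")
            enriched["slo_availability"] = meta.get("slo_availability")
            enriched["owning_team"] = meta.get("team", "unassigned")
            # Alerting can now key off business impact instead of raw technical severity.
            enriched["page_on_call"] = meta.get("tier") == "critical" and event.get("error_rate", 0) > 0.01
            return enriched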

    Technical Advantages

    • Query performance: Pre-calculated aggregates eliminate computation at query time
    • Predictable resource usage: Both storage and query compute are controlled
    • Schema enforcement: Data conformity is guaranteed before storage
    • Optimized storage formats: Data can be stored in formats optimized for specific access patterns

    Technical Limitations

    • Loss of granularity: Some detail is permanently lost
    • Schema rigidity: Adapting to new requirements requires pipeline changes
    • Processing overhead: Real-time transformation adds complexity and resource demands
    • Transformation-time decisions: Analysis paths must be known in advance

    ELT for Telemetry: Raw Storage with Flexible Transformation

    Technical Architecture

    ELT architecture prioritizes getting raw data into storage, with transformations performed at query time:

    Fig. 5 — ELT for Telemetry

    A typical implementation might include:

    • Collection: OpenTelemetry, Prometheus, Fluent Bit
    • Transport: Direct ingestion without complex processing
    • Storage: Object storage (S3, GCS) or data lakes in Parquet format
    • Transformation: SQL engines (Presto, Athena), Spark jobs, or specialized OLAP systems

    Key Technical Components

    1. Efficient Raw Storage

    Optimizing for long-term storage of raw telemetry requires careful consideration of file formats and storage organization:

    Fig. 6 — Efficient Raw Storage

    This approach leverages columnar storage formats like Parquet with appropriate compression (ZSTD for traces, Snappy for metrics), dictionary encoding, and optimized column indexing based on common query patterns (trace_id, service, time ranges).

    Example: A cloud-native application generates 10TB of trace data daily across its distributed services. Instead of discarding or heavily sampling this data, the complete trace information is captured using OpenTelemetry collectors and converted to Parquet format with ZSTD compression. Key fields like trace_id, service name, and timestamp are indexed for efficient querying. This approach reduces the storage footprint by 85% compared to raw JSON while maintaining query performance. When a critical customer-impacting issue occurred, engineers were able to access complete trace data from 3 months prior, identifying a subtle pattern of intermittent failures that would have been lost with traditional sampling.
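    As a hedged sketch of the storage side, assuming pyarrow is available, the snippet below writes a small batch of invented span records to Parquet with ZSTD compression and dictionary encoding; the schema is purely illustrative.

        import pyarrow as pa
        import pyarrow.parquet as pq

        # A tiny batch of illustrative span records; a real pipeline would stream
        # these from an OpenTelemetry collector export.
        spans = {
            "trace_id": ["a1b2", "a1b2", "c3d4"],
            "service": ["checkout", "payments", "checkout"],
            "ts": [1721120405, 1721120406, 1721120410],
            "duration_ms": [12.5, 48.0, 7.2],
        }

        table = pa.table(spans)

        # Columnar layout plus ZSTD keeps raw data cheap to retain long-term.
        pq.write_table(
            table,
            "traces-2025-07-16.parquet",
            compression="zstd",
            use_dictionary=True,  # dictionary-encode repetitive columns like service
        )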

    2. Partitioning Strategies

    Effective partitioning is crucial for query performance against raw telemetry. A well-designed partitioning strategy follows this hierarchy:

    Fig. 7 — Partitioning Strategies

    This partitioning approach enables efficient time-range queries while also allowing filtering by service and tenant, which are common query dimensions. The partitioning strategy is designed to:

    • Optimize for time-based retrieval (most common query pattern)
    • Enable efficient tenant isolation for multi-tenant systems
    • Allow service-specific queries without scanning all data
    • Separate telemetry types for optimized storage formats per type

    Example: A SaaS platform with 200+ enterprise customers uses this partitioning strategy for its observability data lake. When a high-priority customer reports an issue that occurred last Tuesday between 2-4pm, engineers can immediately query just those specific partitions: /year=2023/month=11/day=07/hour=1[4-5]/tenant=enterprise-x/*. This approach reduces the scan size from potentially petabytes to just a few gigabytes, enabling responses in seconds rather than hours. When comparing current performance against historical baselines, the time-based partitioning allows efficient month-over-month comparisons by scanning only the relevant time partitions.
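    A sketch of how such a layout can be produced with pyarrow's Hive-style dataset partitioning; the bucket path, column names, and values are assumptions for illustration.

        import pyarrow as pa
        import pyarrow.parquet as pq

        table = pa.table({
            "year": [2023], "month": [11], "day": [7], "hour": [14],
            "tenant": ["enterprise-x"],
            "trace_id": ["a1b2c3"], "service": ["checkout"], "duration_ms": [12.5],
        })

        # Produces .../year=2023/month=11/day=7/hour=14/tenant=enterprise-x/<file>.parquet,
        # so time- and tenant-scoped queries only touch the matching directories.
        pq.write_to_dataset(
            table,
            root_path="observability-lake/traces",
            partition_cols=["year", "month", "day", "hour", "tenant"],
        )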

    3. Query-time Transformations

    SQL and analytical engines provide powerful query-time transformations. The query processing flow for on-the-fly analysis looks like this (See Fig. 8).

    This query flow demonstrates how complex analysis like calculating service latency percentiles, error rates, and usage patterns can be performed entirely at query time without needing pre-computation. The analytical engine applies optimizations like predicate pushdown, parallel execution, and columnar processing to achieve reasonable performance even against large raw datasets.

    Fig. 8 — Query-time Transformations

    Example: A DevOps team investigating a performance regression discovered it only affected premium customers using a specific feature. Using query-time transformations against the ELT data lake, they wrote a single query that first filtered to the affected time period, joined customer tier information, extracted relevant attributes about feature usage, calculated percentile response times grouped by customer segment, and identified that premium customers with high transaction volumes were experiencing degraded performance only when a specific optional feature flag was enabled. This analysis would have been impossible with pre-aggregated data since the customer segment + feature flag dimension hadn’t been previously identified as important for monitoring.
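    A sketch of what such a query-time transformation can look like, here using DuckDB over the Parquet lake as an assumed engine; the column names, partition filters, and lake path are illustrative, not taken from the article.

        import duckdb

        # Percentiles and error rates computed directly over raw Parquet at query time;
        # partition pruning and predicate pushdown keep the scan small.
        result = duckdb.query("""
            SELECT
                service,
                approx_quantile(duration_ms, 0.99) AS p99_ms,
                avg(CASE WHEN status_code >= 500 THEN 1 ELSE 0 END) AS error_rate,
                count(*) AS requests
            FROM read_parquet('observability-lake/traces/**/*.parquet', hive_partitioning = true)
            WHERE year = 2023 AND month = 11 AND day = 7
            GROUP BY service
            ORDER BY p99_ms DESC
        """).to_df()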

    Technical Advantages

    • Schema flexibility: New dimensions can be analyzed without pipeline changes
    • Cost-effective storage: Object storage is significantly cheaper than specialized DBs
    • Retroactive analysis: Historical data can be examined with new perspectives

    Technical Limitations

    • Query performance challenges: Interactive analysis may be slow on large datasets
    • Resource-intensive analysis: Compute costs can be high for complex queries
    • Implementation complexity: Requires more sophisticated query tooling
    • Storage overhead: Raw data consumes significantly more space

    Technical Implementation: The Hybrid Approach

    Core Architecture Components

    Implementation Strategy

    1. Dual-path processing

      Fig. 10 — Dual-path processing

    Example: A global ride-sharing platform implemented a dual-path telemetry system that routes service health metrics and customer experience indicators (ride wait times, ETA accuracy) through the ETL path for real-time dashboards and alerting. Meanwhile, all raw data including detailed user journeys, driver activities, and application logs flows through the ELT path to cost-effective storage. When a regional outage occurred, operations teams used the real-time dashboards to quickly identify and mitigate the immediate issue. Later, data scientists used the preserved raw data to perform a comprehensive root cause analysis, correlating multiple factors that wouldn’t have been visible in pre-aggregated data alone.
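    A minimal sketch of the fan-out, with plain lists standing in for the real sinks; the signal names used for routing are assumptions invented for the example.

        CRITICAL_SIGNALS = {"service_health", "customer_experience"}  # assumed routing key

        def dual_path_process(event, realtime_sink, lake_sink):
            """Send one telemetry event down both paths.

            The data lake (ELT path) always receives the raw event; only
            operationally critical signals also take the low-latency ETL path.
            """
            lake_sink.append(event)             # raw copy, retained cheaply
            if event.get("signal") in CRITICAL_SIGNALS:
                realtime_sink.append(event)     # feeds dashboards and alerting

        lake, realtime = [], []
        dual_path_process({"signal": "service_health", "service": "eta", "p99_ms": 820}, realtime, lake)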

    2. Smart data routing

    Fig. 11 — Smart data routing

    Example: A financial services company deployed a smart routing system for their telemetry data. All data is preserved in the data lake, but critical metrics like transaction success rates, fraud detection signals, and authentication service health metrics are immediately routed to the real-time processing pipeline. Additionally, any security-related events such as failed login attempts, permission changes, or unusual access patterns are immediately sent to a dedicated security analysis pipeline. During a recent security incident, this routing enabled the security team to detect and respond to an unusual pattern of authentication attempts within minutes, while the complete context of user journeys and application behavior was preserved in the data lake for subsequent forensic analysis.
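    A sketch of rule-driven routing along those lines: every event defaults to the data lake, and matching rules add extra destinations. The rule predicates and pipeline names are invented for the example.

        ROUTING_RULES = [
            (lambda e: e.get("category") == "security", "security-pipeline"),
            (lambda e: e.get("metric") in {"txn_success_rate", "fraud_score", "auth_health"},
             "realtime-pipeline"),
        ]

        def route(event):
            """Return every pipeline this event should be sent to."""
            destinations = ["data-lake"]  # everything is preserved in the lake
            for predicate, destination in ROUTING_RULES:
                if predicate(event):
                    destinations.append(destination)
            return destinations

        # route({"category": "security", "metric": "failed_login"})
        #   -> ["data-lake", "security-pipeline"]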

    3. Unified query interface

    Real-world Implementation Example

    A specific engineering implementation at last9.io demonstrates how this hybrid approach works in practice:

    For a large-scale Kubernetes platform with hundreds of clusters and thousands of services, we implemented a hybrid telemetry pipeline with:

    • Critical-path metrics processed through a pipeline that:
      Fig. 12 — Unified query interface

      • Performs dimensional reduction (limiting label combinations)
      • Pre-calculates service-level aggregations
      • Computes derived metrics like success rates and latency percentiles
    • Raw telemetry stored in a cost-effective data lake:
      • Partitioned by time, data type, and tenant
      • Optimized for typical query patterns
      • Compressed with appropriate codecs (Zstd for traces, Snappy for metrics)
    • Unified query layer that:
      • Routes dashboard and alerting queries to pre-aggregated storage
      • Redirects exploratory and ad-hoc analysis to the data lake
      • Manages correlation queries across both systems

    This approach delivered both the query performance needed for real-time operations and the analytical depth required for complex troubleshooting.
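    As a rough sketch of how such a unified query layer might decide where a request goes, the function below routes by query source and time range; the attributes checked here are assumptions for illustration, not Last9's actual implementation.

        def route_query(query):
            """Pick the backend for a query in the hybrid setup."""
            # Dashboards and alerts need sub-second answers over recent data:
            # serve them from the pre-aggregated (ETL) store.
            if query.get("source") in {"dashboard", "alert"} and query.get("range_hours", 0) <= 24:
                return "preaggregated-tsdb"
            # Exploratory, long-range, or correlation-heavy queries go to the data lake,
            # possibly joining against the pre-aggregated store for context.
            return "data-lake"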

    Decision Framework

    When architecting telemetry pipelines, these technical considerations should guide your approach:

    Decision Factor               Use ETL         Use ELT
    Query latency requirements    < 1 second      Can wait minutes
    Data retention needs          Days/Weeks      Months/Years
    Cardinality                   Low/Medium      Very high
    Analysis patterns             Well-defined    Exploratory
    Budget priority               Compute         Storage

    Conclusion

    The technical realities of telemetry data processing demand thinking beyond simple ETL vs. ELT paradigms. Engineering teams should architect tiered systems that leverage the strengths of both approaches:

    • ETL-processed data for operational use cases requiring immediate insights
    • ELT-processed data for deeper analysis, troubleshooting, and historical patterns
    • Metadata-driven routing to intelligently direct queries to the appropriate tier

    This engineering-centric approach balances performance requirements with cost considerations while maintaining the flexibility required in modern observability systems.

    About the author: Nishant Modak is the founder and CEO of Last9, a high-cardinality observability platform company backed by Sequoia India (now PeakXV). He has been an entrepreneur working with large-scale companies for nearly two decades.

    Related Items:

    From ETL to ELT: The Next Generation of Data Integration Success

    Can We Stop Doing ETL Yet?

    50 Years Of ETL: Can SQL For ETL Be Replaced?


