Streamlined monitoring and debugging for Amazon EMR on EC2

As organizations scale their data processing and analytics workloads on Amazon EMR on EC2, observability across cluster health, job execution, and resource usage becomes increasingly important. Teams often manage log collection across distributed nodes, correlate Amazon EMR steps with underlying YARN applications, and configure monitoring agents to capture the right level of detail for their environment.

With Amazon EMR release 7.11.0 and updates to the Amazon EMR console, Amazon EMR on EC2 introduces observability capabilities that streamline these workflows further. In this post, we walk you through five key enhancements: Amazon CloudWatch Logs integration, step-level Amazon Simple Storage Service (Amazon S3) logging controls, expanded console UIs for YARN and Tez, Amazon EMR step to YARN application ID mapping, and enhanced custom metrics with updated documentation.

What’s new

The following sections cover key improvements across the Amazon EMR console, logging, metrics collection, and documentation to give you deeper, end-to-end visibility into your Amazon EMR clusters and workloads.

1. CloudWatch Logs integration

Starting with Amazon EMR release 7.11.0, you can stream cluster logs to Amazon CloudWatch Logs in near real time without custom bootstrap actions or manual agent configuration. With CloudWatch logging enabled, Amazon EMR automatically captures and streams Amazon EMR step execution logs, Spark driver logs, and Spark executor logs as they’re generated. This makes them immediately available for monitoring, troubleshooting, and post-mortem analysis through the CloudWatch console or API.

You can enable CloudWatch logging through the Amazon EMR console during cluster creation, or programmatically through the AWS Command Line Interface (AWS CLI) or an AWS SDK, by including the Amazon CloudWatch Agent in your application configuration and specifying your logging preferences in the configuration section.

With minimal configuration, Amazon EMR captures step logs and Spark driver logs by default, streaming them to a log group named /aws/emr/{cluster_id}. For production workloads requiring stricter organizational and security controls, you can customize the log group name, define a log stream prefix for streamlined filtering, enable encryption with an AWS Key Management Service (AWS KMS) key, and explicitly select which log types to capture. The following example demonstrates a fully customized configuration:

aws emr create-cluster \
  --name "EMR cluster with custom CloudWatch Logs" \
  --release-label emr-7.11.0 \
  --applications Name=Spark Name=AmazonCloudWatchAgent \
  --instance-type m7g.2xlarge \
  --instance-count 3 \
  --use-default-roles \
  --monitoring-configuration '{
    "CloudWatchLogConfiguration": {
      "Enabled": true,
      "LogGroupName": "/my-company/emr/production",
      "LogStreamNamePrefix": "cluster-prod",
      "EncryptionKeyArn": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012",
      "LogTypes": {
        "STEP_LOGS": ["STDOUT", "STDERR"],
        "SPARK_DRIVER": ["STDOUT", "STDERR"],
        "SPARK_EXECUTOR": ["STDERR", "STDOUT"]
      }
    }
  }'

This configuration directs the logs to a custom log group (/my-company/emr/production), prefixes log stream names with cluster-prod for consistent identification across clusters, encrypts log data at rest using the specified KMS key, and captures the full set of available log types: step stdout/stderr, Spark driver, and Spark executor output. Because logs are streamed to CloudWatch as they’re written, you have near real-time visibility into job execution without waiting for log aggregation to S3 or establishing direct connectivity to cluster nodes. Combined with CloudWatch Logs Insights, you can run structured querying across log streams, making it straightforward to trace failures, correlate errors across driver and executor logs, and build metric filters or alarms based on specific log patterns.
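As a rough illustration of the Logs Insights workflow described above, the following Python sketch builds a query that lists recent error lines and polls for the results. The helper names (default_log_group, build_error_query, run_query) are hypothetical and not part of Amazon EMR. The sketch assumes you pass in a boto3 CloudWatch Logs client and relies only on its start_query and get_query_results calls.

```python
import time


def default_log_group(cluster_id: str) -> str:
    """Log group used when no custom name is set, following /aws/emr/{cluster_id}."""
    return f"/aws/emr/{cluster_id}"


def build_error_query(pattern: str = "ERROR", limit: int = 50) -> str:
    """Return a Logs Insights query listing the newest messages matching `pattern`."""
    return (
        "fields @timestamp, @logStream, @message\n"
        f"| filter @message like /{pattern}/\n"
        "| sort @timestamp desc\n"
        f"| limit {limit}"
    )


def run_query(logs_client, log_group: str, query: str,
              minutes: int = 60, poll_seconds: float = 1.0) -> list:
    """Run `query` over the last `minutes` and return the result rows.

    `logs_client` is assumed to be a boto3 "logs" client (or any object
    exposing start_query and get_query_results with the same shape).
    """
    end = int(time.time())
    start = end - minutes * 60
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )["queryId"]
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        # Any status other than Scheduled or Running is treated as terminal.
        if response["status"] not in ("Scheduled", "Running"):
            return response["results"]
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call might look like run_query(boto3.client("logs"), "/my-company/emr/production", build_error_query("Exception")). Because @logStream is included in the selected fields, driver and executor entries can be told apart by the stream they came from.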

2. Step-level S3 logging improvements

S3 logging capabilities now provide granular control over how step logs are organized and secured. You can now specify a dedicated S3 log destination and AWS KMS encryption key at the individual Amazon EMR step level. This allows different steps within the same cluster to write logs to separate S3 paths with independent encryption configurations. This is particularly useful for multi-tenant clusters or workflows with varying data classification requirements.

Step-level logging is configured through the StepMonitoringConfiguration parameter, which accepts an S3MonitoringConfiguration object where you can define the target S3 path and an AWS KMS key for encryption at rest:

"StepMonitoringConfiguration": {
  "S3MonitoringConfiguration": {
    "LogUri": "s3://your-s3-bucket/",
    "EncryptionKeyArn": "arn:aws:kms:your-kms-key-arn"
  }
}

This configuration is optional. When omitted, the step inherits the default S3 log path and encryption settings defined at the cluster level during creation. With this configuration, you can override logging behavior only for the steps that require it, while maintaining a consistent default for the rest of your workflow.
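The override-or-inherit behavior can be sketched in Python as a small helper that attaches the monitoring block only when a step needs its own destination. The function name build_step is hypothetical. Placing StepMonitoringConfiguration as a top-level key of the step definition is an assumption based on the description above, and the surrounding step fields (Name, ActionOnFailure, HadoopJarStep with command-runner.jar) follow the usual Amazon EMR step shape.

```python
from typing import Optional


def build_step(name: str, args: list,
               log_uri: Optional[str] = None,
               kms_key_arn: Optional[str] = None) -> dict:
    """Build a step definition, adding step-level S3 logging only on request.

    When log_uri is omitted, no StepMonitoringConfiguration is emitted, so the
    step falls back to the cluster-level log path and encryption settings.
    """
    step = {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(args)},
    }
    if log_uri is None:
        # Design choice of this sketch: a key without a destination is rejected.
        if kms_key_arn is not None:
            raise ValueError("kms_key_arn requires log_uri")
        return step

    s3_config = {"LogUri": log_uri}
    if kms_key_arn is not None:
        s3_config["EncryptionKeyArn"] = kms_key_arn
    # Assumption: the monitoring block sits at the top level of the step.
    step["StepMonitoringConfiguration"] = {"S3MonitoringConfiguration": s3_config}
    return step
```

A multi-tenant workflow could then call build_step("tenant-a-etl", ["spark-submit", "job.py"], log_uri="s3://your-s3-bucket/", kms_key_arn="arn:aws:kms:your-kms-key-arn") for sensitive steps and omit both arguments everywhere else.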

3. Enhanced console with direct access to monitoring UIs

Additional live application UIs are accessible directly from the Amazon EMR Console. These console-hosted interfaces remove the need to configure SSH (Secure Shell) tunnels, set up proxies, or establish any direct network connectivity to cluster nodes to reach application web UIs. The newly added interfaces include:

YARN ResourceManager UI – Monitor cluster-wide resource allocation, queue usage, and application lifecycle states across running and completed YARN applications. This interface also provides direct access to container-level logs for running YARN applications, enabling real-time debugging without requiring node-level access.
Tez UI – Inspect Hive query execution plans, DAG visualizations, vertex-level performance metrics, and task-level counters for queries executed through the Tez execution engine (for example, Hive and Pig workloads).

These join the existing Spark History Server and YARN timeline interfaces already available through the console. By surfacing these UIs, administrators can grant developers and analysts visibility into cluster workloads and application diagnostics without exposing direct network access to cluster infrastructure. This keeps security boundaries tight while preserving full observability into job execution and resource consumption.

With these additions, Amazon EMR now offers three complementary approaches to accessing application web interfaces, each suited to different operational requirements. Live Application UIs provide console-hosted access to web interfaces on running clusters. They’re recommended for environments where direct network connectivity to cluster nodes must be restricted from end users. On-Cluster Web UIs offer full, unrestricted access to the complete set of native application web interfaces running on cluster nodes, suited for administrators and engineers who require deep, low-level visibility. Persistent Web UIs retain application-level data beyond cluster lifetime, so you can analyze and troubleshoot workloads on terminated clusters. Together, these options give you the flexibility to balance security boundaries, access scope, and data retention based on your team’s specific monitoring and debugging workflows.

4. EMR step to YARN application ID mapping

The Amazon EMR console now surfaces the YARN Application ID directly within the EMR step details panel. For each step executing a Spark, Hive, or other YARN-based workload, the console displays the submitted YARN Application ID associated with that step, establishing a direct link between the EMR step abstraction and the underlying YARN application. With this mapping, you can:

Directly correlate EMR steps to YARN applications – when a step fails or exhibits unexpected behavior, you can immediately identify the exact YARN application to investigate rather than manually cross-referencing timestamps or job names across interfaces.
Access live monitoring tools – with the YARN application ID readily available, you can navigate directly to the YARN ResourceManager Live UI or the Spark History Server to inspect resource consumption, task-level execution details, and application state for both running and completed jobs.
Retrieve logs for detailed troubleshooting – the application ID serves as the key lookup for retrieving container-level logs persisted to Amazon S3, significantly reducing the time to root-cause failures or diagnose performance regressions.

To use this feature, open the Steps tab on your Amazon EMR cluster detail page and select the step that you want to investigate. The YARN Application ID appears in the step details panel. From there, you can use the ID to navigate to the YARN ResourceManager Live UI at http://resourcemanager-host:8088/cluster/app/>, open the corresponding view in the Spark History Server, or locate the associated container logs in your configured S3 log destination.
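Once you have copied the YARN Application ID from the step details panel, the lookups described above can be scripted. The Python sketch below validates the ID and derives a ResourceManager URL and an S3 prefix for container logs. The helper names are hypothetical. The URL path follows the standard YARN ResourceManager web UI on port 8088, and the S3 layout (cluster ID, then containers, then the application ID under the log URI) is an assumption about how Amazon EMR commonly arranges logs that you should verify against your own bucket.

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
_APP_ID = re.compile(r"^application_\d+_\d+$")


def validate_app_id(app_id: str) -> str:
    """Return app_id unchanged, or raise ValueError if it is malformed."""
    if not _APP_ID.match(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return app_id


def resource_manager_url(host: str, app_id: str, port: int = 8088) -> str:
    """URL of the application page in the YARN ResourceManager web UI."""
    return f"http://{host}:{port}/cluster/app/{validate_app_id(app_id)}"


def container_log_prefix(log_uri: str, cluster_id: str, app_id: str) -> str:
    """S3 prefix expected to hold the application's container logs.

    Assumed layout: <log_uri>/<cluster_id>/containers/<app_id>/
    """
    base = log_uri.rstrip("/")
    return f"{base}/{cluster_id}/containers/{validate_app_id(app_id)}/"
```

The resulting prefix can be passed to a tool such as aws s3 ls to list the persisted container logs for the failing step.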

5. Enhanced custom metrics and observability documentation

By default, Amazon EMR automatically sends cluster-level metrics to Amazon CloudWatch at five-minute intervals, covering YARN application states, node health, HDFS utilization, and I/O activity. With Amazon EMR Release 7.0 and later, enabling the Amazon CloudWatch Agent extends this baseline with additional detailed metrics collected at one-minute intervals across cluster nodes. Furthermore, Amazon EMR 7.1 introduced custom metric classifications that you can use to define precisely which component-level metrics to collect from Hadoop, YARN, and HBase subsystems, like DataNode I/O activity, NodeManager JVM heap utilization, container resource consumption, and HBase performance counters. Each classification supports configurable export intervals, giving you control over collection granularity based on your monitoring requirements.

Once enabled, custom metrics are accessible directly from the Monitoring tab in the Amazon EMR console, where you can use a classification filter to switch between the HDFS, YARN, and HBase custom metric groupings that you’ve defined. Metric configurations can also be updated on running clusters through the console’s reconfiguration workflow, so you can adapt your monitoring strategy as workload requirements evolve without cluster downtime. For environments using Prometheus, metrics can also be forwarded to Amazon Managed Service for Prometheus and visualized through Grafana dashboards.
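For comparison with the one-minute agent metrics, the baseline five-minute cluster metrics can be read back through the CloudWatch API. The Python sketch below only assembles the request arguments, and the helper names are hypothetical. It assumes the default Amazon EMR namespace AWS/ElasticMapReduce, the JobFlowId dimension, and the AppsRunning metric. Custom classification metrics may be published under different names, so they are not covered here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cluster_metric_request(cluster_id: str, metric_name: str = "AppsRunning",
                           hours: int = 1,
                           now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # matches the five-minute default publishing interval
        "Statistics": ["Average"],
    }


def expected_datapoints(hours: int, period_seconds: int = 300) -> int:
    """Number of datapoints a window of `hours` yields at the given period."""
    return hours * 3600 // period_seconds
```

With boto3 available, the request can be sent as boto3.client("cloudwatch").get_metric_statistics(**cluster_metric_request("j-XXXXXXXX")). One hour yields 12 datapoints at the five-minute interval and 60 at a one-minute interval, which is worth keeping in mind when estimating metric volume.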

The following documentation and tutorials are available to help you get the most out of these capabilities:

Getting started

These observability improvements are available now for Amazon EMR on EC2. To get started:

For CloudWatch Logs integration and step-level log configuration: Launch a new cluster with Amazon EMR release 7.11.0 or later.
For console enhancements: Navigate to your existing Amazon EMR clusters in the AWS Console to access Live Application UI links and YARN Application ID mappings in step details, with no additional configuration required.
For custom metrics: Review our Enhanced Custom Metrics documentation to configure the CloudWatch Agent for publishing Hadoop, YARN, and HBase component metrics using custom classification files.

Conclusion

With these enhancements, Amazon EMR on EC2 provides deeper visibility into cluster health, job execution, and resource usage, helping you reduce time to root cause and focus on delivering value from your data. Note that enabling CloudWatch Logs integration and custom metrics incurs additional CloudWatch charges based on log ingestion volume and metric publishing frequency.

If you have feedback or questions, reach out to your AWS account team or post on AWS re:Post.
