Enhancing Security with Cloud Flow Logs

Organizations, including the U.S. military, are increasingly adopting cloud deployments for their flexibility and cost savings. One aspect of such deployments is the shared security model promulgated by the NSA, which describes many of the security services that cloud service providers (CSPs) support and provides for cooperation on security issues. This model also leaves security responsibilities with the organizations contracting for service. These responsibilities include ensuring that the hosted application is accomplishing its intended purpose for the authorized set of users.

Cloud flow logs, as identified by network defenders, are a valuable source of data to support this security responsibility. If expected events (indicated by transfer of data to and from the cloud) happen, these logs help identify which external endpoints receive service, the extent of the service, and whether there are users who overuse cloud resources.

The SEI has a long history of support for flow log analysis, including its early 2025 releases (for Azure and AWS) of open-source scripts to facilitate cloud flow log analysis. This blog post summarizes these efforts and explores challenges associated with correlating events across multiple CSPs.

Collecting Cloud Flow Logs

A cloud flow log is a collection of records that contain summaries of network traffic to and from endpoints in the cloud. Hosts in the cloud are specifically configured to produce and consume packets of data across the internet. This is unlike on-premises flow generation, which is done for all hosts on a given network based on sensors. Hosts (virtual private clouds or network security groups) or subnets (VNets) in the cloud may generate these flow records. While not necessarily intended for long-term retention for assessing security, these logs cover a history of cloud activity without respect to malware or alert signatures or any specific network events. This history provides context for detected events and profiles of expected, anomalous, or malicious activity. This context supports more reliable interpretation of alerts and network reports, which ultimately makes organizations more secure.

Ongoing collection also allows for identification of three sorts of traffic observations:

Events—isolated behaviors with security implications, including benign (assuring that something is happening that should happen) and malicious (identifying that something is happening that compromises security)
Patterns—collections of events that may constitute evidence of a defensive measure or an aggressive action. In general, patterns are collections of more than one event and provide context for evaluating actions.
Trends—sequences of events that cumulatively identify shifts in network behavior (again, cumulatively benign or cumulatively malicious)

Approaches to Analyzing Cloud Flow Logs

Cloud service providers offer a variety of collection options and record contents. For examples, see Table 1, which is discussed below. The collection options include the interval over which the records aggregate network traffic (e.g., 1-minute or 5-minute intervals) and the sampling employed in the aggregation (e.g., all packets or one packet in every ten). These differences can complicate comparison or integration across CSPs. Assumptions made by CSPs, such as assumed traffic direction, may also complicate analysis of the network traffic. If the analysis process does not address these differences, fusion of data from different clouds becomes difficult and the results become more uncertain. While analysis of cloud flow logs shares all the challenges of analyzing other network logs, the handling of these differences presents additional challenges.
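To make the effect of differing intervals and sampling concrete, the following minimal Python sketch scales reported byte counts to an estimated per-second rate. The field names are hypothetical, and the sketch assumes the log reports counts for sampled packets only; a provider that already scales its counts would not need the multiplication.

```python
from dataclasses import dataclass


@dataclass
class FlowSummary:
    """One aggregated flow record, reduced to the fields needed here (hypothetical names)."""
    bytes_reported: int    # byte count as written in the log
    interval_seconds: int  # aggregation interval used by the collector
    sample_every_n: int    # 1 = every packet observed, 10 = one packet in every ten


def estimated_bytes_per_second(record: FlowSummary) -> float:
    """Scale a reported byte count to an estimated per-second rate."""
    if record.interval_seconds <= 0 or record.sample_every_n < 1:
        raise ValueError("interval must be positive and sample_every_n at least 1")
    # Assumption: the log counts only sampled packets, so scale up by the sampling ratio.
    estimated_total = record.bytes_reported * record.sample_every_n
    return estimated_total / record.interval_seconds


# Two hypothetical records describing the same underlying traffic rate.
unsampled = FlowSummary(bytes_reported=60_000, interval_seconds=60, sample_every_n=1)
sampled = FlowSummary(bytes_reported=30_000, interval_seconds=300, sample_every_n=10)

print(estimated_bytes_per_second(unsampled))
print(estimated_bytes_per_second(sampled))
```

The raw byte counts differ by a factor of two, yet both records estimate the same rate once interval and sampling are taken into account. The scaled value is only an estimate, which is one source of the added uncertainty described above.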

Figure 1: Example set of timelines for an infrastructure implemented across two CSPs (C1 and C2) and an on-premises host (O).

As an example, consider Figure 1 above, which shows timelines for events across an infrastructure that is implemented across two CSPs and an on-premises hosting provider. An analyst wishes to evaluate the interactions, all of which are contacts from the same external host as shown in Figure 1 by the small horizontal lines. Looking at each event or timeline separately, the contact appears non-threatening. By evaluating the interactions in aggregate, the analyst obtains a broader view of the activity.

There are several possible ways of addressing differences between CSPs: present the results separately, use separate analyses and caveat the results, or interpolate the differences to restructure the data for a common analysis. Given the range of choices available, organizations seeking to improve their access to and use of cloud flow logs may architect an analytic infrastructure to suit their needs. In any of these approaches, the overall goal will be to improve awareness of cloud activity and to apply that awareness to improve the security of the organization’s information.

The paragraphs below consider several approaches.

Figure 2: A separate results analysis approach

The separate results approach shown in Figure 2 above uses each cloud’s data to generate a set of results using data structures and analysis methods appropriate to that cloud. Since separate providers produce logs, the environment of each provider’s logs will differ.

Table 1 below shows artificially generated entries with the content of logs from three cloud providers, simplified into tables and with selected record fields for clarity of display. Azure and Google logs are normally in JSON format, with Azure using a deeply nested structure and Google a relatively flat structure. AWS logs are normally in formatted text. The logs differ in that AWS (Table 1c) and Google (Table 1b) depict activity as samples over time, while Azure (Table 1a) describes activity with begin, continue, and end events at identified times.

In the example data in Table 1, the Azure and AWS logs use IP addresses to refer to instances, but the Google log uses instance identifier strings. The separate results approach would leave these differences and not try to reconcile between them.

It is apparent that the fields of the flow records differ between providers, and the format of the individual fields also differs, such as for time values. There is no clock synchronization across separate providers.

The separate results approach allows for the most accommodation to differences between clouds, without considering the comparison of results from other clouds. The separate results approach aligns with the specific CSP environments, but at the potential cost of obscuring common actors or techniques that affect multi-cloud hosting employed by an organization.

Table 1: Example cloud flow logs

Figure 3: An example of the separate results analysis with four events (P1-P4)

In Figure 3, the analyst examines each CSP and the on-premises data separately. This produces a series of four events (one in each of the cloud-hosted functionalities and two in the on-premises hosted functionality). These events can be ordered, but the differing nature of the cloud data collection prevents both precise time relationships and use of the details recorded in the flow record.

Using this approach does allow a broader view than the previously discussed analysis, but not the level of detail typically desired by the analyst. However, for those analysts primarily focused on a single cloud implementation, the separate results approach may be preferred for simplicity.

Figure 4: A separate analysis approach that includes result reconciliation

An alternate strategy is the separate analysis approach, which applies methods targeted to each CSP’s unique features but presents results with format and content that allow a reconciliation process to produce a common set of results as shown in Figure 4. For example, each line of results may normalize IP addresses to a common format by using enrichment information, such as registration or DNS resolution. Each process may reconcile timestamps by offsetting for clock skew and using a shared format. This approach allows for a common awareness across multi-cloud hosting, but potential costs include sacrifice of the additional information that a single CSP may provide and loss of precision in timing and volume information to accommodate differences in collection processes between clouds. The SEI has released an open-source set of scripts implementing this approach for AWS and for Azure.
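As an illustration of the timestamp reconciliation step, here is a minimal Python sketch, not taken from the SEI scripts. It assumes one source reports epoch seconds and another reports ISO 8601 strings, and that the analyst has already estimated each source's clock skew; the function name and the skew value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_shared_timestamp(raw, skew_seconds=0.0):
    """Convert epoch seconds or an ISO 8601 string to a skew-corrected UTC string.

    skew_seconds is how far the source clock runs ahead of the reference clock.
    """
    if isinstance(raw, (int, float)):
        moment = datetime.fromtimestamp(raw, tz=timezone.utc)
    else:
        # Note: a trailing "Z" is only accepted by fromisoformat in Python 3.11+.
        moment = datetime.fromisoformat(raw)
        if moment.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            moment = moment.replace(tzinfo=timezone.utc)
    corrected = moment.astimezone(timezone.utc) - timedelta(seconds=skew_seconds)
    return corrected.isoformat(timespec="seconds")


# Hypothetical values: the second source is assumed to run two seconds fast.
from_epoch = to_shared_timestamp(1700000000)
from_iso = to_shared_timestamp("2023-11-14T17:13:22-05:00", skew_seconds=2)
print(from_epoch, from_iso)
```

Once both sources emit the same representation, results can be ordered on a single axis. The skew estimate itself is uncertain, so intervals computed across sources remain approximate.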

Figure 5: An example of the separate analysis approach that leads to pattern identification

In Figure 5 above, we see that applying the separate analysis approach allows identification that the two events on the CSPs are both instances of the same pattern. Looking at the data in Table 1, the query-response structure of the interactions involves examining port and protocol pairing in Table 1a but source and destination matching in Table 1b. This requires separate analysis logic to reach a common understanding. The similar behavior together with similar packet and byte sizes in each of the two clouds supports identification of the activity with a common pattern. This identification enables application of the features of the pattern in the analysis, although clocks in the separate clouds are not synchronized, which implies that the event ordering may be inferred but not the time interval between events. However, for relatively low-velocity collection across multiple clouds, the separate analysis approach may be preferred for the level of detail it supports.
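One way source and destination matching might look in practice is sketched below in Python. The record layout is hypothetical and simplified; it is not the schema of any provider's log, and real analysis logic would also need to bound the time between query and response.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """Simplified, hypothetical flow record."""
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: str


def pair_queries_with_responses(flows):
    """Pair each flow with a later flow whose endpoints are reversed."""
    pending = {}  # endpoint key -> flow still waiting for its response
    pairs = []
    for flow in flows:
        key = (flow.src, flow.src_port, flow.dst, flow.dst_port, flow.protocol)
        reverse = (flow.dst, flow.dst_port, flow.src, flow.src_port, flow.protocol)
        if reverse in pending:
            pairs.append((pending.pop(reverse), flow))
        else:
            pending[key] = flow
    return pairs


# Example addresses come from documentation and private ranges.
flows = [
    Flow("203.0.113.5", "10.0.0.4", 50000, 443, "tcp"),   # query
    Flow("10.0.0.4", "203.0.113.5", 443, 50000, "tcp"),   # matching response
    Flow("198.51.100.7", "10.0.0.4", 40000, 22, "tcp"),   # no response seen
]
pairs = pair_queries_with_responses(flows)
print(len(pairs))
```

A log that records both directions in one entry would need different logic, which is why each provider gets its own analysis before results are reconciled.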

Figure 6: An example of the common analysis approach

A third strategy is the common analysis approach as shown in Figure 6 above. This works by translating each set of cloud logs into a format and content that is achievable from each CSP’s flow logs. This approach allows more code-efficient analytical work processes since only a single analysis script is required to examine all of the logs in the common format, plus the transformation scripts from each CSP’s format to the common format. There is a potential for loss of certain fields from each CSP’s format, specifically those that have no common format equivalent. In addition, collection into a single location from multiple clouds will likely involve data-transfer costs to the organization. Organizations will need to define and apply appropriate access restrictions for the logs in common format, based on their information security policies.
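The shape of this approach can be sketched in Python as one transformation function per source plus a single analysis function over the common format. The two input layouts below (a nested dictionary and a space-separated text line) are hypothetical stand-ins, not the actual schemas of any provider, and the common record keeps only fields both sources can supply.

```python
from dataclasses import dataclass

PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}  # IANA protocol numbers


@dataclass(frozen=True)
class CommonFlow:
    """Common-format record holding only fields every source can provide."""
    provider: str
    src: str
    dst: str
    protocol: str
    byte_count: int


def from_nested(record: dict) -> CommonFlow:
    """Transform a hypothetical nested JSON-style record."""
    flow = record["flow"]
    return CommonFlow(
        provider="nested-example",
        src=flow["source"]["address"],
        dst=flow["destination"]["address"],
        protocol=flow["protocol"].lower(),
        byte_count=int(flow["bytes"]),
    )


def from_text(line: str) -> CommonFlow:
    """Transform a hypothetical text record: 'src dst protocol-number bytes'."""
    src, dst, proto_number, byte_count = line.split()
    return CommonFlow(
        provider="text-example",
        src=src,
        dst=dst,
        protocol=PROTOCOL_NAMES.get(int(proto_number), proto_number),
        byte_count=int(byte_count),
    )


def total_bytes_by_source(flows):
    """The single analysis script: total bytes sent per source address."""
    totals = {}
    for flow in flows:
        totals[flow.src] = totals.get(flow.src, 0) + flow.byte_count
    return totals


common = [
    from_nested({"flow": {"source": {"address": "203.0.113.5"},
                          "destination": {"address": "10.0.0.4"},
                          "protocol": "TCP", "bytes": 1200}}),
    from_text("203.0.113.5 10.1.0.9 6 800"),
]
print(total_bytes_by_source(common))
```

Any field that appears in only one source, such as a provider-specific instance identifier, is dropped by the transformation, which is the loss of information noted above.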

Figure 7: A common timeline from a common analysis

Figure 7 continues the example by applying the common analysis approach to resolve differences in flow aggregation to interpolate activity into a common timeline. One possible interpolation would be to average the volume information into a common time unit, then align time units between sources (assuming the sources have reasonably aligned clocks, even if not fully synchronized). Converting the features of the flow records into common format (e.g., JSON, CSV, etc.), order of features, and resolving any data structure issues will also facilitate the common analysis. Once aligned and converted, the analyst may either bring the records into a common repository or apply the analysis separately in source-specific repositories and then aggregate the results into a common timeline.
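The interpolation described above might be sketched as follows in Python. This is one possible approach under a strong assumption: each record's bytes are spread uniformly over its aggregation interval, and the sources' clocks are close enough that epoch seconds can be compared directly. The function names are illustrative.

```python
from collections import defaultdict


def spread_into_buckets(start, end, byte_count, bucket_seconds=60):
    """Distribute byte_count uniformly over [start, end) and total it per bucket."""
    if end <= start:
        raise ValueError("end must be later than start")
    rate = byte_count / (end - start)  # average bytes per second
    buckets = defaultdict(float)
    t = start
    while t < end:
        bucket_start = (t // bucket_seconds) * bucket_seconds
        next_edge = min(bucket_start + bucket_seconds, end)
        buckets[bucket_start] += rate * (next_edge - t)
        t = next_edge
    return dict(buckets)


def merge_timelines(*timelines):
    """Sum per-bucket volumes from several sources into one ordered timeline."""
    merged = defaultdict(float)
    for timeline in timelines:
        for bucket, value in timeline.items():
            merged[bucket] += value
    return dict(sorted(merged.items()))


# Hypothetical records: a 5-minute aggregate from one source and a
# 1-minute aggregate from another that straddles a bucket boundary.
five_minute = spread_into_buckets(0, 300, 3000)
one_minute = spread_into_buckets(30, 90, 600)
timeline = merge_timelines(five_minute, one_minute)
print(timeline)
```

Uniform spreading hides bursts inside an interval, which is part of the imprecision that the alignment process introduces.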

This aggregate view offers the opportunity for a comprehensive view across data sources but at the cost of additional processing and imprecision due to the alignment process. For a more abstract view across multiple clouds and to ensure a common view of the results, the common analysis approach may be preferred.

Future Work in Cloud Flow Analysis at the SEI

The work reported in this blog post is exploratory and at the proof-of-concept level. Future efforts will apply these methods in production and at a realistic scale. As such, further issues with infrastructure and with the work reported here will arise and be addressed.

This post has outlined three approaches for analysis of cloud flow log entries. Over time, further approaches may emerge and be applied in this analysis, including approaches more suited to streaming analysis rather than retrospective analysis.

Cloud flow logs are not the only operations-focused cloud data sources. CSP-specific sources, such as CloudTrail and S3 logs, may have entries that correlate with cloud flow logs. Since these logs may provide more details on the applications producing the traffic, they may provide more context to improve security. To facilitate this correlation, identifying the baseline of activity in those logs and comparing it with the baseline in cloud flow logs will address issues of scale.

Security researchers have described malicious activity through Tactics, Techniques, and Procedures (TTPs). Several catalogs of such TTPs exist, and analysts could map activity in cloud flow logs (and other data sources) to identify consistencies with TTPs. This would lead to improved security detection.

SEI researchers are working to develop the appropriate structure for a multi-cloud repository of flow log data. Given the cost model common among CSPs, such a repository will likely need to be a distributed structure, and that will involve complications in the query and response infrastructure.

Cloud data derived from multiple sources can be expensive to store due to the velocity of the data. Policies need to balance cost against value of the data. This can be complex since some analyses may require longer data retention periods. There have been network attacks (such as the Sunburst attack on SolarWinds) that have exploited log retention times to conceal their activity. Some cloud data sources appear to have value in reporting transient conditions of relevance to security. For example, some service logs report inputs that fail to follow expected formatting. This may be due to misconfigurations, transmission errors, or a form of vulnerability probing. Such log entries are unlikely to be of lasting value in assessing security since they record detected (and likely blocked) inputs. Other cloud data sources are likely to be of more lasting value. An example would be entries mapping to TTPs as described earlier. A process is needed to evaluate cloud data sources for long-term retention versus those that should only feed streaming anomaly detection, without long-term storage of entries.
