Quality evaluation of Cyber Threat Intelligence feeds

Christian Doerr

To protect against emerging threats, many organizations subscribe to CTI feeds of malicious domain names or IP addresses and apply them to their network and endpoint defense solutions. This approach is only effective, however, if these feeds provide a comprehensive view of ongoing threats, contain relevant information, and distribute indicators in a timely manner, that is, soon enough that attacks can actually be stopped. In this analysis, we looked at 1.3 million indicators of compromise (IoCs) from 24 cyber threat intelligence (CTI) feeds and evaluated the quality of the information provided on these lists.

When used as an input to your system controls, CTI feeds exhibit a number of major problems. Specifically, our study finds that:

Indicators are included too late to enable timely detection
Providers have major biases depending on where their infrastructure is located, which biases your detection as well
Applying IoCs without careful preprocessing exposes you to significant collateral damage
The low overlap between feeds forces you to subscribe to multiple vendors, and even then you are unlikely to get full coverage

Therefore, getting the most value out of CTI feeds for your defense posture requires careful planning and evaluation, especially to avoid accidental damage to your own organization.

We were able to perform this unique study with the help of a large network owner, who provided us with aggregate statistics on how many connections were made, at a given point in time, to IP addresses that were later flagged as malicious by the feeds. The figure on the right shows aggregated traffic per day towards six exemplary IP addresses, each of which was added to a CTI feed on the day marked by "x". IP addresses 1, 2 and 3 are good examples of timely detection and showcase the benefits CTI feeds could offer.

Frequently, however, indicators are added to CTI feeds only after a significant delay from the start of the activity, as in the examples of IP addresses 4 and 6. Another important question we analyzed in this work is the sensitivity of CTI feeds, in other words, how much activity is necessary before a malicious actor is seen by the provider's detection network. IP address 5, for example, is added only after the onset of major interaction, even though it had already been active for a while.

Key findings

During our analysis, we identified key points where CTI feeds can be ineffective. In this article we discuss four key findings; for further details and more results, you can find the full analysis here.

Indicators are on average 21 days late. One of the main selling points of CTI feeds is that they provide indicators of compromise in time to detect threats before an incident has occurred. When we compared the time at which an indicator was placed on a list with how long traffic had already been flowing to that address (for the methodology, refer to the paper), we found that feeds typically run behind by multiple weeks. Across all 24 feeds, we measured an average delay of 3 weeks, and some feeds include indicators months after they first emerged.
Most CTI feeds thus do not deliver on the promise of timely detection, and indicators that arrive weeks after the onset of an activity are not really "intelligence".
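
To make the delay metric concrete, here is a minimal Python sketch of how such a measurement could be computed. It is not the methodology from the paper: the dictionaries `first_traffic` and `listed_on`, the function `listing_delays`, and all dates and addresses are made-up illustrations, and it assumes you already know, per indicator, the first day traffic was observed and the day it appeared on a feed.

```python
from datetime import date
from statistics import mean

# Hypothetical sample data: indicator -> first day traffic towards it was observed
first_traffic = {
    "198.51.100.7": date(2020, 1, 1),
    "203.0.113.9": date(2020, 1, 10),
    "192.0.2.44": date(2020, 2, 1),
}

# Hypothetical sample data: indicator -> day it was added to a feed
listed_on = {
    "198.51.100.7": date(2020, 1, 2),
    "203.0.113.9": date(2020, 2, 9),
    "192.0.2.44": date(2020, 2, 15),
}


def listing_delays(first_traffic, listed_on):
    """Return the number of days between first traffic and listing, per indicator."""
    delays = {}
    for indicator, listed in listed_on.items():
        seen = first_traffic.get(indicator)
        if seen is None:
            continue  # no traffic observed, so the delay is undefined
        delays[indicator] = (listed - seen).days
    return delays


delays = listing_delays(first_traffic, listed_on)
average_delay = mean(delays.values())
print(delays)
print(average_delay)
```

A negative value would mean the indicator was listed before any traffic was seen, which is the case a feed should ideally deliver.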

Feeds have a strong geographical bias. Which threats are reported, and where they originate from, depends heavily on the feed. When we analyzed which indicators are included in CTI feeds, we found that vendors have significant biases towards particular countries, with some vendors drawing a quarter of their observations from one country that is barely mentioned by others. When choosing CTI feeds, be aware of such geographical biases and balance them by complementing your subscription with other lists.
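
One way to check a feed for this kind of bias is to compare the share of indicators per country across feeds. The sketch below assumes each indicator has already been mapped to a country by some geolocation step that is not shown; the feed names, the placeholder country codes, and the function `country_shares` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (feed name, country code of the indicator)
observations = [
    ("feed_a", "AA"), ("feed_a", "AA"), ("feed_a", "BB"), ("feed_a", "CC"),
    ("feed_b", "BB"), ("feed_b", "CC"), ("feed_b", "CC"), ("feed_b", "DD"),
]


def country_shares(observations):
    """Return, per feed, the fraction of its indicators attributed to each country."""
    counts = defaultdict(Counter)
    for feed, country in observations:
        counts[feed][country] += 1
    shares = {}
    for feed, counter in counts.items():
        total = sum(counter.values())
        shares[feed] = {country: n / total for country, n in counter.items()}
    return shares


shares = country_shares(observations)
for feed, per_country in shares.items():
    print(feed, per_country)
```

In this toy data, half of the indicators in `feed_a` come from a country that `feed_b` never reports, which is the pattern to look for when combining lists.
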
Most indicators will cause collateral damage. When CTI feeds report IP addresses as sources of malicious activity, blocking them also inadvertently impacts other, possibly benign services running on the same host. This is the case, for example, in shared hosting environments, where blocking an IP address because of one C&C domain will impact large numbers of other websites hosted by that provider. In our study we found that the degree of collateral damage depends on the list. One feed identifying brute-forcing activity, for example, was relatively targeted: 80% of its IoCs affected at most 6 other domain names. Overall, however, blocking based on IPs is not precise enough and leads to significant collateral damage. We have seen major public DNS servers included in feeds, and IoCs would frequently have taken out tens of thousands of other domains. Applying IoCs from CTI feeds without careful pre-filtering will thus in practice often cause more harm than good.
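
A pre-filtering step could look roughly like the following Python sketch. It assumes you have some source, such as passive DNS data, that tells you which domains resolve to each IP address; the mapping `hosted_domains`, the allowlist, the threshold `MAX_COHOSTED`, and the function `prefilter` are all hypothetical and would need tuning for your environment.

```python
# Hypothetical allowlist: infrastructure you never want to block, e.g. a public DNS resolver
ALLOWLIST = {"192.0.2.53"}

# Assumed threshold: hold back IoCs whose IP hosts more domains than this (arbitrary value)
MAX_COHOSTED = 10

# Hypothetical data: IP address -> domains known to resolve to it
hosted_domains = {
    "198.51.100.7": {"bad.example"},
    "203.0.113.9": {f"site{i}.example" for i in range(10000)},  # shared hosting
    "192.0.2.53": set(),
}


def prefilter(iocs, hosted_domains, allowlist, max_cohosted):
    """Split IoCs into those safe to block automatically and those needing review."""
    block, review = [], []
    for ip in iocs:
        cohosted = len(hosted_domains.get(ip, ()))
        if ip in allowlist or cohosted > max_cohosted:
            review.append(ip)  # blocking would likely hit benign services
        else:
            block.append(ip)
    return block, review


block, review = prefilter(
    ["198.51.100.7", "203.0.113.9", "192.0.2.53"],
    hosted_domains, ALLOWLIST, MAX_COHOSTED,
)
print("block:", block)
print("review:", review)
```

Routing the risky indicators to a review queue rather than dropping them keeps the information available, for example for alerting instead of blocking.
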
A surprisingly low number of duplicates indicates that overall coverage of malicious activity is (too) low. A good CTI feed will give you unique insight into the activities of malicious actors. Yet at the same time, if CTI feeds captured all malicious activity of, for example, currently ongoing ransomware, we should find the same set of C&C servers repeated on various lists. In practice, we find almost no overlap between CTI feeds: only 6.2% of indicators appear on a second list. This would indicate that CTI feeds cover only an insignificant part of the malicious ecosystem, as every feed assesses only a small chunk of a big universe. Thus, in addition to false positives and collateral damage, we should be especially concerned about false negatives, that is, missing the bulk of malicious activity because it is not included in any feed.
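
If you subscribe to several feeds, you can estimate their overlap yourself. The Python sketch below computes the fraction of unique indicators that appear on at least two feeds; this is one plausible reading of the overlap figure above, not necessarily how the study computed it, and the feed contents and the function `overlap_fraction` are made up for illustration.

```python
from collections import Counter

# Hypothetical feeds: feed name -> set of indicators
feeds = {
    "feed_a": {"198.51.100.7", "203.0.113.9", "192.0.2.44"},
    "feed_b": {"203.0.113.9", "198.51.100.20"},
    "feed_c": {"192.0.2.99"},
}


def overlap_fraction(feeds):
    """Return the fraction of unique indicators that appear on two or more feeds."""
    # Count, for every indicator, how many feeds list it (each feed counted once).
    counts = Counter(
        indicator for indicators in feeds.values() for indicator in set(indicators)
    )
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= 2)
    return shared / len(counts)


print(overlap_fraction(feeds))
```

A low value on your own subscriptions suggests the feeds complement each other, but, as argued above, it also hints that each one sees only a small part of the overall activity.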

To learn more about the study, methodology and findings, you can find the full report here.