Harnessing threat intelligence using the externaldata operator

Having a Threat Intelligence Platform (TIP) to maintain Indicators of Compromise (IoCs) is something of a standard these days. However, not every organization runs a TIP such as MISP, and that shouldn't stop anyone from using threat intelligence feeds for hunting, especially in Microsoft Defender XDR.

Table of Contents

  • What are threat intelligence (TI) feeds and why should I consider using them?
  • How can the externaldata operator help harness threat feeds?
  • What kind of files are supported?
  • How can I use the externaldata operator?
  • Actual examples of externaldata operator, harnessing threat feeds
    • Domains
    • IPs
    • File hashes
    • Keywords
  • Further resources to consider
  • Closing remarks

What are threat intelligence (TI) feeds and why should I consider using them?

Threat Intelligence (TI) feeds are streams of information and data, often curated around some form of context (e.g. IPs, hashes, or categories such as specific malware families), that provide actionable insights into potential attacks, cybersecurity threats and risks.

More and more organizations use TI feeds to keep their defenses up to date and to be prepared for emerging threats.

How can the externaldata operator help harness threat feeds?

According to Microsoft documentation:

The externaldata operator returns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.

externaldata operator

In simple terms, if a file contains data that interests you, you can declare its structure in the query itself and build a new table from it, which you can then filter, join and aggregate according to your needs.

For example, a query can build a table from a JSON file, keeping the ip_address and url_status fields and leaving out any entries where url_status is offline.
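A minimal sketch of such a query is below. The feed URL is a hypothetical placeholder, and the sketch assumes the file holds JSON records that expose ip_address and url_status properties, read with the multijson format (which accepts a JSON array of records or whitespace-separated records):

```kql
// Hypothetical URL: replace it with the JSON feed you actually use.
// Assumes each JSON record has ip_address and url_status properties.
let FeedEntries = externaldata(ip_address: string, url_status: string)
    [@"https://example.com/feeds/urls.json"]
    with (format="multijson");
FeedEntries
| where url_status != "offline"   // leave out entries that are offline
| project ip_address, url_status
```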

What kind of files are supported?

The most common file types are TXT, CSV and JSON; however, Microsoft documentation lists over a dozen formats that the externaldata operator can read, including ORC, Parquet, PSV, RAW and others. Compressed files are also supported, in gzip and zip formats.
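Switching between formats is mostly a matter of the format property. As an illustration, the sketch below reads a tab-separated feed; the URL and the two columns are hypothetical, and the file is assumed to start with a header row:

```kql
// Hypothetical URL and schema: a TSV feed with a header row and two columns.
externaldata(Indicator: string, Description: string)
    [@"https://example.com/feeds/indicators.tsv"]
    with (format="tsv", ignoreFirstRecord=true)   // skip the header row
| take 10
```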

To check that a feed is well-formed before using it, you can run it through a CSV or JSON validator, depending on its format.
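A quick complementary check can also be done in KQL itself: pull the feed once and see whether it parses into the column you expect. The sketch below assumes a hypothetical single-column CSV feed with a header row; a high number of empty rows usually means the schema or format does not match the file:

```kql
// Hypothetical URL and column name: adjust them to the feed you want to check.
let Feed = externaldata(indicator: string)
    [@"https://example.com/feeds/indicators.csv"]
    with (format="csv", ignoreFirstRecord=true);
// Count all rows, rows that parsed to an empty value, and (approximately) distinct indicators
Feed
| summarize TotalRows = count(), EmptyRows = countif(isempty(indicator)), DistinctIndicators = dcount(indicator)
```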

How can I use the externaldata operator?

Before you start, there are three parameters to define; the last one is optional.

  • columnName:columnType (string): A list of column names and their types. This list defines the schema of the table.
  • storageConnectionString (string): The storage connection string of the storage artifact to query.
  • propertyName=propertyValue (string): A list of optional supported properties that determine how to interpret the data retrieved from storage.
externaldata parameters

Put together in a query, these parameters take concrete values: columnName:columnType defines which columns are read from the data source, storageConnectionString points to the data source itself, and propertyName=propertyValue pairs set specific interpretation options for the data.
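A sketch of such a query, with each parameter labelled, might look like the following. The URL, the column names and the "malware" category value are hypothetical placeholders, not a real feed:

```kql
// Hypothetical feed: URL, columns and values are placeholders.
externaldata(Indicator: string, Category: string, FirstSeen: datetime)   // columnName:columnType pairs define the schema
    [@"https://example.com/feeds/iocs.csv"]                               // storage connection string (here, an HTTPS URL)
    with (format="csv", ignoreFirstRecord=true)                           // propertyName=propertyValue pairs
| where Category == "malware"
| project Indicator, FirstSeen
```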

Actual examples of externaldata operator, harnessing threat feeds

Threat feeds come in many categories. Some are obvious, such as IPs, file hashes and domains, while others cover less popular data types, such as keywords or CVEs, that can be equally valuable for your organization when you go threat hunting. You may also want to maintain your own threat feeds, for example on your own hosting provider or in a GitHub repo. Just remember to use a file format supported by the externaldata operator.
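If you maintain your own feed, one option is a single CSV that mixes indicator types and tags each row. The sketch below assumes a hypothetical repo URL and a hypothetical header of Indicator, IndicatorType and Note; it simply shows how much of each type the feed holds, which is a handy check after every update:

```kql
// Hypothetical URL and schema: a CSV you maintain yourself, e.g. in your own GitHub repo.
let MyFeed = externaldata(Indicator: string, IndicatorType: string, Note: string)
    [@"https://raw.githubusercontent.com/your-org/your-feeds/main/iocs.csv"]
    with (format="csv", ignoreFirstRecord=true);   // skip the header row
// Count indicators per type (e.g. domain, ip, sha256)
MyFeed
| summarize Indicators = count() by IndicatorType
```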

Domains

The following query detects inbound emails in Defender XDR whose sender domain matches a domain from a threat feed.

let domainList = externaldata(domain: string) [@"https://raw.githubusercontent.com/tsirolnik/spam-domains-list/master/spamdomains.txt"] with (format="txt"); // Replace the URL with the feed you want to use
let excludedDomains = datatable(excludeddomain: string)  // Add as many domains as you would like to exclude
 ["domain1.tld",
  "domain2.tld",
  "domain3.tld"];
let Timeframe = 1d; // Choose the best timeframe for your investigation
EmailEvents
    | where Timestamp > ago(Timeframe)
    | where EmailDirection == "Inbound"
    | extend EmailDomain = tostring(split(SenderMailFromAddress, '@')[1])
    // in~ / !in~ compare case-insensitively and, unlike a join, cannot duplicate rows when the feed lists a domain twice
    | where EmailDomain in~ (domainList)
    | where EmailDomain !in~ (excludedDomains)
    | project Timestamp, NetworkMessageId, SenderMailFromAddress, SenderFromAddress, SenderDisplayName, RecipientEmailAddress, EmailDomain, Subject, LatestDeliveryAction
    | sort by Timestamp desc

Source: github/cyb3rmik3
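A possible variant, sketched below, relies on the SenderMailFromDomain and SenderFromDomain columns that EmailEvents already provides instead of splitting the address, checks both the envelope and header sender domains, and summarizes per domain. The choice of columns and the aggregation are illustrative assumptions, not part of the original query:

```kql
let domainList = externaldata(domain: string) [@"https://raw.githubusercontent.com/tsirolnik/spam-domains-list/master/spamdomains.txt"] with (format="txt");
let Timeframe = 1d; // Choose the best timeframe for your investigation
EmailEvents
| where Timestamp > ago(Timeframe)
| where EmailDirection == "Inbound"
// Match either the MailFrom (envelope) or From (header) domain, case-insensitively
| where SenderMailFromDomain in~ (domainList) or SenderFromDomain in~ (domainList)
| summarize Emails = count(), Recipients = dcount(RecipientEmailAddress), LastSeen = max(Timestamp) by SenderMailFromDomain, SenderFromDomain
| sort by Emails desc
```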

IPs

The following query checks in Microsoft Sentinel whether any successful sign-ins have come from IPs listed in the threat feeds.

let BlockList = (externaldata(ip:string)
[@"https://rules.emergingthreats.net/blockrules/compromised-ips.txt",
@"https://raw.githubusercontent.com/stamparm/ipsum/master/levels/5.txt",
@"https://cinsscore.com/list/ci-badguys.txt",
@"https://infosec.cert-pa.it/analyze/listip.txt",
@"https://feodotracker.abuse.ch/downloads/ipblocklist_recommended.txt"
]
with(format="csv")
| where ip matches regex "(^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$)"
| distinct ip
);
SigninLogs
| where IPAddress in (BlockList)
| where ResultType == "0"

Source: KustoKing
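The query above searches the whole retention period of SigninLogs. A narrower sketch, using only one of the feeds listed above, adds a time filter and summarizes hits per user, IP and application; the 7-day window, the whitespace trim and the simplified IPv4 regex (used to drop the feed's comment lines) are assumptions made for illustration:

```kql
let Timeframe = 7d; // Choose the best timeframe for your investigation
let BlockList = externaldata(ip: string)
    [@"https://feodotracker.abuse.ch/downloads/ipblocklist_recommended.txt"]
    with (format="txt")
    | extend ip = trim(@"\s+", ip)                   // strip stray whitespace such as trailing carriage returns
    | where ip matches regex @"^\d{1,3}(\.\d{1,3}){3}$"  // keep IPv4-looking lines, drop comments
    | distinct ip;
SigninLogs
| where TimeGenerated > ago(Timeframe)
| where ResultType == "0"                             // successful sign-ins only
| where IPAddress in (BlockList)
| summarize SignIns = count(), FirstSeen = min(TimeGenerated), LastSeen = max(TimeGenerated) by UserPrincipalName, IPAddress, AppDisplayName
```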

File hashes

The following query detects file events whose SHA256 hash appears in a threat feed of Emotet file hashes.

let Emotetsha256 = externaldata(sha256: string)[@"https://githubraw.com/Cisco-Talos/IOCs/main/2022/11/Emotet_parents.txt"] with (format="txt", ignoreFirstRecord=True);
DeviceFileEvents
| where SHA256 in (Emotetsha256)
| project Timestamp, FileName, SHA256, DeviceName, InitiatingProcessCommandLine, InitiatingProcessFileName, InitiatingProcessAccountDomain, InitiatingProcessAccountName

Source: Bert-JanP
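The same feed can be checked against more than one table. The sketch below unions DeviceFileEvents and DeviceProcessEvents (both expose a SHA256 column), adds a time filter and compares hashes case-insensitively with in~; the 30-day window and the table choice are assumptions for illustration:

```kql
let Emotetsha256 = externaldata(sha256: string)[@"https://githubraw.com/Cisco-Talos/IOCs/main/2022/11/Emotet_parents.txt"] with (format="txt", ignoreFirstRecord=true);
let Timeframe = 30d; // Choose the best timeframe for your investigation
union withsource=SourceTable DeviceFileEvents, DeviceProcessEvents   // SourceTable records which table each row came from
| where Timestamp > ago(Timeframe)
| where SHA256 in~ (Emotetsha256)
| project Timestamp, SourceTable, DeviceName, FileName, FolderPath, SHA256, InitiatingProcessFileName, InitiatingProcessCommandLine
```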

Keywords

The following query matches the ProcessVersionInfoCompanyName column of the DeviceProcessEvents table against a custom feed of remote monitoring and management (RMM) software company names, created by installing and testing the corresponding tools.

let RMMSoftware = externaldata(RMMSoftware: string)[@"https://raw.githubusercontent.com/cyb3rmik3/Hunting-Lists/main/rmm-software.csv"] with (format="csv", ignoreFirstRecord=True);
let ExclDevices = datatable(excludeddev :string)  // Add as many devices as you would like to exclude
 ["DeviceName1",
  "DeviceName2",
  "DeviceName3"];
let Timeframe = 7d; // Choose the best timeframe for your investigation
DeviceProcessEvents
    | where Timestamp > ago(Timeframe)
    | where ProcessVersionInfoCompanyName has_any (RMMSoftware)
    | where not(DeviceName in (['ExclDevices']))
    | project Timestamp, DeviceName, ActionType, FileName, FolderPath, ProcessVersionInfoCompanyName, ProcessVersionInfoProductName, ProcessCommandLine, AccountName, InitiatingProcessAccountName, InitiatingProcessFileName, InitiatingProcessCommandLine
    | sort by Timestamp desc

Source: github/cyb3rmik3
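When the raw results are too noisy, a summarized view can help with triage. The following sketch reuses the same feed and returns one row per vendor and product with the number of devices, executions and first/last sightings; the aggregation itself is an illustrative assumption, not part of the original query:

```kql
let RMMSoftware = externaldata(RMMSoftware: string)[@"https://raw.githubusercontent.com/cyb3rmik3/Hunting-Lists/main/rmm-software.csv"] with (format="csv", ignoreFirstRecord=true);
let Timeframe = 7d; // Choose the best timeframe for your investigation
DeviceProcessEvents
| where Timestamp > ago(Timeframe)
| where ProcessVersionInfoCompanyName has_any (RMMSoftware)
// One row per vendor/product: device spread, volume and first/last sighting
| summarize Devices = dcount(DeviceName), Executions = count(), FirstSeen = min(Timestamp), LastSeen = max(Timestamp) by ProcessVersionInfoCompanyName, ProcessVersionInfoProductName
| sort by Devices desc
```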

The list could go on; the point is that the sky is the limit. You can build queries for almost anything, or harness the threat feeds already out there to hunt and detect threats across your Microsoft ecosystem.

Further resources to consider

  • Over the last year I have been experimenting with queries and lists that I found useful. You can find them in my GitHub repos, KQL threat hunting queries and Hunting Lists.
  • Bert-Jan has curated a list of Threat Intelligence feeds that can be used to empower defenses in Microsoft Defender XDR, with queries included.
  • The community-contributed repository of KQL queries at kqlsearch has a ton of relevant queries; some will likely meet your needs, or at least give you ideas for building your own.
  • As always, Microsoft's documentation provides great insights and plenty of options to pivot further on anything that concerns you.

Closing remarks

Ingesting threat intelligence feeds to empower your defenses is a process that should be thoroughly evaluated, described and contextualized. While this blog has shown that, with proper query building, you can harness almost any threat feed for hunting, it is important to remember that threat intelligence is meant to be scoped. In simple terms, you should only ingest threat intelligence that fits your organization's requirements.

Happy hunting!
