Lesson 2: Data Sources and Collection
Master the identification and use of data sources for effective threat hunting
Learning Objectives
By the end of this lesson, you will be able to:
- Identify key data sources for threat hunting
- Understand endpoint telemetry data collection
- Analyze network traffic for hunting purposes
- Implement log aggregation and correlation
- Integrate threat intelligence feeds
- Ensure data quality and normalization
Data Sources Overview
Importance of Data Sources
Effective threat hunting relies heavily on the quality, breadth, and depth of available data sources. The more comprehensive your data collection, the better your chances of detecting sophisticated threats that may have evaded traditional security controls.
Key Principles for Data Sources:
- Completeness: Collect data from all relevant sources
- Quality: Ensure data accuracy and integrity
- Timeliness: Real-time or near real-time collection
- Retention: Maintain historical data for analysis
- Normalization: Standardize data formats for correlation
Data Source Categories
Endpoint Data
- Process execution logs
- File system changes
- Registry modifications
- Network connections
- User activity logs
Network Data
- Network flow data (NetFlow)
- Packet capture (PCAP)
- DNS query logs
- Proxy logs
- Firewall logs
System Logs
- Windows Event Logs
- Syslog messages
- Application logs
- Security logs
- Authentication logs
Threat Intelligence
- IOC feeds
- Threat actor profiles
- Malware signatures
- TTP information
- Vulnerability data
Endpoint Telemetry Data
Process Monitoring
Purpose: Track process creation, execution, and termination events
Key Data Points:
- Process name and path
- Command line arguments
- Parent process information
- User and session context
- Timing information
Hunting Use Cases:
- Detection of suspicious process chains
- Identification of living-off-the-land techniques
- Analysis of lateral movement patterns
- Monitoring of privilege escalation attempts
Example Queries:
# PowerShell execution detection
ProcessName = "powershell.exe" AND CommandLine CONTAINS "-enc"

# Suspicious parent-child relationships
ParentProcessName = "explorer.exe" AND ProcessName = "cmd.exe"

# Process execution from temp directories
ProcessPath CONTAINS "temp" OR ProcessPath CONTAINS "appdata"
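The query patterns above can be expressed as a small triage function. Below is a minimal Python sketch; the event schema (plain dicts with `process`, `command_line`, `parent`, and `path` keys) and the suspicious parent-child list are illustrative assumptions, not any specific EDR's format.

```python
# Hypothetical parent-child pairs worth flagging; real lists would be
# curated from threat intelligence and baseline data.
SUSPICIOUS_PARENTS = {("explorer.exe", "cmd.exe"), ("winword.exe", "powershell.exe")}

def flag_process_event(event: dict) -> list[str]:
    """Return a list of reasons a process event looks suspicious (empty if clean)."""
    reasons = []
    name = event.get("process", "").lower()
    cmdline = event.get("command_line", "").lower()
    parent = event.get("parent", "").lower()
    path = event.get("path", "").lower()
    # Encoded PowerShell commands are a common obfuscation technique.
    if name == "powershell.exe" and "-enc" in cmdline:
        reasons.append("encoded powershell")
    # Unusual parent-child pairs can indicate macro abuse or injection.
    if (parent, name) in SUSPICIOUS_PARENTS:
        reasons.append(f"suspicious parent-child: {parent} -> {name}")
    # Execution from temp/appdata paths is a common malware staging pattern.
    if "\\temp\\" in path or "\\appdata\\" in path:
        reasons.append("execution from temp/appdata")
    return reasons
```

A single event can trip several rules at once, which is useful for scoring rather than binary alerting.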
File System Monitoring
Purpose: Track file creation, modification, and deletion events
Key Data Points:
- File path and name
- File size and timestamps
- File hash values
- File permissions and attributes
- Process performing the operation
Hunting Use Cases:
- Detection of malware drops
- Identification of data exfiltration
- Monitoring of configuration changes
- Analysis of persistence mechanisms
Example Queries:
# Suspicious file extensions
FileName ENDS WITH ".scr" OR FileName ENDS WITH ".pif"

# Files created in system directories
FilePath CONTAINS "system32" AND EventType = "FileCreated"

# Large file transfers (> ~100 MB)
FileSize > 100000000 AND EventType = "FileCreated"
Registry Monitoring
Purpose: Track registry key and value modifications
Key Data Points:
- Registry key path
- Value name and data
- Operation type (create, modify, delete)
- Process performing the operation
- Timestamp of the change
Hunting Use Cases:
- Detection of persistence mechanisms
- Identification of system configuration changes
- Monitoring of security software tampering
- Analysis of privilege escalation attempts
Example Queries:
# Run key modifications
RegistryPath CONTAINS "Run" AND EventType = "RegistryModified"

# Security software tampering
RegistryPath CONTAINS "antivirus" AND EventType = "RegistryDeleted"

# Suspicious value data
RegistryValue CONTAINS "powershell" OR RegistryValue CONTAINS "cmd"
Network Traffic Analysis
Network Flow Data (NetFlow)
Purpose: Analyze network communication patterns and connections
Key Data Points:
- Source and destination IP addresses
- Source and destination ports
- Protocol information
- Byte and packet counts
- Connection duration and timing
Hunting Use Cases:
- Detection of C2 communications
- Identification of data exfiltration
- Analysis of lateral movement
- Monitoring of suspicious connections
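Because flow records carry precise timing even without payload, one classic NetFlow hunting technique is beacon detection: C2 implants often call home at near-fixed intervals. Below is a hedged Python sketch; the `(timestamp, src, dst)` flow tuples and the 10% jitter threshold are illustrative assumptions, not tuned values.

```python
from collections import defaultdict
from statistics import mean, pstdev

def find_beacons(flows, min_flows=5, max_jitter=0.1):
    """flows: iterable of (timestamp_seconds, src_ip, dst_ip) tuples.
    Returns (src, dst, avg_interval) for pairs whose inter-flow timing
    is suspiciously regular."""
    by_pair = defaultdict(list)
    for ts, src, dst in flows:
        by_pair[(src, dst)].append(ts)
    beacons = []
    for (src, dst), times in by_pair.items():
        if len(times) < min_flows:
            continue
        times.sort()
        intervals = [b - a for a, b in zip(times, times[1:])]
        avg = mean(intervals)
        # Coefficient of variation: a low value means machine-like regularity.
        if avg > 0 and pstdev(intervals) / avg < max_jitter:
            beacons.append((src, dst, avg))
    return beacons
```

In practice you would also whitelist known-chatty destinations (update servers, monitoring agents) before alerting.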
DNS Query Analysis
Purpose: Monitor domain name resolution requests
Key Data Points:
- Query domain names
- Query types (A, AAAA, MX, etc.)
- Response information
- Client IP addresses
- Query frequency and patterns
Hunting Use Cases:
- Detection of DNS tunneling
- Identification of domain generation algorithms
- Analysis of suspicious domains
- Monitoring of data exfiltration via DNS
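A common way to hunt for domain generation algorithms in DNS logs is character entropy: machine-generated labels tend toward uniformly random characters, while human-chosen names do not. A minimal Python sketch follows; the 3.5-bit threshold and 8-character minimum are illustrative assumptions, not tuned values.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def flag_dga_candidates(domains, threshold=3.5):
    """Return domains whose second-level label has high character entropy.
    Assumes simple 'label.tld' names; real logs need proper eTLD parsing."""
    flagged = []
    for domain in domains:
        label = domain.lower().split(".")[0]
        if len(label) >= 8 and shannon_entropy(label) >= threshold:
            flagged.append(domain)
    return flagged
```

Entropy alone produces false positives (CDN hostnames, tracking domains), so it works best combined with query frequency and NXDOMAIN rates.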
Packet Capture (PCAP)
Purpose: Deep packet inspection for detailed network analysis
Key Data Points:
- Full packet payload
- Protocol headers
- Application layer data
- Encrypted traffic metadata
- Timing and sequence information
Hunting Use Cases:
- Deep analysis of suspicious traffic
- Identification of custom protocols
- Analysis of encrypted communications
- Reconstruction of attack sequences
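Before handing a capture to a full parser such as tshark or scapy, it helps to know what a capture file's metadata looks like. The sketch below reads the 24-byte global header of a classic libpcap file using only the standard library (classic pcap only, not pcapng).

```python
import struct

def read_pcap_header(data: bytes) -> dict:
    """Parse the 24-byte libpcap global header from raw bytes."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file, microsecond timestamps
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    # Fields: version major/minor, thiszone, sigfigs, snaplen, linktype
    major, minor, _tz, _sig, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:24]
    )
    return {"version": f"{major}.{minor}", "snaplen": snaplen, "linktype": linktype}
```

The snaplen tells you whether packets were truncated at capture time, and the linktype (1 = Ethernet) tells a parser how to decode each frame.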
Log Aggregation and Correlation
Centralized Log Management
Effective threat hunting requires centralized collection and normalization of logs from diverse sources.
Windows Event Logs
- Security Log: Authentication and authorization events
- System Log: System-level events and errors
- Application Log: Application-specific events
- PowerShell Log: PowerShell execution events
Linux/Unix Logs
- Syslog: System and application messages
- Auth Log: Authentication events
- Kernel Log: Kernel-level events
- Application Logs: Service-specific logs
Network Device Logs
- Firewall Logs: Traffic filtering and blocking events
- Router Logs: Routing and network events
- Switch Logs: Port and VLAN events
- Proxy Logs: Web traffic and filtering events
Log Correlation Techniques
Time-based Correlation
Correlate events that occur within specific time windows to identify attack sequences.
# Example: failed logins within 5 minutes before a successful login
EventType = "FailedLogin"
AND UserName = SuccessfulLogin.UserName
AND EventTime >= SuccessfulLogin.EventTime - 5 minutes
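The same time-window logic can be sketched in Python: pair each successful login with failed logins for the same user in the preceding five minutes, a common brute-force-then-success pattern. The event schema (dicts with `time`, `type`, and `user` keys) and the three-failure threshold are illustrative assumptions.

```python
WINDOW = 300  # correlation window, in seconds

def correlate_logins(events, min_failures=3):
    """events: list of dicts with "time" (epoch seconds), "type", and "user".
    Returns (user, success_time, failure_count) tuples where a success was
    preceded by >= min_failures failed logins within the window."""
    events = sorted(events, key=lambda e: e["time"])
    suspicious = []
    for i, evt in enumerate(events):
        if evt["type"] != "SuccessfulLogin":
            continue
        failures = [
            e for e in events[:i]
            if e["type"] == "FailedLogin"
            and e["user"] == evt["user"]
            and evt["time"] - e["time"] <= WINDOW
        ]
        if len(failures) >= min_failures:
            suspicious.append((evt["user"], evt["time"], len(failures)))
    return suspicious
```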
IP-based Correlation
Track activities from specific IP addresses across multiple data sources.
# Example: Track all activities from suspicious IP
SourceIP = "192.168.1.100" OR DestinationIP = "192.168.1.100"
User-based Correlation
Monitor all activities associated with specific user accounts.
# Example: Track user activities across systems
Username = "admin" OR UserSID = "S-1-5-21-..."
Threat Intelligence Integration
Types of Threat Intelligence
Indicators of Compromise (IOCs)
- IP Addresses: Malicious or suspicious IPs
- Domain Names: Malicious domains and URLs
- File Hashes: MD5, SHA1, SHA256 of malware
- Email Addresses: Phishing and spam sources
Tactics, Techniques, and Procedures (TTPs)
- Attack Patterns: Common attack methodologies
- Tools and Techniques: Malware and attack tools
- Infrastructure: C2 servers and domains
- Behavioral Patterns: Attacker behaviors and habits
Threat Actor Intelligence
- Attribution: Known threat groups and actors
- Motivations: Financial, political, espionage
- Capabilities: Technical skills and resources
- Targeting: Industries and organizations
Integration Strategies
Automated IOC Matching
Automatically match collected data against known IOCs from threat intelligence feeds.
# Example: Match network connections against malicious IPs
NetworkConnection.SourceIP IN ThreatIntelligence.MaliciousIPs
OR NetworkConnection.DestinationIP IN ThreatIntelligence.MaliciousIPs
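In code, automated IOC matching reduces to set membership tests, which stay O(1) per lookup no matter how large the feed grows. A minimal Python sketch; the connection dicts are a hypothetical schema, and in practice the indicator set would be populated from real feeds (e.g. STIX/TAXII or MISP exports).

```python
def match_iocs(connections, malicious_ips):
    """connections: iterable of dicts with "src" and "dst" IP strings.
    malicious_ips: a set of known-bad IPs from threat intelligence.
    Returns the connections touching a known-bad IP, tagged with the match."""
    hits = []
    for conn in connections:
        # Intersect both endpoints with the indicator set in one step.
        matched = {conn["src"], conn["dst"]} & malicious_ips
        if matched:
            hits.append({**conn, "matched_iocs": sorted(matched)})
    return hits
```

Tagging the hit with the matched indicator (rather than a bare boolean) preserves the context an analyst needs to pivot back into the feed.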
Behavioral Pattern Matching
Search for activities that match known attack patterns and techniques.
# Example: Detect living-off-the-land techniques
ProcessName IN ["powershell.exe", "cmd.exe", "wmic.exe"]
AND CommandLine CONTAINS ThreatIntelligence.SuspiciousCommands
Contextual Enrichment
Enrich hunting queries with contextual information from threat intelligence.
# Example: Search for activities associated with specific threat groups
ThreatIntelligence.AttributedGroup = "APT29"
AND (ProcessName CONTAINS "sophos" OR RegistryPath CONTAINS "security")
Data Quality and Normalization
Data Quality Challenges
Ensuring high-quality data is crucial for effective threat hunting. Poor data quality can lead to missed threats or false positives.
Data Inconsistency
Different systems may log the same event in different formats or with different field names.
Solutions:
- Implement data normalization rules
- Use standard field naming conventions
- Create data mapping tables
Missing Data
Some events may not be logged due to configuration issues or system limitations.
Solutions:
- Implement comprehensive logging policies
- Use multiple data sources for redundancy
- Monitor data collection health
Data Volume
Large volumes of data can overwhelm analysis capabilities and slow down hunting activities.
Solutions:
- Implement data filtering and aggregation
- Use tiered storage strategies
- Optimize query performance
Data Normalization Techniques
Field Standardization
Standardize field names and formats across all data sources.
# Standard field names
SourceIP, DestinationIP, SourcePort, DestinationPort
EventTime, EventType, UserName, ProcessName
Time Standardization
Convert all timestamps to a common timezone and format.
# UTC timestamp format
EventTime: "2024-01-15T14:30:25.123Z"
Value Normalization
Normalize values to consistent formats (e.g., lowercase, trimmed strings).
# Normalized values
ProcessName: "powershell.exe" (not "PowerShell.EXE")
UserName: "john.doe" (not "JOHN.DOE")
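Putting the three techniques together, normalization can be sketched as a single function: map source-specific field names to a standard schema, lowercase string values, and render timestamps as UTC ISO 8601. The field map below covering two hypothetical source formats is an illustrative assumption.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source-specific field names to the standard schema.
FIELD_MAP = {
    "src_ip": "SourceIP", "source_address": "SourceIP",
    "dst_ip": "DestinationIP", "dest_address": "DestinationIP",
    "image": "ProcessName", "process_name": "ProcessName",
    "user": "UserName", "account_name": "UserName",
}

def normalize_event(raw: dict, epoch_seconds: float) -> dict:
    """Return an event with standard field names, lowercased string values,
    and a UTC EventTime in ISO 8601 format with a trailing Z."""
    event = {}
    for key, value in raw.items():
        std_key = FIELD_MAP.get(key.lower())
        if std_key is None:
            continue  # drop fields we have no mapping for
        event[std_key] = value.strip().lower() if isinstance(value, str) else value
    event["EventTime"] = datetime.fromtimestamp(
        epoch_seconds, tz=timezone.utc
    ).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return event
```

Silently dropping unmapped fields keeps the schema clean; a production pipeline would usually preserve them in a raw/extra field instead so no evidence is lost.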
Hands-On Exercise
Exercise: Data Source Assessment and Planning
Objective: Assess available data sources and develop a data collection strategy for threat hunting.
Scenarios:
Scenario 1: Small Enterprise Environment
Situation: You're setting up threat hunting for a 100-employee company with basic security infrastructure.
Requirements:
- Identify available data sources
- Assess data quality and completeness
- Recommend data collection improvements
- Develop data normalization strategy
Scenario 2: Large Enterprise Environment
Situation: You're optimizing threat hunting for a 10,000-employee enterprise with comprehensive security tools.
Requirements:
- Map all available data sources
- Identify data gaps and redundancies
- Optimize data collection and storage
- Develop correlation strategies
Scenario 3: Cloud-First Environment
Situation: You're implementing threat hunting for a cloud-native organization using AWS, Azure, and SaaS applications.
Requirements:
- Identify cloud-specific data sources
- Assess cloud logging capabilities
- Plan data integration from multiple clouds
- Address cloud security and compliance
Deliverables:
- Data source inventory and assessment
- Data collection strategy document
- Data normalization and correlation plan
- Implementation roadmap with priorities