Transformer-Based Anomaly Detection Technology for Network Traffic Monitoring

Identify and analyze anomalies in network traffic by applying unsupervised, transformer-based sequence modeling to Zeek logs.
Technology No. CW-25-19

Summary

This software analyzes Zeek logs (derived from packet captures) to surface anomalous device and protocol activity. It converts heterogeneous network events into token sequences that a transformer model can process, then produces an anomaly score per sequence and flags sequences whose score exceeds a configurable threshold. Before sequence modeling, the pipeline uses Gower distance and DBSCAN to form feature representations, enabling unsupervised detection that can be reproduced across environments. To our knowledge, this is the first approach to tokenize device and protocol logs as inputs to a transformer for anomaly detection without labeled data. The code is released publicly to support reproducible research and community extension.
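
The event-to-sequence step can be illustrated with a minimal Python sketch. The field names (ts, id.orig_h, proto, service, conn_state) follow Zeek's conn.log; the sample records, the token scheme, and the helper names event_token, build_sequences, and build_vocab are assumptions made for illustration and are not taken from the released code.

```python
from collections import defaultdict

# Hypothetical, hand-written records that reuse Zeek conn.log field names.
records = [
    {"ts": 3.0, "id.orig_h": "10.0.0.5", "proto": "tcp", "service": "http", "conn_state": "SF"},
    {"ts": 1.0, "id.orig_h": "10.0.0.5", "proto": "udp", "service": "dns", "conn_state": "SF"},
    {"ts": 2.0, "id.orig_h": "10.0.0.9", "proto": "tcp", "service": "-", "conn_state": "S0"},
]

def event_token(rec):
    """Assumed token scheme: protocol, service, and connection state joined by '|'."""
    return "|".join(rec.get(k, "-") or "-" for k in ("proto", "service", "conn_state"))

def build_sequences(recs):
    """Group events by originating host and order each group by timestamp."""
    by_device = defaultdict(list)
    for rec in sorted(recs, key=lambda r: r["ts"]):
        by_device[rec["id.orig_h"]].append(event_token(rec))
    return dict(by_device)

def build_vocab(sequences):
    """Map each distinct token to an integer id; 0 is left free for padding."""
    tokens = sorted({tok for seq in sequences.values() for tok in seq})
    return {tok: i + 1 for i, tok in enumerate(tokens)}

sequences = build_sequences(records)
vocab = build_vocab(sequences)
# Integer-encoded sequences are the form a transformer embedding layer would consume.
encoded = {dev: [vocab[t] for t in seq] for dev, seq in sequences.items()}
print(encoded)
```

Grouping by originating host is only one possible choice of sequence key; a real deployment might instead group by connection pair, protocol, or time window.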

Solution

  • Input: Zeek logs produced from packet captures (PCAP → Zeek → logs).
  • Feature construction: Compute Gower distance for mixed-type similarity, then apply DBSCAN to those distances to identify cluster structure and derive features.
  • Sequence modeling: Tokenize device/protocol events and feed sequences to a transformer that outputs per-sequence anomaly scores.
  • Decisioning: A user-settable threshold converts scores to anomaly/not-anomaly labels for triage.
  • Operational use: Cyber analysts review high-scoring sequences; ML data scientists adapt and re-train on local network data.
  • Reproducibility & openness: Public codebase designed for repeatable experiments and extension.
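
The feature-construction step in the list above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the released implementation: the sample rows, the choice of features, the eps and min_samples values, and the function names gower_matrix and dbscan are all hypothetical. Gower distance here averages range-normalized absolute differences for numeric fields and simple mismatch indicators for categorical fields.

```python
def gower_matrix(rows, numeric_keys, categorical_keys):
    """Pairwise Gower distances: mean of per-feature dissimilarities in [0, 1]."""
    ranges = {}
    for k in numeric_keys:
        vals = [r[k] for r in rows]
        ranges[k] = (max(vals) - min(vals)) or 1.0  # guard against constant features
    n, n_feat = len(rows), len(numeric_keys) + len(categorical_keys)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = sum(abs(rows[i][k] - rows[j][k]) / ranges[k] for k in numeric_keys)
            total += sum(rows[i][k] != rows[j][k] for k in categorical_keys)
            dist[i][j] = dist[j][i] = total / n_feat
    return dist

def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN over a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    # Each point counts as its own neighbor, as in the usual DBSCAN definition.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

# Hypothetical connection summaries mixing numeric and categorical fields.
rows = [
    {"duration": 0.1, "orig_bytes": 100, "proto": "tcp", "service": "http"},
    {"duration": 0.2, "orig_bytes": 120, "proto": "tcp", "service": "http"},
    {"duration": 0.15, "orig_bytes": 110, "proto": "tcp", "service": "http"},
    {"duration": 9.0, "orig_bytes": 50000, "proto": "udp", "service": "dns"},
]
dist = gower_matrix(rows, ["duration", "orig_bytes"], ["proto", "service"])
labels = dbscan(dist, eps=0.1, min_samples=2)
print(labels)
```

On larger data, a library implementation is the more practical choice; scikit-learn's DBSCAN, for example, accepts a precomputed distance matrix via metric="precomputed". How cluster labels are then turned into features for the transformer is not specified here and would depend on the released code.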

Key Advantages

  • No labels required: Unsupervised pipeline can surface novel and rare behaviors.
  • Sequence-aware: Transformer captures temporal/contextual patterns across events, not just point anomalies.
  • Mixed-data ready: Gower distance handles numeric, categorical, and binary fields without lossy encodings.
  • Noise-robust clustering: DBSCAN can separate dense normal behavior from outliers without pre-set cluster counts.
  • Analyst-controlled thresholds: Adjustable cutoffs let teams tune precision/recall to their environment.
  • Reproducible & extensible: Public release enables peer validation and rapid iteration.
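
The analyst-controlled threshold can be illustrated with a short Python sketch. The scores, sequence identifiers, candidate cutoffs, and the function name flag_anomalies are hypothetical; in practice the scores would come from the transformer model.

```python
# Hypothetical per-sequence anomaly scores; real scores would come from the model.
scores = {"dev-a": 0.12, "dev-b": 0.31, "dev-c": 0.58, "dev-d": 0.93}

def flag_anomalies(scores, threshold):
    """Return (sequence_id, score) pairs strictly above the threshold, highest first."""
    flagged = [(sid, s) for sid, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Sweep a few candidate cutoffs to see how the analyst workload changes.
counts = {t: len(flag_anomalies(scores, t)) for t in (0.25, 0.5, 0.75)}
for t, n in counts.items():
    print(f"threshold={t}: {n} sequence(s) flagged")
```

Lowering the cutoff sends more sequences to review, which tends to raise recall at the cost of precision; raising it does the opposite. Sorting by score lets analysts work the highest-scoring sequences first.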

Market Applications

  • Security Operations Centers (SOCs): Prioritize investigations by routing high-scoring sequences to analysts.
  • MSSPs & IR teams: Run against customer networks to spot unusual activity during monitoring or incident response.
  • Enterprise & Government networks: Continuous anomaly screening alongside existing IDS/IPS and SIEM workflows.
  • Research & academia: Benchmark unsupervised sequence models on Zeek-based datasets and compare configurations.
  • ML engineering teams: Integrate the pipeline into data platforms for ongoing model improvement.

Access

This software is open source and available at no cost on GitHub: https://github.com/IdahoLabUnsupported/sequential-network-anomaly-detection

Authors

  • Anna T Quach
  • Dempsey D Rogers

Supporting Documents

  • Product brochure: Transformer-Based Anomaly Detection Technology for Network Traffic Monitoring.pdf