At this year’s Black Hat USA conference, Sophos Senior Data Scientists Ben Gelman and Sean Bergeron will give a talk on their research into command line anomaly detection – examining how large language models (LLMs) and classical anomaly detection can be synergistically combined to identify critical data for augmenting dedicated command line classifiers.
Anomaly detection in cybersecurity has long promised the ability to identify threats by highlighting deviations from expected behavior. For classifying malicious command lines, however, its practical application often results in high false positive rates, making it expensive and inefficient. But that’s not the whole story when it comes to command line anomaly detection; recent innovations in AI provide a new angle for researchers to explore.
In their talk, Ben and Sean will explore this topic by presenting a pipeline they developed that does not depend on anomaly detection as a point of failure. Rather than acting on anomalies directly, the pipeline uses anomaly detection to feed a separate process, which avoids the potentially catastrophic false positive rates of an unsupervised method. The output is used instead to improve a supervised model built for classification.
Unexpectedly, the success of their method did not depend on anomaly detection locating malicious command lines. They gained a valuable insight: anomaly detection, when paired with LLM-based labeling, yields a remarkably diverse set of benign command lines. Leveraging this benign data when training command line classifiers significantly reduces false positive rates. Furthermore, it allows researchers and defenders to use plentiful existing data without relying on the needles in the haystack: the rare malicious command lines in production data.
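To make the idea concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not Ben and Sean's implementation, whose details will be covered in the talk and follow-up article. The rarity score based on character n-grams, the `llm_label` function (a toy keyword rule standing in for a real LLM labeling call), and all other names are assumptions made purely for illustration.

```python
import math
from collections import Counter


def char_ngrams(cmd, n=3):
    """Split a command line into lowercase character n-grams."""
    padded = f" {cmd.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def anomaly_scores(commands, n=3):
    """Assumed anomaly score: mean negative log-frequency of a command's n-grams.

    Commands built from n-grams that are rare in the corpus score higher.
    """
    counts = Counter(g for c in commands for g in char_ngrams(c, n))
    total = sum(counts.values())
    scores = []
    for c in commands:
        grams = char_ngrams(c, n)
        if not grams:
            scores.append(0.0)
            continue
        scores.append(sum(-math.log(counts[g] / total) for g in grams) / len(grams))
    return scores


def top_anomalies(commands, k):
    """Return the k highest-scoring (most unusual) command lines."""
    ranked = sorted(zip(anomaly_scores(commands), commands), reverse=True)
    return [cmd for _, cmd in ranked[:k]]


def llm_label(cmd):
    """Hypothetical stand-in for an LLM labeling call (toy keyword rule)."""
    suspicious = ("-enc", "downloadstring", "mimikatz")
    return "malicious" if any(s in cmd.lower() for s in suspicious) else "benign"


def augment_training_set(train, telemetry, k):
    """Add anomalous-but-benign command lines to the training data with label 0."""
    benign = [c for c in top_anomalies(telemetry, k) if llm_label(c) == "benign"]
    return train + [(c, 0) for c in benign]


if __name__ == "__main__":
    telemetry = ["ping 127.0.0.1"] * 20 + [
        r"robocopy C:\data D:\backup /mir",
        "powershell -enc AAAA",
    ]
    for cmd, label in augment_training_set([], telemetry, k=2):
        print(label, cmd)
```

The point of the sketch is the data flow: anomaly detection only selects unusual candidates, the labeling step decides which of them are benign, and those benign examples are what get added to the supervised classifier's training set.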
Ben and Sean will share the methodology and results of their experiment, highlighting how diverse benign data identified through anomaly detection broadens the classifier’s understanding and contributes to a more resilient detection system. By shifting the focus from finding malicious anomalies to harnessing benign diversity, they propose a potential paradigm shift in command line classification strategy – one that can be implemented in detection systems at large scale and low cost.
Ben and Sean will present their talk at the Black Hat USA conference in Las Vegas, Nevada on Thursday 7 August at 1.30pm PDT. A more detailed article on their research will be published following the presentation.