Clinton Cao - Projects

Automated Test-Case Generation for REST APIs Using Model Inference Search Heuristic

2025

The rising popularity of the microservice architectural style has led to a growing demand for automated testing approaches tailored to these systems. EvoMaster is a state-of-the-art tool that uses Evolutionary Algorithms (EAs) to automatically generate test cases for microservices' REST APIs. One limitation of these EAs is the use of unit-level search heuristics, such as branch distances, which focus on fine-grained code coverage and may not effectively capture the complex, interconnected behaviors characteristic of system-level testing. To address this limitation, we propose a new search heuristic (MISH) that uses real-time automaton learning to guide the test case generation process. We capture the sequential call patterns exhibited by a test case by learning an automaton from the stream of log events outputted by different microservices within the same system. Therefore, MISH learns a representation of the systemwide behavior, allowing us to define the fitness of a test case based on the path it traverses within the inferred automaton. We empirically evaluate MISH's effectiveness on six real-world benchmark microservice applications and compare it against a state-of-the-art technique, MOSA, for testing REST APIs. Our evaluation shows promising results for using MISH to guide the automated test case generation within EvoMaster.

CATMA: Conformance Analysis Tool for Microservice Applications

2024

The microservice architecture allows developers to divide the core functionality of their software system into multiple smaller services. However, this architectural style also makes it harder for them to debug and assess whether the system's deployment conforms to its implementation. We present CATMA, an automated tool that detects non-conformances between the system's deployment and implementation. It automatically visualizes and generates potential interpretations for the detected discrepancies. Our evaluation of CATMA shows promising results in terms of performance and providing useful insights. CATMA is available at https://cyber-analytics.nl/catma.github.io/, and a demonstration video is available at https://youtu.be/WKP1hG-TDKc.

State Frequency Estimation for Anomaly Detection

2024

Many works have studied the efficacy of state machines for detecting anomalies within NetFlows. These works typically learn a model from unlabeled data and compute anomaly scores for arbitrary traces based on their likelihood of occurrence or how well they fit within the model. However, these methods do not dynamically adapt their scores based on the traces seen at test time. This becomes a problem when an adversary produces seemingly common traces in their attack, causing the model to miss the detection by assigning low anomaly scores. We propose SEQUENT, a new approach that uses the state visit frequency to adapt its scoring for anomaly detection dynamically. SEQUENT subsequently uses the scores to generate root causes for anomalies. These allow the grouping of alarms and simplify the analysis of anomalies. Our evaluation of SEQUENT on three NetFlow datasets indicates that our approach outperforms existing methods, demonstrating its effectiveness in detecting anomalies.

SoK: Explainable Machine Learning for Computer Security Applications

2023

Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine learning (ML) pipelines. We systematize the increasingly growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, who utilize XAI for 4 distinct objectives within an ML pipeline, namely 1) XAI-enabled user assistance, 2) XAI-enabled model verification, 3) explanation verification & robustness, and 4) offensive use of explanations. Our analysis of the literature indicates that many of the XAI applications are designed with little understanding of how they might be integrated into analyst workflows – user studies for explanation evaluation are conducted in only 14% of the cases. The security literature sometimes also fails to disentangle the role of the various stakeholders, e.g., by providing explanations to model users and designers while also exposing them to adversaries. Additionally, the role of model designers is particularly minimized in the security literature. To this end, we present an illustrative tutorial for model designers, demonstrating how XAI can help with model verification. We also discuss scenarios where interpretability by design may be a better alternative. The systematization and the tutorial enable us to challenge several assumptions, and present open problems that can help shape the future of XAI research within cybersecurity.

Learning State Machines to Monitor and Detect Anomalies on a Kubernetes Cluster

2022

These days more companies are shifting towards using cloud environments to provide their services to their client. While it is easy to set up a cloud environment, it is equally important to monitor the system’s runtime behaviour and identify anomalous behaviours that occur during its operation. In recent years, the utilisation of Recurrent Neural Networks (RNNs) and Deep Neural Networks (DNNs) to detect anomalies that might occur during runtime has been a trending approach. However, it is unclear how to explain the decisions made by these networks and how these networks should be interpreted to understand the runtime behaviour that they model. On the contrary, state machine models provide an easier manner to interpret and understand the behaviour that they model. In this work, we propose an approach that learns state machine models to model the runtime behaviour of a cloud environment that runs multiple microservice applications. To the best of our knowledge, this is the first work that tries to apply state machine models to microservice architectures. The state machine model is used to detect the different types of attacks that we launch on the cloud environment. From our experiment results, our approach can detect the attacks very well, achieving a balanced accuracy of 99.2% and a F1 score of 0.982.

ENCODE: Encoding NetFlows for Network Anomaly Detection

2022

NetFlow data is a popular network log format utilized by many network analysts and researchers. Compared to deep packet inspection, NetFlow is easier to collect, easier to process, and less intrusive to privacy. Numerous studies have used machine learning (ML) to detect network attacks using NetFlow data. Typically, the initial step in these ML pipelines is to preprocess the data before passing it to the ML algorithm. Several approaches exist to preprocess NetFlow data; however, these often apply existing methods to the data, not considering the specific properties of network data. We argue that for data originating from software systems, such as NetFlow or software logs, similarities in frequency and contexts of feature values are more important than similarities in the value itself. In this work, we propose an encoding algorithm (ENCODE) that directly takes the frequency and context of the feature values into account when the data is being processed. Our algorithm clusters similar types of network behavior based on the computed frequencies and context of the feature values, aiding the process of detecting anomalies within the network. We empirically evaluate the effectiveness of our encoding on three public NetFlow datasets. We encode each dataset using our algorithm and train several ML models for network anomaly detection using the encoded data. Our evaluation reveals that ML models benefit from using our encoding for network anomaly detection.

UAV: Warnings from Multiple Automated Static Analysis Tools at a Glance

2017

Automated Static Analysis Tools (ASATs) are an integral part of today’s software quality assurance practices. At present, a plethora of ASATs exist, each with different strengths. However, there is little guidance for developers on which of these ASATs to choose and combine for a project. As a result, many projects still only employ one ASAT with practically no customization. With UAV, the Unified ASAT Visualizer, we created an intuitive visualization that enables developers, researchers, and tool creators to compare the complementary strengths and overlaps of different Java ASATs. UAV’s enriched treemap and source code views provide its users with a seamless exploration of the warning distribution from a high-level overview down to the source code. We have evaluated our UAV prototype in a user study with ten second-year Computer Science (CS) students, a visualization expert and tested it on large Java repositories with several thousands of PMD, FindBugs, and Checkstyle warnings. Project Website: https://clintoncao.github.io/uav/