This repository contains the code used in our exploratory study comparing LoRA and ReFT for log anomaly detection.
## Requirements

- Python 3.11
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure you have the dataset from Le et al. in the `logs_dataset` directory. Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev. Raw and preprocessed datasets (including parsed logs and their embeddings) are available on Zenodo.
- Download Llama3 from https://github.com/meta-llama/llama3 (refer to its Download section). Convert the weights to the Hugging Face format using the conversion script linked at https://huggingface.co/docs/transformers/en/model_doc/llama3. Get the 8B model and store the converted weights in the `llama3HF` folder.
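A minimal conversion sketch is shown below. It assumes the conversion script that ships with the `transformers` repository; the exact flag names can vary across `transformers` versions, so verify them against the linked documentation.

```bash
# Hedged sketch: convert the downloaded Meta-Llama-3-8B weights to the
# Hugging Face format expected in the llama3HF folder. The script path and
# flags follow the transformers repository; verify against your version.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir Meta-Llama-3-8B \
    --model_size 8B \
    --llama_version 3 \
    --output_dir llama3HF
```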
## Process Dataset

- Navigate to the Data Loader Script
  - Go to `data_process_logs/data_loader.py`.
- Modify the Dataset Setting
  - Change the line `dataset = "BGL"` to one of the following options: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- Adjust Settings (if required)
  - You can modify the following settings as needed: `window_size` (default `50`), `step_size` (default `50`), `train_size` (default `0.8`), and `is_test_train_ratio` (default `False`).
- Evaluate Train Ratio
  - To evaluate the train ratio, set `is_test_train_ratio` to `True` and adjust `train_size`. The experimental settings are `0.1`, `0.2`, `0.3`, `0.4`, `0.5`, `0.6`, `0.7`, and `0.8`.
- Run the Data Loader Script
  - Execute the following command in your terminal:

    ```bash
    python data_process/data_loader.py
    ```
Here is an example configuration for `data_loader.py`:

```python
dataset = "HDFS"
window_size = 50
step_size = 50
train_size = 0.8
is_test_train_ratio = False
```
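To sweep the train ratio (Evaluate Train Ratio above), one option is to rewrite the settings in place and rerun the loader for each value. The loop below is a hypothetical automation; it assumes the settings are plain module-level assignments in `data_loader.py`, as in the example configuration above.

```bash
# Hypothetical sweep over the experimental train ratios; assumes
# train_size and is_test_train_ratio are module-level assignments.
sed -i 's/^is_test_train_ratio = .*/is_test_train_ratio = True/' data_process/data_loader.py
for ratio in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8; do
    sed -i "s/^train_size = .*/train_size = ${ratio}/" data_process/data_loader.py
    python data_process/data_loader.py
done
```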
## Main Results and Hyperparameters

- Optionally set `--max_n_train_example` and `--max_n_eval_example` to limit the sample size.
- To adjust the rank, set `-r` to the desired rank.
- To adjust the intervention position, set `-p` to the desired position. Options include `fx`, `lx`, and `fx+lx`, where x is replaced with a number. E.g., `f1` means the first input token, while `l1` means the last input token. A combined example is sketched after this list.
- Other hyperparameters can be adjusted as desired.
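The sketch below shows how these flags might be combined in one invocation. The entrypoint name `train.py` and the flag values are illustrative assumptions; the shipped scripts in `./scripts/main_results/` are the authoritative invocations.

```bash
# Hypothetical invocation illustrating the hyperparameter flags above;
# check the shipped scripts for the real entrypoint and defaults.
python train.py \
    --max_n_train_example 10000 \
    --max_n_eval_example 2000 \
    -r 4 \
    -p f1+l1
```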
- Run the Script for Llama3-ReFT
  - Execute the following script:

    ```bash
    ./scripts/main_results/llama3_reft.sh
    ```

  - The settings for all datasets are as per the script; set `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
  - The epoch count `-e` is 3 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`. A sketch of these edits follows below.
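A hypothetical excerpt of such a script is shown below; the actual layout of `scripts/main_results/llama3_reft.sh` may differ, and the entrypoint name and dataset paths are assumptions based on the flags described in this README. The same editing pattern applies to the RoBERTa and GPT2 scripts that follow.

```bash
# Hypothetical excerpt; edit DATASET (and -e where the README lists
# per-dataset epochs) before running.
DATASET="HDFS"          # one of "BGL", "HDFS", "Spirit", "Thunderbird"

python train.py \
    -train_dataset ./logs_dataset/${DATASET}/train.pkl \
    -eval_dataset ./logs_dataset/${DATASET}/test.pkl \
    -e 3
```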
- Run the Script for RoBERTa-ReFT
  - Execute the following script:

    ```bash
    ./scripts/main_results/roberta_reft.sh
    ```

  - The settings for all datasets are as per the script; set `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
  - The epoch count `-e` is 6, 3, 3, and 6 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`, respectively.
- Run the Script for GPT2-ReFT
  - Execute the following script:

    ```bash
    ./scripts/main_results/gpt2_reft.sh
    ```

  - The settings for all datasets are as per the script; set `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
  - The epoch count `-e` is 6 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
- Run the Script for Llama3-LoRA
  - Execute the following script:

    ```bash
    ./scripts/main_results/llama3_lora.sh
    ```

  - The settings for all datasets are as per the script; set `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
  - The epoch count `-e` is 3 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
- Run the Script for RoBERTa-LoRA
  - Execute the following script:

    ```bash
    ./scripts/main_results/roberta_lora.sh
    ```

  - The settings for all datasets are as per the script; set `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
  - The epoch count `-e` is 3, 3, 3, and 9 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`, respectively.
- Run the Script for GPT2-LoRA
  - Execute the following script:

    ```bash
    ./scripts/main_results/gpt2_lora.sh
    ```

  - The settings for all datasets are as per the script; set `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
  - The epoch count `-e` is 3, 6, 6, and 6 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`, respectively.
## Train Ratio

First, generate all the necessary datasets (refer to Process Dataset above). Then run the scripts as in Main Results and Hyperparameters, with the following modifications:

- Add the constant `TRAIN_RATIO=0.1`.
- Edit the dataset flags as follows: `-train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl` and `-eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl`. Note the addition of the `TRAIN_RATIO` constant; see the sketch after this list.
- Adjust `TRAIN_RATIO` according to the datasets generated, e.g., `0.1` to `0.7` in `0.1` increments.
- Examples for Llama3-ReFT and Llama3-LoRA are given in `/scripts/train_ratio`.
- The epoch count used is 3 for all experiments; the other settings are kept the same.
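A hedged sketch of the modified invocation (the entrypoint name is assumed, as before; `/scripts/train_ratio` contains the actual examples):

```bash
# Hypothetical train-ratio excerpt; cf. /scripts/train_ratio for the
# real Llama3-ReFT and Llama3-LoRA versions.
DATASET="HDFS"
TRAIN_RATIO=0.1

python train.py \
    -train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl \
    -eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl \
    -e 3
```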
## Unstable Logs

- Run the Script for Llama3-ReFT
  - Execute the following script:

    ```bash
    ./scripts/unstable_logs/llama3_reft.sh
    ```

  - The settings for all datasets are as per the script; set `INJECTION_RATIO` to one of `0.01`, `0.02`, `0.03`, `0.05`, `0.1`, `0.2`, `0.3`.
- Run the Script for Llama3-LoRA
  - Execute the following script:

    ```bash
    ./scripts/unstable_logs/llama3_lora.sh
    ```

  - The settings for all datasets are as per the script; set `INJECTION_RATIO` to one of `0.01`, `0.02`, `0.03`, `0.05`, `0.1`, `0.2`, `0.3`. A sweep sketch is given below.
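To run every injection ratio back to back, a hypothetical automation is sketched below; it assumes `INJECTION_RATIO` is a constant defined at the top of the script.

```bash
# Hypothetical sweep; assumes the script defines INJECTION_RATIO=<value>
# near the top. The same loop works for llama3_lora.sh.
for ratio in 0.01 0.02 0.03 0.05 0.1 0.2 0.3; do
    sed -i "s/^INJECTION_RATIO=.*/INJECTION_RATIO=${ratio}/" ./scripts/unstable_logs/llama3_reft.sh
    ./scripts/unstable_logs/llama3_reft.sh
done
```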
## Zero Shot

- First, set the training dataset by changing `DATASET_TRAIN` to one of `"BGL"`, `"HDFS"`, `"Spirit"`, `"Thunderbird"`. Also remove `-do_eval` and ensure `-save_model` is present.
- Start training by running one of the scripts:

  ```bash
  ./scripts/zero_shot/llama3_reft.sh
  ```

  or

  ```bash
  ./scripts/zero_shot/llama3_lora.sh
  ```

- Once the model is finetuned, locate the model directory in `results`. You should see the directory in the logs.
- Add `-my_model ${NAME_OF_MODEL}`, for example: `-my_model ./results/REFT_HDFS_llama3HF_20240831080505129560`.
- Remove `-do_train` and add `-do_eval`. Also update `DATASET_TEST` to the desired dataset to test on, one of `"BGL"`, `"HDFS"`, `"Spirit"`, `"Thunderbird"`. A sketch of the two phases follows.
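The two phases might look like the sketch below. The entrypoint name and flag layout are assumptions based on the flags named in this README, and the timestamped model directory is the example from above; yours will differ.

```bash
# Phase 1 (train): hypothetical excerpt of scripts/zero_shot/llama3_reft.sh.
DATASET_TRAIN="HDFS"
python train.py \
    -train_dataset ./logs_dataset/${DATASET_TRAIN}/train.pkl \
    -do_train \
    -save_model

# Phase 2 (zero-shot evaluation): -do_train removed, -do_eval added,
# -my_model pointing at the directory saved in phase 1.
DATASET_TEST="BGL"
python train.py \
    -eval_dataset ./logs_dataset/${DATASET_TEST}/test.pkl \
    -do_eval \
    -my_model ./results/REFT_HDFS_llama3HF_20240831080505129560
```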
## Other Methods

The other methods are contained in the directory `/other_methods`. I cloned each repository directly from its source and updated some of the code so that it runs on the common dataset in `logs_dataset`.

### LogADEmpirical

Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev

Run:

```bash
python ./other_methods/LogADEmpirical/main_run.py --config_file=<config_file>
# where <config_file> is the path to the configuration file, e.g.:
# python ./other_methods/LogADEmpirical/main_run.py --config_file=./config/other_methods/LogADEmpirical/HDFS/cnn.yaml
```
### logbert

Source: https://github.com/HelenGuohx/logbert

1. Navigate to `/other_methods/logbert/`.
2. Navigate to a dataset folder, e.g., `BGL`, `HDFS`, `Tbird`, or `Spirit`.
3. Run:

   ```bash
   bash init.sh
   ```

4. Navigate back to `logbert`.
5. Copy `train.pkl` and `test.pkl` from `logs_dataset` of the respective dataset to the `output/${DATASET}` folder.
6. Run:

   ```bash
   bash running_script_${DATASET}.sh
   ```

   `DATASET` is one of `bgl`, `hdfs`, `spirit`, `tbird`. An end-to-end example follows.
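Putting the steps together for one dataset, a hypothetical end-to-end run (the relative paths are assumptions inferred from the steps above; check them against the actual repository layout):

```bash
# Hypothetical end-to-end logbert run for BGL; paths assumed from the
# steps above.
cd other_methods/logbert
(cd BGL && bash init.sh)
mkdir -p output/bgl
cp ../../logs_dataset/BGL/train.pkl output/bgl/
cp ../../logs_dataset/BGL/test.pkl output/bgl/
bash running_script_bgl.sh
```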