LLM Finetuning Toolkit

Getting Started

The LLM Finetuning Toolkit is a config-based CLI tool for launching a series of finetuning experiments and gathering their results. From a single YAML config file, you can define the following:



  • Bring your own dataset in any of json, csv, and huggingface formats
  • Define your own prompt format and inject desired columns into the prompt

Fine Tuning

  • Configure desired hyperparameters for quantization and LoRA fine-tuning


  • Intuitively define multiple hyperparameter settings to iterate through


  • Configure desired sampling algorithm and parameters


  • Test desired properties such as length and similarity against reference text

This documentation page is organized in the following sections:

  • Quick Start provides a quick overview of the toolkit and helps you get started running your own experiments
  • Configuration walks you through all the changes that can be made to customize your experiments
  • Developer Guides goes over how to extend each component for custom use-cases and for contributing to this toolkit
  • API Reference details the underlying modules of this toolkit

Installation

git clone https://github.com/georgian-io/LLM-Finetuning-Hub.git
cd LLM-Finetuning-Hub/

docker (recommended)

# build image
docker build -t llm-toolkit .
# launch container
docker run -it llm-toolkit              # with CPU
docker run -it --gpus all llm-toolkit   # with GPU

poetry (recommended)

The toolkit has everything you need to get started. This guide will walk you through the initial setup, explain the key components of the configuration, and offer advice on customizing your fine-tuning job. Let's dive in!

First, make sure you have read the installation guide above and installed all the dependencies. Then, to launch a LoRA fine-tuning job, run the following command in your terminal:

python3 toolkit.py

This command initiates the fine-tuning process using the settings specified in the default YAML configuration file config.yaml.

save_dir: "./experiment/"

ablation:
  use_ablate: false

# Data Ingestion -------------------
data:
  file_type: "huggingface" # one of 'json', 'csv', 'huggingface'
  path: "yahma/alpaca-cleaned"
  prompt:
    >- # prompt, make sure column inputs are enclosed in {} brackets and that they match your data
    Below is an instruction that describes a task.
    Write a response that appropriately completes the request.
    ### Instruction: {instruction}
    ### Input: {input}
    ### Output:
  prompt_stub:
    >- # Stub to add for training at the end of prompt, for test set or inference, this is omitted; make sure only one variable is present
    {output}
  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
  train_test_split_seed: 42

# Model Definition -------------------
model:
  hf_model_ckpt: "NousResearch/Llama-2-7b-hf"
  quantize: true
  bitsandbytes:
    load_in_4bit: true
    bnb_4bit_compute_dtype: "bf16"
    bnb_4bit_quant_type: "nf4"

# LoRA Params -------------------
lora:
  task_type: "CAUSAL_LM"
  r: 32
  lora_alpha: 16
  lora_dropout: 0.1
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - up_proj
    - down_proj
    - gate_proj

# Training -------------------
training:
  training_args:
    num_train_epochs: 5
    per_device_train_batch_size: 4
    optim: "paged_adamw_32bit"
    learning_rate: 2.0e-4
    bf16: true # Set to true for mixed precision training on newer GPUs
    tf32: true
  sft_args:
    max_seq_length: 1024

# Inference -------------------
inference:
  max_new_tokens: 1024
  do_sample: True
  top_p: 0.9
  temperature: 0.8
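
To make the prompt settings in the config concrete, here is a minimal Python sketch of how a template with {column} placeholders could be filled from one dataset row. It is an illustration under assumptions, not the toolkit's own formatting code: the function name build_prompt, the sample row, and the "{output}" stub value are all hypothetical.

```python
# Hypothetical illustration of filling a prompt template from one data row.
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request. "
    "### Instruction: {instruction} "
    "### Input: {input} "
    "### Output:"
)
PROMPT_STUB = "{output}"  # assumed stub: the single column holding the target text


def build_prompt(row: dict, include_stub: bool) -> str:
    """Fill the template; append the stub only for training examples."""
    text = PROMPT.format(**row)  # extra keys in row are ignored by str.format
    if include_stub:
        text += " " + PROMPT_STUB.format(**row)
    return text


row = {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
train_text = build_prompt(row, include_stub=True)   # prompt followed by the answer
test_text = build_prompt(row, include_stub=False)   # prompt only, as at inference
```

Every placeholder name in the template has to exist as a column in your data, otherwise str.format raises a KeyError, which is a quick way to catch a mismatch before launching a run.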


🎉 Congratulations! You've run your first fine-tuning job using this toolkit!

Artefact Outputs

This config runs fine-tuning and saves the artefacts under the directory ./experiment/[unique_hash]. Each unique configuration generates a unique hash, so the tool can automatically pick up where it left off. For example, if you need to stop training before it finishes, you can relaunch the script and the program will load the existing dataset that was generated in the directory, so you resume from there instead of starting over from the beginning.
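
The idea of one directory per unique configuration can be sketched in a few lines of Python. This is only an illustration: the function name config_hash, the choice of SHA-256 over a JSON serialization, and the 8-character length are assumptions, and the toolkit may derive its hash differently.

```python
import hashlib
import json


def config_hash(config: dict, length: int = 8) -> str:
    """Return a short, deterministic hash of a configuration dictionary."""
    # sort_keys makes the serialization independent of key order
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


cfg_a = {"lora": {"r": 32, "lora_alpha": 16}, "save_dir": "./experiment/"}
cfg_b = {"save_dir": "./experiment/", "lora": {"lora_alpha": 16, "r": 32}}  # same settings, reordered
cfg_c = {"lora": {"r": 8, "lora_alpha": 16}, "save_dir": "./experiment/"}   # different rank

# Identical settings map to the same directory, so a relaunch finds earlier artefacts.
run_dir = "./experiment/" + config_hash(cfg_a)
```

Because the hash depends only on the settings, changing any value (as in cfg_c) yields a new directory, while rerunning an unchanged config lands in the existing one.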

After the script finishes running, you will see these distinct artefacts:

/config/config.yml: copy of the config file used for this experiment

/dataset/dataset.pkl: generated pkl file in huggingface Dataset format

/model/*: model weights saved using huggingface format

/results/results.csv: csv of prompt, ground truth, and predicted values

/qa/qa.csv: csv of quality assurance unit tests (e.g. vector similarity between gold and predicted output)


Remember to carefully review the documentation and requirements of the new model you choose to ensure compatibility with your task and the toolkit.

Quality Assurance

Once the model is trained, it’s crucial to verify its readiness for production. We offer Quality Assurance testing specifically tailored for Language Model applications. This approach is distinct from conventional testing methods, as there’s currently no direct means of ensuring that a fine-tuned model meets enterprise standards. Moreover, developers have the flexibility to integrate their own tests into the process.

Generation Length

  • Function: LengthTest
  • Description: Determines the length of the summarized output and the input sentence. The output length is expected to exceed the input length, aligning with the specific use case.

POS Composition

  • Description: Analyzes the grammar of the generated output, focusing on:

Word Overlap

  • Function: WordOverLapTest
  • Description: Measures how many words the generated output shares with the input sentence.
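
As a rough illustration of a word-overlap check, the sketch below computes the percentage of distinct reference words that also appear in the generated text. The function name word_overlap and the whitespace tokenization are assumptions made for this example; the toolkit's WordOverLapTest may tokenize or normalize differently.

```python
def word_overlap(reference: str, generated: str) -> float:
    """Percentage of distinct reference words that also appear in the output."""
    ref_words = set(reference.lower().split())
    gen_words = set(generated.lower().split())
    if not ref_words:
        return 0.0  # nothing to compare against
    return 100.0 * len(ref_words & gen_words) / len(ref_words)


# Reference has 5 distinct words; "cat" and "sat" are shared, so the score is 40.0
score = word_overlap("the cat sat on the mat", "a cat sat down")
```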


Rouge Score

  • Function: RougeScore
  • Description: Computes the Rouge score for the output, providing insight into the quality of summarization.

Jaccard Similarity

  • Function: JaccardSimilarity
  • Description: Calculates similarity by encoding inputs and outputs.
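
For intuition, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to sets of lowercase words; the function name and the word-level "encoding" are assumptions for this example, and the toolkit's JaccardSimilarity may encode the texts differently.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the word sets of two texts."""
    set_a = set(text_a.lower().split())
    set_b = set(text_b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty texts
    return len(set_a & set_b) / len(union)


# Shared words: {"the", "cat"}; union has 4 words, so the similarity is 0.5
sim = jaccard_similarity("the cat sat", "the cat ran")
```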

Dot Product (Cosine) Similarity

  • Function: DotProductSimilarity
  • Description: Computes the dot product between the encoded inputs and outputs.
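
The relationship between a dot product and cosine similarity can be shown with a small, dependency-free sketch. Here bag-of-words counts stand in for the encoder; that substitution, along with the function names, is an assumption for illustration, since the toolkit presumably compares embeddings produced by a model.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Stand-in encoder: word counts instead of model embeddings."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a: Counter, vec_b: Counter) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Dot product is 2 and each norm is sqrt(3), so the similarity is 2/3
sim = cosine_similarity(bag_of_words("the cat sat"), bag_of_words("the cat ran"))
```

When both vectors are normalized to unit length beforehand, the dot product alone already equals the cosine similarity.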


The configuration file is the central piece that defines the behavior of the toolkit. It is written in YAML format and consists of several sections that control different aspects of the process, such as data ingestion, model definition, training, inference, and quality assurance.


The lora section configures the Low-Rank Adaptation (LoRA) settings. Supplied arguments are used to construct a peft LoraConfig object. It includes the following parameters:

  • task_type: Type of transformer architecture. Use CAUSAL_LM for decoder-only models and SEQ_2_SEQ_LM for encoder-decoder models.
  • r: The rank of the LoRA adaptation matrices.
  • lora_alpha: The scaling factor for the LoRA adaptation.
  • lora_dropout: The dropout probability for the LoRA layers.
  • target_modules: The list of module names to apply LoRA to.
  • fan_in_fan_out: Flag to indicate if the layer weights are stored in a (fan_in, fan_out) order.
  • modules_to_save: List of additional module names to save in the final checkpoint.
  • layers_to_transform: The list of layer indices to apply LoRA to.
  • layers_pattern: The regular expression pattern to match layer names for LoRA application.
lora:
  r: 32
  lora_alpha: 16
  lora_dropout: 0.1
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - up_proj
    - down_proj
    - gate_proj
  fan_in_fan_out: false
  modules_to_save: null
  layers_to_transform: null
  layers_pattern: null
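
To get a feel for what r and lora_alpha control, the sketch below works through the arithmetic of the standard LoRA formulation: the low-rank update is scaled by lora_alpha / r, and each adapted linear layer gains r * (in_features + out_features) trainable parameters. The helper names and the 4096 x 4096 layer shape are assumptions chosen for illustration.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    Matrix A has shape (r, in_features) and matrix B has shape (out_features, r).
    """
    return r * (in_features + out_features)


scale = lora_scaling(r=32, lora_alpha=16)        # 16 / 32 = 0.5
extra = lora_param_count(4096, 4096, r=32)       # 32 * 8192 = 262144 for an assumed 4096 x 4096 layer
```

Doubling r doubles the added parameters per layer and, with lora_alpha held fixed, halves the scaling factor, so the two values are usually tuned together.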

The fan_in_fan_out parameter is a boolean flag that indicates whether the weights of the layers being adapted are stored in a (fan_in, fan_out) order. This is important for correctly applying the LoRA adaptation.

  fan_in_fan_out: true

In this example, setting fan_in_fan_out to true indicates that the weights of the layers being adapted are stored in a (fan_in, fan_out) order. If the weights are stored in a different order, you should set this parameter to false.

The layers_to_transform parameter is used to specify the indices of the layers to which LoRA should be applied. This allows you to selectively apply LoRA to specific layers of the model.

  layers_to_transform: [2, 4, 6]

In this example, LoRA will be applied to the layers with indices 2, 4, and 6. The layer indices are zero-based, so the first layer has an index of 0, the second layer has an index of 1, and so on.

You can also specify a single layer index:

  layers_to_transform: 3

In this case, LoRA will be applied only to the layer with index 3.

The layers_pattern parameter allows you to specify a regular expression pattern to match the names of the layers to which LoRA should be applied. This provides a more flexible way to select layers based on their names.

  layers_pattern: 'transformer\.h\.\d+\.attn' # single quotes keep the backslashes literal in YAML

In this example, the regular expression pattern transformer\.h\.\d+\.attn will match the names of the attention layers in a transformer model. The pattern will match layer names like transformer.h.0.attn, transformer.h.1.attn, and so on.

You can adjust the regular expression pattern to match the specific layer names in your model.
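
Before putting a pattern into the config, it can help to check what it matches. The sketch below tests the example pattern against a few layer names with Python's re module; the layer names are made up for illustration, and the snippet only shows what the regular expression matches, not how the pattern is consumed downstream.

```python
import re

pattern = re.compile(r"transformer\.h\.\d+\.attn")

# Hypothetical layer names, in the style of a GPT-2-like model
layer_names = [
    "transformer.h.0.attn",
    "transformer.h.1.attn",
    "transformer.h.0.mlp",
    "transformer.wte",
]

# fullmatch requires the whole name to match, not just a prefix
matched = [name for name in layer_names if pattern.fullmatch(name)]
```

You could run the same check against your own model's module names (for a PyTorch model, the names yielded by named_modules()) to confirm the pattern selects what you expect.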


When use_ablate is set to true, the toolkit will generate multiple configurations by permuting the specified parameters. This allows you to easily compare different settings and their impact on the model's performance.
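
One way to picture this expansion is as a Cartesian product over the candidate values of each parameter. The sketch below is an assumption about the general idea, not the toolkit's implementation: the function name expand_ablation and the grid dictionary are hypothetical, and the actual config syntax for listing ablation values is not shown here.

```python
import itertools


def expand_ablation(grid: dict) -> list:
    """Return one flat settings dict per combination of candidate values."""
    keys = list(grid)
    combos = itertools.product(*(grid[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


# Hypothetical grid: two ranks x two dropout values x one learning rate
grid = {"r": [16, 32], "lora_dropout": [0.05, 0.1], "learning_rate": [2.0e-4]}
configs = expand_ablation(grid)  # 2 * 2 * 1 = 4 configurations
```

The number of runs grows multiplicatively with every parameter you vary, so it is worth keeping the candidate lists short.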

API Reference

API Reference for all the important modules

Main Classes



Ingestor

( path: str )

The Ingestor class is an abstract base class for data ingestors.

Parameters

  • path: str - The path of the dataset.

to_dataset() -> Dataset

An abstract method to be implemented by subclasses. Converts the input data to a Dataset object.

Returns

Dataset - The converted Dataset object.


JsonIngestor

( path: str )

The JsonIngestor class is a subclass of Ingestor for ingesting JSON data.

Parameters

  • path: str - The path of the JSON dataset.

to_dataset() -> Dataset

Converts the JSON data to a Dataset object.

Returns

Dataset - The converted Dataset object.


CsvIngestor

( path: str )

The CsvIngestor class is a subclass of Ingestor for ingesting CSV data.

Parameters

  • path: str - The path of the CSV dataset.

to_dataset() -> Dataset

Converts the CSV data to a Dataset object.

Returns

Dataset - The converted Dataset object.
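
The ingestor hierarchy described above follows a common abstract-base-class pattern. The sketch below mirrors that shape using only the standard library; it is not the toolkit's code. The Sketch-prefixed class names are hypothetical, and a plain list of row dictionaries stands in for the Hugging Face Dataset object that the real to_dataset() returns.

```python
import csv
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List

Rows = List[Dict[str, str]]  # stand-in for a Hugging Face Dataset


class SketchIngestor(ABC):
    def __init__(self, path: str) -> None:
        self.path = path

    @abstractmethod
    def to_dataset(self) -> Rows:
        """Subclasses convert the file at self.path into rows."""


class SketchJsonIngestor(SketchIngestor):
    def to_dataset(self) -> Rows:
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)  # assumes the file holds a JSON array of objects


class SketchCsvIngestor(SketchIngestor):
    def to_dataset(self) -> Rows:
        with open(self.path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))  # header row supplies the column names


# Write the same record in both formats and read it back through each ingestor
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "data.json")
    csv_path = os.path.join(tmp, "data.csv")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump([{"instruction": "Say hi", "output": "hi"}], f)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        f.write("instruction,output\nSay hi,hi\n")
    json_rows = SketchJsonIngestor(json_path).to_dataset()
    csv_rows = SketchCsvIngestor(csv_path).to_dataset()
```

Because every subclass exposes the same to_dataset() method, downstream code can work with any file type without knowing which ingestor produced the data.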


DatasetGenerator

( file_type: str, path: str, prompt: str, prompt_stub: str, test_size: Union[float, int], train_size: Union[float, int], train_test_split_seed: int )

The DatasetGenerator class is responsible for generating and formatting datasets for training and testing.

Parameters

  • file_type: str - The type of input file ("json", "csv", or "huggingface").
  • path: str - The path to the input file or HuggingFace dataset.
  • prompt: str - The prompt template for formatting the dataset.
  • prompt_stub: str - The prompt stub used during training.
  • test_size: Union[float, int] - The size of the test set.
  • train_size: Union[float, int] - The size of the training set.
  • train_test_split_seed: int - The random seed for splitting the dataset.
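
Since test_size and train_size accept either a float (a proportion of the data) or an integer (an absolute number of samples), a small sketch may help clarify the two readings. The helper name resolve_split_size and its validation are assumptions for illustration; the toolkit may delegate this logic to its underlying dataset library.

```python
from typing import Union


def resolve_split_size(size: Union[float, int], total: int) -> int:
    """Turn a proportion (float) or a sample count (int) into a sample count."""
    if isinstance(size, int):
        count = size                      # integer: already a number of samples
    else:
        count = int(round(size * total))  # float: proportion of the whole dataset
    if not 0 <= count <= total:
        raise ValueError(f"split size {size!r} is out of range for {total} samples")
    return count


n_test = resolve_split_size(0.1, 1000)    # 100 samples
n_train = resolve_split_size(0.9, 1000)   # 900 samples
n_fixed = resolve_split_size(50, 1000)    # 50 samples
```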

(  )

Generates and returns the formatted train and test datasets.

Returns

A tuple containing the train and test datasets.

( save_dir: str )

Saves the generated dataset to the specified directory.

Parameters

  • save_dir: str - The directory to save the dataset.

( save_dir: str )

Loads the dataset from a pickle file in the specified directory.

Parameters

  • save_dir: str - The directory containing the dataset pickle file.

Returns

A tuple containing the loaded train and test datasets.
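
The save and load steps described above amount to a pickle round trip of the (train, test) pair. The sketch below shows that idea with the standard library only; the function names save_split and load_split are hypothetical, plain lists stand in for the real datasets, and the dataset.pkl file name is taken from the artefact listing earlier on this page.

```python
import os
import pickle
import tempfile


def save_split(train, test, save_dir: str) -> str:
    """Pickle the (train, test) pair to <save_dir>/dataset.pkl and return the path."""
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, "dataset.pkl")
    with open(path, "wb") as f:
        pickle.dump((train, test), f)
    return path


def load_split(save_dir: str):
    """Load the (train, test) pair back from <save_dir>/dataset.pkl."""
    with open(os.path.join(save_dir, "dataset.pkl"), "rb") as f:
        return pickle.load(f)


train_rows = [{"text": "example 1"}, {"text": "example 2"}]
test_rows = [{"text": "example 3"}]

with tempfile.TemporaryDirectory() as tmp:
    save_split(train_rows, test_rows, os.path.join(tmp, "dataset"))
    loaded_train, loaded_test = load_split(os.path.join(tmp, "dataset"))
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.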