Getting Started
LLM Fine-tuning Toolkit is a config-based CLI tool for launching a series of fine-tuning experiments and gathering their results. From a single YAML config file, users can define the following:
Data
- Bring your own dataset in any of json, csv, and huggingface formats.
- Define prompt format and inject desired columns into the prompt.
Fine-tuning
- Configure desired hyperparameters for quantization and LoRA fine-tuning.
Ablation
- Define multiple hyperparameter settings to iterate through.
Inference
- Configure desired sampling algorithm and parameters.
Testing
- Test desired properties such as length and similarity against reference text.
Content
This documentation page is organized in the following sections:
- Quick Start provides a quick overview of the toolkit and helps you get started running your own experiments.
- Configuration walks through all the changes that can be made to customize the experiments.
- Developer Guides goes over how to extend each component for custom use-cases and for contributing to this toolkit.
- API Reference details the underlying modules of this toolkit.
Installation
Clone Repository
git clone https://github.com/georgian-io/LLM-Finetuning-Hub.git
cd LLM-Finetuning-Hub/
Install CLI
# build image
docker build -t llm-toolkit .

# launch container
docker run -it llm-toolkit            # with CPU
docker run -it --gpus all llm-toolkit # with GPU
Running the Toolkit
This guide is intended to walk through the initial setup, explain the key components of the configuration, and offer advice on customizing the fine-tuning job.
First, make sure you have read the installation guide above and installed all the dependencies. Then, to launch a LoRA fine-tuning job, run the following command in your terminal:
python3 toolkit.py
This command initiates the fine-tuning process using the settings specified in the default YAML configuration file config.yaml.
save_dir: "./experiment/" ablation: use_ablate: false # Data Ingestion ------------------- data: file_type: "huggingface" # one of 'json', 'csv', 'huggingface' path: "yahma/alpaca-cleaned" prompt: >- # prompt, make sure column inputs are enclosed in {} brackets and that they match your data Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {instruction} ### Input: {input} ### Output: prompt_stub: >- # Stub to add for training at the end of prompt, for test set or inference, this is omitted; make sure only one variable is present {output} test_size: 0.1 # Proportion of test as % of total; if integer then # of samples train_size: 0.9 # Proportion of train as % of total; if integer then # of samples train_test_split_seed: 42 # Model Definition ------------------- model: hf_model_ckpt: "NousResearch/Llama-2-7b-hf" quantize: true bitsandbytes: load_in_4bit: true bnb_4bit_compute_dtype: "bf16" bnb_4bit_quant_type: "nf4" # LoRA Params ------------------- lora: task_type: "CAUSAL_LM" r: 32 lora_alpha: 16 lora_dropout: 0.1 target_modules: - q_proj - v_proj - k_proj - o_proj - up_proj - down_proj - gate_proj # Training ------------------- training: training_args: num_train_epochs: 5 per_device_train_batch_size: 4 optim: "paged_adamw_32bit" learning_rate: 2.0e-4 bf16: true # Set to true for mixed precision training on Newer GPUs tf32: true sft_args: max_seq_length: 1024 inference: max_new_tokens: 1024 do_sample: True top_p: 0.9 temperature: 0.8
Fine-tuning
Launch Fine-tuning Job
This guide will walk you through the initial setup, explain the key components of the configuration, and offer advice on customizing your fine-tuning job.
First, read the installation guide and install all the dependencies. Then, to launch a LoRA fine-tuning job, run the following command in your terminal:
python3 toolkit.py
Default Config
This command initiates the fine-tuning process using the settings specified in the default YAML configuration file config.yaml.
Artefact Outputs
This config will run fine-tuning and save the artifacts under the directory ./experiment/[unique_hash]. Each unique configuration generates a unique hash, so that the tool can automatically pick up where it left off. For example, if you need to stop the training before it finishes, you can relaunch the script and the program will automatically load the existing dataset that was generated in that directory, allowing you to resume where you left off instead of starting over from the beginning.
After the script finishes running you will see these distinct artifacts:
- /config/config.yml: copy of the config file used for this experiment
- /dataset/dataset.pkl: generated pkl file in huggingface Dataset format
- /model/*: model weights saved using huggingface format
- /results/results.csv: csv of prompt, ground truth, and predicted values
- /qa/qa.csv: csv of quality assurance unit tests (e.g. vector similarity between gold and predicted output)
Custom Fine-tuning
You can modify config.yaml to launch custom training jobs. For a more detailed and nuanced treatment of what you can input into the config file, please reference the "Configuration" section of the documentation.
Loading Custom Datasets
Change the file_type and path under data in config.yaml to point to your custom dataset. Ensure your dataset is properly formatted and adjust the prompt accordingly.
...
data:
  file_type: "csv"
  path: "path/to/your/dataset.csv"
  prompt: "Your custom prompt template with {column_name} placeholders"
...
Changing LoRA Rank
Adjust the r and lora_alpha parameters in the LoRA section to experiment with different adaptation strengths.
...
lora:
  r: 64
  lora_alpha: 32
...
Changing Base Model
Modify hf_model_ckpt to fine-tune a different base model. Ensure it is compatible with your task and make sure to specify the right modules to tune (different models may have different module names).
...
model:
  hf_model_ckpt: "EleutherAI/gpt-neo-1.3B"

lora:
  target_modules:
    - c_attn
    - c_proj
    - c_fc
    - c_mlp.0
    - c_mlp.2
...
In the new config snippet for changing the model, we've updated hf_model_ckpt to use the "EleutherAI/gpt-neo-1.3B" model instead of "NousResearch/Llama-2-7b-hf". We've also adjusted target_modules to match the module names specific to the GPT-Neo architecture.
WARNING
Remember to carefully review the documentation and requirements of the new model you choose to ensure compatibility with your task and the toolkit.
Quality Assurance
Once the model is trained, verify its readiness for production. Quality Assurance testing specifically tailored for Language Model applications may be useful in verifying the model. This approach is distinct from conventional testing methods, as there’s currently no direct means of ensuring that a fine-tuned model meets enterprise standards. Moreover, developers have the flexibility to integrate their own tests into the process.
Available Tests
Generation Property
Generation Length
- Function: LengthTest
- Description: Determines the length of the summarized output and the input sentence. The output length is expected to exceed the input length, aligning with the specific use case.
POS Composition
- Functions: VerbPercent, AdjectivePercent, NounPercent
- Description: Analyzes the grammar of the generated output, focusing on:
- Verb Percentage: Indicates the proportion of verbs present.
- Adjective Percentage: Indicates the proportion of adjectives present.
- Noun Percentage: Indicates the proportion of nouns present.
Word Similarity
Word Overlap
- Function: WordOverlapTest
- Description: Calculates the percentage of word overlap between the ground truth and the model's prediction after removing common stop words, providing a straightforward measure of content similarity.
ROUGE Score
- Function: RougeScoreTest
- Description: Computes the Rouge score for the output, providing insight into the quality of summarization.
Embedding Similarity
Jaccard Similarity
- Function: JaccardSimilarityTest
- Description: Calculates the similarity between the ground truth and the predicted output using the Jaccard similarity measure.
Dot Product (Cosine) Similarity
- Function: DotProductSimilarityTest
- Description: Computes the dot product between the encoded (embedded) ground truth and predicted output.
General Structure
The configuration file has a hierarchical structure with the following main sections:
- save_dir: The directory where the experiment results will be saved.
- ablation: Settings for ablation studies.
- data: Configuration for data ingestion.
- model: Model definition and settings.
- lora: Configuration for LoRA (Low-Rank Adaptation).
- training: Settings for the training process.
- inference: Configuration for the inference stage.
Each section contains subsections and parameters that fine-tune the behavior of the toolkit.
Data
The data section defines how the input data is loaded and preprocessed. It includes the following parameters:
Parameters
- file_type: The type of the input file, which can be "json", "csv", or "huggingface".
- path: The path to the input file or the name of the Hugging Face dataset.
- prompt: The prompt template used for formatting the input data. Use brackets to specify column names.
- prompt_stub: The prompt stub used during training (i.e. this will be omitted during inference for completion). Use brackets to specify the column name.
- train_size: The size of the training set, either as a float (proportion) or an integer (number of examples).
- test_size: The size of the test set, either as a float (proportion) or an integer (number of examples).
- train_test_split_seed: The random seed used for splitting the data into train and test sets.
Example
data: file_type: "csv" path: "path/to/your/dataset.csv" prompt: >- Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {instruction} ### Input: {input} ### Output: prompt_stub: >- {output} test_size: 0.1 train_size: 0.9 train_test_split_seed: 42
Model
The model section defines the base model and load settings. It includes the following parameters:
Parameters
- hf_model_ckpt: The path or name of the pre-trained model checkpoint from the Hugging Face Model Hub.
- device_map: The device map for model parallelism. Set to "auto" for automatic device mapping or specify a custom device map.
- quantize: Boolean flag to enable quantization of the model weights; if true, the model is loaded with the bitsandbytes config.
- bitsandbytes: Settings for quantization using the BitsAndBytesConfig object within transformers.
  - load_in_8bit: Flag to enable 8-bit quantization.
  - llm_int8_threshold: Outlier threshold for 8-bit quantization.
  - llm_int8_skip_modules: List of module names to exclude from 8-bit quantization.
  - llm_int8_enable_fp32_cpu_offload: Flag to enable offloading of non-quantized weights to CPU.
  - load_in_4bit: Flag to enable 4-bit quantization using bitsandbytes.
  - bnb_4bit_compute_dtype: Compute dtype for 4-bit quantization.
  - bnb_4bit_quant_type: Quantization data type for 4-bit quantization.
  - bnb_4bit_use_double_quant: Flag to enable double quantization for 4-bit quantization.
Example
model:
  hf_model_ckpt: "NousResearch/Llama-2-7b-hf"
  device_map: "auto"
  quantize: true
  bitsandbytes:
    load_in_4bit: true
    bnb_4bit_compute_dtype: "bf16"
    bnb_4bit_quant_type: "nf4"
LoRA
The lora section configures the Low-Rank Adaptation (LoRA) settings. Supplied arguments are used to construct a peft LoraConfig object. It includes the following parameters:
Parameters
- task_type: Type of transformer architecture; use CAUSAL_LM for decoder-only models and SEQ_2_SEQ_LM for encoder-decoder models.
- r: The rank of the LoRA adaptation matrices.
- lora_alpha: The scaling factor for the LoRA adaptation.
- lora_dropout: The dropout probability for the LoRA layers.
- target_modules: The list of module names to apply LoRA to.
- fan_in_fan_out: Flag to indicate if the layer weights are stored in a (fan_in, fan_out) order.
- modules_to_save: List of additional module names to save in the final checkpoint.
- layers_to_transform: The list of layer indices to apply LoRA to.
- layers_pattern: The regular expression pattern to match layer names for LoRA application.
Examples
lora:
  r: 32
  lora_alpha: 16
  lora_dropout: 0.1
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - up_proj
    - down_proj
    - gate_proj
  fan_in_fan_out: false
  modules_to_save: null
  layers_to_transform: null
  layers_pattern: null
Advanced Settings
fan_in_fan_out
The fan_in_fan_out parameter is a boolean flag that indicates whether the weights of the layers being adapted are stored in a (fan_in, fan_out) order. This flag is important for correctly applying the LoRA adaptation.
lora:
  fan_in_fan_out: true
In this example, setting fan_in_fan_out to true indicates that the weights of the layers being adapted are stored in a (fan_in, fan_out) order. If the weights are stored in a different order, you should set this parameter to false.
layers_to_transform
The layers_to_transform parameter is used to specify the indices of the layers to which LoRA should be applied. This parameter allows you to selectively apply LoRA to specific layers of the model.
lora:
  layers_to_transform: [2, 4, 6]
In this example, LoRA will be applied to the layers with indices 2, 4, and 6. The layer indices are zero-based, so the first layer has an index of 0, the second layer has an index of 1, and so on.
You can also specify a single layer index:
lora:
  layers_to_transform: 3
In this case, LoRA will be applied only to the layer with index 3.
layers_pattern
The layers_pattern parameter allows the user to specify a regular expression pattern to match the names of the layers to which LoRA should be applied. This provides a more flexible way to select layers based on their names.
lora:
  layers_pattern: "transformer\.h\.\d+\.attn"
In this example, the regular expression pattern transformer\.h\.\d+\.attn will match the names of the attention layers in a transformer model. The pattern will match layer names like transformer.h.0.attn, transformer.h.1.attn, and so on.
You can adjust the regular expression pattern to match the specific layer names in your model.
Training
The training section configures the training process. It includes two subsections:
Parameters
- training_args: General training arguments such as the number of epochs, batch size, gradient accumulation steps, optimizer, learning rate, etc.
  - num_train_epochs: Number of training epochs.
  - per_device_train_batch_size: Batch size per training device.
  - gradient_accumulation_steps: Number of steps for gradient accumulation.
  - gradient_checkpointing: Flag to enable gradient checkpointing.
  - optim: Optimizer to use for training.
  - logging_steps: Number of steps between logging.
  - learning_rate: Learning rate for the optimizer.
  - bf16: Flag to enable BF16 mixed-precision training.
  - tf32: Flag to enable TF32 mixed-precision training.
  - fp16: Flag to enable FP16 mixed-precision training.
  - max_grad_norm: Maximum gradient norm for gradient clipping.
  - warmup_ratio: Ratio of total training steps used for a linear warmup.
  - lr_scheduler_type: Type of learning rate scheduler.
- sft_args: Arguments specific to the SFT (Supervised Fine-Tuning) process.
  - max_seq_length: Maximum sequence length for input sequences.
  - neftune_noise_alpha: Alpha parameter for NEFTune noise embeddings. If not None, activates NEFTune noise embeddings.
Example
training:
  training_args:
    num_train_epochs: 5
    per_device_train_batch_size: 4
    gradient_accumulation_steps: 4
    gradient_checkpointing: true
    optim: "paged_adamw_32bit"
    logging_steps: 100
    learning_rate: 2.0e-4
    bf16: true
    tf32: true
    max_grad_norm: 0.3
    warmup_ratio: 0.03
    lr_scheduler_type: "constant"
  sft_args:
    max_seq_length: 5000
    neftune_noise_alpha: null
Inference
The inference section sets the parameters for the inference stage. It includes:
Parameters
- max_new_tokens: The maximum number of new tokens to generate.
- use_cache: Whether to use the cache during inference.
- do_sample: Whether to use sampling during inference.
- top_p: The cumulative probability threshold for top-p sampling.
- temperature: The temperature value for sampling.
Example
inference:
  max_new_tokens: 1024
  use_cache: true
  do_sample: true
  top_p: 0.9
  temperature: 0.8
Quality Assurance
🚧 The qa section is not yet directly configurable in the provided configuration file and is currently being integrated into the CLI toolkit. In the meantime, however, you can manually execute and extend the toolkit to include quality assurance tests by implementing custom test classes that inherit from LLMQaTest.
Ablation
The ablation section controls the settings for ablation studies. It includes:
Parameters
- use_ablate: Whether to perform ablation studies.
- study_name: The name of the ablation study.
TIP
When use_ablate is set to true, the toolkit will generate multiple configurations by permuting the specified parameters. This allows you to compare different settings and their impact on the model's performance.
Example
ablation:
  use_ablate: true
  study_name: "ablation_study_1"
Putting it All Together
To create a custom configuration file, start by copying the provided template and modify the parameters according to your needs. Pay attention to the structure and indentation of the YAML file to ensure it is parsed correctly.
Once you have defined your configuration, you can run the toolkit with your custom settings. The toolkit will load the configuration file, preprocess the data, train the model, perform inference and optionally run quality assurance tests and ablation studies based on the configuration.
Remember to adjust the paths, prompts, and other parameters to match your specific use case. Experiment with different settings to find the optimal configuration for your task.
Example
Here's an example of a complete configuration file combining all the sections:
save_dir: "./experiments" ablation: use_ablate: true study_name: "ablation_study_1" data: file_type: "csv" path: "path/to/your/dataset.csv" prompt: "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {instruction} ### Input: {input} ### Output:" prompt_stub: "{output}" test_size: 0.1 train_size: 0.9 train_test_split_seed: 42 model: hf_model_ckpt: "NousResearch/Llama-2-7b-hf" device_map: "auto" quantize: true bitsandbytes: load_in_4bit: true bnb_4bit_compute_dtype: "bf16" bnb_4bit_quant_type: "nf4" lora: r: 32 lora_alpha: 16 lora_dropout: 0.1 target_modules: - q_proj - v_proj - k_proj - o_proj - up_proj - down_proj - gate_proj fan_in_fan_out: false modules_to_save: null layers_to_transform: null layers_pattern: null training: training_args: num_train_epochs: 5 per_device_train_batch_size: 4 gradient_accumulation_steps: 4 gradient_checkpointing: true optim: "paged_adamw_32bit" logging_steps: 100 learning_rate: 2.0e-4 bf16: true tf32: true max_grad_norm: 0.3 warmup_ratio: 0.03 lr_scheduler_type: "constant" sft_args: max_seq_length: 5000 neftune_noise_alpha: null inference: max_new_tokens: 1024 use_cache: true do_sample: true top_p: 0.9 temperature: 0.8
Extending Modules
The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, fine-tuning, inference, and quality assurance testing, is designed to be easily extendable.
General Guidelines
There are various scenarios where you might want to extend a particular module of the toolkit. For example:
Data Ingestion: If you have a custom data format or source that is not supported out of the box, the Ingestor class can be extended to handle your specific data format. For instance, if you have data stored in a proprietary binary format, you can create a new subclass of Ingestor that reads and processes your binary data and converts it into a compatible format for the toolkit.
Fine-tuning: If you want to experiment with different fine-tuning techniques or modify the fine-tuning process, you can extend the Finetune class. For example, if you want to incorporate a custom loss function or implement a new fine-tuning algorithm, you can create a subclass of Finetune and override the necessary methods to include your custom logic.
Inference: If you need to modify the inference process or add custom post-processing steps, you can extend the Inference class. For instance, if you want to apply domain-specific post-processing to the generated text or integrate the inference process with an external API, you can create a subclass of Inference and implement your custom functionality.
Quality Assurance (QA) Testing: If you have specific quality metrics or evaluation criteria that are not included in the existing QA tests, you can extend the LLMQaTest class to define your own custom tests. For example, if you want to evaluate the generated text based on domain-specific metrics or compare it against a custom benchmark, you can create a new subclass of LLMQaTest and implement your custom testing logic.
By extending the toolkit's components, you can tailor it to your specific requirements and incorporate custom functionality that is not provided by default. This flexibility allows you to adapt the toolkit to various domains, data formats, and evaluation criteria.
In the following sections, we will provide detailed guidance on how to extend each component of the toolkit, along with code examples.
Extending Data Ingestor
To extend the data ingestor component, follow these steps:
- Open the file src/data/ingestor.py.
- Define a new class that inherits from the abstract base class Ingestor.
- Implement the required abstract method to_dataset in your custom ingestor class. This method should load and preprocess the data from the specified source and return a Dataset object.
- Update the get_ingestor function to include your custom ingestor class based on a new file type or data source.
from src.data.ingestor import Ingestor

class CustomIngestor(Ingestor):
    def __init__(self, path):
        self.path = path

    def to_dataset(self):
        # Implement the logic to load and preprocess data from the specified path
        ...

def get_ingestor(data_type):
    if data_type == "custom":
        return CustomIngestor
    ...
Extending Finetuning
To extend the finetuning component, follow these steps:
- Create a new file in the src/finetune directory, e.g., custom_finetune.py.
- In this file, define a new class that inherits from the abstract base class Finetune from src/finetune/finetune.py.
- Implement the required abstract methods finetune and save_model in your custom finetuning class.
- The finetune method should take the training dataset and perform the finetuning process using the provided configuration.
- The save_model method should save the fine-tuned model to the specified directory.
- Modify the toolkit.py file to import your custom finetuning class and use it instead of the default LoRAFinetune class if needed.
from datasets import Dataset  # Hugging Face Dataset type used by the toolkit

from src.finetune.finetune import Finetune

class CustomFinetune(Finetune):
    def finetune(self, train_dataset: Dataset):
        # Implement your custom finetuning logic here
        ...

    def save_model(self):
        # Implement the logic to save the finetuned model
        ...
Extending Inference
To extend the inference component, follow these steps:
- Create a new file in the src/inference directory, e.g., custom_inference.py.
- In this file, define a new class that inherits from the abstract base class Inference from src/inference/inference.py.
- Implement the required abstract methods infer_one and infer_all in your custom inference class.
- The infer_one method should take a single prompt and generate the model's prediction.
- The infer_all method should iterate over the test dataset and generate predictions for each example.
- Modify the toolkit.py file to import your custom inference class and use it instead of the default LoRAInference class if needed.
from src.inference.inference import Inference

class CustomInference(Inference):
    def infer_one(self, prompt: str):
        # Implement the logic to generate a prediction for a single prompt
        ...

    def infer_all(self):
        # Implement the logic to generate predictions for the entire test dataset
        ...
Extending QA Test
To extend the quality assurance (QA) tests, follow these steps:
- Open the file src/qa/qa_tests.py.
- Define a new class that inherits from the abstract base class LLMQaTest from src/qa/qa.py.
- Implement the required abstract property test_name and the abstract method get_metric in your custom QA test class.
- The test_name property should return a string representing the name of the test.
- The get_metric method should take the prompt, ground truth, and model prediction, and return a metric value (e.g., float, int, or bool) indicating the test result.
- Include instances of your new CustomQATest when instantiating the LLMTestSuite object.
from src.qa.qa import LLMQaTest, LLMTestSuite
from src.qa.qa_tests import JaccardSimilarityTest

class CustomQATest(LLMQaTest):
    @property
    def test_name(self):
        return "Custom QA Test"

    def get_metric(self, prompt, ground_truth, model_prediction):
        # Implement the logic to calculate the metric for the custom QA test
        ...

test_suite = LLMTestSuite([JaccardSimilarityTest(), CustomQATest()], prompts, ground_truths, model_preds)
Contribution Guide
Thank you for your interest in contributing to this open-source project.
Getting Started
To start contributing to the project, follow these steps:
- Fork the repository on GitHub.
- Clone your forked repository to your local machine.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with descriptive commit messages.
- Push your changes to your forked repository.
- Submit a pull request to the main repository's main branch.
Before submitting a pull request, ensure that your code follows the project's coding style and passes all existing tests (🚧 work in progress).
Development Setup
To set up the development environment, follow these steps:
- Install the required dependencies using recommended installation methods.
- Run the existing tests to ensure everything is functioning correctly (🚧 work in progress).
Coding Guidelines
When contributing code to the project, please adhere to the following guidelines:
- Follow the PEP 8 style guide for Python code.
- Use the black formatter.
- Use meaningful variable and function names that clearly convey their purpose.
- Write docstrings for classes, methods, and functions to provide clear documentation.
- Include inline comments to explain complex or non-obvious code sections.
- Break down large functions or methods into smaller, reusable components.
- Write unit tests for new features or bug fixes to ensure code correctness.
Contributing to Documentation
Improvements to the project's documentation (either the documentation site or docstrings in codebase) are appreciated. If you find any errors, inconsistencies or areas that need clarification, please feel free to submit a pull request with the necessary changes.
When contributing to the documentation, follow these guidelines:
- Use clear and concise language.
- Provide step-by-step instructions or examples when appropriate.
- Ensure that the documentation is up to date with the latest changes in the codebase.
- Maintain a consistent formatting and structure throughout the documentation.
Reporting Issues
If you encounter any bugs or issues, or have suggestions for improvements, please submit an issue on the project's GitHub repository. When submitting an issue, provide as much detail as possible, including:
- A clear and descriptive title.
- Steps to reproduce the issue or bug.
- Expected behavior and actual behavior.
- Any relevant error messages or logs.
- Your operating system and Python version.
PR Process
When submitting a pull request, please ensure that:
- Your code adheres to the project's coding guidelines.
- Your changes are well-tested and do not introduce new bugs.
- Your commit messages are descriptive and explain the purpose of the changes.
- You have updated the relevant documentation, if necessary.
- Once your pull request is submitted, the project maintainers will review your changes and provide feedback. Be prepared to make revisions or address any concerns raised during the review process.
Licensing
By contributing to this project, you agree that your contributions will be licensed under the LICENSE file in the repository.
Data
Ingestors
Ingestor
class
src.data.ingestor.Ingestor
( path: str )
The Ingestor class is an abstract base class for data ingestors.
Parameters
path: str
- The path of the dataset.
to_dataset() -> Dataset
( )
An abstract method to be implemented by subclasses. Converts the input data to a Dataset object.
Returns
Dataset
- The converted Dataset object.
JSON Ingestor
class
src.data.ingestor.JsonIngestor
( path: str )
The JsonIngestor class is a subclass of Ingestor for ingesting JSON data.
Parameters
path: str
- The path of the JSON dataset.
to_dataset() -> Dataset
( )
Converts the JSON data to a Dataset object.
Returns
Dataset
- The converted Dataset object.
CSV Ingestor
class
src.data.ingestor.CsvIngestor
( path: str )
The CsvIngestor class is a subclass of Ingestor for ingesting CSV data.
Parameters
path: str
- The path of the CSV dataset.
to_dataset() -> Dataset
( )
Converts the CSV data to a Dataset object.
Returns
Dataset
- The converted Dataset object.
Huggingface Ingestor
class
src.data.ingestor.HuggingfaceIngestor
( path: str )
The HuggingfaceIngestor class is a subclass of Ingestor for ingesting data from a HuggingFace dataset.
Parameters
path: str
- The path or name of the HuggingFace dataset.
to_dataset() -> Dataset
( )
Converts the HuggingFace data to a Dataset object.
Returns
Dataset
- The converted Dataset object.
Utilities
function
src.data.ingestor.get_ingestor
( data_type: str )
A function to get the appropriate ingestor class based on the data type.
Parameters
data_type: str
- The type of data ("json", "csv", or "huggingface").
Returns
Ingestor
- The corresponding ingestor class.
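To illustrate how these pieces fit together, here is a minimal sketch based on the signatures above (the CSV path is hypothetical); it selects an ingestor class by data type and converts the file into a Dataset:

from src.data.ingestor import get_ingestor

# Select the ingestor class for the given data type ("json", "csv", or "huggingface")
ingestor_cls = get_ingestor("csv")

# Instantiate it with the dataset path (hypothetical) and convert the data to a Dataset
ingestor = ingestor_cls("path/to/your/dataset.csv")
dataset = ingestor.to_dataset()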
Dataset Generator
Dataset Generator
class
src.data.dataset_generator.DatasetGenerator
( file_type: str, path: str, prompt: str, prompt_stub: str, test_size: Union[float, int], train_size: Union[float, int], train_test_split_seed: int )
The DatasetGenerator class is responsible for generating and formatting datasets for training and testing.
Parameters
file_type: str
- The type of input file ("json", "csv", or "huggingface").
path: str
- The path to the input file or HuggingFace dataset.
prompt: str
- The prompt template for formatting the dataset.
prompt_stub: str
- The prompt stub used during training.
test_size: Union[float, int]
- The size of the test set.
train_size: Union[float, int]
- The size of the training set.
train_test_split_seed: int
- The random seed for splitting the dataset.
get_dataset
( )
Generates and returns the formatted train and test datasets.
Returns
A tuple containing the train and test datasets.
save_dataset
( save_dir: str )
Saves the generated dataset to the specified directory.
Parameters
save_dir: str
- The directory to save the dataset.
load_dataset_from_pickle
( save_dir: str )
Loads the dataset from a pickle file in the specified directory.
Parameters
save_dir: str
- The directory containing the dataset pickle file.
Returns
A tuple containing the loaded train and test datasets.
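For reference, a minimal usage sketch of DatasetGenerator, assuming keyword arguments that mirror the data section of the config (the path, prompt, and save directory below are illustrative):

from src.data.dataset_generator import DatasetGenerator

# Arguments mirror the "data" section of the config file (values are illustrative)
generator = DatasetGenerator(
    file_type="csv",
    path="path/to/your/dataset.csv",
    prompt="### Instruction: {instruction}\n### Output:",
    prompt_stub="{output}",
    test_size=0.1,
    train_size=0.9,
    train_test_split_seed=42,
)

# Build the formatted train and test splits
train_dataset, test_dataset = generator.get_dataset()

# Persist the dataset so a later run can resume from it (directory name is hypothetical)
generator.save_dataset("./experiment/example_hash/dataset")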
Fine-tuning
Fine-tuning Classes
Finetune
class
src.finetune.finetune.Finetune
The Finetune class is an abstract base class for finetuning models.
finetune
( )
An abstract method to be implemented by subclasses. Fine-tunes the model.
save_model
( )
An abstract method to be implemented by subclasses. Saves the fine-tuned model.
LoRAFinetune
class
src.finetune.lora.LoRAFinetune
( config: Config, directory_helper: DirectoryHelper )
The LoRAFinetune class is a subclass of Finetune for finetuning models using LoRA (Low-Rank Adaptation).
Parameters
config: Config
- The configuration object.
directory_helper: DirectoryHelper
- The directory helper object.
finetune
( train_dataset: Dataset )
Fine-tunes the model using the provided training dataset.
Parameters
train_dataset: Dataset
- The training dataset.
save_model
( )
Saves the fine-tuned model.
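A sketch of how LoRAFinetune might be driven, assuming config, dir_helper, and train_dataset objects have already been constructed elsewhere (e.g. by toolkit.py):

from src.finetune.lora import LoRAFinetune

# config: Config parsed from config.yaml; dir_helper: DirectoryHelper; train_dataset: Dataset
# (all three are assumed to be created earlier in the pipeline)
finetuner = LoRAFinetune(config=config, directory_helper=dir_helper)

# Fine-tune on the training split, then save the resulting weights
finetuner.finetune(train_dataset)
finetuner.save_model()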
Inference
Inference Classes
Inference
class
src.inference.inference.Inference
The Inference class is an abstract base class for performing inference.
infer_one
( prompt: str )
An abstract method to be implemented by subclasses. Performs inference on a single prompt.
Parameters
prompt: str
- The input prompt.
infer_all
( )
An abstract method to be implemented by subclasses. Performs inference on all test examples.
LoRAInference
class
src.inference.lora.LoRAInference
( test_dataset: Dataset, label_column_name: str, config: Config, dir_helper: DirectoryHelper )
The LoRAInference class is a subclass of Inference for performing inference using LoRA models.
Parameters
test_dataset: Dataset
- The test dataset.
label_column_name: str
- The name of the label column in the test dataset.
config: Config
- The configuration object.
dir_helper: DirectoryHelper
- The directory helper object.
infer_one
( prompt: str )
Performs inference on a single prompt and returns the generated text.
Parameters
prompt: str
- The input prompt.
Returns
str
- The generated text.
infer_all
( )
Performs inference on all test examples in test_dataset and saves the results in a csv.
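A sketch of how LoRAInference might be used, assuming the test dataset, config, and directory helper come from earlier pipeline stages; the "output" label column follows the prompt_stub used in this documentation's examples:

from src.inference.lora import LoRAInference

# test_dataset, config, and dir_helper are assumed to come from earlier pipeline stages
inference = LoRAInference(
    test_dataset=test_dataset,
    label_column_name="output",
    config=config,
    dir_helper=dir_helper,
)

# Generate a single prediction, then run over the whole test set
prediction = inference.infer_one("### Instruction: Summarize the following text ...")
inference.infer_all()  # saves predictions for every test example to a csv (results.csv in the artifact layout)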
Quality Assurance
Quality Assurance Tests
LLMQaTest
class
src.qa.qa.LLMQaTest
The LLMQaTest class is an abstract base class for defining quality assurance tests for language models.
test_name
( )
An abstract property to be implemented by subclasses. Returns the name of the test.
get_metric
( prompt: str, ground_truth: str, model_pred: str )
Computes the metric for the test based on the input prompt, the ground truth output, and the model's predicted output.
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_pred: str
- The model's predicted output.
Returns
The computed metric, which can be a float, an int, or a bool.
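To make the abstract interface concrete, here is a minimal hypothetical subclass; it is not part of the toolkit, just an illustration of the contract described above:

from src.qa.qa import LLMQaTest

class ExactMatchTest(LLMQaTest):
    # Hypothetical test: checks whether the prediction matches the ground truth exactly

    @property
    def test_name(self) -> str:
        return "Exact Match"

    def get_metric(self, prompt: str, ground_truth: str, model_pred: str) -> bool:
        # Returns a bool, one of the allowed metric types (float, int, or bool)
        return ground_truth.strip() == model_pred.strip()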
Length Test
class
src.qa.qa_tests.LengthTest
( )
A quality assurance test that measures the absolute difference in length between the ground truth text and the model's prediction. This test aims to evaluate the summary length consistency, providing a straightforward metric for assessing how closely the model's output matches the expected length of the ground truth.
test_name
( ) -> str
Returns
str
- "Summary Length Test"
get_metric
( prompt: str, ground_truth: str, model_pred: str ) -> int
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_pred: str
- The model's predicted output.
Returns
int
- the absolute difference in character length between the ground truth and model prediction
Jaccard Similarity
class
src.qa.qa_tests.JaccardSimilarityTest
( )
Evaluates the similarity between the ground truth text and the model's prediction using the Jaccard Similarity measure. This metric calculates the size of the intersection divided by the size of the union of the sample sets, providing insight into how similar the predicted text is to the ground truth in terms of unique words.
test_name
( ) -> str
Returns
str
- "Jaccard Similarity"
get_metric
(prompt: str, ground_truth: str, model_prediction: str) -> float
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_prediction: str
- The model's predicted output.
Returns
float
- the Jaccard Similarity score between the ground truth and model prediction.
Dot Product Similarity
class
src.qa.qa_tests.DotProductSimilarityTest
( )
Evaluates the semantic similarity between the ground truth and the model's prediction by computing the dot product between their sentence embeddings. This test aims to capture the closeness of the meanings of the two texts, providing a more nuanced understanding of the model's performance in preserving semantic content.
test_name
( ) -> str
Returns
str
- "Semantic Similarity"
get_metric
(prompt: str, ground_truth: str, model_prediction: str) -> float
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_prediction: str
- The model's predicted output.
Returns
float
- the dot product similarity score, indicating the degree of semantic similarity between the ground truth and model prediction.
ROUGE Score
class
src.qa.qa_tests.RougeScoreTest
( )
Measures the Rouge score, specifically the precision of Rouge-1, between the model's prediction and the ground truth. This test focuses on the overlap of unigrams, providing a metric for assessing the model's ability to reproduce key words and phrases from the ground truth.
test_name
( ) -> str
Returns
str
- "Rouge Score"
get_metric
( prompt: str, ground_truth: str, model_pred: str ) -> float
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_pred: str
- The model's predicted output.
Returns
float
- the precision component of the Rouge-1 score, reflecting the proportion of the model's unigrams found in the ground truth.
Word Overlap
class
src.qa.qa_tests.WordOverlapTest
( )
A test that calculates the percentage of word overlap between the ground truth and the model's prediction, after removing common stop words. This metric provides a straightforward measure of content similarity, emphasizing the shared vocabulary while ignoring frequent but less meaningful words.
test_name
( ) -> str
Returns
str
- "Word Overlap Test"
get_metric
( prompt: str, ground_truth: str, model_pred: str ) -> float
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_pred: str
- The model's predicted output.
Returns
float
- the percentage of overlap in significant words between the ground truth and model prediction, indicating content overlap.
Verb Composition
class
src.qa.qa_tests.VerbPercent
( )
Assesses the composition of the model's prediction by calculating the percentage of verbs within the text. This test provides insights into the dynamic versus static nature of the content generated by the model, with a higher proportion of verbs potentially indicating more active and vivid descriptions.
test_name
( ) -> str
Returns
str
- "Verb Composition"
get_metric
(prompt: str, ground_truth: str, model_prediction: str) -> float
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_prediction: str
- The model's predicted output.
Returns
float
- the percentage of words classified as verbs in the model prediction, shedding light on the action-oriented nature of the generated text.
Adjective Composition
class
src.qa.qa_tests.AdjectivePercent
( )
Focuses on the proportion of adjectives in the model's prediction to evaluate the descriptiveness and detail within the generated text. This test helps gauge how well the model captures and conveys detailed attributes and qualities of subjects in its outputs.
test_name
( ) -> str
Returns
str
- "Adjective Composition"
get_metric
( prompt: str, ground_truth: str, model_pred: str ) -> float
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_pred: str
- The model's predicted output.
Returns
float
- the proportion of adjectives, offering insight into the richness and descriptiveness of the model's language.
Noun Composition
class
src.qa.qa_tests.NounPercent
( )
Evaluates the model's prediction by calculating the percentage of nouns, providing a measure of how substantially the model generates content with tangible subjects and entities. This test can indicate the model's ability to maintain focus on key topics and to populate its narratives with relevant nouns.
test_name
( ) -> str
Returns
str
- "Noun Composition"
get_metric
( prompt: str, ground_truth: str, model_pred: str ) -> float
Parameters
prompt: str
- The input prompt.
ground_truth: str
- The ground truth output.
model_pred: str
- The model's predicted output.
Returns
float
- the percentage of nouns in the text, reflecting on the subject matter density and relevance in the generated content.
Test Runner
LLMTestSuite
class
src.qa.qa.LLMTestSuite
( tests: List[LLMQaTest], prompts: List[str], ground_truths: List[str], model_preds: List[str] )
The LLMTestSuite class represents a suite of quality assurance tests for language models.
Parameters
tests: List[LLMQaTest]
- A list of LLMQaTest objects representing the tests to run.
prompts: List[str]
- A list of input prompts.
ground_truths: List[str]
- A list of ground truth outputs.
model_preds: List[str]
- A list of the model's predicted outputs.
run_tests
( )
Runs all the tests in the suite and returns the results as a dictionary mapping test names to their corresponding metrics.
Returns
A dictionary mapping test names to their corresponding metrics, which can be floats, ints, or bools.
print_test_results
( )
Prints the test results in a tabular format.
save_test_results
( path: str )
Saves the test results to a CSV file.
Parameters
path: str
- The path to save the CSV file.
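A sketch of running a few of the built-in tests through LLMTestSuite; prompts, ground_truths, and model_preds are assumed to be parallel lists collected from an inference run, and the save path is illustrative:

from src.qa.qa import LLMTestSuite
from src.qa.qa_tests import LengthTest, RougeScoreTest, WordOverlapTest

# prompts, ground_truths, and model_preds are parallel lists gathered from inference
suite = LLMTestSuite(
    tests=[LengthTest(), RougeScoreTest(), WordOverlapTest()],
    prompts=prompts,
    ground_truths=ground_truths,
    model_preds=model_preds,
)

results = suite.run_tests()  # dict mapping test names to metric values
suite.print_test_results()   # tabular summary
suite.save_test_results("./experiment/example_hash/qa/qa.csv")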
User Interface
User Interface Classes
Generic UI
class
src.ui.ui.UI
The UI class is an abstract base class for user interface components. This class outlines a framework for displaying information and facilitating interaction with users across various stages of the toolkit's execution, including dataset creation, fine-tuning, inference, and quality assurance testing.
This component is designed to be subclassed, with specific implementations providing concrete methods for all interactions required by the toolkit. These interactions could range from input collection for dataset specifications to displaying fine-tuning progress, presenting inference results, and summarizing quality assurance test outcomes.
Utilities
Save Utils
DirectoryList
class
src.utils.save_utils.DirectoryList
( save_dir: str, config_hash: str )
The DirectoryList class represents a structured approach to managing directories for saving various components of experiment results, ensuring organized storage and easy access to different types of data generated during the experiment's lifecycle.
Class Attributes
save_dir: str
- The base directory where all experiment results are saved. It acts as the root directory for storing the outputs of different experiments.
config_hash: str
- A unique identifier for each experiment configuration, used to create separate subdirectories under save_dir for different experiment runs.
Properties
experiment: str
- Returns the path to the specific experiment directory, combining save_dir with config_hash. This directory acts as the container for all data related to a particular experiment configuration.
config: str
- Returns the path to the configuration file within the experiment directory. This file stores the experiment's settings and parameters.
dataset: str
- Returns the path to the directory where dataset files are stored, allowing for separation between different types of data used or generated by the experiment.
weights: str
- Returns the path to the directory where model weights are saved, facilitating easy access to trained models.
results: str
- Returns the path to the directory where experiment results, such as metrics or output files, are stored.
qa: str
- Returns the path to the directory dedicated to quality assurance or testing results, ensuring that evaluation outputs are organized and retrievable.
DirectoryHelper
class
src.utils.save_utils.DirectoryHelper
( config_path: str, config: Config )
The DirectoryHelper class provides helper methods for managing directories and saving configurations, facilitating the organization and preservation of experiment settings and outcomes.
Parameters
config_path: str
- The path to the configuration file.
config: Config
- The configuration object.
Attributes
config_path: str
- The path to the configuration file. This attribute stores the location of the main configuration file used by the experiment, enabling the DirectoryHelper to access and manage experiment configurations.
config: Config
- The configuration object. This attribute holds the actual configuration settings loaded from the configuration file, encapsulating all experiment parameters and settings in an accessible object format.
sqids: Sqids
- An instance of the Sqids class, which is used for managing unique config IDs to track experiments.
save_paths: DirectoryList
- Represents the structured list of directory paths associated with the current experiment.
save_config() -> None
( )
Saves the configuration to a file, ensuring that experiment parameters are documented and reproducible.
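A sketch of how DirectoryHelper might be used, assuming a Config object has already been parsed from the YAML file (see the pydantic models below):

from src.utils.save_utils import DirectoryHelper

# config is a Config object parsed from the YAML file (assumed to exist already)
dir_helper = DirectoryHelper("config.yaml", config)

# The DirectoryList under save_paths exposes the per-experiment output locations
print(dir_helper.save_paths.dataset)
print(dir_helper.save_paths.weights)
print(dir_helper.save_paths.results)

# Persist the configuration for reproducibility
dir_helper.save_config()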
Ablation Utils
function
src.utils.ablation_utils.generate_permutations
( yaml_dict: dict, model: BaseModel )
Generates permutations of a YAML dictionary based on specified ablations. This function is pivotal for creating variations in configurations or datasets, enabling thorough testing and exploration of different scenarios.
Parameters
yaml_dict: dict
- The YAML dictionary containing the definitions for ablations.
model: BaseModel
- The pydantic BaseModel object, which might be used to reference or validate the permutations against specific criteria or configurations.
Returns
A list of permuted dictionaries, each representing a variation of the original YAML dictionary based on the defined ablations, facilitating extensive testing or data variation analysis.
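A hedged sketch of calling generate_permutations when ablations are enabled; yaml_dict is assumed to be the parsed contents of config.yaml, and the inspected key is hypothetical:

from src.pydantic_models.config_model import Config
from src.utils.ablation_utils import generate_permutations

# yaml_dict: the parsed contents of config.yaml (a plain dict), assumed to exist already
permuted_configs = generate_permutations(yaml_dict, Config)

# Each element is a variation of the original configuration dictionary
for cfg_dict in permuted_configs:
    print(cfg_dict["lora"]["r"])  # hypothetical: inspect the permuted LoRA rank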
Pydantic Models
Main Config
Config
class
src.pydantic_models.config_model.Config
( save_dir: Optional[str], ablation: AblationConfig, accelerate: Optional[bool], data: DataConfig, model: ModelConfig, lora: LoraConfig, training: TrainingConfig, inference: InferenceConfig )
Represents the overall configuration for the toolkit, including all necessary settings and parameters required for its operation. This configuration encapsulates everything from data handling and model specification to training and inference settings, as well as ablation studies and optimization flags.
Attributes
- save_dir: Directory for saving outputs.
- ablation: Configuration for ablation studies.
- accelerate: Enables multi-GPU training if set to True.
- data: Data ingestion configuration.
- model: Model configuration, including specifics for handling and optimization.
- lora: Configuration for LoRA (Low-Rank Adaptation) adjustments.
- training: Training configurations, including batch sizes, learning rates, and more.
- inference: Inference settings, such as token limits and sampling strategies.
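As a sketch, the parsed YAML can presumably be validated by constructing the pydantic Config model directly; the exact loading code used in toolkit.py may differ:

import yaml

from src.pydantic_models.config_model import Config

# Parse the YAML file and validate it against the pydantic schema
with open("config.yaml") as f:
    raw_config = yaml.safe_load(f)

config = Config(**raw_config)

# Nested sections become typed attributes
print(config.model.hf_model_ckpt)
print(config.training.training_args.learning_rate)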
Data Config
DataConfig
class
src.pydantic_models.config_model.DataConfig
( file_type: Literal['json', 'csv', 'huggingface'], path: Union[FilePath, HfModelPath], prompt: str, prompt_stub: str, train_size: Optional[Union[float, int]], test_size: Optional[Union[float, int]], train_test_split_seed: int )
Represents the configuration for data ingestion, specifying how data is loaded, prepared, and split for training and testing. This component is crucial for ensuring that the data feeding into the model is correctly formatted and segmented.
Attributes
- file_type: The format of the dataset file (JSON, CSV, or a HuggingFace dataset).
- path: Path to the dataset or HuggingFace model.
- prompt: Template for generating model inputs.
- prompt_stub: Template fragment used during training.
- train_size: Specifies the size of the training dataset.
- test_size: Specifies the size of the test dataset.
- train_test_split_seed: Seed for reproducible train/test splits.
Model Config
ModelConfig
class
src.pydantic_models.config_model.ModelConfig
( hf_model_ckpt: Optional[str], device_map: Optional[str], quantize: Optional[bool], bitsandbytes: BitsAndBytesConfig )
Details the configuration for the model, including paths to pre-trained models, device mapping for training, and options for quantization to optimize model performance and resource usage.
Attributes
- hf_model_ckpt: Path or identifier for a HuggingFace model checkpoint.
- device_map: Specifies how the model should be distributed across available devices.
- quantize: Enables model quantization for performance optimization.
- bitsandbytes: Settings for BitsAndBytes quantization strategies.
BitsAndBytesConfig
class
src.pydantic_models.config_model.BitsAndBytesConfig
( load_in_8bit: Optional[bool], llm_int8_threshold: Optional[float], llm_int8_skip_modules: Optional[List[str]], llm_int8_enable_fp32_cpu_offload: Optional[bool], llm_int8_has_fp16_weight: Optional[bool], load_in_4bit: Optional[bool], bnb_4bit_compute_dtype: Optional[str], bnb_4bit_quant_type: Optional[str], bnb_4bit_use_double_quant: Optional[bool] )
Represents the configuration for BitsAndBytes quantization, offering detailed control over how models are quantized to improve performance and reduce memory footprint. These settings allow for advanced optimization techniques, including 8-bit and 4-bit quantization, with options for handling outliers and mixed precision.
Attributes
- load_in_8bit: Enable 8-bit quantization with specifics on handling outliers and module exceptions.
- llm_int8_threshold: Threshold for outlier detection in 8-bit quantization.
- llm_int8_skip_modules: Modules to exclude from 8-bit quantization.
- llm_int8_enable_fp32_cpu_offload: Offloads part of the model to CPU in fp32 to save memory.
- llm_int8_has_fp16_weight: Allows 16-bit weights in conjunction with 8-bit quantization.
- load_in_4bit: Enable 4-bit quantization for further size and speed optimization.
- bnb_4bit_compute_dtype: Defines the computational datatype in 4-bit quantization.
- bnb_4bit_quant_type: Specifies the quantization datatype in 4-bit layers.
- bnb_4bit_use_double_quant: Enables nested quantization for potentially higher efficiency.
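For illustration, a ModelConfig with nested BitsAndBytes settings might be constructed directly in Python; field names follow the signatures above, and this is a sketch rather than code taken from the toolkit:

from src.pydantic_models.config_model import BitsAndBytesConfig, ModelConfig

model_cfg = ModelConfig(
    hf_model_ckpt="NousResearch/Llama-2-7b-hf",
    device_map="auto",
    quantize=True,
    bitsandbytes=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype="bf16",
        bnb_4bit_quant_type="nf4",
    ),
)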
LoRA Config
LoraConfig
class
src.pydantic_models.config_model.LoraConfig
( r: Optional[int], task_type: Optional[str], lora_alpha: Optional[int], bias: Optional[str], lora_dropout: Optional[float], target_modules: Optional[List[str]], fan_in_fan_out: Optional[bool], modules_to_save: Optional[List[str]], layers_to_transform: Optional[Union[List[int], int]], layers_pattern: Optional[str] )
Details the configuration for applying LoRA (Low-Rank Adaptation) to a model, enhancing its ability to adapt to new tasks without extensive retraining. LoRA settings determine how and where these adaptations are applied within the model architecture.
Attributes
- r: Rank for the LoRA adaptation, affecting the number of trainable parameters.
- task_type: Indicates the model's task type during training to guide LoRA adjustments.
- lora_alpha: Scaling factor for LoRA parameters.
- bias: Specifies how biases are handled in LoRA-adapted layers.
- lora_dropout: Dropout rate for LoRA layers, helping prevent overfitting.
- target_modules: Model components targeted for LoRA adaptation.
- fan_in_fan_out: Adjusts weight shape assumptions for compatibility with LoRA.
- modules_to_save: Explicitly marks non-LoRA modules for retention and training.
- layers_to_transform: Identifies specific layers for LoRA transformation.
- layers_pattern: Pattern matching for selecting layers for adaptation.
Training Config
TrainingConfig
class
src.pydantic_models.config_model.TrainingConfig
( training_args: TrainingArgs, sft_args: SftArgs )
Encapsulates the configuration for the training process, including both general training parameters and settings specific to Supervised Fine-Tuning (SFT). This dual configuration approach allows for fine-grained control over the training regimen.
Attributes
- training_args: Core training arguments, covering epochs, batch sizes, and optimization strategies.
- sft_args: Supervised Fine-Tuning arguments, providing additional options for fine-tuning performance.
TrainingArgs
class
src.pydantic_models.config_model.TrainingArgs
( num_train_epochs: Optional[int], per_device_train_batch_size: Optional[int], gradient_accumulation_steps: Optional[int], gradient_checkpointing: Optional[bool], optim: Optional[str], logging_steps: Optional[int], learning_rate: Optional[float], bf16: Optional[bool], tf32: Optional[bool], fp16: Optional[bool], max_grad_norm: Optional[float], warmup_ratio: Optional[float], lr_scheduler_type: Optional[str] )
Defines the core training parameters for the model, covering every aspect from epoch counts to specific hardware optimizations. These arguments provide a comprehensive toolkit for customizing the training process to suit different models, datasets, and hardware configurations.
Attributes
- num_train_epochs: Specifies the total number of epochs for training.
- per_device_train_batch_size: Sets the batch size for each training device.
- gradient_accumulation_steps: Determines the number of steps to accumulate gradients before updating model parameters.
- gradient_checkpointing: Enables memory-efficient gradient checkpointing.
- optim: Chooses the optimizer for training.
- logging_steps: Configures the frequency of logging for training metrics.
- learning_rate: Sets the initial learning rate.
- bf16: Activates BF16 training for compatible hardware.
- tf32: Enables TF32 precision on NVIDIA Ampere GPUs.
- fp16: Engages FP16 precision for faster computation and reduced memory usage.
- max_grad_norm: Caps the norm of the gradients to prevent explosion.
- warmup_ratio: Adjusts the learning rate as a proportion of the total training steps for a gradual start.
- lr_scheduler_type: Specifies the learning rate scheduler to be used.
SftArgs
class
src.pydantic_models.config_model.SftArgs
( max_seq_length: Optional[int], neftune_noise_alpha: Optional[float] )
Captures the specific configurations for Supervised Fine-Tuning (SFT), including parameters that affect how models process input sequences and apply NEFTune noise embeddings. These settings help tailor the fine-tuning process to the instructional nuances of the dataset.
Attributes
- max_seq_length: Limits the length of input sequences to the model.
- neftune_noise_alpha: Activates NEFTune noise embeddings, which can significantly enhance model performance for instruction-based fine-tuning by introducing a controlled amount of noise into the embeddings.
Inference Config
InferenceConfig
class
src.pydantic_models.config_model.InferenceConfig
( max_new_tokens: Optional[int], use_cache: Optional[bool], do_sample: Optional[bool], top_p: Optional[float], temperature: Optional[float], epsilon_cutoff: Optional[float], eta_cutoff: Optional[float], top_k: Optional[int] )
Defines the parameters governing the inference phase, focusing on output generation and sampling behavior. These settings allow users to balance between creativity, diversity, and fidelity to the input prompt.
Attributes
- max_new_tokens: Limits the number of new tokens generated.
- use_cache: Enables caching for efficiency during generation.
- do_sample: Activates stochastic sampling for output generation.
- top_p: Controls the nucleus sampling threshold.
- temperature: Adjusts the sharpness of the probability distribution.
- epsilon_cutoff: Introduces cutoffs to refine sampling strategies.
- eta_cutoff: Further refines sampling cutoffs for nuanced control.
- top_k: Limits the sampling pool to the top-k most likely tokens.
Ablation Config
AblationConfig
class
src.pydantic_models.config_model.AblationConfig
( use_ablate: Optional[bool], study_name: Optional[str] )
Specifies whether ablation studies are to be conducted and, if so, under what overarching study name. This configuration is crucial for systematically exploring the impact of various model components and settings on performance.
Attributes
- use_ablate: Enables the execution of ablation studies.
- study_name: Provides a label for grouping related ablation experiments.