Commit 50fe2fb3 authored by Lea Kyveli Chrysanthopoulou's avatar Lea Kyveli Chrysanthopoulou

Add dates to project plan & small adjustments

parent cc1381cf
@@ -12,9 +12,9 @@ Chris Pracht, Finn Hillengass, Lea Kyveli Chrysanthopoulou <br>

Our project aims to leverage different prompting architectures to extract Task Oriented Dialogue (TOD) datasets from Large Language Models (LLMs) that are available free of charge, e.g., Llama-2 (7B, 13B or 70B). The goal is to build high-quality TOD datasets at low cost, which could then be used to train smaller chatbots, enhancing their performance. To achieve this, we intend to deploy a variety of automated metrics, such as GRUEN, DEAM, GRADE and FactScore. Based on how the prompting architectures perform on these metrics, we then intend to optimize and refine them as needed.
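To illustrate the kind of prompting involved, here is a minimal sketch of a one-shot prompt for dialogue extraction; the domain, slot names and example dialogue are hypothetical placeholders, not our actual setup:

```python
# Hypothetical one-shot example dialogue used to prime the LLM.
ONE_SHOT_EXAMPLE = (
    "User: I need a cheap restaurant in the city centre.\n"
    "Assistant: Okay, a cheap restaurant in the centre. What type of food?\n"
    "User: Italian, please.\n"
    "Assistant: I recommend Pizza Roma. Shall I book a table?"
)

def build_prompt(domain: str, slots: list[str]) -> str:
    """Assemble a one-shot prompt asking the LLM for a new TOD dialogue."""
    return (
        f"You are generating a task-oriented dialogue in the '{domain}' domain.\n"
        f"Track these slots: {', '.join(slots)}.\n\n"
        f"Example dialogue:\n{ONE_SHOT_EXAMPLE}\n\n"
        "Now write a new, different dialogue in the same style:"
    )

prompt = build_prompt("restaurant", ["price range", "area", "food type"])
```

The resulting string would then be sent to the model; architectures such as RAG would differ mainly in how additional knowledge is injected into this prompt.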

## Flow Chart
## Visualisation of Project Plan

<!-- TODO: Add Flow Chart -->
![](project_map.png)

## Methods

@@ -33,7 +33,7 @@ Finally, if there is time, we want to try a **MultiAgent Approach**, in which tw

## Models

While selecting the LLM for our project, one of our concerns was for it to be open-source, i.e., for it to be available to download and for us to host it ourselves, so we would not have to pay for an API or interact with it manually over its web-interface, as would be the case with a GPT model. 
While selecting the LLM for our project, one of our concerns was that it be open-source, i.e., available to download and host ourselves, so we would not have to pay for an API or interact with it manually over a web interface, as would be the case with ChatGPT. 

Of the open-source models available, we decided on Llama-2 [(Touvron et al. 2023)](#8-touvron-hugo-louis-martin-kevin-stone-peter-albert-amjad-almahairi-yasmine-babaei-nikolay-bashlykov-et-al-2023-llama-2-open-foundation-and-fine-tuned-chat-models-arxiv-httparxivorgabs230709288) in its fine-tuned chat versions. We were able to run the 7B version on the CLuster and locally. By utilizing both GPU nodes of the CLuster, we should be able to run Llama-2-13B. Should we get access to the bwForCluster, even the 70B model could be used for inference.
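A rough back-of-the-envelope check of why the model sizes map to different hardware: the memory needed just for the weights in half precision (fp16, 2 bytes per parameter) can be estimated with simple arithmetic. This sketch ignores activations, the KV cache and framework overhead, so real requirements are higher:

```python
def weight_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone (fp16 = 2 bytes/param)."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

# Llama-2 comes in 7B, 13B and 70B parameter variants.
for size in (7, 13, 70):
    print(f"Llama-2-{size}B: ~{weight_memory_gb(size):.0f} GB of weights in fp16")
```

This is why 7B fits on a single GPU node, 13B needs both, and 70B would require the larger bwForCluster.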

@@ -94,29 +94,30 @@ The tools we will be using for our project are the following:

## Project Timeline

1. Preparation and Setup (TBD: Target Date): Parallel Processes
    - Set up knowledge database and API
1. Preparation and Setup (Target Date: 12.12): Parallel Processes
    - Set up knowledge database and API (Chris)
        - Extract knowledge from WOZ slots into Common Knowledge (CK) for Prompt Injection (PI) and Specific Knowledge (SK) for Knowledge Graph Retrieval (KGR)
        - Import into $DB
        - Embed datapoints with SBERT
        - Implement LangChain Retrieval Augmented Generation (RAG) adapter
    - Select and prepare models and access to computing power (CLuster & bwForCluster)
    - Select and prepare models and access to computing power (CLuster & bwForCluster) (Finn)
        - Evaluate memory requirements and hardware availability
        - Select model
        - Test workflows
    - Implement automated metrics on preliminary data
2. Join Parallel Processes (TBD: Target Date)
    - Generate dialogues using varied methods.
    - Implement automated metrics on preliminary data (Lea)
2. Join Parallel Processes (Target Date: 02.01)
    - Generate dialogues using varied methods (Chris & Finn)
        - One Shot approach
        - RAG approach
    - Apply evaluation metrics for quality assessment.
    - Apply evaluation metrics for quality assessment (Finn & Lea)
        - Develop processing pipeline for automated evaluation
        - Plotting of results
3. Comparison and Optimization (TBD: Target Date)
    - Analyze results from different methods based on the metrics.
    - Potentially adjust prompting architecture based on metric insights.
3. Comparison and Optimization (Target Date: 23.01) (Chris & Finn & Lea)
    - Analyze results from different methods based on the metrics
    - Potentially adjust prompting architecture based on metric insights
4. Given Spare Time:
    - Multi-Agent Approach

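The SBERT embedding and retrieval step from item 1 of the timeline can be sketched as follows. Random placeholder vectors stand in for real SBERT embeddings (in practice, `sentence-transformers` would produce them via `model.encode(texts)`), and the knowledge snippets are made up for illustration:

```python
import numpy as np

# Hypothetical knowledge snippets; real ones would come from the WOZ slots.
knowledge = [
    "The Golden Dragon serves Chinese food in the east of town.",
    "Pizza Roma is a cheap Italian restaurant in the centre.",
    "The Grand Hotel has free parking and wifi.",
]

# Placeholder embeddings: 384 dimensions, as produced by small SBERT models.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(knowledge), 384))

def retrieve(query_vec: np.ndarray, k: int = 1) -> list[str]:
    """Return the k knowledge snippets most cosine-similar to the query."""
    sims = embeddings @ query_vec / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)
    )
    return [knowledge[i] for i in np.argsort(sims)[::-1][:k]]

# Querying with snippet 1's own vector returns that snippet first.
top = retrieve(embeddings[1])
```

The LangChain RAG adapter would wrap exactly this kind of similarity search, feeding the retrieved snippets into the prompt as Specific Knowledge.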
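The evaluation pipeline from item 2 would, at its core, aggregate per-dialogue metric scores per generation method so the approaches can be compared side by side. A sketch with made-up scores; the real values would come from running metrics such as GRUEN or GRADE on the generated dialogues:

```python
from statistics import mean

# Hypothetical per-dialogue scores for each generation method and metric.
scores = {
    "one_shot": {"gruen": [0.61, 0.58, 0.70], "grade": [0.55, 0.60, 0.52]},
    "rag":      {"gruen": [0.66, 0.71, 0.68], "grade": [0.63, 0.59, 0.65]},
}

def summarize(scores: dict) -> dict:
    """Average each metric per method, for side-by-side comparison."""
    return {
        method: {metric: round(mean(vals), 3) for metric, vals in metrics.items()}
        for method, metrics in scores.items()
    }

summary = summarize(scores)
```

The plotting step would then visualize such a summary table, and the comparison in item 3 would be based on it.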
## References