argilla/notus

Notus models are intended to be used as assistants via chat-like applications, and are evaluated with Chat (MT-Bench, AlpacaEval) and Academic (Open LLM Leaderboard) benchmarks for a direct comparison with other similar LLMs.

The name Notus comes from the ancient Greek god Notus, as a nod to Zephyr, which is named after the ancient Greek god Zephyrus; Notus is the god of the south wind, and Zephyrus the god of the west wind. More information at https://en.wikipedia.org/wiki/Anemoi.

<div align="center">
  <h1>💨 Notus</h1>
  <img src="https://github.com/argilla-io/notus/assets/36760800/95468857-14cf-42be-9412-45e186d7ba80" alt="A banner representing Notus, the wind god of the south, in a mythical and artistic style. The banner features a strong, swirling breeze, embodying the warm, wet character of the southern wind. Gracefully flowing across the scene are several paper planes, caught in the gentle yet powerful gusts of Notus. The background is a blend of warm colors, symbolizing the heat of the south, with hints of blue and green to represent the moisture carried by this wind. The overall atmosphere is one of dynamic movement and warmth."/>
</div>

---

# Summary

- Developed by: Argilla (based on HuggingFace H4 and MistralAI)
- Shared by: Argilla
- Hugging Face Hub: https://huggingface.co/argilla/notus-7b-v1
- Model type: GPT-like 7B model, fine-tuned with DPO
- Language(s) (NLP): Mainly English
- License: MIT (same as Zephyr 7B-beta)
- Finetuned from model: alignment-handbook/zephyr-7b-sft-full
- Repository: https://github.com/argilla-io/notus
- Paper: N/A
- Play with Notus on HuggingChat: https://argilla-notus-chat-ui.hf.space/

# Performance

Table adapted from Zephyr-7b-β and Starling's original tables for [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks. Results are sorted by AlpacaEval win rate and omit some >7B models for brevity.

Notus stays on par with Zephyr on MT-Bench while surpassing Zephyr and Claude 2 on AlpacaEval, making Notus the most competitive 7B model available for commercial use on AlpacaEval.

<table>
    <tr>
        <th>Model</th>
        <th>Size</th>
        <th>Alignment</th>
        <th>MT-Bench (score)</th>
        <th>AlpacaEval (win rate %)</th>
        <th>License</th>
    </tr>
    <tr>
        <td>GPT-4-turbo</td>
        <td>-</td>
        <td>?</td>
        <td>9.32</td>
        <td>97.70</td>
        <td>Proprietary</td>
    </tr>
    <tr>
        <td>XwinLM 70b V0.1</td>
        <td>70B</td>
        <td>dPPO</td>
        <td>-</td>
        <td>95.57</td>
        <td>LLaMA 2 License</td>
    </tr>
    <tr>
        <td>Tulu 2+DPO 70B V0.1</td>
        <td>70B</td>
        <td>dDPO</td>
        <td>6.29</td>
        <td>95.28</td>
        <td>Proprietary</td>
    </tr>
    <tr>
        <td>GPT-4</td>
        <td>-</td>
        <td>RLHF</td>
        <td>8.99</td>
        <td>95.03</td>
        <td>Proprietary</td>
    </tr>
    <tr>
        <td>LLaMA2 Chat 70B</td>
        <td>70B</td>
        <td>RLHF</td>
        <td>6.86</td>
        <td>92.66</td>
        <td>LLaMA 2 License</td>
    </tr>
    <tr>
        <td>Starling-7B</td>
        <td>7B</td>
        <td>C-RLFT + APA</td>
        <td><strong>8.09</strong></td>
        <td><strong>91.99</strong></td>
        <td>CC-BY-NC-4.0</td>
    </tr>
    <tr style="background-color: #FFFF99;">
        <td><strong>Notus-7b-v1</strong></td>
        <td>7B</td>
        <td>dDPO</td>
        <td>7.30</td>
        <td>91.42</td>
        <td>MIT</td>
    </tr>
    <tr>
        <td>Claude 2</td>
        <td>-</td>
        <td>RLHF</td>
        <td>8.06</td>
        <td>91.36</td>
        <td>Proprietary</td>
    </tr>
    <tr>
        <td>Cohere Command</td>
        <td>-</td>
        <td>RLHF</td>
        <td>-</td>
        <td>90.62</td>
        <td>Proprietary</td>
    </tr>
    <tr>
        <td>Zephyr-7b-β</td>
        <td>7B</td>
        <td>dDPO</td>
        <td>7.34</td>
        <td>90.60</td>
        <td>MIT</td>
    </tr>
    <tr>
        <td>GPT-3.5-turbo</td>
        <td>-</td>
        <td>RLHF</td>
        <td>7.94</td>
        <td>89.37</td>
        <td>Proprietary</td>
    </tr>
</table>

# Usage

## CLI

```
ollama run argilla/notus
```

## API

Example:

```
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "argilla/notus",
  "prompt": "Here is a story about llamas eating grass"
}'
```
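
The same request can be made from Python. The block below is a minimal sketch using only the standard library, not an official client. It assumes the Ollama server is listening on the default address from the `curl` example, that the model is addressed by the name used in the `ollama run` command, and that the server streams newline-delimited JSON objects carrying a `response` fragment and a `done` flag. The helper names `build_payload`, `join_stream`, and `generate` are made up for this illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "argilla/notus") -> bytes:
    """Encode the JSON request body expected by the generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt}).encode("utf-8")


def join_stream(lines) -> str:
    """Concatenate the "response" fragments of a newline-delimited JSON stream.

    Assumes each non-empty line is one JSON object and that the last object
    has "done" set to true.
    """
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(prompt: str, model: str = "argilla/notus") -> str:
    """Send one prompt to a locally running Ollama server and return the text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The HTTP response object can be iterated line by line.
    with urllib.request.urlopen(request) as response:
        return join_stream(response)


# Usage (requires a running server with the model pulled):
# print(generate("Here is a story about llamas eating grass"))
```

Keeping the payload construction and stream parsing in separate functions makes both easy to check without a running server.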

# Context

The full process of creating Notus is described in <a href="https://argilla.io/blog/notus7b/" rel="nofollow">our blog post</a>.

Notus is a collection of models fine-tuned with SFT, DPO, SFT+DPO, and/or other RLAIF/RLHF techniques, following a data-first, human-centric approach, since that's what we do best at <a href="https://argilla.io/blog/notus7b/" rel="nofollow">Argilla</a>.

Fine-tuning LLMs while keeping a data-first approach wouldn't have been possible without the invaluable help of the open source community and all the amazing resources made available to the general public. We are very grateful for that, and we hope that our work can be useful to others as well.

🎩 h/t HuggingFace H4 team for their amazing work with [`alignment-handbook`](https://github.com/huggingface/alignment-handbook), and also for the fruitful discussions we had with them and their support.

## News

* **December 1st, 2023**: Notus 7B v1 is released! 🎉 It uses the same DPO fine-tuning approach as Zephyr 7B Beta, but changes how the UltraFeedback data is binarized: the average of the ratings for the different criteria is used instead of the critique score. Notus 7B improved on both AlpacaEval and LM Eval Harness compared to Zephyr 7B Beta, while the MT-Bench results were on par. More information at [`v1/`](./v1/).
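
The binarization change described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used to build the Notus dataset: the record layout (`completions`, `ratings`, `critique_score`, `text`) and the helper name `binarize` are hypothetical, and picking the rejected completion at random from the remaining ones is an assumption made here for the sake of a complete example.

```python
import random
from statistics import mean


def binarize(example: dict, rng: random.Random) -> dict:
    """Turn one multi-completion example into a (chosen, rejected) pair.

    The chosen completion is the one with the highest average of its
    per-criterion ratings; the overall critique score is ignored.
    """
    completions = example["completions"]
    if len(completions) < 2:
        raise ValueError("need at least two completions to build a pair")

    # Average the per-criterion ratings of every completion.
    averages = [mean(c["ratings"].values()) for c in completions]
    best = max(range(len(completions)), key=lambda i: averages[i])

    # Assumption: the rejected completion is drawn at random from the rest.
    others = [c for i, c in enumerate(completions) if i != best]
    rejected = rng.choice(others)

    return {
        "prompt": example["prompt"],
        "chosen": completions[best]["text"],
        "rejected": rejected["text"],
    }


# Hypothetical record: A has the higher critique score, B the higher average rating.
example = {
    "prompt": "Explain what a south wind is.",
    "completions": [
        {
            "text": "A",
            "critique_score": 8.0,
            "ratings": {"helpfulness": 3, "honesty": 4,
                        "instruction_following": 3, "truthfulness": 4},
        },
        {
            "text": "B",
            "critique_score": 7.0,
            "ratings": {"helpfulness": 5, "honesty": 4,
                        "instruction_following": 5, "truthfulness": 4},
        },
    ],
}

pair = binarize(example, random.Random(0))
print(pair["chosen"], pair["rejected"])
```

In this made-up record, ranking by critique score would pick completion A, while ranking by the average rating picks B, which is the kind of disagreement the change in binarization is meant to address.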

## Resources

### 🤗 HuggingFace Hub Collection

<div align="center">
  <img width="680" alt="image" src="https://github.com/argilla-io/notus/assets/36760800/08876ba2-ee55-4b80-9256-e0809fb2baf0">
  <p>Available at: <a href="https://huggingface.co/collections/argilla/notus-7b-v1-655529d7c73cb6c830e9555a" rel="nofollow">Hugging Face</a></p>
</div>

### 💬 Chat UI

<div align="center">
  <img width="1624" alt="image" src="https://github.com/argilla-io/notus/assets/36760800/a950f7f2-74ea-4873-a314-3afd1d4d7ac8">
  <p>Chat with Notus at <a href="https://argilla-notus-chat-ui.hf.space/" rel="nofollow">Hugging Face Spaces</a> (powered by <a href="https://github.com/huggingface/chat-ui" rel="nofollow">Hugging Face Chat UI</a>)</p>
</div>
