An extension of Llama 2 that supports a context of up to 128k tokens.
14.4K Pulls · Updated 6 months ago
13b-128k · 7.4GB
13b-64k · 7.4GB
13b-128k-q4_0 · 7.4GB
13b-128k-q4_1 · 8.2GB
13b-128k-q5_0 · 9.0GB
13b-128k-q5_1 · 9.8GB
13b-128k-q8_0 · 14GB
13b-128k-q2_K · 5.4GB
13b-128k-q3_K_S · 5.7GB
13b-128k-q3_K_M · 6.3GB
13b-128k-q3_K_L · 6.9GB
13b-128k-q4_K_S · 7.4GB
13b-128k-q4_K_M · 7.9GB
13b-128k-q5_K_S · 9.0GB
13b-128k-q5_K_M · 9.2GB
13b-128k-q6_K · 11GB
13b-128k-fp16 · 26GB
13b-64k-q4_0 · 7.4GB
13b-64k-q4_1 · 8.2GB
13b-64k-q5_0 · 9.0GB
13b-64k-q5_1 · 9.8GB
13b-64k-q8_0 · 14GB
13b-64k-q2_K · 5.4GB
13b-64k-q3_K_S · 5.7GB
13b-64k-q3_K_M · 6.3GB
13b-64k-q3_K_L · 6.9GB
13b-64k-q4_K_S · 7.4GB
13b-64k-q4_K_M · 7.9GB
13b-64k-q5_K_S · 9.0GB
13b-64k-q5_K_M · 9.2GB
13b-64k-q6_K · 11GB
13b-64k-fp16 · 26GB
7b-128k · 3.8GB
7b-64k · 3.8GB
7b-128k-q4_0 · 3.8GB
7b-128k-q4_1 · 4.2GB
7b-128k-q5_0 · 4.7GB
7b-128k-q5_1 · 5.1GB
7b-128k-q8_0 · 7.2GB
7b-128k-q2_K · 2.8GB
7b-128k-q3_K_S · 2.9GB
7b-128k-q3_K_M · 3.3GB
7b-128k-q3_K_L · 3.6GB
7b-128k-q4_K_S · 3.9GB
7b-128k-q4_K_M · 4.1GB
7b-128k-q5_K_S · 4.7GB
7b-128k-q5_K_M · 4.8GB
7b-128k-q6_K · 5.5GB
7b-128k-fp16 · 13GB
7b-64k-q4_0 · 3.8GB
7b-64k-q4_1 · 4.2GB
7b-64k-q5_0 · 4.7GB
7b-64k-q5_1 · 5.1GB
7b-64k-q8_0 · 7.2GB
7b-64k-q2_K · 2.8GB
7b-64k-q3_K_S · 2.9GB
7b-64k-q3_K_M · 3.3GB
7b-64k-q3_K_L · 3.6GB
7b-64k-q4_K_S · 3.9GB
7b-64k-q4_K_M · 4.1GB
7b-64k-q5_K_S · 4.7GB
7b-64k-q5_K_M · 4.8GB
7b-64k-q6_K · 5.5GB
7b-64k-fp16 · 13GB
75df67be3cee · 3.8GB
model · arch llama · parameters 7B · quantization 4-bit · 3.8GB
params · {"num_ctx":65536} · 17B
Readme
Yarn Llama 2 is a model based on Llama 2 that extends the context window to up to 128k tokens. It was developed by Nous Research by applying the YaRN method to further train the model to support larger context windows.
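Context-extension methods in this family work by rescaling the rotary position embedding (RoPE) angles so that positions far beyond the pretraining length map back into the range the model has already seen. The sketch below shows the simplest form of this idea, linear position interpolation; YaRN itself refines it by scaling different frequency bands unevenly, so this is an illustration of the general principle, not Nous Research's implementation. The dimension, base, and scale values are illustrative assumptions.

```python
import math


def rope_angles(position, dim=128, base=10000.0, scale=1.0):
    """Compute RoPE rotation angles for a single token position.

    scale > 1 compresses positions, so a longer context reuses the
    angle range seen during pretraining. This is naive position
    interpolation; YaRN instead interpolates per frequency band.
    """
    return [
        (position / scale) / (base ** (2 * i / dim))
        for i in range(dim // 2)
    ]


# With scale=16, position 65536 yields the same angles that
# position 4096 produced at the original context length.
assert rope_angles(65536, scale=16.0) == rope_angles(4096)
```

The key property being exploited is that attention only ever sees these angles, so compressing positions (rather than extrapolating to unseen angles) keeps the model in-distribution at long range.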
CLI
64k context size:
ollama run yarn-llama2
128k context size:
ollama run yarn-llama2:7b-128k
API
Example:
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "yarn-llama2:7b-128k",
  "prompt": "Here is a story about llamas eating grass"
}'
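By default the generate endpoint streams its reply as newline-delimited JSON, one object per line with a partial "response" field and "done": true on the final line. A minimal sketch of reassembling such a stream in Python — the sample chunks below are illustrative data, not output captured from the model:

```python
import json


def collect_stream(ndjson_body):
    """Join the partial "response" fields of a streamed Ollama
    generate reply (one JSON object per line) into the full text."""
    parts = []
    for line in ndjson_body.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Illustrative two-chunk stream:
body = (
    '{"response": "Once upon", "done": false}\n'
    '{"response": " a time", "done": true}\n'
)
print(collect_stream(body))  # Once upon a time
```

In a real client the same function would be fed the response body of the curl request above, line by line as chunks arrive.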
References
YaRN: Efficient Context Window Extension of Large Language Models