AI: Difference between revisions
Revision as of 18:33, 14 April 2023
== LLama ==
https://wiki.installgentoo.com/wiki/Home_server#Expanding_Your_Storage
https://rentry.org/llama-tard-v2
https://hackmd.io/@reneil1337/alpaca
https://boards.4channel.org/g/catalog#s=lmg%2F
https://find.4chan.org/?q=AI+Dynamic+Storytelling+General
https://find.4chan.org/?q=AI+Chatbot+General
https://find.4chan.org/?q=%2Flmg%2F (local models general)
https://boards.4channel.org/g/thread/92400764#p92400764
https://files.catbox.moe/lvefgy.json
https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/
<code>python server.py --model llama-7b-4bit --wbits 4</code>
<code>python server.py --model llama-13b-4bit-128g --wbits 4 --groupsize 128</code>
https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59 - fix for the "out of space" error during installation
https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode
https://github.com/pybind/pybind11/discussions/4566
https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_with_llama.cpp%3A
https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g
Vicuna generating its own prompts
should be worse than Q4_1 (which is QK=32), but there are several PRs in the works that should improve quantization accuracy in general:
https://github.com/ggerganov/llama.cpp/pull/729
https://github.com/ggerganov/llama.cpp/pull/835
https://github.com/ggerganov/llama.cpp/pull/896
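Q4_1's accuracy edge comes from storing a per-block minimum in addition to a scale, which helps when a block's weights are not centered on zero. A toy NumPy sketch of the two schemes over QK=32 blocks (simplified from llama.cpp's actual packed formats, which store quantized nibbles rather than dequantized floats):

```python
import numpy as np

QK = 32  # block size used by Q4_0/Q4_1

def q4_0(block):
    # Q4_0 (simplified): one scale per block, symmetric around zero
    amax = np.max(np.abs(block))
    scale = amax / 7.0 if amax > 0 else 1.0
    q = np.clip(np.round(block / scale), -8, 7)
    return q * scale  # dequantized values

def q4_1(block):
    # Q4_1 (simplified): scale plus per-block minimum, so asymmetric
    # (e.g. all-positive) blocks use the full 0..15 range
    lo, hi = block.min(), block.max()
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    q = np.clip(np.round((block - lo) / scale), 0, 15)
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(0.1, 0.02, QK).astype(np.float32)  # asymmetric toy weights
err0 = np.mean((w - q4_0(w)) ** 2)  # mean squared quantization error
err1 = np.mean((w - q4_1(w)) ** 2)
print(err0, err1)
```

On off-center blocks like this one, Q4_0 spends half its range on values that never occur, so its error should come out higher than Q4_1's.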
https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g - <code>python3 llama.py vicuna-AlekseyKorshuk-7B c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.safetensors</code>
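The <code>--save_safetensors</code> flag writes the quantized weights in the safetensors container format, which is simple: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then raw tensor bytes. A stdlib-only sketch with a toy tensor (not real model weights):

```python
import json
import struct

def write_safetensors(tensors):
    """tensors: name -> (dtype_str, shape, raw_bytes). Returns the file blob."""
    header, data, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        data += raw
        offset += len(raw)
    hjson = json.dumps(header).encode()
    # 8-byte little-endian header length, then JSON header, then raw data
    return struct.pack("<Q", len(hjson)) + hjson + data

def read_header(blob):
    (hlen,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + hlen])

# Toy 2x2 float32 tensor
blob = write_safetensors({"w": ("F32", [2, 2], struct.pack("<4f", 1, 2, 3, 4))})
hdr = read_header(blob)
print(hdr["w"]["shape"])  # [2, 2]
```

In practice the <code>safetensors</code> package handles this; the sketch just shows why the format loads fast — the header alone tells you every tensor's location.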
== Benchmarks ==
| Interface | Model | GPTQ | Xformers? | HW | Load | Speed |
|---|---|---|---|---|---|---|
| text-gen | anon8231489123-vicuna-13b-GPTQ-4bit-128g | GPTQ-for-LLaMa-triton | yes | 240GB SSD, 16GB, desktop off | 10.53s | 7.97 tokens/s |
| text-gen | anon8231489123-vicuna-13b-GPTQ-4bit-128g | GPTQ-for-LLaMa-triton | no | 240GB SSD, 16GB, desktop off | 10.22s | 7.55 tokens/s |
| text-gen | anon8231489123-vicuna-13b-GPTQ-4bit-128g | GPTQ-for-LLaMa-cuda | no | 240GB SSD, 16GB, desktop off | 16.68s | 4.03 tokens/s |
| text-gen | anon8231489123-vicuna-13b-GPTQ-4bit-128g | GPTQ-for-LLaMa-cuda | yes | 240GB SSD, 16GB, desktop off | 9.34s | 4.01 tokens/s |
| text-gen | llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml | no | no | 2TB SSD, 64GB | ? | 0.67 tokens/s |
| text-gen | llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml | no | no | 2TB SSD, 64GB, --threads 8 | ~30s | 0.51 tokens/s |
| text-gen | llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml | no | no | 2TB SSD, 64GB, --threads 7 | ? | 0.68 tokens/s |
| text-gen | anon8231489123-vicuna-13b-GPTQ-4bit-128g-ggml | no | no | 2TB SSD, 64GB | ? | 1.17 tokens/s |
| text-gen | anon8231489123-vicuna-13b-GPTQ-4bit-128g | GPTQ-for-LLaMa-triton | yes | 2TB SSD, 64GB, --pre_layer 25 | 45.69s | 0.25 tokens/s |
| text-gen | anon8231489123-vicuna-13b-GPTQ-4bit-128g | GPTQ-for-LLaMa-triton | yes | 2TB SSD, 64GB | 36.47s | 9.63 tokens/s |
| llama.cpp | llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml | n/a | n/a | 2TB SSD, 64GB | 10317.90ms | 1096.21 ms/token |
| llama.cpp-modern-avx512 | llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml | n/a | n/a | 2TB SSD, 64GB | 9288.69ms | 1049.03 ms/token |
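The llama.cpp rows report ms per token rather than tokens/s; a quick conversion (1000 / ms-per-token) puts them on the same scale as the text-gen rows:

```python
def ms_per_token_to_tps(ms_per_token):
    # Invert milliseconds-per-token into tokens-per-second
    return 1000.0 / ms_per_token

print(round(ms_per_token_to_tps(1096.21), 2))  # 0.91 tokens/s
print(round(ms_per_token_to_tps(1049.03), 2))  # 0.95 tokens/s
```

So both llama.cpp 30B CPU runs land near 0.9 tokens/s, in the same ballpark as the 30B ggml runs through text-gen above.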