AI: Difference between revisions

From Hegemon Wiki

Revision as of 17:14, 14 April 2023

LLaMA

https://wiki.installgentoo.com/wiki/Home_server#Expanding_Your_Storage

https://rentry.org/llama-tard-v2

https://rentry.org/llamaaids

https://hackmd.io/@reneil1337/alpaca

https://boards.4channel.org/g/catalog#s=lmg%2F

https://find.4chan.org/?q=AI+Dynamic+Storytelling+General

https://find.4chan.org/?q=AI+Chatbot+General

https://find.4chan.org/?q=%2Flmg%2F (local models general)

https://boards.4channel.org/g/thread/92400764#p92400764


https://files.catbox.moe/lvefgy.json

https://pytorch.org/hub/nvidia_deeplearningexamples_tacotron2/


Launching text-generation-webui with a 4-bit model (plain 4-bit, then 4-bit with group size 128):

python server.py --model llama-7b-4bit --wbits 4

python server.py --model llama-13b-4bit-128g --wbits 4 --groupsize 128
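The two launch commands differ only in the flags that match how the model was quantized. A small sketch that builds the command from the model folder name, assuming the community naming convention where "-4bit" means `--wbits 4` and "-128g" means `--groupsize 128`. This is a convention, not something server.py checks, and `build_server_args` is a hypothetical helper.

```python
import re
import shlex


def build_server_args(model_name: str) -> list[str]:
    """Build a text-generation-webui launch command for a GPTQ model folder.

    Hypothetical helper: infers --wbits and --groupsize from the
    "<N>bit" and "-<N>g" suffixes commonly used in model folder names.
    """
    args = ["python", "server.py", "--model", model_name]
    bits = re.search(r"(\d+)bit", model_name)
    if bits:
        args += ["--wbits", bits.group(1)]
    group = re.search(r"-(\d+)g(?:$|-)", model_name)
    if group:
        args += ["--groupsize", group.group(1)]
    return args


for name in ["llama-7b-4bit", "llama-13b-4bit-128g"]:
    print(shlex.join(build_server_args(name)))
```

If a folder name does not follow the convention, pass the flags by hand instead.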

https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59 (fix for an out-of-space error during installation)

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode


https://github.com/pybind/pybind11/discussions/4566

https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_with_llama.cpp%3A

https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g


Here's the uncucked Vicuna model (trained on a dataset with the moralistic bullshit stripped out). Too bad it's only the CPU-quantized version.

Vicuna generating its own prompts


Should be worse than Q4_1 (which uses QK=32), but several PRs in the works should improve quantization accuracy in general:

https://github.com/ggerganov/llama.cpp/pull/729

https://github.com/ggerganov/llama.cpp/pull/835

https://github.com/ggerganov/llama.cpp/pull/896
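Q4_1 quantizes weights in blocks of QK = 32: as I understand the llama.cpp format of the time, each block keeps a float scale d and a float minimum m, and each weight becomes a 4-bit code q in 0..15, reconstructed as q * d + m. The sketch below illustrates that idea in plain Python. It skips the nibble packing and memory layout, it is not llama.cpp's code, and `quantize_q4_1` / `dequantize_q4_1` are hypothetical names. The "128g" in GPTQ model names refers to a roughly similar per-group idea (one set of quantization parameters per 128 weights).

```python
QK = 32  # block size used by Q4_1


def quantize_q4_1(weights: list[float]) -> list[tuple[float, float, list[int]]]:
    """Split weights into blocks of QK and return (scale d, minimum m, 4-bit codes) per block."""
    blocks = []
    for start in range(0, len(weights), QK):
        block = weights[start:start + QK]
        lo, hi = min(block), max(block)
        d = (hi - lo) / 15  # 4 bits give 16 levels: codes 0..15
        inv_d = 1.0 / d if d else 0.0  # constant block: every code is 0
        codes = [min(15, round((x - lo) * inv_d)) for x in block]
        blocks.append((d, lo, codes))
    return blocks


def dequantize_q4_1(blocks: list[tuple[float, float, list[int]]]) -> list[float]:
    """Reconstruct approximate weights as code * d + m."""
    out: list[float] = []
    for d, m, codes in blocks:
        out.extend(code * d + m for code in codes)
    return out
```

Each reconstructed weight lands within d / 2 of the original, so blocks with a wide spread of values (large d) lose the most precision; that per-block error is what the quantization PRs are trying to reduce.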

Benchmarks

{| class="wikitable sortable"
! Interface !! Model !! GPTQ !! xformers? !! Hardware !! Load time (s) !! Speed (tokens/s)
|-
| text-gen || anon8231489123-vicuna-13b-GPTQ-4bit-128g || GPTQ-for-LLaMa-triton || yes || 240 GB SSD, 16 GB, desktop off || 10.53 || 7.97
|-
| text-gen || anon8231489123-vicuna-13b-GPTQ-4bit-128g || GPTQ-for-LLaMa-triton || no || 240 GB SSD, 16 GB, desktop off || 10.22 || 7.55
|-
| text-gen || anon8231489123-vicuna-13b-GPTQ-4bit-128g || GPTQ-for-LLaMa-cuda || no || 240 GB SSD, 16 GB, desktop off || 16.68 || 4.03
|-
| text-gen || anon8231489123-vicuna-13b-GPTQ-4bit-128g || GPTQ-for-LLaMa-cuda || yes || 240 GB SSD, 16 GB, desktop off || 9.34 || 4.01
|-
| text-gen || llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml || no || no || 2 TB SSD, 64 GB || ? || 0.67
|-
| text-gen || llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml || no || no || 2 TB SSD, 64 GB, '''--threads 8''' || ~30 (guess) || 0.51
|-
| text-gen || llama-30b-sft-oa-alpaca-epoch-2-4bit-ggml || no || no || 2 TB SSD, 64 GB, '''--threads 7''' || || 0.68
|-
| text-gen || anon8231489123-vicuna-13b-GPTQ-4bit-128g-ggml || no || no || 2 TB SSD, 64 GB || || 1.17
|-
| text-gen || anon8231489123-vicuna-13b-GPTQ-4bit-128g || GPTQ-for-LLaMa-triton || yes || 2 TB SSD, 64 GB, '''--pre_layer 25''' || 45.69 || 0.25
|-
| text-gen || anon8231489123-vicuna-13b-GPTQ-4bit-128g || GPTQ-for-LLaMa-triton || yes || 2 TB SSD, 64 GB || 36.47 || 9.63
|}
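The benchmark speeds are easier to compare as wall-clock time per reply. A small sketch, assuming a hypothetical 200-token reply and a steady generation rate (it ignores load time and prompt processing). The speeds are copied from the benchmark table; the labels and helper names are made up for illustration.

```python
# Generation speeds (tokens/s) copied from the benchmark table.
SPEEDS = {
    "vicuna-13b GPTQ triton + xformers (240 GB SSD, 16 GB)": 7.97,
    "vicuna-13b GPTQ cuda + xformers (240 GB SSD, 16 GB)": 4.01,
    "vicuna-13b GPTQ triton + xformers (2 TB SSD, 64 GB)": 9.63,
    "vicuna-13b ggml (2 TB SSD, 64 GB)": 1.17,
    "llama-30b ggml, --threads 7 (2 TB SSD, 64 GB)": 0.68,
    "llama-30b ggml, --threads 8 (2 TB SSD, 64 GB)": 0.51,
}

REPLY_TOKENS = 200  # hypothetical reply length, not measured


def seconds_for(tokens: int, tokens_per_s: float) -> float:
    """Time to generate `tokens` at a steady rate (ignores load and prompt processing)."""
    return tokens / tokens_per_s


def speedup(fast: float, slow: float) -> float:
    """How many times faster the `fast` rate is than the `slow` rate."""
    return fast / slow


for label, tps in SPEEDS.items():
    print(f"{label}: ~{seconds_for(REPLY_TOKENS, tps):.0f} s per {REPLY_TOKENS}-token reply")

triton = SPEEDS["vicuna-13b GPTQ triton + xformers (240 GB SSD, 16 GB)"]
cuda = SPEEDS["vicuna-13b GPTQ cuda + xformers (240 GB SSD, 16 GB)"]
print(f"triton vs cuda on the same machine: {speedup(triton, cuda):.2f}x")
```

With these numbers the triton build is roughly twice as fast as the cuda build on the 16 GB machine (7.97 vs 4.01 tokens/s).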