Commit

doc
kenarsa committed May 28, 2024
1 parent d849b3c commit 3c517e5
Showing 14 changed files with 1,166 additions and 13 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -1,6 +1,7 @@
binding/android/PicoLLMTestApp/** linguist-detectable=false
binding/android/PicoLLM/picollm/src/main/java/ai/picovoice/picollm/dialog/** linguist-detectable=false
binding/ios/PicoLLMAppTest/** linguist-detectable=false
binding/nodejs/** linguist-detectable=false
binding/web/cypress/** linguist-detectable=false
binding/web/scripts/** linguist-detectable=false
binding/web/src/picollm.ts linguist-detectable=false
55 changes: 51 additions & 4 deletions README.md
@@ -15,7 +15,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Linux (x86_64), macOS (arm64, x86_64), and Windows (x86_64)
@@ -29,6 +29,15 @@ models. picoLLM Inference Engine is:

- [picoLLM](#picollm-inference-engine)
- [Table of Contents](#table-of-contents)
- [Showcases](#showcases)
- [Raspberry Pi](#raspberry-pi)
- [Android](#android)
- [iOS](#ios)
- [Web](#web)
- [GPU](#gpu)
- [Local LLM-Powered Voice Assistant on Raspberry Pi](#local-llm-powered-voice-assistant-on-raspberry-pi)
- [Local LLM-Powered Voice Assistant on CPU](#local-llm-powered-voice-assistant-on-cpu)
- [Accuracy](#accuracy)
- [Models](#models)
- [AccessKey](#accesskey)
- [Demos](#demos)
@@ -48,6 +57,44 @@ models. picoLLM Inference Engine is:
- [Releases](#releases)
- [FAQ](#faq)

## Showcases


### Raspberry Pi

[![picoLLM in Action](https://img.youtube.com/vi/CeKPXZ_8hkI/0.jpg)](https://www.youtube.com/watch?v=CeKPXZ_8hkI)

### Android

[![picoLLM in Action](https://img.youtube.com/vi/XeUMkue-5lI/0.jpg)](https://www.youtube.com/watch?v=XeUMkue-5lI)

### iOS

[![picoLLM in Action](https://img.youtube.com/vi/dNK5esdkI0Y/0.jpg)](https://www.youtube.com/watch?v=dNK5esdkI0Y)

### Web

[Live Offline Demo](https://picovoice.ai/picollm/)

### GPU

[![picoLLM in Action](https://img.youtube.com/vi/4mcVwbOOIqk/0.jpg)](https://www.youtube.com/watch?v=4mcVwbOOIqk)

### Local LLM-Powered Voice Assistant on Raspberry Pi

[![picoLLM in Action](https://img.youtube.com/vi/GEndT3RGRvw/0.jpg)](https://www.youtube.com/watch?v=GEndT3RGRvw)

### Local LLM-Powered Voice Assistant on CPU

[![picoLLM in Action](https://img.youtube.com/vi/uV0GlXDFSPw/0.jpg)](https://www.youtube.com/watch?v=uV0GlXDFSPw)

## Accuracy

picoLLM Compression recovers the MMLU score degradation of the widely adopted GPTQ by 91%, 99%, and 100% at 2-, 3-, and 4-bit settings, respectively.
The figure below depicts the MMLU comparison between picoLLM and GPTQ for Llama-3-8B [[1]](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).

![picoLLM Compression vs GPTQ MMLU scores when applied to Llama-3-8B](/resources/mmlu-llama-3-8b.svg)
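
To make the recovery numbers concrete, here is a worked example of the arithmetic; the MMLU scores below are hypothetical placeholders, not measured results:

```python
# Hypothetical MMLU scores, for illustration only (not measured results).
full_precision = 65.0  # uncompressed model
gptq_2bit = 40.0       # after 2-bit GPTQ quantization

# "Recovers degradation by 91%" means picoLLM wins back 91% of the
# points that GPTQ lost relative to the full-precision model.
degradation = full_precision - gptq_2bit       # 25.0 points lost by GPTQ
picollm_2bit = gptq_2bit + 0.91 * degradation  # 40.0 + 22.75 = 62.75

print(f"picoLLM 2-bit MMLU (hypothetical): {picollm_2bit:.2f}")
```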

## Models

picoLLM Inference Engine supports the following open-weight models. The models are on
Picovoice Console.

@@ -126,13 +173,13 @@ picollm-completion-demo --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --prompt ${PROMPT}
Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file
downloaded from Picovoice Console, and `${PROMPT}` with a prompt string.
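
The same completion flow is also available programmatically. Below is a minimal sketch using the `picollm` Python binding; the package and method names follow the Python binding's documentation and should be verified against [binding/python](binding/python) before relying on them:

```python
import picollm

# The three placeholders mirror the demo command above.
pllm = picollm.create(
    access_key='${ACCESS_KEY}',  # obtained from Picovoice Console
    model_path='${MODEL_PATH}')  # model file downloaded from Picovoice Console

try:
    res = pllm.generate(prompt='${PROMPT}')
    print(res.completion)
finally:
    # Release native resources explicitly when done.
    pllm.release()
```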

For more information about Node.js demos go to [demo/nodejs](./demo/nodejs).
For more information about Node.js demos, go to [Node.js demo](./demo/nodejs).

### Android Demos

Using Android Studio, open the [Completion demo](demo/android/Completion/) as an Android project, copy your AccessKey into MainActivity.java, and run the application.
Using Android Studio, open the [Completion demo](demo/android/Completion) as an Android project, copy your AccessKey into MainActivity.java, and run the application.

To learn about how to use picoLLM in a chat application, try out the [Chat demo](demo/android/Chat/).
To learn about how to use picoLLM in a chat application, try out the [Chat demo](demo/android/Chat).

For more information about Android demos, go to [demo/android](demo/android/README.md).

2 changes: 1 addition & 1 deletion binding/android/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion binding/ios/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion binding/nodejs/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion binding/python/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion binding/web/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion demo/android/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion demo/ios/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion demo/nodejs/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
2 changes: 1 addition & 1 deletion demo/python/README.md
@@ -7,7 +7,7 @@ Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language
models. picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).
- Private; LLM inference runs 100% locally.
- Cross-Platform
- Runs on CPU and GPU
1 change: 1 addition & 0 deletions resources/.lint/spell-check/.cspell.json
@@ -21,6 +21,7 @@
"**/*.wasm",
"**/dist/**",
"**/lib/ios/**/*",
"**/*.svg",

// iOS
"**/*.pbxproj",
1 change: 1 addition & 0 deletions resources/.lint/spell-check/dict.txt
@@ -22,6 +22,7 @@ logit
logits
mipmap
mixtral
mmlu
picollm
picollmactivitydemo
picollmdemo