
tSNE vector compression optimised using wasm-bhtsne #112

Open · wants to merge 1 commit into base: master
Conversation

@kartik-gupta-ij (Member) commented Sep 8, 2023

@kartik-gupta-ij self-assigned this on Sep 8, 2023, added the documentation and enhancement labels, and later removed the documentation label.
@do-me commented Jan 11, 2024

The computation with wasm-bhtsne seems to be very fast, and t-SNE appears to be a solid choice!
However, did you consider any other algorithms, or do you have resources on why (bh)t-SNE might be superior to e.g. UMAP or PCA?

Some resources:

I'm just curious which algorithm might perform best overall.
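Of the alternatives mentioned, PCA is by far the simplest to run client-side as a linear baseline. A minimal sketch via power iteration (illustrative only, not tied to wasm-bhtsne or any particular library) shows what that would look like:

```typescript
// Illustrative sketch: extract the top principal component of a dataset
// using power iteration on the (implicit) covariance matrix. This is a
// linear baseline for comparison, not a replacement for t-SNE or UMAP.

function topPrincipalComponent(data: number[][], iters: number = 200): number[] {
  const n = data.length;
  const d = data[0].length;

  // Center the data so the covariance is taken around the mean.
  const mean = new Array(d).fill(0);
  for (const row of data) row.forEach((x, j) => (mean[j] += x / n));
  const centered = data.map(row => row.map((x, j) => x - mean[j]));

  // Power iteration: repeatedly apply v -> C v (computed implicitly as
  // sum_i (x_i . v) x_i) and renormalize; converges to the top eigenvector.
  let v = Array.from({ length: d }, () => Math.random() + 0.1);
  for (let t = 0; t < iters; t++) {
    const next = new Array(d).fill(0);
    for (const row of centered) {
      const proj = row.reduce((s, x, j) => s + x * v[j], 0);
      row.forEach((x, j) => (next[j] += proj * x));
    }
    const norm = Math.hypot(...next);
    v = next.map(x => x / norm);
  }
  return v; // unit-length direction of maximum variance
}
```

Projecting each vector onto the top few such components gives a fast 2D layout; it will miss nonlinear cluster structure that t-SNE or UMAP can reveal, which is the usual trade-off.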

@generall (Member) commented
Hi @do-me, it looks like the main bottleneck is not the algorithm but the implementation, specifically the ability to use multiple threads for the computation. If there is a solid implementation of any of those algorithms, we are happy to make it configurable.

@do-me commented Jan 29, 2024

@kartik-gupta-ij if I saw correctly, the wasm-bhtsne implementation has all parameters hard-coded apart from the number of iterations. For example, due to the default perplexity, it crashes with fewer than 60 vectors (I wrote a dirty fix here that adds random vectors to the calculation so it works in either case). Also, depending on the matrix size, other defaults might be better suited.

Is anyone interested in forking and adding the missing parameters to the wasm version? Unfortunately, I'm not experienced in Rust.

Apart from the lack of parameters, it's fairly performant: 1566 vectors of size 384 with 1000 iterations take about 20 s on my i7 laptop.
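The padding workaround described above can be sketched as a small helper. This is a hypothetical illustration: the ~60-row threshold is assumed from the crash behaviour reported here (t-SNE implementations typically require roughly more than 3 × perplexity samples), and the actual wasm-bhtsne API is not shown:

```typescript
// Hypothetical helper (not part of wasm-bhtsne): pad small datasets with
// random filler vectors so a t-SNE run with a hard-coded perplexity does
// not crash. The filler rows' embeddings can be discarded afterwards.

const MIN_ROWS = 60; // assumed threshold implied by the hard-coded perplexity

function padVectors(data: number[][], minRows: number = MIN_ROWS): number[][] {
  if (data.length >= minRows) return data;
  const dim = data[0].length;
  const padded = data.slice(); // keep the original rows first
  while (padded.length < minRows) {
    padded.push(Array.from({ length: dim }, () => Math.random()));
  }
  return padded;
}

// Usage sketch: run t-SNE on padVectors(vectors), then keep only the
// first vectors.length rows of the resulting embedding.
```

Keeping the real vectors first makes it trivial to slice the filler rows back out of the output, though the random points still influence the layout slightly, which is why exposing perplexity in the wasm build would be the cleaner fix.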

@functorism (Contributor) commented

@generall This demonstrates a method for dimensionality reduction based on an HNSW index. It's very fast compared to t-SNE. If Qdrant provided a way to produce such results for the frontend based on the collection's HNSW index, then (in my testing) visualizing even tens of thousands of nodes would take mere seconds.

https://github.com/jean-pierreBoth/annembed/blob/master/src/embedder.rs
