
tSNE vector compression optimised using wasm-bhtsne #112

Open · wants to merge 1 commit into base: master
Conversation

@kartik-gupta-ij (Member) commented Sep 8, 2023

@kartik-gupta-ij self-assigned this on Sep 8, 2023, added the documentation and enhancement labels, and later removed the documentation label.
@do-me commented Jan 11, 2024

The computation with wasm-bhtsne seems to be very fast, and t-SNE appears to be a solid choice!
However, did you consider any other algorithms, or do you have resources on why (bh)t-SNE might be superior to e.g. UMAP or PCA?

Some resources:

I'm just curious which algorithm might perform best overall.
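Of the alternatives mentioned, PCA is by far the simplest to run client-side as a linear baseline. A minimal sketch via power iteration (illustrative only, not tied to wasm-bhtsne or any particular library) shows what that would look like:

```typescript
// Illustrative sketch: extract the top principal component of a dataset
// using power iteration on the (implicit) covariance matrix. This is a
// linear baseline for comparison, not a replacement for t-SNE or UMAP.

function topPrincipalComponent(data: number[][], iters: number = 200): number[] {
  const n = data.length;
  const d = data[0].length;

  // Center the data so the covariance is taken around the mean.
  const mean = new Array(d).fill(0);
  for (const row of data) row.forEach((x, j) => (mean[j] += x / n));
  const centered = data.map(row => row.map((x, j) => x - mean[j]));

  // Power iteration: repeatedly apply v -> C v (computed implicitly as
  // sum_i (x_i . v) x_i) and renormalize; converges to the top eigenvector.
  let v = Array.from({ length: d }, () => Math.random() + 0.1);
  for (let t = 0; t < iters; t++) {
    const next = new Array(d).fill(0);
    for (const row of centered) {
      const proj = row.reduce((s, x, j) => s + x * v[j], 0);
      row.forEach((x, j) => (next[j] += proj * x));
    }
    const norm = Math.hypot(...next);
    v = next.map(x => x / norm);
  }
  return v; // unit-length direction of maximum variance
}
```

Projecting each vector onto the top few such components gives a fast 2D layout; it will miss nonlinear cluster structure that t-SNE or UMAP can reveal, which is the usual trade-off.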

@generall (Member) commented
Hi @do-me, it looks like the main bottleneck is not the algorithm but the implementation, specifically the ability to use multiple threads for the computation. If there is a solid implementation of any of those algorithms, we are happy to make it configurable.

@do-me commented Jan 29, 2024

@kartik-gupta-ij if I saw correctly, the wasm-bhtsne implementation has all parameters hard-coded apart from the number of iterations. For example, due to the default perplexity, it crashes with fewer than 60 vectors (I wrote a dirty fix here that adds random vectors to the calculation so it works in either case). Also, depending on the matrix size, other defaults might be better suited.

Is anyone interested in forking and adding the missing parameters to the wasm version? Unfortunately, I'm not experienced in Rust.

Apart from the lack of parameters, it's fairly performant: 1566 vectors of size 384 with 1000 iterations take about 20 s on my i7 laptop.
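The padding workaround described above can be sketched as a small helper. This is a hypothetical illustration: the ~60-row threshold is assumed from the crash behaviour reported here (t-SNE implementations typically require roughly more than 3 × perplexity samples), and the actual wasm-bhtsne API is not shown:

```typescript
// Hypothetical helper (not part of wasm-bhtsne): pad small datasets with
// random filler vectors so a t-SNE run with a hard-coded perplexity does
// not crash. The filler rows' embeddings can be discarded afterwards.

const MIN_ROWS = 60; // assumed threshold implied by the hard-coded perplexity

function padVectors(data: number[][], minRows: number = MIN_ROWS): number[][] {
  if (data.length >= minRows) return data;
  const dim = data[0].length;
  const padded = data.slice(); // keep the original rows first
  while (padded.length < minRows) {
    padded.push(Array.from({ length: dim }, () => Math.random()));
  }
  return padded;
}

// Usage sketch: run t-SNE on padVectors(vectors), then keep only the
// first vectors.length rows of the resulting embedding.
```

Keeping the real vectors first makes it trivial to slice the filler rows back out of the output, though the random points still influence the layout slightly, which is why exposing perplexity in the wasm build would be the cleaner fix.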

@functorism (Contributor) commented

@generall This demonstrates a method for dimensionality reduction based on an HNSW index. It's very fast compared to t-SNE. If Qdrant provided a way to produce such results for the frontend based on the collection's HNSW index, then (in my testing) visualizing even tens of thousands of nodes would take mere seconds.

https://github.com/jean-pierreBoth/annembed/blob/master/src/embedder.rs
