Practical Examples Using Ngt Command

masajiro edited this page May 25, 2020 · 2 revisions

This page describes practical examples that use a large-scale dataset with the default NGT graph index (ANNG).

Dataset generation

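First, a dataset must be prepared to demonstrate searching a large-scale dataset with NGT. After downloading the fastText dataset, convert it to the NGT registration format as follows. The conversion drops the header line and the leading word column, leaving only the space-separated vector values.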

curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip
zcat wiki-news-300d-1M-subword.vec.zip | tail -n +2 | cut -d " " -f 2- > objects.ssv
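Before building an index, it can be worth confirming that every line of the converted file has the expected number of values. The following is a minimal sketch, not part of NGT: `count_fields` is a hypothetical helper name, and it assumes the file is whitespace-separated as produced by the commands above.

```shell
# Summarize how many lines have each field count.
# For a well-formed 300-dimensional file, a single row starting with 300 is expected.
count_fields() {
  awk '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}

# Run the check only when the converted file is present.
if [ -f objects.ssv ]; then
  count_fields objects.ssv
fi
```

Each output row is a field count followed by the number of lines with that count. More than one row suggests that some lines were truncated or converted incorrectly.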

objects.ssv is a registration file that has 1 million objects, one per line. Next, three objects from within the file (lines 99,998 to 100,000) are extracted as queries.

head -100000 objects.ssv | tail -3 > queries.ssv
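The `head | tail` pipeline selects a contiguous range of lines. As a sketch of the same idea with explicit line numbers, the hypothetical helper `extract_lines` below uses `sed` to print a 1-based, inclusive range and stop reading once the range ends.

```shell
# Print lines START..END of FILE (1-based, inclusive) and quit at END
# so the rest of a large file is not scanned.
extract_lines() {
  sed -n "${2},${3}p;${3}q" "$1"
}

# Equivalent to: head -100000 objects.ssv | tail -3 > queries.ssv
if [ -f objects.ssv ]; then
  extract_lines objects.ssv 99998 100000 > queries.ssv
fi
```

Writing the range explicitly makes it easier to see why the rank-1 IDs in the search results further down are 99998, 99999, and 100000: the IDs there are consistent with 1-based line numbers of objects.ssv.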

ANNG Construction and Search

An ANNG index is constructed for the 300-dimensional vectors (-d 300) with cosine similarity as the distance measure (-D c).

ngt create -d 300 -D c fasttext.anng objects.ssv
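To get a feel for what a cosine-based distance looks like, two registered vectors can be compared by hand. The sketch below is not NGT code: `cosine_distance` is a hypothetical helper, and it assumes that the reported distance is 1 minus the cosine similarity and that object IDs correspond to 1-based line numbers of objects.ssv. Both assumptions are consistent with a query matching itself at distance 0, but they are not confirmed by this page.

```shell
# Print 1 - cos(x, y) for lines A and B (1-based) of a whitespace-separated vector file.
# Assumption: this mirrors the distance reported for an index created with -D c.
cosine_distance() {
  awk -v a="$2" -v b="$3" '
    NR == a { for (i = 1; i <= NF; i++) x[i] = $i; nx = NF }
    NR == b { for (i = 1; i <= NF; i++) y[i] = $i; ny = NF }
    NR > a && NR > b { exit }   # both vectors read; skip the rest of the file
    END {
      if (nx == 0 || nx != ny) { print "missing line or dimension mismatch" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= nx; i++) { dot += x[i] * y[i]; xx += x[i] * x[i]; yy += y[i] * y[i] }
      if (xx == 0 || yy == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
      printf "%.6f\n", 1 - dot / sqrt(xx * yy)
    }' "$1"
}

# Compare two objects by line number when the data file is available.
if [ -f objects.ssv ]; then
  cosine_distance objects.ssv 99998 52298
fi
```

The printed value can be compared with the Distance column of a search result for the same pair of IDs; small differences may arise from floating-point precision.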

The ANNG index can be searched with the queries as follows, returning the 10 nearest objects for each query (-n 10).

ngt search -n 10 fasttext.anng queries.ssv

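Below are the search results. Because the queries were taken from the registered objects, the rank-1 result for each query is the query object itself, with a distance of 0.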

Query No.1
Rank    ID      Distance
1       99998   0
2       52298   0.305776
3       75134   0.316977
4       207850  0.345267
5       258522  0.347003
6       307367  0.356967
7       538054  0.379649
8       76751   0.386644
9       535024  0.390781
10      202010  0.392031
Query Time= 0.00144647 (sec), 1.44647 (msec)
Size of Memory Usage=1531284
Query No.2
Rank    ID      Distance
1       99999   0
2       291507  0.232563
3       207863  0.285354
4       122249  0.3664
5       349590  0.37732
6       259506  0.380484
7       96071   0.390346
8       312097  0.400417
9       382245  0.404268
10      84622   0.404282
Query Time= 0.00166992 (sec), 1.66992 (msec)
Size of Memory Usage=1531340
Query No.3
Rank    ID      Distance
1       100000  0
2       565218  0.384514
3       623867  0.404919
4       194709  0.43841
5       206136  0.452629
6       927014  0.45504
7       66427   0.457764
8       772264  0.463388
9       456866  0.463402
10      742553  0.463514
Query Time= 0.00255108 (sec), 2.55108 (msec)
Size of Memory Usage=1531344
Average Query Time= 0.00188916 (sec), 1.88916 (msec), (0.00566747/3)
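The text output above is convenient to read but awkward to post-process. As a rough sketch, the hypothetical helper `flatten_results` below turns that output into one row per result. It assumes the output layout shown above (a `Query No.N` line followed by rows of rank, ID, and distance) and may need adjusting if the layout differs between NGT versions.

```shell
# Convert `ngt search` text output (read from stdin) into "query rank id distance" rows.
flatten_results() {
  awk '
    # "Query No.3" -> remember 3 as the current query number.
    /^Query No\./ { sub(/^Query No\./, ""); q = $1; next }
    # Result rows have exactly three fields, starting with a numeric rank and ID;
    # header, timing, and memory lines do not match and are skipped.
    NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print q, $1, $2, $3 }
  '
}

# Run only when the ngt command and the index are available.
if command -v ngt >/dev/null 2>&1 && [ -e fasttext.anng ]; then
  ngt search -n 10 fasttext.anng queries.ssv | flatten_results
fi
```

The flattened rows can then be sorted, joined with the original word list, or compared between runs with standard tools such as `sort` and `diff`.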

When higher accuracy is needed, specify a search_range_coefficient value (the -e option) larger than the default of 0.1, as shown below.

ngt search -n 10 -e 0.15 fasttext.anng queries.ssv

When a shorter query time is needed at the expense of accuracy, specify a smaller search_range_coefficient value.
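One way to explore this trade-off is to repeat the same search with several coefficient values and compare the reported average query times. The following is a minimal sketch: `sweep_epsilon` is a hypothetical helper, the values 0.05, 0.1, and 0.15 are illustrative only, and the `grep` pattern assumes the `Average Query Time` line shown in the output above.

```shell
# Run the same search once per epsilon value and print the average query time for each.
# Usage: sweep_epsilon INDEX QUERIES EPSILON...
sweep_epsilon() {
  index=$1
  queries=$2
  shift 2
  for e in "$@"; do
    printf 'epsilon=%s\t' "$e"
    ngt search -n 10 -e "$e" "$index" "$queries" | grep '^Average Query Time'
  done
}

# Run only when the ngt command and the index are available.
if command -v ngt >/dev/null 2>&1 && [ -e fasttext.anng ]; then
  sweep_epsilon fasttext.anng queries.ssv 0.05 0.1 0.15
fi
```

This only shows the time side of the trade-off. Judging accuracy requires comparing the returned IDs against reference results, for example those obtained with a much larger coefficient.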