Makes inference endpoint the primary way to download and deploy ELSER and E5 #2765

Merged · 5 commits · Aug 2, 2024
64 changes: 54 additions & 10 deletions docs/en/stack/ml/nlp/ml-nlp-e5.asciidoc
@@ -21,7 +21,11 @@ contextual meaning and user intent, rather than exact keyword matches.
E5 has two versions: one cross-platform version which runs on any hardware
and one version which is optimized for Intel® silicon. The
**Model Management** > **Trained Models** page shows you which version of E5 is
recommended to deploy based on your cluster's hardware.
recommended to deploy based on your cluster's hardware. However, the
recommended way to use E5 is through the
{ref}/infer-service-elasticsearch.html[{infer} API] as a service, which makes
it easier to download and deploy the model and removes the need to choose
between versions.

Refer to the model cards of the
https://huggingface.co/elastic/multilingual-e5-small[multilingual-e5-small] and
@@ -42,17 +46,48 @@ for semantic search or the trial period activated.
[[download-deploy-e5]]
== Download and deploy E5

You can download and deploy the E5 model either from
**{ml-app}** > **Trained Models**, from **Search** > **Indices**, or by using
the Dev Console.
The easiest and recommended way to download and deploy E5 is to use the {ref}/inference-apis.html[{infer} API].

NOTE: For most cases, the preferred version is the **Intel and Linux optimized**
model; it is recommended to download and deploy that version.
1. In {kib}, navigate to the **Dev Console**.
2. Create an {infer} endpoint with the `elasticsearch` service by running the following API request:
+
--
[source,console]
----------------------------------
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}
----------------------------------
--
The API request automatically initiates the model download and then deploys the model.

Refer to the {ref}/infer-service-elasticsearch.html[`elasticsearch` {infer} service documentation] to learn more about the available settings.
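
To verify that the endpoint was created, you can retrieve its configuration.
This is a quick check, assuming the `my-e5-model` endpoint ID used in the
example above:

[source,console]
----------------------------------
// Returns the endpoint configuration if it exists
GET _inference/text_embedding/my-e5-model
----------------------------------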

After you create the E5 {infer} endpoint, it is ready to use for semantic search.
The easiest way to perform semantic search in the {stack} is to {ref}/semantic-search-semantic-text.html[follow the `semantic_text` workflow].
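
For example, this sketch creates an index with a `semantic_text` field that
uses the endpoint; the index name `my-index` and field name `content` are
illustrative placeholders:

[source,console]
----------------------------------
// "my-index" and "content" are placeholder names
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-e5-model"
      }
    }
  }
}
----------------------------------

Documents indexed into `content` are then automatically chunked and embedded
by the E5 endpoint at ingest time.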


[discrete]
[[alternative-download-deploy-e5]]
=== Alternative methods to download and deploy E5

You can also download and deploy the E5 model either from **{ml-app}** > **Trained Models**, from **Search** > **Indices**, or by using the trained models API in the Dev Console.

NOTE: For most cases, the preferred version is the **Intel and Linux optimized** model; it is recommended to download and deploy that version.
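
If you use the trained models API, you can request the optimized variant
explicitly. The following is a sketch, assuming the
`.multilingual-e5-small_linux-x86_64` model ID for the Intel and Linux
optimized version:

[source,console]
----------------------------------
// Sketch: downloads the Intel and Linux optimized variant of E5
PUT _ml/trained_models/.multilingual-e5-small_linux-x86_64
{
  "input": {
    "field_names": ["text_field"]
  }
}
----------------------------------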


.Using the Trained Models page
[%collapsible%closed]
=====
[discrete]
[[trained-model-e5]]
=== Using the Trained Models page
==== Using the Trained Models page

1. In {kib}, navigate to **{ml-app}** > **Trained Models**. E5 can be found in
the list of trained models. There are two versions available: one portable
@@ -80,14 +115,18 @@ allocations and threads per allocation values.
+
--
[role="screenshot"]
image::images/ml-nlp-deployment-id-e5.png[alt="Deploying ELSER",align="center"]
image::images/ml-nlp-deployment-id-e5.png[alt="Deploying E5",align="center"]
--
5. Click **Start**.
=====


.Using the search indices UI
[%collapsible%closed]
=====
[discrete]
[[elasticsearch-e5]]
=== Using the search indices UI
==== Using the search indices UI

Alternatively, you can download and deploy the E5 model to an {infer} pipeline
using the search indices UI.
@@ -116,11 +155,15 @@ image::images/ml-nlp-start-e5-es.png[alt="Start E5 in Elasticsearch",align="center"]

When your E5 model is deployed and started, it is ready to be used in a
pipeline.
=====


.Using the trained models API in Dev Console
[%collapsible%closed]
=====
[discrete]
[[dev-console-e5]]
=== Using the Dev Console
==== Using the trained models API in Dev Console

1. In {kib}, navigate to the **Dev Console**.
2. Create the E5 model configuration by running the following API call:
@@ -149,6 +192,7 @@ with a deployment ID:
POST _ml/trained_models/.multilingual-e5-small/deployment/_start?deployment_id=for_search
----------------------------------
--
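
To confirm that the deployment started, you can request the model stats; this
assumes the `for_search` deployment ID from the call above:

[source,console]
----------------------------------
// Shows deployment state and node routing information
GET _ml/trained_models/.multilingual-e5-small/_stats
----------------------------------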
=====


[discrete]
100 changes: 55 additions & 45 deletions docs/en/stack/ml/nlp/ml-nlp-elser.asciidoc
@@ -80,7 +80,11 @@ computing the similarity between a query and a document.
ELSER v2 has two versions: one cross-platform version which runs on any hardware
and one version which is optimized for Intel® silicon. The
**Model Management** > **Trained Models** page shows you which version of ELSER
v2 is recommended to deploy based on your cluster's hardware.
v2 is recommended to deploy based on your cluster's hardware. However, the
recommended way to use ELSER is through the
{ref}/infer-service-elser.html[{infer} API] as a service, which makes it
easier to download and deploy the model and removes the need to choose
between versions.

If you want to learn more about the ELSER V2 improvements, refer to
https://www.elastic.co/search-labs/introducing-elser-v2-part-1[this blog post].
@@ -105,8 +109,37 @@ that walks through upgrading an index to ELSER V2.
[[download-deploy-elser]]
== Download and deploy ELSER

You can download and deploy ELSER either from **{ml-app}** > **Trained Models**,
from **Search** > **Indices**, or by using the Dev Console.
The easiest and recommended way to download and deploy ELSER is to use the {ref}/inference-apis.html[{infer} API].

1. In {kib}, navigate to the **Dev Console**.
2. Create an {infer} endpoint with the ELSER service by running the following API request:
+
--
[source,console]
----------------------------------
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}
----------------------------------
--
The API request automatically initiates the model download and then deploys the model.

Refer to the {ref}/infer-service-elser.html[ELSER {infer} service documentation] to learn more about the available settings.
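
To verify that the endpoint was created, you can retrieve its configuration.
This is a quick check, assuming the `my-elser-model` endpoint ID used in the
example above:

[source,console]
----------------------------------
// Returns the endpoint configuration if it exists
GET _inference/sparse_embedding/my-elser-model
----------------------------------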

After you create the ELSER {infer} endpoint, it is ready to use for semantic search.
The easiest way to perform semantic search in the {stack} is to {ref}/semantic-search-semantic-text.html[follow the `semantic_text` workflow].
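
For example, this sketch creates an index with a `semantic_text` field that
uses the endpoint; the index name `my-index` and field name `content` are
illustrative placeholders:

[source,console]
----------------------------------
// "my-index" and "content" are placeholder names
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-elser-model"
      }
    }
  }
}
----------------------------------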


[discrete]
[[alternative-download-deploy]]
=== Alternative methods to download and deploy ELSER

You can also download and deploy ELSER either from **{ml-app}** > **Trained Models**, from **Search** > **Indices**, or by using the trained models API in the Dev Console.

[NOTE]
====
@@ -120,10 +153,12 @@ separate deployments for search and ingest mitigates performance issues
resulting from interactions between the two, which can be hard to diagnose.
====


.Using the Trained Models page
[%collapsible%closed]
=====
[discrete]
[[trained-model]]
=== Using the Trained Models page
==== Using the Trained Models page

1. In {kib}, navigate to **{ml-app}** > **Trained Models**. ELSER can be found
in the list of trained models. There are two versions available: one portable
@@ -154,11 +189,14 @@ allocations and threads per allocation values.
image::images/ml-nlp-deployment-id-elser-v2.png[alt="Deploying ELSER",align="center"]
--
5. Click **Start**.
=====


.Using the search indices UI
[%collapsible%closed]
=====
[discrete]
[[elasticsearch]]
=== Using the search indices UI
==== Using the search indices UI

Alternatively, you can download and deploy ELSER to an {infer} pipeline using
the search indices UI.
@@ -184,43 +222,14 @@ model deployment.
[role="screenshot"]
image::images/ml-nlp-start-elser-v2-es.png[alt="Start ELSER in Elasticsearch",align="center"]
--
=====

When your ELSER model is deployed and started, it is ready to be used in a
pipeline.


[discrete]
[[elasticsearch-ingest-pipeline]]
==== Adding ELSER to an ingest pipeline

To add ELSER to an ingest pipeline, you need to copy the default ingest
pipeline and then customize it according to your needs.

1. Click **Copy and customize** under the **Unlock your custom pipelines** block
at the top of the page. This enables the **Add inference pipeline** button.
+
--
[role="screenshot"]
image::images/ml-nlp-pipeline-copy-customize.png[alt="Start ELSER in Elasticsearch",align="center"]
--
2. Under **{ml-app} {infer-cap} Pipelines**, click **Add inference pipeline**.
3. Give a name to the pipeline, select ELSER from the list of trained ML models,
and click **Continue**.
4. Select the source text field, define the target field, and click **Add** then
**Continue**.
5. Review the index mappings updates. Click **Back** if you want to change the
mappings. Click **Continue** if you are satisfied with the updated index
mappings.
6. You can optionally test your pipeline. Click **Continue**.
7. **Create pipeline**.

Once your pipeline is created, you are ready to ingest documents and utilize
ELSER for text expansions in your search queries.


.Using the trained models API in Dev Console
[%collapsible%closed]
=====
[discrete]
[[dev-console]]
=== Using the Dev Console
==== Using the trained models API in Dev Console

1. In {kib}, navigate to the **Dev Console**.
2. Create the ELSER model configuration by running the following API call:
@@ -251,9 +260,7 @@ POST _ml/trained_models/.elser_model_2/deployment/_start?deployment_id=for_search

You can deploy the model multiple times with different deployment IDs.
--
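
For instance, a second deployment dedicated to ingest workloads could be
started as follows; the `for_ingest` deployment ID is an illustrative name:

[source,console]
----------------------------------
// "for_ingest" is an illustrative deployment ID
POST _ml/trained_models/.elser_model_2/deployment/_start?deployment_id=for_ingest
----------------------------------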

After the deployment is complete, ELSER is ready to use either in an ingest
pipeline or in a `text_expansion` query to perform semantic search.
=====


[discrete]
@@ -440,10 +447,12 @@ To learn more about ELSER performance, refer to the <<elser-benchmarks>>.
* {ref}/semantic-search-elser.html[Perform semantic search with ELSER]
* https://www.elastic.co/blog/may-2023-launch-information-retrieval-elasticsearch-ai-model[Improving information retrieval in the Elastic Stack: Introducing Elastic Learned Sparse Encoder, our new retrieval model]


[discrete]
[[elser-benchmarks]]
== Benchmark information

IMPORTANT: The recommended way to use ELSER is through the {ref}/infer-service-elser.html[{infer} API] as a service.

The following sections provide information about how ELSER performs on
different hardware and compare its performance to {es} BM25 and other strong
baselines.
@@ -459,6 +468,7 @@ any platform.


[discrete]
[[version-overview-v2]]
==== ELSER V2

Besides the performance improvements, the biggest change in ELSER V2 is the