[Task]: Figure out a reasonable set of configurations for the search query #2160

Open
acouch opened this issue Sep 17, 2024 · 0 comments
acouch commented Sep 17, 2024


Migrated from navapbc#31
Originally created by @chouinar on Thu, 16 May 2024 14:35:15 GMT


Summary

I've been looking into a few ideas for how we'll actually construct the queries, mappings, and staging configurations.

This is less about the structure of the query and more about configuration. There are a lot of ways we might set things up, and while I expect this will evolve once we're testing, we should aim for something that makes sense for us and works out a few of the oddities.

A few things to investigate:

  • Which type of query we want to use - https://opensearch.org/docs/latest/query-dsl/full-text/simple-query-string/ looks like it would be pretty solid, and gives us some basic things like boolean expressions of the form X OR (Y AND Z), written X | (Y + Z) in its syntax.
    • One thing worth calling out: we can technically support different query approaches if we want. Since this is all handled when constructing the query, we could always have a parameter in the request that dictates which type of query we construct, including potentially using different analyzers or the advanced query string approach: https://opensearch.org/docs/latest/query-dsl/full-text/query-string/
  • Need to figure out when dashes negatively affect things. They are interpreted as token boundaries in a lot of cases, so a search for USAID-ABC can sometimes cause issues because it is treated as USAID and ABC separately. I've found that querying <whatever_field>.keyword fixes it, but this might be something we handle in the mapping or in another manner.
  • Capitalization seems to matter for some fields. The values are stored as lowercase, so a search for "USAID" doesn't give results, but "usaid" does. We either need to adjust how those fields are handled, or just lowercase the query string.
  • Do we want to add a recency bias, so that still-open opportunities from 10 years ago don't rank as relevant (maybe based on the last updated timestamp)? We might be able to do something with the rank field types: https://opensearch.org/docs/latest/field-types/supported-field-types/rank/
  • <TODO - will add more as we find things>
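
To make the options above concrete, here is a minimal sketch of how the query body could be built from a request parameter. It is not the project's implementation: the field names (title, summary, agency.keyword), the recency_score rank feature field, and both function names are hypothetical, and the real mapping may look quite different. It only builds the JSON body and does not talk to OpenSearch.

```python
from typing import Any

# Hypothetical field names; the real index mapping is not shown in this issue.
TEXT_FIELDS = ["title", "summary"]
# A .keyword sub-field is matched as a whole value, so "usaid-abc" is not
# split on the dash the way an analyzed text field would split it.
KEYWORD_FIELDS = ["agency.keyword"]

SUPPORTED_QUERY_TYPES = ("simple_query_string", "query_string")


def build_search_query(
    query_text: str,
    query_type: str = "simple_query_string",
) -> dict[str, Any]:
    """Build a search request body for the requested query type."""
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(f"Unsupported query type: {query_type}")

    if query_type == "simple_query_string":
        # Values are assumed to be stored lowercase, so normalize the input.
        # This is only done here because query_string uses uppercase
        # AND / OR / NOT as operators, which lowercasing would break.
        query_text = query_text.lower()

    return {
        "query": {
            query_type: {
                "query": query_text,
                "fields": TEXT_FIELDS + KEYWORD_FIELDS,
                "default_operator": "AND",
            }
        }
    }


def with_recency_boost(
    search_body: dict[str, Any],
    rank_field: str = "recency_score",
    boost: float = 1.0,
) -> dict[str, Any]:
    """Wrap a query so a rank_feature field can raise the score of newer docs.

    Assumes a rank_feature field (hypothetically named recency_score) is
    populated at index time, for example from the last updated timestamp.
    """
    return {
        "query": {
            "bool": {
                # The original query still decides which documents match.
                "must": [search_body["query"]],
                # The rank feature only influences the relevance score.
                "should": [
                    {"rank_feature": {"field": rank_field, "boost": boost}}
                ],
            }
        }
    }


if __name__ == "__main__":
    import json

    body = with_recency_boost(build_search_query("USAID-ABC | (grants + health)"))
    print(json.dumps(body, indent=2))
```

Keeping the choice of query type, the lowercasing, and the recency boost in separate steps would let us test each configuration on its own before settling on defaults.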

Acceptance criteria

No response

Status: Icebox