Docs: searching for example #27

Open
ThaDafinser opened this issue Mar 21, 2017 · 7 comments

@ThaDafinser
Contributor

ThaDafinser commented Mar 21, 2017

Hello,

I have now tried to complete the examples for Kibana, see
https://gist.github.com/ThaDafinser/d27b4fa9d144b0083ee7dad37484fdd8

For the examples I went through the complete plugin list
https://github.com/jprante/elasticsearch-plugin-bundle#a-plugin-bundle-for-elastisearch

I couldn't find docs for these plugins (@jprante, could you help me here, please?):

  • elasticsearch-analysis-autophrase
  • elasticsearch-analysis-concat (update: found a small example, but dunno the options)
  • elasticsearch-analysis-sortform
  • elasticsearch-analysis-symbolname (update: found a small example, but dunno the options)
  • elasticsearch-analysis-year (update: found a small example, but dunno the options)

Other examples still missing for now (I could not create a "live" example yet):

  • icu_collation (could not create an example via the _analyze API)
  • elasticsearch-analysis-naturalsort (one example added)
  • elasticsearch-analysis-reference (@todo could not create a working example with ES 5.1.2)
  • elasticsearch-mapper-crypt (one example added)
  • elasticsearch-mapper-langdetect (one example added)

Is anything else missing? Once they are finished, do you want them in the README or in a separate file?

@ThaDafinser
Contributor Author

For auto_phrase, this is what I have found so far (I could not get it working):

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "auto_phrase",
      "phrases": [
        "C:/Data/test.txt"
      ]
    }
  ],
  "text": "what is my income tax refund this year now that my property tax is so high"
}
https://github.com/jprante/elasticsearch-plugin-bundle/blob/68dc19c34c40364e04400f92500b973a6cbae170/src/main/java/org/xbib/elasticsearch/index/analysis/autophrase/AutoPhrasingTokenFilterFactory.java
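For trying variations of this request outside Kibana, a minimal Python sketch follows. It assumes an Elasticsearch node at `http://localhost:9200` with the plugin bundle installed. The helper names `build_analyze_body` and `analyze` are made up for illustration. Passing the phrases inline as a list instead of a file path is only a guess at what the `phrases` setting expects; nothing in this thread confirms it, and it is untested against the plugin.

```python
import json
import urllib.request

# Assumption: a local node with the plugin bundle installed.
ES_URL = "http://localhost:9200"


def build_analyze_body(text, filter_type, **filter_options):
    """Build an _analyze request body with a single inline token filter."""
    return {
        "tokenizer": "standard",
        "filter": [dict(type=filter_type, **filter_options)],
        "text": text,
    }


def analyze(body, es_url=ES_URL):
    """POST the body to the _analyze API and return the token strings."""
    request = urllib.request.Request(
        es_url + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [token["token"] for token in result["tokens"]]


# Assumption: inline phrases instead of a file path (unconfirmed guess).
body = build_analyze_body(
    "what is my income tax refund this year now that my property tax is so high",
    "auto_phrase",
    phrases=["income tax", "property tax"],
)
print(json.dumps(body, indent=2))
# tokens = analyze(body)  # needs a running node, so it is left commented out
```

Only the request body is built and printed here; the call to `analyze` is left commented out because it needs a running node.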

@nkrot

nkrot commented Apr 5, 2017

Hi,

In addition to the original issue, LemmatizeTokenFilter lacks a description too. I would appreciate any info on how to configure it, which languages it supports and what is behind this plugin.

To me this plugin looks similar to the baseform plugin. From skimming through the code, my guess is that the lemmatizer replaces the original word, while the baseform filter adds the generated form alongside the original.

Thanx

@ThaDafinser
Contributor Author

@nkrot you have basically given the answer yourself.

I updated the gist with an example. As you said, it keeps only the base form and removes the original word:

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "lemmatize",
      "language": "de"
    }
  ],
  "text": "Ich gehe gerne mit meinen neuen Schuhen"
}
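To check the difference described above (replace the original word versus add the base form alongside it), the same text can be run through both filters and the token lists compared. The Python sketch below only generates the two Kibana console requests. It assumes the baseform filter is registered under the type name `baseform` and accepts the same `language` option, which is not stated in this thread. The helper name `analyze_body` is made up for illustration.

```python
import json

TEXT = "Ich gehe gerne mit meinen neuen Schuhen"


def analyze_body(filter_type, language="de"):
    """Build an _analyze body that applies one language-specific filter."""
    return {
        "tokenizer": "standard",
        "filter": [{"type": filter_type, "language": language}],
        "text": TEXT,
    }


# Assumption: "baseform" is the registered type name of the baseform filter.
bodies = {name: analyze_body(name) for name in ("lemmatize", "baseform")}

# Print both requests in Kibana console form for a side-by-side comparison.
for name, request_body in bodies.items():
    print("GET _analyze")
    print(json.dumps(request_body, indent=2, ensure_ascii=False))
    print()
```

Pasting both printed requests into the Kibana console and comparing the returned tokens should show whether the original words are kept or replaced.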

@nkrot

nkrot commented Apr 5, 2017

@ThaDafinser, thank you. Do you have any info on

  1. respectKeywords, available in the lemmatize plugin
  2. lemmaOnly, available in the lemmatize plugin
  3. where the lemmatizer resources (FSA) come from and how they compare to baseform

thanx,

@ThaDafinser
Contributor Author

Sadly not yet.

The tests contain a lot of examples that show how it should work.

@jprante
Owner

jprante commented Apr 5, 2017

LemmatizeTokenFilter is still a work in progress and in an experimental stage. It is considered an alternative to a synonym token filter https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html
but is based on a language-specific dictionary of known compound words.

@ThaDafinser
Contributor Author

ThaDafinser commented Apr 6, 2017

After going through a lot of examples, code and so on...

I think the best approach would be to create something like this:
https://github.com/ThaDafinser/elasticsearch-plugin-bundle/blob/feature/doc/docs/index.md

There are too many things to explain for a "one pager" (or for putting everything in the README), and with this approach the documentation can be created step by step.

As mentioned at the end, it's similar to the structure of the ES reference guide: https://www.elastic.co/guide/en/elasticsearch/reference/5.3/index.html

@jprante what do you think? If you like it, I will add some more pages and create a PR for it.

3 participants