Case sensitive #3

Open
leabaertschi opened this issue May 19, 2014 · 4 comments

Comments

@leabaertschi

I'm using this plugin for German text, and it seems to be case-sensitive. Is that right? If so, what's the reason for that?

@jprante
Owner

jprante commented May 19, 2014

In German, there are words whose meaning differs depending on whether they are written in upper- or lowercase (not many, only a few).

Example:

Rasen = grass
rasen = to dash, to rush

@leabaertschi
Author

Yeah, that's true :S. How hard would it be to adapt the files for German and build the plugin ourselves?

@jprante
Owner

jprante commented May 19, 2014

Just fork and feel free to modify https://github.com/jprante/elasticsearch-analysis-baseform/tree/master/src/main/resources to your requirements ;-)

N.B. for lowercasing (with some ambiguities), you could simply combine this baseform analyzer with a lowercase filter.
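
A minimal sketch of what such a combination might look like as index settings, built here as a Python dict and printed as JSON. The filter type name `baseform` and its `language` option are assumptions based on the plugin's naming, and `german_baseform` and `baseform_lowercase` are hypothetical names chosen for illustration; `lowercase` is the built-in Elasticsearch token filter. The lowercase filter is placed after the baseform filter so that the case-sensitive lookup still sees the original casing.

```python
import json


def build_analysis_settings():
    """Build index settings that chain a baseform filter with lowercasing."""
    return {
        "analysis": {
            "filter": {
                # Assumption: the plugin registers a token filter type "baseform"
                # that accepts a "language" option.
                "german_baseform": {"type": "baseform", "language": "de"},
            },
            "analyzer": {
                # Hypothetical analyzer name.
                "baseform_lowercase": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Order matters: reduce to the base form first, then lowercase.
                    "filter": ["german_baseform", "lowercase"],
                }
            },
        }
    }


settings = build_analysis_settings()
print(json.dumps(settings, indent=2))
```

With this ordering, tokens that are already lowercase when they reach the baseform filter would still not be reduced; only the output is normalized.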

@leabaertschi
Author

Actually, our problem is that users might enter the search string in all lowercase, and the plugin then cannot convert it into its base form. The second problem is that we use this plugin in combination with the decompound plugin, which returns its tokens in lowercase, and there are cases where, for some reason, it does not return the tokens in their base form. E.g. Fleischtomaten is converted into fleisch and tomate, but Datteltomaten is converted into dattel and tomaten (still plural), and the baseform plugin then cannot convert tomaten into its base form because it is lowercase.
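
To illustrate the failure described above, here is a toy sketch of a case-sensitive lookup and one possible fallback. The dictionary and both function names are hypothetical; this is not the plugin's actual data or algorithm, only an assumption about how a capitalized retry could work.

```python
# Toy, hypothetical dictionary: inflected form -> base form, case-sensitive keys.
BASEFORMS = {"Tomaten": "Tomate", "Datteln": "Dattel"}


def baseform(token):
    """Case-sensitive lookup: unknown tokens are returned unchanged."""
    return BASEFORMS.get(token, token)


def baseform_with_case_fallback(token):
    """Try the exact token first, then retry with the first letter capitalized."""
    if token in BASEFORMS:
        return BASEFORMS[token]
    capitalized = token[:1].upper() + token[1:]
    if capitalized in BASEFORMS:
        # Lowercase the result so it matches the casing of the lowercase input.
        return BASEFORMS[capitalized].lower()
    return token


print(baseform("Tomaten"))
print(baseform("tomaten"))
print(baseform_with_case_fallback("tomaten"))
```

Trying the exact token first keeps pairs like Rasen and rasen apart whenever the input casing is reliable; the fallback only applies when the exact form is unknown, and it can still pick the wrong reading for such ambiguous words.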
