langdetect error (500): duplicate of the same language profile, using REST endpoint #17

Open
marbleman opened this issue Mar 25, 2016 · 7 comments

@marbleman

I have noticed a strange error caused by langdetect that I never saw on my old 1.7 setup.
I am using the PHP Elasticsearch\Client, which uses Guzzle for the HTTP connection (this may or may not be part of the problem).

Everything is fine as long as only one thread on the PHP server is talking to the ES cluster. As soon as I open a second thread, I randomly see exceptions in ES like:

[2016-03-25 01:21:23,599][ERROR][org.xbib.elasticsearch.module.langdetect.LangdetectService] duplicate of the same language profile: en
java.io.IOException: duplicate of the same language profile: en
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.addProfile(LangdetectService.java:205)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.loadProfileFromResource(LangdetectService.java:199)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.load(LangdetectService.java:148)
    at org.xbib.elasticsearch.module.langdetect.LangdetectService.setProfile(LangdetectService.java:223)
    at org.xbib.elasticsearch.action.langdetect.TransportLangdetectAction.doExecute(TransportLangdetectAction.java:32)
    at org.xbib.elasticsearch.action.langdetect.TransportLangdetectAction.doExecute(TransportLangdetectAction.java:16)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
    at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:52)
    at org.elasticsearch.rest.BaseRestHandler$HeadersAndContextCopyClient.doExecute(BaseRestHandler.java:83)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:351)
    at org.xbib.elasticsearch.rest.action.langdetect.RestLangdetectAction.handleRequest(RestLangdetectAction.java:30)
    at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:54)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:207)

The language is different in each log entry, and each log entry seems to relate to a different request.
I am using the REST endpoint and have limited the languages in elasticsearch.yml to about 10.
Before I dig deeper and start experimenting with combinations of settings and other time-consuming things, I hope you can give me a hint about the best starting point for the investigation.

Thx in advance!

@jprante
Owner

jprante commented Mar 25, 2016

Looks like a race condition. LangdetectService is not thread safe. I think it will help to synchronize the call to LangdetectService in TransportLangdetectAction.

@marbleman
Author

Thanks for the hint!! However, I am afraid that kind of change is beyond what I can do right now.
AFAIK the ES PHP client round-robins across all cluster nodes. The race condition probably shows up when two requests hit the same node at the same time. That would explain the strange random factor.

I'll try directing each thread to a dedicated cluster node.

@jprante
Owner

jprante commented Mar 25, 2016

Yes, the race condition is two threads executing on the same node. I will push a fix today; it just wraps the execution of detectAll in a synchronized statement.
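
For readers following along, here is a minimal sketch of what wrapping the call in a synchronized statement can look like. It is not the plugin's actual code: `ToyLangdetectService`, `detectSynchronized`, `runConcurrently`, and the shared `LOCK` object are hypothetical names, and the toy service only imitates the pattern visible in the stack trace (every request reloading profiles into shared, unsynchronized state).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedDetectSketch {

    // Hypothetical stand-in for a service that keeps unsynchronized shared state.
    static class ToyLangdetectService {
        private final List<String> loaded = new ArrayList<>();

        // Reloads the profile list; unsafe if two threads interleave here.
        void setProfile(String... langs) throws IOException {
            loaded.clear();
            for (String lang : langs) {
                if (loaded.contains(lang)) {
                    throw new IOException("duplicate of the same language profile: " + lang);
                }
                loaded.add(lang);
            }
        }

        // Toy detection: returns the first loaded profile, ignoring the text.
        String detectAll(String text) {
            return loaded.isEmpty() ? "unknown" : loaded.get(0);
        }
    }

    private static final ToyLangdetectService SERVICE = new ToyLangdetectService();
    private static final Object LOCK = new Object();

    // Only one thread at a time may reload profiles and run detection.
    static String detectSynchronized(String text) throws IOException {
        synchronized (LOCK) {
            SERVICE.setProfile("en", "de", "fr");
            return SERVICE.detectAll(text);
        }
    }

    // Hammers detectSynchronized from several threads and counts failed calls.
    static int runConcurrently(int threads, int callsPerThread) throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    try {
                        detectSynchronized("hello world");
                    } catch (IOException | RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("failures: " + runConcurrently(4, 10_000));
    }
}
```

The trade-off of this approach is that requests on one node are serialized through the lock, so it buys correctness at the cost of parallelism. Removing the `synchronized` block from the sketch lets threads interleave inside `setProfile`, which is the kind of interleaving that can produce the spurious "duplicate" error.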

@jprante
Owner

jprante commented Mar 25, 2016

@marbleman
Author

Amazing!! Unfortunately I cannot install it:

ERROR: java.lang.IllegalStateException: jar hell!
class: org.apache.lucene.analysis.ar.ArabicAnalyzer$DefaultSetHolder
jar1: /usr/share/elasticsearch/lib/lucene-analyzers-common-5.4.1.jar
jar2: /tmp/1504669576103186/temp_name-206789507/lucene-analyzers-common-5.4.1.jar

@jprante
Owner

jprante commented Mar 25, 2016

Thanks.

My build procedure is broken. As a quick fix, just remove lucene-core-5.4.1.jar and lucene-analyzers-common-5.4.1.jar from the plugins/bundle directory...

@marbleman
Author

Thaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaank you so much! Runs like hell, but without jar hell now... and multithreaded without any errors!
