Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SemanticMediaWiki compatible with new ElasticSearch #5451

Merged
merged 29 commits into from
Aug 28, 2023

Conversation

marijnvanwezel
Copy link
Contributor

@marijnvanwezel marijnvanwezel commented Feb 27, 2023

This pull request makes Semantic MediaWiki compatible with new ElasticSearch (8.x tested). It is currently still a draft, since I have not yet updated the tests. It is working in my personal testing environment.

There are a number of major changes in the ElasticSearch API, as well as some breaking changes in the ElasticSearch PHP package, that this pull request addresses. Most notably:

@marijnvanwezel
Copy link
Contributor Author

I apologise for the somewhat messy commit history. If/when this is merged, a squash commit might be appropriate.

@jaideraf
Copy link
Member

jaideraf commented Feb 27, 2023

I would suggest you to "Make SemanticMediaWiki compatible with ElasticSearch 7.10.x" wich is probably the last version of ElasticSearch Wikimedia will support[0]. After that, a migration to OpenSearch[1][2] will follow because of license issues[3].

[0] https://www.mediawiki.org/wiki/Extension:CirrusSearch
[1] https://phabricator.wikimedia.org/T280482
[2] https://phabricator.wikimedia.org/T325028
[3] https://en.wikipedia.org/wiki/OpenSearch_(software)#OpenSearch

EDIT: or "Make SemanticMediaWiki compatible with OpenSearch X.x"

@marijnvanwezel
Copy link
Contributor Author

I have not tested it, but I think this PR is also compatible with ElasticSearch 7.x, or requires minimal changes to make it compatible.

@marijnvanwezel marijnvanwezel changed the title Make SemanticMediaWiki compatible with ElasticSearch 8.x Make SemanticMediaWiki compatible with new ElasticSearch Feb 27, 2023
@marijnvanwezel
Copy link
Contributor Author

I will try to see if I can get a test wiki running with ElasticSearch 7.10.2 and this PR somewhere in the coming week (probably Friday).

@malberts
Copy link
Contributor

Thanks for starting with this! Another compatibility question is whether this will still work with ES 6.x, or if these changes can accommodate it. If not, then this will have to be part of SMW 5.x.

IMO if it's very difficult/time-consuming, then it is probably not worth it to complicate the code to support MW 1.35-era ElasticSearch 6.x. Perhaps SMW 5.x is a good time to raise SMW's minimum versions to MW 1.39 and ES 7.x.

@marijnvanwezel
Copy link
Contributor Author

I have confirmed that file ingestion is now also working with this pull request.

Screenshot 2023-02-27 at 16 28 39

@krabina
Copy link
Contributor

krabina commented Mar 3, 2023

See https://phabricator.wikimedia.org/T280482 for reference

@marijnvanwezel
Copy link
Contributor Author

Ad asked Cindy and she will ask the Wikimedia Foundation what their stance will be. For now, it is probably best to:

  1. Make Semantic MediaWiki compatible with the latest open-source version of ElasticSearch (7.10.x);
  2. Wait until Wikimedia updates the minimum requirements of CirrusSearch;
  3. Make Semantic MediaWiki compatible with the versions of ElasticSearch or OpenSearch (depending on the outcome of point 2) that are supported by CirrusSearch;
  4. Try to keep Semantic MediaWiki and CirrusSearch compatible.

@krabina @jaideraf What do you think?

I didn't have time to test this patch on 7.10 unfortunately, but hopefully I will have time for it next week.

@krabina
Copy link
Contributor

krabina commented Mar 3, 2023

I totally agree!

@jaideraf
Copy link
Member

jaideraf commented Mar 3, 2023

I totally agree, too!

@marijnvanwezel
Copy link
Contributor Author

@JeroenDeDauw I am also wondering what you think. In particular, see this comment by jaideraf and this comment by me. Thank you in advance for taking the time to look at this!

@JeroenDeDauw
Copy link
Member

What you wrote sounds reasonable to me

@cicalese
Copy link
Contributor

I was able to test this patch successfully with the following versions:

MediaWiki: 1.39.1
PHP: 7.4.33
Elasticsearch: 7.10.2

@krabina
Copy link
Contributor

krabina commented Aug 28, 2023

Great, thank you for testing! Is this now ready for integration, @JeroenDeDauw, @gesinn-it-gea ?

@krabina krabina marked this pull request as ready for review August 28, 2023 18:11
@JeroenDeDauw JeroenDeDauw merged commit de31c6a into SemanticMediaWiki:master Aug 28, 2023
@wgevaert
Copy link
Contributor

wgevaert commented Sep 5, 2023

Some time ago, we tested the branch from marijnvanwezel/SemanticMediaWiki and one of our testers did find some issues. We have then reverted to a version with ES 6.8, so we do not have full details of the issues.

issues:

  1. Sort parameter does not seem to work. Tested the following wikicode: {{#ask:[[Modification date::+]] |?Modification date |sort=Modification date }}. Does not give results. Without sort it does give results
  2. Running rebuildData gave "errors" and after that "nothing worked anymore" (no results from ask queries. (I do not have more info because we moved back to ES6.8 and testers did not make screenshots).

@krabina
Copy link
Contributor

krabina commented Sep 5, 2023

@cicalese and @marijnvanwezel can you reproduce the sorting problem?
I guess problem 2 would have been noticed.

@marijnvanwezel
Copy link
Contributor Author

I was unable to reproduce problem 2, but I was able to reproduce problem 1. ES reports the following syntax error in queries with sort:

org.elasticsearch.common.ParsingException: Failed to parse; org.elasticsearch.xcontent.XContentParseException: [1:112] [bool] failed to parse field [must]

I am looking into why this happens.

@marijnvanwezel
Copy link
Contributor Author

Query without sort:

{
    "index": "smw-data-mediawiki",
    "body": {
        "_source": false,
        "from": 0,
        "size": 51,
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "filter": {
                            "exists": {
                                "field": "P:29.datField"
                            }
                        }
                    }
                }
            }
        },
        "sort": [
            {
                "subject.sortkey.sort": {
                    "order": "asc"
                },
                "subject.title.sort": {
                    "order": "asc"
                }
            }
        ]
    }
}

Query with sort:

{
    "index": "smw-data-mediawiki",
    "body": {
        "_source": false,
        "from": 0,
        "size": 51,
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [
                            {
                                "bool": {
                                    "filter": {
                                        "exists": {
                                            "field": "P:29.datField"
                                        }
                                    }
                                }
                            },
                            [
                                {
                                    "exists": {
                                        "field": "P:29.datField"
                                    }
                                }
                            ]
                        ]
                    }
                }
            }
        },
        "sort": [
            {
                "P:29.datField": {
                    "order": "asc"
                }
            }
        ]
    }
}

@marijnvanwezel
Copy link
Contributor Author

The sorting issue is fixed in the above PR.

marijnvanwezel added a commit to WikibaseSolutions/SemanticMediaWiki that referenced this pull request Oct 31, 2023
Credentials have been moved to `$smwgElasticSearchCredentials` in SemanticMediaWiki#5451
JeroenDeDauw pushed a commit that referenced this pull request Oct 31, 2023
* Improve compatibility with OpenSearch

* Fix fatal error on Special:SemanticMediaWiki&action=elastic

* Use tabs instead of spaces

* `endpoints` no longer contains the credentials

Credentials have been moved to `$smwgElasticSearchCredentials` in #5451
@bertrandgorge
Copy link
Contributor

I've added some documentation here: #5614 (hope I got it right)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants