Unable to do a full import from file (ES 1.3.4) #93

Open
AtzeDeVries opened this issue Jan 16, 2016 · 8 comments

@AtzeDeVries

Hi,

I'm trying to export the data from my ES server (about 22GB, 100K documents, 1 index) to a file. The following happens:

  • If I create a tar.gz, the export stops once the file reaches 2GB. Importing it results in 4GB of Elasticsearch data.
  • If I create a bulk.gz, the export is about 1.8GB. Importing it results in 23GB of data and only 14K documents.
  • If I _push from one server to the other, it works correctly.

I would like to have all the data in a file, since it is portable.
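
The document counts quoted above can be compared between clusters with the Elasticsearch `_count` API. The following is a minimal Python sketch, not part of knapsack; the host names and the index name `nda` are assumptions made for illustration.

```python
import json
from urllib.request import urlopen


def count_url(host, index):
    """Build the _count URL for one index on one cluster."""
    return "http://{}/{}/_count".format(host, index)


def parse_count(body):
    """Extract the document count from a _count JSON response body."""
    return int(json.loads(body)["count"])


def fetch_count(host, index):
    """Ask a running cluster how many documents the index holds."""
    with urlopen(count_url(host, index)) as resp:
        return parse_count(resp.read().decode("utf-8"))


def missing_documents(source_count, target_count):
    """Number of documents present on the source but absent on the target."""
    return max(source_count - target_count, 0)
```

With both clusters reachable, `missing_documents(fetch_count("cluster-a:9200", "nda"), fetch_count("cluster-b:9200", "nda"))` would report how many documents did not arrive (both host names are hypothetical).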

Command to export:

curl -XPOST 'localhost:9200/_export?path=/data/elasticsearch_export/nda_export.bulk.gz'

There are two clusters: cluster A contains 1 node and cluster B contains 3 nodes. I'm trying to move data from A to B.

The download link for the plugin is
http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/${es_version}.0/elasticsearch-knapsack-${es_version}.0-plugin.zip where $es_version is 1.3.4

@jprante
Owner

jprante commented Jan 16, 2016

I forgot to upload 1.3.4.1 in October. Now it's there. Can you try 1.3.4.1 to check if the problems persist? Thanks.

http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/1.3.4.1/

@AtzeDeVries
Author

Hi,

So I did a lot of testing and found the solution. The mapping was not transferred (or not transferred correctly) to the new server when moving the data via a file. If I inject the mapping before the _import, it seems to work fine (the export is a bulk.gz of one index).
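
One way to inject the mapping before the `_import` is to read it from the source cluster and create the index on the target cluster with it. This is a rough Python sketch under stated assumptions, not the exact steps used in this thread: the host names and the index name are hypothetical, and it copies mappings only, not index settings such as analyzers.

```python
import json
from urllib.request import Request, urlopen


def create_index_body(mapping_response, index):
    """Turn a GET /<index>/_mapping response into a PUT /<index> request body."""
    return {"mappings": mapping_response[index]["mappings"]}


def copy_mapping(source_host, target_host, index):
    """Create `index` on the target cluster with the source cluster's mappings.

    Assumes the index does not exist yet on the target cluster.
    """
    # Read the mappings from the source cluster.
    with urlopen("http://{}/{}/_mapping".format(source_host, index)) as resp:
        mapping_response = json.loads(resp.read().decode("utf-8"))

    # Create the index on the target cluster with those mappings.
    body = json.dumps(create_index_body(mapping_response, index)).encode("utf-8")
    request = Request(
        "http://{}/{}".format(target_host, index),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `copy_mapping("cluster-a:9200", "cluster-b:9200", "nda")` before running the `_import` on cluster B would leave the index ready to receive the bulk data.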

@jprante
Owner

jprante commented Jan 19, 2016

Yes. The bulk archive is not able to transport mappings. The ES bulk format has no mechanism for creating mappings, only for document indexing.

@AtzeDeVries
Author

OK, then the issue of non-bulk exports stopping at 2GB still stands. I did not try exporting to a tar file instead of tar.gz. I did test splitting the export into multiple files, but the total size of the tar.gz files was still 2GB.

@jprante
Owner

jprante commented Jan 19, 2016

Yes, I checked. The fix was not backported.

If you can build from source, here is a quick fix:

Set longFileMode in this line

https://github.com/jprante/elasticsearch-knapsack/blob/1.3/src/main/java/org/xbib/io/archive/tar/TarArchiveOutputStream.java#L84

to LONGFILE_GNU

@AtzeDeVries
Author

So it is only a problem with tar files? Then I could just use .zip, which is fine by me. (I can't test at the moment, since the testing server is running a different job.)

@jprante
Owner

jprante commented Jan 19, 2016

Yes, it's a tar format peculiarity: the original tar format is limited to 2GB, while POSIX tar and GNU tar are not.
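
The way a tar variant caps the member size can be illustrated outside of knapsack. The sketch below uses Python's standard `tarfile` module, not the Java `TarArchiveOutputStream` linked above, so the ceiling it shows is that of Python's implementation (its ustar header only holds sizes below 8 GiB) and may differ from the 2GB limit seen in this issue. The member name is a hypothetical placeholder.

```python
import tarfile


def header_fits(size, fmt):
    """Return True if a member of `size` bytes can be described in format `fmt`."""
    info = tarfile.TarInfo(name="export-member")  # hypothetical member name
    info.size = size
    try:
        # Building the header raises ValueError when the size field overflows.
        info.tobuf(format=fmt)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    size = 22 * 1024 ** 3  # roughly the 22GB index size mentioned in this issue
    for label, fmt in [
        ("ustar", tarfile.USTAR_FORMAT),
        ("gnu", tarfile.GNU_FORMAT),
        ("pax", tarfile.PAX_FORMAT),
    ]:
        print(label, header_fits(size, fmt))
```

In Python's implementation the ustar header cannot describe a member of that size, while the GNU and pax formats can, which mirrors the difference between the original and the extended tar formats described above.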

@AtzeDeVries
Author

OK, then I'll try the zip method tomorrow. I'll report back on that.
