
Is RNeo4j Transactional Endpoint slow? #52

Open
mamonu opened this issue Mar 9, 2016 · 9 comments

Comments

@mamonu

mamonu commented Mar 9, 2016

I'm used to importing a CSV via the Neo4j console. I had 50,000 rows.
After setting up an index, I imported them in about 0.8 seconds.

Tried the same thing today with the transactional endpoint and it took 3 mins.

Is it that slow or am I doing something wrong?

@nicolewhite
Owner

Can you show me your code?

@mamonu
Author

mamonu commented Mar 9, 2016

Sure. If I go to the Neo4j web interface and run this, for example:


CREATE INDEX ON :Person(person_ID)
//# Added 1 index, statement executed in 1662 ms.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///home/data/SPRINT3-a-v1.csv" AS row
MERGE (a:Person { person_ID: row.person_id1, source: "a" })
RETURN a
//# Returned 12613 rows in 894 ms.

Now if I run the following code for the same data
(after loading it into a data frame called data):


library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")

addIndex(graph, "Person", "person_ID")
getIndex(graph)

t1 <- Sys.time()

query = 'MERGE (a:Person { person_ID: {person_ID}, source: "a" })'

# Append one MERGE per row to a single open transaction, then commit once.
t = newTransaction(graph)

for (i in 1:nrow(data)) {
  person_ID = data[i, ]$person_id1
  appendCypher(t, query, person_ID = person_ID)
}

commit(t)

t2 <- Sys.time()
t2 - t1

I get:

Time difference of 3.250754 mins

@mamonu
Author

mamonu commented Mar 21, 2016

Any news on this? I might be doing something wrong, but from what I understand this is the way to use the transactional endpoint. The performance is worrisome, though.

@nicolewhite
Owner

Sorry, thought I had responded to you. The problem is that you're committing in batches of 1000 in LOAD CSV and in a single batch of 12613 in the R code. It's not really a fair comparison. Can you commit in batches of 1000 in your R code and get back to me?

@mamonu
Author

mamonu commented Mar 21, 2016

OK, I'll do that and get back to you.

@sdoyen

sdoyen commented Apr 9, 2016

Any workaround for this? Thanks.

@mamonu
Author

mamonu commented Apr 18, 2016

Apologies for the long delay; some other projects took my time...
Back to the problem at hand... I ran the following code, which loads the same data as the
LOAD CSV command in Cypher.


library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
clear(graph)
setwd("/home/bigdata/data/")
data <- read.table(file = "SPRINT3-a-v1.csv", sep = ",", header = TRUE)

addIndex(graph, "Person", "person_ID")
getIndex(graph)

query = 'MERGE (a:Person { person_ID: {person_ID}, source: "a" })'

t1 <- Sys.time()
tx = newTransaction(graph)

for (i in 1:nrow(data)) {
  if (i %% 1000 == 0) {
    # Commit the current batch and open a new transaction,
    # mirroring USING PERIODIC COMMIT 1000 in LOAD CSV.
    commit(tx)
    print(paste("Batch:", i / 1000, "committed."))
    tx = newTransaction(graph)
  }

  person_ID = data[i, ]$person_id1
  appendCypher(tx, query, person_ID = person_ID)
}

commit(tx)
print("Last batch committed.")
print("All done!")

t2 <- Sys.time()
t2 - t1

I think this makes a fair comparison (loading the data in batches of 1000).
I still get about 3 minutes for the operation.
Apologies if this code is wrong and I have not understood the concept well...

@mkllr888

I have the same problem. Neither createNode() / createRel() nor appendCypher() is fast enough to use. My workaround is to use getNode() and cypher() with plain queries. I also create CSV files and import them via LOAD CSV. Both have the disadvantage that the R code is not really understandable if the reader doesn't know what Cypher/Neo4j is, and creating the CSV files needs storage.
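Roughly what I mean, as an untested sketch (reusing mamonu's file path and column from above as placeholders):

library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")

# Write the data frame to a CSV the Neo4j server can reach; the server
# reads the file itself, so the path must be visible to it.
write.csv(data, "/home/data/SPRINT3-a-v1.csv", row.names = FALSE)

query = '
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///home/data/SPRINT3-a-v1.csv" AS row
MERGE (p:Person { person_ID: row.person_id1, source: "a" })
RETURN count(p)
'

# Send the whole import as one statement; USING PERIODIC COMMIT may be
# rejected inside an already-open transaction, so run it through
# cypher() rather than newTransaction()/appendCypher().
cypher(graph, query)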

Thanks for your hard work.

@nicolewhite
Owner

Sorry, I don't think the transactional endpoint will ever be as fast as LOAD CSV or neo4j-import. createNode() and createRel() definitely won't be as fast, as they create nodes / relationships one at a time, each in its own transaction.
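That said, you can usually close some of the gap by sending many rows per statement with UNWIND, so each round trip carries a whole batch instead of a single MERGE. An untested sketch (assumes your data frame has a person_id1 column, and that a list of row-lists serializes to a Cypher collection of maps):

library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")

# One statement MERGEs a whole batch of rows via an UNWIND parameter.
query = '
UNWIND {rows} AS row
MERGE (p:Person { person_ID: row.person_ID, source: "a" })
'

batch_size = 1000
for (start in seq(1, nrow(data), by = batch_size)) {
  end = min(start + batch_size - 1, nrow(data))

  # Build the batch as a list of named lists -> JSON array of objects.
  rows = lapply(start:end, function(i) list(person_ID = data[i, ]$person_id1))

  tx = newTransaction(graph)
  appendCypher(tx, query, rows = rows)
  commit(tx)
}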
