
Great product, thanks for releasing it for the open source community. Have you considered replacing Neo4j with something more cost-effective like Memgraph?


When I worked as a consultant in the knowledge graph area, I ran into clients over and over again who had failed with Neo4j. I never had that failure myself, because I read the Neo4j manual and understood that it just wasn't an industrial-strength tool.


Can you recommend any graph DBs in particular, preferably with some discussion of why?


There is no perfect product because of the high diversity of graph workloads.

I am inclined to like SPARQL databases because of their multiscale nature. You can have a tiny SPARQL database in RAM that you use like a hashtable and also have a big one with a few billion triples. It is a common situation that you want to gather all the facts to make a decision about a case (such as handling a customer at a call center), and it is reasonable to fetch all of that and hold it in RAM.
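To make the "tiny database in RAM used like a hashtable" idea concrete, here is a minimal sketch in plain Python. It is not a SPARQL engine; `TinyTripleStore`, `facts_about`, and the sample identifiers are hypothetical names invented for illustration. It only shows the shape of "fetch every fact about one case and keep it in memory".

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory triple store indexed by subject (illustrative only)."""

    def __init__(self):
        # subject -> set of (predicate, object) pairs
        self._by_subject = defaultdict(set)

    def add(self, s, p, o):
        self._by_subject[s].add((p, o))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        subjects = [s] if s is not None else list(self._by_subject)
        for subj in subjects:
            for pred, obj in self._by_subject.get(subj, ()):
                if (p is None or pred == p) and (o is None or obj == o):
                    yield (subj, pred, obj)

    def facts_about(self, s):
        """Return everything known about one subject as predicate -> set of objects."""
        facts = defaultdict(set)
        for _, pred, obj in self.match(s=s):
            facts[pred].add(obj)
        return dict(facts)


# Hypothetical call-center data.
store = TinyTripleStore()
store.add("customer:42", "name", "Ada")
store.add("customer:42", "openTicket", "ticket:7")
store.add("customer:42", "openTicket", "ticket:9")
store.add("ticket:7", "status", "waiting")

case = store.facts_about("customer:42")
```

A real SPARQL store would expose the same access pattern through queries, at sizes from a handful of triples up to billions.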

Two major problems w/ SPARQL databases are:

(1) even though RDF has two official ways to represent ordered collections and there is an unofficial one that works very well, SPARQL does not have facilities to work with ordered collections like you would have in N1QL or AQL or similar document-oriented query languages. This could be added but it hasn't been done.

(2) If you are writing transactional or agentic systems in SQL, you have a lot of help in that a "row" is a unit to do inserts, deletes, and updates in. It is not so easy to get it right if you are updating a triple at a time. There are algorithms to define the part of a graph that forms a "record" (e.g. go to the right from a starting node, passing through blank nodes, not passing through URIs), but this is all stuff you have to implement yourself.
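The "record" rule in point (2) can be sketched roughly as follows. This is one possible reading of it, not a standard library routine: triples are plain tuples, blank nodes are assumed to be strings starting with `_:` (the N-Triples convention), and `extract_record` and the `ex:` identifiers are hypothetical.

```python
def is_blank(node):
    # Assumption: blank nodes are strings starting with "_:".
    return isinstance(node, str) and node.startswith("_:")


def extract_record(triples, start):
    """Collect the triples that form the 'record' rooted at `start`.

    Edges are followed from subject to object only. Traversal continues
    through blank-node objects and stops at URIs and literals.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))

    record = []
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for p, o in by_subject.get(node, []):
            record.append((node, p, o))
            # Only blank nodes are expanded; a URI object ends the walk.
            if is_blank(o) and o not in seen:
                seen.add(o)
                stack.append(o)
    return record


graph = [
    ("ex:order1", "ex:customer", "ex:alice"),
    ("ex:order1", "ex:shipTo", "_:addr"),
    ("_:addr", "ex:city", "Berlin"),
    ("_:addr", "ex:geo", "_:pt"),
    ("_:pt", "ex:lat", "52.5"),
    ("ex:alice", "ex:name", "Alice"),  # belongs to a different record
]

record = extract_record(graph, "ex:order1")
```

With the record in hand, an update can delete and reinsert that set of triples as one unit, which is roughly the role a row plays in SQL.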

---

Salesforce.com has a patent, recently expired, that covers a triple store that automatically profiles itself and builds indexes for very efficient query execution. If this were built into graph database products it could be game changing, but so far it isn't.

---

There is "graph as a universal data structure" as in "the graph of pointers in a C program" and then there are the "graph algorithms" that Mark Newman writes about. The latter are much less interesting than the former (go bomb the #1 centrality node in a terrorist network -- did you win the war?)

If you are doing the latter, or any kind of really intensive job, you may be better off doing it as a batch job. In fact, back in the day I developed Hadoop-based pipelines to do things like reconstruct the relationships inside the Freebase data dump.

----

For quite a few projects I've used ArangoDB, which is a great product, but the license really sucks. I have something I have been working on for a while that uses it, and if I am going to either open source or commercialize it I'm going to have to switch to something else.


Thanks for the feedback! Yes, we are definitely planning to add support for other graph datastores including Memgraph and others.


Do the structure of the data and the required query patterns demand a graph store for acceptable performance? Would a Postgres-based triplestore with recursive CTEs suck badly?


Yes, it won't scale well. I used Postgres exactly the way you describe at a previous job, and it didn't scale well after a certain point.
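For readers unfamiliar with the setup being discussed, here is a minimal sketch of a relational triplestore queried with a recursive CTE. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs without a server. The table layout, index, and `knows` predicate are assumptions for illustration, not the schema either commenter used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute("CREATE INDEX idx_triples_sp ON triples (s, p)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("a", "knows", "b"),
        ("b", "knows", "c"),
        ("c", "knows", "a"),  # cycle back to the start
        ("c", "knows", "d"),
        ("x", "knows", "y"),  # disconnected from "a"
    ],
)

# Transitive closure over 'knows' edges. UNION (rather than UNION ALL)
# drops duplicate rows, which is what lets the recursion stop on cycles.
REACHABLE = """
WITH RECURSIVE reachable(node) AS (
    SELECT ?
    UNION
    SELECT t.o
    FROM triples AS t
    JOIN reachable AS r ON t.s = r.node
    WHERE t.p = 'knows'
)
SELECT node FROM reachable ORDER BY node
"""

reachable = [row[0] for row in conn.execute(REACHABLE, ("a",))]
```

Each step of the recursion is another join against the whole `triples` table, so deep or wide traversals over a large table are the kind of query where this approach would be expected to strain.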



