There were two emails sent; you must have seen the screenshot from only the first one.
The second was a notification that they would be charging our "prepaid accounts" the next day (Saturday, Sept 17) for the next month's services. It read much more like an invoice (breaking down what they would charge us for) than the first email did.
Anyone who did have a credit card attached to their account is looking at it from the fraud angle. Everyone else is wondering if they're going to try to force some kind of debt collection, because we are not going to pay. That's likely where the knives-out reactions are coming from.
Doing an automatic in-place "upgrade" from a free tier to a paid tier on years-old accounts (and without customer consent!) is a scummy move. Even if not intended to be malicious, it's an idiotic thing to do, especially business-wise. They just flushed away any goodwill left over from their acquisition of the FogBugz name.
The cherry on top is that any attempt to log into the account to cancel it, or to contact customer service (which requires creating a customer service account), errors out, so all of the legitimate ways one might think of to address the situation are blocked.
The purpose of a license is to grant rights under conditions to those who would otherwise have no rights to use or modify the software at all.
As the copyright owner, you already have full rights to use the code you wrote the way you want, so you do not need to enter a license agreement with yourself to gain rights you already have. You are therefore not subject to the license; the license is a grant from you, the copyright owner, to others who are NOT the copyright owner.
As for public contributions to GPL'd code (such as Neo4j Community Edition), there are usually agreements (contributor license agreements or copyright transfer agreements) made prior to merging the contributed code that resolve issues of relicensing and usage.
This answer on StackExchange provides some good detail.
I wouldn't say relational databases are poorly designed, any more than I'd say a hammer is poorly designed because it makes for a bad screwdriver. A hammer is still excellent at working with nails; it's hard to find a better tool for that. This is just about using the right tool for the job.
Back when data was simpler and not as big, relational databases were perfect, and there have been years of engineering and bug fixes that have gone into them. They are excellent at what they do, and they continue to improve.
But as technology has improved, as our disks and memory have gotten bigger, as the data we collect and want to query over has gotten bigger, and as our queries have gotten more complex, we've been running up against the limitations of log(n) joins and relational database technology for some use cases. Now, not every problem is a nail. Some are screws. Some are more exotic.
That's been the reason for NoSQL databases in the first place: to try to address the shortcomings that arise as data gets bigger and more complex, and as queries and operations become more complex over large data.
log(n) joins are fine...until data explodes, and you're no longer doing just a handful of joins per query but a very large number of them, maybe even unbounded, and maybe the rules for what data to traverse have loose or even no restrictions. When your data is graphy, when the questions you want to answer require traversals of this scale and nature, and when you want to make sure your traversal costs are proportional only to the graph you want to traverse (and not to the total data in the database), then graph databases provide a very good tool for modeling and efficiently querying that data.
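To make the scaling point concrete, here's a toy Python sketch (purely illustrative, not how any real database is implemented): a breadth-first traversal over adjacency lists does work proportional only to the relationships it actually follows, no matter how much other data sits in the store.

```python
from collections import deque

# Toy graph as adjacency lists: expanding a node touches only that node's
# own relationships, never the total data set.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
    # ...millions of other nodes could exist here without slowing the
    # traversal below, because we never scan them.
}

def reachable(start, max_depth):
    """Breadth-first traversal; the work done is proportional to the
    relationships actually followed, not to len(graph)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

print(sorted(reachable("alice", 2)))  # ['alice', 'bob', 'carol', 'dave']
```

A join-based implementation of the same question would instead pay a lookup cost tied to the size of the joined tables at every hop.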
Graph databases are relatively young, compared to relational databases. Yet their usage has been proven, especially as more graphy problems and data have grown more common.
Relational databases are still useful, and still improving, and graph databases will also continue to grow and improve side by side with them.
We even have a GQL initiative on the language side, aimed at becoming an ISO standard holding a position equivalent to SQL's, but for graph querying. That should speak to the value and importance of the paradigm.
The fundamental premise of the relational model is the physical/logical distinction. The relational model deliberately does not make any requirements or assumptions about how data is physically stored or structured.
The difference between relational and graph (and other NoSQL database systems) is not about particular sizes and shapes of data; it is about level of abstraction. For example, assuming joins are "log(n)" makes certain assumptions about how relations and indexes are implemented, which are only true for some naive implementations (like Access or MySQL).
Just as an example, materialized views are a physical-level optimization where an arbitrarily complex query result is stored and kept updated, which means data can be retrieved as fast as physically possible. Of course this has a cost at insert time, since materialized views also have to be updated - but this is a performance trade-off, just like the structure of a graph database is a performance trade-off.
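A toy Python sketch of that insert-time/read-time trade-off (illustrative only; real materialized views live inside the database engine and are maintained by it):

```python
# Hypothetical table that maintains a precomputed aggregate, standing in
# for a materialized view: inserts pay extra to keep the view in sync,
# and reads become a constant-time lookup.
class OrdersTable:
    def __init__(self):
        self.rows = []               # the base table
        self.total_by_customer = {}  # the "materialized view"

    def insert(self, customer, amount):
        self.rows.append((customer, amount))
        # Extra insert-time cost: update the view.
        self.total_by_customer[customer] = (
            self.total_by_customer.get(customer, 0) + amount
        )

    def total(self, customer):
        # Read-time cost is O(1) regardless of how many rows exist,
        # instead of scanning and aggregating self.rows on every query.
        return self.total_by_customer.get(customer, 0)

t = OrdersTable()
t.insert("acme", 100)
t.insert("acme", 50)
print(t.total("acme"))  # 150
```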
NoSQL databases have a tight coupling between the physical and logical structure, which makes them easier to optimize for particular usage patterns but harder to adapt to changing requirements over time. The relational model was specifically designed for large databases used by multiple applications and changing over time.
To propose a different perspective, a relationship in a graph db is like a materialized join. You pay on relationship creation (you might be using index lookups to find the nodes to connect, similar to a relational db); then traversal is just pointer hopping across the relationships to the connected nodes. Aside from the initial lookup of the starting node(s), traversing the graph won't use indexes at all, so each hop becomes a constant-time operation.
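A toy Python illustration of the "materialized join" idea (all names hypothetical, nothing Neo4j-specific): the index is paid for once at relationship-creation time, and traversal afterwards is pure pointer hopping.

```python
# A node holds direct references to its connected nodes.
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references: the "materialized join"

index = {}  # only used to find nodes by name, like a starting-node lookup

def create_node(name):
    index[name] = Node(name)

def connect(a_name, b_name):
    # Pay the index lookups once, at relationship-creation time...
    index[a_name].out.append(index[b_name])

create_node("a"); create_node("b"); create_node("c")
connect("a", "b"); connect("b", "c")

# ...then traversal never touches the index: just follow pointers.
start = index["a"]           # one initial lookup for the start node
hop1 = start.out[0]          # constant-time hop -> the "b" node
hop2 = hop1.out[0]           # constant-time hop -> the "c" node
print(hop1.name, hop2.name)  # b c
```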
If that were so then there would be no need for native graph databases at all, and we would not be seeing cases that could not be served by relational dbs that are possible with Neo4j and native graphs.
You may be thinking of non-graph use cases. When hundreds of thousands to millions or more traversals are required to address graphy use cases, if those traversals are implemented as table joins, and the join cost depends on tables that are millions or billions of rows in size (so the cost depends on the total size of the data instead of just the relevant connected relationships), then you can see where pointer hopping over only the relevant connected relationship and node records (proportional only to the elements of the subgraph traversed, not the total data) would outperform the relational model. You also have the flexibility of being as strict or as lenient as required with the labels of the nodes traversed, the relationship types traversed, and their direction. That's tougher to do when you may not know which tables are meant to be joined or how, or if you pour all your nodes into a giant table, where the join cost is proportional to your total data.
Relational databases are very good at what they do. But no tool is perfect and covers all use cases easily. Design is a matter of trade-offs, and some of the design choices that make them excellent in many categories become a weakness in others. We're in an era of big data, huge data, where modeling, traversing, and exploring the connections within this data is increasingly valuable, and increasingly costly due to the sheer amount of data and the complexity of both the connections and the use cases themselves. Native graph databases are a tool for these cases, and they can also bring simplicity in modeling and querying, as well as the performance that gives them an edge here.
Hello, just making sure you're aware that while the Neo4j Sandbox is a quick way to get a feel for Neo4j and try out some tutorials and datasets, it isn't meant to be used as a general Neo4j cloud service. Your sandbox gets wiped after 10 days, for example, and the resources supporting sandbox instances aren't optimized for performance. It's a casual try-it-out experience.
The Sandbox also predates Aura (our ACTUAL cloud service) by a number of years, so you could say the sandbox is kind of like a prototype.
The idea is not just to keep the curve low initially and then abandon all measures, but to figure out how to keep the curve consistently low over time until a vaccine or highly effective treatment is discovered.
Basically, when you're in free fall and the parachute slows you down comfortably, don't take that as the cue to cut your chute.
Also you may have overlooked adequate testing availability and contact tracing.
And even if contact-tracing is lagging, an abundant supply of testing can at least help to ensure that symptomless or incubating-but-not-showing-symptoms-yet infections can be identified early to prevent wildfire-type spread.
My guess would be that, with more tests and resources available, we'd start with a stuttered approach: open for 2-3 weeks, then lock down for 2-3 weeks while we wait for new infections to manifest symptoms and for the infected to quarantine until they're no longer infectious.
That could serve as a baseline for what to watch for during reopening, to gauge risk as to whether to repeat with some alteration in timing.
That compares against Neo4j 1.9.4, released in 2013. All technologies in question have improved much since then, especially graph db technology, efficiency, and speed, so I don't think that paper has as much relevance anymore. Would love to see a more updated comparison.
In Neo4j at least the node ids are offsets into the node and relationship stores, so you are literally pointer hopping through the store files from node structure to relationship structure to node structure. No need for a hash table or b-tree index (excepting finding your starting nodes in the graph before beginning traversal.)
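A toy Python sketch of the id-as-offset idea (the record layout here is made up for illustration, not Neo4j's actual store format): with fixed-size records, the "id" doubles as a byte offset, so lookup is arithmetic plus a seek rather than an index search.

```python
import struct

# Hypothetical fixed-size node record: (first_relationship_id, property_id).
RECORD = struct.Struct("<ii")  # 8 bytes per record

store = bytearray()  # stands in for the on-disk node store file

def write_node(first_rel, prop):
    node_id = len(store) // RECORD.size  # id derived purely from position
    store.extend(RECORD.pack(first_rel, prop))
    return node_id

def read_node(node_id):
    offset = node_id * RECORD.size       # id -> offset: pure arithmetic,
    return RECORD.unpack_from(store, offset)  # no b-tree or hash lookup

n0 = write_node(first_rel=7, prop=100)
n1 = write_node(first_rel=-1, prop=200)
print(read_node(n1))  # (-1, 200)
```

Following a stored relationship id to its record is the same arithmetic, which is what makes the "pointer hopping" through the store files cheap.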
Wouldn't that make deletes super expensive, similar to deleting nodes from a doubly-linked list on disk in terms of complexity? How could it delete a node with millions of relationships if it needs to read all those blocks from disk to traverse all the pointers?
And doesn't it cause more cache misses when the on-disk pointers refer to nodes that are spread out across different file blocks?
I don't know how Neo4j is implemented, but I'm skeptical that it's purely index-free adjacency, I suspect there is some hybrid data structure backing it.
Deletes do have an extra cost as relationships do need to be deleted first, and there are some batching approaches for handling this case. For graph databases there's not much choice on this unless you want to deal with dangling relationships (and resulting inconsistencies).
Note that deleting a node does not require creating new relationships between the adjacent nodes, so it's not quite like deleting a node from the middle of a doubly-linked list.
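A toy Python sketch of that difference (illustrative only): the extra cost is dropping the node's relationships, but there is no re-linking step afterwards.

```python
# Minimal graph: nodes as a set, relationships as (from, to) pairs.
nodes = {"a", "b", "c"}
rels = {("a", "b"), ("b", "c")}

def delete_node(n):
    # Step 1: delete every relationship touching n first, so no
    # dangling relationships remain (the extra cost of a graph delete).
    doomed = {r for r in rels if n in r}
    rels.difference_update(doomed)
    # Step 2: delete the node itself. Unlike removing an element from a
    # doubly-linked list, "a" and "c" are NOT stitched back together.
    nodes.discard(n)

delete_node("b")
# "b" and its relationships are gone; "a" and "c" remain, unlinked.
```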
A large pagecache is recommended for optimal speed, and SSDs are also recommended. Hardware continues to become cheaper.
Relational databases and Neo4j use indexes differently, which I think is part of the confusion here. Both use indexes for looking up nodes, true, but Neo4j only uses them for finding certain starting (or ending) nodes in the graph. The important (and more complicated and costly) part of a query isn't finding your starting nodes...it's expanding and traversing from those nodes through your graph.
Neo4j uses index-free adjacency for traversing the graph. Relational dbs need to use table joins. One of these is only dependent on the relationships present on the nodes traversed (or rather only the relationships you're interested in, if you've specified restrictions on the relationship type and/or direction). Table joins are dependent on the size of the tables joined (then of course you must consider how many joins you must perform...and how to do these joins if there's nothing restricting which tables to join in the course of traversal).
Again, index-free adjacency does not mean that we must adhere to this in the most literal sense. Ideological purity is not the point. Graph traversals are the most complex part of a graph query, and this is where index-free adjacency is used to the advantage of native graph dbs.
And just to note, we certainly can join nodes based on property values, just like a relational database, and yes, we can even use an index to speed that up, in the same manner as relational dbs. In fact you may need to do this in order to create the relationships that you'll use later in your queries. Graph dbs are optimized such that if you do need joins, you perform them early, and once, so that you can take advantage of index-free adjacency during traversal in your read queries. Traversal speed and efficiency are the point of index-free adjacency.
> if you shrink a table, or update a partitioned table (causing a row to move to another partition) or if you are rebuilding a table, or export/import a table, or... or... or... the rowid will change.
If the rowids are not stable across db operations, it wouldn't make sense to use them for implementing index-free adjacency. Do any alternatives remain? If not, you're back to joins.
One of the reasons why Neo4j can use index-free adjacency is that the ids used for nodes and relationships are pointers to the location of the nodes and relationships in the relevant store files. Those are stable across updates and deletes of other data, and when you delete a node, all its relationships must be deleted first so there are no dangling relationships.