Fork is actually a very fast system call. It never blocks, and (on Linux), only involves copying a very small amount of bookkeeping information. If you exec right after the fork, there is basically no overhead.
However, forking a new shell just to parse "mv foo bar" is more expensive than calling the rename system call directly. And the direct call makes it easier to check for errors, and so on.
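To make the comparison concrete, here's a sketch in Python (the helper names are mine, purely for illustration). One path is a single syscall with precise exceptions; the other forks a shell, which parses the command and forks again for mv, reporting failure only through an exit status:

```python
import os
import subprocess

# Hypothetical helper: os.rename wraps the rename(2) syscall.
# One call, no fork, and failures surface as specific exceptions.
def rename_direct(src, dst):
    os.rename(src, dst)  # raises FileNotFoundError, PermissionError, ...

# Hypothetical helper: this forks, execs a shell, the shell parses
# "mv src dst" and forks again to run mv. Errors come back only as
# an exit status that you must remember to inspect (check=True here).
def rename_via_shell(src, dst):
    subprocess.run("mv %s %s" % (src, dst), shell=True, check=True)
```

(The shell version also has quoting/injection hazards that the direct call simply doesn't have.)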
SQLite is also not as slow as people think it is; you can easily handle tens of millions of requests per day with it. If your application's semantics require table locks, MySQL and Postgres are not going to magically eliminate contention for those locks. It's just that they both pick very weak isolation levels by default. (They run fast, but make it easy to corrupt your data. Incidentally, I think they do this not for speed, but so that transactions never abort. Apparently aborting scares people, even though it's the whole point of transactions. </rant>.)
Most of my production apps are SQLite or BerkeleyDB, and they perform great. I am not Google, however.
Actually, PostgreSQL's MVCC architecture makes lock contention radically less likely than with most other DBMSes. For SELECTs, you're only ever taking "Access Share" locks on the table ("Hey, I'm using this table; you can't DROP it right now."). For DML queries (UPDATE, INSERT, DELETE), you'll see those, plus "Row Exclusive", which is just what it sounds like.
There are a few other lock types you'll run into as well, but they're typically only seen in narrow, specific cases -- when you're performing maintenance (vacuuming, clustering, &c), indexing, DDL changes, or when you explicitly LOCK a table for whatever reason.
> SQLite is also not as slow as people think it is; you can easily handle 10s of millions of requests per day with it
Scaling relational databases to the point of 10s of millions of requests is extremely non-trivial. Unless you can demonstrate it personally or show me evidence otherwise, don't make this claim. You're doing a disservice to the people who have worked countless hours to eke every last millisecond of performance out of MySQL and Postgres.
10 million per day is about 100 per second. SQLite performs about this quickly. I wrote a test script and it did 125 (unindexed) lookups per second. Then I ran two of these tests at the same time, and the rate stayed about the same. I have 8 cores, so I made 8 processes, and it was the same. 125 requests/second * 8 processes * 86,400 seconds/day = 86,400,000 requests per day.
I added another thread writing as quickly as possible to the mix (7 readers, 1 writer), and this brought the read rate down to about 45 reads per second per thread. Still more than 10 million per day, so technically I am right.
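The test script itself isn't shown above; a minimal sketch of what such a reader benchmark might look like, using Python's sqlite3 module (the table and column names here are my own assumptions, not from the original test):

```python
import sqlite3
import time

def benchmark_reads(db_path, seconds=1.0):
    """Count single-row lookups completed in `seconds` and return the rate.

    Assumes a table like: CREATE TABLE kv (k INTEGER, v TEXT).
    Without an index on k, each SELECT is a full-table scan, matching
    the "unindexed lookups" measurement described above.
    """
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    n = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        cur.execute("SELECT v FROM kv WHERE k = ?", (n % 1000,))
        cur.fetchone()
        n += 1
    conn.close()
    return n / seconds
```

Running several copies of this as separate processes against the same database file is what gives the per-core scaling observed above: SQLite allows any number of concurrent readers.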
Also, I don't doubt that MySQL and Postgres (and BDB) are all significantly faster than this. It's just that SQLite is not going to guarantee "instant failure" of your project, as the article implies.
(One thing to note -- every time you type a character in Firefox's address bar, you are doing an SQLite query. It is Fast Enough for many, many applications.)
I respect your research into the matter, but it frankly doesn't matter one bit. Here's why.
Your peak load/sec is never going to be close to your average load/sec. I've seen it be between 2x and 5x average load.
Read performance on simple selects is straightforward to scale. Most solutions to that problem just put the data in memory, via the database cache or memcache.
Part of the difficulty is in scaling writes. What if your 1000 queries/sec are all inserts on the same 500M-row table? What if they are updates on a table that's 10M rows long? This is when you have to make hard decisions about sharding and the like.
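For concreteness, the simplest piece of that decision is a deterministic routing function that spreads writes for a hot table across several databases. A sketch (the shard count and key name are arbitrary assumptions for illustration):

```python
import hashlib

NUM_SHARDS = 4  # arbitrary; real deployments size this to write capacity

def shard_for(user_id):
    """Deterministically map a row key to one of NUM_SHARDS databases.

    Hashing (rather than, say, ranges) spreads sequential keys evenly,
    so no single node absorbs all 1000 inserts/sec on the hot table.
    """
    digest = hashlib.sha1(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The routing function is the easy part; the hard decisions are resharding when you add nodes, cross-shard queries, and transactions that span shards.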
I certainly believe that SQLite is "fast enough" for many applications. I also know that many applications are NOT doing 10s of millions of requests a day.
That -is- fast, but I still have trouble reconciling that deep down in my computer, a human-readable SQL query gets built, and then another process parses that SQL. Seems so wasteful building and then parsing a human-readable string for something that's happening on the same machine.
I know nothing of SQLite's internals, but wouldn't it make more sense to parse the query once and then store a compiled version of the query for subsequent lookups? Like you might do with a regexp?
Yes, this is known as a prepared statement. You compile a parametrized statement once, then execute it as many times as you like with different arguments.
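In Python's sqlite3 module, for instance, you get this for free by reusing the same parametrized SQL string, since the module keeps a cache of compiled statements per connection; the SQLite C API exposes the mechanism directly as sqlite3_prepare_v2 plus bind/step calls. A sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# The parametrized statement is compiled once and cached by the
# connection; each execute() just binds new arguments instead of
# re-parsing the SQL text.
insert = "INSERT INTO users (name) VALUES (?)"
for name in ("alice", "bob", "carol"):
    conn.execute(insert, (name,))
conn.commit()

rows = conn.execute("SELECT name FROM users ORDER BY id").fetchall()
```

Parametrized statements also sidestep SQL injection, since arguments are bound as values rather than spliced into the query text.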
Also, SQLite, unlike most other databases, is an embedded database which does everything in-process rather than invoking multiple processes.
Scaling relational databases to the point of 10s of millions of requests is extremely non-trivial.
Simply not true. That level of database activity is bread-and-butter for many people these days. Hundreds of millions is non-trivial in the sense that it will cost you a pile of money, but these days it's basically straightforward, especially if you've done it before. Thousands of millions, now that's where things get interesting.
Oh, and he's talking about requests. The trading systems I've worked on can and do handle hundreds of millions of commits per day.