I'm glad I wasn't the only one to notice! Literally the first thing the Google engineer does is mention Azure, while I had to go back to GitLab's article, confused, because I hadn't noticed anywhere that they were running on Azure.
I can't say it's a bad thing to do; I just can't help but notice how Google invests in social media participation. That's why they own the conversation in places like HN and can pull off this stuff.
What I fail to understand is how GCE or AWS would tackle the issue described in the article. As far as I understand, their problem seems difficult to work around due to the shared nature of the cloud.
How would GCE be better than AWS or Azure at this? I'd be really interested to know, and I'm sure it would be useful to other HNers with the same worries.
To solve problems just like these, we offer EBS (Elastic Block Store) volumes with provisioned IOPS guarantees. Essentially, you can get guaranteed IOPS if you need them for I/O-intensive applications, up to 30,000 IOPS per EBS volume.
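For a rough sense of what that looks like, here's a minimal boto3 sketch of provisioning an io1 volume. The region, AZ, size, and IOPS figure below are placeholder values, not a recommendation:

    import boto3

    # Assumes AWS credentials are already configured in the environment.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create a Provisioned IOPS SSD (io1) volume. The Iops parameter is the
    # guaranteed rate; it has to stay within the allowed IOPS-to-GiB ratio.
    volume = ec2.create_volume(
        AvailabilityZone="us-east-1a",
        Size=500,          # GiB
        VolumeType="io1",  # Provisioned IOPS SSD
        Iops=10000,        # guaranteed IOPS for this volume
    )
    print(volume["VolumeId"])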
But PIOPS EBS volumes wouldn't be my first recommendation. It sounds like what they really need is an elastic, scale-out filesystem with NFS semantics. We have Elastic File System, or EFS, which is exactly that: a petabyte-scale filesystem that is highly available across multiple availability zones and scales in IOPS and throughput as it grows in size.
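If it helps, a minimal boto3 sketch of standing one up (the token name is hypothetical); clients then mount it over standard NFSv4.1, so the application side doesn't have to change:

    import boto3

    efs = boto3.client("efs", region_name="us-east-1")

    # Create an EFS filesystem. It grows and shrinks automatically, and
    # throughput scales with the amount of data stored.
    fs = efs.create_file_system(CreationToken="gitlab-shared-storage")
    print(fs["FileSystemId"])

    # Before instances can mount it over NFSv4.1, you also need a mount
    # target in each subnet/AZ, e.g. via efs.create_mount_target(...).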
Their application should also look at leveraging S3 object storage rather than NFS, because it's a highly distributed, highly available object store that is likely to give better scalability, availability, and performance than rolling your own Ceph infrastructure.
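To illustrate the object-storage approach (bucket and key names below are made up), moving a file through S3 instead of a shared filesystem is a couple of calls:

    import boto3

    s3 = boto3.client("s3")

    # Upload a local file as an object, then fetch it back elsewhere.
    # No shared POSIX filesystem involved; S3 handles the replication.
    s3.upload_file("build.tar.gz", "example-artifacts", "ci/build.tar.gz")
    s3.download_file("example-artifacts", "ci/build.tar.gz", "/tmp/build.tar.gz")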