Per-service proxy deployments are a bit complex for the infrastructure but provide a nice abstraction for the service and service developers themselves. The configuration scheme is indeed daunting, which is what we're hoping Envoy and its xDS APIs + centralized configs can help us solve for developer teams.
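For anyone curious what "xDS + centralized configs" looks like in practice, here's a minimal sketch of an Envoy bootstrap that pulls everything from a central management server. All names and addresses here (`xds.internal`, `service-a`, port 18000) are hypothetical, and this uses the current v3 API syntax:

```yaml
# Hypothetical bootstrap: listeners and clusters all come from a central
# xDS management server over ADS, so developers never touch proxy config.
node:
  id: service-a-sidecar
  cluster: service-a
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc: { cluster_name: xds_server }
  lds_config:
    resource_api_version: V3
    ads: {}
  cds_config:
    resource_api_version: V3
    ads: {}
static_resources:
  clusters:
    - name: xds_server   # the central control plane (e.g. Pilot)
      type: STRICT_DNS
      connect_timeout: 1s
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}   # xDS is gRPC, so the cluster must speak HTTP/2
      load_assignment:
        cluster_name: xds_server
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: xds.internal, port_value: 18000 }
```

The only static piece is the cluster pointing at the control plane; everything else arrives dynamically, which is what keeps the per-service config burden off developer teams.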
We're actually looking to put Envoy in front of the redesign stack at some point in the near future! The major services backing the redesign can be isolated into a few smaller pieces, and we'd like to have Envoy be a routing layer that can abstract this for the central browser client as we evolve the backend.
Awesome blog post! I very much enjoy hearing how large web properties implement these technologies and any issues they experience along the way.
Are you using Envoy at all in your main HTTP ingress path? You mentioned HAProxy and AWS ELBs, but it wasn't clear if Envoy is also being considered for public ingress traffic.
We have not yet put Envoy in our main HTTP ingress path, but internally we have designs and implementation paths ready to go, and it's definitely being considered for public ingress traffic. As we noted in the last "teaser" section of the post we'd really like to leverage Envoy's routing functionality to facilitate migrating client-facing APIs in the backend without affecting frontend interfaces.
Our HAProxy layer that routes ingress traffic to the core backend infrastructure has considerable routing logic that can be moved to Envoy and then further extended. We'd love to explore that path in the coming months.
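To make that concrete, here's a hypothetical sketch (all cluster and domain names invented) of the kind of Envoy route table that lets a frontend keep calling one stable interface while the backend behind it is migrated:

```yaml
# Hypothetical RouteConfiguration: clients keep hitting /listings while
# traffic is progressively shifted from the legacy backend to a new service.
virtual_hosts:
  - name: api
    domains: ["api.example.com"]
    routes:
      - match: { prefix: "/listings" }
        route:
          weighted_clusters:
            clusters:
              - name: listings-monolith   # legacy code path
                weight: 95
              - name: listings-service    # newly extracted service
                weight: 5
      - match: { prefix: "/" }
        route: { cluster: monolith }      # everything else stays put
```

Because the weights live in routing config rather than client code, the split can be adjusted (or rolled back) centrally without any frontend change.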
I look forward to hearing more about your plans for ingress and how the various pieces fit together (CDN, L4/L7 LBs, TLS termination, geo/policy DNS balancing), especially regarding the performance and new features available with Envoy. I've used HAProxy before and it was great for simple routing/reverse proxying but not so great at complex/dynamic configuration or cert management.
HAProxy supports quite complex configurations. We've actually found that many of our users are only using its most basic capabilities, so we have been working on expanding our blog content to help them take advantage of some of the more complex configurations that are possible. We've even found that many users are not aware that HAProxy now supports Hitless Reloads [1].
Quite a bit of complex routing and dynamic configurations can be provided by map files [2] and these and many other settings can be updated directly from the Runtime API [3].
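As a quick illustration (paths and backend names are made up), a map file lets a single `use_backend` line carry all the host-based routing, and the Runtime API can then update the map without a reload:

```
# haproxy.cfg fragment: pick the backend by Host header via a map file
frontend fe_main
    bind :80
    use_backend %[req.hdr(host),lower,map(/etc/haproxy/hosts.map,be_default)]

# /etc/haproxy/hosts.map
#   api.example.com   be_api
#   www.example.com   be_www

# Repoint a host on the fly through the Runtime API (stats socket):
#   echo "set map /etc/haproxy/hosts.map api.example.com be_api_v2" | \
#       socat stdio /var/run/haproxy.sock
```

The running process picks up the `set map` change immediately; the file on disk can be updated separately so the change survives the next reload.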
With that said -- we are actively working to make things even better and intend to introduce support for updating SSL certificates/keys directly through the Runtime API as well as introducing a Data Plane API for HAProxy.
We have a new release coming any day now and this will lay the foundation that will allow us to continue to provide best-in-class performance while accelerating cutting edge feature delivery.
Yes! HAProxy is a terrific piece of tech, and has been awesome for our use cases so far. We do quite a bit with it for our main ingress routing and it was basically flawless as our data plane in SmartStack.
I'm really excited about what we're building out for next year and can't wait to share as well. Feel free to reach out on reddit (u/wangofchung) or directly at courtney.wang@reddit.com for more in-depth discussion!
1. Have you considered, or are you considering, the Istio control plane for your Envoy fleet? Why or why not?
2. Did you containerize your applications before using Envoy? The blog post talks about running them on autoscaled EC2 instances, but it's not clear if you're running application binaries on those VMs or serving from containers.
1. We are considering Istio! This is especially true for our Kubernetes environment. We are already planning to deploy Pilot for the first iteration of our control plane in our non-K8s environment, so the other pieces that comprise Istio are a natural place for us to continue exploring.
2. We had not containerized prior to Envoy. We're still running application binaries provisioned with Puppet on EC2 for most of our infrastructure.
We run one proxy per machine, even when there are multiple services running. The proxy is just an abstraction to the downstream dependencies. Even if there are multiple services per machine, they can still reach downstream services via the same proxy path.
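A hypothetical sketch of what that per-machine abstraction can look like: each downstream dependency gets a well-known local port on the shared proxy, and every service on the box, however many there are, dials the same address. Names and ports here are invented:

```yaml
# One local listener per downstream dependency; any service on the host
# reaches "users-service" by connecting to 127.0.0.1:9211.
static_resources:
  listeners:
    - name: users_egress
      address:
        socket_address: { address: 127.0.0.1, port_value: 9211 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: users_egress
                cluster: users-service
  clusters:
    - name: users-service
      type: STRICT_DNS
      connect_timeout: 0.5s
      load_assignment:
        cluster_name: users-service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: users.service.internal, port_value: 8080 }
```

From a service's point of view, "where does users-service live?" is always answered the same way, regardless of how many other services share the machine.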
Thanks for all the answers, I really appreciate it! I've got one more question.
In the period when parts of your system had Envoy and parts didn't, did you route the outbound traffic from Envoy-equipped services through their local proxy before reaching their Envoy-less destinations? Or did you omit Envoy then?
We route all outbound traffic from internal services through Envoy, even if the destination isn't running Envoy. We don't have envoy running as a "front" proxy right now, i.e. our L4 setup isn't Envoy <-> Envoy, it's Envoy -> service directly. An example of this is the DB layer - traffic going to our DBs from services goes through Envoy service-side but Envoy isn't running on our DB instances.
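Concretely (hostnames hypothetical), the service-side Envoy just treats the database as an ordinary TCP upstream, so nothing on the DB host has to know Envoy exists:

```yaml
# Cluster definition on the service-side Envoy; the DB instance itself
# runs no proxy and simply sees an ordinary TCP connection.
clusters:
  - name: postgres-primary
    type: STRICT_DNS
    connect_timeout: 1s
    load_assignment:
      cluster_name: postgres-primary
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: db-primary.internal, port_value: 5432 }
```

This is what makes the one-service-at-a-time migration possible: the client side still gets Envoy's connection management and stats even when the destination is proxy-less.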
What are the reasons you chose to do so? I mean going for a "back" proxy instead of Envoy <-> Envoy (which seems to be the most "advertised" approach) or a "front" proxy. As I understand it, this way you're losing the Envoy features for your most "shallow" services. Or do you also run Envoy on your ingresses?
The "back" proxy was the initial setup with SmartStack, so we went with that for minimal viable first steps. We wanted to make incremental changes, changing as little as possible, for this migration so we could monitor for correctness and performance at every step. The eventual plan is to run Envoy as a front proxy for ingress, and maybe even Envoy <-> Envoy everywhere, where we have Envoy as both a front and back proxy on every service deployment (instance, container, etc.)
Others have mentioned that there are some gotchas with Envoy, and you mention a few about the migration bumps. Did you encounter other gotchas? And do you have any suggestions on how to avoid/mitigate their impact?
We did hit a few, but as the above indicates, they were resolved _very_ quickly.
The most important thing when making a transition like this is to have as much monitoring and observability as possible independent of the new tech. We were able to quickly identify and respond to issues we had with Envoy based on existing application and system instrumentation that wasn't directly provided by Envoy, along with the vigilance of our engineering team.
Hey, I myself am planning to introduce envoy into an existing mixed kubernetes/bare-metal architecture, having the same "one service at a time" considerations.
Have you been thinking about adopting istio? If yes, why didn't you?
We're currently evaluating the pieces that comprise Istio, both within Kubernetes and outside of it in our existing infrastructure.
We didn't do so immediately because we did not want to update all of our technology at once and felt that a piece-wise migration would be both the least disruptive to our infrastructure and the safest. I think of Istio, like SmartStack, as not actually a single complete "thing" so much as a suite of technologies that can be individually evaluated and deployed. It's very easy to fall into the trap of wanting to do everything at once, and we opted to make small, progressive steps for this initiative.
Yeah, I'm thinking of going with only the minimal Istio installation (Pilot only) rather than rolling our own, and setting it up with our existing Consul service discovery.
It's odd to me too. They had a great deal of pre-order information, something that most other server teams that face crazy launches don't ever have. I would guess that they could at least estimate the load to an order of magnitude. The fact that single player can't even be accessed is absurd, something that your comment on sharding hits on. The server teams at Blizzard should also have had some experience with this in their WoW trials and errors.
I would think that's pretty offensive to anyone who's in any sort of "* Phi Delta *" fraternity. In a thread that touches on the offensiveness of stereotypes and subsequent alienation, the irony of both the comment and your LOL reaction should be pointed out. As far as "thoughtful comment" goes, there certainly wasn't much thought put into the phrase "Phi Delta Toolbag".
Thank you for calling out the negative stereotyping in these comments of frat bros. I realize that "won't somebody think of the bros" isn't going to get much sympathy in these forums, but stereotyping is stereotyping, be it against women, or another group of people just enjoying their activities without harming other people (in this case, the fraternity crowd).
You're right, that hadn't occurred to me. I'm sorry--if I'd been a bit more thoughtful I wouldn't have reacted like that. I was, in part, trying to be appreciative of a comment that was clearly thoughtful.
At Penn we sometimes jokingly refer to our CS group (which has the abbreviation "DP") as Delta Phi, especially when we're planning social events or the like. So it was an amusing coincidence, too.
Interesting, isn't it? Our direct reactions tend to be the most honest, but we don't often want to admit that to ourselves for we know that we should be aspiring to be better.
Tolerance might be the opposite of abstraction, and that's a hard reality in a field where abstraction has gained us so much.
Maybe it's just me, but I saw "Phi Delta Toolbag" and immediately imagined a fraternity created for the express purpose of congregating toolbags together ... which seems likely and hilarious. I did not take it as a generalization about fraternities.
I'm curious as to why you feel that way. Why is "harshing on frats" as you put it not a negative act? Also, how does the reddit post not apply to the original issue here?
Because frats are not a put upon minority, or any segment of society that deserves our contrition and respect. They're drunken twenty-something white guys whose primary concern is how to be drunker and whiter and twenty-somethinger. They are an outgrowth of an otherwise useful function (university), not useful in and of themselves.
The reddit post is applicable because it is a rejoinder to those who claim offense at that which cannot reasonably be claimed to cause real offense to those for whom such a thing matters.
"They're drunken twenty-something white guys whose primary concern is..." <--- Stereotyping. Generalizations. Right here. In your comment.
There are fraternities and sororities whose members are not all "twenty-something white guys" and whose efforts are worthwhile. Do a smattering of research and you'll discover this.
That's actually my lecture that ClassMetric broke down in the blog post. From a (first-time) instructor's perspective, I don't see the tool as a distraction to students. I'm not punishing students for using it, and students have another way to communicate to me that they don't understand something in lecture. I only lectured for one hour a week; the TAs spent the majority of the time with the students. In my case, ClassMetric allowed me to touch on the big ideas of the week during my lecture, measure confusion on topics, and figure out how TAs should structure their discussions and labs to be the most useful to the most students. The message posting system also allowed TAs to participate directly during lecture, answering questions and addressing confusion as it appeared so students could better keep up with the lecture material. Yes, it may be distracting to have knowledge coming from two channels, but students can still refer back to that channel later when they are reviewing the whole lecture material.