> I don't get that. There was no code issue. The old and new code both worked as intended, it was a deployment and deployment-verification problem.
A code review could have raised the issue of repurposing a flag, including what happens if they had to revert the deploy. Changing the semantics of a flag is a big no-no anyway, and there are ways to guard against that.
> Yes they did. They staged the new code and tested it. They did a slow deployment also.
But apparently they didn't have a staging environment that matched their live environment. You want a staging environment that is 1:1 with production.
> They did revert the deploy within the 45 minutes. It made it worse.
If you think reverting a deploy by simply pushing an older version is the same as a contingency plan, think again.
I agree with the GP; I don't think code reviews or testing were the problem.
I think the best practice they violated is that they deprecated and repurposed a flag within a single release cycle. That sort of change should take at least two release cycles: one to remove the old functionality and one to add the new functionality.
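To make the "never reuse a flag" rule concrete, here is a rough sketch of one way to enforce it mechanically. Everything in it is hypothetical (the `FlagRegistry` class, the flag names, the release labels); it only illustrates the idea that a retired flag name stays reserved and cannot be registered again with a new meaning.

```python
class FlagRegistry:
    """Hypothetical registry of live and retired flag names (illustration only)."""

    def __init__(self):
        self._live = {}     # flag name -> description of what it controls
        self._retired = {}  # flag name -> release in which it was retired

    def register(self, name, description):
        # A retired name stays reserved forever, so it can never change meaning.
        if name in self._retired:
            raise ValueError(
                f"flag {name!r} was retired in release {self._retired[name]} "
                "and must not be reused"
            )
        if name in self._live:
            raise ValueError(f"flag {name!r} is already in use")
        self._live[name] = description

    def retire(self, name, release):
        # Raises KeyError if the flag is not currently live.
        del self._live[name]
        self._retired[name] = release


registry = FlagRegistry()
registry.register("legacy_mode", "enables the old functionality")
registry.retire("legacy_mode", release="N")

try:
    # Attempting to repurpose the retired name is rejected.
    registry.register("legacy_mode", "enables the new functionality")
except ValueError as err:
    print(err)

# The new functionality gets a fresh name instead.
registry.register("new_mode", "enables the new functionality")
```

With a check like this in a build or review step, the new functionality is forced onto a new name, so a server still running old code never sees a flag it could misinterpret.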
- They had dead code in the system
- They repurposed a flag that had controlled a previous functionality
- They (apparently) didn't have code reviews
- They didn't have a staging environment
- They didn't have a tested deployment process
- They didn't have a contingency plan to revert the deploy
It could have been minimized or avoided altogether by fixing just one of these points. Incredible.
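On the deployment-process point, a verification step does not have to be elaborate. The sketch below assumes each host can report the version it is running; the host names, version strings, and function names are made up for illustration, and how you actually collect the versions is left out.

```python
def find_stragglers(reported_versions, expected_version):
    """Return the hosts whose reported version differs from the expected one."""
    return sorted(
        host
        for host, version in reported_versions.items()
        if version != expected_version
    )


def verify_rollout(reported_versions, expected_version):
    """Raise if any host is not running the expected version."""
    stragglers = find_stragglers(reported_versions, expected_version)
    if stragglers:
        raise RuntimeError(
            f"rollout incomplete, hosts not on {expected_version}: {stragglers}"
        )


# Hypothetical snapshot of what each host reports after a deploy.
reported = {"host-a": "2.0", "host-b": "2.0", "host-c": "1.9"}

try:
    verify_rollout(reported, "2.0")
except RuntimeError as err:
    print(err)
```

The point is that the deploy is not considered done until every host agrees on the version, rather than assuming the push succeeded everywhere.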