Metrics and dashboards
We're working on a service... and as I wrote in the prologue to the specs for the dashboard:
Launching a product or a system that you cannot monitor is a nonstarter. Adding monitoring to an existing system is far harder than designing it in from the beginning.
-me
I was working on adding some alerts to the system today as well. So I fire up a light load test to get some metrics flowing and start working on stuff.
And it's not working right. I'm getting errors when I'm not expecting them.
I get an email from an alert I had set up.
What's up?
Oh... it turns out I'm monitoring, in real time, an outage of the third party we're talking to.
They host on Heroku. And it just so happens that they managed to have an incident right as I was testing.
I guess the monitoring works. :-)