
The ideal solution for health checks is to set both a maximum timeout and a maximum number of retries. Typically you want to fail after X retries, with an overall cap of Y seconds as a backstop (to account for network weirdness). Either way, you definitely want to fail early instead of waiting a long-ass time before you finally give up.

That's for a standard service health check anyway. The service and its health check shouldn't be started until the container they depend on has started and is healthy. In Kubernetes that's an Init Container in a Pod, in AWS ECS it's a dependsOn stanza (with condition HEALTHY) in your Task Container Definition, and in Docker Compose it's the depends_on stanza (with condition: service_healthy) in a Services entry.

  set -eu
  nowtime="$(date +%s)"   # start time, in seconds since the epoch
  maxwait=300             # overall deadline, in seconds
  maxloop=5               # maximum number of attempts
  c=0
  while [ "$c" -lt "$maxloop" ] ; do
      # Succeed as soon as the endpoint answers without an HTTP error.
      if timeout "$maxwait" curl --silent --fail-with-body 10.0.0.1:8080/health ; then
          exit 0
      else
          sleep 1
      fi
      # Give up once the overall deadline has passed.
      if [ "$(date +%s)" -gt "$((nowtime+maxwait))" ] ; then
          echo "$0: Error: max wait time $maxwait exceeded" >&2
          exit 1
      fi
      c=$((c+1))
  done
  # Every attempt failed before the deadline: fail with an error.
  exit 1

However, curl already supports this natively, so there's no need to write a script:

  curl --silent --fail-with-body --connect-timeout 5 --retry-all-errors --retry-delay 1 --retry-max-time 300 --retry 5 10.0.0.1:8080/health

(Note that --fail-with-body needs curl 7.76.0 or newer, and --retry-all-errors needs 7.71.0 or newer.)
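
If the check isn't an HTTP request, curl's built-in retries don't help, so here is a rough sketch of the same "X retries, capped at Y seconds" idea as a reusable POSIX shell function. The name wait_for, the WAIT_FOR_DELAY variable, and the distinct exit codes are my own assumptions for illustration, not anything standard.

```shell
# wait_for MAX_TRIES MAX_WAIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds.
# Returns 0 on success, 1 once MAX_TRIES attempts have failed,
# or 2 once MAX_WAIT_SECONDS have elapsed since the first attempt.
# WAIT_FOR_DELAY (assumed knob, default 1) is the pause between attempts.
wait_for() {
    wf_tries=$1
    wf_wait=$2
    shift 2
    wf_start=$(date +%s)
    wf_n=0
    while [ "$wf_n" -lt "$wf_tries" ]; do
        # Stop at the first successful check.
        if "$@"; then
            return 0
        fi
        wf_n=$((wf_n + 1))
        # The deadline is only checked between attempts, so a single slow
        # attempt needs its own timeout (e.g. curl --max-time).
        if [ "$(( $(date +%s) - wf_start ))" -ge "$wf_wait" ]; then
            echo "wait_for: gave up after ${wf_wait}s" >&2
            return 2
        fi
        # No point sleeping after the final attempt.
        if [ "$wf_n" -lt "$wf_tries" ]; then
            sleep "${WAIT_FOR_DELAY:-1}"
        fi
    done
    echo "wait_for: gave up after $wf_tries attempts" >&2
    return 1
}

# Example (same endpoint as above):
#   wait_for 5 300 curl --silent --fail-with-body --connect-timeout 5 10.0.0.1:8080/health
```

In an entrypoint script you could follow the call with exec "$@" so the real service only starts once the dependency reports healthy. The separate exit codes are just there so logs can tell "ran out of retries" apart from "ran out of time".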



