There are lots of hosted tools available which will monitor the resources you provide for your applications, but they all come with a high price tag when you're running any sort of serious infrastructure. At aTech we monitor everything internally using a combination of tools including Nagios and Cacti. Installing both of these tools is easy, although configuring them can be a bit confusing until you get your head around how both systems work. We use these tools for two different purposes: Nagios is primarily used for making sure everything is online and notifying our team should we detect any issues, whereas Cacti is used to keep track of historical statistics and plot them onto charts for viewing at a later date.
As part of our live monitoring, we monitor a number of key metrics across all our servers including:
- Current load, memory, hard drive & swap usage on all servers
- The status of any RAID arrays within any of our physical servers
- The number of running processes and number of zombie processes
- The status of key services. Which services we check depends on the server being monitored: for example, on Codebase we monitor HTTP & SSH, as these are the key public-facing services, whereas on Deliver we monitor SMTP.

- The number and types of MySQL queries which are executed on all our database servers (see above).
- The volume & types of jobs which are executed by any of our applications which include a background runner (see below).
- The number of requests processed by our load balancers.
- The power usage of all our data centre equipment.
- All network bandwidth information from our routers, firewalls and switches.
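
To give a feel for what one of the live checks mentioned earlier involves, here is a minimal sketch of a Nagios-style load check in Python. It is not our actual configuration: the function names (`classify`, `check_load`) and the warning and critical thresholds are made up for illustration. The only part taken from Nagios itself is the plugin convention that a check reports its result as 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN) alongside a one-line message.

```python
import os

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a reading onto a Nagios status using two thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_load(warn=4.0, crit=8.0, read_load=None):
    """Return (status, message) for the one-minute load average.

    The default thresholds are illustrative assumptions only.
    read_load can be swapped out for testing; by default the
    operating system is asked via os.getloadavg().
    """
    try:
        reader = read_load or os.getloadavg
        one_minute = reader()[0]
    except (OSError, AttributeError):
        # Load average is not available on this platform.
        return UNKNOWN, "LOAD UNKNOWN - load average unavailable"
    status = classify(one_minute, warn, crit)
    return status, f"LOAD {LABELS[status]} - load average: {one_minute:.2f}"
```

A real plugin would print the message and exit with the status code so that Nagios can pick up both. Checks for memory, swap or zombie processes would follow the same pattern with a different reading.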

All this information allows us to keep our fingers on the pulse of our entire infrastructure and all our applications. The information is available to all staff through our intranet and is also projected onto a wall in our office, allowing us to react immediately to any issues during office hours. Outside of office hours, we automatically receive text alerts.

The screenshot above shows the display which is projected onto our office wall and includes our most common and important machines.
