lunes, 8 de junio de 2009

A first look at God and Monit

Hey people!,

Lately I've been configuring a production server which basically is composed of a mongrel cluster and some other daemons that we use (backgroundrb, i.e.). To keep alive both mongrel instances and our daemons we needed some monitor tool to help us know when something went wrong and restart those processes if needed.

For that matter, I investigate the following tools: Monit and God.

Monit

Monit (http://mmonit.com/monit/) is a great utility for managing and monitoring, processes, files, directories and devices on a Unix system.

Installation

It's very straight-forward:
  1. Download the latest version on http://mmonit.com/monit/download/
  2. tar zxvf monit-x.y.z.tar.gz
  3. cd monit-x.y.z
  4. ./configure
  5. make && sudo make install (sudo if necessary)
Usage

After installing it, we need to create a configuration file. Monit would look for this file in the following locations:
  • ~/.monitrc
  • /etc/monitrc
  • /.monitrc
You can choose where you want to locate that configuration file. As an example, below I show the configuration file I used for my mongrel cluster:

#.monitrc
set daemon 30
set logfile /home/myuser/monit/monit.log
set httpd port 9111
allow remotehost
allow admin:admin # Allow Basic Auth
check process mongrel_cluster_3010 with pidfile "/var/www/myapp/current/tmp/pids/mongrel.3010.pid"
start program = "/usr/local/bin/ruby /usr/local/bin/mongrel_rails start -d -e production -a 127.0.0.1 -c /var/www/myapp/current --user myuser --group deploy -p 3010 -P tmp/pids/mongrel.3010.pid -l log/mongrel.3010.log"
stop program = "/usr/local/bin/ruby /usr/local/bin/mongrel_rails stop -p 3010 -P /var/www/myapp/current/tmp/pids/mongrel.3010.pid"
if failed port 3010 protocol http # check for response
with timeout 10 seconds
then restart

group mongrel
check process mongrel_cluster_3011 with pidfile "/var/www/myapp/current/tmp/pids/mongrel.3011.pid"
start program = "/usr/local/bin/ruby /usr/local/bin/mongrel_rails start -d -e production -a 127.0.0.1 -c /var/www/myapp/current --user lokkedc --group deploy -p 3011 -P tmp/pids/mongrel.3011.pid -l log/mongrel.3011.log"
stop program = "/usr/local/bin/ruby /usr/local/bin/mongrel_rails stop -p 3011 -P /var/www/myapp/current/tmp/pids/mongrel.3011.pid"
if failed port 3011 protocol http # check for response
with timeout 10 seconds
then restart
group mongrel


You can check the complete documentation of the commands allowed for monit in http://mmonit.com/monit/documentation/monit.html.

But I just wanted to show you a simple example that checks for the availability of the mongrel instances (in this case I just configured a couple of those). As you may notice, I have to configure each instance separatly (besides they both belongs to the same group, so I can start/stop all of those at the same time).

Other thing to notice, is the events that I want to monitor. In this case I've configured to restart a given mongrel's instance (the fragment that takes care of this is highlighted in the code below) when there is no response from it. But there are many other events, such us memory usage, cpu time, that can be measured to alert the administrator (vía email).

Run it!

To start it we just execute: moint. And we can see how the logs starts to populate:

[UYT Jun 5 19:12:15] info : monit: generated unique Monit id 15fdb0bdb830b0a114a3831d995ec32e and stored to '/home/myuser/.monit.id'
[UYT Jun 5 19:12:15] info : Starting monit daemon with http interface at [*:9111]
[UYT Jun 5 19:12:15] info : Starting monit HTTP server at [*:9111]
[UYT Jun 5 19:12:15] info : monit HTTP server started
[UYT Jun 5 19:12:15] info : 'willy' Monit started
[UYT Jun 5 19:12:34] info : Shutting down monit HTTP server
[UYT Jun 5 19:12:35] info : monit HTTP server stopped
[UYT Jun 5 19:12:35] info : monit daemon with pid [21560] killed
[UYT Jun 5 19:12:35] info : 'willy' Monit stopped
[UYT Jun 5 19:13:42] info : Starting monit daemon with http interface at [*:9111]
[UYT Jun 5 19:13:42] info : Starting monit HTTP server at [*:9111]

One of the greatest things about monit is that it offers a web interface (as you may notice from the configuration file). Which looks pretty nice!





God

God its a newer tool written in ruby and available as a rubygem. For this matter, it is easier to install it also (i.e. [sudo] gem install god).

Usage

We need to create a configuration file, which is entirely written in ruby. This is one the best things about God, because it helps us reduce duplication among configurations, as you can see in the following example (which is intended to monitor the same mongrel cluster mentioned above):

#myapp.god
RAILS_ROOT = "/var/www/myapp/current"
%w{3010 3011}.each do |port| God.watch do |w|
w.name = "mongrel_cluster_#{port}"
w.group = 'mongrels'
w.interval = 30.seconds
w.start = "mongrel_rails start -c #{RAILS_ROOT} -p #{port} \
-P #{RAILS_ROOT}/tmp/pids/mongrel.#{port}.pid -d -e production"
w.stop = "mongrel_rails stop -P #{RAILS_ROOT}/tmp/pids/mongrel.#{port}.pid -e production"
w.restart = "mongrel_rails restart -P #{RAILS_ROOT}/tmp/pids/mongrel.#{port}.pid -e production" w.start_grace = 10.seconds
w.restart_grace = 10.seconds

w.pid_file = File.join(RAILS_ROOT, "tmp/pids/mongrel.#{port}.pid")

w.behavior(:clean_pid_file)

w.start_if do |start| start.condition(:process_running) do |c|
c.interval = 5.seconds

c.running = false
end
end

end


As you can see, I can refactor the mongrel's configuration into just one block which is instantiated for each mongrel's port. Besides that, I've also reduce the amount of configuration using RAILS_ROOT constant in many settings (reducing error prune also).

Run It!

To start our God monitoring tool, we just exectue: god -c myapp.god

After which, we can see the log using the following command: god log mongrels (to show the logs relative to our group of mongrel instances). Which should display something like this:

I [2009-06-08 12:59:16] INFO: mongrel_cluster_3010 move 'unmonitored' to 'up'
I [2009-06-08 12:59:16] INFO: mongrel_cluster_3010 moved 'unmonitored' to 'up'
I [2009-06-08 12:59:16] INFO: mongrel_cluster_3010 [trigger] process is not running (ProcessRunning)
I [2009-06-08 12:59:16] INFO: mongrel_cluster_3010 move 'up' to 'start'
I [2009-06-08 12:59:16] INFO: mongrel_cluster_3010 before_start: deleted pid file (CleanPidFile)
I [2009-06-08 12:59:16] INFO: mongrel_cluster_3010 start: mongrel_rails start -c /var/www/myapp/current -p 3010 -P /var/www/myapp/current/tmp/pids/mongrel.3010.pid -d -e production
I [2009-06-08 12:59:27] INFO: mongrel_cluster_3010 moved 'up' to 'up'
I [2009-06-08 12:59:27] INFO: mongrel_cluster_3010 [ok] process is running (ProcessRunning)

Conclusion - My Choice

Any of those tools gets the job done pretty well. But I decided to use monit, because it is easier/faster for me to manage the process using a web-interface instead of logging into the server via ssh.

Besides that, God seems to be a great tool as it provides a way to reduce duplication (because its configuration is defined with ruby code).

1 comentario:

iPad Application Development dijo...

This is my first visit here. I found much informative stuff on your blog, especially its discussion. Form the tons of comments and posts, I guess I am not the only one having all the enjoyment here. Keep up the good work.