The system administrator has merged the changes using the --no-ff option in order to create a merge commit for each of the two topic branches. In the future, this merge commit will allow the team to refer back to the change list as a whole rather than having to tease apart which commit is associated with which topic. We're able to verify that both the changes from the operator and the developer are now in the master branch of the system administrator's repository by using the git log command:
sysadmin:~modules/ $ git log --abbrev-commit --pretty=oneline origin/master..
1bbda50... Merge commit 'origin/operator/ssh'
9b41d49... Merge commit 'origin/developer/postfix'
e4e27c7... Updated config.pp to use $module_name
0c164f6... Added manual change warning to postfix config
eea4fbb... Added AllowGroups to sshd_config
Notice that this time, the system administrator has used the git log command to display abbreviated log messages from the current head of the origin/master branch to the current head of the locally checked out branch. He chose origin/master because he has not yet pushed the newly merged changes to the central repository, so this command shows the list of changes that will be pushed if he decides to do so.
Everything looks good, as he expected. He also doesn’t see the commit the developer made on his own topic branch to add the missing curly brace, because the developer chose to rebase his topic branch against the master branch before publishing his change list.
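For reference, the merge step described above would look something like the following. The branch names come from the log output, but the exact commands the system administrator ran are an assumption made here for illustration:
sysadmin:~modules/ $ git checkout master
sysadmin:~modules/ $ git merge --no-ff origin/operator/ssh
sysadmin:~modules/ $ git merge --no-ff origin/developer/postfix
Merging each topic branch separately with --no-ff is what produces the two distinct merge commits visible at the top of the log.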
The team members decide to make the newly merged master branch the first testing branch, and to continue developing on the master branch over the next couple of weeks. In a few days or weeks, the team will come together and decide which of the change lists each member has contributed are ready to merge into the testing branch. The system administrator starts this process by merging the changes he just made to the master branch into the testing branch, then pushing all of these changes to the central repository:
sysadmin:~modules/ $ git checkout testing
Switched to branch "testing"
sysadmin:~modules/ $ git merge master
Updating fa9812f..1bbda50
Fast forward
postfix/files/master.cf | 4 +++-
postfix/manifests/config.pp | 4 ++--
ssh/files/sshd_config | 1 +
3 files changed, 6 insertions(+), 3 deletions(-)
sysadmin:~modules/ $ git push origin
Counting objects: 6, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 494 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To [email protected]:git/modules.git
fa9812f..1bbda50 master -> master
sysadmin:~modules/ $ git push origin testing:testing
Total 0 (delta 0), reused 0 (delta 0)
To [email protected]:git/modules.git
* [new branch] testing -> testing
Notice that the system administrator executes two different push commands: one plain git push origin, and one git push origin testing:testing. This is because git push, by default, only pushes changes made to local branches into a remote repository when a branch with the same name already exists in both locations.
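The testing:testing argument is a refspec of the form <local-branch>:<remote-branch>; because no testing branch exists yet in the central repository, it has to be named explicitly. As a sketch of the general pattern (the branch names here are placeholders, not commands from the Example.com session):
sysadmin:~modules/ $ git push origin <local-branch>:<remote-branch>
Once the remote branch exists, a plain git push origin will include it in subsequent pushes, since the branch names then match on both ends.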
Previously, the operator and developer logged into the Puppet master and activated their changes by checking out their code in /etc/puppet/environments/development/modules. Similarly, the system administrator needs to fetch and check out the new testing branch into the /etc/puppet/environments/testing/modules repository to activate the new configuration in the testing environment. Before doing so, he verifies that the remote named "origin" is configured to connect to the central repository at [email protected]:git/modules.git:
puppet:~ $ cd /etc/puppet/environments/testing/modules/
puppet:modules/ $ git remote -v
origin /etc/puppet/modules/.git
puppet:modules/ $ git remote rm origin
puppet:modules/ $ git remote add origin [email protected]:git/modules.git
puppet:modules/ $ git fetch origin
remote: Counting objects: 39, done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 28 (delta 9), reused 0 (delta 0)
Unpacking objects: 100% (28/28), done.
From [email protected]:git/modules
* [new branch] developer/postfix -> origin/developer/postfix
* [new branch] master -> origin/master
* [new branch] operator/ssh -> origin/operator/ssh
* [new branch] testing -> origin/testing
Now that the testing environment repository has an up-to-date list of the branches, including the new testing branch, the system administrator performs a git checkout to activate the new changes on the system:
puppet:modules/ $ git checkout -b testing --track origin/testing
Branch testing set up to track remote branch refs/remotes/origin/testing.
Switched to a new branch "testing"
The system administrator is finally able to test a Puppet agent against the new testing environment, which now contains both changes: the SSH contribution from the operator, and the Postfix contribution from the developer. The testing environment is the only place where both changes are currently active in the configuration management system.
root:~ # puppet agent --test --noop --environment testing
info: Caching catalog for scd.puppetlabs.vm
info: Applying configuration version '1289770137'
…
info: /Stage[main]/Ssh::Config/File[/etc/ssh/sshd_config]: Scheduling refresh of Service[sshd]
notice: /Stage[main]/Ssh::Service/Service[sshd]: Triggered 'refresh' from 1 events
notice: Finished catalog run in 2.77 seconds
Our team of Puppet contributors at Example.com has been effectively making changes to the configuration management system. Using Puppet Environments and a version control system, they’re able to work efficiently and independently of one another without creating conflicts or obstructing another person’s work. We’ve seen how the operator and the developer were able to make two changes in parallel, publishing those changes in a branch in the central version control repository for the system administrator to merge into a testing branch.
The team has also tested a number of machines using the Puppet agent in the testing environment, and is now ready to release the configuration to the production systems. This section covers how the team creates their first release, and provides a process to follow for subsequent releases.
You’ll also see how a Git feature called “tagging” is useful to provide a method of referring to a specific point in time when the production configuration was active. You’ll see how tags provide the ability to quickly roll back changes that might not be desirable in the production environment.
First, the team decides to release the current testing branch into production. Before doing so, the system administrator creates a tag so this release can be easily referred back to in the future. The system administrator does this in his own personal repository in his home directory:
sysadmin:~ $ cd ~/modules/
sysadmin:~modules/ $ git checkout testing
Switched to branch "testing"
sysadmin:~modules/ $ git tag -m 'First release to production' 1.0.0
sysadmin:~modules/ $ git push --tags origin
Counting objects: 1, done.
Writing objects: 100% (1/1), 177 bytes, done.
Total 1 (delta 0), reused 0 (delta 0)
To [email protected]:git/modules.git
* [new tag] 1.0.0 -> 1.0.0
The process of creating a tag is often called “cutting a release.” The system administrator has done just this, tagged the current testing branch as a release to production, and then published the new tagged release into the central repository.
New branches, such as the testing or topic branches, were activated in the development and testing environments in the previous section. The process of activating a new production release is very similar, except instead of checking out a branch, which may change over time, a specific tag is checked out, which is static and refers to a very specific point in the history of configuration changes.
To activate the new production release, the system administrator logs into the Puppet master system as the user puppet, fetches the new tag from the central repository, and then checks out the tagged production release. Unlike the development and testing environments, Example.com has chosen to configure the production environment to use the working copy at /etc/puppet/modules rather than a subdirectory of /etc/puppet/environments, where the active working copies for development and testing reside.
puppet:~ $ cd /etc/puppet/modules
puppet:modules/ $ git fetch origin
remote: Counting objects: 21, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 14 (delta 3), reused 0 (delta 0)
Unpacking objects: 100% (14/14), done.
From [email protected]:git/modules
* [new branch] developer/postfix -> origin/developer/postfix
fa9812f..1bbda50 master -> origin/master
* [new branch] testing -> origin/testing
* [new tag] 1.0.0 -> 1.0.0
Remember that the git fetch command does not affect the currently checked out configuration; it only updates the repository's remote-tracking references and object database. The system administrator then checks out the newly released production environment using the same familiar syntax we've seen so far:
puppet:modules/ $ git checkout tags/1.0.0
Note: moving to "tags/1.0.0" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
git checkout -b <new_branch_name>
HEAD is now at 1bbda50... Merge commit 'origin/operator/ssh'
The note about moving to a non-local branch may be safely ignored. A tag is a static reference, and the team should not have any need to directly modify the files in /etc/puppet/modules or make commits from the active production environment repository.
After executing the git checkout command to place the 1.0.0 release of the configuration into the production environment, everything is now active for the Puppet agents. The system administrator verifies this by executing puppet agent in the default environment:
root:~ # puppet agent --test --noop
info: Caching catalog for scd.puppetlabs.vm
info: Applying configuration version '1289772102'
notice: Finished catalog run in 0.53 seconds
You will also remember that the default environment is the production environment; as such, the system administrator did not need to set the --environment command line option. If something goes wrong in the production environment, a previous tag may be activated quickly, rolling back the changes introduced by the release of a new production configuration. One of the team members simply needs to execute git checkout tags/x.y.z to roll back the configuration.
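For example, suppose a later release, tagged 1.1.0 (a hypothetical future tag, used here only to illustrate the commands), misbehaved in production. Rolling back to the 1.0.0 release on the Puppet master would look something like this:
puppet:~ $ cd /etc/puppet/modules
puppet:modules/ $ git fetch origin
puppet:modules/ $ git tag -l
1.0.0
1.1.0
puppet:modules/ $ git checkout tags/1.0.0
The next Puppet agent runs against the production environment would then apply the configuration exactly as it existed when the 1.0.0 tag was cut.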
The changes and workflow we’ve seen the operator, developer, and system administrator undertake in this chapter may now be repeated in a cycle. This development, testing, and release cycle provides an effective method to make changes to the configuration management system in a safe and predictable manner. Changes to the production system can be made with confidence: They’ve been vetted through the development and testing phases of the release process, they’ve been explicitly tagged in a release, and they can be quickly and easily backed out if things go awry.
You've seen how Puppet environments enable a team of contributors to work effectively and efficiently. Puppet environments, combined with a modern version control system, enable three people to make changes simultaneously and in parallel without obstructing each other's work. Furthermore, the tagging and branching features of modern version control systems provide an effective release management strategy, giving each team member a repeatable development, testing, and release process to follow when making changes.
We’ve seen that the Puppet agent and master require very little work to get up and running on a handful of nodes using the default configuration. It is, however, a significantly more involved undertaking to scale Puppet to handle hundreds of nodes. Yet many installations are successfully using Puppet to manage hundreds, thousands and tens of thousands of nodes. In this chapter, we cover a number of proven strategies that are employed to scale Puppet.
In this chapter you’ll see how to enable a single Puppet master system to handle hundreds of nodes using the Apache web server. We also demonstrate how to configure more than one Puppet master system to handle thousands of nodes using a load balancer. Throughout, we make a number of recommendations to help you avoid the common pitfalls related to performance and scalability.
Finally, you’ll learn how to measure the performance of the Puppet master infrastructure in order to determine when it’s time to add more capacity. We also provide two small scripts to avoid the “thundering herd effect” and to measure catalog compilation time.
First, though, we need to review some of the challenges you'll face along the way.
Earlier in the book, you learned a bit about Puppet's client-server configuration and the use of SSL to secure connections between the agent and the master. Puppet uses SSL, specifically the HTTPS protocol, to communicate. As a result, when we're scaling Puppet we are in fact scaling a web service, and many of the problems (and the solutions) overlap with traditional web scaling. Consequently, the two challenges we need to address when scaling Puppet are the volume of HTTPS connections the master can handle and the management of the SSL certificates that secure those connections.
The first challenge requires that we increase the performance of the master and the number of concurrent agent connections it can handle. The second challenge requires that we implement good management of the SSL certificates that secure the connection between the master and the agent. Both challenges require changes to Puppet's out-of-the-box configuration.
In Chapter 1 we started the Puppet master using the puppet master command. The default puppet master configuration uses the WEBrick Ruby-based HTTP server, which eliminates the need to set up a web server like Apache to handle HTTPS requests out of the box. While the WEBrick server provides quick and easy testing, it does not provide a scalable solution and should not be used except to evaluate, test, and develop Puppet. In production situations, a more robust web server such as Apache or Nginx is necessary to handle the number of client requests.
Therefore, the first order of business when scaling Puppet is to replace the default WEBrick HTTP server. In the following section, we first replace WEBrick with the Apache web server on a single Puppet master system, and then show how this strategy can be extended to multiple Puppet master systems working behind a load balancer.
The second change to Puppet's out-of-the-box configuration is the management of the SSL certificates that Puppet uses to secure the connection between agent and master. The Puppet master stores a copy of every certificate issued, along with a revocation list. This information needs to be kept in sync across the Puppet worker nodes. So, together with the transport mechanism between the agent and master, we'll explore the two main options for handling SSL certificates in a scalable Puppet deployment.
The first scaling example we're going to demonstrate is the combination of the Apache web server with a module called Phusion Passenger, which is also known as mod_rails, mod_passenger, or just Passenger. Passenger is an Apache module that allows the embedding of Ruby applications, much like mod_php or mod_perl allow the embedding of PHP and Perl applications. The Passenger module is not a standard module that ships with the Apache web server and, as a result, must be installed separately. Passenger is available as a Ruby gem package, or may be downloaded and installed from http://www.modrails.com/.
For networks of one to two thousand Puppet-managed nodes, a single Puppet master system running inside of Apache with Passenger is often sufficient. Later in this chapter, we examine how to run multiple Puppet master systems if you want a highly available system or support for an even larger number of Puppet-managed nodes. These more complex configurations all build on the basic Apache and Passenger configuration we introduce to you. We also build upon the Puppet master configuration we created in Chapter 2 and the environment structure we introduced in Chapter 3.
First, you need to install Apache and Passenger, then configure Apache to handle the SSL authentication and verification of the Puppet agent, and finally connect Apache to the Puppet master and ensure everything is working as expected.
As we scale Puppet up, it is important to draw the distinction between the idea of a front-end HTTP request handler and a back-end Puppet master worker process. The front-end request handler is responsible for accepting the TCP connection from the Puppet agent, selecting an appropriate back-end worker, routing the request to the worker, accepting the response and finally serving it back to the Puppet agent. This distinction between a front-end request handler and a back-end worker process is a common concept when scaling web services.
To get started, you need to install Apache and Passenger. Apache and Passenger are relatively simple and easy to set up. Pre-compiled Passenger packages may not be available for your platform, however, making configuration a little more complex. This section covers the installation of Apache and Passenger on the Enterprise Linux family of systems, such as CentOS, Red Hat Enterprise Linux, and Oracle Enterprise Linux.
In Listing 4-1, we've used the puppet resource command to ensure that Apache and the Apache SSL libraries are installed. We've also ensured that the Apache service is not currently running. The next step is to obtain Passenger, which is implemented as an Apache loadable module, similar to mod_ssl or mod_perl.
Listing 4-1. Installing Apache on Enterprise Linux
# puppet resource package httpd ensure=present
notice: /Package[httpd]/ensure: created
package { 'httpd':
ensure => '2.2.3-43.el5.centos'
}
# puppet resource package mod_ssl ensure=present
notice: /Package[mod_ssl]/ensure: created
package { 'mod_ssl':
ensure => '2.2.3-43.el5.centos'
}
# puppet resource service httpd ensure=stopped
notice: /Service[httpd]/ensure: ensure changed 'running' to 'stopped'
service { 'httpd':
ensure => 'stopped'
}
In order to install Passenger on our Enterprise Linux system, configure yum to access a local yum repository with packages for Puppet and rubygem-passenger.
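An example of what such a repository configuration might look like for the x86_64 architecture follows. The repository id localyum matches the repository shown in the yum output below, but the name, baseurl, and gpgcheck settings are illustrative assumptions rather than Example.com's actual configuration:
# /etc/yum.repos.d/localyum.repo
[localyum]
name=Local packages for Puppet and Passenger
baseurl=file:///var/yum/mirror/el5/x86_64/
enabled=1
gpgcheck=0
With the repository in place, verify that the rubygem-passenger package is visible to yum: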
root:~ # yum list rubygem-passenger
Available Packages
rubygem-passenger.x86_64 2.2.11-3.el5 localyum
We've verified that the rubygem-passenger package is now available on this system, so we're able to install the package using puppet resource, as shown in Listing 4-2.
Listing 4-2. Installing Phusion Passenger on Enterprise Linux
# puppet resource package rubygem-passenger ensure=present
notice: /Package[rubygem-passenger]/ensure: created
package { 'rubygem-passenger':
ensure => '2.2.11-3.el5'
}
At the time of writing, Passenger packages are available in Debian 5, “Lenny.” The packages available in the stable repository have known issues, however, and we recommend installing version 2.2.11 of Passenger from the backports package repository.
DEBIAN BACKPORTS
Debian backports provide the means to install packages from the testing and unstable branches on a stable system. The packages are designed to link against libraries provided in Debian stable to minimize compatibility issues. More information about using Debian backports is available at http://backports.debian.org/.
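Enabling backports on Lenny typically amounts to adding the backports repository to APT and refreshing the package index. The repository line below reflects the backports.debian.org layout at the time of writing; check the site itself for the current instructions:
# echo 'deb http://backports.debian.org/debian-backports lenny-backports main' >> /etc/apt/sources.list
# aptitude update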
Installing Apache on Debian is very straightforward (see Listing 4-3). The packages available in the stable release of the Debian operating system work extremely well with Puppet. Please ensure you've enabled Debian backports as per the instructions at http://backports.debian.org/ before attempting to install the passenger package.
Listing 4-3. Installing Apache and Passenger on Debian / Ubuntu
# puppet resource package apache2 ensure=present
notice: /Package[apache2]/ensure: created
package { 'apache2':
ensure => '2.2.9-10+lenny8'
}
# puppet resource package libapache2-mod-passenger ensure=present
notice: /Package[libapache2-mod-passenger]/ensure: created
package { 'libapache2-mod-passenger':
ensure => '2.2.11debian-1~bpo50+1'
}
As an alternative to the puppet resource commands shown in Listing 4-3, Passenger may be installed from Debian backports using the command aptitude -t lenny-backports install libapache2-mod-passenger.
Compiled binary packages of Passenger 2.2.11 are available for some platforms, but not all. RubyGems provides an alternative way to install the Passenger module. The passenger gem behaves slightly differently from most binary packages: the Passenger source code is installed in the Gem format, complete with a shell script to assist in compiling the Apache module.
For this installation method to succeed, the Apache development packages for your platform need to be installed. The Passenger build script links the module against the available version of the Apache development libraries (Listing 4-4).
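On an Enterprise Linux system, those prerequisites can be put in place with puppet resource as well. The package names below (httpd-devel, ruby-devel, and gcc-c++) are our assumption about a typical build environment, not an exhaustive or platform-specific list:
# puppet resource package httpd-devel ensure=present
# puppet resource package ruby-devel ensure=present
# puppet resource package gcc-c++ ensure=present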
Listing 4-4. Installing Passenger Using RubyGems
# gem install rack -v 1.1.0
# gem install passenger -v 2.2.11
# passenger-install-apache2-module
The output of the passenger-install-apache2-module script is quite long and has been truncated here. For additional information and troubleshooting tips related to installing Passenger using RubyGems, please see http://www.modrails.com/install.html.
Tip
Up-to-date information about Passenger versions known to work with Puppet is available online at http://projects.puppetlabs.com/projects/1/wiki/Using_Passenger.
If you haven't already done so, make sure you've started the Puppet master at least once to create the SSL certificates you're going to configure Apache to use. Apache will then verify that the Puppet agent certificate is signed by the generated Puppet CA, and will present a certificate that the Puppet agent uses to verify the authenticity of the server. Once you have your SSL certificates in place, configure Apache by enabling the Passenger module and creating an Apache virtual host for the Puppet master service.
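If the master has never run before, one quick way to generate these certificates before moving on to the Apache configuration is to start the master once in the foreground, stop it with Ctrl-C after it reports that it has started, and then confirm the certificate files exist. The ssldir path shown is the common default on Enterprise Linux; check it against the ssldir setting in your own puppet.conf:
# puppet master --no-daemonize --verbose
# ls /var/lib/puppet/ssl/certs/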
First, enable mod_passenger with the configuration provided in Listing 4-5.
Listing 4-5. The Apache Passenger configuration file
# /etc/httpd/conf.d/10_passenger.conf
# The passenger module path should match the installed Ruby gem version
LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-2.2.11/ext/apache2/mod_passenger.so
PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-2.2.11
PassengerRuby /usr/bin/ruby
# Recommended Passenger Configuration
PassengerHighPerformance on
PassengerUseGlobalQueue on
# PassengerMaxPoolSize controls the number of application instances,
# typically 1.5x the number of processor cores.
PassengerMaxPoolSize 6
# Restart Ruby processes after handling a specific number of requests to work around MRI memory leaks.
PassengerMaxRequests 4000
# Shut down idle Passenger instances after 30 minutes.
PassengerPoolIdleTime 1800
# End of /etc/httpd/conf.d/10_passenger.conf