Pro Puppet

Authors: Jeffrey McCune, James Turnbull

Measuring Performance

Catalog retrieval time is the primary measure of how one or more Puppet masters are performing. Catalog compilation is a very I/O-, CPU- and memory-intensive process. All of the imported manifests must be located and read from the file system, and CPU and memory are used to parse and compile the catalog. To measure this process, you can use a simple curl script to periodically obtain a compiled catalog. If the command takes longer than is normal for the environment, that is a strong indication that additional capacity should be added to the Puppet master infrastructure.

Using the unencrypted Puppet master back-end workers configured when setting up the Apache load balancer, you can write a small script to measure the catalog compilation time of the node test.example.com.

To do this, you need to know the four components of a catalog request:

The URI containing the environment, catalog, and node to obtain a catalog from
The SSL authentication headers
A list of facts and their values
A header telling the Puppet master what encoding formats the client accepts
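The four components above can also be assembled programmatically. The following Ruby sketch builds the URI and header set for a catalog request. The helper name `catalog_request` is hypothetical, and the worker address, node name, and facts value are placeholders for values from your own environment. The sketch assumes the facts value is passed unescaped. If you paste it straight from the access log, unescape it first with `URI.decode_www_form_component`.

```ruby
require 'uri'

# Hypothetical helper: assembles the four components of a catalog request.
# Returns the full URI plus the headers a curl or Net::HTTP client would send.
def catalog_request(base_url, environment, node, facts, facts_format = 'b64_zlib_yaml')
  # 1. URI naming the environment, the catalog, and the node
  path = "/#{environment}/catalog/#{node}"
  # 3. Facts travel in the query string; encode_www_form escapes them
  query = URI.encode_www_form([['facts_format', facts_format], ['facts', facts]])
  uri = URI("#{base_url}#{path}?#{query}")
  headers = {
    # 2. SSL authentication headers normally set by the Apache front end
    'X-Client-DN'     => "/CN=#{node}",
    'X-Client-Verify' => 'SUCCESS',
    # 4. Encoding formats the client accepts
    'Accept'          => 'pson, yaml'
  }
  [uri, headers]
end
```

For example, `catalog_request('http://127.0.0.1:18141', 'production', 'test.example.com', facts)` yields a URI on port 18141 whose path is `/production/catalog/test.example.com`.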

The URI and the list of facts are available in the Apache access logs (see Listing 4-27). The list of facts is easily obtained by running the Puppet agent normally, then inspecting the HTTP access logs and copying the URL into a script.

Listing 4-27. Curl URL based on Apache access logs

# tail balancer_access.log
127.0.0.1 - - [05/Dec/2010:05:41:41 -0800] "GET \
/production/catalog/test.example.lan?facts_format=b64_zlib_yaml&facts=eNqdVVt…
HTTP/1.1" 200 944 "-" "-"

The path following the GET verb contains /production/catalog/test.example.lan. This indicates a catalog request for the host test.example.lan from the production environment. The query portion of the URL contains two pieces of information: the format of the facts listing, and the listing of facts itself. These pieces of information are encoded in the facts_format and facts query parameters of the URL.
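To see what a logged request actually carries, you can pull it apart in a few lines of Ruby. This sketch assumes that `b64_zlib_yaml` means what its name suggests: YAML text, zlib-deflated, then Base64-encoded and URL-escaped. Both helper names are hypothetical. The decoder stops at the YAML text instead of parsing it, because Puppet's serialized facts may contain Ruby object tags.

```ruby
require 'uri'
require 'zlib'

# Hypothetical helper: splits a logged catalog request into its components.
def parse_catalog_request(log_line)
  request_path = log_line[/"GET (\S+) HTTP/, 1]
  return nil unless request_path
  path, query = request_path.split('?', 2)
  # "/production/catalog/test.example.lan" -> ["", env, "catalog", node]
  _, environment, indirection, node = path.split('/', 4)
  return nil unless indirection == 'catalog'
  params = URI.decode_www_form(query.to_s).to_h # also undoes the URL escaping
  { environment: environment, node: node,
    facts_format: params['facts_format'], facts: params['facts'] }
end

# Reverses the assumed b64_zlib_yaml encoding: Base64 decode, then inflate.
def decode_b64_zlib_yaml(value)
  Zlib::Inflate.inflate(value.unpack1('m')) # 'm' is the Base64 directive
end
```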

To construct the full URL, prefix the URL from Listing 4-27 with http://127.0.0.1:18141, the address of the Apache worker virtual host. The command the operator uses to measure catalog compilation time is:

Listing 4-28. Curl catalog request command

$ time curl -v -H "Accept: pson, yaml" \
   -H "X-Client-DN: /CN=test.example.com" \
   -H "X-Client-Verify: SUCCESS" \
   'http://127.0.0.1:18141/production/catalog/test.example.com?facts=…&facts_format=b64_zlib_yaml'

Placing this command in a script and executing it on the Puppet master worker nodes allows us to know when catalog compilation time grows beyond normal thresholds.
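If you would rather not parse the output of `time`, the same check can be scripted in Ruby. The following sketch sends the same headers as the curl command, times the request with a monotonic clock, and exits non-zero when the request is slower than a threshold. That exit status makes it easy to hook into a monitoring system. The 5-second threshold, the script name `catalog_timer.rb`, and the helper names are assumptions. Choose the threshold from your own baseline.

```ruby
require 'net/http'
require 'uri'

THRESHOLD_SECONDS = 5.0 # hypothetical; derive from your normal compile times

# Runs the block and returns [result, elapsed seconds] using a monotonic clock.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Requests a catalog from a back-end worker with the curl command's headers.
def fetch_catalog(url, node)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'pson, yaml'
  request['X-Client-DN'] = "/CN=#{node}"
  request['X-Client-Verify'] = 'SUCCESS'
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

def slow?(elapsed, threshold = THRESHOLD_SECONDS)
  elapsed > threshold
end

# Usage: ruby catalog_timer.rb FULL_URL NODE_NAME
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  response, elapsed = timed { fetch_catalog(ARGV[0], ARGV[1]) }
  puts format('%s %.2fs', response.code, elapsed)
  warn 'catalog compilation slower than threshold' if slow?(elapsed)
  exit(slow?(elapsed) ? 1 : 0)
end
```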

Splay Time

Related to catalog compilation time, Puppet agent processes sometimes present a thundering herd problem when all systems have their clocks synchronized and are configured to run from the cron daemon at a specific time. The catalog compilation process is quite processor-intensive, and if the Puppet master receives too many requests in a short period of time, the systems may start to thrash and degrade in performance.

When running a Puppet agent out of cron, we recommend introducing a small random splay time to ensure that all of the Puppet agent nodes do not request their configuration catalog at exactly the same moment. The Example.com operator follows this recommendation and uses the Puppet agent wrapper script shown in Listing 4-29 when executing the Puppet agent out of cron.

Listing 4-29. Bash script to splay Puppet agents

#! /bin/bash
set -e
set -u
sleep $((RANDOM % 300))
exec puppet agent --no-daemonize --onetime

The sleep command in this shell script causes a random delay of between zero and five minutes (0 to 299 seconds). With hundreds of Puppet agent managed nodes, this random delay spreads incoming requests to the Puppet master workers across that five-minute window instead of concentrating them at a single moment.
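To get a feel for the effect, the following Ruby sketch simulates a fleet of agents. It compares how many requests land in the busiest single second when every agent starts at once with the count when each agent waits a random 0 to 299 seconds, mirroring `$((RANDOM % 300))`. The fleet size of 500 and the helper names are hypothetical, and the sketch models only start times, not compile durations.

```ruby
SPLAY_SECONDS = 300 # matches $((RANDOM % 300)) in the wrapper script

# Returns one start offset (in whole seconds) per agent.
def start_offsets(agents, splay, rng = Random.new)
  Array.new(agents) { splay.zero? ? 0 : rng.rand(splay) }
end

# Largest number of catalog requests arriving within the same second.
def peak_requests_per_second(offsets)
  offsets.tally.values.max
end

if __FILE__ == $PROGRAM_NAME
  agents = 500 # hypothetical fleet size
  herd = peak_requests_per_second(start_offsets(agents, 0))
  spread = peak_requests_per_second(start_offsets(agents, SPLAY_SECONDS))
  puts "no splay:   #{herd} requests in the busiest second"
  puts "with splay: #{spread} requests in the busiest second"
end
```

Without splay, all 500 requests share one second. With the splay, the busiest second typically sees only a handful.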

Summary

In this chapter, you’ve configured the Puppet master infrastructure in a number of ways. Specifically, you configured the Apache web server as a reverse HTTPS proxy to handle the SSL verification and authentication of incoming Puppet agent managed nodes. Once authenticated, the Apache system behaves as an HTTP load balancer, distributing requests automatically to some number of back-end Puppet master worker virtual hosts.

In addition, we showed you how to handle incoming certificate requests in a special manner, forwarding all certificate requests to a single Puppet CA worker process with a hot standby ready and waiting for redundancy. The consolidation of certificate requests to a single Puppet CA worker mitigates the overhead and problems associated with keeping the Puppet CA certificate revocation list, serial numbers, and index synchronized across workers.

In addition to HTTP load balancing, distributing incoming requests using DNS round robin is a viable alternative when using the --ca_server Puppet agent configuration option. Similar to the HTTP load-balancing configuration, the ca_server option allows the operator to consolidate certificate requests onto a single worker system and alleviates the issues of managing and synchronizing the certificate authority database files.

Finally, you learned how to measure the catalog compilation time of the Puppet master workers and use splay time to avoid overwhelming the Puppet masters.

Resources

Using Passenger - http://projects.puppetlabs.com/projects/1/wiki/Using_Passenger
Apache Configuration Reference - http://httpd.apache.org/docs/2.2/
Apache Mod Proxy Balancer - http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html
DNS Round Robin - http://en.wikipedia.org/wiki/Round_robin_DNS
Puppet REST API - http://docs.puppetlabs.com/guides/rest_api.html

CHAPTER 5

Externalizing Puppet Configuration

In Chapter 2 we talked about the ways that you could define your hosts or nodes to Puppet. We talked about specifying them in a variety of forms as node statements in your Puppet manifest files. We also mentioned that Puppet has the capability to store node information in external sources. This avoids the need to specify large numbers of nodes manually in your manifest files, an approach that is time-consuming and does not scale.

Puppet has two ways to store node information externally:

External Node Classification
LDAP server classification

The first capability is called External Node Classification (ENC). ENC is a script-based integration system that Puppet queries for node data. The script returns classes, inheritance, variables and environment configuration that Puppet can then use to define a node and configure your hosts.

Tip
External node classifiers are also one of the means by which tools like the Puppet Dashboard and Foreman can be integrated into Puppet and provide node information, as you will see in Chapter 7.

The second capability allows you to query Lightweight Directory Access Protocol (LDAP) directories for node information. This integration is used less often than ENCs, but it is especially useful because you can specify an existing LDAP directory, for example your asset management database or an LDAP DNS back end, for your node data.

Using external node classification, either via an ENC or via LDAP, is the recommended way to scale your Puppet implementation to cater for large volumes of hosts. Most of the multi-thousand-node sites using Puppet, for example Google and Zynga, make use of external node classification systems to allow them to deal with the large number of nodes. Instead of managing files containing hundreds, thousands or even tens of thousands of node statements like these:

node mail.example.com { … }
node web.example.com { … }
node db.example.com { … }
…

you can use external node classification to specify a single source of node information and make quick and easy changes to that information without needing to edit manifest files.

In this chapter, we discuss both approaches to storing node information in external sources. First we look at creating an external node classifier, and we provide some simple examples of these for you to model your own on; then we demonstrate the use of the LDAP node classifier.

External Node Classification

Writing an ENC is very simple. An ENC is merely a script that takes a node name, for example mail.example.com, and then returns the node's configuration in the form of YAML data. YAML (YAML Ain't Markup Language, originally Yet Another Markup Language; http://www.yaml.org/) is a data serialization language used in a variety of programming languages. YAML is designed to be human-friendly: its structure is easy for people to read and write. It is often used as a configuration file format; for example, the database configuration file used in Ruby on Rails applications, database.yml, is a YAML file.

Let's look at some simple YAML examples to get an idea of how it works. YAML data is built from lists and hashes (key/value pairs), and its structure, including indentation, carries meaning. Let's start by specifying a list of items:

---
- foo
- bar
- baz
- qux

The start of a YAML document is identified with three dashes, “---”. Every ENC should begin its output with these three dashes. We've then got a list of items, each preceded by a dash.

We can also express the concept of assigning a value to an item, for example:

---
foo: bar

Here we've added our three dashes and then expressed that the value of item “foo” is “bar.” We can also express grouped collections of items (which we're going to use heavily in our ENCs):

---
foo:
 - bar
baz:
 - qux

We've again started with our three dashes and then specified the names of the lists we're creating: foo and baz. Inside each list are the list items, again preceded with a dash, but this time indented one space to indicate their membership of the list.

This indentation is very important. For the YAML to be valid, it must be structured correctly. This can sometimes be a real challenge, but there are some tools you can use to produce well-structured YAML. For example, Vim syntax highlighting will recognize YAML (if the file you're editing has a .yml or .yaml extension), or you can use the excellent Online YAML Parser to confirm the YAML you're generating is valid: http://yaml-online-parser.appspot.com/.
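You can also check YAML locally with Ruby's standard YAML library (Psych). The following sketch does two things. It loads the three examples above so you can see the data structures they become, and it reports a syntax error, if any, for each file named on the command line. The helper name `yaml_error` is hypothetical. `YAML.safe_load` also refuses arbitrary Ruby object tags, which is a sensible default when checking ENC output.

```ruby
require 'yaml'

# Returns nil if the text parses as YAML, otherwise the parser's error message.
def yaml_error(text)
  YAML.safe_load(text)
  nil
rescue Psych::SyntaxError => e
  e.message
end

# The three examples above, loaded into Ruby data structures
list    = YAML.safe_load("---\n- foo\n- bar\n- baz\n- qux\n") # an Array of four strings
pair    = YAML.safe_load("---\nfoo: bar\n")                    # a Hash with one key
grouped = YAML.safe_load("---\nfoo:\n - bar\nbaz:\n - qux\n") # a Hash whose values are Arrays

# Check any files named on the command line, e.g. ruby check_yaml.rb node.yaml
ARGV.each do |path|
  error = yaml_error(File.read(path))
  puts error ? "#{path}: #{error}" : "#{path}: OK"
end
```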

But before we generate our first YAML node, we need to configure Puppet to use an external node classifier instead of our file-based node configuration.

Note
You can see a more complete example of structured YAML at http://www.yaml.org/start.html.

Configuring Nodes Using An External Node Classifier

To use external nodes, we first need to tell Puppet to use a classifier to configure our nodes rather than use node definitions. We do this by specifying the node_terminus option and the name and location of our classifier in the [master] (or [puppetmasterd] in pre-2.6.0 versions) section of the puppet.conf configuration file on our Puppet master. You can see this in Listing 5-1, where we've specified a classifier called puppet_node_classifier located in the /usr/bin directory.

Listing 5-1. The external_nodes configuration option

[master]
node_terminus = exec
external_nodes = /usr/bin/puppet_node_classifier

The node_terminus configuration option is used to configure Puppet for node sources other than the default flat file manifests. The exec option tells Puppet to use an external node classifier script.
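As we understand the exec terminus, Puppet runs the external_nodes command with the node name as its only argument and reads YAML from standard output. It treats a non-zero exit status as a failure to classify the node. The Ruby sketch below imitates that contract so you can exercise a classifier by hand before pointing the master at it. It is an approximation for testing, not Puppet's own code, and `run_classifier` is a hypothetical name.

```ruby
require 'open3'
require 'yaml'

# Runs a classifier the way the exec node terminus is assumed to:
# node name as the only argument, YAML expected on standard output.
def run_classifier(command, node)
  output, status = Open3.capture2(*command, node)
  raise "classifier exited with status #{status.exitstatus}" unless status.success?
  data = YAML.safe_load(output) || {} # empty output is treated as an empty hash
  raise 'classifier output is not a YAML hash' unless data.is_a?(Hash)
  data
end

# Usage: ruby run_classifier.rb /usr/bin/puppet_node_classifier web.example.com
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  p run_classifier([ARGV[0]], ARGV[1])
end
```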

A classifier can be written in any language, for example shell script, Ruby, Perl, Python, or a variety of other languages. The only requirement is that the language can output the appropriate YAML data. For example, you could also easily add a database back end to a classifier that queries a database for the relevant hostname and returns the associated classes and any variables.
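As a sketch of the database idea, assume a hypothetical table with one row per host and setting. Each row has a kind of either `class` or `parameter`. The classifier then only has to fold the matching rows into the `classes` and `parameters` structure. The rows below are an in-memory stand-in for a query result. The table layout and host names are assumptions, and a real version would fetch the rows through whatever database driver you use.

```ruby
require 'yaml'

# Hypothetical rows, shaped as a database query might return them.
ROWS = [
  { host: 'web1.example.com', kind: 'class',     name: 'base' },
  { host: 'web1.example.com', kind: 'class',     name: 'apache' },
  { host: 'web1.example.com', kind: 'parameter', name: 'puppetserver', value: 'puppet.example.com' },
  { host: 'db1.example.com',  kind: 'class',     name: 'mysql' }
].freeze

# Folds the rows for one host into the hash an ENC prints as YAML.
def classify(node, rows)
  matching = rows.select { |row| row[:host] == node }
  return {} if matching.empty? # unknown node: empty YAML hash
  classes = matching.select { |row| row[:kind] == 'class' }.map { |row| row[:name] }
  params = matching.select { |row| row[:kind] == 'parameter' }.to_h { |row| [row[:name], row[:value]] }
  { 'classes' => classes, 'parameters' => params }
end

print classify(ARGV[0], ROWS).to_yaml if __FILE__ == $PROGRAM_NAME && ARGV.any?
```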

Following are some example node classifiers written in different languages.

Note
You can have nodes specified both in Puppet manifests and external node classifiers. For this to work correctly, though, your ENC must return an empty YAML hash.

An External Node Classifier in a Shell Script

In Listing 5-2, you can see a very simple node classifier, the puppet_node_classifier script we specified in Listing 5-1. This classifier is written in shell script.

Listing 5-2.
Simple Node Classifier

#!/bin/sh
cat <<"END"
---
classes:
  - base
parameters:
  puppetserver: puppet.example.com
END
exit 0

The script in Listing 5-2 returns the same classes and variables each time it is called, regardless of which hostname is passed to it. For example, running:

$ puppet_node_classifier web.example.com

returns:

---
classes:
  - base
parameters:
  puppetserver: puppet.example.com

The classes block holds a list of the classes that belong to this node, and the parameters block contains a list of the variables that this node specifies. In this case, the node includes the base class and has a variable called $puppetserver with a value of puppet.example.com.
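Because Puppet consumes this output as data, it can help to check its shape before deploying a classifier. The `classes` block should be a list and the `parameters` block a hash of names to values. The Ruby sketch below performs that check. `enc_problems` is a hypothetical helper, and it covers only the keys used in this chapter (`classes`, `parameters`, `environment`). It also accepts a hash for `classes`, on the assumption that some later Puppet releases allow that form.

```ruby
require 'yaml'

# Returns a list of problems found in a classifier's YAML output (empty if none).
def enc_problems(yaml_text)
  data = YAML.safe_load(yaml_text)
  return ['output is not a YAML hash'] unless data.is_a?(Hash)
  problems = []
  classes = data.fetch('classes', [])
  problems << 'classes must be a list or a hash' unless classes.is_a?(Array) || classes.is_a?(Hash)
  params = data.fetch('parameters', {})
  problems << 'parameters must be a hash' unless params.is_a?(Hash)
  env = data['environment']
  problems << 'environment must be a string' unless env.nil? || env.is_a?(String)
  problems
rescue Psych::SyntaxError => e
  ["invalid YAML: #{e.message}"]
end
```

Running it over the output of Listing 5-2 returns an empty list.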

Puppet will use this data to construct a node definition as if we'd defined a node statement. That node statement would look like Listing 5-3.

Listing 5-3. Node definition from Listing 5-2's classifier

node web.example.com {
       $puppetserver = 'puppet.example.com'
       include base
}
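To make the correspondence concrete, the Ruby sketch below turns classifier output back into an equivalent node statement. It is purely illustrative, since Puppet works with the ENC data internally and does not write manifest text. `node_statement` is a hypothetical helper, and it assumes parameter values contain no single quotes.

```ruby
require 'yaml'

# Renders ENC data as the equivalent Puppet node statement (illustration only).
def node_statement(node, enc_yaml)
  data = YAML.safe_load(enc_yaml) || {}
  lines = ["node #{node} {"]
  (data['parameters'] || {}).each do |name, value|
    lines << "  $#{name} = '#{value}'" # parameters become variables
  end
  Array(data['classes']).each { |klass| lines << "  include #{klass}" }
  lines << '}'
  lines.join("\n")
end

enc_output = <<~YAML
  ---
  classes:
    - base
  parameters:
    puppetserver: puppet.example.com
YAML

puts node_statement('web.example.com', enc_output)
# Prints:
# node web.example.com {
#   $puppetserver = 'puppet.example.com'
#   include base
# }
```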

This is the simplest ENC that we can devise. Let's look at some more complex variations of this script that can return different results depending on the particular node name being passed to the classifier, in the same way different nodes would be configured with different classes, definitions, and variables in your manifest files.

Tip
Any parameters specified in your ENC will be available as top-scope variables.

A Ruby External Node Classifier

Let's look at another example of an ENC, this time specifying a list of hosts or returning an empty YAML hash if the host is not found. This ENC is written in Ruby, and you can see it in
Listing 5-4
.

Listing 5-4. Ruby node classifier

#!/usr/bin/env ruby
require 'yaml'

node = ARGV[0]
# Output for nodes we can't parse or don't recognize
default = { 'classes' => [] }

# Split the FQDN: $1 is the short hostname (no dots), $2 the domain
unless node =~ /^([^.]+)\.(\S+\.\S+)$/
  print default.to_yaml
  exit 0
end
hostname = $1

# Data shared by every recognized node
base = { 'environment' => 'production',
         'parameters' => {
                    'puppetserver' => 'puppet.example.com'
         },
         'classes' => [ 'base' ],
       }

# Add a role class based on the hostname prefix: web*, db* or mail*
case hostname
  when /^web\w*$/
     web = { 'classes' => 'apache' }
     base['classes'] << web['classes']
     puts YAML.dump(base)
  when /^db\w*$/
     db = { 'classes' => 'mysql' }
     base['classes'] << db['classes']
     puts YAML.dump(base)
  when /^mail\w*$/
     mail = { 'classes' => 'postfix' }
     base['classes'] << mail['classes']
     puts YAML.dump(base)
  else
    print default.to_yaml
end
exit 0
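The three `when` branches differ only in the hostname prefix and the class they add, so one alternative is to drive the classification from a table. The sketch below is a hedged variant of Listing 5-4, not the book's code. It keeps the same output format and anchors each prefix with `\A...\w*\z`, so `web01` matches `web` but `dns1` does not match `db`. It also moves the logic into a hypothetical `classify` method so you can test it without Puppet. The table `ROLE_CLASSES` is an assumption you would extend with your own roles.

```ruby
#!/usr/bin/env ruby
require 'yaml'

# Hostname prefix => extra class; mirrors the case statement in Listing 5-4.
ROLE_CLASSES = { 'web' => 'apache', 'db' => 'mysql', 'mail' => 'postfix' }.freeze
DEFAULT = { 'classes' => [] }.freeze

def classify(node)
  # Short hostname is everything before the first dot, e.g. web01 in web01.example.com
  hostname = node.to_s[/\A([^.]+)\.\S+\.\S+\z/, 1]
  return DEFAULT if hostname.nil?
  role = ROLE_CLASSES.keys.find { |prefix| hostname.match?(/\A#{Regexp.escape(prefix)}\w*\z/) }
  return DEFAULT if role.nil?
  { 'environment' => 'production',
    'parameters'  => { 'puppetserver' => 'puppet.example.com' },
    'classes'     => ['base', ROLE_CLASSES[role]] }
end

print classify(ARGV[0]).to_yaml if __FILE__ == $PROGRAM_NAME && ARGV.any?
```

Adding a role is now a one-line change to the table rather than a new `when` branch.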
