Let's see if I can make this!

I've laid out some basic components for now:

Obviously, a subnet

A firewall (which will be installed first and then cloned for at least one other system to save time)

A combined DNS/DHCP/cobbler/web server that will provide the fastest bootstrap time for cobbler. I'm not delighted with this, as it means I don't have easy integration with my underlying Xen host and no good plan for migrating to a properly structured network with separate install, infrastructure and client networks.

An iSCSI subnet in there, potentially VLAN-based but rather another "virtual" LAN

A filer emu box (ONTAP 7.3)

A dynamips box

In a moment I'll start the install for the firewall, but I'll hold my horses till I've got a script to provision the actual storage.

For the sake of simplicity this will ALL be based on CentOS for the infrastructure. My basic server will come with a 5GB disk, plus an additional 15GB for the filer emu box and 20GB for the cobbler install server.
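The storage script itself isn't shown here. As a rough idea of what it needs to do, here is a minimal dry-run sketch that only prints the lvcreate commands for one domU. It assumes a volume group named vgxen (as in the phy:vgxen/... disk lines) and a hypothetical lv<host>os / lv<host>swap / lv<host>data naming scheme; the swap size in the usage lines is made up as well.

```shell
#!/bin/sh
# Dry-run sketch: print the lvcreate commands for one domU's disks.
# Assumptions (hypothetical): volume group "vgxen" and the LV naming
# scheme lv<host>os / lv<host>swap / lv<host>data; OS and data sizes in GB, swap in MB.
VG=${VG:-vgxen}

provision_domu() {
    host=$1
    os_gb=$2
    swap_mb=$3
    data_gb=${4:-0}
    echo "lvcreate -L ${os_gb}G -n lv${host}os ${VG}"
    echo "lvcreate -L ${swap_mb}M -n lv${host}swap ${VG}"
    # only hosts with an extra data disk (filer emu, cobbler) get a third LV
    if [ "$data_gb" -gt 0 ]; then
        echo "lvcreate -L ${data_gb}G -n lv${host}data ${VG}"
    fi
}

# Usage (disk sizes from the text, swap size hypothetical):
#   provision_domu labfw 5 512             # firewall: OS + swap
#   provision_domu filer 5 512 15          # filer emu box: +15GB
#   provision_domu cobbler 5 512 20 | sh   # create for real (as root)
```

Printing instead of executing keeps a typo from eating the volume group; once the output looks right, pipe it to sh.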

Now, let's see: first of all, a new bridge needs to be built using my old XenDistro bridge script 'create_bridge', which actually builds WORKING bridges, unlike all this new virsh crap.
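The create_bridge script isn't reproduced here. To give a rough idea of what such a script does, here is a minimal sketch using brctl from bridge-utils. It is not the original script. The bridge name xenbr6 matches the lab bridge used later, and the DRYRUN switch is a hypothetical convenience so the commands can be reviewed first. The bridge is created without a physical NIC, which suits an isolated lab network.

```shell
#!/bin/sh
# Sketch of a bridge helper (NOT the original create_bridge script).
# With DRYRUN set, commands are only printed; otherwise they are run,
# which needs root and bridge-utils (brctl).
run() {
    if [ -n "$DRYRUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_bridge() {
    br=$1
    run brctl addbr "$br"
    # zero forward delay and no STP, so newly attached vifs forward right away
    run brctl setfd "$br" 0
    run brctl stp "$br" off
    run ip link set "$br" up
}

# Usage: DRYRUN=1 make_bridge xenbr6   (review)   then: make_bridge xenbr6
```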

I'd paste it here if Opera didn't suck balls right now.

OK, the script ended up with some little bugs that wasted about 25 minutes, and in the end I pasted it into my shell. Doesn't really matter. The firewall is just doing its first boot; for its install I attached it to the "production" LAN via xenbr0. Other little mistakes were:

  • selecting the wrong radio button when trying to disable DHCPv6. My own clumsiness; why isn't my v6 stuff up and running fine anyway!
  • using the old device name 'xvda', which worked with the RHEL 5.2 beta but doesn't work with CentOS 5.2. Yet another time you actually get punished for knowing how things used to be before everyone else figured them out ;p I switched to sda/sdb and this should do now.
  • and in the end it came down to a stupid typo: using phy=vgxen/lvname instead of phy:vgxen/lvname :>
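For reference, this is roughly what the working disk and network lines of such a domU config look like. It is a sketch: the LV names are hypothetical and the MAC is a placeholder. The point is the colon in phy: and the sda/sdb device names.

```
# disk and network lines of a domU config (LV names hypothetical)
disk = [ 'phy:vgxen/lvlabfwos,sda,w',
         'phy:vgxen/lvlabfwswap,sdb,w' ]
# use the MAC shown by the installer; 00:16:3e is the Xen prefix
vif  = [ 'mac=00:16:3e:xx:xx:xx,bridge=xenbr0' ]
```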

I disabled all package groups in the installer; short of spending a few hours on shrinking it via .ks files, this is the best I can do to get a lean install. For the other, GUI-affectionate hosts a few yum groupinstalls will be needed (eeek), and by then I'll have my local mirror to speed this up.

The installation took about 3 minutes; the main factors are:

  • separate disks for the install repo (a loopback-mounted DVD ISO image)
  • enough RAM on server and client
  • the FTP protocol

Another glitch: lacking a sophisticated kickstart file, both disk images (OS disk and swap) ended up in VolGroup00 and got fully allocated, so I can't even pvmove the swap volume out. I'll waste some time on this RIGHT now and disable the swap to see if I can fix this real quick.

The fix (1GB of swap was too much anyway):

swapoff -a
lvremove /dev/VolGroup00/LogVol01
lvcreate -l 15 -n LogVol01 VolGroup00 /dev/xvdb1
mkswap /dev/VolGroup00/LogVol01
swapon -a

and we're there. Why the fuss? Because this allows me to use different physical disks on the host for swap and OS data.

This is basically _it_, and now I'll have to decide whether to really copy the LVs or to use the .ks file. I think I'll play safe and use the .ks file, as this ensures I won't get into trouble due to duplicate UUIDs for various bits and pieces.
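A kickstart partitioning section along these lines would avoid the swap problem described above from the start. Swap goes onto the second disk as a plain partition instead of living inside VolGroup00; this differs from the LV-based fix. It is a sketch that assumes the guest sees its disks as sda (OS) and sdb (swap); the sizes are illustrative.

```
# partitioning sketch for a two-disk domU (device names are assumptions)
clearpart --all --initlabel --drives=sda,sdb
part /boot --fstype ext3 --size=100 --ondisk=sda
part pv.01 --size=1 --grow --ondisk=sda
# swap directly on the second disk, so it can sit on a different host disk
part swap --size=1 --grow --ondisk=sdb
volgroup VolGroup00 pv.01
logvol / --fstype ext3 --name=LogVol00 --vgname=VolGroup00 --size=1 --grow
```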

So let's copy the kickstart file off to a random box with an HTTP server and install the new DHCP & stuff host.

Now this was a big failure: I didn't convince the install kernel to actually read the kickstart file, and I can only wonder why. It does have the --preload xennet ramdisk, so this should just work. Anyway, I gave in and fired off another install, but I suppose I'll have to recheck this later on. SOOO frustrating. This time I simply excluded the second disk device to conserve space for the later swap.

One more note: always grab the autogenerated MAC address in the interface configuration dialog.

Now the service host got a third, 20GB disk device dubbed lv<hostname>data, so I can assign it RAID1 storage if needed, and I attached it to xenbr6, the lab network. Time to reduce the firewall's RAM size, add another vif on xenbr6 and start routing.

I hand-edited ifcfg-eth0 to use BOOTPROTO=static with a new IP address, added ifcfg-eth1 with the correct MAC address, added that to the Xen config and enabled routing in sysctl.conf.
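Here is roughly what the hand-edited files look like. This is a sketch: all addresses are hypothetical placeholders (192.168.0.x on the production side, 192.168.6.x on the lab side), and HWADDR has to be the MAC from the domU's vif line.

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 (production side, static)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.0.250
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth1 (lab side, on xenbr6)
DEVICE=eth1
HWADDR=00:16:3e:xx:xx:xx
BOOTPROTO=static
IPADDR=192.168.6.1
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysctl.conf: enable routing (apply with "sysctl -p")
net.ipv4.ip_forward = 1
```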

At the very start I tried to use system-config-something as it came up on first boot, but it turned out too annoying, so I used vi instead 🙂

On my production-side firewall I added a route to the test network. By itself the main firewall would drop traffic there, BUT with some luck the ICMP redirects will do the trick anyway.

Big mistake: now a DHCP timeout gets in the way. I should have changed eth0's config on the service host in time.

OK, fixed the configuration and the new box is reachable. Time to prep volumes REAL fast now. I added it to VolGroup01 after doing fdisk & pvcreate and created a 10GB LV to hold the mirror for the CentOS updates. No doubt I'm running out of time now; maybe I should rename this post 😉


I got stuck in firewall issues and will choose sleep over reaching my goals now 🙂

The next day I solved the firewall mess when I figured out there was no reason to put the lab firewall into my normal LAN. Now it's in parallel to my internal firewall and only allows incoming access from the internal LAN to the firewall and a few services. Some pieces don't work properly yet (I default-deny; the fixes won't even add up to 10 minutes of thinking, but right now I'd rather stop for a short moment and restart after doing what I had to).

Cobbler worked like a charm after reapplying the security contexts and after finally remembering how it actually worked; I used it in the 0.4.x days and hadn't gotten around to it since. My first successful install kept looping, so I'll have to look both for the option to turn that off and for better Xen config handling. But Xen issues aren't at the heart of this doc, and neither are my firewall troubles.

Other than that, the next big things are:

  • properly mirroring -updates
  • double-using the cobbler CentOS mirror as a local repo
  • fine-tuning kickstart, cfengine integration
  • noting the filer emu options for a silent install

I suppose a big head start would be doing this during the daytime, but right now insomnia has me in its full grasp and I'm not expecting to be *awake* during that time.

Above this line there are notes and experiences; luckily I've gotten a lot further by now.


Below this line there's a more thorough manual, without details but up to date with what I have running.


  • centos52 dvd media
  • xen dom0, disk space (minimum 50GB)
  • one ip in your production net unless you have a two-firewall setup

optimized order of steps:

  1. use script to make storage for firewall domU
  2. manually ks install firewall domU
  3. set up firewall, routing, nat
  4. storage for cobbler domU
  5. manually ks install cobbler host domU
  6. install dhcp, named, tftp, xinetd, cobbler, koan, python-cheetah
  7. open firewall ports
  8. cobbler check, configure dhcp
  9. set up 20GB filesystem for /var/www and relabel selinux attrs
  10. cobbler import and setup
  11. tweak kickstart config (base OS storage, cfengine?)
  12. cobbler add host netapp emulator & others
  13. create storage for netapp emu host
  14. create storage for dynamips host
  15. create storage for clients
  16. create yum updates repo and include in profiles
  17. cobbler install netapp emu, dynamips host, clients (i had to supply ks config in xen cfg)
  18. set up 15GB filesystem for netapp sim
  19. configure netapp sim
  20. configure http (+licence), nfs exports & iscsi targets in netapp sim
  21. install dynamips, dynagen, pemu rpms on dynamips host, enable them, open firewall ports
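For step 7 on the cobbler host, here is a sketch of the extra lines for /etc/sysconfig/iptables on CentOS 5. It assumes the stock RH-Firewall-1-INPUT chain and that DHCP, DNS, TFTP and HTTP all run on that one host. The lines go before the chain's final REJECT rule. cobblerd's own XML-RPC port is left out; add it only if koan has to reach cobblerd directly.

```
# DHCP, DNS, TFTP and HTTP for PXE/kickstart installs
-A RH-Firewall-1-INPUT -p udp -m udp --dport 67 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 53 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp -m tcp --dport 53 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 69 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -p tcp -m tcp --dport 80 -j ACCEPT
```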

missing steps:

  • ldap+kerberos for user auth, including cobbler ui and filer
  • ldap+kerberos early config
  • filer rc scripts + screen
  • filer fw ports
  • cfengine early config
  • pxe boot menus, one_time dhcp option to avoid install loop
  • assigning MACs, UUIDs and IPs automatically, before building the domU, and putting them in DNS etc.
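For the last point, one option is to derive each domU's MAC from its hostname. That way the MAC is known before the domU exists and can go straight into DHCP and DNS. Below is a sketch that assumes the Xen 00:16:3e prefix and uses a hypothetical gen_mac helper. cksum is just a convenient deterministic hash here, and collisions are possible, so new MACs should be checked against existing ones.

```shell
#!/bin/sh
# gen_mac HOSTNAME: print a stable Xen-style MAC address for HOSTNAME.
# Uses the 00:16:3e prefix and keeps the 4th byte below 0x80, like Xen's
# own random MAC generator; cksum provides a 32-bit CRC of the name.
gen_mac() {
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    printf '00:16:3e:%02x:%02x:%02x\n' \
        $(( (crc >> 16) & 0x7f )) $(( (crc >> 8) & 0xff )) $(( crc & 0xff ))
}

# Example: emit a dhcpd.conf host entry (fixed address is a placeholder)
#   printf 'host labfw { hardware ethernet %s; fixed-address 192.168.6.1; }\n' "$(gen_mac labfw)"
```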


So far I'm 9 hours into this effort, and the lab is about 80% up. I'll strive to polish the edges (e.g. why set up the filer manually if I can have a cfengine snippet), and I doubt I'll give in until this whole process is scripted well enough to work within the set time limit.

In a few weeks I'll have to deploy this, minus the Xen bit, for an in-house VxVM workshop. By then this should be a reliable bunch of scripts and save me a lot of grey hair. On the other hand, I've been in this job long enough to know that no one except people who have already done a few kickstarts will figure I've been quite fast. So in all honesty: here and now I'll make this work as fast as possible, and then I'll sit back for two days and tidy things up so my boss at least *grasps* that building a full lab takes a little effort.

You actually have to work as slowly as the rest, or the rest won't even get that you're working hard.


Installing / configuring MRTG

Time required: 30 minutes

Just something to rebuild yourself: a short guide to monitoring an entire network.

Use nmap to find all hosts on which SNMP is listening:

yum -y install net-snmp mrtg nmap

nmap -sU -p 161

For each host found, query the available SNMP instances and general system information:

cfgmaker --global "WorkDir: /home/httpd/html/mrtg/" --output=/var/lib/mrtg/

For each host, do a first mrtg run:

/usr/bin/mrtg /var/lib/mrtg/

For each host, build an HTML page with the results:

indexmaker --output=/home/httpd/html/mrtg/

Create an overall overview:

indexmaker --output=/home/httpd/html/mrtg/index.html
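The per-host steps above can be wrapped in a loop. This is a dry-run sketch that prints the commands instead of running them. It assumes SNMP community public, one config file per host named <host>.cfg, and a per-host page named <host>.html; the file names and the host list are hypothetical. mrtg refuses to run when LANG is set to a UTF-8 locale, hence the LANG=C prefix.

```shell
#!/bin/sh
# Dry-run sketch: print cfgmaker/mrtg/indexmaker commands for a list of hosts.
# Assumptions: community "public", one <host>.cfg and <host>.html per host.
WORKDIR=/home/httpd/html/mrtg
CFGDIR=/var/lib/mrtg
COMMUNITY=${COMMUNITY:-public}

mrtg_commands() {
    for host in "$@"; do
        echo "cfgmaker --global \"WorkDir: $WORKDIR/\" --output=$CFGDIR/$host.cfg $COMMUNITY@$host"
        echo "env LANG=C /usr/bin/mrtg $CFGDIR/$host.cfg"
        echo "indexmaker --output=$WORKDIR/$host.html $CFGDIR/$host.cfg"
    done
    # one overview page across all hosts' config files
    cfgs=""
    for host in "$@"; do cfgs="$cfgs $CFGDIR/$host.cfg"; done
    echo "indexmaker --output=$WORKDIR/index.html$cfgs"
}

# Usage: mrtg_commands router switch1          (review)
#        mrtg_commands router switch1 | sh     (run)
```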

Information & more versatile tools:

Installing / configuring NTOP

Time required: 0.5-2 hours

NTOP is a first-class tool for generating statistics on router throughput and similar stuff.

There's no ready-made rpm for Fedora 7 yet, which is a good opportunity to install a bit of software by hand 🙂

Download NTOP from SourceForge:

cd /usr/local/src

wget 'http://surfnet.dl.sourceforge.net/sourceforge/ntop/ntop-3.3.tar.gz'

tar -xzvf ntop-3.3.tar.gz

cd ntop-3.3

Running ./autogen.sh makes a first compile attempt; it will abort with errors as long as no build toolchain and header files are installed.

The error messages are informative enough, though, to google each missing package within 20 seconds. Tip: ./config.log often contains more detailed error information.

Once all missing software has been installed via yum, 'make' starts the actual compile and 'make install' installs the software from the working directory into the system.

Start it with ntop -u ntop -w 8080 (user ntop, embedded HTTP server on port 8080).

If there's interest, I can also provide a NetFlow stream from my router.


Run all tests in both domU and dom0; the tests also all work on a normal Linux machine and in other virtualization setups. You'll still need a day, even for more mundane tests.


  • Current throughput is easy to see with sar -d 20 1
  • hdparm -tT /dev/hda
  • dd if=/dev/hda of=/dev/null bs=1024k count=4096 (so that you're not just testing the disk cache)
  • Raw disk throughput is really only visible via raw devices
  • Boot FreeSBIE: dd if=/dev/rda0 of=/dev/null bs=1024k count=4096


  • dom0 <-> domU
  • dom0 <-> non-Xen-Server
  • domU <-> domU
  • dom0 machine booted with a standard kernel (Knoppix, etc.) <-> non-Xen server

Standard I/O+CPU test:

Create a shell script with the following commands (needs 200+ MB in /tmp):

  • mkfs builddisk # pick whatever filesystem you like :p
  • mount -o noatime builddisk /usr/local/src
  • cd /usr/local/src
  • tar -xjf /tmp/gcc-4.0.x-src.tar.bz
  • mkdir gcc-build
  • cd gcc-build
  • export CC="ccache gcc"
  • ../gcc-4.x…./configure --prefix=/tmp/gcc-crashtest --with-something --without-something
  • make -j 16

Test it; if it runs, fine 🙂

To use it as a benchmark:

  • reboot
  • time script.sh

If the run times are too long for you:

  • build fewer gcc modules
  • use a 256MB ramdisk instead of /tmp; the gcc source can stay in /tmp. Of course it also looks better when it says "Building temporary Ramdisk" 🙂

A round of web and Java benchmarks would also be good, but in my experience they're somewhat time-consuming to configure, so that's an extra day.

Other useful tests:

  • Time from the host's boot prompt until a virtualized web server is available via HTTP
  • Time to start a virtualized machine

Howto for a fairly scalable cluster running Bacula that can be expanded online from 1 to 16 nodes and will provide load balancing among the nodes. How much load balancing is something the next few days will show 🙂

edit: as you can see, this howto is 3 years old by now. If I update or rewrite it, that would be more towards the end of the year. Some guidance: just switch heartbeat/CRM to "that other OSS" cluster software.

The howto will not cover PKI / TLS settings, as I have not used them so far. Contributions would be highly welcome, the only constraint being not to use TLS where it's not needed (e.g. traffic via the interconnect network).

Help wanted:

I'm looking for people experienced with larger setups of the following to verify the crucial parts:

  • heartbeat2+CRM
  • ocfs2
  • bacula
  • mysql
  • chrooting lighttpd including static bconsole for bweb

Parts that have been checked will have a note saying so.

I beg your pardon for the flawed formatting; this will be fixed once I know the feeding and care of WordPress. Right now it doesn't give a damn about my line feeds.

Test environment:

  • 2 Xen domUs running Fedora Core 6
  • 1 Xen domU providing kickstart install services
  • 150GB of sharable storage (Xen vbds with shared write access enabled using the 'w!' flag)
  • multiple test clients
  • low-spec hardware configuration (192MB RAM per domU on a single-CPU host) that will easily find bottlenecks
  • 1 public lan, 3 private lans

Cluster running:

  • Database(s)
  • bacula-dir
  • bacula-fd (1 for the clustered storage, 1 for each node)
  • bacula-sd (1 per 4 storage devices, loadbalanced among nodes)
  • Reporting Webserver (lighttpd)
  • PHP Reporting Site (bweb)
  • separate ssh daemons per application to avoid host key mess and administrative errors

Maximum number of nodes: 16 (Limit for both ocfs2 and possibly heartbeat2. Also a limit of sanity)

Current status:

  • mysql is talking ip and only bound to the interconnect lan
  • bacula is talking IP to mysql
  • I locked myself and bacula out of mysql in the process
  • heartbeat is running with CRM, the packages (groups and resources in Linux terms) are mostly rebuilt, and I'm currently fixing the cluster config with some help from the linux-ha list.
  • Hostnames unmessed
  • vip assignments done

The howto only covers file storage devices; tapes are of course also possible but require either an iSCSI bridge (sucks) or appropriate zoning.

I will cover the setup / configuration of:

  • Operating System (here: Fedora Core 6) including appropriate kickstart configuration for setup of cluster nodes
  • Bacula Version 2.0.23
  • Underlying Filesystem OCFS2
  • Application Clustering via heartbeat 2 + CRM
  • mysql database
  • required ports and firewall adjustment (Table)
  • Bacula Configuration for backup of cluster & nodes

Notes (these will migrate into the full text later on):

I've found no good rules for using rpm in a clustered environment so far. Either you have the binaries on each host or on the clustered storage. There are potential issues with upgrades either way, and running the same application both node-local and on the cluster seems to be messy in conjunction with both heartbeat and rpm. It would appear no one has thought about it.

Currently using a fully static environment:

  • Fedora's SSL libraries are linked with Kerberos but seem to lack proper includes; -lkrb5 fixed lighttpd, but not bacula
  • mysql offers a download of "MySQL standard edition (static)", which is statically linked and very hassle-free, except that it delivers a broken mysql.a, which has been known since 2004. Bug closed by them as unfixable. Yeah, right.
  • building static mysql from source gets you a working mysql.a
  • the easiest solution seems to be rebuilding SSL without krb5
  • the whole static linking business is no fun at all and needs a dedicated build host
  • disadvantage: needs a lot of thought put into the directory layout
  • advantage: a good directory layout. Heh.
  • on the other hand, it's more reliable and unfortunately a cleaner solution than e.g. an extra /var/db/rpm on the cluster storage.

NOTE: I’m definitely looking for advice here, but if You don’t understand the issue with rpm and clusters, please don’t waste Your time.

Configure commands

Bacula (bacula-dir is still dynamic 😦):

./configure --prefix=/mnt/new/bacula/apps/bacula --enable-static-cons --enable-static-fd --enable-ipv6 --with-mysql=/mnt/new/bacula/apps/mysql --with-openssl --with-dir-user=bacula --with-dir-group=bacula --with-sd-user=bacula --with-sd-group=disk --with-fd-user=root --with-fd-group=bacula && make

lighttpd:

export CC="ccache gcc"; ./configure --prefix=/mnt/new/bacula/apps/lighthttpd --with-mysql=/mnt/new/bacula/apps/mysql --with-openssl

I just noticed I linked lighttpd non-static too. I think that's it, and I'll just rebuild everything non-static. Haha, still such a great experience.

OS Installation:

[ kickstart ]

The next step is disabling NFS and a few other services that are active in the base install but that we won't need. Do not disable the netfs 'service'; it's part of the OCFS2 transport layer.

for svc in portmap nfs nfslock autofs ; do service $svc stop ; chkconfig $svc off ; done

for svc in xfs xinetd cups isdn rpcidmapd ; do service $svc stop ; chkconfig $svc off ; done

[ installation of ocfs2 ]

Installation and configuration of node-local Bacula client:

This has to be done on all cluster nodes [so prepare a damn script! 🙂 ]

Acquire the most current bacula-client rpm; I used the one from rpms-contrib-fschwarz for FC6, downloadable at http://switch.dl.sourceforge.net/sourceforge/bacula/bacula-client-2.0.2-1.fc6.i386.rpm

but PLEASE, if you read this after March 2007, check for a current version at http://sourceforge.net/project/showfiles.php?group_id=50727

Anyway, it's just a matter of installing the rpm and restricting it to the local IP (probably eth0's). The cluster's bacula-fd, on the other hand, will bind to the virtual IP assigned to it.

rpm -ivh bacula-client-2.0.2-1.fc6.i386.rpm

Edit /etc/bacula/bacula-fd.conf (vi) to add the FDAddress directive:

FileDaemon { # this is me
Name = node01

FDAddress = <IP address of node01 on the public LAN>

FDport = 9102 # where we listen for the director
WorkingDirectory = /var/bacula
Pid Directory = /var/run
Maximum Concurrent Jobs = 20
}
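Since this has to be repeated on every node, the edit can be scripted. Here is a sketch using awk with a hypothetical add_fdaddress helper. It prints the config with an FDAddress line added right after the FDport line inside the FileDaemon resource. It does not check for an existing FDAddress, so run it on an unmodified file.

```shell
#!/bin/sh
# add_fdaddress FILE IP: print FILE with "FDAddress = IP" added after the
# FDport line of the FileDaemon resource. Output goes to stdout so the
# result can be reviewed before it replaces the original file.
add_fdaddress() {
    awk -v ip="$2" '
        /^[ \t]*FileDaemon[ \t]*\{/ { in_fd = 1 }
        { print }
        in_fd && /^[ \t]*FDport[ \t]*=/ { print "  FDAddress = " ip; in_fd = 0 }
    ' "$1"
}

# Usage (the address is a placeholder for the node's public-LAN IP):
#   add_fdaddress /etc/bacula/bacula-fd.conf 192.168.0.11 > /tmp/fd.conf &&
#     mv /tmp/fd.conf /etc/bacula/bacula-fd.conf
```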

Configuration of clustered bacula-fd:

Currently started by abusing the bacula-ctl-fd script. This will change, as the script is far from being OCF compliant.

WorkingDirectory = /bacula/var/bacula

Pid Directory = /bacula/var/run

Set to bind to the director VIP using FDAddress.
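Put together, the clustered file daemon's resource might look like the sketch below. The name bc00-fd and the address placeholder are hypothetical, and the directories are the ones on the clustered /bacula filesystem given above. Because the node-local daemons bind to their own node IPs, the clustered one can keep port 9102 on the VIP.

```
FileDaemon {
  Name = bc00-fd                   # hypothetical name for the clustered fd
  FDAddress = <director vip>       # placeholder: the bacula-dir virtual IP
  FDport = 9102
  WorkingDirectory = /bacula/var/bacula
  Pid Directory = /bacula/var/run
  Maximum Concurrent Jobs = 20
}
```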

Cluster application dependency order:

  1. Package of the VIP for the database, database + clustered bacula-fd + sshd, on which both of the following depend:
  2. Package of the VIP for bacula-dir, bacula-dir + sshd, and the packages of the bacula-sds, on which the following depends:
  3. Package of the VIP for the webserver, webserver + sshd

  • All packages can run on all nodes.
  • Optionally it seems ocfs2 could be integrated with heartbeat2; I can't yet tell if there is any advantage in that.
  • Preference nodes should only be used if there is no statement to auto-spread the applications.
  • In that case, give different preferences for the bacula-sds and the database, and weight director/webserver against those. (Oh… wait, that's the opposite of the startup order. 🙂)

Lan setup:

  • Public LAN: this is where you access the cluster's frontend services for management (bconsole, ssh, https) or backup purposes (bacula-dir, bacula-sd)
  • Bacula Interconnect: the director uses this to connect to the storage daemons and the database; the web frontend uses this to connect to the director and optionally the database.
  • OCFS2 Interconnect: for OCFS2 metadata updates
  • Heartbeat: for OCFS2's and heartbeat2's unencrypted heartbeats.

Heartbeats are also sent via the other networks, encrypted if possible. No backups are pushed through the private LANs; it's tempting but doesn't make too much sense (it would require another medium-bandwidth LAN and extra configuration).

[ configuration section ]

[ introduction to firewall settings ]

List of firewall ports and applications:

  • 22 sshd
  • 443 https
  • 694 ha-cluster
  • 3306 mysqld
  • 5560 I don't remember 😦
  • 7777 ocfs2 cluster communication
  • 9101 bacula-dir
  • 9102 bacula-fd
  • 9103 bacula-sd

Matrix of firewall access lists:

  • 22: management hosts to all VIPs
  • 443: management hosts to the VIP of lighttpd
  • 694: all cluster nodes to all cluster nodes
  • 5560: management hosts to all cluster nodes (NOT sure!)
  • 7777: all cluster nodes to all cluster nodes
  • 9101: all Bacula clients and all cluster nodes to the VIP running bacula-dir
  • 9102: from the VIP running bacula-dir to all nodes (bacula-fd)
  • 9103: all Bacula clients to all VIPs running a bacula-sd
  • 3306 (mysql): bacula-dir and lighttpd have access here via the interconnect LAN

This isn't properly laid out yet; some communication should only be allowed via the interconnect LAN. Once this is corrected I'll add copy'n'paste rules.
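Until the real rules exist, here is a sketch of how a few rows of the matrix could look in /etc/sysconfig/iptables. It assumes the stock RH-Firewall-1-INPUT chain, with the lines placed before its final REJECT rule. The interface mapping (eth0 public, eth1 Bacula interconnect, eth2 OCFS2, eth3 heartbeat) is an assumption, as is the director VIP placeholder. The 9102/9103 split follows Bacula's defaults (9102 file daemon, 9103 storage daemon).

```
# cluster-internal: heartbeat (udp 694; repeat per heartbeat network) and ocfs2 (tcp 7777)
-A RH-Firewall-1-INPUT -i eth3 -p udp -m udp --dport 694 -j ACCEPT
-A RH-Firewall-1-INPUT -i eth2 -m state --state NEW -p tcp -m tcp --dport 7777 -j ACCEPT
# public lan: clients reach the director and the storage daemons
-A RH-Firewall-1-INPUT -i eth0 -m state --state NEW -p tcp -m tcp --dport 9101 -j ACCEPT
-A RH-Firewall-1-INPUT -i eth0 -m state --state NEW -p tcp -m tcp --dport 9103 -j ACCEPT
# file daemons: only from the director vip (placeholder)
-A RH-Firewall-1-INPUT -s <director-vip> -m state --state NEW -p tcp -m tcp --dport 9102 -j ACCEPT
# mysql: interconnect lan only (tighten to the director and webserver VIPs)
-A RH-Firewall-1-INPUT -i eth1 -m state --state NEW -p tcp -m tcp --dport 3306 -j ACCEPT
```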

MySQL access configuration:

We ensure no public lan-side access is allowed:

  1. Firewall rules only allow access to the database via the interconnect lan from director vip and webserver vip
  2. mysql will be set to only listen on the interconnect lan
  3. the host table will allow only the interconnect lan via address & netmask
  4. There's no default gateway on the interconnect LAN

For performance and stability reasons, we disable hostname resolution by setting the following in /etc/my.cnf:
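Here is a minimal [mysqld] sketch of such settings: no name resolution, and listening only on the interconnect address. The bind address is a hypothetical placeholder for the database VIP on the interconnect LAN.

```
[mysqld]
# grant tables are matched on IP addresses only; no DNS lookup per connection
skip-name-resolve
# listen only on the database VIP on the interconnect LAN (placeholder address)
bind-address = 10.0.1.20
```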


This could be an annoyance if you need to change IPs. On the other hand, there aren't many reasons for switching the interconnect IPs, and if you do switch them you can refer to the table [follows] for a list of things to update, which will include the mysql host table. We will assign IPs for a maximum configuration right from the start, so no trouble there.

OCFS2 Configuration:

[root@domU-bacula2 ~]# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [y]:

Cluster to start on boot (Enter "none" to clear) [backup]: bc00
Specify heartbeat dead threshold (>=7) [31]:
Writing O2CB configuration: OK
Starting O2CB cluster bc00: OK

Create filesystems:

The fdisk space allocations are ONLY suitable for my test setup. As far as I'm aware there is no way of resizing ocfs2 filesystems, so choose the space allocations wisely and use extra spindles where you should; e.g. if you have 10 x 163GB disks, it would be good to use nine of them so that each holds one disk storage 🙂

# 2GB partition for applications
mkfs.ocfs2 -b 2k -C 32K -L "/bacula" -N 16 /dev/sdX

# 10GB partition for the database
mkfs.ocfs2 -b 4k -C 32K -L "/baculadb" -N 1 /dev/sdX

# 50GB partition
mkfs.ocfs2 -b 4k -C 128K -L "/baculasd01" -N 1 /dev/sdX

# 50GB partition
mkfs.ocfs2 -b 4k -C 128K -L "/baculasd02" -N 1 /dev/sdX

Heartbeat Bringup and Configuration:

Make the following changes in /etc/ha.d:

Generate an authkey for the cluster; it must be distributed manually to the other nodes, preserving permissions.

echo "auth 1" > authkeys
# awk keeps only the hash; newer openssl versions prefix it with "(stdin)= "
echo "1 sha1 $(dd if=/dev/urandom count=4 2>/dev/null | openssl dgst -sha1 | awk '{print $NF}')" >> authkeys
chmod 400 authkeys

Create an ha.cf file with debugging enabled and logging to syslog; you DO run Splunk for your site logs, right?
The heartbeat settings (ucast …) are crap and only work for my testing purposes; do NOT use them in production. They need to be replaced with one unencrypted mcast on eth1 and one encrypted on eth0.

logfacility local5
debug 1
traditional_compression false
crm on
auto_failback off
autojoin any
ucast eth0
ucast eth0
ucast eth0
ucast eth0
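Following the note above, the ucast lines would be replaced with multicast heartbeats. Here is a sketch of those lines. The multicast group and the TTL of 1 are assumptions: the group only has to be identical on all nodes and otherwise unused. Port 694 matches the firewall table.

```
# replaces the ucast lines: one multicast heartbeat per network
# syntax: mcast <dev> <group> <port> <ttl> <loop>
mcast eth1 239.0.0.1 694 1 0
mcast eth0 239.0.0.1 694 1 0
```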

Generate a heartbeat2 cib.xml from a throwaway haresources file.

touch /etc/haresources
rm /etc/haresources

This is what my resource setup currently looks like:

# crm_resource -L

Resource Group: bc00-access
ip_intercon_access (heartbeat::ocf:IPaddr)
ip_pub_access (heartbeat::ocf:IPaddr)
app_httpd (lsb:httpd)
Resource Group: bc00-dir
ip_intercon_dir (heartbeat::ocf:IPaddr)
app_bacula_dir (lsb:bacula-ctl-dir)
ip_pub_dir (heartbeat::ocf:IPaddr)
Resource Group: bc00-db00
app_mysql00 (lsb:mysql)
ip_intercon_mysql00 (heartbeat::ocf:IPaddr)

Missing stuff (all quite ugly for me due to my lack of insight into heartbeat):

  • use IPaddr2 instead of IPaddr
  • integrate ocfs2 mount/umount of the disk storage with the sd
  • integrate ocfs2 mount/umount of the database directory with mysql
  • generate dependencies both inside the groups (so the IPs come before firewall rules and applications) and between resource groups, so that e.g. the director comes up later than the database and the sds, but before the webserver
  • node weighting
  • ensuring heartbeat handles things as it's supposed to, e.g.
  • automatic startup of things
  • failing over from busy hosts as nodes start up

Actually I think this is ugly for everyone who's no coder and lacks the deep enthusiasm for messing with large XML files for trivial issues. One can add resources either via the haclient GUI or using cibadmin, piping in XML statements. That's when I launched a vncserver and clicked at things.

How failing over goes with things like running backups remains to be seen. Failing over a director running 500 jobs would suck, but e.g. an SD failover would be tolerable (the director resumes the session). Thus the director ought to be rather sticky, and so should mysql, while the rest could hop around. An extension of the bc00-dir resources that uses bconsole to restart jobs aborted at the last stop comes to mind.

Database setup:

We will set up a MySQL database node called bc-db00; the supplied /etc/hosts file also contains an entry for a second server, so you can bring in another database node if need be.

[ setup mysql ]

[ Reconfigure mysql networking, turn off socket, turn on listening on vip ]

[ set /etc/my.cnf so mysql/mysqladmin use IP ]

Take care that the database is generated in /bacula/db instead of /bacula/apps/mysql/var/db

[ drop anonymous access, drop test tables, passwordless root ]

[ create root@ with password 'mysqlroot_password' ALL permissions ]

[ create root@ with password 'mysqlrroot_password' startup/shutdown ]

[ the script that brings mysql up / down needs to know this password. ]

Bacula Database Access:


We need to modify the grant_mysql_privileges script a bit:

use mysql; grant all privileges on bacula.* to bacula@ identified by 'bacula_password'; flush privileges;

A corresponding entry in bacula-dir.conf tells bacula the database credentials and access information, we need to adjust it to include hostname and password.

# Generic catalog service
Catalog {
Name = MyCatalog
DB Address =
DB Port = 3306
dbname = bacula; user = bacula; password = "bacula_password"
}

Additional catalogs can be added very easily: simply copy the above statement using a different value for Name. Bacula handles the rest just fine.

bweb Database Access:

The IP used here is the one assigned to the reporting webserver on the interconnect lan.

GRANT SELECT ON bacula.* TO 'bweb'@'' IDENTIFIED BY 'bweb_password';
GRANT INSERT,UPDATE,DELETE ON bacula.Location TO 'bweb'@'' IDENTIFIED BY 'bweb_password';
GRANT INSERT,UPDATE,DELETE ON bacula.LocationLog TO 'bweb'@'' IDENTIFIED BY 'bweb_password';
GRANT UPDATE (LocationId,Comment,RecyclePoolId) ON bacula.Media TO 'bweb'@'' IDENTIFIED BY 'bweb_password';

Next is adding the restricted bconsole for use by bweb. This means you'll have two bconsoles: an unmodified one for shell usage and a restricted one for use by the web frontend. The restricted one simply uses a different access key.

Install perl modules for bweb:

perl-Class-DBI-mysql perl-HTML-Template perl-CGI-Simple perl-GD perl-GDGraph perl-Expect perl-Time-modules

bweb connection configuration:

Currently this won't connect via the LAN, and most people won't find verdana.ttf on their hosts either. The bweb_bconsole.conf will be added shortly, moving it to a path in the webserver root. (It is possible to use a static bconsole built with --enable-static-cons if you want to chroot the HTTP daemon.)

$VAR1 = { template_dir => '/usr/share/bweb/tpl' };
$VAR1 = bless( {
'graph_font' => '/usr/share/fonts/verdana.ttf',
'name' => undef,
'config_file' => '/etc/bacula/bweb.conf',
'bconsole' => '/usr/sbin/bconsole -n -c /etc/bacula/bweb_bconsole.conf',
'fv_write_path' => '/dev/null',
'password' => 'bweb_password',
'template_dir' => '/usr/share/bweb/tpl',
'dbi' => 'DBI:mysql:database=bacula;host=bc00-db00', # connection string not correct yet, lacks hostname
'error' => '',
'debug' => 0,
'user' => 'bacula',
'email_media' => 'root@bc00-dir'
}, 'Bweb::Config' );


A) List of things to change for IP migrations

  • /etc/hosts
  • heartbeat CRM virtual IP assignments :p
  • Firewall rulesets
  • mysql host table; I used phpMyAdmin and still need to find out the table's name (only if the change includes the interconnect LAN)
  • OCFS2 configuration /etc/ocfs2/cluster.conf (only if change includes interconnect lan)

B) Links for downloadable configuration files – Passwords removed, Plan a good hour for assigning passwords 🙂

  • /etc/hosts – http://paste.uni.cc/13301
  • /etc/sysconfig/iptables
  • kickstart config
  • bacula-sd.conf
  • bacula-dir.conf
  • bacula-fd.conf
  • /etc/my.cnf
  • /etc/ha.d … /var/lib/crm …
  • lighttpd config
  • bweb config

C) Possibly a few scripts to automate the most timeconsuming tasks

  • script for reassigning passwords.
  • script for configuring the fd’s.
  • buildscript
  • script that takes a devicename as an argument and creates a ready-to-use ocfs2 filesystem on it. Rather I will just put the command here so I’m not liable/feeling guilty if someone shoots himself in the leg.



Anyone who thinks, or goes around telling people, that this is an active/active cluster:


Please: Shoot Yourself In The Head.

A cluster with two nodes that are both active is a long way from being an active/active cluster. Stop selling the sandbox as a hotel.



With kickstart, systems can be reinstalled quickly and automatically; depending on the scope of the installation and the system's performance, an installation is finished in 5 to 30 minutes. The installation consists of several parts:

  • Loading a standard kernel (here with Xen support, from /images/xen/) and an initrd with installation instructions and a parser for the kickstart config file.
  • Loading the kickstart file from the given address and parsing the installation type and the config file to use.
  • Initializing the installation and loading a second ramdisk with the rest of the required environment (everything, including a minimal X server).
  • Normal installation according to the settings in the kickstart file.
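For a Xen domU, the first two steps come down to a few lines in the domU's config file. This is a sketch that assumes the vmlinuz and initrd.img from the DVD's images/xen/ directory were copied to a hypothetical /var/lib/xen/install/. It also assumes the kickstart file sits on ll-kicks under a hypothetical name. on_reboot = "destroy" keeps the finished install from booting straight back into the installer; afterwards these lines are replaced by bootloader = "/usr/bin/pygrub".

```
# install-time boot lines for a paravirtualized domU (paths are placeholders)
kernel    = "/var/lib/xen/install/vmlinuz"
ramdisk   = "/var/lib/xen/install/initrd.img"
extra     = "ks=http://ll-kicks/example.ks"
# stop when the installer reboots instead of starting it again
on_reboot = "destroy"
```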

Kickstart files can be created either with system-config-kickstart or by editing /root/anaconda-ks.cfg.

Once the files are ready, I'd recommend putting them on ll-kicks under /var/www/html/; other kickstart files live there too (please try not to overwrite them).

Exporting ll-kicks:/var/www/html via NFS is probably an obvious exercise.

The addresses of the source repositories are as follows:

These can also be reached from external machines; to do so, the address has to be replaced with http://wartungsfenster.dyndns.org.

For professional use (i.e. 10+ machines…) there's Cobbler and Koan at http://cobbler.et.redhat.com/

Wiki / howto on it: http://wiki.xdroop.com/space/RedHat/kickstart/Cobbler