Bacula Cluster Howto

22Feb07

Howto for a fairly scalable cluster running Bacula that can be expanded online from 1 to 16 nodes and provides load balancing among the nodes. How much load balancing it actually achieves is something the next few days will show 🙂

Edit: as you can see, this howto is 3 years old by now. If I update or rewrite it, that will be more towards the end of the year. Some guidance in the meantime: just switch heartbeat/CRM to "that other OSS" cluster software.

The howto will not cover PKI / TLS settings as I have not used them so far. Contributions would be highly welcome, the only constraint being not to use TLS where it's not needed (i.e. for traffic via the interconnect network).

Help wanted:

I am looking for people experienced with larger setups of the following to verify the crucial parts:

  • heartbeat2+CRM
  • ocfs2
  • bacula
  • mysql
  • chrooting lighttpd including static bconsole for bweb

Parts that have been checked will have a note saying so.

I beg your pardon for the flawed formatting; this will be fixed once I know the care and feeding of WordPress. Right now it doesn't give a damn about my line feeds.

Test environment:

  • 2 Xen domUs running Fedora Core 6
  • 1 Xen domU providing kickstart install services
  • 150GB of sharable storage (Xen vbds with shared write access enabled using the 'w!' flag)
  • multiple test clients
  • low-spec hardware configuration (192MB RAM per domU on a single-CPU host) that will easily expose bottlenecks
  • 1 public LAN, 3 private LANs

Cluster running:

  • Database(s)
  • bacula-dir
  • bacula-fd (1 for the clustered storage, 1 for each node)
  • bacula-sd (1 per 4 storage devices, loadbalanced among nodes)
  • Reporting Webserver (lighttpd)
  • PHP Reporting Site (bweb)
  • separate ssh daemons per application to avoid host key mess and administrative errors

Maximum number of nodes: 16 (Limit for both ocfs2 and possibly heartbeat2. Also a limit of sanity)

Current status:

  • mysql is talking IP and is only bound to the interconnect LAN
  • bacula is talking IP to mysql
  • I locked myself and bacula out of mysql in the process
  • heartbeat is running with CRM; the packages (groups and resources in Linux terms) are mostly rebuilt. I'm currently fixing the cluster config with some help from the linux-ha list.
  • Hostnames unmessed
  • vip assignments done

The howto only covers file storage devices; tapes are of course also possible but require either an iSCSI bridge (sucks) or appropriate zoning.

I will cover the setup / configuration of:

  • Operating System (here: Fedora Core 6) including appropriate kickstart configuration for setup of cluster nodes
  • Bacula Version 2.0.23
  • Underlying Filesystem OCFS2
  • Application Clustering via heartbeat 2 + CRM
  • mysql database
  • required ports and firewall adjustment (Table)
  • Bacula Configuration for backup of cluster & nodes

Notes – This will migrate into full text later on:

I have found no good rules for using rpm in a clustered environment so far. Either you have the binaries on each host, or in the clustered storage. There are potential issues with upgrades either way, and running the same application both node-local and on the cluster seems to be messy in conjunction with both heartbeat and rpm. It would appear no one thought about it.

Currently using fully static environment:

  • Fedora's SSL libraries are linked with Kerberos but seem to lack proper includes; -lkrb5 fixed lighttpd, but not bacula
  • mysql offers a download of "MySQL standard edition (static)" which is statically linked and very hassle-free, except it delivers a broken mysql.a, which has been known since 2004. They closed the bug as unfixable. Yeah, right.
  • building static mysql from source gets you a working mysql.a
  • the easiest solution seems to be rebuilding SSL without krb5
  • the whole static linking business is no fun at all and needs a dedicated build host
  • disadvantage: needs a lot of thought put into directory layout
  • advantage: good directory layout. heh.
  • on the other hand it is more reliable and unfortunately a cleaner solution than e.g. an extra /var/db/rpm on the cluster storage.

NOTE: I’m definitely looking for advice here, but if You don’t understand the issue with rpm and clusters, please don’t waste Your time.

Configure commands

Bacula (bacula-dir is still dynamically linked 😦):

./configure --prefix=/mnt/new/bacula/apps/bacula --enable-static-cons --enable-static-fd --enable-ipv6 --with-mysql=/mnt/new/bacula/apps/mysql --with-openssl --with-dir-user=bacula --with-dir-group=bacula --with-sd-user=bacula --with-sd-group=disk --with-fd-user=root --with-fd-group=bacula && make

Lighttpd:

export CC="ccache gcc" ; ./configure --prefix=/mnt/new/bacula/apps/lighthttpd --with-mysql=/mnt/new/bacula/apps/mysql --with-openssl

I just noticed I linked lighttpd non-static too. I think that's it, and I'll just rebuild everything non-static. Haha, still such a great experience.

OS Installation:

[ kickstart ]

Next step is disabling NFS and a few other services active in the base install that we won't need. Do not disable the netfs 'service'; it's part of the OCFS2 transport layer.

for svc in portmap nfs nfslock autofs ; do service $svc stop ; chkconfig $svc off ; done

for svc in xfs xinetd cups isdn rpcidmapd ; do service $svc stop ; chkconfig $svc off ; done

[ installation of ocfs2 ]

Installation and configuration of node-local Bacula client:

This has to be done on all cluster nodes [so prepare a damn script! 🙂]

Acquire the most current bacula-client rpm; I used the one from rpms-contrib-fschwarz for FC6; it's downloadable at http://switch.dl.sourceforge.net/sourceforge/bacula/bacula-client-2.0.2-1.fc6.i386.rpm

but PLEASE, if you read this after March 2007, check for a current version at http://sourceforge.net/project/showfiles.php?group_id=50727

Anyway, it's just installing the rpm and restricting it to the local IP (probably eth0's). The cluster's bacula-fd on the other hand will bind to the virtual IP assigned to it.

rpm -ivh bacula-client-2.0.2-1.fc6.i386.rpm

Edit /etc/bacula/bacula-fd.conf (e.g. with vi) to add the FDAddress directive:

FileDaemon { # this is me
  Name = node01
  FDAddress = <IP address of node01 on the public LAN>
  FDport = 9102 # where we listen for the director
  WorkingDirectory = /var/bacula
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}
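Since this has to be repeated on every node, the stanza can be generated instead of hand-edited. The following is only a sketch: render_fd_conf is a hypothetical helper name, the node name and address are made-up examples, and it prints just the FileDaemon resource, so the Director and Messages resources from the packaged bacula-fd.conf still have to be kept.

```shell
#!/bin/sh
# Sketch: print a node-local FileDaemon stanza for one cluster node.
# render_fd_conf is a hypothetical helper, not something shipped with Bacula.
render_fd_conf() {
    node_name=$1   # e.g. node01
    node_ip=$2     # the node's address on the public LAN
    cat <<EOF
FileDaemon {
  Name = ${node_name}
  FDAddress = ${node_ip}
  FDport = 9102
  WorkingDirectory = /var/bacula
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}
EOF
}

# Example call with made-up values; redirect the output into a scratch file
# and review it before merging it into /etc/bacula/bacula-fd.conf.
render_fd_conf node01 192.0.2.11
```

Printing to stdout rather than writing the config directly keeps the helper harmless when run by accident.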

Configuration of clustered bacula-fd:

Currently started by abusing the bacula-ctl-fd script. This will change, as the script is far from being OCF compliant.

WorkingDirectory = /bacula/var/bacula

Pid Directory = /bacula/var/run

Set to bind to director vip using FDAddress

Cluster application Dependency order:

  1. The package containing the VIP for the database, the database, the clustered bacula-fd and sshd, on which both
  2. the package containing the VIP for bacula-dir, bacula-dir and sshd, and the packages of the bacula-sds depend, on which in turn
  3. the package containing the VIP for the webserver, the webserver and sshd depends
  • All packages can run on all nodes
  • Optionally it seems ocfs2 could be integrated with heartbeat2, I can’t yet tell if there is any advantage in that.
  • Preference nodes should only be used if there is no statement to auto-spread the applications.
  • In that case, give different preferences to the bacula-sds and the database, and weight director/webserver against those. (Oh… wait, that's the opposite of the startup order 🙂)
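To sanity-check the dependency order before encoding it in the cluster configuration, the relationships can be fed to tsort from coreutils. This is a sketch only: the labels db, dir, sd and web are shorthand for the packages above, not resource names from a real CIB, and it reads the list as "database first, then director and storage daemons, then webserver".

```shell
#!/bin/sh
# Each input line is "A B", meaning A has to be started before B.
# db/dir/sd/web are illustrative labels for the packages described above.
start_order() {
    tsort <<EOF
db dir
db sd
dir web
sd web
EOF
}

# Prints one label per line in a valid start order; reverse it for shutdown.
start_order
```

The relative order of dir and sd is not fixed by these constraints, so only the first and last entries are guaranteed.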

Lan setup:

  • Public LAN – this is where you access the cluster's frontend services for management (bconsole, ssh, https) or backup purposes (bacula-dir, bacula-sd)
  • Bacula Interconnect – the director uses this to connect to the storage daemons and the database; the web frontend uses this to connect to the director and optionally the database.
  • OCFS2 Interconnect – for OCFS2 metadata updates
  • Heartbeat – for OCFS2's and heartbeat2's unencrypted heartbeats.

Heartbeats are also sent via the other networks, encrypted if possible. No backups are pushed through the private LANs. It's tempting but doesn't make too much sense (it would require another medium-bandwidth LAN and extra configuration).

[ configuration section ]

[ introduction to firewall settings ]

List of Firewall ports and applications:

  • 22 sshd
  • 694 ha-cluster
  • 3306 mysqld
  • 5560 I don't remember 😦
  • 7777 ocfs2 cluster communication
  • 9101 bacula-dir
  • 9102 bacula-fd
  • 9103 bacula-sd
  • 443 https

Matrix of firewall access lists:

  • 22 management hosts to all vips
  • 443 management hosts to vip of lighttpd
  • 694 all cluster nodes to all cluster nodes
  • 5560 management hosts to all cluster nodes (NOT! sure)
  • 7777 all cluster nodes to all cluster nodes
  • 9101 all bacula clients and all clusternodes to vip running bacula-dir
  • 9102 vip running bacula-dir to the bacula-fd on all nodes
  • 9103 all bacula clients to all vips for a bacula-sd
  • 3306 vips of bacula-dir and lighttpd, via the interconnect lan only

This isn't properly laid out yet; some communication only needs to be allowed via the interconnect LAN. Once this is corrected I'll add copy'n'paste rules.
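Until the full ruleset exists, here is a sketch for the two node-to-node entries of the matrix (694 and 7777). It only prints iptables commands so they can be reviewed first. The function name cluster_rules and the addresses are made up, and it assumes heartbeat on its default UDP transport and OCFS2 on TCP.

```shell
#!/bin/sh
# Print (do not apply) iptables rules that let every listed cluster node
# reach this node on the heartbeat and OCFS2 ports.
# cluster_rules is a hypothetical helper; the addresses are placeholders.
cluster_rules() {
    for src in "$@"; do
        # heartbeat (ha-cluster), UDP port 694
        echo "iptables -A INPUT -s ${src} -p udp --dport 694 -j ACCEPT"
        # OCFS2 cluster communication, TCP port 7777
        echo "iptables -A INPUT -s ${src} -p tcp --dport 7777 -j ACCEPT"
    done
}

# Example: a two-node cluster
cluster_rules 192.0.2.1 192.0.2.2
```

Run with the full node list on each node; with 16 nodes that yields 32 lines per node, which is still easy to eyeball.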

MySQL access configuration:

We ensure no public lan-side access is allowed:

  1. Firewall rules only allow access to the database via the interconnect lan from director vip and webserver vip
  2. mysql will be set to only listen on the interconnect lan
  3. the host table will allow only the interconnect lan via address & netmask
  4. There’s no defaultgw on the interconnect lan

For performance and stability reasons we disable hostname resolution by setting the following in /etc/my.cnf:

[mysqld]
skip-name-resolve

This could be an annoyance if you need to change IPs. On the other hand there’s not too many reasons for switching the interconnect IPs, and if You’re switching them You can refer to the table [follows] for a list of things to update, which will include the mysql host table. We will assign IPs for a maximum configuration right from the start, so no trouble there.
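A small check can catch a node whose my.cnf has drifted from these settings. This is a sketch under assumptions: check_mycnf is a hypothetical helper, and it assumes the "listen only on the interconnect LAN" part is done with MySQL's bind-address option, which the text above does not spell out.

```shell
#!/bin/sh
# Sketch: verify that a my.cnf disables name resolution and pins the
# listen address. check_mycnf is a hypothetical helper name.
check_mycnf() {
    cnf=$1
    grep -Eq '^[[:space:]]*skip-name-resolve[[:space:]]*$' "$cnf" \
        || { echo "missing skip-name-resolve"; return 1; }
    # Assumption: bind-address is used to restrict mysqld to the interconnect LAN.
    grep -Eq '^[[:space:]]*bind-address[[:space:]]*=' "$cnf" \
        || { echo "missing bind-address"; return 1; }
    echo "ok"
}

# Example against a scratch file with a made-up address
demo=$(mktemp)
printf '[mysqld]\nskip-name-resolve\nbind-address = 192.0.2.198\n' > "$demo"
check_mycnf "$demo"
rm -f "$demo"
```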

OCFS2 Configuration:

[root@domU-bacula2 ~]# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [y]:

Cluster to start on boot (Enter "none" to clear) [backup]: bc00
Specify heartbeat dead threshold (>=7) [31]:
Writing O2CB configuration: OK
Starting O2CB cluster bc00: OK

Create filesystems:

The fdisk space allocations are ONLY suitable for my test setup. As far as I'm aware there is no way of resizing ocfs2 filesystems, so choose the space allocations wisely, and use extra spindles where one should, e.g. if you have 10*163GB disks it would be good to use nine of them to let each hold one disk storage 🙂

# 2GB partition for applications
mkfs.ocfs2 -b 2k -C 32K -L "/bacula" -N 16 /dev/sdX

# 10GB partition for database
mkfs.ocfs2 -b 4k -C 32K -L "/baculadb" -N 1 /dev/sdX

# 50GB partition
mkfs.ocfs2 -b 4k -C 128K -L "/baculasd01" -N 1 /dev/sdX

# 50GB partition
mkfs.ocfs2 -b 4k -C 128K -L "/baculasd02" -N 1 /dev/sdX
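For additional storage volumes the same command line can be assembled by a small dry-run helper. This is a sketch: ocfs2_mkfs_cmd is a hypothetical name, it fixes the block size at 4k like the storage examples above, and it only prints the command, so nothing is formatted until you paste the output yourself.

```shell
#!/bin/sh
# Sketch: build (but do not run) an mkfs.ocfs2 command line.
# ocfs2_mkfs_cmd is a hypothetical helper; arguments are
#   device, label, number of node slots, cluster size (default 32K).
ocfs2_mkfs_cmd() {
    dev=$1
    label=$2
    slots=$3
    cluster=${4:-32K}
    case $dev in
        /dev/*) ;;
        *) echo "refusing: '$dev' is not under /dev" >&2; return 1 ;;
    esac
    echo "mkfs.ocfs2 -b 4k -C ${cluster} -L \"${label}\" -N ${slots} ${dev}"
}

# Example: a third storage volume, mirroring the settings used above
ocfs2_mkfs_cmd /dev/sdX /baculasd03 1 128K
```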

Heartbeat Bringup and Configuration:

make the following changes in /etc/ha.d

Generate an authkey for the cluster. It must be distributed manually to the other nodes, preserving permissions.

echo "auth 1" > authkeys
echo "1 sha1 `dd if=/dev/urandom count=4 2>/dev/null | openssl dgst -sha1`" >> authkeys
chmod 400 authkeys
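Depending on the openssl version, `openssl dgst` prefixes the digest with a label such as "(stdin)= ", which then ends up inside the key. A variant that keeps only the hex digest could look like the sketch below. It assumes sha1sum from GNU coreutils is available, make_authkeys is a hypothetical helper name, and the example writes to a scratch directory rather than /etc/ha.d.

```shell
#!/bin/sh
# Sketch: write a heartbeat authkeys file containing a random sha1 key.
# make_authkeys is a hypothetical helper; sha1sum (coreutils) is assumed.
make_authkeys() {
    out=$1
    # 2 KB of random data, reduced to a 40-character hex digest
    key=$(dd if=/dev/urandom bs=512 count=4 2>/dev/null | sha1sum | cut -d' ' -f1)
    old_umask=$(umask)
    umask 077                      # never create the file world-readable
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$out"
    umask "$old_umask"
    chmod 400 "$out"
}

# Example: generate into a scratch directory and show the line count only
scratch=$(mktemp -d)
make_authkeys "$scratch/authkeys"
wc -l < "$scratch/authkeys"
rm -rf "$scratch"
```

Copy the resulting file to the other nodes with a tool that preserves permissions, for example `scp -p`.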

Create a ha.cf file with debugging enabled and logging to syslog. You DO run splunk for your site logs, right?
The heartbeat settings (ucast …) are crap and only work for my testing purposes; do NOT use them in production. They need to be replaced with one unencrypted mcast on eth1 and one encrypted on eth0.

logfacility local5
debug 1
traditional_compression false
crm on
auto_failback off
autojoin any
ucast eth0 192.168.250.1
ucast eth0 192.168.10.193
ucast eth0 192.168.250.2
ucast eth0 192.168.10.194

Generate a heartbeat2 cib.xml from a throwaway haresources file.

touch /etc/haresources
/usr/lib/heartbeat/haresources2cib.py
rm /etc/haresources

This is what my resource setup currently looks like:

# crm_resource -L

Resource Group: bc00-access
ip_intercon_access (heartbeat::ocf:IPaddr)
ip_pub_access (heartbeat::ocf:IPaddr)
app_httpd (lsb:httpd)
Resource Group: bc00-dir
ip_intercon_dir (heartbeat::ocf:IPaddr)
app_bacula_dir (lsb:bacula-ctl-dir)
ip_pub_dir (heartbeat::ocf:IPaddr)
Resource Group: bc00-db00
app_mysql00 (lsb:mysql)
ip_intercon_mysql00 (heartbeat::ocf:IPaddr)

Missing stuff (all quite ugly for me due to my lack of insight into heartbeat):

  • use IPaddr2 instead of IPaddr
  • integrate ocfs2 mount/umount of the disk storage with sd
  • integrate ocfs2 mount/umount of the database directory with mysql
  • generate dependencies both inside the groups (so the IPs come before firewall rules and applications) and
  • dependencies between resource groups, so that e.g. the director comes up later than the database and the sds, but before the webserver
  • node weighting
  • ensuring heartbeat handles things as it is supposed to, e.g.
    • automatic startup of things
    • failing over from busy hosts as nodes start up

Actually I think this is ugly for everyone who is not a coder and lacks the deep enthusiasm for messing with large XML files for trivial issues. One can add resources either via the haclient GUI or using cibadmin, piping in XML statements. That's when I launched a vncserver and clicked at things.

How failover goes with things like running backups remains to be seen. Failing over a director running 500 jobs would suck, but e.g. an SD failover would be tolerable (the director resumes the session). Thus the director ought to be rather sticky, and so should mysql, while the rest could hop around. An extension of the bc00-dir resources comes to mind: one that uses bconsole to issue a restart for jobs aborted at the last stop.

Database setup:

We will set up a MySQL database node called bc-db00. The supplied /etc/hosts file also contains an entry for a second server, so you can bring in another database node if need be.

[ setup mysql ]

[ Reconfigure mysql networking, turn off socket, turn on listening on vip ]

[ set /etc/my.cnf so mysql/mysqladmin use ip ]

Take care that the database is generated in /bacula/db instead of /bacula/apps/mysql/var/db

[ drop anonymous access, drop test tables, passwordless root ]

[ create root@127.0.0.1 with password 'mysqlroot_password' ALL permissions ]

[ create root@127.0.0.1 with password 'mysqlrroot_password' startup/shutdown ]

[ the script that brings mysql up / down needs to know this password. ]

Bacula Database Access:

create_mysql_database
make_mysql_tables

We need to modify the grant_mysql_privileges script a bit:

use mysql; grant all privileges on bacula.* to 'bacula'@'192.168.202.196' identified by 'bacula_password'; flush privileges;

A corresponding entry in bacula-dir.conf tells bacula the database credentials and access information; we need to adjust it to include the database address and password.

# Generic catalog service
Catalog {
  Name = MyCatalog
  DB Address = 192.168.202.198
  DB Port = 3306
  dbname = bacula; user = bacula; password = "bacula_password"
}

Additional catalogs can be added very easily: simply copy the above statement using a different value for Name. Bacula handles the rest just fine.

bweb Database Access:

The IP used here is the one assigned to the reporting webserver on the interconnect lan.

GRANT SELECT ON bacula.* TO 'bweb'@'192.168.202.197' IDENTIFIED BY 'bweb_password';
GRANT INSERT,UPDATE,DELETE ON bacula.Location TO 'bweb'@'192.168.202.197' IDENTIFIED BY 'bweb_password';
GRANT INSERT,UPDATE,DELETE ON bacula.LocationLog TO 'bweb'@'192.168.202.197' IDENTIFIED BY 'bweb_password';
GRANT UPDATE (LocationId,Comment,RecyclePoolId) ON bacula.Media TO 'bweb'@'192.168.202.197' IDENTIFIED BY 'bweb_password';
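Because the webserver's interconnect IP appears in every statement, it can help to generate the grants from one place, so an IP change means re-running a single command. A sketch, with bweb_grants as a hypothetical helper name; it does no escaping, so the password must not contain single quotes.

```shell
#!/bin/sh
# Sketch: print the bweb GRANT statements for a given webserver IP.
# bweb_grants is a hypothetical helper; it performs no SQL escaping.
bweb_grants() {
    ip=$1
    pw=$2
    who="'bweb'@'${ip}'"
    cat <<EOF
GRANT SELECT ON bacula.* TO ${who} IDENTIFIED BY '${pw}';
GRANT INSERT,UPDATE,DELETE ON bacula.Location TO ${who} IDENTIFIED BY '${pw}';
GRANT INSERT,UPDATE,DELETE ON bacula.LocationLog TO ${who} IDENTIFIED BY '${pw}';
GRANT UPDATE (LocationId,Comment,RecyclePoolId) ON bacula.Media TO ${who} IDENTIFIED BY '${pw}';
EOF
}

# Example: the values used in this howto; pipe the output into the mysql
# client once it looks right.
bweb_grants 192.168.202.197 bweb_password
```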

Next is adding the restricted bconsole for use by bweb. This means you'll have two bconsoles: an unmodified one for shell usage and a restricted one for use by the web frontend. The restricted one simply uses a different access key.

Install perl modules for bweb:

perl-Class-DBI-mysql perl-HTML-Template perl-CGI-Simple perl-GD perl-GDGraph perl-Expect perl-Time-modules

bweb connection configuration:

Currently this won't connect via LAN, and most people won't find verdana.ttf on their hosts either. The bweb_bconsole.conf will be added shortly, moving it to a path in the webserver root. (It is possible to use a static bconsole built with --enable-static-cons if you want to chroot the http daemon.)

$VAR1 = { template_dir => "/usr/share/bweb/tpl" };
$VAR1 = bless( {
'graph_font' => '/usr/share/fonts/verdana.ttf',
'name' => undef,
'config_file' => '/etc/bacula/bweb.conf',
'bconsole' => '/usr/sbin/bconsole -n -c /etc/bacula/bweb_bconsole.conf',
'fv_write_path' => '/dev/null',
'password' => 'bweb_password',
'template_dir' => '/usr/share/bweb/tpl',
'dbi' => 'DBI:mysql:database=bacula;host=bc00-db00', # connection string not correct yet, lacks hostname
'error' => '',
'debug' => 0,
'user' => 'bacula',
'email_media' => 'root@bc00-dir'
}, 'Bweb::Config' );

Appendix:

A) List of things to change for IP migrations

  • /etc/hosts
  • heartbeat CRM virtual IP assignments :p
  • Firewall rulesets
  • mysql host table; I used phpMyAdmin, and I still need to find out the table's name (only if the change includes the interconnect LAN)
  • OCFS2 configuration /etc/ocfs2/cluster.conf (only if the change includes the interconnect LAN)
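To find where an old address is still referenced before or after such a migration, a plain grep over the files in this list goes a long way. A sketch, with find_ip_refs as a hypothetical helper name; the file list in the example is an assumption and should be extended with whatever else your setup uses.

```shell
#!/bin/sh
# Sketch: list the files that still mention a given IP address.
# find_ip_refs is a hypothetical helper name.
find_ip_refs() {
    old_ip=$1
    shift
    # -F fixed string, -w whole word (192.0.2.1 must not match 192.0.2.19),
    # -l print file names only; missing files are silently skipped.
    grep -Flw -- "$old_ip" "$@" 2>/dev/null
}

# Example: check a few of the files listed above for the database address
find_ip_refs 192.168.202.198 /etc/hosts /etc/my.cnf /etc/ocfs2/cluster.conf \
    /etc/bacula/bacula-dir.conf || true
```

The mysql host table and the CRM virtual IP assignments live outside plain files and still need to be checked by hand.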

B) Links for downloadable configuration files. Passwords removed; plan a good hour for assigning passwords 🙂

  • /etc/hosts – http://paste.uni.cc/13301
  • /etc/sysconfig/iptables
  • kickstart config
  • bacula-sd.conf
  • bacula-dir.conf
  • bacula-fd.conf
  • /etc/my.cnf
  • /etc/ha.d … /var/lib/crm …
  • lighttpd config
  • bweb config

C) Possibly a few scripts to automate the most timeconsuming tasks

  • script for reassigning passwords.
  • script for configuring the fd’s.
  • buildscript
  • script that takes a devicename as an argument and creates a ready-to-use ocfs2 filesystem on it. Rather I will just put the command here so I’m not liable/feeling guilty if someone shoots himself in the leg.


6 Responses to “Bacula Cluster Howto”

  1. 1 Rodrigo Gregori

    Hallo!

    I know that it’s an old post, but maybe you can help me with your experience … I’m trying to run bweb on a chrooted apache2 environment. After some tweaking I’ve managed to make it run, but I still have an error when it tries to launch a „pty“ when trying to run bconsole ….

    I had a look in the code but couldn’t find what library/module/access it would need in the chroot environment…

    do you have any tips?

    thanks in advance!
    Rodrigo

    • 2 darkfader

      Hi,

      It's been too long since I last worked on it. With some luck I'll get to refresh it this year.

      Now, for finding the library needed, just use "ldd" or write to bacula-devel 🙂

      I’m glad to see i’m not the only one who wanted to have a clean, safe setup.
