[Linuxtrent] heartbeat and something that escapes me

  • From: Mario Vittorio Guenzi <jclark@xxxxxxxxxx>
  • To: linuxtrent <linuxtrent@xxxxxxxxxxxxx>
  • Date: Tue, 03 Jun 2008 10:53:26 +0200

Good morning everyone,
I have set up a cluster with heartbeat and drbd; the configuration
files follow.
drbd.conf
global {

usage-count no;
}

common {
syncer { rate 100M; }
}


resource r0 {


protocol C;

handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; /etc/init.d/heartbeat stop";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
}

startup {

wfc-timeout 60;


degr-wfc-timeout 120; # 2 minutes.


}

disk {

on-io-error detach;


}

net {
rr-conflict disconnect;

after-sb-0pri discard-younger-primary;
after-sb-1pri consensus;
after-sb-2pri disconnect;
}

syncer {

rate 100M;


al-extents 257;
}
## cache
on alessandra {
device /dev/drbd0;
disk /dev/hda10;
address 192.168.1.245:7788;
meta-disk internal;

}

on lalla {
device /dev/drbd0;
disk /dev/hda9;
address 192.168.1.15:7788;
meta-disk internal;
}
}


resource "r1" {
protocol C;

startup {
wfc-timeout 60; ## 60 seconds (0 would wait forever)
degr-wfc-timeout 120; ## 2 minutes.
}

disk {
on-io-error detach;
}
net {

}
syncer { rate 100M;
}

on alessandra {
device /dev/drbd1;
disk /dev/hda11;
address 192.168.1.245:7789;
meta-disk internal;
}

on lalla {
device /dev/drbd1;
disk /dev/hda10;
address 192.168.1.15:7789;
meta-disk internal;
}
}

ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast eth1 192.168.2.246
auto_failback on
node alessandra
node lalla

authkeys
auth 2
2 sha1 unabellapasswordlungaecomplicata

haresources

alessandra IPaddr::192.168.2.247/24/eth1 \
drbddisk::r0 Filesystem::/dev/drbd0::/cache::ext3::defaults \
drbddisk::r1 Filesystem::/dev/drbd1::/jumper::ext3::defaults \
bind9 samba winbind bla bla bla \
MailTo::jclark@xxxxxxxxxx::Transizione_alessandra_lalla

The files are identical on the two machines except for the ucast
parameter, which on alessandra is of course the eth1 IP of lalla, and
on lalla that of alessandra.
The drbdX devices are listed in /etc/fstab with the noauto option.
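
To make the addressing in the files above explicit: the DRBD resources talk over 192.168.1.x, while the heartbeat ucast link and the cluster IP are on 192.168.2.x. Below is a minimal Python sketch that checks this. The /24 prefix on the DRBD addresses is an assumption (drbd.conf does not state a netmask), and the names network_of, DRBD_ADDRESSES and so on are only illustrative.

```python
import ipaddress

# Addresses copied from the configuration files above.
DRBD_ADDRESSES = {"alessandra": "192.168.1.245", "lalla": "192.168.1.15"}
HEARTBEAT_UCAST_PEER = "192.168.2.246"  # ucast line of ha.cf as posted
CLUSTER_IP = "192.168.2.247"            # IPaddr resource in haresources

# Assumption: /24 everywhere. Only the cluster IP states /24 explicitly.
ASSUMED_PREFIX = 24


def network_of(address, prefix=ASSUMED_PREFIX):
    """Return the IP network that contains the given host address."""
    return ipaddress.ip_interface(f"{address}/{prefix}").network


drbd_networks = {network_of(a) for a in DRBD_ADDRESSES.values()}
heartbeat_network = network_of(HEARTBEAT_UCAST_PEER)

print("DRBD replication:", sorted(str(n) for n in drbd_networks))
print("heartbeat link:  ", heartbeat_network)
print("same network?    ", heartbeat_network in drbd_networks)
```

Under that assumption the two paths are on different networks. Whether they also use different physical interfaces depends on which card carries 192.168.1.x, which the files above do not say.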
If I physically power off the master (alessandra) or stop heartbeat by
hand, lalla immediately takes over and acquires everything without a
word of complaint; drbd switches roles without any problem, and the
same holds on reboot or on a heartbeat restart.
However...
if I simulate a network failure (e.g. a switch that burns out) by
unplugging the eth1 network card on alessandra, this is what shows up
in lalla's logs:

May 30 09:32:16 lalla heartbeat: [3018]: WARN: node
alessandra: is dead
May 30 09:32:16 lalla heartbeat: [3018]: WARN: No STONITH device
configured.
May 30 09:32:16 lalla heartbeat: [3018]: WARN: Shared disks are not
protected.
May 30 09:32:16 lalla heartbeat: [3018]: info: Resources being
acquired from alessandra.
May 30 09:32:16 lalla heartbeat: [3018]: info: Link alessandra:eth1 dead.
May 30 09:32:16 lalla heartbeat: [8063]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
May 30 09:32:16 lalla harc[8063]: info: Running /etc/ha.d/rc.d/status
status
May 30 09:32:16 lalla heartbeat: [8064]: info: No local resources
[/usr/share/heartbeat/ResourceManager listkeys lalla] to acquire.
May 30 09:32:16 lalla heartbeat: [3018]: debug:
StartNextRemoteRscReq(): child count 1
May 30 09:32:16 lalla mach_down[8092]: info: Taking over resource
group IPaddr::192.168.2.247/24/eth1
May 30 09:32:16 lalla ResourceManager[8118]: info: Acquiring resource
group: alessandra IPaddr::192.168.2.247/24/eth1 drbddisk::r0
Filesystem::/dev/drbd0::/cache::ext3::defaults drbddisk::r1
Filesystem::/dev/drbd1::/jumper::ext3::defaults bind9
MailTo::mario.guenzi@xxxxxxxxxx::Transizione_lalla_alessandra
May 30 09:32:16 lalla IPaddr[8145]: INFO: Resource is stopped
May 30 09:32:16 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start
May 30 09:32:16 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start
May 30 09:32:17 lalla IPaddr[8243]: INFO: Using calculated netmask for
192.168.2.247: 255.255.255.0
May 30 09:32:17 lalla IPaddr[8243]: DEBUG: Using calculated broadcast
for 192.168.2.247: 192.168.2.255
May 30 09:32:17 lalla IPaddr[8243]: INFO: eval ifconfig eth1:0
192.168.2.247 netmask 255.255.255.0 broadcast 192.168.2.255
May 30 09:32:17 lalla IPaddr[8243]: DEBUG: Sending Gratuitous Arp for
192.168.2.247 on eth1:0 [eth1]
May 30 09:32:17 lalla IPaddr[8214]: INFO: Success
May 30 09:32:17 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start done. RC=0
May 30 09:32:17 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/drbddisk r0 start
May 30 09:32:17 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/drbddisk r0 start
May 30 09:32:29 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/drbddisk r0 start done. RC=1
May 30 09:32:29 lalla ResourceManager[8118]: ERROR: Return code 1 from
/etc/ha.d/resource.d/drbddisk
May 30 09:32:29 lalla ResourceManager[8118]: CRIT: Giving up resources
due to failure of drbddisk::r0
May 30 09:32:29 lalla ResourceManager[8118]: info: Releasing resource
group: alessandra IPaddr::192.168.2.247/24/eth1 drbddisk::r0
Filesystem::/dev/drbd0::/cache::ext3::defaults drbddisk::r1
Filesystem::/dev/drbd1::/jumper::ext3::defaults bind9
MailTo::mario.guenzi@xxxxxxxxxx::Transizione_lalla_alessandra
May 30 09:32:30 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/MailTo mario.guenzi@xxxxxxxxxx
Transizione_lalla_alessandra stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/MailTo mario.guenzi@xxxxxxxxxx
Transizione_lalla_alessandra stop
May 30 09:32:30 lalla MailTo[8431]: INFO: Success
May 30 09:32:30 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/MailTo mario.guenzi@xxxxxxxxxx
Transizione_lalla_alessandra stop done. RC=0
May 30 09:32:30 lalla ResourceManager[8118]: info: Running
/etc/init.d/bind9 stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting
/etc/init.d/bind9 stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: /etc/init.d/bind9
stop done. RC=0
May 30 09:32:30 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
May 30 09:32:30 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
May 30 09:32:30 lalla Filesystem[8531]: INFO: Running stop for
/dev/drbd1 on /jumper
May 30 09:32:31 lalla Filesystem[8520]: INFO: Success
May 30 09:32:31 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd1 /jumper ext3 defaults stop
done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/drbddisk r1 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/drbddisk r1 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/drbddisk r1 stop done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
May 30 09:32:31 lalla Filesystem[8641]: INFO: Running stop for
/dev/drbd0 on /cache
May 30 09:32:31 lalla Filesystem[8630]: INFO: Success
May 30 09:32:31 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /cache ext3 defaults stop
done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/drbddisk r0 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/drbddisk r0 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/drbddisk r0 stop done. RC=0
May 30 09:32:31 lalla ResourceManager[8118]: info: Running
/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop
May 30 09:32:31 lalla ResourceManager[8118]: debug: Starting
/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop
May 30 09:32:32 lalla IPaddr[8769]: INFO: ifconfig eth1:0 down
May 30 09:32:32 lalla IPaddr[8740]: INFO: Success
May 30 09:32:32 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 stop done. RC=0
May 30 09:32:32 lalla mach_down[8092]: info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
May 30 09:32:32 lalla mach_down[8092]: info: mach_down takeover
complete for node alessandra.
May 30 09:32:32 lalla heartbeat: [3018]: info: mach_down takeover
complete.
May 30 09:33:02 lalla hb_standby[8823]: Going standby [foreign].
May 30 09:33:02 lalla heartbeat: [3018]: info: lalla wants to go
standby [foreign]
May 30 09:33:13 lalla heartbeat: [3018]: WARN: No reply to standby
request. Standby request cancelled.
And of course nothing comes up: neither the IP nor the services.
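
As a rough aid for reading a log like the one above, here is a minimal Python sketch that lists the resource steps whose return code is not zero. It assumes the "... done. RC=N" debug records have the shape shown in the log; the names RC_PATTERN and failed_resource_steps are made up for this example and are not part of heartbeat.

```python
import re

# Matches the "<script> <args> <action> done. RC=<n>" debug records written
# by ResourceManager. The command part may not contain square brackets, so
# a match can never run across the "[pid]" of a following record.
RC_PATTERN = re.compile(
    r"ResourceManager\[\d+\]: debug: (?P<command>/[^\[\]]*?) done\. RC=(?P<rc>\d+)"
)


def failed_resource_steps(log_text):
    """Return (command, return_code) pairs for every step with RC != 0."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (match.group("command"), int(match.group("rc")))
        for match in RC_PATTERN.finditer(flat)
        if int(match.group("rc")) != 0
    ]


# Two records copied from the log above, still wrapped as in the mail.
SAMPLE = """
May 30 09:32:17 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/IPaddr 192.168.2.247/24/eth1 start done. RC=0
May 30 09:32:29 lalla ResourceManager[8118]: debug:
/etc/ha.d/resource.d/drbddisk r0 start done. RC=1
"""

for command, rc in failed_resource_steps(SAMPLE):
    print(f"RC={rc}: {command}")
```

On the sample it reports only the drbddisk r0 start step, which is the one that makes ResourceManager give up the whole resource group.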

Since drbddisk returns error 1, I searched Google, but either I am a
complete blockhead and understand nothing, or nobody knows what this
error 1 is beyond its being defined as a generic error.
Regardless of how it is defined, I would like to understand what I am
doing wrong.
As further information:
distribution: etch with backport packages
drbd 8.0.11, compiled statically into the kernel
kernel 2.6.25
heartbeat 2.1.3
Any ideas?
Thanks in advance, and apologies for the cross-posting.

- --
Mario Vittorio Guenzi

http://clark.tipistrani.it

Si vis pacem para bellum



-- 
To subscribe (or unsubscribe), just send a message with the SUBJECT
"subscribe" (or "unsubscribe") to mailto:linuxtrent-request@xxxxxxxxxxxxx

