Monday, October 18, 2010

Sun Cluster ccradm syntax change

Update: To cut a long story short, here is the new syntax;
/usr/cluster/lib/sc/ccradm recover -o /etc/cluster/ccr/global/infrastructure

Full skinny follows;
If you are running SC 3.2u3, or SC 3.2 with Cluster core patch 126105-36 (5.9), 126106-36 (5.10) or 126107-36 (5.10 x86) or later, the syntax for regenerating CCR checksums has changed.

Also, if you are running Solaris Cluster 3.2u2 or higher, the directory path /etc/cluster/ccr is replaced with /etc/cluster/ccr/global. The same applies with Cluster core patch 126105-27 (5.9), 126106-27 (5.10) or 126107-27 (5.10 x86) or later.
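
Since the path depends on the patch level, a script that has to run on both old and new nodes can simply test which directory exists. A minimal sketch, assuming the presence of the global directory is a good enough indicator; the function name and the optional root-prefix argument are my own inventions, not part of Sun Cluster:

```shell
# Print the path of the CCR infrastructure table on this node.
# $1 is an optional root prefix (hypothetical, handy for testing against
# a copied tree); by default the live filesystem is inspected.
ccr_infrastructure_path() {
    root=${1:-}
    if [ -d "$root/etc/cluster/ccr/global" ]; then
        # Newer layout (3.2u2 or the -27 patches and later)
        printf '%s\n' "$root/etc/cluster/ccr/global/infrastructure"
    else
        # Older layout
        printf '%s\n' "$root/etc/cluster/ccr/infrastructure"
    fi
}

# Possible use with the new syntax, in non-cluster mode only:
#   /usr/cluster/lib/sc/ccradm recover -o "$(ccr_infrastructure_path)"
```

This only picks the path. It does not tell you whether the old or the new ccradm syntax applies, because that changed at a later patch revision than the directory move.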



#ccradm
usage: ccradm subcommand args ...
where 'subcommand' is one of the following:
recover [-Z zoneclustername] [-f] [-o] ccrtablefile
replace [-Z zoneclustername] -i newdatafile ccrtablefile

addtab [-Z zoneclustername] ccrtablefile
remtab [-Z zoneclustername] ccrtablefile

addkey [-Z zoneclustername] -v value -k key ccrtablefile
changekey [-Z zoneclustername] -v value -k key ccrtablefile
delkey [-Z zoneclustername] -k key ccrtablefile
showkey [-Z zoneclustername] -k key ccrtablefile

Sun Cluster Maintenance Commands ccradm(1M)

NAME
ccradm - CCR table files administration command

SYNOPSIS
******

/usr/cluster/lib/sc/ccradm recover [-Z zoneclustername] [-o] ccrtablefile

/usr/cluster/lib/sc/ccradm replace [-Z zoneclustername] -i newdatafile ccrtablefile

/usr/cluster/lib/sc/ccradm addtab [-Z zoneclustername] ccrtablefile

/usr/cluster/lib/sc/ccradm remtab [-Z zoneclustername] ccrtablefile

/usr/cluster/lib/sc/ccradm addkey [-Z zoneclustername] {-s value | -f file} -k key ccrtablefile

/usr/cluster/lib/sc/ccradm delkey [-Z zoneclustername] -k key ccrtablefile

/usr/cluster/lib/sc/ccradm changekey [-Z zoneclustername] {-s value | -f file} -k key ccrtablefile

/usr/cluster/lib/sc/ccradm showkey [-Z zoneclustername] -k key ccrtablefile

DESCRIPTION

The ccradm command supports administration of
Cluster Configuration Repository (CCR) information.

The CCR information all resides somewhere under /etc/cluster/ccr.
CCR information about the global cluster resides in the
directory /etc/cluster/ccr/global. CCR information about
a zone cluster "zoneclustername" resides in the directory
/etc/cluster/ccr/zoneclustername. CCR information should only
be accessed via the supported programming interfaces. The file
permissions are intentionally set to prevent direct access to
CCR information.

CCR information is stored in the form of a table, with one table
stored in its own file. Each line of the CCR table file consists
of two ASCII strings, where one is the key and the other is the value.
Each CCR file starts with a generation number, ccr_gennum,
and a checksum, ccr_checksum.

The ccr_gennum indicates the current generation number of the CCR table
file. The system manages the ccr_gennum. The highest number is the
latest version of the file. Two special values exist (refer to the
recover subcommand for more information).

The ccr_checksum indicates the checksum of the CCR table contents,
and provides a consistency check of the data in the table.
The system will not use a CCR table file with an invalid checksum.

"ccrtablefile" is the name of the file representing the CCR
table on the local node. When the "-Z" option is specified, the
ccrtablefile belongs to the specified zone cluster. When no "-Z" option
is specified, the ccrtablefile belongs to the global cluster. Note
that the global cluster and the zone clusters can all have
a ccrtablefile of the same name with different information.
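
[Blog note, not part of the man page] The table layout described above is simple enough to inspect with awk. The sketch below works on a copy of a table file, since the man page asks that the live files are not accessed directly. It assumes the key and the value are separated by whitespace, which the man page does not actually state, and both function names are made up for illustration:

```shell
# Print the value stored under key $2 in a COPY of a CCR table file $1.
# Assumption: key and value are separated by whitespace.
ccr_copy_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Succeed if the copy has a generation number of zero or greater,
# which is the range the man page gives for a valid table file.
ccr_copy_gennum_ok() {
    gen=$(ccr_copy_value "$1" ccr_gennum)
    case "$gen" in
        ''|*[!0-9]*) return 1 ;;   # missing, negative or non-numeric
        *) return 0 ;;
    esac
}

# Possible use:
#   cp /etc/cluster/ccr/global/infrastructure /tmp/infra.copy
#   ccr_copy_value /tmp/infra.copy ccr_gennum
```

This does not verify the checksum; only the cluster software can say whether ccr_checksum is valid.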

SUBCOMMANDS
The following subcommands are supported:

<< note that the "recover" subcommand replaces "checksum" >>

recover
This option is only for use by engineers who are experts
on the internal operations of the CCR. This option supports
manual recovery operations. Normal users should not use
this option.

This option can be used only in non-cluster mode.

The recover subcommand always sets the value of the ccr_gennum,
re-computes the checksum, and sets the value of ccr_checksum in
the ccrtablefile.

When used without the "-o" option, the recover subcommand sets the
generation number to INIT_VERSION. A generation number of INIT_VERSION
means that the CCR table file is valid only until the local node
rejoins the cluster, at which time the cluster will replace
the contents of ccrtablefile with the contents of ccrtablefile
from another node in the cluster.
A prerequisite is either one of
the following:
1) one of the other nodes in the cluster must have the override
version set for the CCR table file, or
2) at least one of the other nodes must have a valid copy of
the CCR table file. A CCR table file is valid if it has a
valid checksum and its generation number is greater than or
equal to zero.

If ccrtablefile has a generation number of INIT_VERSION on
all nodes, then the CCR table will remain invalid after
recovery has completed. Therefore, do not use the recover
subcommand without the -o option on a CCR table file on
all nodes in the cluster.

When used with the "-o" option, the recover subcommand sets the
generation number to OVRD_VERSION. A generation number of OVRD_VERSION
means that the system will propagate the contents of ccrtablefile
on the local node to all other cluster nodes. After propagating
the contents to other nodes, the system will change the generation
number to 0. Only one node should have a ccrtablefile with a value
of OVRD_VERSION. If someone makes a mistake and sets OVRD_VERSION
on the same ccrtablefile on multiple nodes, the system will
arbitrarily use one ccrtablefile contents.

replace
This option is only for use by engineers who are experts
on the internal operations of the CCR. This option supports
manual recovery operations. Normal users should not use
this option.

This subcommand can be used only in cluster mode.

Contents of "newdatafile" will replace the contents of
"ccrtablefile".

The checksum will be recomputed and the generation number
will be reset to 0.

addtab
Creates a table in the cluster configuration repository
for the specified cluster. The table initially contains
just the ccr_gennum and ccr_checksum.

This subcommand can be used only in cluster mode.

remtab
Remove a table in the cluster configuration repository.

This subcommand can be used only in cluster mode.

addkey
Adds a key - value pair to ccrtablefile for the specified cluster.

This subcommand can be used only in cluster mode.

When used with the -s option the data is a string value.
When used with the -f option the value is the first string in the
file and the file contains exactly one string. The command
returns an error if the file is not in this format.

delkey
Deletes a key - value pair from ccrtablefile based upon
the specified key. If the key is not found in ccrtablefile,
the command returns ESPIPE.

This subcommand can be used only in cluster mode.

changekey
Modify the value of a key in ccrtablefile
based upon the specified key and newvalue.
If the key is not found in ccrtablefile,
the command returns ESPIPE.

This subcommand can be used only in cluster mode.

When used with the -s option the data is a string value.
When used with the -f option the value is the first string in the
file and the file contains exactly one string. The command
returns an error if the file is not in this format.

showkey
Displays the value for the specified key in ccrtablefile.
If the key is not found in ccrtablefile,
the command returns ESPIPE.
The showkey command writes to standard output just the value string
followed by an end of line for the specified key.
When an error occurs, the command writes nothing.

This subcommand can be used only in cluster mode.

OPTIONS
The following options are supported:

Note -

Both the short and long form of each option is shown in
this section.

-?
--help

Displays help information.

You can specify this option with or without a subcommand.

If you specify this option without a subcommand, the
list of all available subcommands is displayed.

If you specify this option with a subcommand, the usage
for that subcommand is displayed.

-Z {zoneclustername | global}
--zoneclustername={zoneclustername | global}
--zoneclustername {zoneclustername | global}

Specifies the cluster in which the CCR transaction
is to be carried out.

This option is supported by all subcommands.

If you specify this option, you must also specify one
argument from the following list:

zoneclustername Specifies that the command with
which you use this option is to
operate on all specified resource
groups in only the zone cluster
named zoneclustername.

global Specifies that the command with
which you use this option is to
operate on all specified resource
groups in the global cluster only.

-i newdatafile
--input=newdatafile
--input newdatafile

Specifies the CCR table file you want to use for
recovery operation.

-o
--override

The override option is used with the recover subcommand.
This option can be used only in non-cluster mode.
It sets the generation number to OVRD_VERSION.
This option is used to designate one CCR table
file to be the master copy. This version of the
CCR table file will override other versions of the
file that are on the remaining nodes during
recovery. If a CCR table file has a generation
number of OVRD_VERSION on more than one node, then
only one of the files will be selected and a warning
message will be printed on the console of one
of the nodes. After recovery the table's generation
number will be reset to 0.

-s {value}
--string={value}
--string {value}

Specifies the value for the key of a CCR table.
There can be no white space characters in the value string.
This means that there can be no spaces, tabs, carriage returns, or
line feeds.

-f {filename}
--filename={filename}
--filename {filename}

Specifies that you want to use the contents of the
file to add or modify the key value that is located
in the filename file. The file must contain exactly one string,
which is to be used as the new value.
There can be no white space characters in the value string.
This means that there can be no spaces, tabs, carriage returns, or
line feeds.
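
[Blog note, not part of the man page] The whitespace rule for -s and -f values is easy to check before calling ccradm. A small sketch, with a function name of my own choosing, that rejects exactly the four characters listed above:

```shell
# Succeed only if the candidate value contains no space, tab,
# carriage return or line feed.
ccr_value_ok() {
    tab=$(printf '\t')
    cr=$(printf '\r')
    nl='
'
    case "$1" in
        *" "*|*"$tab"*|*"$cr"*|*"$nl"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use:
#   ccr_value_ok "$newvalue" || echo "value contains white space" >&2
```

Whether an empty value is acceptable is not covered by the man page, so the sketch does not judge that case either way.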

USAGE

ccradm can be used for administrative actions on CCR table files.

EXAMPLES

REPAIR PROCEDURE FOR A CCR TABLE

Example 1.

If a CCR table file is manually edited as part of some emergency
repair procedure, then the checksum must be recomputed.

Perform these steps to repair a corrupted CCR table only
when directed as part of an emergency repair procedure.

Reboot all nodes in non-cluster mode.

Edit the file on all nodes to contain the correct data.
The file must be identical on all nodes.
Because the file is identical on all nodes, it also can
be designated as the override version on all nodes.
Recompute the checksum and designate this CCR table
file to be the override version by running the following
command on all nodes. ccrtablefile is the name of the CCR
table file.

On each node of the cluster execute the following command:

/usr/cluster/lib/sc/ccradm recover -Z global -o ccrtablefile

Reboot all nodes in cluster mode.

Example 2.

In this example, in order to carry out emergency repairs
the administrator wants ccrtablefile on node 1 to be used
only until the cluster reforms. The administrator wants
the ccrtablefile contents on node 2 to be used on all nodes
once the cluster forms.

In non-cluster mode on node 1, recompute the checksum and set the
generation number to INIT_VERSION for CCR table file www on the
local node:

# ccradm recover -Z global ccrtablefile

In non-cluster mode on node 2, recompute the checksum and set the
generation number to OVRD_VERSION for CCR table file xxx on the
local node:

# ccradm recover -Z global -o ccrtablefile

Example 3.

In this example, in order to carry out emergency repairs
the administrator wants to replace the current contents
of ccrtablefile with the contents from a backup file.

In cluster mode, replace the CCR table yyy with the contents
of its backup version in the file /etc/cluster/ccr/global/yyy.bak:

# ccradm replace -Z global -i /etc/cluster/ccr/global/yyy.bak yyy

TABLE OPERATIONS

In cluster mode, create a CCR table foo in the global cluster.

# ccradm addtab foo

To create a CCR table foo in zonecluster zc1

# ccradm addtab -Z zc1 foo

To remove a CCR table foo in the global cluster.

# ccradm remtab foo

To modify a CCR table key value in the global cluster
# ccradm changekey -f /tmp/data1 -k key1 ccrtablefile

# ccradm changekey -s "NewValue" -k key1 ccrtablefile

To display a CCR table key value in the global cluster
# ccradm showkey -k key1 ccrtablefile
ValueForKey1
#

EXIT STATUS
The following exit values are returned:

0 No errors occurred.

>0 Error occurred.

EINVAL Invalid argument, such as a value string with white space

ESPIPE The specified key does not exist in ccrtablefile

ENOENT The specified file does not exist

Wednesday, August 11, 2010

Incorrect vxdmpadm ENCLR_TYPE

I saw a problem today where after a VxVM upgrade (to 5.1) the Symmetrix disks had been renamed in vxdisk list and vxdmpadm showed that they weren't being accessed as PowerPath devices.

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
disk_0 auto:none - - online invalid
disk_1 auto:ZFS - - ZFS
disk_2 auto:SVM - - SVM
emc0_0b1c auto:cdsdisk mydg_35 mydg online
emc0_0b2c auto:cdsdisk mydg_39 mydg online
emc0_0b10 auto:cdsdisk mydg_32 mydg online
emc0_0b14 auto:cdsdisk mydg_33 mydg online
emc0_0b18 auto:cdsdisk mydg_34 mydg online
emc0_0b20 auto:cdsdisk mydg_36 mydg online
emc0_0b24 auto:cdsdisk mydg_37 mydg online

# vxdmpadm listctlr
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c0 Disk ENABLED disk
c3 Disk ENABLED disk
c4 EMC ENABLED emc0
c1 EMC ENABLED emc0
c2 EMC ENABLED emc0
c5 EMC ENABLED emc0


To fix this I ensured that all of the VxVM SMF services were enabled, and then restarted the configuration daemon;
# vxconfigd -k

Now the output looks like;
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
disk_0 auto:none - - online invalid
disk_1 auto:ZFS - - ZFS
disk_2 auto:SVM - - SVM
emcpower0 auto:cdsdisk mydg_35 mydg online
emcpower1 auto:cdsdisk mydg_39 mydg online
emcpower2 auto:cdsdisk mydg_32 mydg online
emcpower3 auto:cdsdisk mydg_33 mydg online
emcpower4 auto:cdsdisk mydg_34 mydg online
emcpower5 auto:cdsdisk mydg_36 mydg online
emcpower6 auto:cdsdisk mydg_37 mydg online

# vxdmpadm listctlr
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
c0 Disk ENABLED disk
c3 Disk ENABLED disk
emcp PP_EMC ENABLED pp_emc0


I have a strong suspicion that all that was required was the "vxconfigd -k" and that the services were a red herring so if this occurs again this will be the first thing I try.


Update:
vxdctl enable fixes this also. No need to go checking services, just run everyone's favourite VxVM command.
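
To spot this condition quickly next time, the listctlr output can be filtered for controllers whose enclosure type is plain EMC rather than PP_EMC. A rough sketch based only on the two listings above; the function name is made up, and it assumes the enclosure type is always the second column:

```shell
# Read `vxdmpadm listctlr` output on stdin and print the controllers
# whose ENCLR-TYPE (column 2) is plain EMC, i.e. not claimed as PP_EMC.
emc_not_powerpath() {
    awk '$2 == "EMC" { print $1 }'
}

# Possible use:
#   vxdmpadm listctlr | emc_not_powerpath
# Any output suggests the disks are not being accessed as PowerPath devices.
```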

Sunday, August 8, 2010

dhcpmgr help - "Could not start browser to view help"

Whilst doing battle with dhcpmgr this morning I thought I'd have a browse through the help pages to see if they held anything of interest. Unfortunately I didn't get very far, due to a dialogue box telling me "Could not start browser to view help".

A quick scan of the source code told me it was looking for /usr/sfw/bin/mozilla. (Hardcoded - nice).

This file was missing so I created it as a link to my firefox binary and the help pages started up.
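
For the record, the link can be created along these lines. This is only a sketch: the function name is made up, and the firefox path in the usage comment is a placeholder, so substitute wherever your browser binary actually lives.

```shell
# Create a symlink at $2 pointing to the browser binary $1,
# refusing to clobber anything that already exists at $2.
link_browser() {
    src=$1
    dst=$2
    if [ ! -x "$src" ]; then
        echo "no executable at $src" >&2
        return 1
    fi
    if [ -e "$dst" ] || [ -h "$dst" ]; then
        echo "$dst already exists" >&2
        return 1
    fi
    ln -s "$src" "$dst"
}

# Possible use (the first path is a placeholder):
#   link_browser /path/to/firefox /usr/sfw/bin/mozilla
```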

Friday, July 23, 2010

VLANs on opensolaris cookbook

I wanted to play about with VLANs on my nge1 interface but there is an extra step required which is different to Solaris 10, as per http://hub.opensolaris.org/bin/view/Community+Group+on/2008120402


So to create an interface with VID 100 on my interface nge1 I did;

# dladm create-vlan -l nge1 -v 100

# dladm show-vlan
LINK VID OVER FLAGS
nge100001 100 nge1 -----

# ifconfig nge100001 plumb up

# ifconfig nge100001 192.168.199.1 netmask 0xffffff00 broadcast 192.168.199.255

# ifconfig nge100001
nge100001: flags=201100843 mtu 1500 index 8
inet 192.168.199.1 netmask ffffff00 broadcast 192.168.199.255
ether 0:1f:c6:e8:6:b0


Sorted!
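
The link name that dladm picked follows the traditional Solaris VLAN naming scheme: driver name, then VID * 1000 + instance number. A small sketch to work out the name in advance; the function name is my own, and it assumes dladm was left to choose the default name as above:

```shell
# Print the default VLAN link name for physical link $1 and VLAN ID $2,
# using the driver + (VID * 1000 + instance) convention.
vlan_link_name() {
    link=$1
    vid=$2
    # Strip the trailing instance number to get the driver name
    driver=$(printf '%s\n' "$link" | sed 's/[0-9]*$//')
    inst=${link#"$driver"}
    [ -n "$inst" ] || return 1    # no instance number on the link name
    printf '%s%d\n' "$driver" $((vid * 1000 + inst))
}

# Possible use:
#   ifconfig "$(vlan_link_name nge1 100)" plumb up
```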

Sunday, June 20, 2010

Slimserver on Solaris. Upgrade squeezeboxserver-7.5.0-30464 -> squeezeboxserver-7.5.1-30836

I had to scratch my head over this one due to a mismatched perl module. Details below;

First of all disable the current server;
svcadm disable slimserver

Then download and uncompress the new tarball;
gzcat *tgz | gtar xvf -
cp -r /tank/tmp/751/squeezeboxserver-7.5.1-30836 .
cp -r squeezeboxserver-7.5.0-30464/prefs squeezeboxserver-7.5.1-30836

Refer to modules listed in squeezeboxserver-7.5.0-30464/modules and compare with previous versions.
The only updated module was Audio::Scan.
Download new Audio::Scan from http://svn.slimdevices.com/repos/slim/7.5/trunk/vendor/CPAN/
unzip and then;
/opt/csw/bin/perl Makefile.PL
make

Copy over previously compiled modules;
cp -r /usr/local/squeezeboxserver-7.5.0-30464/CPAN/arch/5.8.8 /usr/local/squeezeboxserver-7.5.1-30836/CPAN/arch

Then move any changed modules sideways and put the new versions in their place.
cd /usr/local/squeezeboxserver-7.5.1-30836/CPAN/arch/5.8.8/i86pc-solaris-thread-multi/auto
mv Audio Audio.old
cd /tank/tmp/751/modules/Audio-Scan-0.82/blib/arch/auto
cp -r Audio /usr/local/squeezeboxserver-7.5.1-30836/CPAN/arch/5.8.8/i86pc-solaris-thread-multi/auto

Now change all *.pl scripts under /usr/local/squeezeboxserver-7.5.1-30836 to use /opt/csw/bin/perl
Copy over previous errmsg.* files from the old MySQL dir to the new.
cp squeezeboxserver-7.5.0-30464/MySQL/errmsg.txt squeezeboxserver-7.5.1-30836/MySQL/
cp squeezeboxserver-7.5.0-30464/MySQL/errmsg.sys squeezeboxserver-7.5.1-30836/MySQL/
mkdir squeezeboxserver-7.5.1-30836/Logs
chown -R slim:apps squeezeboxserver-7.5.1-30836/
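
Changing the *.pl scripts to use /opt/csw/bin/perl can be scripted rather than done by hand. A minimal sketch, assuming each script starts with a plain "#!/path/to/perl" line; the function name is made up, "#!/usr/bin/env perl" style lines are left alone, and sed -i is avoided because the stock Solaris sed does not have it:

```shell
# Point the shebang line of every *.pl file under directory $1 at the
# perl binary $2, keeping any interpreter flags such as -w.
fix_perl_shebangs() {
    dir=$1
    perl_path=$2
    find "$dir" -type f -name '*.pl' | while IFS= read -r f; do
        tmp="$f.shebang.$$"
        # Only line 1 is touched, and only if it names a perl binary directly
        sed "1s|^#![^[:space:]]*perl[^[:space:]]*|#!$perl_path|" "$f" > "$tmp" &&
            cat "$tmp" > "$f"    # overwrite in place to keep owner and mode
        rm -f "$tmp"
    done
}

# Possible use:
#   fix_perl_shebangs /usr/local/squeezeboxserver-7.5.1-30836 /opt/csw/bin/perl
```

Every matching file is rewritten, so modification times change even where the shebang was already correct.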

Now we should be in a position to test the new server, so;
su - slim
/usr/local/squeezeboxserver-7.5.1-30836/slimserver.pl --perfmon --cliaddr=192.168.0.1 --playeraddr=192.168.0.1 --daemon --logdir=/usr/local/slimserver/Logs --prefsdir=/usr/local/slimserver/prefs
This failed with;
Undefined subroutine &Class::XSAccessor::Array::newxs_getter called at /usr/local/squeezeboxserver-7.5.1-30836/CPAN/Class/XSAccessor/Array.pm line 64.

Looking in /usr/local/squeezeboxserver-7.5.1-30836/CPAN/Class/XSAccessor/Array.pm, this module appears to be version 1.05.
Running strings on /usr/local/squeezeboxserver-7.5.1-30836/CPAN/arch/5.8.8/i86pc-solaris-thread-multi/auto/Class/XSAccessor/Array/Array.so indicates version 1.04, so is this a version mismatch?
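
The .pm half of that comparison is easy to script. A rough sketch, assuming the module declares its version as a quoted string on one line (as in "our $VERSION = '1.05';"); the function name is made up, and the compiled .so still needs eyeballing with strings:

```shell
# Print the first quoted $VERSION value found in the Perl module $1.
pm_version() {
    sed -n 's/^.*\$VERSION[[:space:]]*=[[:space:]]*["'\'']\([0-9._]*\)["'\''].*$/\1/p' "$1" |
        head -n 1
}

# Possible use:
#   pm_version /usr/local/squeezeboxserver-7.5.1-30836/CPAN/Class/XSAccessor/Array.pm
```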

If we find the Array.pm that belongs to http://svn.slimdevices.com/repos/slim/7.5/trunk/vendor/CPAN/Class-XSAccessor-Array-1.04.tar.gz and copy this into place maybe it will work?
cd /tank/tmp/751/modules/Class-XSAccessor-Array-1.04
cp ./lib/Class/XSAccessor/Array.pm /usr/local/squeezeboxserver-7.5.1-30836/CPAN/Class/XSAccessor/Array.pm
Test again;
su - slim
/usr/local/squeezeboxserver-7.5.1-30836/slimserver.pl --perfmon --cliaddr=192.168.0.1 --playeraddr=192.168.0.1 --daemon --logdir=/usr/local/slimserver/Logs --prefsdir=/usr/local/slimserver/prefs
NOTE: Class::XSAccessor 1.05+ not found, install it for better performance
[10-06-20 12:14:23.2332] main::init (323) Starting Squeezebox Server (v7.5.1, r30836, Tue Jun 1 06:59:24 MDT 2010) perl 5.008008

OK, so it grumbled but at least we're in business.

So to make this the default instance and run under SMF, kill the process we've just started and;
cd /usr/local
rm slimserver
ln -s squeezeboxserver-7.5.1-30836 slimserver
svcadm enable slimserver


Update: An alternative is to upgrade to 1.05, as suggested here - http://forums.slimdevices.com/showthread.php?p=556726#post556726

Wednesday, June 2, 2010

Fun with telnet - towel.blinkenlights.nl

I got reminded about these on Peteris Krumins' excellent blog but I think they bear repeating;

Star Wars;
$ telnet towel.blinkenlights.nl

BOFH excuses;
$ telnet towel.blinkenlights.nl 666

Wednesday, May 19, 2010

Solaris product registry

Things I have learnt about the product registry today.
  1. Some products (eg Sun Studio 12, Sun Cluster) register themselves into this mysterious registry even though they are just a bundle of standard Solaris packages.
  2. To inspect and administer packages in the registry, use the prodreg command.
  3. The product registry database is a flat file - /var/sadm/install/productregistry
  4. If the Sun Studio installer script takes forever to start up then killing any gconftool-2 processes on a headless box will greatly speed this up.

Wednesday, May 5, 2010

The old "Solaris du and df not matching" chestnut

Today we had a DBA come to us saying they had deleted a file but the filesystem space hadn't freed up. He'd killed the job that created the large (100GB+) file but the filesystem was still reporting as 100% full.

We tried pfiles against all the running processes and grepping for the filesystem in question. This took a while (over 800 processes) and didn't return any output.

Next stop was to look through a ps listing (grepping out anything obviously not very helpful) to see if anything jumped out. Needless to say this was not very productive.

fuser -cu against our filesystem gave only a handful of processes so we ran a pfiles against a likely looking one and it gave the following output;

root # pfiles 14648
14648: /usr/lib/ssh/sftp-server
Current rlimit: 256 file descriptors
0: S_IFSOCK mode:0666 dev:341,0 ino:3286 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
1: S_IFSOCK mode:0666 dev:341,0 ino:3286 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
2: S_IFSOCK mode:0666 dev:341,0 ino:8982 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
3: S_IFSOCK mode:0666 dev:341,0 ino:3286 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
4: S_IFSOCK mode:0666 dev:341,0 ino:3286 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
5: S_IFREG mode:0644 dev:154,1008 ino:10873 uid:1200 gid:208 size:136971687936
O_RDONLY|O_LARGEFILE


The file opened by fd5 looks pretty big so we might have found the culprit. Another clue was that there was no filename listed for fd5. Next step, confirm the size of fd 5.
root # du -sh /proc/14648/fd/5
128G 5

OK, this is a likely candidate and once the sftp connection was stopped the space was indeed returned.


However, if we look at the dev: entry in the pfiles output it gives a clue as to what we should have grepped for in our original pfiles trawl. dev:154,1008 gives us major number of 154 and minor number 1008.

root # grep 154 /etc/name_to_major
vxio 154

Not very helpful. What if we look at this the other way round, what is the major/minor of our target filesystem?

root # ls -l /dev/vx/dsk/tempdg/db_raw
lrwxrwxrwx 1 root other 60 Apr 9 15:13 /dev/vx/dsk/tempdg/db_raw -> ../../../../devices/pseudo/vxio@0:tempdg,db_raw,1008,blk

root # ls -l /devices/pseudo/vxio@0:tempdg,db_raw,1008,blk
brw------- 1 root root 154, 1008 Mar 7 04:44 /devices/pseudo/vxio@0:tempdg,db_raw,1008,blk

So that double-confirms what we knew. Next time, running pfiles and looking for processes that have a file open on dev:major,minor but with no filename listed should do the trick.
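
That search can be wrapped in a filter over pfiles output. A sketch under a few assumptions: the function name is my own, it relies on pfiles printing the pathname (when it knows one) on a line of its own starting with "/", and on Solaris the awk call may need to be nawk or /usr/xpg4/bin/awk for -v and functions to work:

```shell
# Read pfiles output on stdin and print "pid fd size" for every regular
# file open on device $1 (given as major,minor) that has no pathname listed.
unnamed_files_on_dev() {
    awk -v want="dev:$1" '
        function emit() {
            if (hit && !named) print pid, fd, size
            hit = 0; named = 0
        }
        $1 ~ /^[0-9]+:$/ {
            emit()
            if ($2 ~ /^S_IF/) {
                # Descriptor line, e.g. "5: S_IFREG mode:0644 dev:154,1008 ..."
                if ($2 == "S_IFREG") {
                    fd = $1; sub(/:$/, "", fd); size = ""
                    for (i = 3; i <= NF; i++) {
                        if ($i == want) hit = 1
                        if ($i ~ /^size:/) size = substr($i, 6)
                    }
                }
            } else {
                # Process header, e.g. "14648: /usr/lib/ssh/sftp-server"
                pid = $1; sub(/:$/, "", pid)
            }
            next
        }
        # A pathname line following a matching descriptor
        hit && $1 ~ /^\// { named = 1 }
        END { emit() }
    '
}

# Possible use:
#   for p in /proc/[0-9]*; do pfiles "${p#/proc/}" 2>/dev/null; done |
#       unnamed_files_on_dev 154,1008
```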