I’m sure I’ll get all kinds of interesting queries that land here from people who are a little too… enamored with AIX, but nevertheless.  AIX is the only OS that I know of that can replace boot disks while the OS is running, without an outage.  Of course, standard disclaimers apply, but this is extremely helpful for Storage Array migrations (bringing an LPAR under an SVC, for example) and just general maintenance.  If you’re running under VIO Servers, it even allows complete cleanup.  I recommend a reboot at the end to be safe, but I doubt it is necessary.

First, an assumption:
Namely, all the target disks are currently configured on the LPAR and are not in any volume groups.

Step 0:  MAKE SURE TO HAVE A CURRENT MKSYSB.

Step 1:  Replace an old root hdisk with the new one.  If this fails because the destination disk is smaller, go to the alternate instructions below.

$ replacepv OLDDISK1 NEWDISK1
0516-1011 replacepv: Logical volume hd5 is labeled as a boot logical volume.
0516-1232 replacepv:
       NOTE: If this program is terminated before the completion due to
       a system crash or ctrl-C, and you want to continue afterwards
       execute the following command
       replacepv -R /tmp/replacepv385038
0516-1011 replacepv: Logical volume hd5 is labeled as a boot logical volume.
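
Since Step 1 depends on the destination disk being at least as large as the source, it can save a failed attempt to compare the sizes first.  A minimal sketch, assuming you already have both sizes in MB: the function name pick_method is made up for illustration, and how you obtain the sizes is up to you (I'd expect something like bootinfo -s hdiskN to report it, but verify that on your AIX level).

```shell
# pick_method OLD_MB NEW_MB
# Hypothetical helper: prints which procedure from this post applies.
pick_method() {
    old_mb=$1
    new_mb=$2
    if [ "$new_mb" -ge "$old_mb" ]; then
        echo "replacepv"    # destination is big enough: Step 1 as written
    else
        echo "migratepv"    # destination is smaller: use the alternate instructions
    fi
}

# Example call (sizes are placeholders):
#   pick_method 70006 140013
```

Keep in mind that the alternate path still needs enough free PPs on the new disk to hold everything that is allocated on the old one.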

Step 2:  Verify that the old disk is not defined to any volume groups:

$ lspv
OLDDISK1          00007690a14d9fee                    None
NEWDISK1          00007690a14cae39                    rootvg          active
OLDDISK2          0000769091324b51                    rootvg          active
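
If you are doing this across a lot of LPARs, the Step 2 check can be scripted instead of eyeballed.  A small sketch, assuming lspv prints the disk name in the first column and the volume group in the third, as in the output above; vg_of_disk is a name I made up for illustration.

```shell
# vg_of_disk DISK
# Reads lspv-style output on stdin and prints the volume group column for DISK.
vg_of_disk() {
    awk -v disk="$1" '$1 == disk { print $3 }'
}

# Usage on a live system (OLDDISK1 is a placeholder):
#   [ "$(lspv | vg_of_disk OLDDISK1)" = "None" ] && echo "OLDDISK1 is free"
```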

Step 3:  Add the boot image to the new disk:

$ bosboot -ad NEWDISK1
bosboot: Boot image is 30441 512 byte blocks.

Step 4:  Repeat steps 1-3 for the second root disk (if replacing both root disks)

Step 5:  Adjust the bootlist

$ bootlist -om normal NEWDISK1 NEWDISK2
$ bootlist -om service NEWDISK1 NEWDISK2

Step 6:  Remove the old hdisks.

$ rmdev -dl OLDHDISK

Step 7:  Remove the old disk mappings from the VIO Server

<VIO> $ rmdev -dev OLDMAPPING

Step 8:  Run savebase

$ savebase

Alternate Instructions

Step A1:  Place the replacement hdisks into the volume group:

extendvg rootvg NEWDISK

Step A2:  Migrate the disks (you must have PPs sufficient to migrate the disk):

migratepv OLDDISK NEWDISK

Step A3:  Validate that there is no data on the old disk

lspv -l OLDDISK

Step A4:  Remove the OLDDISK from the volume group

reducevg rootvg OLDDISK

Step A5:  Add the boot image to the new disk:

$ bosboot -ad NEWDISK1

Step A6:  Repeat steps A1-A5 for the second root disk.

Step A7:  Continue with step 5 above

Posted in aix, ibm, storage, tech at May 27th, 2009. No Comments.

In case you were living in a cave today, EMC released the upgrade to their Enterprise Storage Array, the Symmetrix V-Max.  Currently, the Storage Anarchist and Storagezilla seem to be competing on who can post more technical information on this release… check out both of those blogs if you want to get up-to-speed quickly.

Chuck Hollis has a pair of posts up today that give a decent management level overview of the release… while I don’t exactly agree with how loosely the term storage virtualization is thrown around in general, I have to admire his use of the term with regard to a platform where hyper size still matters.  I still think thin provisioning is the most accessible definition of “storage virtualization.”

After all, thin provisioning is a vendor solution to Microsoft’s lack of robust volume management.

Anyway, that isn’t what brings me to this post.  As I was paging through Google Reader waiting for my World of Warcraft realm to come online, I came across this post from blogger-for-hire Tony Asaro on his HDS blog.  As an “end user” of storage, I figured I’d attempt to answer his questions and see how accurately the EMC announcement was portrayed to a customer like myself.  I’m sure someone will come along and correct me as necessary.  Or, no one will see this and I’ll continue thinking I understand more than I do.  Either one should be worthwhile.

1.  V-Max has some new capabilities  – but what about all of the investment that customers have made in the DMX?  Do DMX customers get any of these features or do they have to buy the V-Max to get them?

For a lot of the functionality, the capabilities hinge on horizontal scaling.  Applying the capabilities to a vertical scaling technology like the Symmetrix doesn’t exactly make sense.  For purely software functionality, I’m sure EMC will extend them to current DMX and CX customers as soon as they are able to thoroughly test and implement them.  Barry Burke already said as much with regard to FAST:

Under development for several years, EMC will be delivering FAST technology across all of its storage platforms (Symmetrix, CLARiiON and Celerra) beginning in 2009 with Symmetrix.

Of course, I’m sure it will all be a separate license, but needlessly isolating technology to the V-Max architecture doesn’t appear to be in EMC’s game plan.

2. If they have to buy the V-Max to get the new features -  does EMC have a service for doing the data migrations and application switch over?  How much does it cost?  How long will it take?  What impact will there be to operations?

Yes, EMC has a migration service.  Like any statement of work, it depends on the complexity and size for any sort of cost and time estimate to be made.  Even virtualized platforms require a brief outage for implementation.  I’m sure every vendor offers Professional Services to help with the cutover to their new architecture.  With EMC, I’m sure Powerpath Migration Enabler can handle the migrations with little application/system impact.  Almost every customer has managed a decent sized data migration, I’d think, unless they deployed SVC out of the chute (it has automagic upgrades, right?).

3. What about all of the scripts that have been developed for older Symms?  Will they work with the new V-Max?  And if yes, will some of new functionality provided by the V-Max by using older Symm scripts be lost?

Yes, all of the SYMCLI scripts will be usable with the V-Max.  No, new V-Max functionality will not work with the old scripts (I mean, how the heck would that even be possible without re-writes?).  Amazingly enough, even the V-Max can’t take a look at a legacy script and make it leverage new technology!

4. Will the DMX have these new capabilities at some point or is it being obsoleted?

I’m not sure how this is different from the first question, but I guess there needed to be 10 in total.

5. If the DMX is being obsoleted – then isn’t the V-Max really a new product – a disruptive “evolution” – requiring customers to spend new dollars and implement new infrastructure to get the new value?  In these economic times – that can be pretty challenging for IT budgets.

Like any technology refresh, it costs money to upgrade.  The benefit of the V-Max is that you can leverage your previous investment in monitoring, education, and automation.  I don’t know of any new platform that is available for free.  Going from monolithic scale-up to hugely scale-out isn’t evolutionary… it is revolutionary.  Everything hinges on the execution though.

6.  It will be interesting to see how the reality compares to the rhetoric – there is a big difference between concept and execution.  What are customers saying about the new capabilities of the V-Max?  Not just in comparison to the DMX but to other vendor solutions?

Time will tell… it is way too soon to even venture a guess on this one.  But, the same can be said for any new hardware release.

7.  This is a big deal to EMC – the Symmetrix seemed stuck for a few years without any real innovation and this announcement breaks that trend.  But when will all of the capabilities be available?

Oh yeah, totally… years without real innovation.  I mean, it isn’t like EMC released Enterprise Flash Drives, or extensive vertical scalability, or a multitude of tiering options within a single array.  <eye roll>

8. It is one thing to say that performance is X% faster but that is just a bunch of hyperbole in the real world.  Since EMC doesn’t have a published baseline of performance for the DMX – what is the comparison based on?

SPC benchmarks aren’t “real-world” and there isn’t a benchmark suited to today’s cache-heavy, mixed workload environments… I’d assume the “X% faster” is based off of similar Symmetrixes (similar caches, similar disk layouts) running a similar workload and then measuring IOPS, throughput, and response time.  Performance is such a hard thing to measure in general, I’m not sure this can be proven or disproven as hyperbole regardless.

9.  There are a number of performance issues to consider with this new 1.0 architecture.  What is the impact of performance when tiering?  Has anyone tested the impact of performance with wide stripping – it should be faster but has it been measured?  What is  the impact on primary I/O performance when remote mirroring is taking place?  The impact of primary I/O performance during a RAID rebuild?  Since this is a new architecture – what is the performance impact when a controller is unavailable?  How does performance scale as more storage and I/O is added?  We don’t know the answer to any of these questions.

See question 8.  Why don’t people run SPC benchmarks with a failed drive?  Or with drives at nearly full utilization?

10. Many of the new capabilities on the V-Max  are 1.0 features and more importantly the fundamental architecture is 1.0!   Think about that for a second – an entirely new architecture that has no track record of success.  It is a 1.0 solution that carries with it the encumbrances of an older solution with millions of lines of code.   What are the detailed best practices for customers wanting to switch from DMX to V-Max?

This seems like a “wait for Service Pack 1” argument.  EMC has built on a long-standing tradition of Enginuity stability… the code is not entirely new, and it sounds like they’ve thought long and hard about how to make this as stable as the DMX-4.  The concerns raised in this question are the same questions of any non-incremental technology upgrade… revolutionary ideas such as thin-provisioning, enterprise flash, and FAST always carry the “it hasn’t been proven” risk with them.  It is up to the vendor to mitigate these risks and provide a stable product.

Posted in emc, ibm, storage, tech at April 15th, 2009. 4 Comments.

Wired posted an extremely interesting article on the World’s Biggest Diamond Heist… makes the Italian Job look like a cakewalk.

Posted in news at March 16th, 2009. No Comments.

Many large AIX environments use IBM’s Network Installation Manager (NIM) to deploy and maintain AIX LPARs.  If you ever need to change a "forgotten" root password in AIX and have the NIM environment available, the following procedure will allow you to log on and change the password.  This requires an outage, but is easier and quicker than trying to boot off of CD/DVD.  Of course, try this in a test environment first to make sure it works as expected, and I offer no warranties/etc if something goes horribly wrong.  I’m not going to claim it is elegant or the best way of doing it, but it works.  If anyone has a better way, please post it in the comments.

On the NIM server, run the following command:

nim -o maint_boot -a spot=SPOTNAME LPARNAME

On the AIX box that is having the “root password opportunity”, reboot and go into the SMS mode.  Make sure that the NIM server IP address is set as the boot server, and that the LPAR’s network information is set to what it should be. 

It should boot off of the SPOT.  You will have to go through the prompts to set up the Current Terminal and the Preferred Language.  After that, there should be an option to either install the BOS, or go into a limited maintenance mode.  Go into the limited maintenance mode.

After this, you will be booted into a mostly non-functional AIX environment.  Type lspv to see what physical volumes you have available.  Type the following to import the hdisk that had rootvg on it originally:

importvg hdisk#

Create a temporary mount point and mount the root filesystem:

mkdir /test
mount /dev/hd4 /test

You do not have access to 90% of the command line tools (including vi) in this environment.  Run the following command to add a new account to the passwd file:

echo tempuser::0:0::/:/usr/bin/ksh >> /test/etc/passwd

MAKE SURE THAT YOU USE TWO “>” SYMBOLS.  Otherwise, you will overwrite the entire passwd file.  Run the following commands to sync the file system and prepare the LPAR for the reboot:

sync
cd /
umount /test

Reboot the LPAR.  When the LPAR comes up, it should boot to the normal hdisk.  At this point, you can log in locally as the user you created above without a password.  Then, run “passwd root” to change the root password.  Be sure to go in and remove the entry you made in /etc/passwd after verifying that the password has been changed (smit users will allow you to remove it that way as well).
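
Because a single “>” destroys the passwd file, it may be worth wrapping the append in something that takes a copy first.  This is only a sketch: append_temp_root is a name I made up, and it assumes cp is actually available in the maintenance shell, which I have not confirmed given how stripped-down that environment is.

```shell
# append_temp_root PASSWD_FILE [USERNAME]
# Hypothetical helper: back up the passwd file, then APPEND a passwordless
# UID 0 entry in the same format as the echo command above.
append_temp_root() {
    file=$1
    user=${2:-tempuser}
    # Refuse to run against a path that does not exist (wrong mount point, typo).
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    # Keep a copy so a mistake can be undone before the reboot.
    cp -p "$file" "$file.bak" || return 1
    # Two ">" symbols: append, never overwrite.
    echo "$user::0:0::/:/usr/bin/ksh" >> "$file"
}

# Usage in the maintenance shell:
#   append_temp_root /test/etc/passwd
```

Remember to remove both the temporary entry and the .bak copy once the root password has been changed.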

Posted in aix, ibm, tech at December 23rd, 2008. No Comments.

Last week, NetApp announced a guarantee that customers would use 50% less storage than traditional arrays by going with a NetApp FAS solution.  Of course, the tightly-knit group of storage bloggers were all over this announcement pointing out the “flaws” with it.  Some choice quotes:

  • StorageZilla (EMC):  “One has to ask is it really putting your money where your mouth is if you’ve been sure to rig the game so you can’t lose? 50% reduction with de-dup for low change rate heavily redundant data? Hell, you’re not even supposed to pick up the phone unless you’re running over 80% capacity and if you hit that number with one of the specifically defined data sets they support it means you probably don’t have de-dup switched on or have a pile of pre de-duplication snapshots stinking the place up somewhere.”
  • The SAN Technologist (Independent blog, but employed by Dell/EqualLogic):  This post is pretty fair about the entire issue, but compares it to Harley Davidson stating that motorcycles reduce tire needs by 50%.  “I just thought it was funny that the baseline for storage chosen wasn’t another RAID6 based configuration, but comparison to a RAID10 deployment.”
  • Chuck Hollis (EMC):  Too much to quote effectively, but the standard response of RAID-10 vs. RAID-DP, along with a lot of the other caveats that the guarantee includes.  This is probably the most thorough vendor response that I’ve read.
  • Robin Harris (Independent Analyst):  Robin didn’t discuss the guarantee, other than use it as a jumping-off point for primary storage de-dup.  “If the feature is free, de-duping some primary storage will be standard practice in most data centers within 5 years. As the de-dup technology improves and Moore’s Law drives performance, more and more unstructured data will be de-dup’d as a matter of course.”
  • Scott Lowe (Independent Storage Professional):  On the Storage Monkey’s blog (Full Disclosure – I’m a member of the forum site), Scott asks for opinions but doesn’t share his (other than the common RAID-DP vs RAID 10).  “I’ll keep my thoughts to myself, except to say I disagree with the requirement that the baseline system (against which the customer’s system will be measured to determine if the 50% reduction is being met; more requirements here under “How it works”) use RAID 10.”
  • Craig Simpson (HP):  “Of course some simple math shows that RAID 5 or 6 would save around 40% (43% for the case they chose) over RAID 1 on anybody’s array. So with all the other tools and restrictions their guarantee only saves another 7% over vanilla RAID 1.  Did they really think people wouldn’t do the math?”
  • Jon Toigo surprisingly stayed out of the entire discussion.

After reading through the sheer amount of commentary surrounding this announcement, including NetApp’s own blog post on it, I think that this entire marketing “stunt” was brilliant.  First off, similar to IBM’s stealth launch of several storage products in the past month, this received a lot more commentary than most storage announcements.  From a “getting the product and message out there” perspective, NetApp received a lot of publicity for a fairly low amount of work.  At the very least, for any storage RFP in a VMware environment, competitors will have to answer why they feel the guarantee isn’t valid, and why they don’t have a similar guarantee.  After all, if it means “nothing,” then why don’t other vendors offer it?

After reading through the requirements, I believe that this is primarily for getting into the datacenter of new customers.  First off, the requirement for Professional Services isn’t too significant, since any new technology purchase typically comes with a PS engagement.  Secondly, from the NetApp blog posting about this offer:  “Once customers see the other advantages we have, in terms of performance efficient snapshots, rapid and efficient cloning and provisioning techniques, rapid data backup and recovery, and un-compromised data protection, they will realize a whole new way to manage their storage in a VMware environment.  Come for the space savings, and stay for the simplicity, efficiency and ease of use. Wouldn’t you like to try it?”

All guarantees have fine print.  I’m not exactly sure how any storage vendor would make a guarantee without similar requirements (Professional Services engagement, follow the best practices).  That doesn’t mean that the results are not achievable without PS… just that the guarantee requires them for it to be valid.  Let’s go to Craig’s quote above:  “So with all the other tools and restrictions their guarantee only saves another 7% over vanilla RAID 1.”  First off, I assume he meant to put RAID 5 there, and not RAID 1.   Only 7%.  On one hand… it is a 7% guarantee that no one else has.  On the other hand, 50% is the MINIMUM that is guaranteed.  I would love to see statistics in 6 months as to how much over that most customers achieve.
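
For anyone who wants to check the “simple math,” here is a rough sketch of it.  It assumes a parity RAID group of 14 data disks plus 2 parity disks, which is my guess at “the case they chose” and not something stated in the quote; raid_savings_vs_mirror is a made-up name.  The idea is that mirroring needs 2 raw disks per usable disk, while parity RAID needs (data + parity) / data.

```shell
# raid_savings_vs_mirror DATA_DISKS PARITY_DISKS
# Prints the percentage of raw capacity saved by a parity RAID group
# compared with mirroring the same usable capacity.
raid_savings_vs_mirror() {
    awk -v d="$1" -v p="$2" 'BEGIN {
        mirror_raw = 2              # raw disks per usable disk when mirrored
        parity_raw = (d + p) / d    # raw disks per usable disk with parity RAID
        printf "%.1f\n", (1 - parity_raw / mirror_raw) * 100
    }'
}

# Example call (14+2 is an assumed group size):
#   raid_savings_vs_mirror 14 2
```

With the assumed 14+2 group this comes out at roughly 43%, which is where the “only another 7%” to reach the guaranteed 50% comes from.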

John Martin (NetApp) comments over at the SAN Technologist blog about the math behind it, and shows that without taking parity into account, a 41% reduction is likely.  Of course, 41% isn’t as sexy of a marketing term as 50%.

In the end, though, what this really comes down to is how much a given vendor’s solution costs to achieve a given result… all of the guarantees really don’t mean anything if 50% of NetApp storage is equivalent (or more) TCO-wise compared to <insert random storage vendor here>’s storage.  Which is why I have to laugh at this tempest in a teapot… not one person mentioned the cost for a given deployment and compared it to the NetApp deduped deployment.  Because, quite honestly, customers really shouldn’t care about what technologies are used to achieve something, just that it is achieved as cost-effectively and maintainably as possible.

Which is also why I thought the squawking about XIV using only mirrored storage was misplaced… but that’s one for another day.

Posted in netapp, storage, tech at October 6th, 2008. 1 Comment.

Blocks and Files has 3 articles up discussing IBM’s recently acquired XIV… definitely worth a read if you aren’t up to date with this storage platform.

Posted in ibm, storage, tech at August 4th, 2008. No Comments.

I spent a little time a few weeks ago configuring Cygwin into a usable environment for both AIX administration and storage administration.  The stimulus for this was a lack of command line history for symcli commands.  Our primary Solutions Enabler system at work is a Windows 2003 Server install that happens to have SSHD installed.  I haven’t had luck getting the arrow keys working for previous commands via putty, so I figured I’d try getting shell mode for emacs working and use the emacs history functions.

That didn’t work either… for some reason, the enter key refused to work after an SSH session was established through emacs shell mode… though normal SSH connections worked fine.  I still haven’t resolved that (and I’m not sure I’ll even bother further), but the configuration I ended up with works much better than my previous setup (plain putty).  Since it was fairly tricky to get the system working end-to-end, I figured I’d type it up in case other people were having similar issues.  The final result works as well as plain Putty for general SSH functions (except for a wrapping issue), allows for the use of SMIT menus in AIX, and looks much better than the default Cygwin shell.  The main issue that I continue to have has to do with long command lines… quite simply, the command starts "wrapping" onto itself and becomes a pain to modify.

First of all, download the Cygwin installer and install any desired components.  I made sure to install the current version of emacs, emacs-el, ssh, and ncurses.  If you want to be able to use ‘clear’ from bash, ncurses is required.  I installed cygwin to c:\utils\cygwin.

Secondly, I modified the /etc/profile to move my home directory away from the system default.  To do this, add:

HOME="/home/techmute"
export HOME

to /etc/profile, right before the line “# Here is how HOME is set, in order of priority, when starting from Windows”.

Changing the default home directory allows me to more easily transfer configurations from work to home, where the usernames and profile locations don’t match.  After making that change, close out of Cygwin and relaunch it to seed the home directory with the default configuration files.

At the end of the .bashrc file, add the following text:

case $TERM in
    xterm)
    PS1=
    PS2="> "
    PS4="+"
    PROMPT_COMMAND='
  pcpwd=${PWD/$HOME/\~}
  [[ "${#pcpwd}" -gt 25 ]] && pcpwd="..${pcpwd: -25}"
  echo -ne "\033]0;${USER}@${HOSTNAME} ${PWD/$HOME/~}\007"
  green="\033[32m"; orange="\033[33m"; cyan="\033[36m"; off="\033[m";
  PS1="$green\u$orange@$green\h $cyan${pcpwd}$orange \$ $off"
        '
    ;;
    dumb)
    PS1="\u@\h \w \$ "
    PS2="> "
    PS4="+"
    PROMPT_COMMAND=
    ;;
esac

This configures the command prompt to look different depending on whether it is a native bash shell or a bash shell spawned from emacs.  It prevents “ascii garbage” from showing up when you spawn a shell from emacs.  PLEASE NOTE:  I’m using the hacked version of putty as my terminal for this, so if you’re using the “default” Cygwin shortcut, it won’t work quite right.  Also, I borrowed this from somewhere online… if anyone knows the proper attribution, let me know.
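
One thing that may be worth trying for long command lines: bash documents \[ and \] as markers for non-printing characters in a prompt, and colour escapes that are not wrapped in them are a common cause of a line "wrapping" onto itself.  I can't say for certain that is what is happening in this setup, so treat the following as an untested sketch.  The helper names shorten_path and build_ps1 are made up for illustration.

```shell
# Colour escapes wrapped in \[ ... \] so bash leaves them out of the prompt width.
green='\[\033[32m\]'; orange='\[\033[33m\]'; cyan='\[\033[36m\]'; off='\[\033[m\]'

# shorten_path PATH [MAX]
# Keeps only the last MAX characters (default 25), prefixed with "..".
shorten_path() {
    path=$1
    max=${2:-25}
    if [ "${#path}" -gt "$max" ]; then
        start=$(( ${#path} - max ))
        path="..$(printf '%s' "$path" | cut -c "$(( start + 1 ))-")"
    fi
    printf '%s\n' "$path"
}

# build_ps1 DIR
# Prints a PS1 string equivalent to the xterm prompt above, with bracketed colours.
# \\u, \\h and \\\$ come out as the literal prompt escapes \u, \h and \$.
build_ps1() {
    printf '%s\n' "${green}\\u${orange}@${green}\\h ${cyan}$1${orange} \\\$ ${off}"
}

# Possible use in .bashrc (bash only, not tested under puttycyg):
#   PROMPT_COMMAND='PS1=$(build_ps1 "$(shorten_path "${PWD/$HOME/\~}")")'
```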

Ngai Kim Hoong has an excellent site on configuring emacs for use with Cygwin.  Follow this link and add the portions you’re interested in to your .emacs file.  I would definitely include the shell configuration to launch bash instead of sh from emacs.

After this, the majority of the Cygwin configuration is complete.  The final steps replace the default Cygwin shell with something that doesn’t suck.  First of all, go to Google Code and download puttycyg.  This will allow you to run a modified version of putty as a cygwin terminal.  Secondly, go to this site and download the igvita desert theme.  It is packaged as a registry file containing putty customizations.  You’ll likely want to customize it further to suit your needs (I had to modify the columns and rows).

The end result looks a ton better than the default Cygwin, but the wrapping issues still bother me daily when using piped commands.

Posted in aix, emc, linux, tech, windows at July 25th, 2008. No Comments.

Have you needed to consolidate and migrate a filesystem that is spread over 2 physical disks onto 1 physical disk? You can easily do this in AIX without even unmounting the FS.

mjd@techmute mjd $ lspv
hdisk0          00007690a14d9fee                    rootvg
hdisk4          00007690a14cae39                    None
hdisk5          0000769091324b51                    rootvg

There is a FS on testlv that resides in rootvg but has active storage on both hdisk0 and hdisk5. To consolidate and move that filesystem to hdisk4, it’s as simple as 3 commands.  The first two:

extendvg rootvg hdisk4
migratepv -l testlv hdisk5 hdisk4

At this point, half of the testlv is on hdisk4, and half is on hdisk0:

mjd@techmute mjd $ lspv -l hdisk4
hdisk4:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
testlv                100   100   00..67..33..00..00    /test

mjd@techmute mjd id13982 $ lspv -l hdisk0
hdisk0:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
[...]
testlv                100   100   39..61..00..00..00    /test
[...]

To finish the consolidation/move, migrate the last half.

mjd@techmute mjd $ migratepv -l testlv hdisk0 hdisk4

mjd@techmute mjd id13982 $ lspv -l hdisk4
hdisk4:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
testlv                200   200   66..67..67..00..00    /test

This is easier than trying to use cplv or restoring a backup onto the new disk.
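
If you want to confirm the move without reading the tables by eye, the PPs column of lspv -l can be pulled out with a one-liner.  A sketch, assuming the column layout shown above (LV name first, PPs third); lv_pps_on_disk is a name I made up.

```shell
# lv_pps_on_disk LVNAME
# Reads "lspv -l DISK" output on stdin and prints how many PPs of LVNAME
# live on that disk (0 if the LV is not listed).
lv_pps_on_disk() {
    awk -v lv="$1" '$1 == lv { n += $3 } END { print n + 0 }'
}

# Usage on a live system: the source disks should report 0 and the
# target disk should report the full PP count of the LV.
#   lspv -l hdisk0 | lv_pps_on_disk testlv
#   lspv -l hdisk4 | lv_pps_on_disk testlv
```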

Posted in aix, storage, tech at July 24th, 2008. No Comments.

Barry Whyte, "IBM Master Inventor", has an article posted discussing what IBM means by "non-disruptive upgrades" for the SVC virtualization appliances.  He states that the upgrades involve no interruption to service or access.  Which is a fairly standard answer for what a non-disruptive upgrade (NDU) is… I’m curious what he means by "in practice users will encounter different perceptions of different products" though.  I haven’t personally encountered vendors with a different view of "non-disruptive."  Barry acknowledges that upgrading drivers/firmware/etc to correspond with an SVC upgrade can make NDUs to the SVC require host downtime. 

Another question is whether any other portions of IBM have a different definition of NDUs as well.

"Thus, as a node goes write-through and is upgraded after it flushes any dirty data in its cache, your performance should not degrade and we don’t ask that your stop applications or reduce workload during this time." (Emphasis mine)

I had always assumed that each SVC cluster mirrored the write-cache among paired nodes, but distributed the read-cache evenly… if there isn’t any performance degradation, then I must be mistaken.  I have a few redbooks on SVC I need to go over, maybe I can find something there.

Towards the end, Barry insinuates that SVC clusters can be replaced with upgraded hardware non-disruptively over a weekend… it’d be interesting to see exactly how large of an environment could be done.  I assume that you’d want to pre-stage a ton of zoning and masking changes.

"With the latest 4.3.0 release of SVC software that supports Vdisk Mirroring, even if you have to take a controller completely offline for some disruptive maintenance, SVC allows you to use the Vdsisk Mirroring feature to prepare for this event, offline the controller, fix it, and then only sync back the data that has changed since it went offline. I know of no other product that provides such a dynamic, non-disruptive and time-saving set of solutions to solve problems you or your storage administrators faces on a daily basis."

Wow, one of my favorite features of AIX LVM brought into a storage appliance… very cool, and I’m sure it is a great value to clients that don’t have enterprise storage arrays.

At the beginning of the post, though, Barry takes a jab at EMC’s Burke for ignoring his "loaded questions" about performance yardsticks.  Honestly, Barry admitted his questions were loaded… what did he expect, especially since the quoted specs were internal to the array?  I’m actually surprised that he pointed out Burke’s article, since it highlights that an IBM storage platform doesn’t have the ability to take a non-disruptive upgrade.

Unless, of course, it is behind an SVC…

Hmmmmmm…

Posted in aix, emc, ibm, storage, tech at July 23rd, 2008. 2 Comments.