[personal profile] reddragdiva

I'm having severe problems net booting Sun machines.

We have two identical Netra T1-105 here. Call them box1 and box2. We want to upgrade them from Solaris 7 to Solaris 9. So we set up a boot server on box1 using the Solaris 9 12/02 CDs and installed box2 from it. This worked fine.

Now I've set up a boot server on box2 to upgrade or reinstall box1 (and various other boxes) from. It doesn't seem to work. When you do boot net - install on box1, it comes up with:

box1 # reboot "net - install"
Jan 12 15:20:50 box1 reboot: rebooted by root
/usr/opt/SUNWmd/sbin/mdlogd: going down on signal 15
/usr/opt/SUNWmd/sbin/mdlogd: going down on signal 15
Jan 12 15:20:50 box1 snmpdx: received signal 15
Jan 12 15:20:50 box1 rpcbind: rpcbind terminating on signal.
Jan 12 15:20:50 box1 syslogd: going down on signal 15
syncing file systems... done
rebooting...
Resetting ... 

Executing last command: boot net - install                            
Firmware Password: 
Boot device: /pci@1f,0/pci@1,1/network@1,1  File and args: - install
24000 boot: cannot open kernel/unix
Enter filename [kernel/unix]: 
boot: cannot open kernel/unix
Enter filename [kernel/unix]: 
boot: cannot open kernel/unix
Enter filename [kernel/unix]: 

Nothing will shift it from there. Enter anything you like, it'll just repeat the error message that it can't open it.

In the PROM on box1, boot-file is set to blank, as it should be. Haven't checked diag-switch.

I confirmed that rpc.bootparamd and in.tftpd are running on box2. I have also accessed box2 by tftp and NFS to check it's possible. Here are the contents of /tftpboot:

box2 # ls -l /tftpboot
total 296
lrwxrwxrwx   1 root     other         26 Jan 11 02:01 AC150007 -> inetboot.SUN4U.Solaris_9-1
lrwxrwxrwx   1 root     other         26 Jan 11 02:01 AC150007.SUN4U -> inetboot.SUN4U.Solaris_9-1
-rwxr-xr-x   1 root     other     131424 Jan 11 02:01 inetboot.SUN4U.Solaris_9-1
-rw-r--r--   1 root     other        314 Jan 11 02:01 rm.172.21.0.7
lrwxrwxrwx   1 root     other          1 Jan 11 02:01 tftpboot -> . 
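
A note for anyone checking their own /tftpboot: the link names in the listing above are the client's IP address written as eight uppercase hex digits, which is the filename the client requests over tftp. Here 172.21.0.7 becomes AC150007. A minimal sketch of that conversion in Python follows; the function name is made up for illustration and is not part of any Sun tool.

```python
import ipaddress


def tftpboot_name(ip: str) -> str:
    """Return an IPv4 address as eight uppercase hex digits."""
    # .packed gives the four address bytes in network order.
    packed = ipaddress.IPv4Address(ip).packed
    return packed.hex().upper()


# The client address from the rm.172.21.0.7 file in the listing.
print(tftpboot_name("172.21.0.7"))  # prints AC150007
```

If the name this produces for the client's address has no matching link in /tftpboot, the client has nothing to fetch from that server.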

We don't have other machines spare to test on at this time, nor can we easily reboot box2 for now.

As far as I can tell from checking the Usenet archive, the "boot: cannot open kernel/unix" message means the boot kernel sent over doesn't actually support the hardware in question. But that doesn't explain how I got box2 installed from box1 using Sol9 12/02 in the first place — the machines are identical.

Any ideas? I found one person on the Web with this precise problem, but he didn't get any responses. (I've emailed him to ask in any case.)

I must run a snoop on the subnet. (Traffic to and from box1 only; that subnet has the web servers on it!) I've also downloaded the Sol9 04/04 CDs in the meantime, though I'm not sure trying again with those will help, and it's a lot of work for the sake of it.

Update: Now working after rebooting the boot/install server.

(no subject)

Date: 2005-01-12 04:21 pm (UTC)
From: [identity profile] pir.livejournal.com
snoop (or preferably ethereal) is your friend with obscure netboot problems. See what it's actually trying to do.

(no subject)

Date: 2005-01-12 04:41 pm (UTC)
From: [identity profile] sweh.livejournal.com
Could other machines on the same subnet be doing rarp or bootparam or tftp stuff, so that you're not actually booting from the server you think you are? As Peter said, snoop is your friend so you can see where traffic is going.

(no subject)

Date: 2005-01-12 04:49 pm (UTC)
From: [personal profile] drplokta
Solaris 9 12/02 should work fine on Netra T1s, which are pretty old hardware.

Can you run tcpdump on box2 while box1 is trying to boot, and see what's happening with the network traffic? Is box1 making a tftp connection? Is it getting sent the kernel image?

(no subject)

Date: 2005-01-12 05:08 pm (UTC)
From: [personal profile] drplokta
How about this advice, below? What are the PROM settings on box1?


boot: cannot open /kernel/unix (SPARC based systems only)

Cause: This error occurs when you override the location of the boot-file by explicitly setting it to /kernel/unix.

Solution:
* Reset the boot-file in the PROM to " " (blank).
* Ensure that the diag-switch is set to off and to true.

A very vague memory....

Date: 2005-01-12 06:16 pm (UTC)
From: [identity profile] norikos-author.livejournal.com
Says I ran into this when I had a typo in the boot command (Missing space between - and install? Extra space? It's been long enough I can't remember which way it should be.)

(no subject)

Date: 2005-01-12 09:51 pm (UTC)
From: [identity profile] http://users.livejournal.com/_nicolai_/
Are you quite sure the NFS root is shared out to this host such that the host can read it all?

(no subject)

Date: 2005-01-12 10:03 pm (UTC)
From: [identity profile] http://users.livejournal.com/_nicolai_/
For netboot it must also be shared with anon=0 in the mount options.
You'll be wanting a readonly export too.

(no subject)

Date: 2005-01-12 10:04 pm (UTC)
From: [identity profile] http://users.livejournal.com/_nicolai_/
I mean for jumpstart net install it must be shared with anon=0
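
To make the two requirements above concrete (read-only and anon=0), here is a rough Python sketch that checks a share command line, such as one from /etc/dfs/dfstab, for both options. The path /export/install and the function names are assumptions for illustration only, and the check only recognises a bare `ro`, not the `ro=host` form.

```python
def share_options(line: str) -> set:
    """Return the set of comma-separated options given after -o."""
    words = line.split()
    opts = set()
    for i, word in enumerate(words):
        if word == "-o" and i + 1 < len(words):
            opts.update(words[i + 1].split(","))
    return opts


def ok_for_net_install(line: str) -> bool:
    """True if the share line carries both ro and anon=0."""
    opts = share_options(line)
    return "ro" in opts and "anon=0" in opts


# Hypothetical dfstab entry; the exported path is made up.
example = "share -F nfs -o ro,anon=0 /export/install"
print(ok_for_net_install(example))  # prints True
```

Running the same check against the lines actually in dfstab on the install server is a quick way to rule this cause in or out.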

(no subject)

Date: 2005-01-12 10:49 pm (UTC)
From: [identity profile] http://users.livejournal.com/_nicolai_/
Does the architecture you specified with add_install_client (or wrapper) match that of the actual machine and are boot bits for that architecture present on the server? If another machine of same architecture can boot from the server, then the latter is probably true.
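
One way to sanity-check the architecture question: the boot file names in /tftpboot carry the architecture as their second dot-separated field, as in inetboot.SUN4U.Solaris_9-1 from the original post. A small Python sketch that pulls that field out so it can be compared with what the client reports from `uname -m` follows; the function name is hypothetical, and the value sun4u for the client is an assumption based on the listing.

```python
def inetboot_arch(filename: str) -> str:
    """Extract the architecture field from an inetboot file name."""
    parts = filename.split(".")
    if len(parts) < 3 or parts[0] != "inetboot":
        raise ValueError("not an inetboot file name: " + filename)
    # Lowercase so it compares directly with uname -m output.
    return parts[1].lower()


# Link target taken from the /tftpboot listing in the post.
target = "inetboot.SUN4U.Solaris_9-1"
client_arch = "sun4u"  # assumed value of uname -m on the client
print(inetboot_arch(target) == client_arch)  # prints True
```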

BTW

Date: 2005-02-21 08:00 pm (UTC)
From: [identity profile] drstein.livejournal.com
I wanted to say "Thanks" for not only leaving this public, but updating it to reflect the solution... AND not just linking to some other article. :)

Frustration levels increase when you find:
* People asking questions, and the ONLY responses are "Google for it."
* People asking questions and there are NO responses
* People asking questions, solving it, but linking to some $online_mag or $blog entry that ends up becoming a 404.

That's one cool thing about Livejournal - you can post problems *and* solutions in the same fun place for others to find them when Googling for strange stuff. :D

Oddly enough, I found this when looking for something about mdlogd, but hey, I wanted to thank you anyway for leaving useful information on the internet for others to find. That's what the net is for, right? :)