Netboot Mailing List (by thread)

Re: Etherboot 4.5.5 released




Klaus Espenlaub wrote:
> 
> First a general note on the feature request by Hans-Peter Jansen.  I don't
> know what one would gain if etherboot would reboot after (say) 3 minutes without
> dhcp replies.  It would just restart from the beginning.  And it would piss
> me off, because if someone in the department (which usually is me, but
> might be someone else) decides to reboot our server, it would cause those
> suggested etherboot reboots to be triggered if someone decides to boot one of
> the diskless machines right at that time.  The server needs at least 5 minutes

SCSI sucks at boot time ;-)

> for a clean reboot, not to mention the estimated crash reboot time of 40
> minutes.  It will irritate the users needlessly and they will end up in my

I can imagine that...

> office with wild guesses ranging from power surges, broken power supplies or
> even thermal problems of the mainboard and/or the CPU.  No thanks.  What's
> wrong with the current exponential backoff behavior?  If we assume the server is
> unavailable for 10 minutes, it just means that on average the backoff counter
> is at 7, so it will take up to 17 minutes (on average only half of that) to the
> next try.  This is a bit long, but the users can force an instant retry if
> they press ESC, as Ken said.  That's what the bush drum generally takes care of -
> the users will find out very quickly that the server is back online.

A hint would be useful. 
 
> But while looking at it I spotted fishy code in etherboot: if you look at the
> code in main.c: load(), there is an unhandled case when bootp() fails and
> EMERGENCYDISKBOOT is not set - it just pretends that it found something...
> bootp() fails after 20 retries.  Not that this happens frequently - it takes
> on average 1165 hours (48.5 days) to get to that point, but it might happen.

That's exactly what happened to me. In this case, it was provoked by a faulty
3c900 combo NIC (the card has recently failed to work in other systems, too).
This is a rare case, indeed. Network trouble could lead to that, too.
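
For what it's worth, the 1165 hours figure can be reproduced with a very
simple model. This is only a sketch under an assumption of mine, not something
read out of main.c: retry n sleeps a random time whose mean is 2^(n+1) seconds.
The function name total_average_wait() is made up for illustration.

```c
#include <stdio.h>

/*
 * Assumed model (not taken from the Etherboot source): the n-th retry
 * sleeps a random time with a mean of 2^(n+1) seconds.
 * Returns the summed mean wait in seconds over the given number of retries.
 */
unsigned long total_average_wait(int retries)
{
	unsigned long total = 0;
	int n;

	for (n = 1; n <= retries; n++)
		total += 1UL << (n + 1);	/* mean sleep of retry n */
	return total;
}
```

With 20 retries this sums to 2^22 - 4 = 4194300 seconds, which is about 1165
hours or 48.5 days. The same model gives a mean of 2048 seconds (about 34
minutes) at an exponent of 10, so it seems to agree with the numbers quoted
further down.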

> Not that this is in any way related to the feature request except that after
> a while it really takes ages to the next retry.  Maybe the RFC951 sleep should
> be limited to an exponent of 10 (which would give approx. 34 minutes retry
> interval, which shouldn't be a strain on the network even for a large network
> of machines desperately waiting for the dhcp/bootp server to answer their
> request).  Any volunteers?

What about a configurable max. backoff time? In really big environments, 15 min.
would be sufficient. In small segments, I think 5 minutes is a practical value.
I will look into it in the next few days...
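
A rough sketch of what I have in mind, assuming the sleep time is a random
number masked with a window that doubles on every retry. The names
TICKS_PER_SEC, MAX_BACKOFF_TICKS and backoff_mask() are hypothetical and not
taken from the Etherboot source; the 5 minute ceiling is just the small-segment
value from above.

```c
#include <stdio.h>

/* Hypothetical settings, not from the Etherboot source. */
#define TICKS_PER_SEC      18UL                          /* PC timer, roughly 18.2 Hz */
#define MAX_BACKOFF_TICKS  (5UL * 60UL * TICKS_PER_SEC)  /* configurable ceiling: 5 min */

/*
 * Return the mask to apply to a random number to get the sleep time in
 * ticks. The window doubles with each retry, but only as long as the
 * doubled window still fits under MAX_BACKOFF_TICKS.
 */
unsigned long backoff_mask(int retry)
{
	unsigned long mask = 63;	/* initial window: 0..63 ticks */

	while (--retry > 0 && 2 * mask + 1 <= MAX_BACKOFF_TICKS)
		mask = 2 * mask + 1;
	return mask;
}
```

With these values the window stops growing at 4095 ticks (a bit under 4
minutes) from the seventh retry on, so a machine that has been waiting for
hours still retries every few minutes. For big environments one would simply
raise MAX_BACKOFF_TICKS at compile time.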

Meanwhile, I would like to thank you all for your great work. I am very curious
about NFS net booting.
 
Greetings


===========================================================================
This Mail was sent to netboot mailing list by:
hpj.lisa@t-online.de (Hans-Peter Jansen)
To get help about this list, send a mail with 'help' as the only string in
its body to majordomo@baghira.han.de. If you have problems with this list,
send a mail to netboot-owner@baghira.han.de.


