*************************************************************************
*
* RC5DES *NIX Terminal Spawner	
*
* [Checks Load, Console, and Remote Users]
*
* Version   : 1.0.6
* Date      : 10/22/1999
*
* Designed and "Coded" By:
*
* Brad Mertz: bphantom@xmission.com
*	- Automation Scripts, Crontab, and Documentation.
*
* Chris Grahn: grahn@eng.utah.edu
*	- User Load/Console/Remote Logins Checker Scripts.
*
* Copyright : (C) Brad Mertz/Chris Grahn 1999
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; version 2,
* June 1991.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details:
*
* http://www.fsf.org/copyleft/gpl.html
*
*************************************************************************


We would like to say thanks to:
	
Jacek Radajewski - For the idea on how to do remote RSH spawning of
		   specified terminals and not to mention the EXCELLENT
		   disk-less template scripts!
		   http://www.sci.usq.edu.au/staff/jacek/beowulf

Team AnandTech   - For the will to assimilate like mad!  WoooMooo! 
		   http://www.anandtech.com/html/rc5.cfm

Tavish Robinson  - For a few pointers in Perl.

CocaCola Company - For that needed caffeine boost at four in the morning.

------

DISCLAIMER:

***! You MUST HAVE permission to operate the Distributed.Net RC5DES
***! client on any of the machines the scripts spawn.

The scripts included in this package are meant to remotely spawn RC5DES
clients (http://www.distributed.net) on *NIX machines via RSH.  RSH is NOT
a secure program and can be compromised.  For our purposes, a temp account
with minimal access was used.

Note: Even though these scripts are presented as-is, they are still being
fine-tuned.  Chris and I (Brad) would very much appreciate any feedback,
modifications, or suggestions on how to make this better.

Table of Contents:

Section 1 - What Comes In This Package
Section 2 - General Setup
Section 3 - Explanation of the Script Procedures
Section 4 - Overview of the Scripts
Section 5 - Configuring of the Hostnames
Section 6 - Crontab Setup and Usage
Section 7 - Possible Network Congestion
Section 8 - Checkpoint Files
Section 9 - Additional Notes and Thoughts


**** Section 1 - What Comes In This Package:

- A bunch of scripts
- Pre-configured RC5DES .ini files without an email address (run -config)
- Two README files for your enjoyment


**** Section 2 - General Setup:

Each OS (Sparc, UltraSparc, and X86) has its own home directory for
operation.  Since the buffer files could be incompatible, each OS gets
its own set of buffers.  The clients then share the buffers that are
specific to their own OS.  I hope that makes sense.... :) (The scripts and
.ini files are already set up this way.)

The path /rc5/ needs to be changed to reflect where you extracted the
files (usually your home directory).  You MUST EDIT all scripts that came
in this package, because they all default to this path.
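
The path edit above can be done by hand, or with something like the
following sketch.  The function name retarget_rc5_path and its arguments
are made up for illustration and are not part of the package.  It assumes
the new prefix contains no "|" or "&" characters, and it should be run
only once on a fresh copy of the scripts (a second run would rewrite the
already-edited paths again).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the package): rewrite the default
# /rc5/ prefix in every regular file of one directory.
# Usage: retarget_rc5_path NEW_PREFIX DIR
retarget_rc5_path() {
    new_prefix=$1   # e.g. /home/me/rc5 (no trailing slash)
    dir=$2          # directory holding the scripts to edit
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Write to a temp file first, since not every sed can edit in place.
        # cat the result back so the original permission bits are kept.
        sed "s|/rc5/|$new_prefix/|g" "$f" > "$f.tmp" && cat "$f.tmp" > "$f"
        rm -f "$f.tmp"
    done
}
```

For example, "retarget_rc5_path /home/me/rc5 /home/me/rc5/automated" would
cover the scripts directory.  Check the .ini files and the other
directories as well.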

I was originally going to include the binaries of the Sparc, UltraSparc,
and X86 RC5DES clients, but decided it would be in your (and my) best
interest to download them yourself.  Here is the link to the files:

http://www.distributed.net/clients.html

These are the versions we used:

Sparc      - v2.7112.444 (Solaris 2.x, non-ultra, mt)
UltraSparc - v2.7103.427 (Solaris 2.x, UltraSparc, mt)
Linux      - v2.7111.442 (Linux glibc 2.1, x86, mt)
  
Untar (tar zxvf) the X86 Linux client to a temp directory and run:

mv RC5DES /rc5/x86/rc5_x86
chmod 700 /rc5/x86/rc5_x86

Delete the rest of the files that were included with the client tarball.
Do the same thing for the remaining Sparc (rc5_sparc) & UltraSparc
(rc5_ultra) clients.

A listing of /rc5/x86/ will show (and similarly for the other directories):

drwx------   1 users       4096 Sep  8 10:32 ./
drwx------   2 users       4096 Oct 22 13:41 ../
-rwx------   1 users     629232 Jul 23 03:30 rc5_x86*
-rw-------   1 users        126 Oct 22 14:21 rc5_x86.ini
-rw-------   1 users         43 Sep  8 10:39 x86_nodes

If the file names vary from these, either the scripts will not work or
you will need to modify them.

Once that is all completed, enter each RC5 directory, execute
./rc5_* -config (ex: ./rc5_x86 -config), select <1>, and add your email
address.

You will also want to adjust the block threshold and preferredblocksize
settings.  Save the configuration when you are finished and copy the
rc5_*.ini file to /rc5/backup/rc5_*.ini .  You are now ready to
proceed.


**** Section 3 - Explanation of the Script Procedures:

- Execute one of the OS-specific spawners (auto*_up)
- Spawner grabs a hostname from the host lookup file (*_nodes)
- Spawner RSHs into the hostname and executes the Terminal Load Checker
  script (*_script)
- Terminal Load Checker script checks User Load, Console Login, and Remote
  Logins
- If any specified parameter is exceeded (load too high, user on Console,
  too many remote logins), the script aborts and exits.
- If the specified parameters are met, the script executes RC5DES.
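
The decision made by the Terminal Load Checker can be sketched roughly as
follows.  This is not the shipped *_script code: the function name
ok_to_run and the MAX_LOAD and MAX_REMOTE limits are assumptions chosen
for illustration, and the real scripts define their own parameters.

```shell
#!/bin/sh
# Hypothetical limits; the real *_script files set their own.
MAX_LOAD=0.50      # highest acceptable load average
MAX_REMOTE=2       # highest acceptable number of remote logins

# ok_to_run LOAD CONSOLE_USERS REMOTE_LOGINS
# Returns 0 when the client may start, 1 when any limit is exceeded.
ok_to_run() {
    load=$1; console=$2; remote=$3
    # awk does the floating-point comparison that plain sh cannot.
    awk -v l="$load" -v m="$MAX_LOAD" 'BEGIN { exit !(l <= m) }' || return 1
    # Anyone sitting at the console blocks the client.
    [ "$console" -eq 0 ] || return 1
    [ "$remote" -le "$MAX_REMOTE" ] || return 1
    return 0
}
```

On a node, the three values would typically be gathered from tools such
as uptime and who.  Their output format differs between Solaris and
Linux, so that parsing is left out of this sketch.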

For the LAB environment Chris and I were using, guidelines had to be set
on how the clients were to be used.  The Sparcs were occupied/utilized on
a low/medium basis.  The UltraSparcs were occupied/utilized almost all
the time.  The X86 systems were brand new to the Lab, so public usage was
restricted.

Since the Sparcs and UltraSparcs could be occupied/utilized at any time,
the Terminal Load Checker script must be run every half hour.  This meant
the RC5DES client must run for 29 minutes and shut down.  A minute later,
a cronjob would restart the Terminal Load Checker script, which would
determine if it was safe to reload the client.  The X86 boxes could easily
run for a full hour, at which point they shut down and are rechecked.

The whole half hour shutdown process was pretty simple with the use of
the exitrc5.now control file.  Once the file is generated, all operating
RC5 clients immediately shut down.  Very slick!


**** Section 4 - Overview of the Scripts:

Manual testing of the scripts is crucial before setting up your crontab.
The /rc5/automated/ directory contains all the scripts you need to play
with:

These first three files start RSH sessions, which execute their
Terminal Load Checker (*_script) counterparts:

-rwx------ autosparc_up*     - Startup script for Sparc RC5 clients
-rwx------ autoultra_up*     - Startup script for USparc RC5 clients
-rwx------ autox86_up*       - Startup script for X86 RC5 clients

The flush_all script starts RSH sessions for each OS type.  Each OS then
runs its corresponding flush_* (example: flush_x86) script:

-rwx------ flush_all*        - Client buffer update > spawns next three
-rwx------ flush_sparc*      - Sparc buffer update script
-rwx------ flush_ultra*      - USparc buffer update script
-rwx------ flush_x86*        - X86 buffer update script

The next two scripts both create exitrc5.now files in the clients'
directories.  They may seem redundant unless you have clients which run
longer than others (which is what we did - 30 min for Sparc & USparc,
60 min for X86):

-rwx------ fullhr_shutdown*  - Client shutdown for **:59
-rwx------ halfhr_shutdown*  - Client shutdown for **:29

These are the main scripts that determine if the client is able to load or
not.  If they fail any of the set parameters, the script terminates and
exits its RSH session.

-rwx------ sparc_script*     - Load Checker Script for Sparcs
-rwx------ ultra_script*     - Load Checker Script for USparcs
-rwx------ x86_script*       - Load Checker Script for X86s


**** Section 5 - Configuring of the Hostnames:

In each OS home directory, there is a file called *_nodes
(ex: /rc5/x86/x86_nodes).  This is the file auto*_up reads to find the
hostnames it can start the client on.

Let's say I have eight X86 Linux boxes with hostnames zip-01 through
zip-08, and zip-03 and zip-04 cannot have RC5 loaded.  So I edit
/rc5/x86/x86_nodes and add:

zip-01
zip-02
zip-05
zip-06
zip-07
zip-08

Be sure not to add machines that are incompatible with the installed
client binary (e.g. accidentally adding a Sparc hostname into the X86
node list is not good).
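
As a rough illustration of how a spawner could walk a node list like the
one above, here is a minimal sketch.  It is not the shipped auto*_up code.
The function name spawn_nodes and the overridable RSH variable are
assumptions for illustration, and the trailing wait is only there so the
sketch returns after every session has ended.

```shell
#!/bin/sh
# Hypothetical sketch of an auto*_up style loop; the shipped scripts may
# differ.  Set RSH=echo for a dry run that touches no remote machine.
RSH=${RSH:-rsh}

# spawn_nodes NODE_FILE REMOTE_COMMAND
spawn_nodes() {
    node_file=$1
    remote_cmd=$2
    while read -r host; do
        # Skip blank lines in the node list.
        [ -n "$host" ] || continue
        # stdin comes from /dev/null so the session cannot swallow the
        # rest of the node list; & lets the loop move on to the next host.
        $RSH "$host" "$remote_cmd" < /dev/null &
    done < "$node_file"
    wait   # each session stays open until its remote command exits
}
```

A dry run such as "RSH=echo; spawn_nodes /rc5/x86/x86_nodes
/rc5/automated/x86_script" prints one line per host instead of logging in,
which is a cheap way to check the node file before involving cron.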


**** Section 6 - Crontab Setup and Usage:

Please read README.CRONTAB for more information on how to set up the
crontab.

At **:00 or **:30, a script (auto*_up) is run which starts up the RC5DES
clients.

At **:29 or **:59, a script (halfhr_shutdown or fullhr_shutdown) is run
which creates an exitrc5.now file in the specified client's home
directory.  The clients immediately (quite fast in fact!) shut down
operations and exit.

For example (this is the default setup for the scripts): 
	
	I want all OS clients to start up at **:00, but I also want the
Sparcs and UltraSparcs to restart at **:30.  So I edit the crontab to
use 0,30 * as startup times for the autosparc_up and autoultra_up scripts.
I edit the halfhr_shutdown script to "touch /rc5/sparc/exitrc5.now"
and "touch /rc5/ultra/exitrc5.now".

	All OS clients must shut down at **:59, so I add a touch
exitrc5.now line for each client directory to the fullhr_shutdown script.

	Since there can be over 100 machines munching on blocks, I added
a flush_all script which runs every hour at **:50.  You want to make sure
the buffers are updated (fetch/flush) before the clients are shut down and
restarted.  This is very important because I once had at least four
machines all trying to fetch/flush when they were restarted!  I wound up
with locked buffers and lost completed blocks.  ARGH!
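
The shutdown side of this section boils down to touching one file per
client directory.  A minimal sketch follows.  The function name
request_shutdown and the RC5_HOME variable are made up for illustration;
the shipped *_shutdown scripts use the /rc5 path directly.

```shell
#!/bin/sh
# Hypothetical sketch of a *_shutdown script.  RC5_HOME is an assumed
# variable standing in for the /rc5 path used throughout this README.
RC5_HOME=${RC5_HOME:-/rc5}

# request_shutdown DIRNAME...
# Drops an exitrc5.now file into each named client directory.
request_shutdown() {
    for client_dir in "$@"; do
        touch "$RC5_HOME/$client_dir/exitrc5.now" || return 1
    done
}

# A half-hour shutdown would cover only the 30-minute clients:
#   request_shutdown sparc ultra
# A full-hour shutdown would cover every client directory:
#   request_shutdown sparc ultra x86
```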


**** Section 7 - Possible Network Congestion:

As you can imagine, when around 115 computers are spawned almost
simultaneously, there is a LOT of network activity (not to mention
server activity) for the first 15-30 seconds!  Now depending on whose
network this is, you may get your butt kicked. :)  If "scheduled" network
utilization is not a problem, then you may run into another problem.  With
over 100 RSH sessions opening all at the same time, various errors would
pop up.  From time to time it would be RSH login errors, unavailable
receiving ports (can't remember the exact error), and a few other errors.

To alleviate this problem, the sleep command was used to pause the
script (auto*_up) just long enough to allow the remote login to occur and
start up the RC5 client.  Usually a delay of a second or two was inserted,
which cured most of the problems.  Unfortunately this brings up another
problem: 80 Sparcs with a two second delay means the final client will be
almost three minutes late in starting up and only have about 25 minutes
of operation (damn :))!
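
The arithmetic behind that estimate can be checked with a couple of
one-liners.  The function names startup_lag and run_window are made up
for illustration.  The sketch counts only the sleep time, so real numbers
will be somewhat worse once RSH login time is added.

```shell
#!/bin/sh
# startup_lag HOSTS DELAY_SECONDS
# Prints the total sleep time, in seconds, spent walking the node list.
startup_lag() {
    echo $(( $1 * $2 ))
}

# run_window LAG_SECONDS WINDOW_MINUTES
# Prints the seconds of run time left for the last client to be started.
run_window() {
    echo $(( $2 * 60 - $1 ))
}

startup_lag 80 2      # prints 160 (2 min 40 s)
run_window 160 29     # prints 1580 (26 min 20 s)
```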

So you need to do some calculating of your available hardware:

	Qty  Type          KeyRate      Qty   Combined       Utilization
	-----------------------------------------------------------------
	11   X86 boxes     1.1Mkey/s  x 11  = 12.1Mkey/s     Low Usage
	25   UltraSparcs   550kkey/s  x 25  = 13.75Mkey/s    High Usage
	80   Sparcs         60kkey/s  x 80  = 4.8Mkey/s      Medium Usage

The X86 boxes pair a high combined keyrate with low usage, which makes
them the most critical to get started.  So I set up a cronjob with a start
time of **:00 and place a two second pause between each host (autox86_up).

The UltraSparcs are usually in use, which means the Terminal Load Checker
is going to fail most of the time.  So they are started at **:01 & **:30
with one second of delay (autoultra_up).

The Sparcs add up to a decent keyrate in quantity and usually have
low/medium usage, but it would take forever to spawn 80 Sparcs with a
delay.  So let's start them ALL instantaneously at **:01 & **:30 and take
our chances with RSH errors.


**** Section 8 - Checkpoint Files:

With the clients starting and shutting down up to 48 times a
day, checkpoint files are critical!  Checkpoint files are enabled by
default by the *_script files.  This makes sure that when the clients come
back online, they will start working on the same block they were 
originally doing before shutdown.  Works like a charm!

Note: If you decide to redo the .ini file, make sure the following line
is set (or deleted altogether) in the rc5_*.ini files:

noexitfilecheck=0 

If this is set to =1, the exitrc5.now method will fail and the shutdown
scripts will no longer function.
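
A quick way to verify this after editing an .ini file is sketched below.
The function name exit_file_enabled is made up for illustration, and the
check assumes the setting is written exactly as shown above, at the start
of a line and with no spaces around the equals sign.

```shell
#!/bin/sh
# exit_file_enabled INI_FILE
# Returns 0 when noexitfilecheck is absent or set to 0, and 1 when it is
# set to 1 (which would break the exitrc5.now shutdown method).
exit_file_enabled() {
    if grep '^noexitfilecheck=1' "$1" > /dev/null; then
        return 1
    fi
    return 0
}

# Example (paths follow the layout used in this README):
#   exit_file_enabled /rc5/x86/rc5_x86.ini || echo "fix noexitfilecheck!"
```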


**** Section 9 - Additional Notes and Thoughts:

When the auto*_up scripts start up, they will spawn many RSH sessions
which will be reported in ps.  These will exist until the RC5 client
shuts down and exits the RSH login.  If you log into one of the remote
computers while RC5 is running, a ps may show a defunct pid and whatnot.
This seems perfectly normal (...in the eye of the beholder) and they
will clear out when the RC5 client shuts down.

Killing the pids of RSH on the host machine before the RC5DES client
finishes WILL NOT stop RC5 on the remote computer.  The quickest way to
bring down the node(s) is to use the exitrc5.now method.

Enjoy!
