Looking to monitor the nodes

Peige · 22 June 2005 13:14

Just wondering what the best method is for monitoring the headless nodes.

BOINC Manager doesn't seem to work, and I'm not sure about VNC at the moment… it won't really do what I want anyway.

Once I start to build up, I obviously want to know that all the boards are doing what they should and see some progress, but I can't see how I can do that at the moment.

DoubleTop · 22 June 2005 13:30

From memory of how we set things up:

From your server desktop, open a terminal, then run

tail -f /home/ws001/boinc/nohup.out

Press Ctrl+C to exit.
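To check every node at once instead of one tail at a time, a small loop can print the last line of each node's log. This is a minimal sketch that assumes every node's BOINC directory follows the /home/ws001/boinc pattern above (ws002, ws003 and so on); the function name last_log_lines is hypothetical.

```bash
#!/bin/bash
# last_log_lines [HOME_ROOT]
# Prints "<node>: <last line of nohup.out>" for every HOME_ROOT/ws*/boinc/nohup.out.
# Assumes LTSP-style ws001, ws002, ... home directory naming (hypothetical layout).
last_log_lines() {
    local root=${1:-/home}
    local log node
    for log in "$root"/ws*/boinc/nohup.out; do
        [ -f "$log" ] || continue        # skip if the glob matched nothing
        node=${log#"$root"/}             # strip the leading root directory
        node=${node%%/*}                 # keep only the node directory name
        printf '%s: %s\n' "$node" "$(tail -n 1 "$log")"
    done
}
```

Called with no argument it scans /home; passing another directory makes it easy to try out on a copy of the logs.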

DT.

Mojo · 22 June 2005 13:45

I use an old Windoze box (Pentium 450) hooked into the LTSP side of the server with BOINCView on it.

The tail command works too.

I believe the Major had scripts for monitoring his. How about it, MAOJC? Anything the rest of us could use?

Peige · 22 June 2005 14:00

I've used BOINCView before, but last time I used it I did it using the \\machinename\boinc-directory\ method to look at the CSV files… I know it has a better method of monitoring, but I never got that to work.

I've downloaded it again, so I'll have a look, but am I right in thinking that this will only work if the Windows machine is plugged into the LTSP side of the switch?

I'll also try your method, DT, and see what that looks like. I've left it running today and would like to see how things are going, so to speak.

MAOJC · 22 June 2005 14:19

I set up a Linux/Apache web server and port-forwarded it to the outside world. I then used a combination of SSH and RSH to fetch the data. It would be easy to run a web server right on the LTSP server and fetch the data from the local boxes' home directories. I also wanted to know some other things, though, like uptime and CPU load. Those have to come from the OS itself, which needs remote access. Mine is set up for D2OL, but a BOINC setup needs much the same kind of thing.

If I get some time in the next couple of days, maybe I will do a Mojo-style how-to, if there is some interest.

You can see a sample @ http://the-linux-guy.com/

Peige · 22 June 2005 14:24

I’m very interested in that

Although you will need to bear in mind :amstupid:

Are you running D2OL on LTSP nodes? Any chance of a few pointers on that as well?

MAOJC · 22 June 2005 14:41

Well, your first step is to get an Apache web server running on the box. Get over that hurdle and you're halfway there. All I do is rebuild the index page every 2 minutes with a script that is fired from cron.

So for now, work on an Apache web server and get the default index page to show.
I suggest you start here http://httpd.apache.org/docs-2.0/
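Before fiddling with pages, it can help to confirm that something is listening on the web port at all. This minimal sketch uses bash's built-in /dev/tcp redirection (a bash feature, not a real file, so it must run under bash) plus coreutils timeout; the function name port_open is hypothetical. Seeing the default page in a browser at http://localhost/ remains the real test.

```bash
#!/bin/bash
# port_open HOST [PORT]
# Returns 0 if a TCP connection to HOST:PORT succeeds (default port 80).
# bash opens the socket for the /dev/tcp/HOST/PORT redirection; timeout stops
# a filtered port from hanging the check.
port_open() {
    local host=$1 port=${2:-80}
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `port_open localhost && echo "something is listening on port 80"`.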

MAOJC · 22 June 2005 14:47

It is no different from any other DC project that runs from the CLI. You need to segregate each node's working directory on the server, and the start script for the CLI client should contain a path variable that refers to the hostname.
For example, in an LTSP environment with the default naming conventions, a directory can be set up off the root user's home dir. The operational settings are in a control file in the node's directory:
/root/ws001/SengentD2OL…

In the start script it would look like this:

cd /root/hostname/SengentD2OL…
D2OL controller:file
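Keying each node's working directory off its hostname can be sketched as a tiny helper that a start script calls before launching the client. This is an illustration under assumptions: the helper name node_workdir is hypothetical, the /root base comes from the example above, and the project subdirectory is passed in rather than guessed, because the real directory name is elided (SengentD2OL…). The command that actually starts D2OL is left out for the same reason.

```bash
#!/bin/bash
# node_workdir BASE SUBDIR [HOST]
# Prints BASE/HOST/SUBDIR. HOST defaults to this machine's short hostname
# (e.g. ws001 on an LTSP node), so a single start script can serve every node.
node_workdir() {
    local base=$1 subdir=$2 host=${3:-$(hostname -s)}
    printf '%s/%s/%s\n' "$base" "$host" "$subdir"
}
```

A start script would then begin with something like `cd "$(node_workdir /root "<project dir>")" || exit 1`, where `<project dir>` is the node's D2OL directory.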

Peige · 22 June 2005 14:56

That's me taken care of for a few days, then :chuckle:

Egad · 22 June 2005 20:36

Also interested in the how-to

MAOJC · 25 June 2005 16:35

In the quest to keep track of farm activity, I have put together a little how-to on setting up a web-based farm monitoring system.

First some definitions.

Web server – the place to collect, collate and build the HTML page. This could be any Linux/Unix platform but will probably be your LTSP server.
Monitored client – the platform the data is collected from.
SSH – secure shell protocol, used to make the call from the server to the client.
RSH – remote shell protocol, used to make the call from the server to the client (non-secure).
Apache – the web server software.

First, an operational web server needs to be installed. The base Fedora distros contain a version of Apache. There are plenty of basic how-tos floating about on getting Apache up and running, so I won't delve into that at all. Suffice it to say, you will need to be able to see the default Apache index.html page before proceeding.

In order to collect the data a communications method must be employed between the web server and the client. I suggest using SSH. This requires the exchange of keys between the platforms.

http://kimmo.suominen.com/docs/ssh/ provides an excellent reference for getting SSH operational for a specific user and does more justice to the subject than I could here. Pay attention to the sections on key generation, key exchange and remote command execution.
Make sure a simple shell command can be run from the server to the client. From the server, run:

ssh <client_hostname> ls /

This needs to return a listing of the remote host's root directory without prompting for a password.
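The no-password check can also be scripted across the whole farm. OpenSSH's `-o BatchMode=yes` makes ssh fail instead of prompting for a password, so a successful run shows that key login works. A minimal sketch; the function name check_batch_ssh is hypothetical, and the commented setup steps are the standard OpenSSH ones (ssh-copy-id ships with OpenSSH on most distros; otherwise append the public key to the client's ~/.ssh/authorized_keys by hand).

```bash
#!/bin/bash
# One-time key setup on the server, as the user that will run the scripts:
#   ssh-keygen -t rsa                # accept the defaults, empty passphrase
#   ssh-copy-id <client_hostname>    # or append ~/.ssh/id_rsa.pub to the client's ~/.ssh/authorized_keys

# check_batch_ssh HOST...
# Prints "HOST OK" or "HOST FAILED" for each host and returns 1 if any failed.
# BatchMode=yes forbids password prompts; ConnectTimeout stops dead hosts hanging.
check_batch_ssh() {
    local host rc=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true < /dev/null > /dev/null 2>&1; then
            echo "$host OK"
        else
            echo "$host FAILED"
            rc=1
        fi
    done
    return $rc
}
```

For example, `check_batch_ssh $(cat ./boxes.ssh)` checks every client listed for the server script.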

The client scripts
A simple deliver-and-parse method is used: the data is generated on the client machine and parsed on the server. Here is an example of the script that runs on the client (get_d2ol.sh, the name the server script calls). It returns a single line of data to the calling program on the server. It is saved in the home directory of the user on the D2OL client box, in my case root. It may look complicated at first glance, but it is really a very simple program.


#!/bin/bash
D2OL="/root/SengentD2OL/D2OL"  #Home dir for the D2OL client
CAN=""	# Initialize some variables
TARG=""
LINE=`uptime` # Populate the LINE variable with the output of uptime command
# this section handles the different formats uptime uses to report time
if [ `echo $LINE | grep day | wc -l` -eq "1" ]  
then
        if [ `echo $LINE | grep min | wc -l` -eq "1" ]
        then
                DAY=`echo $LINE | cut -f3 -d" "`
                HOUR="0"
                MIN=`echo $LINE | cut -f5 -d" " `
                LOAD=`echo $LINE | cut -f11 -d" " |cut -f1 -d","`
        else
                DAY=`echo $LINE | cut -f3 -d" "`
                HOUR=`echo $LINE | cut -f5 -d" " | cut -f1 -d ":"`
                MIN=`echo $LINE | cut -f5 -d" " | cut -f2 -d ":" | cut -f1 -d","`
                LOAD=`echo $LINE | cut -f10 -d" " |cut -f1 -d","`
        fi
elif [ `echo $LINE | grep min | wc -l` -eq "1" ]
then
        DAY="0"
        HOUR="0"
        MIN=`echo $LINE | cut -f3 -d" " `
        LOAD=`echo $LINE | cut -f9 -d" " |cut -f1 -d","`
else
        DAY="0"
        HOUR=`echo $LINE | cut -f3 -d" " | cut -f1 -d ":"`
        MIN=`echo $LINE | cut -f3 -d" " | cut -f2 -d ":" | cut -f1 -d","`
        LOAD=`echo $LINE | cut -f8 -d" " |cut -f1 -d","`
fi
# This section gets the pertinent data from the d2ol.out.txt file
ID=`head -3 $D2OL/d2ol.out.txt | grep "Node ID:" | cut -f4 -d" "`
tail -12 $D2OL/d2ol.out.txt > /tmp/tail.out
LINE=`cat /tmp/tail.out | grep "Analyzing Candidate" `
CAN=`echo $LINE | cut -f3 -d" "`
TARG=`echo $LINE | cut -f2 -d":"`
LINE=`cat /tmp/tail.out | grep "Tasks Assigned:"`
ASS=`echo $LINE | cut -f3 -d" "`
LINE=`cat /tmp/tail.out | grep "Tasks Downloaded:" `
DL=`echo $LINE | cut -f3 -d" "`
LINE=`cat /tmp/tail.out | grep "Results to Upload:" `
RES=`echo $LINE | cut -f4 -d" "`
LINE=`cat /tmp/tail.out | grep "Completed Candidates:" `
COMP=`echo $LINE | cut -f3 -d" "`
# this printf statement returns the data in a form that can be parsed at the server. Note
# the quotes around $TARG: they are required because it contains spaces that the shell
# would otherwise split into separate printf arguments.
printf "%s %s %s %s %s %s %s %s %s %s ;%s\n" $DAY $HOUR $MIN $LOAD $ID $ASS $DL $RES $COMP $CAN "$TARG"
exit 0

The output looks like this:


6 20 04 0.98 114747 242 243 7 12834 IBS_STOCK3S-52575 ; Smallpox Target I
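The four uptime numbers at the start of that line (days, hours, minutes, load) can also be read without parsing uptime's text, whose layout changes depending on whether a day or an hour has passed. On Linux, /proc/uptime holds the seconds since boot and /proc/loadavg the load averages. A minimal, Linux-only sketch; uptime_fields is a hypothetical name, and the file arguments exist only so it can be tried on sample files.

```bash
#!/bin/bash
# uptime_fields [UPTIME_FILE] [LOADAVG_FILE]
# Prints "days hours minutes load1" from /proc instead of parsing uptime's output.
uptime_fields() {
    local up_file=${1:-/proc/uptime} load_file=${2:-/proc/loadavg}
    local secs load
    read -r secs _ < "$up_file"      # first field: seconds since boot, e.g. 590640.50
    read -r load _ < "$load_file"    # first field: 1-minute load average
    secs=${secs%%.*}                 # drop the fractional seconds
    printf '%d %d %02d %s\n' $(( secs / 86400 )) $(( secs % 86400 / 3600 )) \
        $(( secs % 3600 / 60 )) "$load"
}
```

In get_d2ol.sh, the whole if/elif block could then be replaced by `read -r DAY HOUR MIN LOAD <<< "$(uptime_fields)"`.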

The next step is to settle on a web page layout you like and design the elements you want on it. In my case I just used a word processor to design a page and put a marker where the table entries should go. You then cut this file at the marker into 2 different files; the part above the marker becomes top.txt. It will all make sense soon.

Now create a directory to run the server scripts from, and place the following script in that directory. I used the server's cgi-bin directory and called the script status.sh. It expects top.txt and boxes.ssh (the list of client hostnames) in the same directory, and writes the finished page to ../html/index.html.


#!/bin/bash
# Build the dynamic bottom half of the page in ./tmp.html: a date banner,
# then the statistics table header.
printf "<CENTER><TABLE BORDER=\"7\" CELLPADDING=\"10\" CLASS=\"boldtable\"><TH><FONT SIZE=10>%s</TH></TABLE></CENTER>\n" "`date`" > ./tmp.html
printf "<CENTER><TABLE BORDER=\"7\" CELLPADDING=\"10\" CLASS=\"boldtable\">\n" >> ./tmp.html
printf "<TH COLSPAN=\"12\"><FONT SIZE=8>Statistics</TH>\n" >> ./tmp.html
printf "<TR><TH>Host</TH><TH>Up<br>Days</TH><TH>Up<br>Hours</TH><TH>Up<br>Min</TH><TH>Load<br>Average</TH>\n" >> ./tmp.html
printf "<TH>Node<br>ID</TH><TH>Tasks<br>Assigned</TH><TH>Tasks<br>Download</TH><TH>Results<br>Upload</TH><TH>Completed</TH><TH>Candidate</TH><TH>Target</TH></TR>\n" >> ./tmp.html

# the file boxes.ssh contains the names of the clients to access, one per line
for i in `cat ./boxes.ssh`
do
ping -c 1 $i > /dev/null 2>&1 # ping; if the box is up, get the data
if [ $? -eq 0 ]
then
        LINE=`ssh $i ./get_d2ol.sh` # the call to the client box
        # chop up the input into separate variables; $LINE is left unquoted on
        # purpose so echo collapses repeated spaces before cut splits on " "
        DAY=`echo $LINE | cut -f1 -d" "`
        HOUR=`echo $LINE | cut -f2 -d" "`
        MIN=`echo $LINE | cut -f3 -d" "`
        LOAD=`echo $LINE | cut -f4 -d" "`
        ID=`echo $LINE | cut -f5 -d" "`
        ASS=`echo $LINE | cut -f6 -d" "`
        DL=`echo $LINE | cut -f7 -d" "`
        RES=`echo $LINE | cut -f8 -d" "`
        COMP=`echo $LINE | cut -f9 -d" "`
        CAN=`echo $LINE | cut -f10 -d" "`
        TARG=`echo $LINE | cut -f2 -d";"`
        INT=`echo $LOAD | cut -f1 -d"."` # integer part of load avg. from uptime
        DEC=`echo $LOAD | cut -f2 -d"."` # decimal part of load avg. from uptime
        if [ "$CAN" = ";" ] # if the candidate is empty, the client is switching to a new WU
        then
                TARG="Switching"
                CAN="Switching"
        fi
        if [ "$INT" -eq 0 -a "$DEC" -lt 50 ] # if the load avg. is too low, print in red
        then
        printf "<TR BGCOLOR=\"#FF0000\"><TD><FONT COLOR=\"#FF0000\">%s</FONT></TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD>\n" $i $DAY $HOUR $MIN $LOAD $ID >> ./tmp.html
        else # print in normal colours
        printf "<TR><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD>\n" $i $DAY $HOUR $MIN $LOAD $ID >> ./tmp.html
        fi
        printf "<TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD></TR>\n" $ASS $DL $RES $COMP $CAN "$TARG" >> ./tmp.html
else # if the ping failed then print the down message in RED
        LINE="DOWN"
        DAY="DOWN"
        HOUR="DOWN"
        MIN="DOWN"
        LOAD="DOWN"
        ID="DOWN"
        ASS="DOWN"
        DL="DOWN"
        RES="DOWN"
        COMP="DOWN"
        CAN="DOWN"
        TARG="DOWN"
        printf "<TR BGCOLOR=\"#FF0000\"><TD><FONT COLOR=\"#FF0000\">%s</FONT></TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD>\n" $i $DAY $HOUR $MIN $LOAD $ID >> ./tmp.html
        printf "<TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD><TD>%s</TD></TR>\n" $ASS $DL $RES $COMP $CAN "$TARG" >> ./tmp.html
fi
done
printf "</TABLE></CENTER>\n" >> ./tmp.html
printf "</BODY> </HTML>\n" >> ./tmp.html
# done making the bottom half of the html page
cp ./top.txt ../html/index.html # copy the top half of the page
cat ./tmp.html >> ../html/index.html # cat the dynamic portion on the ass end.
exit 0
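The cut calls in status.sh work, but they rely on echo $LINE collapsing repeated spaces and on the ";" landing in field 10 when the candidate is empty. An alternative, sketched here for the same line protocol, splits the line once with bash's read builtin and parameter expansion. The function name parse_status_line and the globals FIELDS and TARGET are hypothetical.

```bash
#!/bin/bash
# parse_status_line LINE
# Splits one line from get_d2ol.sh into the global array FIELDS
# (day hour min load id assigned downloaded results completed candidate)
# and the global TARGET (the text after the first ';').
parse_status_line() {
    local line=$1
    local left=${line%%;*}          # everything before the first ';'
    TARGET=""
    if [[ $line == *";"* ]]; then
        TARGET=${line#*;}           # everything after the first ';'
        TARGET=${TARGET# }          # drop the single leading space
    fi
    # read -a splits on runs of whitespace, so an empty field simply vanishes
    read -r -a FIELDS <<< "$left"
    # fewer than 10 fields means no candidate: the client is switching work units
    if (( ${#FIELDS[@]} < 10 )); then
        FIELDS[9]="Switching"
        TARGET="Switching"
    fi
}
```

Inside the loop in status.sh, `parse_status_line "$LINE"` would then stand in for the cut calls, with `${FIELDS[0]}` as the day count, `${FIELDS[3]}` as the load, and so on.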

Here is my example top.txt, cut from my design page:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
        <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=utf-8">
        <TITLE>The Major Strikes Again</TITLE>
        <META NAME="GENERATOR" CONTENT="OpenOffice.org 1.1.2  (Linux)">
        <META NAME="CREATED" CONTENT="20050101;8035300">
        <META NAME="CHANGED" CONTENT="20050101;11075700">
        <STYLE>
        <!--
                @page { size: 8.5in 11in; margin-left: 1.25in; margin-right: 1.25in; margin-top: 1in; margin-bottom: 1in }
                P { margin-bottom: 0.08in }
        -->
        </STYLE>
</HEAD>
<BODY BGCOLOR="grey" LANG="en-US" DIR="LTR">
<P ALIGN=CENTER STYLE="margin-top: 0.17in; margin-bottom: 0.2in; page-break-after: avoid">
<A HREF="http://www.wunderground.com/US/AZ/Phoenix.html"><IMG SRC="http://banners.wunderground.com/banner/bigwx_both_cond/language/www/US/AZ/Phoenix.gif" NAME="Graphic2" ALT="Click for Phoenix Arizona Forecast" ALIGN=BOTTOM WIDTH=468 HEIGHT=60 BORDER=0></A><BR><FONT SIZE=7><FONT FACE="Albany, sans-serif"><FONT COLOR="#0047ff">
<MARQUEE SCROLLAMOUNT=5 SCROLLDELAY=10>Farming for TPR on D2OL</MARQUEE></FONT></FONT></FONT></P>
<P ALIGN=CENTER><SPAN ID="Frame1" DIR="LTR" STYLE=" width: 8.73in; height: 6.55in; border: none; padding: 0in; background: #ffffff">
        <P STYLE="margin-top: 0.08in"><IMG SRC="farm.jpg" NAME="Graphic1" ALIGN=CENTER WIDTH=100% BORDER=0><BR ><BR><BR>
        </P>
</SPAN>
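status.sh copies top.txt over index.html and then appends tmp.html, so a browser that loads the page mid-update can see only the top half. A common variation, not part of the original script, is to build the page under a temporary name and rename it into place; mv within one filesystem is a rename, which readers see as all-or-nothing. publish_page is a hypothetical helper.

```bash
#!/bin/bash
# publish_page TOP BODY DEST
# Writes TOP followed by BODY to a temp file next to DEST, then renames it
# over DEST so the web server never serves a half-written page.
publish_page() {
    local top=$1 body=$2 dest=$3
    local tmp
    tmp=$(mktemp "$dest.XXXXXX") || return 1
    if cat "$top" "$body" > "$tmp"; then
        chmod 644 "$tmp"            # mktemp creates the file owner-readable only
        mv -f "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

In status.sh, the final cp and cat lines would become `publish_page ./top.txt ./tmp.html ../html/index.html`.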

MAOJC · 25 June 2005 16:46

Now you need a crontab entry.
Run crontab -e as the user you want to run the script as, and paste in this entry:


0-58/2 * * * * cd /var/www/cgi-bin;./status.sh >> /tmp/status.log 2>&1

The script will run every 2 minutes, from minute 0 through minute 58, and log any extraneous output to /tmp/status.log.
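If a node hangs mid-ssh, one run of status.sh can outlast the 2-minute interval, and cron will then start a second copy that also writes ./tmp.html. One simple guard, not part of the original setup, is a lock directory: mkdir is atomic, so only one run can create it. run_locked and the lock path are hypothetical; util-linux's flock(1) is an alternative where it is installed.

```bash
#!/bin/bash
# run_locked LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR could be created (mkdir is atomic) and removes
# the lock afterwards. Returns 1 without running COMMAND if the lock is held.
run_locked() {
    local lock=$1
    shift
    mkdir "$lock" 2>/dev/null || { echo "locked: $lock" >&2; return 1; }
    "$@"
    local rc=$?
    rmdir "$lock"
    return $rc
}
```

From cron this would live in a small wrapper script that defines the function and calls `run_locked /tmp/status.lock ./status.sh`. A run killed part-way leaves the lock directory behind, so it then has to be removed by hand.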

MAOJC · 25 June 2005 17:07

Just a little more input.

Be aware of where your D2OL data file (d2ol.out.txt) is, and edit the client script accordingly. You may need to use the hostname in the directory name to get to the right dir.
My web server is installed in the default location, /var/www. Yours may be different.
SSH must run without a password. This is accomplished by transferring the key files between the server and the clients to the proper .ssh location on each platform (use the SSH link :eek: ). Test it first; that is probably the biggest hurdle to overcome. Running ssh <client_name> ./get_d2ol.sh should return a line similar to the example given.
This can easily be used with RSH if that is your preference: just change the ssh call to an rsh call, install the rsh RPM and set up .rhosts correctly. The connection is then not secured, but if your farm is on an isolated network that is not really an issue.
Alternatively, you could set up cron on each client to run the get script, and substitute an FTP job on the server to fetch the output(s). This is a rather major change to the collection method, but it could be more secure, as the data on the client could be placed in a chroot directory to limit remote access to the machine. You would need to get ftpd running on your clients.
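Switching between SSH and RSH comes down to one command name, so it can be made a variable. A minimal sketch; REMOTE and fetch_status are hypothetical names, and rsh only works once the rsh package and .rhosts are set up as described.

```bash
#!/bin/bash
# REMOTE picks the remote-shell command: ssh by default, or set REMOTE=rsh.
REMOTE=${REMOTE:-ssh}

# fetch_status HOST [SCRIPT]
# Runs the client script (default ./get_d2ol.sh) on HOST and prints its line.
# stdin comes from /dev/null so the remote call cannot swallow a caller's input.
fetch_status() {
    local host=$1 script=${2:-./get_d2ol.sh}
    "$REMOTE" "$host" "$script" < /dev/null
}
```

If these lines were added to status.sh, `LINE=$(fetch_status "$i")` would replace the direct ssh call, and running `REMOTE=rsh ./status.sh` would switch protocols.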

Remember, this is an example; it could easily be adapted for a BOINC setup. The key to doing that lies in the client collection script, the return-line protocol (which data sits in which position on the line), and the server's parsing of that data. The concepts of tabular output would remain the same.

Peige · 26 June 2005 07:45

Thanks for taking the time to do that, Major.

I'm going to make the effort to get D2OL running this week and have a go at this.

DoubleTop · 29 June 2005 12:32

In answer to your PM, MAOJC: yes, SSH to the nodes is there by default. I have quite happily got an ls listing, although it prompted me for a password. I did have to use the IP address rather than the hostname, but that's not an issue, as the farm's addresses are all fixed anyway.

The link you provided shows how to retain the keys in various ways; which way would you suggest? Security is not really an issue, as it's only a farm doing nothing special.

DT.

MAOJC · 29 June 2005 13:00

I suggest that you generate the public key and manually move it to the target, placing it in the target's key file. I will get the AMD64 back up and running later today or tomorrow and get the exact location for the target key. The office is still in disarray! I still have 3 boxes to move to the new location. Pictures over the weekend, just to prove it can be neat and clean!
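Moving the key by hand means appending the server's public key (~/.ssh/id_rsa.pub after a default ssh-keygen -t rsa) to ~/.ssh/authorized_keys for the target user on the client. OpenSSH's default StrictModes setting makes sshd ignore those files when they are group- or world-writable, so permissions matter too. A minimal sketch to run on the client; install_pubkey is a hypothetical name.

```bash
#!/bin/bash
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Appends the key in PUBKEY_FILE to SSH_DIR/authorized_keys (default ~/.ssh)
# unless it is already present, and tightens permissions so sshd accepts it.
install_pubkey() {
    local pub=$1 dir=${2:-$HOME/.ssh}
    local keys="$dir/authorized_keys"
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$keys" && chmod 600 "$keys" || return 1
    # -x matches whole lines only, -F treats the key text as a fixed string
    grep -qxF -f "$pub" "$keys" || cat "$pub" >> "$keys"
}
```

After copying id_rsa.pub over to the client, run `install_pubkey ./id_rsa.pub` there as the target user; running it twice does not duplicate the key.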

PS: Me like my wireless bridge. This could be great for remote-location farming.