LTSP for BOINC farm

Hello,

I’ve been following the various LTSP threads on the forum but cannot resolve an issue. I’m running Fedora Core 2 on an Athlon 2500+, ABit KA7 board with kernel 2.6.x. The onboard NIC is connected directly to my broadband router and picks up its IP by DHCP from there.

I have a second NIC (10/100) in the machine with a static IP (192.168.2.101). It connects to a switch to which I’m attaching diskless machines (ws00x), all ABit VA-10 boards with onboard gfx and LAN, each with 256 MB DDR but different CPUs.

I’ve been able to configure LTSP completely so that I can boot from the thin clients and get either an X session or command line - they both work. I have set up new users on the server box and can run the X sessions and browse through to the internet etc. Everything is exceptionally speedy!

So the problem is BOINC. I can’t get it to run locally. I can have different copies running on the server (‘top’ shows copies running under different users, but with CPU shares around 33% - when three copies run).

From the commandline on the diskless client I can cd to a /boinc/ws00x where I have placed a copy of the BOINC executable, but when I run it it says another copy is running. I guess I’m still on the server…

I have tried ‘ssh’ from the server to the client but get ‘ssh: connect to host ws001 port 22: Connection refused’ although the fedora firewall has been disabled.

I think that these problems have been resolved by you guys on the forum, but maybe my Linux skill is still too much in its infancy! I hope that you can help me out here as I’m anxious to get these diskless machines crunching!

I apologise over the length of the post, but have tried to include as much info as possible. Thanks in advance for any help that you can give me!

There was a thread somewhere that went over this.

MAOJC / Mojo / Doubletop are the folks in the know.

See if this thread helps? http://forums.teamphoenixrising.net/showthread.php?t=24635&page=2

You’re running on your server.
The commands to change directory and run boinc, will be something like:

cd /home/hostname
nohup boinc_3.20 -return_results_immediately &

On my setup (Mandrake 10 and LTSP 4.0) these commands go in the server directory /opt/ltsp/i386/etc/rc.d in a file called runboinc, owned by root, with execute permissions for root.

in /opt/ltsp/i386/etc/lts.conf, section [default] the following line exists

RCFILE01 = runboinc
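
A minimal sketch of what such a runboinc file might contain, assuming a POSIX sh on the client and that nohup is available in the LTSP tree. BOINC_BASE, BOINC_BIN, boinc.log and the function names are illustrative only, not part of LTSP or BOINC; the executable name and flag are the ones quoted above.

```shell
#!/bin/sh
# Sketch of a client-side start script (the "runboinc" file named above).
# BOINC_BASE and BOINC_BIN are illustrative variables, not LTSP settings.
BOINC_BASE="${BOINC_BASE:-/home}"
BOINC_BIN="${BOINC_BIN:-boinc_3.20}"

# Print the per-client work directory, e.g. /home/ws001.
# An explicit name can be passed for testing; otherwise `hostname` is used.
boinc_workdir() {
    echo "$BOINC_BASE/${1:-$(hostname)}"
}

# Change into the work directory and start BOINC in the background.
start_boinc() {
    dir=$(boinc_workdir "$@")
    cd "$dir" || { echo "runboinc: cannot cd to $dir" >&2; return 1; }
    # nohup only creates nohup.out when output is a terminal, so redirect
    # output explicitly to a file in the (writable) work directory.
    nohup "./$BOINC_BIN" -return_results_immediately >boinc.log 2>&1 &
}
```

In a real script the last line would be a call to start_boinc. It is left out here so the functions can be sourced and tried by hand first, for example `boinc_workdir ws001`.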

Just remember what is happening here.
The Linux kernel and routines in the /opt/ltsp/i386 tree are mounted by the client when it starts.
Once the client has started it runs everything it should from here, as controlled by lts.conf

When all the above is complete each client runs either startx or shell - ON THE SERVER. If you can see your client’s copy of BOINC running through ps on the server then it is running on the server.

What we are trying to achieve is make the local copy of Linux on the client, not the server, execute the code. So the instructions to do that have to exist in /opt/ltsp/i386 no?

Check:

/etc/exports - you need to have a dedicated work directory for each client, exported via NFS, mounted by the client.
home directory - each client needs its own directory as BOINC has a lockfile strategy to stop more than one copy running. The directory needs to contain the Linux BOINC executable. Directory owned by root, owner permissions rwx, everyone else read-only.
startup commands - specify SCREEN01 = shell in lts.conf and sit at your xterminal as it boots. It will drop to a shell so you can (a) see what’s running and (b) see what’s mounted on the client.

Good luck, please post back if you need any further help :slight_smile:

What he said. :nod:

This is soooo frustrating!!!

Thanks for the prompt response guys. I’ve been trying to make the changes as stated, but I can’t get it to work. The messages rolling on the workstation screen state that ‘nohup: failed to open ‘nohup.out’: Read-only file system.’

My /etc/exports file reads:

/opt/ltsp 192.168.2.0/255.255.255.0(rw,sync,no_root_squash)
/var/opt/ltsp/swapfiles 192.168.2.0/255.255.255.0(rw,async,no_root_squash)
/opt/ltsp/i386/boinc 192.168.2.0/255.255.255.0(rw,sync,no_root_squash)
/opt/ltsp/i386/boinc/ws001 ws001(rw,sync,no_root_squash)

I set up the /boinc/ws00x/ directories under /opt/ltsp/i386/ and placed the boinc executables in there. These directories are all owned by root: /boinc is 775 and the ws00x directories are 774.

I copied nohup and nice (from an earlier thread) to /opt/ltsp/i386/bin.

I set up in /opt/ltsp/i386/etc/rc.d/ the runboinc file as suggested, but it is pointing to /boinc/ws001 as I have nothing in /home/hostname, and amended the lts.conf file to run the RCFILE01.

I believe I understand what needs to happen, but can’t seem to make it work. Can anyone see where I’m going wrong?

The nohup.out message is hitting you because you are on a read-only file system; nohup.out is where stdout gets written.

Step 1. Get the platform to the point where you can log in to it, by telnet or ssh.
Step 2. Go to the directory you want to run from.
Step 3. Try to edit a file there. If you can write to the directory, you are OK.
Step 4. Put a copy of BOINC in there and have at it, with the registration process from the command line.
Step 5. Fix the rc.d stuff so you have a cd to the proper directory in the rc.d file (you may not have used the grave/back quotes around your hostname command in the rc.d file):

cd /boinc/`hostname`

PS: change the last line in your exports to be /opt/ltsp/i386/boinc; then all the diskless machines can use a subdirectory off that mount point.
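
The write test in step 3 can also be done without an editor. This is a rough sketch, assuming touch and rm exist on the client; can_write_dir and the probe file name are made up for illustration.

```shell
#!/bin/sh
# Return 0 if a file can be created in the given directory, 1 otherwise.
# The probe file name is arbitrary and is removed again on success.
can_write_dir() {
    probe="$1/.write_test.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}
```

On the client, something like `can_write_dir /boinc/ws001 && echo writable` would then show whether BOINC has any chance of creating its files there.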

Ok. Update time.

The console I get on the workstation is bash-2.05b#. The listing of ‘mount’ from here is:

rootfs on / type rootfs (rw)
/dev/root on /oldroot type ext2 (rw)
none on /oldroot/dev type devfs (rw)
192.168.2.101:/opt/ltsp/i386 on / type nfs (ro,v3,rsize=32768,wsize=32768,hard,udp,nolock,addr=192.168.2.101)
/devfs on /dev type devfs (rw)
/proc on /proc type proc (rw)
/dev/ram1 on /tmp type ext2 (rw)

TBH, I’m not sure of your first step in your last reply. From this console I did:

mkdir /mnt/boinc [There is an /mnt directory]
mount server:/opt/ltsp/i386/boinc/ws001 /mnt/boinc [server is the name in /etc/hosts of 192.168.2.101]
cd /mnt/boinc
nohup boinc3.20

and the file ran, but as root (confirmed by ps afterwards).

I also went back and amended the startboinc file so that it now reads:

cd /boinc/hostname rather than cd /boinc/ws001.

That did not help, unfortunately.

I think the problem is to do with the NFS exports as, from the mount command detail above, the directory’s been mounted read-only. I get the following message lines in /var/log/messages:

Aug 29 17:47:25 server rpc.mountd: authenticated mount request from ws001:657 for /opt/ltsp/i386 (/opt/ltsp/i386)
Aug 29 17:47:26 server rpc.mountd: refused mount request from ws001 for /home (/): not exported
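
To read the mount state without scanning the listing by eye, a small filter can pick out the first option (ro or rw) for a given mount point. This is a sketch only: mount_mode is a hypothetical helper, and it assumes awk is available where it runs (the mount output can also be copied to the server and checked there).

```shell
#!/bin/sh
# Read `mount`-style lines on stdin and print the first mount option
# (normally ro or rw) for the mount point given as $1.
# If several lines match, the last one wins, since later mounts sit on top.
# Exits non-zero when the mount point does not appear at all.
mount_mode() {
    awk -v mp="$1" '
        $2 == "on" && $3 == mp {
            opts = $NF              # e.g. "(ro,v3,...)"
            gsub(/[()]/, "", opts)  # strip the parentheses
            split(opts, o, ",")
            mode = o[1]
        }
        END { if (mode != "") print mode; else exit 1 }
    '
}
```

Run as `mount | mount_mode /` on the client. For the listing above it reports ro for / and rw for /tmp, which matches the nohup.out error.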

Any additional help you can give is greatly appreciated!

you need to access the platform locally to confirm you can get local apps to work, so from the server try

telnet ws001
or
ssh ws001

When you get a login, run hostname to confirm the login is to the diskless box.
At that point you will know that the client is OK, and the other steps should be easy to diagnose.

post a copy of your exports file

from the client post a copy of the output from mount

and the rc.d or the rc.local will always run the code as root, since root is running those commands

if you must run as a user then you need to su to the proper user in the rc.d startup file.
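
One way that could look, as a sketch only. It assumes su exists in the client's LTSP tree and that the account is known to the client; the user name boinc, the log file name and the run_boinc_as helper are hypothetical.

```shell
#!/bin/sh
# Start BOINC as an unprivileged user from a root-run startup script.
# SU is overridable so the command line can be inspected without running it.
SU="${SU:-su}"

# Arguments: user, work directory, optional executable name.
run_boinc_as() {
    user=$1
    dir=$2
    bin=${3:-boinc_3.20}
    # The whole command string is run by the target user's shell.
    "$SU" "$user" -c "cd $dir && nohup ./$bin >boinc.log 2>&1 &"
}
```

Setting `SU=echo` before calling `run_boinc_as boinc /boinc/ws001` prints the command instead of running it, which is a cheap way to check the quoting.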

Thanks for the reply.

I can’t ssh or telnet to ws001 or 192.168.2.102 (its IP); it keeps returning ‘connect to host ws001 port 22: connection refused’. Doesn’t this mean that the receiving ‘equipment’ on the workstation is not running? I know that sshd is running on the server.

Thus I can’t run hostname. However, when I work from the workstation, hostname generates ‘ws001’, which is correct. Again, when on the client directly, ‘mount’ generated what I detailed previously.

Here’s a copy of /etc/exports:

# LTSP-begin
# The lines between 'LTSP-begin' and 'LTSP-end' were added
# on: Sun Aug 29 16:09:21 2004, by the ltspcfg configuration tool.
# For more information, visit the LTSP homepage
# at http://www.LTSP.org

/opt/ltsp/i386/ 192.168.2.0/255.255.255.0(ro,no_root_squash,async)
/var/opt/ltsp/swapfiles 192.168.2.0/255.255.255.0(rw,no_root_squash,sync)

# LTSP-end

My whole intention here is to crunch locally on ws00x using ws00x cpu and ram, but writing to the server HD. From your final comments, I wonder whether that is possible.

Thank you for your replies.

It’s possible; it just needs getting the settings right.

I’ll ask a silly question here, as I forget where it’s located: have you set it to run apps locally in one of the files, so that it attempts to run on the workstation and not the server?

I suspected as much,

You need to export a rw file system from the server and mount it on the client. Go back and read Mojo’s post. I think he used the /home directory, but really any will do. You could export /boinc from your server and have subdirectories for the individual clients. Then in rc.local you will need to get that volume mounted into the client’s file system.
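
A sketch of the server side, under the assumption that the export options already used in this thread (rw,sync,no_root_squash) are what is wanted. export_line is a hypothetical helper, and /boinc is only the example path mentioned above.

```shell
#!/bin/sh
# Print an /etc/exports entry granting read-write access to one subnet.
# Arguments: directory, network, netmask.
export_line() {
    printf '%s %s/%s(rw,sync,no_root_squash)\n' "$1" "$2" "$3"
}

# Example (run as root on the server):
#   export_line /boinc 192.168.2.0 255.255.255.0 >> /etc/exports
#   exportfs -ra    # re-export everything listed in /etc/exports
```

The client still has to mount this export itself. Its root file system is the read-only /opt/ltsp/i386 mount, so a directory underneath that tree stays read-only on the client even if the server exports it rw.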

No problem we are all using the client CPU & RAM while using the server disk.

Thanks for the replies. I need to check out the other threads again - particularly Mojo’s. Glad to hear that I’m not trying to do something impossible! I think I need to work through all the docs and the threads again over the next few days. I shall be back to report progress and hopefully success!

exports

# LTSP-begin
# The lines between 'LTSP-begin' and 'LTSP-end' were added
# on: Thu Jul 29 14:59:35 2004, by the ltspcfg configuration tool.
# For more information, visit the LTSP homepage
# at http://www.LTSP.org

/opt/ltsp 192.168.0.0/255.255.255.0(ro,no_root_squash,sync)
/var/opt/ltsp/swapfiles 192.168.0.0/255.255.255.0(rw,no_root_squash,async)
/home 192.168.0.0/255.255.255.0(rw,no_root_squash,async)

# LTSP-end

All my xterms run in their own workspace, /home/hostname. Hence I have directories like xterm1, xterm2, xterm3… All are subdirectories of the /home tree, so once this is mounted RW I can see the subdirectories.

LTSP 4.0 only mounts “/home” if “LOCAL_APPS = Y”. Useless, now I need NIS. Try editing rc.sysinit

SUCCESS!!! :yippee: Got the diskless machines all running climateprediction locally!

:hail: to Mojo, DT and MAOJC :hail:

It was a combination of all your previous posts and the advice here that finally found its way into my thick head and I was able to work it all out.

I’ll try to put some sort of document together setting out what I did, but the final piece was as Mojo said - set up a separate directory for BOINC, export it rw and then mess around with rc.sysinit to get it to mount (running LTSP 4.1 under FC2).
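
The exact rc.sysinit change was not recorded, so the following is only a guess at its shape, not the change that was actually made. SERVER, EXPORT, MNT, MOUNT and mount_boinc are illustrative names; the address and paths are the ones mentioned earlier in the thread.

```shell
#!/bin/sh
# Hypothetical fragment for the client's rc.sysinit: mount the rw BOINC
# export from the server so each client can work in its own subdirectory.
SERVER="${SERVER:-192.168.2.101}"
EXPORT="${EXPORT:-/opt/ltsp/i386/boinc}"
MNT="${MNT:-/mnt/boinc}"
MOUNT="${MOUNT:-mount}"    # overridable so the function can be dry-run

mount_boinc() {
    # The mount point has to exist before mounting.
    mkdir -p "$MNT" || return 1
    "$MOUNT" "$SERVER:$EXPORT" "$MNT"
}
```

After a call to mount_boinc, the start script would cd to the client's own subdirectory under $MNT (for example the one named after hostname) before launching BOINC, so that each machine gets its own lockfile.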

I’m sure I had to play around with something else as well (oh yeah, it was the IP masquerading under iptables), but I’m just chuffed now to have three machines booting diskless under Fedora and crunching locally. They’ve been given separate host IDs under Climate, so I just need to cross my fingers that they will report OK tomorrow.

:clap: :clap: :clap:

YEAH BABY YEAH :woot: :clap:

Superb, absolutely superb :banana:

DT.

Way kewl, the LTSP vortex takes another one in to the land of madness! :nod:

You know what I really love, the knowledge is spread around, Now if divebod will tell 2 people and they tell 2 people … :scared:

You know what else I love, Sir Willy of Gates loses another round. :smackbum:

Sounds like TPR is becoming the LTSP portal :slight_smile:

Kudos to the LTSP gang - MMD or MDM or DMM. Shame neither Major nor Mojo changes their nick to something like WAOJC or Wojo, then it could be the WMD crew :smiley: :stuck_out_tongue: