Setting up a K12 LTSP Diskless Farm

DoubleTop · 8 May 2005 21:17

The TPR K12 Ltsp Crunching Farm Setup Guide

Introduction

What exactly is meant by a diskless setup? You have one reasonably powerful and quick server that simply provides the disk for each computer connected to it to write to. This server does not need to be rocket quick either; the key is RAM and hard disk speed. For distributed computing the diskless setup is incredibly efficient. With no hard disk, no case and minimal components you can create a large server farm on a small budget.

So what do I need?

One server with a good amount of RAM, a RAID array for speed or for redundancy, two network cards in the server, a good switch and a couple of hours to spare.

For writing this guide I set up a second LTSP server using an XP1800, 256MB of RAM and an ATA66 10GB Quantum drive, and it all runs fine. For the client machines you need a board, CPU, RAM and PSU. Most boards have onboard networking that supports PXE boot, but even if they don’t, the price of the node only goes up by one floppy drive.

Let’s get started!!!

First things first. My recommendation is to make sure you have the time to do this in one go. If you go away mid-install and come back, it is likely that you will spend an age finding where you got to. I say this from experience.
I would advise using the K12 LTSP ISO simply because it does so much of the install for you. It takes away a lot of the hardship that can be encountered if starting from, for example, a RH9 install and the LTSP download. K12 expects two network cards and does all the routing and firewall stuff needed for you. If this is your first effort with Linux, you will appreciate that.
www.k12ltsp.org

For the install you only need the first three CDs. That covers what I would call a pure server install for crunching, which is what this guide is about.

With your server and client machines assembled, it’s time to get into things. K12 makes it as easy as putting in the CD and making sure the BIOS is set to boot from CD. I tend to skip the media check, as it takes a long time to complete. Set your language and keyboard settings, and when asked for the type of install select the LTSP option. The next screen will then sort out your networking; in true wizard style just click next. Now you will be presented with partitioning options; once again my advice is to click next. At the point where you get asked if you wish to modify/view the install packages, do so. The big ones to remove are OpenOffice from office/productivity and, at the very bottom, the educational section. All 32 items should be installed in the LTSP section.

Sit back – browse the net and drink coffee – the install takes about twenty minutes.

First Stage Completed

DoubleTop · 8 May 2005 21:21

So you now have your familiar Fedora Core 2 desktop screen. Depending on your “client” or “node”, you may be able to boot it with a graphical screen straight away!!! But that’s not what we want, because that means you are using the main server to do the processing. One of the first things to do is identify which one of your network cards is eth0 and which one is eth1. I would seriously advise writing which is which on the backplate of the LAN cards: eth1 is your WAN connection to the internet and eth0 is the farm DHCP interface. You literally have a 50/50 chance of getting the right one. On first boot plug ONE in and find your IP address; if you have 192.168.0.254 then it’s eth0, anything else and you’ve plugged in eth1.
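The eth0/eth1 check above could be scripted roughly like this. It is only a sketch: the function name `nic_for_ip` is made up for illustration, and it assumes the default K12 addressing described in this guide, where 192.168.0.254 sits on the farm-side card.

```shell
#!/bin/sh
# Hypothetical helper: work out which card is plugged in from the IP address you see.
# Assumes the default from this guide: 192.168.0.254 belongs to eth0 (the farm side).
nic_for_ip() {
    case "$1" in
        192.168.0.254) echo "eth0" ;;     # farm DHCP interface
        "")            echo "unknown" ;;  # no address given
        *)             echo "eth1" ;;     # anything else: the WAN side
    esac
}

# Example: read the address from the output of /sbin/ifconfig, then
#   nic_for_ip 192.168.0.254
```

Once you know which is which, write it on the backplate as suggested above.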

Fire up your client machine. Personally I like to use the nforce2 chipset for my nodes as they support everything needed. VIA boards still do the job. This next part can get quite complex.

There are two types of network card: those with a “boot ROM” and those without. A boot ROM broadcasts over the network that the card exists and basically says “someone please send me something to boot”. This is BOOTP booting; the K12 server will recognise this and send a reduced kernel that can start the extended kernel operating. I am not that good at explaining this, so I advise you take a look at www.k12ltsp.org/clients.html as that explains it far better. Nforce2 boards PXE boot using the zpxe file specific to the network card. The zpxe files are found on www.etherboot.org. I would recommend downloading them into a new folder so that they end up as /tftpboot/lts/pxe/bootstrap/*.zpxe

Telling the server which card to look for

Next step: you have the zpxe file for your network card and everything is connected up, so how do you tell the server to give that file to that specific machine? Answer: you assign a static IP based on the MAC address of the network device. This is done through the dhcpd.conf file, found at /etc/dhcpd.conf
This file is very important to the operation of your diskless farm and must be got right. You will find that the K12 install has placed a nice set of examples within the dhcpd.conf file that are commented out using “#”. I’ve always found that I’ve needed to put my ISP’s DNS value in the line

option domain-name-servers         XXX.XXX.XXX.XXX;

It does not matter if your existing network uses the same 192.168.0 subnet, because you are specifying the network on eth0, the farm side, and that does not take a blind bit of notice of your existing network.

So, to create an entry for your first LTSP client, called ws001:
After the line “option log-servers 192.168.0.254;” enter the following


host ws001 {
	hardware ethernet 		XX:XX:XX:XX:XX:XX; # MAC address of the client
	fixed-address			192.168.0.1;
	if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
		# PXE ROM asking: hand it the zpxe boot loader
		filename "lts/pxe/bootstrap/<your downloaded zpxe file>";
	} else if substring (option vendor-class-identifier, 0, 9) = "Etherboot" {
		# Etherboot asking: hand it the LTSP kernel
		filename "lts/vmlinuz.ltsp";
		option vendor-encapsulated-options 3c:09:45:74:68:65:72:62:6f:6f:74:ff;
	}
}

Once done, restart dhcpd to check for syntax errors (normally a missing opening or closing brace):

 /etc/rc.d/init.d/dhcpd restart
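Since the usual mistake is an unmatched brace, a quick count before restarting can save a round trip. The following is a rough sketch, not part of K12; the function name `check_braces` is made up, and it only counts braces rather than parsing the file properly.

```shell
#!/bin/sh
# Hypothetical pre-flight check: compare the number of "{" and "}" in a config file.
# Comments (from "#" to end of line) are stripped first; a "#" inside a quoted
# string would be cut as well, which is acceptable for a rough check like this.
check_braces() {
    conf="$1"
    opens=$(sed 's/#.*//' "$conf" | tr -cd '{' | wc -c | tr -d ' ')
    closes=$(sed 's/#.*//' "$conf" | tr -cd '}' | wc -c | tr -d ' ')
    if [ "$opens" -eq "$closes" ]; then
        echo "braces balanced ($opens pairs)"
        return 0
    fi
    echo "brace mismatch: $opens open, $closes close"
    return 1
}

# Example (as root on the server):
#   check_braces /etc/dhcpd.conf && /etc/rc.d/init.d/dhcpd restart
```

Where the installed dhcpd supports it, running `dhcpd -t` is the more thorough test, as it parses the whole configuration file without starting the service.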

So let’s break down what the above is doing. The DHCP server has got a call from a machine saying it wants a DHCP lease. So the server checks: do I know this machine (the MAC address), and if so what shall I do? Well, first off, here you go, have a static IP address. Next, you’re asking as “PXEClient”, so here, have the PXE boot loader. That loader then asks again, this time identifying itself as “Etherboot”, and gets the kernel from tftpboot to start firing up the real thing.
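The two-step hand-off can be mimicked in a few lines of shell, purely as an illustration of the decision the host entry makes. This is not how dhcpd is implemented; the function name `boot_file_for` is made up, and the file names are passed in because the zpxe file depends on your card.

```shell
#!/bin/sh
# Illustration only: mimics the two-branch test in the host entry.
# $1 = vendor class identifier sent by the client
# $2 = boot loader to hand to a PXE ROM (the zpxe file)
# $3 = kernel to hand to Etherboot
boot_file_for() {
    case "$1" in
        PXEClient*) echo "$2" ;;   # first request: the card's PXE ROM is asking
        Etherboot*) echo "$3" ;;   # second request: the zpxe loader is asking
        *)          return 1 ;;    # neither branch matches, so no filename is set
    esac
}

# Example with a placeholder zpxe name:
#   boot_file_for "PXEClient:Arch:00000" lts/pxe/bootstrap/example.zpxe lts/vmlinuz.ltsp
#   boot_file_for "Etherboot-5.2"        lts/pxe/bootstrap/example.zpxe lts/vmlinuz.ltsp
```

Matching on the prefix is the shell equivalent of comparing the first nine characters, which is what the substring test in dhcpd.conf does.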

If you boot your client at this stage you will get the fixed IP and still boot into a graphical interface. Have a play by running both at this point: have your server running either top from the command line or system information, and see that whenever you request something to be done on your client, the CPU being used is the server’s. Not what we want for crunching.

Next file to edit – the “exports” file in /etc. There is a nice line that says “the following needs to be uncommented to use local apps”. Yup – you guessed it, uncomment the line.

We are finished configuring the server’s “real” files; the files in /etc are owned and used by the server. The folder /opt/ltsp/i386 holds the files that are used by the nodes. Not all of the files required for crunching are there, but we will cover that later.

Next file = rc.sysinit that is found in /opt/ltsp/i386/etc

Look for the section labelled “Mount Filesystems”. The little if routine in there is saying: if local apps is set, then mount the home folder. We know we want the home folder mounted, so simply copy the line “mount -t nfs -o nolock……” and paste it outside the if block; the line after “fi” will do. In K12 that’s the only edit required to rc.sysinit.

In the same directory we have the file “lts.conf”. This file drives the building blocks of the operating system that is being created for your diskless client, so this is the next one to edit. On a default install you will see a lot of examples that are all commented out, including a setup for using local apps for the machine [ws001]. Insert the code below into the file

[ws001]
	LOCAL_APPS	= Y
	LOCAL_WM		= Y
	NIS_DOMAIN 	= ltsp
	NIS_SERVER		= 192.168.0.254
	SCREEN_01		= shell
	RUNLEVEL		= 3
	RCFILE_01		= start_boinc

So we’ve told the server to send a kernel to the diskless node and now we are setting the parameters of that kernel. Runlevel 3 is console; we are running dedicated crunchers, so why risk losing a few clock cycles to graphics?

Now we need to have the “mounting” directories sorted. We are mounting home, so for each workstation we need a directory to run BOINC. Firstly, in the /opt/ltsp/i386/home directory create a dir for each node, named the same as the name in lts.conf.
So we have
/opt/ltsp/i386/home/ws001
/opt/ltsp/i386/home/ws002
/opt/ltsp/i386/home/ws003

We do not need anything else in here, as we are using the real /home for the data, this is the mount point. (that may be technically incorrect – I’m not too sure tbh)

So now in /home create the same (nearly)
/home/ws001/boinc
/home/ws002/boinc
/home/ws003/boinc and so on……

In the boinc folder place the most recent boinc file; you do not need to put the BOINC manager into each directory. In the boinc folder also create a “remote_hosts.cfg” file that has a single line in it with “192.168.0.254”. This allows the BOINC manager running on your server to monitor the BOINC client.
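With more than a couple of nodes, the directories and the remote_hosts.cfg file are quicker to create in a loop. A minimal sketch, assuming the paths and server address used in this guide; the function name `make_node_dirs` is made up, and the boinc file itself still has to be copied into each boinc folder by hand.

```shell
#!/bin/sh
# Sketch: create the per-node mount point and BOINC working directory.
# $1 = LTSP client root (this guide uses /opt/ltsp/i386)
# $2 = real home on the server (this guide uses /home)
# $3 = address allowed to monitor the client (192.168.0.254 in this guide)
# remaining arguments = node names, matching the names in lts.conf
make_node_dirs() {
    ltsp_root="$1"; home_root="$2"; server_ip="$3"
    shift 3
    for node in "$@"; do
        mkdir -p "$ltsp_root/home/$node" "$home_root/$node/boinc" || return 1
        # A single line holding the address of the machine running the BOINC manager
        echo "$server_ip" > "$home_root/$node/boinc/remote_hosts.cfg" || return 1
    done
}

# Example (as root on the server):
#   make_node_dirs /opt/ltsp/i386 /home 192.168.0.254 ws001 ws002 ws003
```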

Now to start boinc on machine bootup we take a quick look back at the lts.conf and the RCFILE_01 entry. Between this and the rc.sysinit we are saying on boot run the file “start_boinc” in the /opt/ltsp/i386/etc/rc.d folder. In that folder you will find a sample file, do a copy and then paste to create “sample(copy)” and then rename it to “start_boinc”. Edit this file to include


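# Runs on the node at boot; HOSTNAME is the node name (ws001, ws002, ...)
echo "Starting Boinc"
cd /home/${HOSTNAME}/boinc
# Clear leftovers from the previous run (-f: no error if they are missing)
rm -f nohup.out
rm -f lockfile
nohup ./boinc -run_cpu_benchmarks -allow_remote_gui_rpc &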

nohup directs all output from the boinc command to the file “nohup.out”, and the “&” on the end runs it in the background. That way you can use the logon screen without the BOINC messages appearing, and look in nohup.out to monitor its progress.

Now referring back slightly again, I previously mentioned that not all files from the “full” install were in the /opt/ltsp/i386 tree. You will need to copy three files into the correct locations.

/usr/bin/nohup copy to /opt/ltsp/i386/usr/bin
/usr/bin/top copy to /opt/ltsp/i386/usr/bin/top
/lib/libproc.so.3.2.0 copy to /opt/ltsp/i386/

Now we just need to restart the services before booting the client machine to see all the changes take effect. From the system terminal:

/etc/rc.d/init.d/portmap restart
/etc/rc.d/init.d/named restart
/etc/rc.d/init.d/xinetd restart
/etc/rc.d/init.d/dhcpd restart
/etc/rc.d/init.d/nfs restart
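The same five restarts can be wrapped in a loop that stops at the first failure, which makes it easier to spot the service that is unhappy. This is a sketch only; the function name `restart_farm_services` is made up, and the init script directory is a parameter so the default can be overridden.

```shell
#!/bin/sh
# Sketch: restart the services in the order listed above, stopping at the
# first one that fails. $1 = init script directory (default: /etc/rc.d/init.d).
restart_farm_services() {
    init_dir="${1:-/etc/rc.d/init.d}"
    for svc in portmap named xinetd dhcpd nfs; do
        "$init_dir/$svc" restart || { echo "restart of $svc failed" >&2; return 1; }
    done
}

# Example (as root on the server):
#   restart_farm_services
```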

Now when you boot your client, you should see the line “Starting Boinc” from the start_boinc file you created. Because of the command line argument asking boinc to run the benchmarks, if you type “top” you will see that the machine with no disk has 100% CPU usage. To kill the boinc process from the diskless node, press “k” in top, then enter the process number, more than likely 144 (check, it may be different).

Now press “q” to exit top, then cd /home/${HOSTNAME}/boinc and now you can attach projects using

./boinc -attach_project

Once you have set all your projects going, you can either run quiet with “./boinc &” or type the full path to your start “batch” file

/etc/rc.d/start_boinc

Or in my opinion the simplest way is to reboot the machine !!

For new nodes, just add the entry to lts.conf and dhcpd.conf and do the same.
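For anyone adding several nodes, the two entries can be printed from a small script and pasted in. This is a sketch under assumptions: both function names are made up, the stanza simply mirrors the [ws001] example from this guide, and the zpxe path is whatever file you downloaded for your card.

```shell
#!/bin/sh
# Sketch only: print the two config entries a new node needs.
# Both function names are hypothetical.

# lts.conf stanza, mirroring the [ws001] entry from this guide. $1 = node name.
lts_stanza() {
    cat <<EOF
[$1]
    LOCAL_APPS  = Y
    LOCAL_WM    = Y
    NIS_DOMAIN  = ltsp
    NIS_SERVER  = 192.168.0.254
    SCREEN_01   = shell
    RUNLEVEL    = 3
    RCFILE_01   = start_boinc
EOF
}

# dhcpd.conf host entry. $1 = node name, $2 = MAC address, $3 = fixed IP,
# $4 = path of the zpxe file relative to the tftp root.
# Note that the dhcpd keyword is "fixed-address", with a hyphen.
dhcp_host_entry() {
    cat <<EOF
host $1 {
    hardware ethernet $2;
    fixed-address     $3;
    if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
        filename "$4";
    } else if substring (option vendor-class-identifier, 0, 9) = "Etherboot" {
        filename "lts/vmlinuz.ltsp";
        option vendor-encapsulated-options 3c:09:45:74:68:65:72:62:6f:6f:74:ff;
    }
}
EOF
}

# Example with a placeholder zpxe name:
#   lts_stanza ws002
#   dhcp_host_entry ws002 XX:XX:XX:XX:XX:XX 192.168.0.2 lts/pxe/bootstrap/example.zpxe
```

The host entry belongs in the same part of dhcpd.conf as the ws001 entry, so it is safer to paste it in by hand than to append it to the end of the file.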

Comments and or glaringly obvious errors and all that - spent all day nailing the easiest way imo to get this running. If it helps get one farm running it was worth it

DT.

DoubleTop · 8 May 2005 21:22

Running Nforce2 clients from a default K12 ltsp install

The core installed kernel that is used by PXE to boot the main kernel does not support the nforce2 LAN driver found on the Etherboot site. I’ve found a neat trick: using a different kernel, from my first raw install of LTSP, gets around the problem.
To enable nforce2 board clients in the K12 LTSP install, download the file ltsK12_nforce2.tar.gz and extract it to /tftpboot/lts.
(To confirm, you should now have a file “vmlinuz-2.4.24-ltsp-4” in /tftpboot/lts and a new directory “2.4.24-ltsp-4” in /tftpboot/lts.)
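The extract-and-confirm step could look something like this from the command line. A sketch only: the function name `check_nforce2_files` is made up, and it checks for nothing beyond the file and directory named above.

```shell
#!/bin/sh
# Sketch: confirm the archive unpacked as described.
# $1 = the lts directory under the tftp root (default: /tftpboot/lts)
check_nforce2_files() {
    lts_dir="${1:-/tftpboot/lts}"
    [ -f "$lts_dir/vmlinuz-2.4.24-ltsp-4" ] || { echo "kernel file missing" >&2; return 1; }
    [ -d "$lts_dir/2.4.24-ltsp-4" ] || { echo "directory missing" >&2; return 1; }
    echo "nforce2 kernel files present in $lts_dir"
}

# Example (as root, from the directory holding the download):
#   tar -xzf ltsK12_nforce2.tar.gz -C /tftpboot/lts && check_nforce2_files
```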

Now, to use this kernel rather than the default, a change in the dhcpd.conf file is required for nforce2 clients.


##### nforce2 client
host ws001 {
	hardware ethernet 		XX:XX:XX:XX:XX:XX; # MAC address of the client
	fixed-address			192.168.0.1;
	if substring (option vendor-class-identifier, 0, 9) = "PXEClient" {
		filename "lts/2.4.24-ltsp-4/eb-5.2.4-forcedeth.zpxe";   # changed for nforce2
	} else if substring (option vendor-class-identifier, 0, 9) = "Etherboot" {
		filename "lts/vmlinuz-2.4.24-ltsp-4";                    # changed for nforce2
		option vendor-encapsulated-options 3c:09:45:74:68:65:72:62:6f:6f:74:ff;
	}
}

That should be all that is required to get your K12 ltsp server running using nforce2 boards.

DT.

Mojo · 9 May 2005 09:13

Nice one

Personally I prefer VIA Micro-ATX boards because they have everything on-board and are real cheap (<£30) to buy brand new. You can buy them just about anywhere and support for their network cards etc. is already built into K12LTSP.

Still can’t get NForce2 boards working with K12 here - have you done this with one of yours?

Why the copy of /lib/libproc.so.3.2.0? I never did this and my BOINC nodes run fine, albeit version 4.19.

Unconditional mount of /home - easiest way is just delete the conditional statement round them i.e. the if statement.

I use the NFS swap option here as well, 64Meg per node and it works well, handy if you have a node with low memory.

Feel free to delete when read to keep the thread tidy

DoubleTop · 9 May 2005 09:38

I must admit - I’ve not taken the test ltsp setup into the loft yet to run one of the nforce boards - job for this evening. I used my laptop as the test client.

The lib/libproc was for something - I think it may have been for top or nohup, I know when running something during the install I got a cannot find shared lib so copied it for good measure. I ran through the install a couple of times yesterday and it was shockingly easy compared to using rh9 and ltsp like I did with mine - big to K12

Just got D2OL running on the server, but have to go out now - I’ll be adding folding and d2ol to the node guide later.

DT.

Peige · 9 May 2005 10:00

That’s a great guide DT, it’s the info I need to get off and running and have a go at this

Got a workstation here that’s going to come free soon, that can be the main unit… Mojo’s board suggestion is the other piece of the puzzle that’s been filled in for me… Last time I started this I ended up with 5 in the loft which all had to go when the loft conversion was done, but now the shed has power :wiggle:

Lots to learn though… most of that looks completely out of my league atm :eek:

/edit, any chance of a model number on those boards Mojo ?

Oh, feel free to delete or move this post to keep things tidy

Mojo · 9 May 2005 10:58

Anything Via KM400 or KM266 will do as long as it has onboard graphics and LAN

Abit have the VA-10 and VA-20, but most manufacturers make them

Try these for ideas:

http://www.ebuyer.com/customer/products/index.html?action=c2hvd19wcm9kdWN0X292ZXJ2aWV3&product_uid=83809
http://www.ebuyer.com/customer/products/index.html?action=c2hvd19wcm9kdWN0X292ZXJ2aWV3&product_uid=62519
http://www.ebuyer.com/customer/products/index.html?action=c2hvd19wcm9kdWN0X292ZXJ2aWV3&product_uid=65354
http://www.ebuyer.com/customer/products/index.html?action=c2hvd19wcm9kdWN0X292ZXJ2aWV3&product_uid=63681
http://www.ebuyer.com/customer/products/index.html?rb=7361118871&action=c2hvd19wcm9kdWN0X292ZXJ2aWV3&product_uid=63174

MAOJC · 9 May 2005 15:05

And the Abit NF7 boards run successfully with no graphics cards, keyboards or mice if set to ignore errors in the BIOS. Neat trick: just board, CPU, memory and cooler, and no overhead from an on-board graphics controller.

DoubleTop · 9 May 2005 18:47

I’ve just booted the kids nforce2 shuttle using the test ltsp server - took some doing mind you :nod:

You’ll need some files in order to do it: a slight modification to dhcpd.conf and a new directory in tftpboot/lts, and I have a booted nforce2 board from K12

DT.

/edit - post edited with link to required files and quick guide

Mojo · 9 May 2005 21:30

used your download & all is now well - thanks Mucker!

bigsheff1 · 15 August 2006 13:04

Any chance you can update this, as the guide uses Core 2 and you’re using 5 according to some posts? It also uses an old version of BOINC.

DoubleTop · 15 August 2006 13:24

The version of BOINC shouldn’t matter, and there is no massive change in file locations between 2 and 5, believe it or not. Using FC5 and then LTSP involves more work than just using the K12 distro, but the new LTSP “installer” does a fair amount.

My current server is 64-bit FC4 with LTSP installed by doing ‘yum install ltsp’, then picking up bits from the guide about editing the lts.conf and rc.sysinit files.

New job starts officially Monday - it might be some time before I can do this, I’ll see what I can do for you though

DT.

bigsheff1 · 16 August 2006 17:49

ta

i start my job tomorrow

cheers

Droid · 27 April 2013 11:21

Spotted someone surfing this page and wondered if anyone is still running a diskless array like the Crunchy Hog. Would it be possible to bring this into the present and use something like a RaspberryPi to run it as the server or possibly even a few as clients?

Might be an interesting project.

DoubleTop · 27 April 2013 16:12

I’ve been tempted to try with a bunch of atom based machines, ultra low power stuff that could create a pretty large array. Server side would be a lot better now if on SSD, no IO bottleneck which I managed to get to with a decent number of nodes.

No idea what the state of K12LTSP is now, I may have to investigate

DT.

PMM · 3 May 2013 19:14

I think somebody has done it with the RaspberryPi. I remember them building it within a Lego structure, each board rotated 180deg to make a really tight, snug setup… I will see if I can find it.

PMM · 3 May 2013 19:22

delve into this http://www.southampton.ac.uk/~sjc/raspberrypi/

http://www.southampton.ac.uk/~sjc/raspberrypi/pi_supercomputer_southampton.htm

Droid · 3 May 2013 21:06

That’s brilliant!! Cheers for that Paul. Be great to see an array like that crunching. As the vid says you could run it with just a few of them as opposed to the mass array that they have and it would be like Mojo’s shuttle farm that he used to have in the pub. Not great power individually but linked together it could produce significant numbers I would guess. Possible team project? Get members who can, to each buy one Raspberry Pi and then create a platform like the Crunchy Hog to assemble them and link them together. I’d be happy to buy a unit, but I don’t have the Linux skills to set up the software side of things. Always willing to learn though

I’d love to know what output we’d be able to achieve from this type of setup. It would almost be like a Borg distributed computing project. A TPR collective cruncher