Post new topic Reply to topic  [ 22 posts ]  Go to page Previous  1, 2
 beowulf type cluster for larc 

Joined: Thu Nov 15, 2007 7:41 pm
Posts: 9
So anyway, I checked it, and it somewhat runs, but it has lots of warnings and probably needs a lot more tweaking. For one, it needs a network share, since all the nodes need access to the executables in order to run them.


Thu Nov 15, 2007 8:35 pm
Winner

Joined: Mon Oct 01, 2007 8:43 pm
Posts: 250
Location: Beavercreek/Cincinnati
distcc is now running on those boxes.
So far it allows hosts on 10.63.1.0/24 to send it compile jobs, which should include most boxes in the LaRC office, shouldn't it?
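One way to answer that question is Python's standard `ipaddress` module, which can test whether an address falls inside 10.63.1.0/24 (10.63.1.0 through 10.63.1.255). This is a minimal sketch; the addresses in the loop are just sample inputs, and `may_use_distcc` is a made-up helper name.

```python
import ipaddress

# The range distcc was set up to accept compile jobs from.
ALLOWED = ipaddress.ip_network("10.63.1.0/24")

def may_use_distcc(addr: str) -> bool:
    """Return True if addr lies inside the allowed /24."""
    return ipaddress.ip_address(addr) in ALLOWED

# Sample addresses: one inside the /24, one on a neighbouring subnet, one private.
for addr in ["10.63.1.73", "10.63.2.5", "192.168.1.2"]:
    print(addr, may_use_distcc(addr))
```

It prints True for 10.63.1.73 and False for the other two, so any office box whose address starts with 10.63.1. is covered and anything on another subnet is not.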
Anyhow, to use it:
1) Ensure distcc is installed locally.
2) export DISTCC_HOSTS='alpha bravo charlie delta echo foxtrot golf hotel'
3) make -j30 CC=distcc
Or use -j10 or -j20 or something. The site recommends setting the number to about twice the number of CPUs available, but it also says going beyond about 20 is usually pointless.
If it's something built with autotools and all that crap, try... CC=distcc ./configure
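The rule of thumb above (about twice the CPUs, but not much past 20) can be written down as a tiny helper that also builds the DISTCC_HOSTS string from step 2. This is a sketch under two assumptions: `cpus_per_host` is a made-up placeholder to be replaced with the real hardware count, and "about 20" is taken literally as a hard cap.

```python
# Host names from step 2 above.
HOSTS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]

def distcc_hosts(hosts):
    """Build the space-separated value for DISTCC_HOSTS."""
    return " ".join(hosts)

def suggested_jobs(total_cpus, cap=20):
    """Roughly twice the CPUs available, but never beyond the cap."""
    return min(2 * total_cpus, cap)

cpus_per_host = 1  # hypothetical: adjust to the real hardware
total = cpus_per_host * len(HOSTS)
print(f"export DISTCC_HOSTS='{distcc_hosts(HOSTS)}'")
print(f"make -j{suggested_jobs(total)} CC=distcc")
```

With one CPU per box this prints `make -j16 CC=distcc`; at two CPUs per box the cap takes over and it prints `make -j20 CC=distcc`.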

As a test, I built distcc, first with "make -j20": 0m16.133s
And then with just 'make': 1m1.925s
So that's about a 3.8X speedup.
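As a quick check on the arithmetic, the two `time` readings can be converted to seconds and divided. This is a minimal sketch; `to_seconds` is just a helper name made up here.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.YYYs reading to seconds."""
    return minutes * 60 + seconds

parallel = to_seconds(0, 16.133)  # make -j20
serial = to_seconds(1, 1.925)     # plain make
speedup = serial / parallel
print(f"{serial:.3f}s / {parallel:.3f}s = {speedup:.2f}x")
```

It prints `61.925s / 16.133s = 3.84x`, so `make -j20` was a bit under four times faster than plain `make` on this test.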

I also built kernel 2.6.23.1 with 'make -j30': 12m29.980s

and I'm waiting for the plain 'make' build to finish.


Thu Nov 15, 2007 9:34 pm
Winner

Joined: Mon Oct 01, 2007 8:43 pm
Posts: 250
Location: Beavercreek/Cincinnati
Ok, I screwed up the kernel builds because one was done with modules and one without, so my meaningless benchmark numbers are now 74% more meaningless. Meh.
I am going to try to set up Erlang soon, just for messing around if nothing else. I set up a user account, erlang/erlang, on each box, and I attempted to set up ssh-agent with RSA keys. I don't know if I am doing things right, but here's the general procedure:
1) Log in as user erlang.
2) ssh-agent bash (insert shell of choice)
3) ssh-add
4) From within this shell, you may SSH password-free to alpha, bravo, charlie, delta, echo, foxtrot, golf, & hotel, as user erlang.
5) Exit out of this shell once done.
This is not so much for convenience as for allowing some programs to communicate over SSH in the first place.
pssh is one such program. Once you're at step 4 above, you can run something like:
pssh -h ~/hosts.txt "echo foobarbazquuz >~/moo.txt"
...and that command will run on all the hosts in hosts.txt (which is the 8 in the cluster right now).
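For comparison, the core of what pssh does here (run one command over SSH on every host listed in a file, in parallel) can be sketched with Python's standard library. This is only an illustration, not how pssh itself is implemented. It assumes the ssh-agent shell from step 4 is active so `ssh` never prompts; `read_hosts`, `ssh_argv` and `run_on_hosts` are hypothetical names, and the hosts file is assumed to hold one host name per line.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def read_hosts(path):
    """One host per line; blank lines and # comments are ignored."""
    hosts = []
    for line in Path(path).expanduser().read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts

def ssh_argv(host, command, user="erlang"):
    """ssh argv for one host; BatchMode=yes makes ssh fail instead of prompting."""
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", command]

def run_on_hosts(hosts, command, runner=subprocess.run, max_workers=8):
    """Run command on every host concurrently and return {host: exit status}."""
    def one(host):
        result = runner(ssh_argv(host, command), capture_output=True, text=True)
        return host, result.returncode
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, hosts))
```

Usage would be something like `run_on_hosts(read_hosts("~/hosts.txt"), "echo foobarbazquuz >~/moo.txt")`. The command goes to ssh as one argument, so the redirect and the `~` are interpreted by the shell on each remote host. The `runner` parameter exists only so the SSH call can be swapped out for testing.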

I could perhaps disable outside access to user erlang (you can log in under another name anyhow and then su), enable sudo for apt-get/aptitude, and manage software on all 8 hosts at once. Is there any way to avoid the massive bandwidth waste of downloading every package 8 times?


Mon Nov 19, 2007 9:21 am
Winner

Joined: Mon Oct 01, 2007 8:43 pm
Posts: 250
Location: Beavercreek/Cincinnati
Re: beowulf type cluster for larc
Well, I tried for several hours on Wednesday with the help of a few other people to do automated installs onto the other 7 nodes of the cluster using FAI. One node is acting as the head node; one of its NICs goes to UC's network, and the other goes to the crappy old 10 megabit switch, which the other 7 nodes plug into. Yes, I know 10 megabit is very slow.

Getting the nodes to netboot was, surprisingly, the easy part. The hard (so far impossible) part was getting them to do something useful once netbooting was done. The problem was that FAI had them netboot over eth1 into a configuration that then tried to get an address on eth0 and refused to go any further because there was no cable on eth0, and none of our attempts to make it use the other card succeeded. I tried moving the cable from eth1 to eth0 once netbooting had finished; it then got one step further and halted yet again.
I suppose we could just remove the non-netbooting NIC from each node, but if we have to handle each node individually anyway, a number of other methods for cloning an install become easy.

It looked like I could also do a Debian netboot and use a feature called "preseed", in which I pass some options to the kernel at startup and the install then runs totally non-interactively using all the settings from another install (basically like Kickstart on Red Hat). I couldn't get very far with this, as apparently preseed isn't a standard feature, so I'd need some branched/forked version.

But... I was reading today about OpenSSI. The "SSI" stands for Single-System Image, which means the software sees the whole cluster as if it were a single computer. That differs from MPI and PVM, where the nodes pass messages to each other explicitly.
One node acts as the master node, and it netboots the other nodes (which don't even need hard drives). From there, as I understand it, it will automatically migrate processes (but not threads) between nodes as needed, and every node sees a very similar filesystem. So most software runs unmodified, but performance benefits will only be seen if there are enough processes to distribute. I wonder if this could speed up long builds...
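To make the point about needing enough processes concrete, here is a toy placement model: each new process goes to whichever node currently runs the fewest. This is an illustrative simplification under that stated assumption, not OpenSSI's actual load leveler (which, as described above, can also migrate processes while they run). The node names are hypothetical placeholders for the 8 boxes.

```python
import heapq

def place_processes(nodes, n_procs):
    """Greedy least-loaded placement; returns {node: number of processes}."""
    # Heap entries are (current load, original index as tie-break, node name).
    heap = [(0, i, node) for i, node in enumerate(nodes)]
    heapq.heapify(heap)
    for _ in range(n_procs):
        load, i, node = heapq.heappop(heap)       # least-loaded node
        heapq.heappush(heap, (load + 1, i, node))  # it now runs one more
    return {node: load for load, _, node in sorted(heap, key=lambda t: t[1])}

nodes = [f"node{k}" for k in range(1, 9)]  # hypothetical names for the 8 boxes
print(place_processes(nodes, 1))   # one process: seven nodes sit idle
print(place_processes(nodes, 20))  # twenty processes: each node gets 2 or 3
```

A single long-running process still lands on one node, which is why only workloads that spawn many processes, such as a parallel `make`, would be expected to benefit.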
Problem is, OpenSSI seems to run best on older distros and probably kernel 2.4. I think it involves a modified kernel, hopefully one that the Myrinet drivers will run on.
So perhaps I will try this. Maybe I can convince Adam that one of the workstations we'll soon acquire/get working can be a head node instead of one of the P2s.


Sat Mar 14, 2009 1:30 am
Winner

Joined: Mon Oct 01, 2007 11:33 am
Posts: 114
Location: Cincinnati, Ohio
Re: beowulf type cluster for larc
OpenSSI seems pretty cool, and we will probably be able to spare one of the new machines as the head node, considering there is no planned use for them.

_________________
Edward C. Kimball
LaRC Treasurer
Argh... May the source be with you... and you.... and you


Sat Mar 14, 2009 9:44 am
Winner

Joined: Mon Oct 01, 2007 8:43 pm
Posts: 250
Location: Beavercreek/Cincinnati
Re: beowulf type cluster for larc
pirates0argh wrote:
OpenSSI seems pretty cool, and we will probably be able to spare one of the new machines as the head node, considering there is no planned use for them.

I will try to get it running next time I'm in the office, at least to some degree. Whatever the head node is, it will probably need some decently fast disk if 8 nodes are going to be hitting it at once.
What I am curious about is whether this can be used in conjunction with something like MPI or PVM. The normal communication channels between nodes should be preserved, with the head node adding automatic load-leveling on top. Every node basically shares one filesystem, so all the necessary libraries and binaries should be available on each.


Sat Mar 14, 2009 11:47 am
Winner

Joined: Mon Oct 01, 2007 8:43 pm
Posts: 250
Location: Beavercreek/Cincinnati
Re: beowulf type cluster for larc
OpenSSI is now running. I use the term "running" loosely: the current arrangement is pretty slow, since neither the head node nor the interconnect (a 10 megabit switch) is very fast, and the entire thing has crashed at least twice. I should install something old on it, like Debian Sarge or FC3; then OpenSSI would be officially supported and the Myrinet boards would have a much better chance of working.
Right now, 10.63.1.73 is the head node, and it netboots the other 7 nodes (192.168.1.2 to 192.168.1.8). The top 4 nodes require a keyboard to be attached in order to boot, and I can't find a BIOS option anywhere to change this. I tried to do a BIOS update yesterday on one of them, but I guess I had the wrong board number or something, because it refused to update.
If you start something in 'bash-ll', any new processes are load-balanced across the nodes. Right now, this is able to efficiently distribute the task of completely failing to compile. As an interesting side effect of all nodes having a shared filesystem, I made one SSH key, and now you can SSH from any node into any other node without a password.
You can also prefix a command with 'dsh -a' to run it on every node.


Thu Mar 19, 2009 9:27 am