FW: Distribution / Parallelization of ABM's

FW: Distribution / Parallelization of ABM's

Marcus G. Daniels-3
Quoting Douglas Roberts <doug at parrot-farm.net>:

> If you go to any of the supercomputing centers such as NCSA, SDSC, or PSC,
> you do not see parallel Java apps running on any of their machines (with the
> occasional exception of a parallel newbie trying, with great difficulty, to
> make something work).  The reasons:
>
>    1. there are few supported message passing toolkits that support
>    parallel Java apps,
>    2. Java runs 3-4 times slower than C, C++, or Fortran, and machine time
>    is expensive, and finally
>    3. there are well-designed and maintained languages, toolkits and APIs
>    for implementing HPC applications, and parallel developers use them
>    instead of Java.

I expect in the next few years some supercomputing niches will start to use
hypervisors like Xen.  By paying the roughly 20% overhead, this will allow
queueing systems like LSF to bring jobs on and off line with uniform
checkpointing.  It will remove the need for ad-hoc checkpointing code in
applications by allowing any executable to be stored (much as a laptop does
when it goes to sleep) and/or migrated from one system to another.

I would be very happy if I could submit a job and have it run indefinitely to
completion... instead of having it kicked out every 6-12 hours for a manual
restart, or a procedure where I have to write scripts to figure out where
things stand and adaptively resubmit.  20% is nothing compared to that
inefficiency!

While there are well-designed and maintained languages and APIs for HPC (MPI
and OpenMP), they are only the most basic infrastructure.  MPI is a pain to
use, and OpenMP requires a big SMP system.  Maybe when there are
HyperTransport cables and implementations of languages like Fortress or
Chapel, life will be better.  (InfiniBand, for example, hasn't resulted in
useful and widely used distributed shared memory systems.)
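
To see why MPI is a pain: even shipping a simple struct between two ranks
with the raw MPI C API means describing its memory layout by hand.  A minimal
sketch (the Agent type here is made up purely for illustration):

// Sending a struct with raw MPI: the layout must be declared field by
// field -- exactly the boilerplate a better language would generate.
#include <mpi.h>
#include <cstddef>   // offsetof

struct Agent { double x, y; int id; };

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Describe Agent to MPI: two contiguous doubles, then one int.
  int          blocklens[2] = {2, 1};
  MPI_Aint     offsets[2]   = {offsetof(Agent, x), offsetof(Agent, id)};
  MPI_Datatype types[2]     = {MPI_DOUBLE, MPI_INT};
  MPI_Datatype agent_type;
  MPI_Type_create_struct(2, blocklens, offsets, types, &agent_type);
  MPI_Type_commit(&agent_type);

  Agent a = {1.0, 2.0, 42};
  if (rank == 0)        // run with at least two ranks, e.g. mpirun -np 2
    MPI_Send(&a, 1, agent_type, 1, 0, MPI_COMM_WORLD);
  else if (rank == 1)
    MPI_Recv(&a, 1, agent_type, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  MPI_Type_free(&agent_type);
  MPI_Finalize();
  return 0;
}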




FW: Distribution / Parallelization of ABM's

Russell Standish
On Fri, Oct 06, 2006 at 03:35:30PM -0600, mgd at santafe.edu wrote:

> Quoting Douglas Roberts <doug at parrot-farm.net>:
>
> > [Doug's points about parallel Java at supercomputing centers snipped;
> > quoted in full above.]
>
> I expect in the next few years some supercomputing niches will start to use
> hypervisors like Xen.  By paying the roughly 20% overhead, this will allow
> queueing systems like LSF to bring jobs on and off line with uniform
> checkpointing.  It will remove the need for ad-hoc checkpointing code in
> applications by allowing any executable to be stored (much as a laptop does
> when it goes to sleep) and/or migrated from one system to another.
>
> I would be very happy if I could submit a job and have it run indefinitely
> to completion... instead of having it kicked out every 6-12 hours for a
> manual restart, or a procedure where I have to write scripts to figure out
> where things stand and adaptively resubmit.  20% is nothing compared to
> that inefficiency!

Interesting comment, but checkpointing of single-image tasks was never a
real showstopper.  It was a practical option on many systems, e.g. Irix,
and could have been solved for Linux at any time.

Also, EcoLab (for agent-based modelling) provides trivial checkpointing
functionality for serial codes - but it does get more interesting when
using it in parallel.
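
For serial codes the idea really is trivial - something like the following
generic sketch (not EcoLab's actual interface; the Model type is made up):
dump the whole state every N steps, and reload it on restart:

// Generic serial checkpointing sketch.  Raw byte dumps only work for
// trivially copyable state; a real model would serialize properly.
#include <fstream>
#include <cstddef>

struct Model {
  std::size_t step = 0;
  double state[1000] = {};        // stand-in for the real model state

  void save(const char* path) const {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(this), sizeof(*this));
  }
  bool load(const char* path) {   // returns false if no checkpoint exists
    std::ifstream in(path, std::ios::binary);
    return bool(in.read(reinterpret_cast<char*>(this), sizeof(*this)));
  }
  void update() { ++step; /* one simulation step */ }
};

int main() {
  Model m;
  m.load("checkpoint.dat");       // resume from the last save, if any
  while (m.step < 1000000) {
    m.update();
    if (m.step % 10000 == 0) m.save("checkpoint.dat");
  }
  return 0;
}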

However, Xen will not solve the showstoppers that occur for codes that use
sockets - i.e. all distributed-memory message-passing jobs, and jobs using
floating-licensed commercial software.

I would prefer that the use of Xen be simply a user-specifiable option for
doing checkpointing.

>
> While there are well-designed and maintained languages and APIs for HPC
> (MPI and OpenMP), they are only the most basic infrastructure.  MPI is a
> pain to use, and OpenMP requires a big SMP system.  Maybe when there are
> HyperTransport cables and implementations of languages like Fortress or
> Chapel, life will be better.  (InfiniBand, for example, hasn't resulted in
> useful and widely used distributed shared memory systems.)
>

ClassdescMP takes away most of the pain of MPI.  There are other options
too, for example the recently added Boost.MPI package.  I'm planning on
taking a look at Boost.MPI to see how it compares with ClassdescMP...
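
For a taste of the Boost.MPI style, here is a minimal point-to-point sketch
(assuming the usual mpic++ build with -lboost_mpi -lboost_serialization);
compare it with the hand-built datatypes raw MPI demands:

// Minimal Boost.MPI sketch: sending a std::string between two ranks.
#include <boost/mpi.hpp>
#include <iostream>
#include <string>

namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
  mpi::environment env(argc, argv);   // wraps MPI_Init / MPI_Finalize
  mpi::communicator world;            // defaults to MPI_COMM_WORLD

  if (world.rank() == 0) {
    // Any serializable C++ type can be sent directly; no manual
    // MPI datatype construction is needed.
    world.send(1, /*tag=*/0, std::string("hello from rank 0"));
  } else if (world.rank() == 1) {
    std::string msg;
    world.recv(0, /*tag=*/0, msg);
    std::cout << msg << std::endl;
  }
  return 0;
}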

(Note I'm being a one-eyed C++ person here, though...)


--

----------------------------------------------------------------------------
A/Prof Russell Standish                  Phone 0425 253119 (mobile)
Mathematics                        
UNSW SYDNEY 2052                 R.Standish at unsw.edu.au            
Australia                                http://parallel.hpc.unsw.edu.au/rks
            International prefix  +612, Interstate prefix 02
----------------------------------------------------------------------------




FW: Distribution / Parallelization of ABM's

Douglas Roberts-2
In reply to this post by Marcus G. Daniels-3
I forgot to mention: you have to tinker the snot out of a NUMA application
to get optimal performance.  NUMA means that you have to pay close attention
to which parts of your calculation are using which memory, location-wise.
Non-uniform means different latency/bandwidth for different memory locations
relative to any CPU in the system.  IMO it actually takes longer to develop
an effective NUMA app than it does to field a distributed-memory app.
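
For one concrete flavor of that tinkering, a minimal sketch assuming
Linux-style "first touch" page placement under OpenMP: initialize the data
with the same thread schedule that later computes on it, so each thread's
pages land on its local node.

// First-touch placement: on Linux, a page lands on the NUMA node of
// the thread that first writes it.  Build with OpenMP, e.g. -fopenmp.
#include <cstddef>

int main() {
  const long n = 1L << 26;
  // new double[n] leaves the arrays uninitialized: no pages are touched
  // yet, so their placement is still undecided.
  double* a = new double[n];
  double* b = new double[n];

  // Parallel initialization faults pages in near each thread.
  #pragma omp parallel for schedule(static)
  for (long i = 0; i < n; ++i) { a[i] = 1.0; b[i] = 2.0; }

  double sum = 0.0;
  // Identical static schedule: each thread computes on the pages it
  // placed on its own node above.
  #pragma omp parallel for schedule(static) reduction(+:sum)
  for (long i = 0; i < n; ++i) sum += a[i] * b[i];

  delete[] a;
  delete[] b;
  return sum > 0.0 ? 0 : 1;
}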

--Doug

On 10/7/06, Douglas Roberts <doug at parrot-farm.net> wrote:

>
> On 10/6/06, mgd at santafe.edu <mgd at santafe.edu> wrote:
> >
> > Quoting Douglas Roberts <doug at parrot-farm.net>:
> >
> > > I disagree about the InfiniBand bit.  Myrinet, and now the newer
> > > InfiniBand technology, are commonly used on distributed memory
> > > systems.
> >
> > Systems or applications?  What systems?  I know Intel sells a version
> > of Treadmarks that has a DSM server, but what hardware vendor uses
> > InfiniBand to make a unified NUMA memory, e.g. like an Altix?
>
>
> Lots of systems use InfiniBand interconnect technology, but for
> distributed memory machines, not NUMA.
>
> http://www.osc.edu/press/releases/2004/voltaire.shtml
> http://www.beowulf.org/archive/2001-October/005268.html
> http://www.linuxdevices.com/news/NS7459807643.html
> http://www.hpcwire.com/hpc/506904.html
>
> etc.  There's nothing magic about InfiniBand; it's just a faster,
> lower-latency Myrinet.  See below for a note regarding NUMA machines.
>
>
> > The +2GB/sec bandwidth of these interconnect fabrics is important for
> > message passing applications.
> >
> > For a NUMA system, where application parallelism isn't limited to
> > message passing, latency is more important than bandwidth.  Message
> > passing as a way to write programs is what I find constraining!
> >
>
> Distributed applications will probably always scale better than shared
> memory applications: there are not very many shared memory or NUMA
> machines out there, and the distributed-memory machines are much bigger
> than any of the shared-memory or NUMA machines currently in production.
> The Altix 3000 is one of the few NUMA machines currently still running at
> a few places, and SGI is no longer in business, at least with respect to
> NUMA machines.  NUMA machines, while fun to play on, are really better
> suited to the hobbyist, since you don't find them (with but a few
> exceptions) in the production world.
>
> Learning how to design effective message-passing distributed applications
> is not easy, but it is worth it when you have an application that needs
> to scale.
>
> --Doug



--
Doug Roberts, RTI International
droberts at rti.org
doug at parrot-farm.net
505-455-7333 - Office
505-670-8195 - Cell


FW: Distribution / Parallelization of ABM's

Marcus G. Daniels-3
Douglas Roberts wrote:
> I forgot to mention: you have to tinker the snot out of a NUMA
> application to get optimal performance.  NUMA means that you have to pay
> close attention to which parts of your calculation are using which
> memory, location-wise.  Non-uniform means different latency/bandwidth
> for different memory locations relative to any CPU in the system.  IMO
> it actually takes longer to develop an effective NUMA app than it does
> to field a distributed-memory app.
To make almost any interesting operation execute fast on a modern CPU
means paying attention to what memory is being called upon and in what
order.  That's unavoidable whether or not you admit defeat by using
message passing for the sake of scaling.  (I'm avoiding the term
"distributed memory app" to avoid confusion with "distributed shared
memory".)




FW: Distribution / Parallelization of ABM's

Marcus G. Daniels-3
In reply to this post by Douglas Roberts-2
Douglas Roberts wrote:
>
>     etc.  There's nothing magic about InfiniBand; it's just a faster,
>     lower-latency Myrinet.  See below for a note regarding NUMA machines.
>
Lower latency yet would be AC HyperTransport cabling (up to a meter at full
speed, apparently, and then it's just a question of backing off for longer
runs).  The lowest-latency InfiniBand systems already use HTX:

http://www.pathscale.com/infinipath.php
http://www.hypertransport.org/tech/tech_htthree.cfm?m=3
http://www.hpcwire.com/hpc/646006.html

Interconnects within a rack would be about a meter.  Voilà: what was once a
cluster becomes a NUMA SMP by cross-box HyperTransport!