Re: Information request/Amazon EC2

Re: Information request/Amazon EC2

Jack K. Horner
At 09:00 AM 8/19/2009, Doug Roberts wrote:

>Hi, all.
>
>I am interested in learning what kind of experiences users of
>Amazon's EC2 resources have had.  What resources have you used; what
>has been your experience with availability, ease of use, cost, data
>transfer, privacy, etc.?
>
>TIA,
>
>--Doug



Doug,

I don't have direct experience with EC2.  However, I attended a
computational biology conference about two years ago at which Amazon
gave a talk on the system.  Here's what I distilled:

         1.  If the computation-to-communication ratio of your
application is >> 1 (e.g., the SETI power-spectrum analysis problem),
EC2's network performance is benign.  If, in order to realize a
time-to-solution within your lifetime, your application requires a
computation/communication ratio approaching 1 (e.g., an extreme-scale
adaptive Eulerian-mesh radiation-hydrodynamics code), the EC2 network
is your enemy.  (See the sketch after this list.)

         2.  For comparable problem setups, EC2 was less expensive
than buying time on IBM's pay-per-use Blue Gene system.

         3.  For comparable problem setups and theoretical peaks,
over the lifecycle the EC2 is less expensive per CPU-hour than a
cluster of PCs linked by fast Ethernet.

         4.  There was general agreement among the half-dozen or so
users of pay-per-use commercial clusters who were present at the talk
that EC2 gave the best bang for the buck.
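To make point 1 concrete, here is a minimal back-of-the-envelope
sketch in C.  Every number in it (per-step FLOPs, bytes exchanged,
sustained node rate, ~1 Gb/s Ethernet) is an illustrative assumption,
not an EC2 measurement; substitute your application's figures.

#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions only -- plug in your own numbers. */
    double flops_per_step = 5e9;    /* useful FLOPs per node per timestep    */
    double bytes_per_step = 2e6;    /* bytes sent/received per node per step */
    double node_rate      = 1e10;   /* sustained node compute rate, FLOP/s   */
    double net_bw         = 1.25e8; /* ~1 Gb/s Ethernet, in bytes/s          */

    double t_comp = flops_per_step / node_rate; /* seconds computing/step    */
    double t_comm = bytes_per_step / net_bw;    /* seconds communicating/step*/
    double ratio  = t_comp / t_comm;

    printf("comp/comm = %.1f -> %s\n", ratio,
           ratio > 10.0 ? "network roughly benign" : "network is your enemy");
    return 0;
}

With these particular numbers the ratio comes out around 31, comfortably
network-benign; halve the work per step or shrink the messages toward
latency-bound sizes and the picture changes quickly.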


Jack K. Horner
P. O. Box 266
Los Alamos, NM  87544-0266
Voice:   505-455-0381
Fax:     505-455-0382
email:   [hidden email]




Re: Information request/Amazon EC2

Douglas Roberts-2
Thanks, Jack.  I suspect that for distributed message-passing ABM simulations Amazon EC2 is not a good solution.

--Doug




Re: Information request/Amazon EC2

Douglas Roberts-2
In reply to this post by Jack K. Horner
Interesting article about cloud computing on Slashdot today:

http://tech.slashdot.org/story/09/08/20/0327205/Amazon-MS-Google-Clouds-Flop-In-Stress-Tests?art_pos=7

--Doug



Re: Information request/Amazon EC2

Marcus G. Daniels
Douglas Roberts wrote:
> Interesting article about cloud computing on Slashdot today:
>
> http://tech.slashdot.org/story/09/08/20/0327205/Amazon-MS-Google-Clouds-Flop-In-Stress-Tests?art_pos=7
>
One nice thing Amazon does, in contrast to most supercomputing
centers, is let you boot whatever kernel image you want.  That can be
important for diagnosing and fixing some kinds of problems.

I kind of doubt the commercial players pony up for the kind of
interconnects used on supercomputers though.  Maybe the NMCAC could
distinguish itself from the commercial players by providing users total
control over hardware, while also providing an interconnect that can
absorb stress?

Marcus



Re: Information request/Amazon EC2

Douglas Roberts-2
Penguin Computing is trying to distinguish themselves in this way with their POD (Penguin On Demand, cute) service.

http://www.penguincomputing.com/POD/HPC_as_a_service

They seem expensive compared to Amazon, though.

--Doug


Re: Information request/Amazon EC2

glen e. p. ropella-2
In reply to this post by Marcus G. Daniels
Thus spake Marcus G. Daniels circa 09-08-20 10:26 AM:
> One nice thing about what Amazon does in contrast to most supercomputing
> centers is to let you boot whatever kernel image you want.   That can be
> important for diagnosing and fixing some kinds of problems.

Not only for problems: without the ability to replicate the entire
toolchain all the way down through the OS, our transition from cluster
to cloud would have to wait for a scientific and development punctuation
point... which almost never happens. ;-)

--
glen e. p. ropella, 971-222-9095, http://agent-based-modeling.com



Re: Information request/Amazon EC2

Jack K. Horner
In reply to this post by Jack K. Horner
At 09:00 AM 8/20/2009, Doug Roberts wrote:

>From: "Jack K. Horner" <[hidden email]>
>Precedence: list
>MIME-Version: 1.0
>To: [hidden email]
>References: <[hidden email]>
>In-Reply-To: <[hidden email]>
>Date: Wed, 19 Aug 2009 14:31:28 -0700
>Reply-To: The Friday Morning Applied Complexity Coffee Group
>         <[hidden email]>
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset="us-ascii"; format=flowed
>Subject: Re: [FRIAM] Information request/Amazon EC2
>Message: 3
>
>At 09:00 AM 8/19/2009, Doug Roberts wrote:
>
>>From: Douglas Roberts <[hidden email]>
>>Precedence: list
>>MIME-Version: 1.0
>>To: The Friday Morning Applied Complexity Coffee Group <[hidden email]>
>>Date: Tue, 18 Aug 2009 10:38:23 -0600
>>Reply-To: The Friday Morning Applied Complexity Coffee Group
>>         <[hidden email]>
>>Message-ID: <[hidden email]>
>>Content-Type: multipart/alternative; boundary=0016364ed7a89e0c4404716d2582
>>Subject: [FRIAM] Information request
>>Message: 1
>>
>>Hi, all.
>>
>>I am interested in learning what kind of experiences users of
>>Amazon's EC2 resources have had.  What resources have you used;
>>what has been your experience with availability, ease of use, cost,
>>data transfer, privacy, etc.?
>>
>>TIA,
>>
>>--Doug
>>
>>--
>>Doug Roberts
>><mailto:[hidden email]>[hidden email]
>><mailto:[hidden email]>[hidden email]
>>505-455-7333 - Office
>>505-670-8195 - Cell
>>
>>_______________________________________________
>>Friam mailing list
>>[hidden email]
>>http://redfish.com/mailman/listinfo/friam_redfish.com
>
>
>
>Doug,
>
>I don't have direct experience with EC2.  However, I attended a
>computational biology conference about two years ago in which Amazon
>gave a talk on the system.  Here's what I distilled:
>
>         1.  If computation-to-communication ratio of your
> application is >> 1 (e.g., the SETI power-spectrum analysis
> problem), EC2's network performance is benign.  If, in order to
> realize a time-to-solution in your lifetime, your application
> requires a computation/communication ratio approaching 1 (e.g., an
> extreme-scale adaptive Eulerian mesh radiation-hydrodynamics code),
> the EC2 network is your enemy.
>
>         2.  For comparable problem setups, EC2 was less expensive
> than buying time on IBM's pay-per-use Blue Gene system.
>
>         3.  For comparable problem setups and theoretical peaks,
> over the lifecycle the EC2 is less expensive per CPU-hour than a
> cluster of PCs linked by fast Ethernet.
>
>         4.  There was general agreement among the half-dozen or so
> users of pay-per-use commercial clusters who were present at the
> talk that EC2 gave the best bang for the buck.
>
>
>Jack K. Horner
>P. O. Box 266
>Los Alamos, NM  87544-0266
>Voice:   505-455-0381
>Fax:     505-455-0382
>email:   [hidden email]
>
>
>
>
>
>
>
>From: Douglas Roberts <[hidden email]>
>Precedence: list
>MIME-Version: 1.0
>To: The Friday Morning Applied Complexity Coffee Group <[hidden email]>
>References: <[hidden email]>
>         <[hidden email]>
>In-Reply-To: <[hidden email]>
>Date: Wed, 19 Aug 2009 14:42:41 -0600
>Reply-To: The Friday Morning Applied Complexity Coffee Group
>         <[hidden email]>
>Message-ID: <[hidden email]>
>Content-Type: multipart/alternative; boundary=0003255548b2234457047184adb6
>Subject: Re: [FRIAM] Information request/Amazon EC2
>Message: 4
>
>Thanks, Jack.  I suspect that for distributed message passing ABM
>simulations the Amazon EC is not a good solution.
>
>--Doug
>
>
>--
>Doug Roberts
><mailto:[hidden email]>[hidden email]
><mailto:[hidden email]>[hidden email]
>505-455-7333 - Office
>505-670-8195 - Cell



Doug,

Whether a given parallel computing system performs well enough
running a message-passing-oriented Agent Based Modeling (ABM)
application depends on, among other things,

         1.  How the agents are distributed across the processing
            elements (pes, nominally one microprocessor per pe) of the
            system.  Computational-mesh-oriented (CMO) applications that use
            message-passing services are sufficiently analogous to
            ABM-oriented applications that we can use mesh performance
            data to help bound what ABM performance is likely to be,
            given an allocation of agents per pe.

            In particular, it is not uncommon for CMO
            applications using ~50 state variables per cell to allocate
            ~100,000 cells per pe; state updates in such a system are
            accomplished by message-passing (using OMP or MPI) among cells.

            100,000 cells per pe is an empirically derived "rule of thumb",
            but it is roughly invariant across modern production-class
            compute nodes and a wide spectrum of mesh-oriented applications.

            For optimal performance, the cells allocated to a pe should
            be the set of cells that communicate most frequently with
            each other.  Sometimes a user can characterize that set
            through a propagation-rate function defined in the
            problem space (e.g., the speed of sound in a
            medium, the speed at which a virus travels from one agent
            to another, the speed of chemical reactions in a
            biological network).  Sometimes we don't know anything about
            the communication/propagation dynamics, in which case
            "reading" a pile of steaming chicken entrails predicts
            performance about as well as anything else.

            By analogy, if there were no more than ~50 state variables
            per agent in an ABM application, an allocation of up to
            100,000 tightly-communicating agents per pe would provide
            usable performance on many production-class clusters today
            (a cluster of PlayStations is an exception to
            this rule of thumb, BTW).

            Allocating one agent per pe would be a vast waste of
            compute power for all except trivial problem setups.

            All of the above is useful only if the user can control
            the allocation of agents to pes.  Most production-class
            clusters, including the EC2, provide such controls.

            Note that this problem has to be addressed by the
            *user* in *any* cluster.



         2.  If the computation/communication ratio has to be near 1
            to obtain tolerable time-to-solution, the
            performance of the message-passing services matters
            hugely.  MPI and OMP have been optimized on only a few
            commercially available systems.  (A home-brew
            multi-thousand-node Linux cluster, in contrast, is nowhere
            near optimal in this sense.  Optimizing the latter, as
            a few incorrigibly optimistic souls have discovered,
            amounts to redesigning much of Linux process management.
            If bleeding-edge performance matters, there is no free lunch.)
            A sketch illustrating both points follows below.
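
To make both points concrete, here is a minimal MPI sketch.  The agent
count, the 50-double state record, and the contiguous block allocation
are illustrative assumptions, not a recipe, and the ping-pong probes
only a single link of the interconnect:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N_AGENTS  400000L  /* assumed total agent population              */
#define STATE_LEN 50       /* ~50 state variables per agent, per the text */

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Point 1: contiguous block allocation.  Agents with nearby ids are
       assumed to communicate most often, so each pe owns one contiguous
       block, well under the ~100,000-agents-per-pe rule of thumb for
       modest rank counts. */
    long lo = (long)rank * N_AGENTS / nranks;
    long hi = (long)(rank + 1) * N_AGENTS / nranks;
    double *state = calloc((size_t)(hi - lo) * STATE_LEN, sizeof *state);

    /* Point 2: ping-pong between ranks 0 and 1 to estimate the cost of
       moving one agent's state record across the interconnect. */
    MPI_Barrier(MPI_COMM_WORLD);
    if (nranks >= 2 && rank < 2) {
        double buf[STATE_LEN] = {0};
        const int reps = 1000;
        int peer = 1 - rank;
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, STATE_LEN, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, STATE_LEN, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, STATE_LEN, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, STATE_LEN, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)
            printf("pe 0 owns agents %ld..%ld; one %zu-byte record costs "
                   "~%.1f us one-way\n", lo, hi - 1, sizeof buf,
                   1e6 * (MPI_Wtime() - t0) / (2.0 * reps));
    }

    free(state);
    MPI_Finalize();
    return 0;
}

Build with mpicc sketch.c -o sketch and run with mpirun -np 4 ./sketch.
The one-way figure the ping-pong reports is exactly the per-message cost
that a computation/communication ratio near 1 forces an application to
pay millions of times over.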


Jack







Re: Information request/Amazon EC2

Douglas Roberts-2
Jack,

It would be a fun project to move an already-running largish distributed ABM from a standard Linux cluster over to EC2.

If only my company would pay me to play just on the fun projects...

--Doug



Re: Information request/Amazon EC2

Roger Critchlow-2
In reply to this post by Marcus G. Daniels
There was an interesting article from SIGCOMM posted yesterday about intra-data-center interconnects:


There's going to be a separation between dense clouds, where you get excellent connectivity within the cloud and fast VM migration when nodes or arcs fail, and sparse clouds, where everything sort of works like the internet, somewhere on a scale of excellent to not at all.

-- rec --



============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org