FYI. Following on a brief discussion Tuesday at the data mining session....
Google & IBM giving students a distributed systems lab using Hadoop
<http://feeds.feedburner.com/%7Er/oreilly/radar/atom/%7E3/167584952/google_ibm_give.html>

Posted: 09 Oct 2007 04:07 PM CDT
By Jesse Robbins

Google <http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html> and
IBM have partnered <http://www-03.ibm.com/press/us/en/pressrelease/22414.wss>
to give university students hands-on experience developing software for
large-scale distributed systems. The initiative focuses on parallel
processing of large data sets using Hadoop <http://lucene.apache.org/hadoop/>,
an open source implementation of Google's MapReduce
<http://labs.google.com/papers/mapreduce.html>. (See Tim's earlier post about
Yahoo & Hadoop <http://radar.oreilly.com/archives/2007/08/yahoos_bet_on_h.html>.)

"The goal of this initiative is to improve computer science students'
knowledge of highly parallel computing practices to better address the
emerging paradigm of large-scale distributed computing. IBM and Google are
teaming up to provide hardware, software and services to augment university
curricula and expand research horizons. With their combined resources, the
companies hope to lower the financial and logistical barriers for the
academic community to explore this emerging model of computing."

The project currently includes the University of Washington, Carnegie Mellon
University, MIT, Stanford, UC Berkeley, and the University of Maryland.
Students in participating classes will have access to a dedicated cluster of
"several hundred computers" running Linux under Xen virtualization
<http://www.xensource.com/Pages/default.aspx>. The project is expected to
expand to thousands of processors and eventually be open to researchers and
students at other institutions.

As part of this effort, Google and the University of Washington have released
a Creative Commons-licensed curriculum for teaching distributed systems
concepts and techniques <http://code.google.com/edu/content/parallel.html>.
IBM is also providing Hadoop plug-ins for Eclipse
<http://www.alphaworks.ibm.com/tech/mapreducetools>.

Note: You can also build similar systems using Hadoop with Amazon EC2
<http://wiki.apache.org/lucene-hadoop/AmazonEC2>. Tom White recently posted
an excellent guide
<http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112>,
and Powerset has been using this in production
<http://www.royans.net/arch/2007/09/13/scaling-powerset-using-amazons-ec2-and-s3/>
for quite some time.

--tj
--
==========================================
J. T. Johnson
Institute for Analytic Journalism -- Santa Fe, NM USA
www.analyticjournalism.com
505.577.6482(c)  505.473.9646(h)
http://www.jtjohnson.com  tom at jtjohnson.us

"You never change things by fighting the existing reality.
To change something, build a new model that makes the
existing model obsolete."
-- Buckminster Fuller
==========================================
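For anyone who hasn't touched Hadoop yet, here is roughly what the canonical
word-count job looks like against the old org.apache.hadoop.mapred API of the
0.x releases current as of this writing. The method names drifted between
early releases, so treat the exact signatures as approximate rather than
gospel:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Classic word count: the map step emits (word, 1) pairs, the reduce
    // step sums the counts per word. Hadoop owns the partitioning,
    // shuffling and distribution across the cluster.
    public class WordCount {

      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> out,
                        Reporter reporter) throws IOException {
          StringTokenizer tok = new StringTokenizer(value.toString());
          while (tok.hasMoreTokens()) {
            word.set(tok.nextToken());
            out.collect(word, ONE);              // emit (word, 1)
          }
        }
      }

      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> out,
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          out.collect(key, new IntWritable(sum)); // emit (word, total)
        }
      }

      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);                  // submit and wait
      }
    }

The point of the model is that map and reduce are the only pieces the student
writes; the framework owns partitioning, shuffling, sorting and restarts,
which is exactly what makes it teachable on a several-hundred-node cluster.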
But doesn't most evidence point to the likelihood that not having enough
computing power isn't our problem with natural systems?

Phil Henshaw
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
680 Ft. Washington Ave
NY NY 10040
tel: 212-795-4844
e-mail: pfh at synapse9.com
explorations: www.synapse9.com <http://www.synapse9.com/>
"Super computing" is facing an interesting challenge with the advent
of multi-core, multi-memory, blade/cluster/grid systems.

The issue is which architecture one uses for these powerful systems. It's
very difficult to have a generalized system that works well over a number of
application architectures, and the choices are growing by the minute. The
newer "blade" systems offer both multi-processor and shared-memory
configurations. They can be set up as clusters or as a sorta many-processor
system that looks like a single memory system .. far easier to program.
Grid systems are popular, and are figuring out how to adapt to the latest
hardware advances.

My guess is any realistic solution will be hybrid, combining the features of
all these large-scale architectures.

Here's the gotcha: how does it impact the programming language used? One
wants an "agile" multi-processor, multi-memory architecture that can be
reconfigured for advances in hardware and software. Thus far, there's no
silver bullet.

-- Owen    owen at backspaces.net
Beer is proof that God loves us, and wants us to be happy.
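To make the shared-memory side of that trade-off concrete, here is a toy
Java sketch (my own example, not tied to any particular blade product): four
threads sum disjoint slices of one array they can all address directly. The
cluster version of the same job would need explicit data partitioning and
message passing (MPI or similar) instead.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Shared-memory parallel sum: every worker reads the same array in
    // place -- no copying, no messages, just a final combine step.
    public class ParallelSum {
        public static void main(String[] args) throws Exception {
            final double[] data = new double[1000000];
            for (int i = 0; i < data.length; i++) data[i] = i;

            int workers = 4;
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            List<Future<Double>> parts = new ArrayList<Future<Double>>();
            int chunk = data.length / workers;

            for (int w = 0; w < workers; w++) {
                final int lo = w * chunk;
                final int hi = (w == workers - 1) ? data.length : lo + chunk;
                parts.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        double s = 0;
                        for (int i = lo; i < hi; i++) s += data[i];
                        return s;
                    }
                }));
            }

            double total = 0;
            for (Future<Double> f : parts) total += f.get(); // barrier + combine
            pool.shutdown();
            System.out.println("sum = " + total);
        }
    }

The same sum across a cluster means deciding who owns which slice and
shipping partial results around, which is the programming-model gap being
pointed at.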
I am currently in an Agent Based Simulation class and I am going to
do a report comparing and contrasting ABS in parallel (distributed, etc.)
environments vs. running a simulation in a purely sequential environment.

It seems obvious to me that you could get very different results from one
computational architecture vs. another.

Does anyone have any experience with truly parallel systems in this regard
they would like to share?

Thanks!
David Mirly wrote:
> It seems obvious to me that you could get very different results
> from one computational architecture vs. another.

Swarm, for example, has a logical model of concurrency and options for
controlling it. Suppose two agents schedule two events in the future that
happen to fall at the same time, to the time resolution of the model. When
these events are run, they can be iterated either in a fixed serial order or
in a randomized order. Randomized order simulates the non-determinism one
would expect from a truly asynchronous (parallel) realization of the model.
You can indeed get artifacts / apparent causation in models depending on the
details of event ordering...

Marcus
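Here is a self-contained toy in Java that mimics the mechanism (this is an
illustration of the idea, not Swarm's actual API): two agents act at the same
tick, and shuffling the same-time events decides which one gets the last unit
of a shared resource.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // Two events share a timestamp; the scheduler either runs same-time
    // events in insertion order or shuffles them. With one unit of a
    // resource left, the shuffle decides which agent gets it.
    public class SameTimeEvents {
        interface Event { void fire(); }

        static int resource = 1; // one unit left

        static Event grab(final String agent) {
            return new Event() {
                public void fire() {
                    if (resource > 0) {
                        resource--;
                        System.out.println(agent + " got the resource");
                    } else {
                        System.out.println(agent + " found it gone");
                    }
                }
            };
        }

        public static void main(String[] args) {
            List<Event> sameTick = new ArrayList<Event>();
            sameTick.add(grab("A"));
            sameTick.add(grab("B"));

            boolean randomized = true; // the Swarm-style knob
            if (randomized) {
                Collections.shuffle(sameTick, new Random());
            }
            for (Event e : sameTick) e.fire();
        }
    }

Seeding the Random makes a given ordering reproducible, which is how you tell
scheduling artifacts apart from "real" model behavior.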
Owen wrote:
>> My guess is any realistic solution will be hybrid, combining the
>> features of all these large scale architectures.

By designing circuits to do special-purpose compute tasks, the lengths of
wires can be reduced (and thus their diameter), and this is ultimately the
limiting factor on serial performance and circuit density. In practice,
designing these circuits is hard to automate and optimize (even with FPGAs),
and even relatively general hybrid computing approaches like the Cell
broadband engine still require quite a bit of programming finesse (e.g.
prefetching and keen awareness of memory access patterns, etc.).

Of course, at some point the wire diameters can get no smaller, and we have
to look to programming approaches to find parallelism. One software
technology that looks promising to me is software transactional memory:

http://en.wikipedia.org/wiki/Software_transactional_memory

Apparently Sun is working on hardware support for it:

http://www.theregister.co.uk/2007/08/21/sun_transactional_memory_rock/
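Java has no STM of its own, but the optimistic read-compute-commit-retry loop
at the heart of the idea can be shown with a single atomic cell. This is just
the flavor, not a usable STM -- real implementations track read and write
sets across many memory locations:

    import java.util.concurrent.atomic.AtomicInteger;

    // Optimistic concurrency on one cell: read a snapshot, compute a new
    // value from it, and commit only if nobody else committed in between;
    // on conflict, retry with a fresh snapshot.
    public class TinyTxn {
        static final AtomicInteger balance = new AtomicInteger(100);

        static void deposit(int amount) {
            while (true) {
                int snapshot = balance.get();      // read inside the "transaction"
                int updated = snapshot + amount;   // pure computation on the snapshot
                if (balance.compareAndSet(snapshot, updated)) {
                    return;                        // commit succeeded
                }
                // another thread committed first: loop and retry
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread[] ts = new Thread[8];
            for (int i = 0; i < ts.length; i++) {
                ts[i] = new Thread(new Runnable() {
                    public void run() {
                        for (int k = 0; k < 1000; k++) deposit(1);
                    }
                });
                ts[i].start();
            }
            for (Thread t : ts) t.join();
            System.out.println("balance = " + balance.get()); // always 8100
        }
    }

The attraction for multi-core programming is that there are no locks to
order, so nothing deadlocks; a conflicting commit just costs a retry.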
EpiSims (http://ndssl.vbi.vt.edu/episims.html) is a distributed discrete
event ABM that runs on clusters (and soon on clusters of clusters on the
TeraGrid: http://www.isdsjournal.org/article/view/1947). It is entirely
possible to get slightly different results from two subsequent EpiSims runs
using the same input data sets. As MGD points out in a previous message,
parallelization can randomize the order of execution of events that are
scheduled to run at the same future point in time.

We have studied the "noise" produced by randomized execution order of
same-time events in EpiSims, and found that it produces variations in
results on the order of < 1% for most cases.

--Doug

--
Doug Roberts, RTI International
droberts at rti.org
doug at parrot-farm.net
505-455-7333 - Office
505-670-8195 - Cell
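A back-of-envelope version of that kind of noise study, as a Java toy rather
than anything from EpiSims: run the same stochastic model repeatedly,
changing only the seed that orders same-time events, and report how far
individual results stray from the mean.

    import java.util.Random;

    // Noise measurement protocol: same inputs, vary only the ordering
    // seed, quantify the spread of the final result around its mean.
    public class OrderingNoise {

        // Toy model: 1000 "infection attempts" against a pool of 500
        // susceptibles; the seed controls the order of the attempts,
        // which (as in a real ABM) changes who is still susceptible when.
        static double runModel(long orderingSeed) {
            Random order = new Random(orderingSeed);
            boolean[] infected = new boolean[500];
            int count = 0;
            for (int attempt = 0; attempt < 1000; attempt++) {
                int target = order.nextInt(infected.length); // ordering-dependent pick
                if (!infected[target] && order.nextDouble() < 0.3) {
                    infected[target] = true;
                    count++;
                }
            }
            return count / (double) infected.length; // final attack rate
        }

        public static void main(String[] args) {
            int runs = 50;
            double[] rate = new double[runs];
            double mean = 0.0;
            for (int i = 0; i < runs; i++) {
                rate[i] = runModel(i);
                mean += rate[i];
            }
            mean /= runs;

            double maxDev = 0.0;
            for (int i = 0; i < runs; i++) {
                maxDev = Math.max(maxDev, Math.abs(rate[i] - mean) / mean);
            }
            System.out.printf("mean attack rate = %.4f, worst-case deviation = %.2f%%%n",
                              mean, 100.0 * maxDev);
        }
    }

The number it prints won't match EpiSims' < 1%, of course; the point is the
protocol, not the model.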