NCHPC: March 2010

Tuesday, March 30, 2010

Product Review: PGI Workstation

PGI Workstation™ is PGI's single-user scientific and engineering compilers and tools product. PGI Workstation is available in three language versions:

PGI Fortran Workstation—Fortran only
PGI C/C++ Workstation—C and C++ only
PGI Fortran/C/C++ Workstation—combined Fortran and C/C++

PGI Fortran Workstation includes The Portland Group's native parallelizing/optimizing FORTRAN 77, Fortran 90/95/03 and HPF compilers for 64-bit x64 and 32-bit x86 processor-based Linux, Apple Mac OS X and Microsoft Windows workstations. PGI Fortran Workstation provides the features, quality, and reliability necessary for developing and maintaining advanced scientific and technical applications.

PGI parallel compilers and tools harness the full power of x64+GPU systems for science and engineering applications. PGI’s industry-leading performance, reliability, native multi-core and OpenMP support, GPGPU programming, and parallel-capable graphical debugging and profiling tools provide a complete state-of-the-art programming environment for scientists and engineers. PGI’s support for legacy language and programming features ensures that existing applications will port easily and quickly to the latest-generation multi-core x64+GPU processor-based systems.

PGI C/C++ Workstation includes The Portland Group's native parallelizing/optimizing OpenMP C++ and ANSI C compilers. The C++ compiler closely tracks the proposed ANSI standard and is compatible with cfront versions 2 and 3. All C++ functions are compatible with Fortran and C functions, so you can compose programs from components written in all three languages.

PGI Workstation includes the OpenMP- and MPI-enabled PGDBG parallel debugger and PGPROF performance profiler, which can debug and profile up to eight local MPI processes. PGI Workstation also includes several versions of precompiled MPICH message passing libraries.

PGI Workstation includes a single user node-locked license for Linux, Mac OS X or Microsoft Windows. Volume packs of five or more single user node-locked licenses are also available.

Volume packs are multi-platform; licenses may be mixed by operating system up to the maximum count. PGI Server offers the same features as PGI Workstation but includes a multi-user network floating license.

PGI Workstation for both Mac OS X and Windows consists of command-level versions of the PGI compilers and both command-level and graphical versions of the PGDBG debugger and PGPROF performance profiler. An integrated development environment (IDE) is neither provided nor supported. As a separate product, PGI Visual Fortran fully integrates PGI Fortran compilers and tools into Microsoft Windows using Microsoft Visual Studio.

This product targets 64-bit x64 and 32-bit x86 workstations with one or more single core or multi-core microprocessors.

(Detailed product information can be obtained from the manufacturer's web pages.)

Thursday, March 25, 2010

How Supercomputing is Revolutionizing Nuclear Power

Out of all the carbon-free power options, nuclear power faces some of the highest hurdles to commercial-scale deployment. The upfront costs for reactors are in the billions, the projects take years to site and build, and nuclear materials and designs have to undergo testing for decades to make sure they can be used in the field. That’s one reason why nuclear research costs a whole lot of money and the pace of innovation seems incredibly slow. But that’s also the reason why supercomputing has started to truly revolutionize the world of nuclear power innovation.

Supercomputing, or “extreme computing” as the Department of Energy described it during a workshop on computing and nuclear power last year, involves computers at the petaflop scale. It will eventually reach even exaflop scale. A computer running at a petaflop can do 1 million billion calculations in a second, and an exaflop of performance can deliver a billion billion calculations per second.
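For readers who want the powers of ten spelled out, here is how the prefixes work out, treating one "calculation" as one floating-point operation and using "billion" in the short-scale sense of 10^9:

```latex
\begin{align*}
1\ \text{PFLOP/s} &= 10^{15}\ \text{FLOP/s} = 10^{6} \times 10^{9}\ \text{FLOP/s} \quad (\text{a million billion})\\
1\ \text{EFLOP/s} &= 10^{18}\ \text{FLOP/s} = 10^{9} \times 10^{9}\ \text{FLOP/s} \quad (\text{a billion billion}) = 1000\ \text{PFLOP/s}
\end{align*}
```

So an exaflop machine is a thousand times faster than a petaflop one.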

That massive amount of number crunching can help developers of nuclear power technology simulate next-generation designs of nuclear reactors, show how advanced fuels in a reactor could be consumed over time, and model more efficient waste disposal and refueling efforts. It’s all about being able to go through very complex and lengthy research and development processes much more quickly and with far less cost compared to both physical testing and using less powerful computers.

(This article is sourced from earth2tech.com; the original version can be found on their web pages.)

Wednesday, March 24, 2010

XtreemOS 2.1 Release Announced

The XtreemOS consortium is pleased to announce the release of XtreemOS 2.1.
This update release includes:

Improved installer with a new xosautoconfig tool to greatly simplify and automate installation of XtreemOS instances.
A number of high-impact bug fixes, along with work on stability and correctness.
XtreemFS 1.2, which has a number of new features along with enhanced performance and stability.
XtreemOS MD (Mobile Device) -- This new version integrates XtreemOS on Internet Tablets, beginning with the Nokia N8xx models. This makes it possible to launch jobs and interact with XtreemOS resources via a special client with a simple single sign-on.
Virtual Nodes -- a framework to provide fault tolerance for grid applications by replicating them over multiple nodes.
XOSSAGA -- a set of technologies that allows you to run unmodified SAGA-compliant applications on top of XtreemOS.

Downloading
An updated list of Mandriva Mirrors can be found at http://api.mandriva.com/mirrors/list.php and http://twiki.mdklinuxfaq.org/en/Mandriva_mirrors.
The ISO files for XtreemOS releases are in the folder /devel/xtreemos/iso/2.1.
Changes
This release has concentrated strictly on bug fixes and polishing. You can find the change log at http://sourceforge.net/apps/mantisbt/xtreemos/roadmap_page.php.
All users are encouraged to test the new ISO and report any issues to our bug tracker at http://sourceforge.net/apps/mantisbt/xtreemos/main_page.php.

About XtreemOS
XtreemOS 2.1 is the result of an ongoing project, with 18 academic and industrial partners, into the design and implementation of an open source grid operating system with native support for virtual organizations (VOs) and an emphasis on ease of use. XtreemOS runs on a wide range of hardware, from smartphones to PCs and Linux clusters.

A set of system services, extending those found in traditional Linux, provides users with all the grid capabilities associated with current grid middleware, but fully integrated into the OS. Based on Linux, XtreemOS provides distributed support for VOs spanning across many machines and sites along with appropriate interfaces for grid OS services.

When installed on a participating machine, the XtreemOS system provides for the grid what an operating system offers for a single computer: abstraction from the hardware and secure resource sharing between different users. XtreemOS gives users the view of a single large, powerful workstation environment while removing the complex resource management issues of a grid environment.

Tuesday, March 23, 2010

SGI Octane III: Supercomputing Gets Personal

SGI Octane III takes high performance computing out of the data center and puts it at the deskside. It combines the immense power and performance capabilities of a high-performance cluster with the portability and usability of a workstation to enable a new era of personal innovation in strategic science, research, development and visualization.

In contrast with standard 2P workstations with only eight cores and moderate memory capacity, Octane III's superior design permits up to 80 high-performance cores and nearly 1TB of memory. Octane III significantly accelerates time-to-results for over 50 HPC applications and supports the latest Intel® processors to capitalize on greater levels of performance, flexibility and scalability. Pre-configured with system software, cluster set up is a breeze.

Plug It In and It Works
Octane III is office-ready, with a one-foot by two-foot pedestal form factor, whisper-quiet operation, ease-of-use features, low maintenance requirements and support for standard office power outlets. A single, conveniently placed button turns it on and off. Octane III enjoys the same cost-saving power efficiencies inherent in all SGI Eco-Logical™ compute designs.

Easily Configurable for Deskside HPC
Octane III is optimized for your specific high-performance computing requirements. It ships as a factory-tested, pre-integrated platform with broad HPC application support, ready for immediate integration and a smooth out-of-the-box experience.

Octane III allows a wide variety of single and dual-socket node choices and a wide selection of performance, storage, integrated networking, and graphics and compute GPU options. The system is available as an up to ten node deskside cluster configuration or dual-node graphics workstation configurations.

Supported operating systems include:
Microsoft® Windows® HPC Server 2008, SUSE Linux® Enterprise Server, or Red Hat® Enterprise Linux. All Linux-based configurations are available with pre-loaded SGI® ProPack™ system software, SGI® Isle™ Cluster Manager and Altair PBS Professional ™ scheduler to get you up and running quickly.

A High-Performance Graphics Lineage
Capitalize on advanced graphics capabilities, brought to you by the same company that invented advanced graphics capabilities. Conceived, designed and built by SGI on the shoulders of its ground-breaking workstations from the past, Octane III is available as a single-node dual-socket graphics workstation with support for the fastest NVIDIA® graphics and compute GPU cards. Octane III is optimized for use with multiple display solutions for expanded advanced visual computing scenarios.

(For detailed product information, please visit the manufacturer's product pages.)

Monday, March 22, 2010

Moscow State University Supercomputer Has Petaflop Aspirations

The Moscow State University (MSU) supercomputer, Lomonosov, has been selected for a high-performance makeover, with the goal of nearly tripling its current processing power to achieve petaflop-level performance in 2010. T-Platforms, who developed and manufactured the supercomputer, is the odds-on favorite to lead the project.

With a current Linpack mark of 350 teraflops (peak: 420 teraflops), Lomonosov needs to generate an additional 650 teraflops of performance to achieve its goal. No small task. So far, there are only two computers that have broken the Linpack petaflop barrier, Jaguar at Oak Ridge National Lab, which holds the number one position on the TOP500 list, and Roadrunner at Los Alamos National Lab, with the number two spot. Lomonosov ranks 12th on the most recent edition of the TOP500 list and is the largest HPC system in the CIS and Eastern Europe.

Officials at Moscow State University held a meeting this week in order to establish a budget for the petaflops revamping of the Lomonosov system. According to Russian State Duma Speaker Boris Gryzlov, MSU has prepared a feasibility study on the effectiveness of creating a petaflops supercomputer, and the matter will be brought up to the President and Chairman of the government for approval.

Total university funding in 2010 from the Russian federal government will amount to 1.5 billion rubles ($51 million). The anticipated cost of increasing the computer's performance to reach petaflop-level is about 900 million rubles, or almost $31 million, according to Moscow State University President Victor Sadovnichy. MSU has already invested 350 million rubles ($12 million) in the Lomonosov system, and the total project cost so far is 1.9 billion rubles ($65 million). MSU is ready to provide up to a quarter of the cost of hardware, said Sadovnichy.

Apparently, the amounts specified to upgrade the system refer only to the procurement and installation of equipment, and do not include system maintenance and electricity costs. Current power requirements are around 5 MW, which, according to Sadovnichy, is comparable to powering a small city.

"Lomonosov" and its predecessor "Chebyshev" are responsible for many research breakthroughs, including an inhibitor of thrombin (a substance retarding the effect of the main component of blood clotting), as well as the development of urokinase, a possible cancer treatment. In addition to these undertakings, Lomonosov has been kept busy modeling climate processes, factoring large integers to solve cryptographic problems, and calculating the noise in turbulent environments.

The renovation work for transforming Lomonosov into a petaflop system is being put to a competitive bid, but it seems likely that T-Platforms will get the contract, since it is the only Russian manufacturer with the know-how to implement such a project. And there's a partiality toward assigning work to national interests. State Duma Speaker Boris Gryzlov, who backs the creation of a domestic petaflop supercomputer, prefers to support domestic producers of supercomputers, and urged caution against the procurement of foreign goods.

Mikhail Kozhevnikov, commercial director for T-Platforms, has already prepared a bid and decided upon an upgrade path for the petaflop system. The details of the proposed architecture have not been publicly disclosed; however, a good guess would be that they're going to add new nodes based on the Westmere EP Xeon processors Intel just announced.

Specifically, since the current MSU super is based on the T-Blade2, Xeon X5570 2.93 GHz, it's not unreasonable to think they're bidding T-Blade3 blades using Xeon X5670 2.93 GHz parts (note: T-Blade3 blades don't actually exist yet). Since the new Xeons only deliver about 40 percent more computational performance per blade than the existing ones, they'll still need a bunch more servers. Alternatively, they could be thinking about upgrading with the upcoming NVIDIA Fermi GPU server boards, due out in May. That would get them to a petaflop with a lot less hardware. (A dual-socket X5670 server would yield about 250 DP gigaflops; a 4-GPU Fermi server would probably deliver over 2 DP teraflops.)

Russian Prime Minister Vladimir Putin has allocated 1.1 billion rubles ($37 million) to develop supercomputer technologies in Russia, according to a recent APA report, further demonstrating Russia's desire to possess a world-class computer system, one that may be capable of a place among the top 5 of the revered TOP500 list. Barring any unforeseen circumstances, it looks like the Lomonosov upgrade will go forward, and Russia will take its place on the exclusive short-list of petaflop systems. But, in HPC, the final goal is always a moving target, as other groups also race for the coveted petaflops level and beyond.

(This article sourced from HPCwire.)

Sunday, March 21, 2010

Hazelcast: The Art of Data Distribution

Hazelcast is an open source clustering and highly scalable data distribution platform for Java, which is:

Lightning-fast; thousands of operations per sec.
Fail-safe; no losing data after crashes.
Dynamically scales as new servers are added.
Super-easy to deploy and use (include a single jar).

Hazelcast is pure Java. JVMs that are running Hazelcast will dynamically cluster. Although by default Hazelcast uses multicast for discovery, it can also be configured to use only TCP/IP for environments where multicast is not available or not preferred. The program is released under the Apache License and the project is hosted at Google Code. It can be freely used in commercial or non-commercial applications.

When to Use Hazelcast

Hazelcast will help you when you need to:

Share data/state among many servers (e.g. web session sharing)
Cache your data (distributed cache)
Cluster your application
Provide secure communication among servers
Partition your in-memory data
Distribute workload onto many servers
Take advantage of parallel processing
Provide fail-safe data management
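The "partition your in-memory data" item is the core idea behind a data grid like this: each key is hashed to one of a fixed number of partitions, and each partition is owned by one cluster member, so any node can work out where a key lives without asking a central server. The sketch below is a toy model of that idea in plain Python. It is not Hazelcast's API or its actual hashing scheme; CRC32 and round-robin ownership are stand-ins chosen for simplicity, and the member names are hypothetical. The partition count of 271 mirrors Hazelcast's default, but any positive number works.

```python
import zlib
from collections import Counter

PARTITION_COUNT = 271  # mirrors Hazelcast's default partition count; any positive int works


def partition_of(key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a key to a partition id with a stable hash (CRC32 is a stand-in here)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def assign_partitions(members: list[str], partition_count: int = PARTITION_COUNT) -> dict[int, str]:
    """Give each partition an owner, spreading ownership round-robin over members."""
    return {pid: members[pid % len(members)] for pid in range(partition_count)}


def owner_of(key: str, owners: dict[int, str]) -> str:
    """Find the member that holds a key: hash to a partition, then look up its owner."""
    return owners[partition_of(key, len(owners))]


members = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
owners = assign_partitions(members)
print(Counter(owners.values()))  # roughly even split of the 271 partitions

for key in ["session:42", "session:43", "cart:7"]:
    print(f"{key} -> partition {partition_of(key)} on {owner_of(key, owners)}")
```

A real data grid also keeps backup copies of each partition on other members and migrates as few partitions as possible when members join or leave. This sketch does neither; simply reassigning round-robin after a membership change would move most partitions.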

(For more information, please visit their documentation page.)

Friday, March 19, 2010

PRACE Grants 4.3 Million Core Hours to Prototype Systems

Six projects, two from France and one each from Norway, Denmark, the UK and the Netherlands, have been granted access to the PRACE (Partnership for Advanced Computing in Europe) prototype systems. These projects will spend a total of 4,311,272 core hours on the PRACE prototypes. So far, PRACE has granted a total of over 8.7 million core hours on the PRACE prototypes.

The purpose of granting this access is to enable future Tier-0 users to assess the prototypes and to prepare their applications for the petaflop infrastructure. The evaluation process has focused on technical feasibility and the expected benefits of the tests both for PRACE and the prototype users.

Project titles:

Cryptanalytic Performance Evaluation
Solar Atmospheric Modelling
Ab Initio Calculation of Complex Doping of a Photovoltaic Material
Porting MESO-NH to PRACE Prototype
Incompact3d: High Performance Computing for Turbulence
Porting of GADGET2 to GPUs

More information about PRACE prototypes can be found at their web pages.

Tuesday, March 16, 2010

Intel Ups Performance Ante with Westmere Server Chips

Right on schedule, Intel has launched its Xeon 5600 processors, codenamed "Westmere EP." The 5600 represents the 32nm sequel to the Xeon 5500 (Nehalem EP) for dual-socket servers. Intel is touting better performance and energy efficiency, along with new security features, as the big selling points of the new Xeons.

For the HPC crowd, the performance improvements are the big story. Thanks in large part to the 32nm transistor size, Intel was able to incorporate six cores and 12 MB of L3 cache on a single die -- a 50 percent increase compared to the Xeon 5500 parts. According to Intel, that translated into a 20 to 60 percent boost in application performance and 40 percent better performance per watt.

Using the high performance Linpack benchmark, Intel is reporting a 61 percent improvement for a 6-core 5600 compared to its 4-core Xeon 5500 predecessor (146 gigaflops versus 91 gigaflops). You might be wondering how this was accomplished, given that the 5600 comes with only 50 percent more cores and cache. It turns out that Intel's comparison was based on the top-of-the-line Xeon chip from each processor family. The 146 gigaflops result was delivered by an X5680 processor, which runs at 3.33 GHz and has a TDP of 130 watts, while the 91 gigaflops mark was turned in by the X5570 processor, which runs at 2.93 GHz and has a TDP of 95 watts. Correcting for clock speed, the 5600 Linpack result would be something closer to 128 gigaflops, representing a still-respectable 41 percent boost.
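The "correcting for clock speed" step is a simple proportional scaling. Here is a minimal sketch, assuming Linpack throughput scales linearly with core clock. That assumption ignores Turbo Boost, memory speed and the different TDPs, so treat the result as a rough normalization rather than a measured number.

```python
# Intel's published Linpack figures (GFLOPS) and base clocks (GHz), as quoted in the article.
X5680_GFLOPS, X5680_GHZ = 146.0, 3.33  # 6-core Xeon 5600
X5570_GFLOPS, X5570_GHZ = 91.0, 2.93   # 4-core Xeon 5500

# Headline gain, straight from the rounded published numbers.
raw_gain = X5680_GFLOPS / X5570_GFLOPS - 1

# Assumption: Linpack scales linearly with clock, so scale the X5680 result down to 2.93 GHz.
normalized_gflops = X5680_GFLOPS * (X5570_GHZ / X5680_GHZ)
normalized_gain = normalized_gflops / X5570_GFLOPS - 1

print(f"raw gain:              {raw_gain:.0%}")
print(f"X5680 scaled to 2.93:  {normalized_gflops:.0f} GFLOPS")
print(f"clock-normalized gain: {normalized_gain:.0%}")
```

From the rounded 146 and 91 gigaflops figures the raw ratio comes out at about 60 percent, within rounding of Intel's quoted 61 percent; the clock-scaled figure lands at about 128 gigaflops, or a 41 percent gain.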

Intel also reported performance improvements across a range of technical HPC workloads. These include a 20 percent boost on memory bandwidth (using Stream-MP), a 21 percent average improvement with a number of CAE codes, a 44 percent average improvement for life science codes, and a 63 percent improvement using a Black Scholes financial benchmark. These results also reflect the same 3.33/2.93 GHz clock speed bias discussed in the Linpack test, so your mileage may vary.

Looking at the performance per watt metric, the new 5600 chips also have a clear edge. An apples-to-apples comparison of the X5570 (2.93 GHz, 95 watts) and X5670 (2.93 GHz, 95 watts) has the latter chip delivering 40 percent more performance per watt. That's to be expected, since the X5670 has two extra cores available to do extra work.
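The "two extra cores" explanation can be checked against theoretical peak. A rough sketch, assuming 4 double-precision floating-point operations per cycle per core for both Nehalem and Westmere (a 2-wide SSE add plus a 2-wide SSE multiply each cycle), and using TDP as the power figure:

```python
# Assumption: 4 DP flops/cycle/core on Nehalem and Westmere (2-wide SSE add + 2-wide SSE multiply).
FLOPS_PER_CYCLE = 4

# (cores, base clock in GHz, TDP in watts) as given in the article.
CHIPS = {
    "X5570": (4, 2.93, 95),
    "X5670": (6, 2.93, 95),
}


def peak_gflops(cores: int, ghz: float) -> float:
    """Theoretical double-precision peak for one socket."""
    return cores * ghz * FLOPS_PER_CYCLE


def gflops_per_watt(name: str) -> float:
    """Peak GFLOPS divided by TDP, a crude efficiency proxy."""
    cores, ghz, tdp = CHIPS[name]
    return peak_gflops(cores, ghz) / tdp


for name, (cores, ghz, tdp) in CHIPS.items():
    print(f"{name}: {peak_gflops(cores, ghz):.1f} GFLOPS peak, {gflops_per_watt(name):.3f} GFLOPS/W")

ratio = gflops_per_watt("X5670") / gflops_per_watt("X5570")
print(f"peak GFLOPS/W ratio: {ratio:.2f}x")  # only the core count differs, so this is 6/4
```

With clock and TDP identical, the peak advantage is exactly the 6:4 core ratio, or 50 percent. The 40 percent Intel measured sits below that ceiling, which is what you would expect from workloads that do not scale perfectly with core count.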

Intel is also offering low-power 40 and 60 watt versions of the 5600 alongside the mainstream 80, 95, and 130 watt offerings. These low-power versions would be especially useful where energy consumption, rather than performance, is the driving factor. For example, a 60 watt L5640 matches the raw performance of a 95 watt X5570, potentially saving 30 percent in power consumption. Intel is even offering a 30 watt L3406, aimed at the single-processor microserver segment. Other power-saving goodies that come with the 5600 include a more efficient Turbo Boost and memory power management facility, automated low power states for six cores, and support for lower power DDR3 memory.

The Xeon 5600 parts are socket-compatible with the 5500 processors and can use the same chipsets, making a smooth upgrade path for system OEMs. Like their 5500 predecessors, the 5600s support DDR3 memory to the tune of three memory channels per socket. Practically speaking, that means two cores share a memory channel when all six cores are running full blast.

The enterprise market will be pleased by the new on-chip security features in the 5600 architecture. First, there are the new AES instructions for accelerating database encryption, whole-disk encryption and secure Internet transactions. The 5600 also offers what Intel is calling Trusted Execution Technology (TXT). TXT can be used to prevent the insertion of malicious VM software at bootup in a virtualized cloud computing environment.

Although the 5600 family will bring Intel into the mainstream six-core server market, the company is offering new four-core parts as well. In fact, the fastest clock is owned by the X5677, a quad-core processor that tops out at 3.46 GHz. These top-of-the-line four-core versions might find a happy home with many HPC users, in particular where single-threaded application performance is paramount. This would be especially true for workloads that tend to be memory-bound, since in this case more cores might actually drag down performance by incurring processing overhead while waiting for a memory channel to open up.

Intel's marketing strategy for the Xeon 5600 is not that different from its 5500 sales pitch: improved processor efficiencies generate quick payback on the investment. For the 5600, the claim is that you can replace 15 racks of single-core Xeons with a single rack of the new chips, that is, as long as you don't need any more performance. Intel is touting a five-month payback for this performance-neutral upgrade.

On the other hand, if you need 15 times the performance, you can do a 1:1 replacement of your single-core servers and still realize about eight percent in energy savings. But since software support and server warranty costs dominate maintenance expenses, any energy savings might get swallowed up by these other costs.

Intel says it is keeping the prices on the 5600 processors in line with the Xeon 5500s, although the new processor series spans a wider range of offerings. At the low end, you have the L3406, a 30 watt 2.26 GHz part with four cores and just 4 MB of L3 cache. It goes for just $189. At the top end are the six-core X5680 and the four-core X5677, both of which are offered at $1,663. Prices quoted are for quantities of 1,000.

In conjunction with Intel's launch, a number of HPC OEMs are also announcing new systems based on the Xeon 5600 series. For example, Cray announced its CX1 line of deskside machines will now come with the new chips. SGI is also incorporating the new Xeons into its portfolio, including the Altix ICE clusters, the InfiniteStorage servers, and the Octane III personal super. SGI will also use the new chips in its just-announced Origin 400 workgroup blade solution. IBM, HP and Dell are likewise rolling out new x86 servers based on the 5600 processors.

AMD is looking to trump Intel's latest Xeon offerings with its new Magny-Cours 6100 series Opteron processors, which are set to launch at the end of the month. The new Opterons come in 8- and 12-core flavors and are debuting alongside AMD's new G34 socket platform. Although the Opterons lack the HyperThreading technology of the Xeons, the additional physical cores and fourth memory channel should make up for this. Also unlike the 5600 architecture, the Opteron 6100 supports both 2-socket and 4-socket systems, giving server makers additional design flexibility. In any case, the x86 rivalry is quickly heating up as the two chipmakers battle for market share in 2010.

(This article is sourced from HPCwire; the original text can be found on their web pages.)

Monday, March 15, 2010

DEISA PRACE Symposium 2010

DEISA, the Distributed European Infrastructure for Supercomputing Applications, and PRACE, the Partnership for Advanced Computing in Europe, are once again inviting the community to their joint annual science symposium, an important European HPC event: the DEISA PRACE Symposium 2010, which will take place from May 10 to May 12 in Barcelona, Spain.

(Registration and more information can be found on the DEISA web pages.)

Wednesday, March 10, 2010

HPC Training Course: 5 - 6 May 2010, University College, London, UK

Introduction to High Performance Computing.

Introduction to the DEISA Infrastructure.

DEISA is running two training courses at University College, London, in early May 2010. Both will be based around a number of practical programming exercises. No prior knowledge is assumed for either of the courses.

The first course on Wednesday 5th May is an "Introduction to High Performance Computing". It will cover the fundamentals of modern HPC architectures and the two major parallel programming models: shared variables and message passing. Practical sessions will involve running existing parallel programs to investigate issues such as performance and scalability.

The second course on Thursday 6th May is an "Introduction to the DEISA Infrastructure". This will cover the basic aspects of the DEISA distributed supercomputer environment and the software tools that are used to access it, including the Application Hosting Environment (AHE). Practical sessions will involve installing software on the desktop and using it to access the DEISA systems.

Courses are available free for academic attendees. If the courses become over-subscribed, preference will be given to members of the Virtual Physiological Human Network of Excellence.

Those attending are encouraged to use their own laptops for both courses.

(To register, please fill in the form at their web pages)

Friday, March 5, 2010

NCSA to provide Ember as shared-memory resource for nation's researchers

The National Center for Supercomputing Applications (NCSA) will soon deploy a new highly parallel shared memory supercomputer, called Ember. With a peak performance of 16 teraflops, Ember doubles the performance of its predecessor, the five-year-old Cobalt system.

Ember will be available to researchers through the National Science Foundation's TeraGrid until that program concludes in March 2011 and then will be allocated through its successor, the eXtreme Digital program.

(The full story can be found on NCSA's site.)

Thursday, March 4, 2010

SC10 is now accepting submissions for its technical program.

SC10, the premier international conference on high-performance computing, networking, storage and analysis, is now accepting submissions for its technical program. The 23rd annual conference in the series, SC10 will take place in New Orleans, Louisiana from November 13-19, 2010. Over 11,000 attendees from industry, academia and government are anticipated.

Drawing on expertise from the international HPC community, SC10 will build on over two decades of success offering a broad spectrum of technical presentations and discussions including rigorously peer-reviewed papers, panels, tutorials, workshops and posters showcasing the latest findings from laboratories and research institutions around the world.

This year, the technical program encourages participants to focus on one of three thrust areas to be featured prominently at the conference: climate simulation, heterogeneous computing and data-intensive computing.

Climate simulation spotlights the tremendous importance of research in global climate change, including HPC-based climate simulation techniques which help scientists understand global warming, climate change and other environmental processes.

SC10’s other thrusts highlight important emerging HPC technologies. Heterogeneous computing covers the technological and research advances in software that are required for accelerator-based computing, which is now occurring on large-scale machines and could propel supercomputing to the exascale level, where machines are capable of running a million trillion calculations per second.

As scientists depend more and more on supercomputing in their research, they are generating massive amounts of data that must be shared, stored and analyzed by teams of remotely located collaborators. This global trend underlines the importance of data-intensive computing, SC10's third main thrust, highlighting research into innovative solutions for managing data across distributed high-performance computing systems, especially hardware and software requirements for effective data transfer.

Submissions for most areas of the SC10 technical program will be accepted beginning March 1. Technical paper abstracts are due April 2 and final papers as well as submissions for Tutorials and the ACM Gordon Bell Prize are due April 5.

Other upcoming submission deadlines include: Workshops, due April 15, 2010; the Student Cluster Competition, due April 16, 2010; and Panels, due April 26, 2010.

All submissions can be made online via: https://submissions.supercomputing.org/

For the entire list of technical program deadlines, visit:
http://sc10.supercomputing.org/?pg=dates.html

For any questions about the Technical program, email: program (at) info.supercomputing (dot) org

About SC10
SC10, sponsored by IEEE Computer Society and ACM (Association for Computing Machinery) offers a complete technical education program and exhibition to showcase the many ways high-performance computing, networking, storage and analysis lead to advances in scientific discovery, research, education and commerce. This premier international conference includes a globally attended technical program, workshops, tutorials, a world class exhibit area, demonstrations and opportunities for hands-on learning. For more information on SC10, please visit http://sc10.supercomputing.org

Wednesday, March 3, 2010

Cray's Custom Engineering Group to Work with Microsoft Research on Cloud Computing

Cray Inc. announced its custom engineering group will work with Microsoft Research to explore and prototype a system that could provide a glimpse into the future of cloud computing infrastructure. The initiative represents the custom engineering group's first breakthrough into the commercial market.

The objective of the technology development initiative is to design a supercomputing architecture that dramatically lowers the total cost of ownership for cloud computing datacenters. Cray's custom engineering group will design a system infrastructure that combines super efficient power delivery, high-density packaging and innovative cooling technologies. This solution is intended to significantly reduce facility, power and hardware costs.

Cray's custom engineering group delivers "technology-led" professional services. The group designs and delivers customized computing, data management and consulting solutions developed specifically to fit the individual needs of the customer. Cray is an innovation-driven company, and the custom engineering group provides customers with the ability to leverage Cray's research and development expertise and more than 25 years of broad supercomputing experience to develop unique solutions when currently available technology will not achieve a customer's requirements.

(This news item is summarized from HPCwire; the full text can be found on their site.)

Tuesday, March 2, 2010

Fixstars Launches Linux for CUDA

Multicore software specialist Fixstars Corporation has released Yellow Dog Enterprise Linux (YDEL) for CUDA, the first commercial Linux distribution for GPU computing. The OS is aimed at HPC customers using NVIDIA GPU hardware to accelerate their vanilla Linux clusters, and is designed to lower the overall cost of system deployment, the idea being to bring these still-exotic systems into the mainstream.

The problem, for a company whose Yellow Dog Linux grew up on the PowerPC and Cell platforms, is that the majority of future accelerated HPC deployments are destined to be GPU-based rather than Cell-based. While Cell had a brief fling with HPC stardom as the processor that powered the first petaflop system -- the Roadrunner supercomputer at Los Alamos National Lab -- IBM has signaled it will not continue development of the Cell architecture for HPC applications. With NVIDIA's steady evolution of its HPC portfolio, propelled by the popularity of its CUDA development environment, general-purpose GPU computing is now positioned to be the most widely used accelerator technology for high performance computing. The upcoming "Fermi" GPU-based boards (Q3 2010) substantially increase the GPU's double precision capability, add error-corrected memory, and include hardware support for C++ features.

Which brings us back to Fixstars. The company's new YDEL for CUDA offering is aimed squarely at filling what it sees as a growing market for turnkey GPU-accelerated HPC on x86 clusters. Up until now, customers either built their own Linux-CUDA environments or relied upon system OEMs to provide the OS integration as part of the system. That might be fine for experimenters and big national labs who love to tweak Linux and don't mind shuffling hardware drivers and OS kernels, but commercial acceptance will necessitate a more traditional model.

One of the challenges is that Red Hat and other commercial Linux distributions are generally tuned for mass market enterprise applications: large database and Web servers, in particular. In this type of setup, HPC workloads won't run as efficiently as they could. With YDEL, Fixstars modified the Red Hat kernel to support a more supercomputing-like workload. The result, according to Owen Stampflee, Fixstars' Linux Product Manager (and Terra Soft alum), is a 5 to 10 percent performance improvement on HPC apps compared to other commercial Linux distributions.

Fixstars is selling YDEL for CUDA as a typical enterprise distribution, which in this case means the CUDA SDK, hardware drivers, and Linux kernel pieces are bundled together and preconfigured for HPC. A product license includes Fixstars support for both Linux and CUDA. The product contains multiple versions of CUDA, which can be selected at runtime via a setting in a configuration file or an environment variable. In addition, the YDEL comes with an Eclipse-based graphical IDE for CUDA programming. To complete the picture, Fixstars also offers end-user training and seminars on CUDA application development.

(This news item is summarized from HPCwire; the full text can be found on their site.)
