Thursday, May 6, 2010

SGI Announces Next-Generation Altix ICE Scale-Out Supercomputer

SGI, a global leader in HPC and datacenter solutions, today announced the immediate availability of Altix ICE 8400, the next generation of the award-winning Altix ICE scale-out high performance computing (HPC) blade platform. SGI also announced today that it has submitted world-record-breaking SPEC benchmark performance results with Altix ICE 8400, demonstrating its readiness to implement multi-petascale supercomputers.

Altix ICE 8400 contains significant enhancements over prior Altix ICE generations, including through and through Quad Data Rate InfiniBand interconnect. SGI's elegantly-designed integrated backplane sports up to three times the link-to-node bandwidth relative to competitive QDR InfiniBand clusters to maximize performance where most job traffic occurs. Altix ICE 8400 supports up to 128 processors (1,536 cores) per cabinet with support for 130W CPU sockets. Three on-board Mellanox ConnectX-2 InfiniBand HCA compute blade configurations are also supported, and include single-port, dual-port or two single-port chipsets.

Altix ICE 8400, with its innovative blade design, easily and affordably scales to up to 65,536 compute nodes with integrated single or dual plane InfiniBand backplane interconnect. Open x86 architecture makes it equally simple to deploy commercial, open source or custom applications on completely unmodified Novell SUSE or Red Hat Linux operating systems.

Altix ICE 8400 easily meets the needs of the world's largest supercomputing deployments. Recognized for its design win at the NASA Advanced Supercomputing (NAS) facility at Ames Research Center, SGI helps Pleiades Supercomputer, the world's largest InfiniBand cluster, scale with an additional 32 cabinets of Altix ICE 8400 to nearly a petaflop. The system fully leverages SGI's hypercube topology to enable seamless cabinet-level upgrades without any production downtime, saving millions of core hours in the process.

(Full version of this article can be obtained from HPCwire's web pages)

Friday, April 23, 2010

Ceph: The Distributed File System Creature from the Object Lagoon

The last two years have seen a large number of file systems added to the kernel with many of them maturing to the point where they are useful, reliable, and in production in some cases. In the run up to the 2.6.34 kernel, Linus recently added the Ceph client. What is unique about Ceph is that it is a distributed parallel file system promising scalability and performance, something that NFS lacks.

High-level view of Ceph

One might ask about the origin of Ceph since it is somewhat unusual. Ceph is really short for Cephalopod, which is the class of molluscs to which the octopus belongs. So it’s really short for octopus, sort of. If you want more detail, take a look at the Wikipedia article about Ceph. Now that the name has been partially explained, let’s look at the file system.

Ceph was started by Sage Weil for his PhD dissertation at the University of California, Santa Cruz in the Storage Systems Research Center in the Jack Baskin School of Engineering. The lab is funded by the DOE/NNSA involving LLNL (Lawrence Livermore National Labs), LANL (Los Alamos National Labs), and Sandia National Laboratories. He graduated in the fall of 2007 and has kept developing Ceph. As mentioned previously, his efforts have been rewarded with the integration of the Ceph client into the upcoming 2.6.34 kernel.

The design goals of Ceph are to create a POSIX file system (or close to POSIX) that is scalable, reliable, and has very good performance. To reach these goals Ceph has the following major features:
  • It is object-based
  • It decouples metadata and data (many parallel file systems do this as well)
  • It uses a dynamic distributed metadata approach
These three features and how they are implemented are at the core of Ceph (more on that in the next section).

However, probably the most fundamental assumption in the design of Ceph is that large-scale storage systems are dynamic and that failures are guaranteed to occur. The first part of the assumption, that storage systems are dynamic, means that storage hardware is added and removed and that the workloads on the system change. The second part presumes that there will be hardware failures, so the file system needs to be adaptable and resilient.

(Full version of this article can be obtained from Linux Magazine's web pages)

Friday, April 16, 2010

New Cray OS Brings ISVs in for a Soft Landing

Cray has never made a big deal about the custom Linux operating system it packages with its XT supercomputing line. In general, companies don't like to tout proprietary OS environments since they tend to lock custom codes in and third-party ISV applications out. But the third generation Cray Linux Environment (CLE3) that the company announced is designed to make elite supercomputing an ISV-friendly experience.

Besides adding compatibility to off-the-shelf ISV codes, which we'll get to in a moment, the newly-minted Cray OS contains a number of other enhancements. In the performance realm, CLE3 increases overall scalability to greater than 500,000 cores (up from 200,000 in CLE2), adds Lustre 1.8 support, and includes some advanced scheduler features. Cray also added a feature called "core specialization," which allows the user to pin a single core on the node to the OS and devote the remainder to application code. According to Cray, on some types of codes, this can bump performance 10 to 20 percent. CLE3 also brings with it some additional reliability features, including NodeKARE, a diagnostic capability that makes sure jobs are running on healthy nodes.
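The idea behind core specialization can be illustrated with ordinary Linux CPU affinity. The following is only a sketch of the concept, not Cray's implementation: the function names are hypothetical, reserving the lowest-numbered core for the OS is an assumption, and the affinity calls used (os.sched_getaffinity and os.sched_setaffinity) are the standard Python wrappers that exist on Linux only.

```python
import os


def split_cores(cores, os_core_count=1):
    """Split a node's core IDs into (os_cores, app_cores).

    Reserving the lowest-numbered cores for the OS is an assumption made
    for illustration; it is not Cray's documented policy.
    """
    ordered = sorted(cores)
    if not 0 < os_core_count < len(ordered):
        raise ValueError("need at least one core for the OS and one for the application")
    return set(ordered[:os_core_count]), set(ordered[os_core_count:])


def pin_to_app_cores(os_core_count=1):
    """Restrict the calling process to the non-reserved cores (Linux only)."""
    if not hasattr(os, "sched_getaffinity"):
        return None  # affinity calls are unavailable on this platform
    available = os.sched_getaffinity(0)
    if len(available) <= os_core_count:
        return available  # too few cores to reserve one; leave affinity unchanged
    _, app_cores = split_cores(available, os_core_count)
    os.sched_setaffinity(0, app_cores)
    return app_cores


if __name__ == "__main__":
    # A hypothetical 12-core node: one core for the OS, eleven for the application.
    os_cores, app_cores = split_cores(range(12))
    print("OS cores:", sorted(os_cores))
    print("Application cores:", sorted(app_cores))
```

Keeping OS activity on a dedicated core means application threads on the remaining cores are interrupted less often, which is the kind of effect the 10 to 20 percent figure quoted by Cray refers to.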

But the biggest new feature added to CLE3 is compatibility with standard HPC codes from independent software vendors (ISVs). This new capability has the potential to open up a much broader market for Cray's flagship XT product line, and further blur the line between proprietary supercomputers and traditional HPC clusters.

Cray has had an on-again off-again relationship with HPC software vendors. Many of the established ISVs in this space grew up alongside Cray Research, and software from companies like CEI, LSTC, SIMULIA, and CD-adapco actually ran on the original Cray Research machines. Over time, these vendors migrated to standard x86 Linux and Windows systems, which became their prime platforms, and dropped products that required customized solutions for supercomputers. Cray left most of the commercial ISVs behind as it focused on high-end HPC and custom applications.

Programming Environment of CLE
The CLE programming environment includes tools designed to complement and enhance each other, resulting in a rich, easy-to-use programming environment that facilitates the development of scalable applications.
  • Parallel programming models: MPI, SHMEM, UPC, OpenMP, and Co-Array Fortran within the node
  • MPI 2.0 standard, optimized to take advantage of the scalable interconnect in the Cray XT system
  • Various MPI libraries supported under Cluster Compatibility Mode
  • Optimized C, C++, UPC, Fortran90, and Fortran 2003 compilers
  • High-performance optimized math libraries of BLAS, FFTs, LAPACK, ScaLAPACK, SuperLU, and Cray Scientific Libraries
  • Cray Apprentice2 performance analysis tools 

(Full version of this article can be obtained from HPCwire's web pages)

Monday, April 12, 2010

Product Review: Cray CX1000™ High(brid) Performance Computers

The Cray CX1000 series is a dense, power efficient and supremely powerful rack-mounted supercomputer featuring best-of-class technologies that can be mixed-and-matched in a single rack creating a customized hybrid computing platform to meet a variety of scientific workloads.

Cray has announced the Cray CX1000 system, a dense, power efficient and supremely powerful rack-mounted supercomputer that allows you to leverage the latest Intel® Xeon® processors for:
  • Scale-out cluster computing using dual-socket Intel Xeon 5600s (Cray CX1000-C)
  • Scale-through (GPU) computing leveraging NVIDIA Tesla® (Cray CX1000-G)
  • Scale-up computing with SMP nodes built on Intel’s QuickPath Interconnect (QPI) technology offering "fat memory" nodes (Cray CX1000-S)
High(brid) Performance Computing – The Cray CX1000 redefines HPC by delivering hybrid capabilities through a choice of chassis, each delivering one of the most important architectures of the next decade.

Cray CX1000-C Chassis
The compute-based Cray CX1000-C chassis includes 18 dual-socket Intel Xeon 5600 blades with an integrated 36-port QDR InfiniBand switch and a 24-port Gigabit Ethernet switch – all in 7U. With support for Windows® HPC Server 2008 or Red Hat Linux via the Cray Cluster Manager, the Cray CX1000-C system provides outstanding support for ISV applications as well as dual-boot capability for ultimate application flexibility. The Cray CX1000-C system maintains Cray's "Ease of Everything" approach by incorporating blades, switches and cabling all within a single chassis. The result is an easy-to-install system with compelling capabilities for scale-out high performance computing.
  • Two high-frequency Intel® Xeon® 5600 series processors (up to 2.93 GHz)
  • Large memory capacity (up to 48GB memory per blade with 4GB DDR3 DIMMs)
  • One SATA HDD or one SSD drive or diskless
Cray CX1000-G Chassis

The GPU-based Cray CX1000-G chassis delivers nine double-width, dual-socket Intel Xeon 5600 blades, each incorporating two NVIDIA Tesla GPUs. Cray CX1000-G systems allow users to maximize GPU performance with its unique architecture by eliminating I/O bottlenecks – an industry first. These 7U systems include an integrated 36-port QDR InfiniBand switch and a 24-port Gigabit Ethernet switch. The Cray CX1000-G system is the best solution to your density limitations by offering 18 NVIDIA Tesla GPUs in a 7U form factor. Combining Intel Xeon 5600 performance with NVIDIA Tesla-based acceleration offers true hybrid computing options.
  • Double-width blade
  • Two Intel® Xeon® 5600 series processors
  • Two NVIDIA® Tesla® M1060 GPUs
  • Up to 48GB of memory per blade with 8GB DDR3 DIMMs
  • Two ConnectX adapters providing single QDR IB channel
Cray CX1000-S Chassis
The SMP-based Cray CX1000-S server is offered in two configurations, offering up to 128 Intel® Xeon® 7500 series processor cores and 1 TB of memory in a 6U system. The Cray CX1000-SC compute node is made up of uniquely designed 1.5U "Building Blocks", each housing 32 cores interconnected using Intel QPI. The Cray CX1000-SM management node is a 3U server with four Intel Xeon 7500 series processors (32 cores) and up to 256 GB of memory.
  • Coherency switch – a proprietary feature based on Intel QPI technology allowing scalability from a single "building block" of 32 cores up to a maximum of 4 "building blocks" with 128 cores in 6U
  • Up to 1TB of memory (with 8GB DIMMS)
  • Support for applications requiring extensive I/O capacity

(More information about this product can be obtained from Cray's product pages)

IO Profiling of Applications: strace_analyzer

Strace is a very useful tool for examining the IO profile of applications, as it comes standard on every Linux distro. However, as we’ll see in this article, strace can produce hundreds of thousands of lines of output. Trying to develop statistics and trends from files of this size is virtually impossible to do by hand.

In this article, we will take a look at a tool to do a statistical analysis on strace output: strace_analyzer. This tool can take an individual strace file that has been created with the “-T -ttt” options and produce a statistical analysis of the IO portion of the strace. It also produces data files and .csv (comma delimited files for spreadsheets) files that can be used for plotting.
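To give a feel for the kind of analysis involved, here is a minimal sketch of summarizing strace output in Python. It is not strace_analyzer itself, and it covers far less. It assumes lines in the single-process format produced by "strace -T -ttt" (an epoch timestamp first, the elapsed time in angle brackets last). The function name, the set of calls treated as IO, and the sample lines are all made up for illustration.

```python
import re
from collections import defaultdict

# One line of "strace -T -ttt" output: timestamp, call(args) = return <elapsed>
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<call>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>-?\d+).*<(?P<dur>\d+\.\d+)>\s*$"
)

# Calls counted as IO here; an illustrative subset, not an exhaustive list.
IO_CALLS = {"open", "read", "write", "lseek", "close"}


def analyze(lines):
    """Return per-syscall count, total elapsed seconds, and bytes moved."""
    stats = defaultdict(lambda: {"count": 0, "seconds": 0.0, "bytes": 0})
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group("call") not in IO_CALLS:
            continue  # skip non-IO calls and lines that do not parse
        entry = stats[match.group("call")]
        entry["count"] += 1
        entry["seconds"] += float(match.group("dur"))
        returned = int(match.group("ret"))
        # For read/write, a positive return value is the number of bytes moved.
        if match.group("call") in ("read", "write") and returned > 0:
            entry["bytes"] += returned
    return dict(stats)


# Hypothetical sample lines, written by hand for this example.
SAMPLE = """\
1271404800.000100 open("/tmp/data.bin", O_RDONLY) = 3 <0.000020>
1271404800.000200 read(3, "abc"..., 4096) = 4096 <0.000050>
1271404800.000300 read(3, "def"..., 4096) = 1024 <0.000030>
1271404800.000400 write(1, "ok", 2) = 2 <0.000010>
1271404800.000500 close(3) = 0 <0.000005>
"""

if __name__ == "__main__":
    for call, entry in sorted(analyze(SAMPLE.splitlines()).items()):
        print(call, entry["count"], entry["bytes"], round(entry["seconds"], 6))
```

The same dictionary could be written out with Python's csv module to get spreadsheet-ready files similar in spirit to the .csv output mentioned above.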

(Full version of this article can be obtained from Linux Magazine's web pages)

Book Review: The OpenCL Programming Book

Fixstars Corporation announces a book which starts with the basics of parallelization, covers the main concepts, grammar, and setting up a development environment for OpenCL, concluding with source-code walkthroughs of the FFT and Mersenne Twister algorithms written in OpenCL. It is highly recommended for those wishing to get started on programming in OpenCL.

(Pricing and more information can be obtained from Fixstars' web pages.)

Wednesday, April 7, 2010

Host-Based Processing Eliminates Scaling Issues for InfiniBand Fabrics

Scientific, engineering, and research facilities rely on InfiniBand fabrics because they offer the highest available bandwidth and the lowest available latency. But depending on the design of the InfiniBand HCAs, this advantage can be squandered as the number of compute nodes scales up into the hundreds or thousands. One of the main challenges in efficient scaling is how and where InfiniBand protocol is processed.

Adapter-based vs. host-based processing
There are two basic ways to handle protocol processing, and the choice can make a huge difference in overall fabric performance, particularly as a cluster scales. Some vendors rely heavily on adapter-based ("offload") processing techniques, in which each InfiniBand host channel adapter (HCA) includes an embedded microprocessor that processes the communications protocols. Other vendors primarily use host-based processing, in which the server processes the communications protocols. In the early days of InfiniBand clusters, a typical server may have had just one or two single- or dual-core processors. With the ability to issue one instruction per clock cycle at a relatively low clock rate, these servers benefitted from having communications processing offloaded to the host channel adapter.

(Full version of this article can be obtained from HPCwire's web pages)

Friday, April 2, 2010

Red Hat Focuses New RHEL 5.5 on Multicore

Open-source enterprise software company Red Hat has updated its flagship operating system, Red Hat Enterprise Linux (RHEL), to take full advantage of the latest spoils from the heated microprocessor battle between Advanced Micro Devices and Intel.

RHEL version 5.5, released Wednesday, has been reconfigured for Intel's just-released eight-core Nehalem-EX and AMD's almost-as-recently released 12-core "Magny-Cours" Opteron 6100 Series processors. The software also supports the IBM eight-core Power7 processors, released in February.
RHEL 5.5 also now supports Single Root I/O Virtualization (SR-IOV), a specification that allows multiple virtual guests to better share PCI hardware resources and I/O devices. While some I/O-intensive applications, such as database servers, can experience as much as a 30 percent reduction in performance when virtualized, these new technologies could reduce that penalty to as little as 5 percent.

Beyond support for the new round of multicore releases, RHEL 5.5 has a number of other new features as well. It has been updated to extend Active Directory integration, through the use of the latest version of Samba file- and print-sharing software. Also, for the first time, RHEL's version of SystemTap can trace the run-time performance of C++ applications (much like Oracle's DTrace does for Solaris' applications).
RHEL 5.5 also aggregates all the bug fixes and maintenance patches since the release of RHEL 5.4, released last September.

RHEL 5.5 is available for download for subscribers.

(This news item is sourced from an external publication; the full version can be found on its web pages)

Thursday, April 1, 2010

AMD Launches Intel Counter-Assault with New Opteron Chips

AMD has officially launched its Opteron 6100 series processors, code-named "Magny-Cours." Available in 8-core and 12-core flavors, the new 6100 parts are targeted for 2P and 4P server duty and are being pitched against Intel's latest high-end Xeon silicon: the 6-core Westmere EP processor for 2P servers and the upcoming 8-core Nehalem EX processor for 4P-and-above servers.

With the 6100 launch, AMD's battle with Intel for the high-end x86 server market enters a new era. In the two-socket server space, Intel's Westmere EP retains the speed title, clock-frequency-wise. At the same time, Nehalem EX, due to be announced tomorrow, will give Intel exclusive ownership of the 8P-and-above x86 server market. Meanwhile, AMD will use Magny-Cours to try to outmaneuver Intel with better price-performance and performance-per-watt on two-socket and four-socket machines.

While Intel can still deliver faster cores on its Westmere EP, thanks in part to its 32nm process technology, AMD, with its 45nm technology, has opted to go for more cores that run proportionally slower. The fastest Westmere EP CPUs top out at 3.33 GHz for the 6-core version and 3.46 GHz for the 4-core version. In contrast, the speediest 12-core and 8-core Magny-Cours come in at 2.3 GHz and 2.4 GHz respectively.

The $266 to $1,386 price spread for Magny-Cours will look especially attractive for large-scale 4P setups compared to the more expensive Nehalem EX. (As of Monday, prices on the EX series have not been announced, but are expected to range between $800 and $3,600.) For HPC deployments in particular, where hundreds or thousands of nodes are involved, the up-front cost savings are likely to be significant.

(This article is summarized from HPCwire; the full version can be found on their web pages)

Tuesday, March 30, 2010

Product Review: PGI Workstation

PGI Workstation™ is PGI's single-user scientific and engineering compilers and tools product. PGI Workstation is available in three language versions:
  • PGI Fortran Workstation—Fortran only 
  • PGI C/C++ Workstation—C and C++ only 
  • PGI Fortran/C/C++ Workstation—combined Fortran and C/C++ 
PGI Fortran Workstation includes The Portland Group's native parallelizing/optimizing FORTRAN 77, Fortran 90/95/03 and HPF compilers for 64-bit x64 and 32-bit x86 processor-based Linux, Apple Mac OS X and Microsoft Windows workstations. PGI Fortran Workstation provides the features, quality, and reliability necessary for developing and maintaining advanced scientific and technical applications.

PGI parallel compilers and tools harness the full power of x64+GPU systems for science and engineering applications. PGI’s industry-leading performance, reliability, native multi-core and OpenMP support, GPGPU programming, and parallel-capable graphical debugging and profiling tools provide a complete state-of-the-art programming environment for scientists and engineers. PGI’s support for legacy language and programming features ensures that existing applications will port easily and quickly to the latest-generation multi-core x64+GPU processor-based systems.

PGI C/C++ Workstation includes The Portland Group's native parallelizing/optimizing OpenMP C++ and ANSI C compilers. The C++ compiler closely tracks the proposed ANSI standard and is compatible with cfront versions 2 and 3. All C++ functions are compatible with Fortran and C functions, so you can compose programs from components written in all three languages.

PGI Workstation includes the OpenMP and MPI enabled PGDBG parallel debugger and PGPROF performance profiler that can debug and profile up to eight local MPI processes. PGI Workstation also includes several versions of precompiled MPICH message passing libraries.
PGI Workstation includes a single user node-locked license for Linux, Mac OS X or Microsoft Windows. Volume packs of five or more single user node-locked licenses are also available.

Volume packs are multi-platform; licenses may be mixed by operating system up to the maximum count. PGI Server offers the same features as PGI Workstation but includes a multi-user network floating license.

PGI Workstation for both Mac OS X and Windows consists of command-level versions of the PGI compilers and both command-level and graphical versions of the PGDBG debugger and PGPROF performance profiler. An integrated development environment (IDE) is neither provided nor supported. As a separate product, PGI Visual Fortran fully integrates PGI Fortran compilers and tools into Microsoft Windows using Microsoft Visual Studio.

This product targets 64-bit x64 and 32-bit x86 workstations with one or more single core or multi-core microprocessors.

(Detailed product info can be obtained from the manufacturer's web pages)

Thursday, March 25, 2010

How Supercomputing is Revolutionizing Nuclear Power

Out of all the carbon-free power options, nuclear power faces some of the highest hurdles to commercial-scale deployment. The upfront costs for reactors are in the billions, the projects take years to site and build, and nuclear materials and designs have to undergo testing for decades to make sure they can be used in the field. That’s one reason why nuclear research costs a whole lot of money and the pace of innovation seems incredibly slow. But that’s also the reason why supercomputing has started to truly revolutionize the world of nuclear power innovation.

Supercomputing, or “extreme computing” as the Department of Energy described it during a workshop on computing and nuclear power last year, involves computers at the petaflop scale. It will eventually reach even exaflop scale. A computer running at a petaflop can do 1 million billion calculations in a second, and an exaflop of performance can deliver a billion billion calculations per second.
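The two scales are easy to compare with a little arithmetic. The sketch below uses the definitions given above (a petaflop as 10^15 calculations per second, an exaflop as 10^18); the workload size is a made-up number chosen only to make the comparison concrete.

```python
PETAFLOP = 10**15  # 1 million billion calculations per second
EXAFLOP = 10**18   # a billion billion calculations per second


def seconds_to_finish(operations, flops):
    """Idealized run time: total operations divided by sustained rate."""
    return operations / flops


# Hypothetical workload of a billion billion calculations.
workload = 10**18

print(seconds_to_finish(workload, PETAFLOP))  # 1000.0 seconds at a petaflop
print(seconds_to_finish(workload, EXAFLOP))   # 1.0 second at an exaflop
```

Real applications rarely sustain a machine's full rate, so these figures are best-case bounds rather than predictions.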

That massive amount of number crunching can help developers of nuclear power technology simulate next-generation designs of nuclear reactors, show how advanced fuels in a reactor could be consumed over time, and model more efficient waste disposal and refueling efforts. It’s all about being able to go through very complex and lengthy research and development processes much more quickly and with far less cost compared to both physical testing and using less powerful computers.

(This article is sourced from an external publication; the original version can be found on its web pages.)

Wednesday, March 24, 2010

XtreemOS 2.1 Release Announced

The XtreemOS consortium is pleased to announce the release of XtreemOS 2.1.
This update release will include:
  • Improved installer with a new xosautoconfig tool to greatly simplify and automate installation of XtreemOS instances.
  • A number of high impact bug fixes, along with work on stability and correctness.
  • XtreemFS 1.2, which has a number of new features along with enhanced performance and stability.
  • XtreemOS MD (Mobile Device) -- This new version integrates XtreemOS on Internet Tablets, beginning with the Nokia N8xx models. This makes it possible to launch jobs and interact with XtreemOS resources via a special client with a simple single sign-on.
  • Virtual Nodes -- a framework to provide fault tolerance for grid applications by replicating them over multiple nodes.
  • XOSSAGA -- a set of technologies to allow you to run SAGA compliant applications on top of XtreemOS unmodified.
An updated list of Mandriva Mirrors can be found at and
The ISO files for XtreemOS releases are in the folder /devel/xtreemos/iso/2.1
This release has concentrated strictly on bug-fixes and polishing. You can find the change log at
All users are encouraged to test the new ISO and report any issues to our bug tracker at

About XtreemOS
XtreemOS 2.1 is the result of an ongoing project with 18 academic and industrial partners into the design and implementation of an open source grid operating system that includes native support for virtual organizations (VOs) and ease of use. XtreemOS runs on a wide range of hardware, from smartphones to PCs and Linux clusters.

A set of system services, extending those found in traditional Linux, provides users with all the grid capabilities associated with current grid middleware, but fully integrated into the OS. Based on Linux, XtreemOS provides distributed support for VOs spanning across many machines and sites along with appropriate interfaces for grid OS services.

When installed on a participating machine, the XtreemOS system provides for the grid what an operating system offers for a single computer: abstraction from the hardware and secure resource sharing between different users. XtreemOS gives users the vision of a large, powerful single-workstation environment while removing the complex resource management issues of a grid environment.

Tuesday, March 23, 2010

SGI Octane III: Supercomputing Gets Personal

SGI Octane III takes high performance computing out of the data center and puts it at the deskside. It combines the immense power and performance capabilities of a high-performance cluster with the portability and usability of a workstation to enable a new era of personal innovation in strategic science, research, development and visualization.

In contrast with standard 2P workstations with only eight cores and moderate memory capacity, Octane III's superior design permits up to 80 high-performance cores and nearly 1TB of memory. Octane III significantly accelerates time-to-results for over 50 HPC applications and supports the latest Intel® processors to capitalize on greater levels of performance, flexibility and scalability. Because the system comes pre-configured with system software, cluster setup is a breeze.

Plug It In and It Works
Octane III is office ready with a pedestal, one foot by two foot form factor, whisper quiet operation, ease of use features, low maintenance requirements and support for standard office power outlets. A single, conveniently placed button turns it on and off. Octane III enjoys the same cost saving power efficiencies inherent in all SGI Eco-Logical™ compute designs.

Easily Configurable for Deskside HPC
Octane III is optimized for your specific, high performance computing requirements and ships as a factory-tested, pre-integrated platform with broad HPC application support and arrives ready for immediate integration for a smooth out-of-the-box experience.

Octane III allows a wide variety of single and dual-socket node choices and a wide selection of performance, storage, integrated networking, and graphics and compute GPU options. The system is available as an up to ten node deskside cluster configuration or dual-node graphics workstation configurations.

Supported operating systems include:
Microsoft® Windows® HPC Server 2008, SUSE Linux® Enterprise Server, or Red Hat® Enterprise Linux. All Linux-based configurations are available with pre-loaded SGI® ProPack™ system software, SGI® Isle™ Cluster Manager and Altair PBS Professional™ scheduler to get you up and running quickly.

A High-Performance Graphics Lineage
Capitalize on advanced graphics capabilities, brought to you by the same company that invented advanced graphics capabilities. Conceived, designed and built by SGI on the shoulders of its ground-breaking workstations from the past, Octane III is available as a single-node dual-socket graphics workstation with support for the fastest NVIDIA® graphics and compute GPU cards. Octane III is optimized for use with multiple display solutions for expanded advanced visual computing scenarios.

(To access detailed product info, please visit the manufacturer's product pages)

Monday, March 22, 2010

Moscow State University Supercomputer Has Petaflop Aspirations

The Moscow State University (MSU) supercomputer, Lomonosov, has been selected for a high-performance makeover, with the goal of nearly tripling its current processing power to achieve petaflop-level performance in 2010. T-Platforms, who developed and manufactured the supercomputer, is the odds-on favorite to lead the project.

With a current Linpack mark of 350 teraflops (peak: 420 teraflops), Lomonosov needs to generate an additional 650 teraflops of performance to achieve its goal. No small task. So far, there are only two computers that have broken the Linpack petaflop barrier, Jaguar at Oak Ridge National Lab, which holds the number one position on the TOP500 list, and Roadrunner at Los Alamos National Lab, with the number two spot. Lomonosov ranks 12th on the most recent edition of the TOP500 list and is the largest HPC system in the CIS and Eastern Europe.

Officials at Moscow State University held a meeting this week in order to establish a budget for the petaflops revamping of the Lomonosov system. According to Russian State Duma Speaker Boris Gryzlov, MSU has prepared a feasibility study on the effectiveness of creating a petaflops supercomputer, and the matter will be brought up to the President and Chairman of the government for approval.

Total university funding in 2010 from the Russian federal government will amount to 1.5 billion rubles ($51 million). The anticipated cost of increasing the computer's performance to reach petaflop-level is about 900 million rubles, or almost $31 million, according to Moscow State University President Victor Sadovnichy. MSU has already invested 350 million rubles ($12 million) in the Lomonosov system, and the total project cost so far is 1.9 billion rubles ($65 million). MSU is ready to provide up to a quarter of the cost of hardware, said Sadovnichy.

Apparently, the amounts specified to upgrade the system refer only to the procurement and installation of equipment, and do not include system maintenance and electricity costs. Current power requirements are around 5 MW, which, according to Sadovnichy, is comparable to powering a small city.
"Lomonosov" and its predecessor "Chebyshev" are responsible for many research breakthroughs, including an inhibitor of thrombin (a substance retarding the effect of the main component of blood clotting), as well as the development of urokinase, a possible cancer treatment. In addition to these undertakings, Lomonosov has been kept busy modeling climate processes, factoring large integers to solve cryptographic problems, and calculating the noise in turbulent environments.

The renovation work for transforming Lomonosov into a petaflop system is being put to a competitive bid, but it seems likely that T-Platforms will get the contract since it is the only Russian manufacturer with the know-how to implement such a project. And there's a partiality toward assigning work to national interests. State Duma Speaker Boris Gryzlov, who backs the creation of a domestic petaflop supercomputer, prefers to support domestic producers of supercomputers, and urged caution against the procurement of foreign goods.
Mikhail Kozhevnikov, commercial director for T-Platforms, has already prepared a bid and decided upon an upgrade path for the petaflop system. The details of the proposed architecture have not been publicly declared; however, a good guess would be that they're going to add new nodes based on the Westmere EP Xeon processors Intel just announced.

Specifically, since the current MSU super is based on the T-Blade2 with Xeon X5570 2.93 GHz processors, it's not unreasonable to think they're bidding T-Blade3 blades using Xeon X5670 2.93 GHz parts (note, the T-Blade3 blades don't actually exist yet). Since the new Xeons only deliver about 40 percent more computational performance per blade than the existing ones, they'll still need a bunch more servers. Alternatively, they could be thinking about upgrading with the upcoming NVIDIA Fermi GPU server boards, due out in May. That would get them to a petaflop with a lot less hardware. (A dual-socket X5670 server would yield about 250 DP gigaflops; a 4-GPU Fermi server would probably deliver over 2 DP teraflops.)
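The "lot less hardware" claim can be checked with a back-of-the-envelope calculation using only the figures quoted in this article: the 650 teraflops still needed, about 250 gigaflops per dual-socket X5670 server, and 2 teraflops per 4-GPU Fermi server. It is a rough sketch: the function name is made up, and it treats the per-server estimates as if they translated directly into Linpack performance, which real systems will not achieve.

```python
def servers_needed(target_gflops, per_server_gflops):
    """Smallest whole number of servers whose combined rate reaches the target."""
    return -(-target_gflops // per_server_gflops)  # integer ceiling division


ADDITIONAL_GFLOPS = 650_000   # 650 teraflops still needed, per the article
XEON_SERVER_GFLOPS = 250      # dual-socket X5670 server, the article's estimate
FERMI_SERVER_GFLOPS = 2_000   # 4-GPU Fermi server; "over 2 DP teraflops"

print(servers_needed(ADDITIONAL_GFLOPS, XEON_SERVER_GFLOPS))   # 2600
print(servers_needed(ADDITIONAL_GFLOPS, FERMI_SERVER_GFLOPS))  # 325
```

On these assumptions the GPU route needs roughly one eighth as many servers, and since the Fermi figure is quoted as a lower bound, 325 is if anything an overestimate.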

Russian Prime Minister Vladimir Putin has allocated 1.1 billion rubles ($37 million) to develop supercomputer technologies in Russia, according to a recent APA report, further demonstrating Russia's desire to possess a world-class computer system, one that may be capable of a place among the top 5 of the revered TOP500 list. Barring any unforeseen circumstances, it looks like the Lomonosov upgrade will go forward, and Russia will take its place on the exclusive short-list of petaflop systems. But, in HPC, the final goal is always a moving target, as other groups also race for the coveted petaflops level and beyond.

(This article sourced from HPCwire.)

Sunday, March 21, 2010

Hazelcast: The Art of Data Distribution

Hazelcast is an open source clustering and highly scalable data distribution platform for Java, which is:
  • Lightning-fast; thousands of operations per second.
  • Fail-safe; no data is lost after crashes.
  • Dynamically scalable as new servers are added.
  • Super-easy to deploy and use (include a single jar).

Hazelcast is pure Java. JVMs that are running Hazelcast will dynamically cluster. Although by default Hazelcast will use multicast for discovery, it can also be configured to use only TCP/IP for environments where multicast is not available or not preferred. The program is released under the Apache License and the project is hosted at Google Code. It can be freely used in commercial or non-commercial applications.

When Hazelcast?

Hazelcast will help you when you need to:
  • Share data/state among many servers (e.g. web session sharing)
  • Cache your data (distributed cache)
  • Cluster your application
  • Provide secure communication among servers
  • Partition your in-memory data
  • Distribute workload onto many servers
  • Take advantage of parallel processing
  • Provide fail-safe data management
(For more info please visit their documentation page.)

Friday, March 19, 2010

PRACE Grants 4.3 Million Core Hours to Prototype Systems

Six projects, two from France and one each from Norway, Denmark, the UK, and the Netherlands, have been granted access to the PRACE (Partnership for Advanced Computing in Europe) prototype systems. These projects will spend a total of 4,311,272 core hours on the PRACE prototypes. So far, PRACE has granted a total of over 8.7 million core hours on the PRACE prototypes.

The purpose of granting this access is to enable future Tier-0 users to assess the prototypes and to prepare their applications for the petaflop infrastructure. The evaluation process has focused on technical feasibility and the expected benefits of the tests both for PRACE and the prototype users.

Project titles:
  • Cryptanalytic Performance Evaluation
  • Solar Atmospheric Modelling
  • Ab Initio Calculation of Complex Doping of a Photovoltaic Material
  • Porting MESO-NH to PRACE Prototype
  • Incompact3d: High Performance Computing for Turbulence
  • Porting of GADGET2 to GPUs

More information about PRACE prototypes can be found at their web pages.

Tuesday, March 16, 2010

Intel Ups Performance Ante with Westmere Server Chips

Right on schedule, Intel has launched its Xeon 5600 processors, codenamed "Westmere EP." The 5600 represents the 32nm sequel to the Xeon 5500 (Nehalem EP) for dual-socket servers. Intel is touting better performance and energy efficiency, along with new security features, as the big selling points of the new Xeons.

For the HPC crowd, the performance improvements are the big story. Thanks in large part to the 32nm transistor size, Intel was able to incorporate six cores and 12 MB of L3 cache on a single die -- a 50 percent increase compared to the Xeon 5500 parts. According to Intel, that translated into a 20 to 60 percent boost in application performance and 40 percent better performance per watt.

Using the High Performance Linpack benchmark, Intel is reporting a 61 percent improvement for a 6-core 5600 compared to its 4-core Xeon 5500 predecessor (146 gigaflops versus 91 gigaflops). You might be wondering how this was accomplished, given that the 5600 comes with only 50 percent more cores and cache. It turns out that Intel's comparison was based on the top-of-the-line Xeon chip from each processor family. The 146 gigaflops result was delivered by an X5680 processor, which runs at 3.33 GHz and has a TDP of 130 watts, while the 91 gigaflops mark was turned in by the X5570 processor, which runs at 2.93 GHz and has a TDP of 95 watts. Correcting for clock speed, the 5600 Linpack would be something closer to 128 gigaflops, representing a still-respectable 41 percent boost.
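
The clock-speed correction can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, and it rests on the simplifying assumption that Linpack throughput scales linearly with clock frequency; the function and variable names are illustrative.

```python
def clock_normalized_gflops(gflops, measured_ghz, reference_ghz):
    """Scale a result to a reference clock, assuming linear scaling with frequency."""
    return gflops * reference_ghz / measured_ghz


# Figures quoted in the article.
X5680_GFLOPS, X5680_GHZ = 146.0, 3.33  # six-core Xeon 5600 part
X5570_GFLOPS, X5570_GHZ = 91.0, 2.93   # four-core Xeon 5500 part

# What the X5680 result would look like at the X5570's 2.93 GHz clock.
normalized = clock_normalized_gflops(X5680_GFLOPS, X5680_GHZ, X5570_GHZ)
boost_pct = (normalized / X5570_GFLOPS - 1.0) * 100.0

print(f"Clock-normalized 5600 Linpack: {normalized:.0f} gigaflops")
print(f"Boost over the X5570: {boost_pct:.0f} percent")
```

The result lands at roughly 128 gigaflops and a 41 percent gain, which is a little under the 50 percent increase in core count.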

Intel also reported performance improvements across a range of technical HPC workloads. These include a 20 percent boost on memory bandwidth (using Stream-MP), a 21 percent average improvement with a number of CAE codes, a 44 percent average improvement for life science codes, and a 63 percent improvement using a Black Scholes financial benchmark. These results also reflect the same 3.33/2.93 GHz clock speed bias discussed in the Linpack test, so your mileage may vary.

Looking at the performance per watt metric, the new 5600 chips also have a clear edge. An apples-to-apples comparison of the X5570 (2.93 GHz, 95 watts) and the X5670 (2.93 GHz, 95 watts) has the latter chip delivering 40 percent more performance per watt. That's to be expected since two extra cores are available on the X5670 to do extra work.

Intel is also offering low-power 40 and 60 watt versions of the 5600 alongside the mainstream 80, 95, and 130 watt offerings. These low-power versions would be especially useful where energy consumption, rather than performance, is the driving factor. For example, a 60 watt L5640 matches the raw performance of a 95 watt X5570, potentially saving 30 percent in power consumption. Intel is even offering a 30 watt L3406, aimed at the single-processor microserver segment. Other power-saving goodies that come with the 5600 include a more efficient Turbo Boost and memory power management facility, automated low power states for six cores, and support for lower power DDR3 memory.

The Xeon 5600 parts are socket-compatible with the 5500 processors and can use the same chipsets, making a smooth upgrade path for system OEMs. Like their 5500 predecessors, the 5600s support DDR3 memory to the tune of three memory channels per socket. Practically speaking, that means two cores share a memory channel when all six cores are running full blast.

The enterprise market will be pleased by the new on-chip security features in the 5600 architecture. First, there are new AES instructions for accelerating database encryption, whole disk encryption and secure internet transactions. The 5600 also offers what Intel is calling Trusted Execution Technology (TXT). TXT can be used to prevent the insertion of malicious VM software at bootup in a virtualized cloud computing environment.

Although the 5600 family will bring Intel into the mainstream six-core server market, the company is offering new four-core parts as well. In fact, the fastest clock is owned by the X5677, a quad-core processor that tops out at 3.46 GHz. These top-of-the-line four-core versions might find a happy home with many HPC users, in particular where single-threaded application performance is paramount. This would be especially true for workloads that tend to be memory-bound, since in this case more cores might actually drag down performance by incurring processing overhead while waiting for a memory channel to open up.

Intel's marketing strategy for the Xeon 5600 is not that different from its 5500 sales pitch: improved processor efficiencies generate quick payback on the investment. For the 5600, the claim is that you can replace 15 racks of single-core Xeons with a single rack of the new chips, that is, as long as you don't need any more performance. Intel is touting a five-month payback for this performance-neutral upgrade.

On the other hand, if you need 15 times the performance, you can do a 1:1 replacement of your single-core servers and still realize about eight percent in energy savings. But since software support and server warranty costs dominate maintenance expenses, any energy savings might get swallowed up by these other costs.

Intel says it is keeping the prices on the 5600 processors in line with the Xeon 5500s, although the new processor series spans a wider range of offerings. At the low end, you have the L3406, a 30 watt 2.26 GHz part with four cores and just 4 MB of L3. It goes for just $189. At the top end are the six-core X5680 and the four-core X5677, both of which are offered at $1,663. Prices quoted are in quantities of 1,000.

In conjunction with Intel's launch, a number of HPC OEMs are also announcing new systems based on the Xeon 5600 series. For example, Cray announced its CX1 line of deskside machines will now come with the new chips. SGI is also incorporating the new Xeons into its portfolio, including the Altix ICE clusters, the InfiniteStorage servers, and the Octane III personal super. SGI will also use the new chips in its just-announced Origin 400 workgroup blade solution. IBM, HP and Dell are likewise rolling out new x86 servers based on the 5600 processors.

AMD is looking to trump Intel's latest Xeon offerings with its new Magny-Cours 6100 series Opteron processors, which are set to launch at the end of the month. The new Opterons come in 8- and 12-core flavors and are debuting alongside AMD's new G34 chipset. Although the Opterons lack the HyperThreading technology of the Xeons, the additional physical cores and fourth memory channel should make up for this. Also unlike the 5600 architecture, the Opteron 6100 supports both 2-socket and 4-socket systems, giving server makers additional design flexibility. In any case, the x86 rivalry is quickly heating up as the two chipmakers battle for market share in 2010.

(This article is sourced from HPCwire; the original text can be found on their web pages.)

Monday, March 15, 2010

DEISA PRACE Symposium 2010

DEISA, the Distributed European Infrastructure for Supercomputing Applications, and PRACE, the Partnership for Advanced Computing in Europe, are once again inviting participants to their joint annual science symposium, an important European HPC event. The DEISA PRACE Symposium 2010 will take place from May 10 to May 12 in Barcelona, Spain.

(Registration and more info can be found on the DEISA web pages.)

Wednesday, March 10, 2010

HPC Training Course: 5 - 6 May 2010, University College, London, UK

  • Introduction to High Performance Computing.
  • Introduction to the DEISA Infrastructure.

DEISA is running two training courses at University College, London, in early May 2010. Both will be based around a number of practical programming exercises. No prior knowledge is assumed for either of the courses.

The first course on Wednesday 5th May is an "Introduction to High Performance Computing". It will cover the fundamentals of modern HPC architectures and the two major parallel programming models: shared variables and message passing. Practical sessions will involve running existing parallel programs to investigate issues such as performance and scalability.

The second course on Thursday 6th May is an "Introduction to the DEISA Infrastructure". This will cover the basic aspects of the DEISA distributed supercomputer environment and the software tools that are used to access it, including the Application Hosting Environment (AHE). Practical sessions will involve installing software on the desktop and using it to access the DEISA systems.

Courses are available free for academic attendees. If the courses become over-subscribed, preference will be given to members of the Virtual Physiological Human Network of Excellence.

Those attending are encouraged to use their own laptops for both courses.

(To register, please fill in the form at their web pages)

Friday, March 5, 2010

NCSA to provide Ember as shared-memory resource for nation's researchers

The National Center for Supercomputing Applications (NCSA) will soon deploy a new highly parallel shared memory supercomputer, called Ember. With a peak performance of 16 teraflops, Ember doubles the performance of its predecessor, the five-year-old Cobalt system.

Ember will be available to researchers through the National Science Foundation's TeraGrid until that program concludes in March 2011 and then will be allocated through its successor, the eXtreme Digital program.

(The full story can be reached at NCSA's site)

Thursday, March 4, 2010

SC10 is now accepting submissions for its technical program.

SC10, the premier international conference on high-performance computing, networking, storage and analysis, is now accepting submissions for its technical program. The 23rd annual conference in the series, SC10 will take place in New Orleans, Louisiana from November 13-19, 2010. Over 11,000 attendees from industry, academia and government are anticipated.

Drawing on expertise from the international HPC community, SC10 will build on over two decades of success offering a broad spectrum of technical presentations and discussions including rigorously peer-reviewed papers, panels, tutorials, workshops and posters showcasing the latest findings from laboratories and research institutions around the world.

This year, the technical program encourages participants to focus on one of three thrust areas to be featured prominently at the conference: climate simulation, heterogeneous computing and data-intensive computing.

Climate simulation spotlights the tremendous importance of research in global climate change, including HPC-based climate simulation techniques which help scientists understand global warming, climate change and other environmental processes.

SC10’s other thrusts highlight important emerging HPC technologies. Heterogeneous computing covers the technological and research advances in software that are required for accelerator-based computing, which is now occurring on large-scale machines and could propel supercomputing to the exascale level, where machines are capable of running a million trillion calculations per second.

As scientists depend more and more on supercomputing in their research, they are generating massive amounts of data that must be shared, stored and analyzed by teams of remotely located collaborators. This global trend underlines the importance of data-intensive computing, SC10's third main thrust, highlighting research into innovative solutions for managing data across distributed high-performance computing systems, especially hardware and software requirements for effective data transfer.

Submissions for most areas of the SC10 technical program will be accepted beginning March 1. Technical paper abstracts are due April 2 and final papers as well as submissions for Tutorials and the ACM Gordon Bell Prize are due April 5.

Other immediate submissions deadlines include: Workshops, which are due April 15, 2010; the Student Cluster Competition, which is due by April 16, 2010; as well as Panel submissions, which are due April 26, 2010.

All submissions can be made online via:

For the entire list of technical program deadlines, visit:

For any questions about the Technical program, email: program (at) info.supercomputing (dot) org

About SC10
SC10, sponsored by IEEE Computer Society and ACM (Association for Computing Machinery) offers a complete technical education program and exhibition to showcase the many ways high-performance computing, networking, storage and analysis lead to advances in scientific discovery, research, education and commerce. This premier international conference includes a globally attended technical program, workshops, tutorials, a world class exhibit area, demonstrations and opportunities for hands-on learning. For more information on SC10, please visit

Wednesday, March 3, 2010

Cray's Custom Engineering Group to Work with Microsoft Research on Cloud Computing

Cray Inc. announced its custom engineering group will work with Microsoft Research to explore and prototype a system that could provide a glimpse into the future of cloud computing infrastructure. The initiative represents the custom engineering group's first breakthrough into the commercial market.

The objective of the technology development initiative is to design a supercomputing architecture that dramatically lowers the total cost of ownership for cloud computing datacenters. Cray's custom engineering group will design a system infrastructure that combines super efficient power delivery, high-density packaging and innovative cooling technologies. This solution is intended to significantly reduce facility, power and hardware costs.

Cray's custom engineering group delivers "technology-led" professional services. The group designs and delivers customized computing, data management and consulting solutions developed specifically to fit the individual needs of the customer. Cray is an innovation-driven company, and the custom engineering group provides customers with the ability to leverage Cray's research and development expertise and more than 25 years of broad supercomputing experience to develop unique solutions when currently available technology will not achieve a customer's requirements.

(This news is summarized from HPCwire; the full text can be found on their site.)

Tuesday, March 2, 2010

Fixstars Launches Linux for CUDA

Multicore software specialist Fixstars Corporation has released Yellow Dog Enterprise Linux (YDEL) for CUDA, the first commercial Linux distribution for GPU computing. The OS is aimed at HPC customers using NVIDIA GPU hardware to accelerate their vanilla Linux clusters, and is designed to lower the overall cost of system deployment, the idea being to bring these still-exotic systems into the mainstream.

The problem is that the majority of future HPC accelerated deployments is destined to be GPU-based, rather than Cell-based. While Cell had a brief fling with HPC stardom as the processor that powered the first petaflop system -- the Roadrunner supercomputer at Los Alamos National Lab -- IBM has signaled it will not continue development of the Cell architecture for HPC applications. With NVIDIA's steady evolution of its HPC portfolio, propelled by the popularity of its CUDA development environment, general-purpose GPU computing is now positioned to be the most widely used accelerator technology for high performance computing. The upcoming "Fermi" GPU-based boards (Q3 2010) substantially increase the GPU's double precision capability, add error corrected memory, and include hardware support for C++ features.

Which brings us back to Fixstars. The company's new YDEL for CUDA offering is aimed squarely at filling what it sees as a growing market for turnkey GPU-accelerated HPC on x86 clusters. Up until now, customers either built their own Linux-CUDA environments or relied upon system OEMs to provide the OS integration as part of the system. That might be fine for experimenters and big national labs who love to tweak Linux and don't mind shuffling hardware drivers and OS kernels, but commercial acceptance will necessitate a more traditional model.

One of the challenges is that Red Hat and other commercial Linux distributions are generally tuned for mass market enterprise applications: large database and Web servers, in particular. In this type of setup, HPC workloads won't run as efficiently as they could. With YDEL, Fixstars modified the Red Hat kernel to support a more supercomputing-like workload. The result, according to Owen Stampflee, Fixstars' Linux Product Manager (and Terra Soft alum), is a 5 to 10 percent performance improvement on HPC apps compared to other commercial Linux distributions.

Fixstars is selling YDEL for CUDA as a typical enterprise distribution, which in this case means the CUDA SDK, hardware drivers, and Linux kernel pieces are bundled together and preconfigured for HPC. A product license includes Fixstars support for both Linux and CUDA. The product contains multiple versions of CUDA, which can be selected at runtime via a setting in a configuration file or an environment variable. In addition, the YDEL comes with an Eclipse-based graphical IDE for CUDA programming. To complete the picture, Fixstars also offers end-user training and seminars on CUDA application development.

(This news is summarized from HPCwire; the full text can be found on their site.)

Thursday, February 25, 2010

Helsinki to recycle excess heat from data center

Helsinki public energy company Helsingin Energia will recycle heat from a new data center to help generate energy and deliver hot water for the Finnish capital city.

The recycled heat from the data center, being built by IT and telecom services company Academica, could add about 1 percent to the total energy generated by Helsingin Energia's system in the summer.

The data center is located in an old bomb shelter and is connected to Helsingin Energia's district heating system, which works by pumping boiling water through a system of pipes to households in Helsinki.

The plan calls for the data center to first get cold water from Helsingin Energia's system. The water then goes through the data center to cool down the equipment. Next, the now warmer water flows to a pump that heats the water and sends it into the district heating system. The pump also cools the water and sends it back to the data center.

The ability of the heat pump to both heat and cool water is what makes it special. The pump is also very efficient.

The data center will go live at the end of January, and will at first measure 500 square meters.

Academica had always planned to use water to cool the data center and lower electricity bills for customers. The idea to recycle excess energy came later. However, recycling could end up playing an important role.

(This story is summarized from ITworld; the full story can be found on their web pages.)

Saturday, February 20, 2010

JRT Offering the Tesla Workstation

JRT's new Tesla Workstation delivers accelerated multi-core processing power. Designed to deliver groundbreaking performance and power efficiency for compute- and graphics-intensive environments, the new JRT Tesla Workstation lets you create, design, render, and analyze without compromise.

The new JRT Tesla Workstation offers outstanding performance, incredible graphics, and up to 64 GB of memory for technical and graphics-intensive computing. The Tesla Workstation supports up to two 64-bit Dual/Quad-Core Intel Xeon 5200/5400 series processors and supports the full NVIDIA Quadro graphics and Tesla accelerator product lines. It is designed with an all-new performance architecture for research-critical, compute-intensive and graphically demanding workstation environments.

The JRT Tesla Workstation offers the latest high-end graphics cards, which give high-level graphics performance for the most demanding visual applications in industries such as oil and gas, CAD, animation and 3D modeling.

Key Features

  • Dual / Quad-Core Intel Xeon Processors
  • Up to 64 GB of Memory
  • Dual PCI Express x16 Slot
  • High Performance NVIDIA Quadro Graphics Card
  • Up to 8 TB of Hot-Swap Storage
  • Whisper Quiet Workstation (28 dB)
  • NVIDIA Tesla C1060 Computing Processor

(For more information visit the product pages)

Thursday, February 18, 2010

A Strategic Application Collaboration for Molecular Dynamics

Over the last two decades, an increasing number of chemists have turned to the computer to predict the results of experiments beforehand or to help interpret the results of experiments. Skepticism on the part of laboratory chemists has gradually evaporated as the computational results have made contact with, and even anticipated, experimental findings. When the 1998 Nobel Prize in Chemistry was awarded recently to two scientists, Walter Kohn and John Pople, who originated some of the first successful methods in computational chemistry, the award was seen as an affirmation of the value of computational chemistry to the field of chemistry.

"We've come a long way," said Peter Kollman of the Department of Pharmaceutical Chemistry at UC San Francisco (UCSF). "But while we've come a long way, we can see that we've still got a long way to go."

Now, as part of an NPACI Strategic Application Collaboration, AMBER's performance is being improved by 50 percent to 65 percent.

AMBER stands for Assisted Model Building with Energy Refinement. The code's successes include its use to study protein folding, to study the relative free energies of binding of two ligands to a given host (or two hosts to a given ligand), to investigate the sequence-dependent stability of proteins and nucleic acids, and to find the relative solvation free energies of different molecules in various liquids. Hundreds of contributions to the scientific literature reflect the use of AMBER.

(This news is summarized from the San Diego Supercomputer Center; the original full text can be found on their web site.)

Tuesday, February 16, 2010

Appro HyperPower™ Cluster - Featuring Intel Xeon CPU and NVIDIA® Tesla™ GPU computing technologies

The amount of raw data that must be processed for research analysis in drug discovery, oil and gas exploration, and computational finance creates a huge demand for computing power. In addition, 3D visualization analysis data has grown substantially in recent years, moving visualization centers from the desktop to GPU clusters. To meet these performance and memory capacity needs, Appro clusters and supercomputers combine the latest CPUs with GPUs based on NVIDIA® Tesla™ computing technologies. The combination delivers the best performance at lower cost and with fewer systems than standard CPU-only clusters. With 240 processor cores per GPU, a C-language development environment for the GPU, a suite of developer tools, and the world's largest GPU computing ISV development community, the Appro HyperPower GPU clusters give scientific and technical professionals the opportunity to develop applications faster and to deploy them across multiple generations of processors.

The Appro HyperPower cluster features high density 1U servers based on Intel® Xeon® processors and NVIDIA® Tesla™ GPU cards onboard. It also includes interconnect switches for node-to-node communication, master node, and clustering software all integrated in a 42U standard rack configuration. It supports up to 304 CPU cores and 18,240 GPU cores with up to 78TF single/6.56 TF double precision GPU performance. By using fewer systems than standard CPU-only clusters, the HyperPower delivers more computing power in an ultra dense architecture at a lower cost.
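
The per-rack figures can be cross-checked with a short Python sketch. It assumes the 240 cores per GPU quoted earlier in this post applies to every GPU in the rack; the per-GPU throughput numbers it prints are derived from the rack totals, not taken from any vendor data sheet.

```python
# Rack-level figures quoted in the article.
RACK_GPU_CORES = 18_240
RACK_SP_TFLOPS = 78.0   # single precision
RACK_DP_TFLOPS = 6.56   # double precision

# Assumption: every GPU in the rack has the 240 cores mentioned in the post.
CORES_PER_GPU = 240

gpus_per_rack = RACK_GPU_CORES // CORES_PER_GPU
sp_tflops_per_gpu = RACK_SP_TFLOPS / gpus_per_rack
dp_gflops_per_gpu = RACK_DP_TFLOPS / gpus_per_rack * 1000.0

print(f"GPUs per 42U rack: {gpus_per_rack}")
print(f"Single precision per GPU: {sp_tflops_per_gpu:.2f} TF")
print(f"Double precision per GPU: {dp_gflops_per_gpu:.0f} GF")
```

The core count divides evenly into 76 GPUs, and the derived per-GPU numbers show how large the gap between single and double precision is on this generation of hardware.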

In addition, the Appro HyperPower cluster gives customers a choice of configurations with open-source commercially supported cluster management solutions that can easily be tested and pre-integrated as a part of a complete package to include HPC professional services and support.

Ideal Environment:
Ideal solution for small and medium size HPC Deployments. The target markets are Government, Research Labs, Universities and vertical industries such as Oil and Gas, Financial and Bioinformatics where the most computationally-intensive applications are needed.

Installed Software
The Appro HyperPower is preconfigured with the following software:
- Redhat Enterprise Linux 5.x, 64-bit
- CUDA 2.2 Toolkit and SDK
- Clustering software (Rocks Roll)

CUDA Applications
The CUDA-based Tesla GPUs give speed-ups of up to 250x on applications ranging from MATLAB to computational fluid dynamics, molecular dynamics, quantum chemistry, imaging, signal processing, bioinformatics, and so on.

(This news is sourced from Appro Ltd.; more can be found on their web site.)
