Showing posts with label Intel. Show all posts

Wednesday, November 9, 2011

Intel stretches HPC dev tools across chubby clusters



SC11 Supercomputing hardware and software vendors are getting impatient for the SC11 supercomputing conference in Seattle, which kicks off next week. More than a few have jumped the gun with product announcements this week, including chipmaker Intel.

No, Intel is not going to launch its "Sandy Bridge-EP" Xeon E5 processors, which are expected early next year. But the new Cluster Studio XE toolset for HPC customers will help those lucky few HPC and cloud shops that have been able to get systems this year to squeeze more performance out of their Xeon E5 clusters.

The Cluster Studio XE stack includes a slew of Intel tools for creating, tuning, and monitoring parallel applications running on x86-based parallel clusters. Intel had already been selling a set of application tools called Cluster Studio, which bundled up the chip giant's C, C++, and Fortran compilers, its rendition of the message passing interface (MPI) messaging protocol that allows server nodes to share work, and various math and multithreading libraries to goose the performance of applications.

With the XE (Extended Edition) of the HPC cluster tools, Intel is goosing the performance of the MPI library. It claims its MPI 4.0.3 stack is anywhere from 3.3 to 6.5 times as fast as the OpenMPI 1.5.4 and MVAPICH2 1.6 MPI stacks from the open source community. The benchmark tests were run on a 64-node system running 768 processes (12 per node) linked by InfiniBand switches.

Intel also tested the Platform Computing MPI 8.1.1 stack against the three MPI stacks listed above, this time on an eight-node system; in this case the performance differences between Intel and Platform (which is now owned by IBM) were not huge. Against the Microsoft MPI 3.2 stack on the same iron, the Intel MPI stack running on Windows servers was anywhere from 2.17 to 2.74 times faster.

Read full story at theregister.co.uk

Monday, April 12, 2010

Product Review: Cray CX1000™ High(brid) Performance Computers


The Cray CX1000 series is a dense, power-efficient and supremely powerful rack-mounted supercomputer featuring best-in-class technologies that can be mixed and matched in a single rack, creating a customized hybrid computing platform to meet a variety of scientific workloads.

Cray has announced the Cray CX1000 system, a dense, power-efficient and supremely powerful rack-mounted supercomputer that allows you to leverage the latest Intel® Xeon® processors for:
  • Scale-out cluster computing using dual-socket Intel Xeon 5600s (Cray CX1000-C)
  • Scale-through (GPU) computing leveraging NVIDIA Tesla® (Cray CX1000-G)
  • Scale-up computing with SMP nodes built on Intel’s QuickPath Interconnect (QPI) technology offering "fat memory" nodes (Cray CX1000-S)
High(brid) Performance Computing – The Cray CX1000 redefines HPC by delivering hybrid capabilities through a choice of chassis, each delivering one of the most important architectures of the next decade.


Cray CX1000-C Chassis
The compute-based Cray CX1000-C chassis includes 18 dual-socket Intel Xeon 5600 blades with an integrated 36-port QDR InfiniBand switch and a 24-port Gigabit Ethernet switch – all in 7U. With support for Windows® HPC Server 2008 or Red Hat Linux via the Cray Cluster Manager, the Cray CX1000-C system provides outstanding support for ISV applications as well as dual-boot capability for ultimate application flexibility. The Cray CX1000-C system maintains Cray's "Ease of Everything" approach by incorporating blades, switches and cabling all within a single chassis. The result is an easy-to-install system with compelling capabilities for scale-out high performance computing.
  • Two high-frequency Intel® Xeon® 5600 series processors (up to 2.93 GHz)
  • Large memory capacity (up to 48GB memory per blade with 4GB DDR3 DIMMs)
  • One SATA HDD or one SSD drive or diskless
Cray CX1000-G Chassis

The GPU-based Cray CX1000-G chassis delivers nine double-width, dual-socket Intel Xeon 5600 blades, each incorporating two NVIDIA Tesla GPUs. Cray CX1000-G systems let users maximize GPU performance with a unique architecture that eliminates I/O bottlenecks – an industry first. These 7U systems include an integrated 36-port QDR InfiniBand switch and a 24-port Gigabit Ethernet switch. The Cray CX1000-G system addresses density limitations by offering 18 NVIDIA Tesla GPUs in a 7U form factor. Combining Intel Xeon 5600 performance with NVIDIA Tesla-based acceleration offers true hybrid computing options.
  • Double-width blade
  • Two Intel® Xeon® 5600 series processors
  • Two NVIDIA® Tesla® M1060 GPUs
  • Up to 48GB of memory per blade with 8GB DDR3 DIMMs
  • Two ConnectX adapters providing a single QDR InfiniBand channel
Cray CX1000-S Chassis
The SMP-based Cray CX1000-S server comes in two configurations, offering up to 128 Intel® Xeon® 7500 series processor cores and 1 TB of memory in a 6U system. The Cray CX1000-SC compute node is made up of uniquely designed 1.5U "Building Blocks", each housing 32 cores interconnected using Intel QPI. The Cray CX1000-SM management node is a 3U server with four Intel Xeon 7500 series processors (32 cores) and up to 256 GB of memory.
  • Coherency switch – a proprietary feature based on Intel QPI technology allowing scalability from a single "building block" of 32 cores up to a maximum of 4 "building blocks" with 128 cores in 6U
  • Up to 1TB of memory (with 8GB DIMMs)
  • Support for applications requiring extensive I/O capacity
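The CX1000-S figures hang together arithmetically: four 32-core building blocks give 128 cores, four 1.5U blocks fill the 6U chassis, and 1 TB built from 8GB DIMMs implies 128 DIMMs. The short Python sketch below simply re-derives those numbers from the values quoted in the bullets; the function name `cx1000s_config` is hypothetical and not part of any Cray tool.

```python
# Re-derive the Cray CX1000-S headline numbers from the per-block figures.
# All constants come from the product description; cx1000s_config is a
# hypothetical helper written only for this illustration.
BLOCK_HEIGHT_U = 1.5   # each "Building Block" is 1.5U
CORES_PER_BLOCK = 32   # 32 cores per block, joined via Intel QPI
MAX_BLOCKS = 4         # coherency switch scales from 1 to 4 blocks
DIMM_GB = 8            # "Up to 1TB of memory (with 8GB DIMMs)"


def cx1000s_config(blocks: int) -> dict:
    """Return total cores and chassis height for a given block count."""
    if not 1 <= blocks <= MAX_BLOCKS:
        raise ValueError("the coherency switch joins 1 to 4 building blocks")
    return {
        "cores": blocks * CORES_PER_BLOCK,
        "height_u": blocks * BLOCK_HEIGHT_U,
    }


full = cx1000s_config(MAX_BLOCKS)
dimms_for_1tb = 1024 // DIMM_GB  # 1 TB = 1024 GB of 8GB DIMMs
print(full, dimms_for_1tb)  # {'cores': 128, 'height_u': 6.0} 128
```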

(More information about this product can be found on Cray's product pages.)

Friday, April 2, 2010

Red Hat Focuses New RHEL 5.5 on Multicore

Open-source enterprise software company Red Hat has updated its flagship operating system, Red Hat Enterprise Linux (RHEL), to take full advantage of the latest spoils from the heated microprocessor battle between Advanced Micro Devices and Intel.

RHEL version 5.5, released Wednesday, has been reconfigured for Intel's just-released eight-core Nehalem-EX and AMD's almost-as-recently released 12-core "Magny-Cours" Opteron 6100 Series processors. The software also supports the IBM eight-core Power7 processors, released in February.
RHEL 5.5 also now supports Single Root I/O Virtualization (SR-IOV), a specification that allows multiple virtual guests to better share PCI hardware resources and I/O devices. While some I/O-intensive applications, such as database servers, can suffer as much as a 30 percent reduction in performance when virtualized, technologies such as SR-IOV could cut that penalty to as little as 5 percent.


Beyond support for the new round of multicore processors, RHEL 5.5 has a number of other new features as well. It extends Active Directory integration through the latest version of the Samba file- and print-sharing software. Also, for the first time, RHEL's version of SystemTap can trace the run-time performance of C++ applications (much like Oracle's DTrace does for Solaris applications).
RHEL 5.5 also rolls up all the bug fixes and maintenance patches issued since RHEL 5.4 was released last September.

RHEL 5.5 is available to subscribers for download.

(This news item is sourced from PCWorld.com; the full version can be found on their web pages.)

Thursday, April 1, 2010

AMD Launches Intel Counter-Assault with New Opteron Chips

AMD has officially launched its Opteron 6100 series processors, code-named "Magny-Cours." Available in 8-core and 12-core flavors, the new 6100 parts are targeted for 2P and 4P server duty and are being pitched against Intel's latest high-end Xeon silicon: the 6-core Westmere EP processor for 2P servers and the upcoming 8-core Nehalem EX processor for 4P-and-above servers.

With the 6100 launch, AMD's battle with Intel for the high-end x86 server market enters a new era. In the two-socket server space, Intel's Westmere EP retains the speed title, clock-frequency-wise. At the same time, Nehalem EX, due to be announced tomorrow, will give Intel exclusive ownership of the 8P-and-above x86 server market. Meanwhile, AMD will use Magny-Cours to try to outmaneuver Intel with better price-performance and performance-per-watt on two-socket and four-socket machines.

While Intel can still deliver faster cores on its Westmere EP, thanks in part to its 32nm process technology, AMD, with its 45nm technology, has opted to go for more cores that run proportionally slower. The fastest Westmere EP CPUs top out at 3.33 GHz for the 6-core version and 3.46 GHz for the 4-core version. In contrast, the speediest 12-core and 8-core Magny-Cours come in at 2.3 GHz and 2.4 GHz respectively.
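One crude way to see the "more, slower cores" trade-off is to multiply core count by clock speed for each chip named above. This is only an illustration under a strong simplifying assumption: it ignores per-clock throughput, Turbo Boost, HyperThreading, cache and memory effects, so it says nothing definitive about real application performance.

```python
# Crude per-socket "aggregate clock" comparison: cores x GHz.
# Simplifying assumption: every core does equal work per cycle, which is
# not true across Intel and AMD microarchitectures.
chips = {
    "Westmere EP, 6-core @ 3.33 GHz": (6, 3.33),
    "Westmere EP, 4-core @ 3.46 GHz": (4, 3.46),
    "Magny-Cours, 12-core @ 2.3 GHz": (12, 2.3),
    "Magny-Cours, 8-core @ 2.4 GHz": (8, 2.4),
}


def aggregate_ghz(cores: int, ghz: float) -> float:
    """Total core-GHz per socket, rounded to two decimals."""
    return round(cores * ghz, 2)


for name, (cores, ghz) in chips.items():
    print(f"{name}: {aggregate_ghz(cores, ghz)} core-GHz")
```

On this naive measure the 12-core Magny-Cours (27.6 core-GHz) comes out ahead of the 6-core Westmere EP (about 20 core-GHz), which is the bet AMD is making; Intel's faster individual cores still matter for code that does not scale across many threads.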

The $266 to $1,386 price spread for Magny-Cours will look especially attractive for large-scale 4P setups compared to the more expensive Nehalem EX. (As of Monday, prices on the EX series had not been announced, but are expected to range between $800 and $3,600.) For HPC deployments in particular, where hundreds or thousands of nodes are involved, the up-front cost savings are likely to be significant.


(This article is summarized from HPCwire; the full version can be found on their web pages.)

Tuesday, March 16, 2010

Intel Ups Performance Ante with Westmere Server Chips

Right on schedule, Intel has launched its Xeon 5600 processors, codenamed "Westmere EP." The 5600 represents the 32nm sequel to the Xeon 5500 (Nehalem EP) for dual-socket servers. Intel is touting better performance and energy efficiency, along with new security features, as the big selling points of the new Xeons.

For the HPC crowd, the performance improvements are the big story. Thanks in large part to the 32nm transistor size, Intel was able to incorporate six cores and 12 MB of L3 cache on a single die -- a 50 percent increase compared to the Xeon 5500 parts. According to Intel, that translated into a 20 to 60 percent boost in application performance and 40 percent better performance per watt.

Using the high performance Linpack benchmark, Intel is reporting a 61 percent improvement for a 6-core 5600 compared to its 4-core Xeon 5500 predecessor (146 gigaflops versus 91 gigaflops). You might be wondering how this was accomplished, given that the 5600 comes with only 50 percent more cores and cache. It turns out that Intel's comparison was based on the top-of-the-line Xeon chip from each processor family. The 146 gigaflops result was delivered by an X5680 processor, which runs at 3.33 GHz and has a TDP of 130 watts, while the 91 gigaflops mark was turned in by the X5570 processor, which runs at 2.93 GHz and has a TDP of 95 watts. Correcting for clock speed, the 5600 Linpack result would be something closer to 128 gigaflops, representing a still-respectable 41 percent boost.
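The clock-speed correction above is a simple linear rescaling: multiply the X5680 result by the ratio of the two clock speeds (2.93/3.33) and compare it with the X5570 result. A minimal Python sketch, assuming Linpack performance scales linearly with frequency (only approximately true, since Turbo Boost and memory effects break the proportionality):

```python
def clock_normalize(gflops: float, from_ghz: float, to_ghz: float) -> float:
    """Rescale a benchmark result linearly from one clock speed to another.

    Assumes performance is proportional to clock frequency.
    """
    return gflops * to_ghz / from_ghz


x5680_gflops, x5680_ghz = 146.0, 3.33  # Xeon 5600 (Westmere EP)
x5570_gflops, x5570_ghz = 91.0, 2.93   # Xeon 5500 (Nehalem EP)

raw_gain = x5680_gflops / x5570_gflops - 1
normalized = clock_normalize(x5680_gflops, x5680_ghz, x5570_ghz)
normalized_gain = normalized / x5570_gflops - 1

print(f"raw gain: {raw_gain:.0%}")
print(f"normalized: {normalized:.0f} gigaflops, gain {normalized_gain:.0%}")
```

With the rounded gigaflops figures quoted here, the raw ratio works out to about 60 percent, close to the 61 percent Intel reports; the clock-normalized result lands at roughly 128 gigaflops and a 41 percent gain.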

Intel also reported performance improvements across a range of technical HPC workloads. These include a 20 percent boost on memory bandwidth (using Stream-MP), a 21 percent average improvement with a number of CAE codes, a 44 percent average improvement for life science codes, and a 63 percent improvement using a Black Scholes financial benchmark. These results also reflect the same 3.33/2.93 GHz clock speed bias discussed in the Linpack test, so your mileage may vary.

Looking at the performance per watt metric, the new 5600 chips also have a clear edge. An apples-to-apples comparison of the X5570 (2.93 GHz, 95 watts) and X5670 (2.93 GHz, 95 watts) has the latter chip delivering 40 percent more performance per watt. That's to be expected, since the X5670 has two extra cores available to do extra work.

Intel is also offering low-power 40 and 60 watt versions of the 5600 alongside the mainstream 80, 95, and 130 watt offerings. These low-power versions would be especially useful where energy consumption, rather than performance, is the driving factor. For example, a 60 watt L5640 matches the raw performance of a 95 watt X5570, potentially saving 30 percent in power consumption. Intel is even offering a 30 watt L3406, aimed at the single-processor microserver segment. Other power-saving goodies that come with the 5600 include a more efficient Turbo Boost and memory power management facility, automated low power states for six cores, and support for lower power DDR3 memory.

The Xeon 5600 parts are socket-compatible with the 5500 processors and can use the same chipsets, making a smooth upgrade path for system OEMs. Like their 5500 predecessors, the 5600s support DDR3 memory to the tune of three memory channels per socket. Practically speaking, that means two cores share a memory channel when all six cores are running full blast.
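The channel-sharing point can be made concrete with peak-bandwidth arithmetic. The sketch below assumes DDR3-1333 memory (1333 mega-transfers per second over an 8-byte channel), which is an assumption for illustration rather than a figure from the article, and computes how many cores share each channel and the theoretical peak bandwidth available per core for four-core and six-core parts. Sustained bandwidth, as measured by benchmarks such as STREAM, is lower in practice.

```python
# Theoretical peak DDR3 bandwidth per core on a three-channel socket.
# Assumption: DDR3-1333 DIMMs (1333 MT/s) on a 64-bit (8-byte) channel.
CHANNELS_PER_SOCKET = 3
BYTES_PER_TRANSFER = 8


def peak_gbs_per_channel(mt_per_s: int) -> float:
    """Peak GB/s for one channel: transfers/s times bytes per transfer."""
    return mt_per_s * BYTES_PER_TRANSFER / 1000


def per_core(cores: int, mt_per_s: int = 1333) -> tuple[float, float]:
    """Return (cores per channel, peak GB/s per core) for one socket."""
    socket_gbs = CHANNELS_PER_SOCKET * peak_gbs_per_channel(mt_per_s)
    return cores / CHANNELS_PER_SOCKET, socket_gbs / cores


for cores in (4, 6):
    cores_per_channel, gbs = per_core(cores)
    print(f"{cores} cores: {cores_per_channel:.2f} cores/channel, "
          f"{gbs:.2f} GB/s per core")
```

Under these assumptions, a six-core socket gets about 5.3 GB/s of peak bandwidth per core versus about 8 GB/s for a four-core part, which is one reason the faster quad-core 5600s can appeal for memory-bound codes.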

The enterprise market will be pleased by the new on-chip security features in the 5600 architecture. First, there are the new AES instructions for accelerating database encryption, whole-disk encryption and secure internet transactions. The 5600 also offers what Intel calls Trusted Execution Technology (TXT). TXT can be used to prevent the insertion of malicious VM software at bootup in a virtualized cloud computing environment.

Although the 5600 family will bring Intel into the mainstream six-core server market, the company is offering new four-core parts as well. In fact, the fastest clock is owned by the X5677, a quad-core processor that tops out at 3.46 GHz. These top-of-the-line four-core versions might find a happy home with many HPC users, in particular where single-threaded application performance is paramount. This would be especially true for workloads that tend to be memory-bound, since in this case more cores might actually drag down performance by incurring processing overhead while waiting for a memory channel to open up.

Intel's marketing strategy for the Xeon 5600 is not that different from its 5500 sales pitch: improved processor efficiencies generate quick payback on the investment. For the 5600, the claim is that you can replace 15 racks of single-core Xeons with a single rack of the new chips, that is, as long as you don't need any more performance. Intel is touting a five-month payback for this performance-neutral upgrade.


On the other hand, if you need 15 times the performance, you can do a 1:1 replacement of your single-core servers and still realize about eight percent in energy savings. But since software support and server warranty costs dominate maintenance expenses, any energy savings might get swallowed up by these other costs.
Intel says it is keeping the prices on the 5600 processors in line with the Xeon 5500s, although the new processor series spans a wider range of offerings. At the low end, you have the L3406, a 30 watt 2.26 GHz part with four cores and just 4 MB of L3. It goes for just $189. At the top end are the six-core X5680 and the four-core X5677, both of which are offered at $1,663. Prices quoted are in quantities of 1,000.

In conjunction with Intel's launch, a number of HPC OEMs are also announcing new systems based on the Xeon 5600 series. For example, Cray announced its CX1 line of deskside machines will now come with the new chips. SGI is also incorporating the new Xeons into its portfolio, including the Altix ICE clusters, the InfiniteStorage servers, and the Octane III personal super. SGI will also use the new chips in its just-announced Origin 400 workgroup blade solution. IBM, HP and Dell are likewise rolling out new x86 servers based on the 5600 processors.

AMD is looking to trump Intel's latest Xeon offerings with its new Magny-Cours 6100 series Opteron processors, which are set to launch at the end of the month. The new Opterons come in 8- and 12-core flavors and are debuting alongside AMD's new G34 socket. Although the Opterons lack the HyperThreading technology of the Xeons, the additional physical cores and fourth memory channel should make up for this. Also unlike the 5600 architecture, the Opteron 6100 supports both 2-socket and 4-socket systems, giving server makers additional design flexibility. In any case, the x86 rivalry is quickly heating up as the two chipmakers battle for market share in 2010.

(This article is sourced from HPCwire; the original text can be found on their web pages.)
