
Wednesday, April 7, 2010

Host-Based Processing Eliminates Scaling Issues for InfiniBand Fabrics

Scientific, engineering, and research facilities rely on InfiniBand fabrics because they offer the highest available bandwidth and the lowest available latency. But depending on the design of the InfiniBand HCAs, this advantage can be squandered as the number of compute nodes scales up into the hundreds or thousands. One of the main challenges in efficient scaling is how and where the InfiniBand protocol is processed.

Adapter-based vs. host-based processing
There are two basic ways to handle protocol processing, and the choice can make a huge difference in overall fabric performance, particularly as a cluster scales. Some vendors rely heavily on adapter-based ("offload") processing techniques, in which each InfiniBand host channel adapter (HCA) includes an embedded microprocessor that processes the communications protocols. Other vendors primarily use host-based processing, in which the server processes the communications protocols. In the early days of InfiniBand clusters, a typical server may have had just one or two single- or dual-core processors. Able to issue only about one instruction per clock cycle at a relatively low clock rate, these servers benefitted from having communications processing offloaded to the host channel adapter.


(The full version of this article is available on HPCwire's web pages.)

Thursday, March 25, 2010

How Supercomputing is Revolutionizing Nuclear Power

Out of all the carbon-free power options, nuclear power faces some of the highest hurdles to commercial-scale deployment. The upfront costs for reactors are in the billions, the projects take years to site and build, and nuclear materials and designs have to undergo testing for decades to make sure they can be used in the field. That’s one reason why nuclear research costs a whole lot of money and the pace of innovation seems incredibly slow. But that’s also the reason why supercomputing has started to truly revolutionize the world of nuclear power innovation.

Supercomputing, or “extreme computing” as the Department of Energy described it during a workshop on computing and nuclear power last year, involves computers at the petaflop scale and will eventually reach the exaflop scale. A computer running at a petaflop can do 1 million billion (10^15) calculations in a second, while an exaflop of performance delivers a billion billion (10^18) calculations per second.
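To put those rates in perspective, here is a quick back-of-the-envelope calculation (a simple sketch using bc; the 10^18-operation workload is just an illustrative figure, not one from the DOE workshop):

# seconds needed to complete 10^18 calculations at each rate
echo "10^18 / 10^15" | bc   # 1000 seconds on a 1-petaflop machine
echo "10^18 / 10^18" | bc   # 1 second on a 1-exaflop machine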

That massive amount of number crunching can help developers of nuclear power technology simulate next-generation designs of nuclear reactors, show how advanced fuels in a reactor could be consumed over time, and model more efficient waste disposal and refueling efforts. It’s all about being able to go through very complex and lengthy research and development processes much more quickly and with far less cost compared to both physical testing and using less powerful computers.

(This article is sourced from earth2tech.com; the original version is available on their web pages.)

Friday, March 5, 2010

NCSA to provide Ember as shared-memory resource for nation's researchers

The National Center for Supercomputing Applications (NCSA) will soon deploy a new highly parallel shared memory supercomputer, called Ember. With a peak performance of 16 teraflops, Ember doubles the performance of its predecessor, the five-year-old Cobalt system.

Ember will be available to researchers through the National Science Foundation's TeraGrid until that program concludes in March 2011 and then will be allocated through its successor, the eXtreme Digital program.

(The full story is available on NCSA's site.)

Thursday, March 4, 2010

SC10 is now accepting submissions for its technical program.

SC10, the premier international conference on high-performance computing, networking, storage and analysis, is now accepting submissions for its technical program. The 23rd annual conference in the series, SC10 will take place in New Orleans, Louisiana from November 13-19, 2010. Over 11,000 attendees from industry, academia and government are anticipated.

Drawing on expertise from the international HPC community, SC10 will build on over two decades of success offering a broad spectrum of technical presentations and discussions including rigorously peer-reviewed papers, panels, tutorials, workshops and posters showcasing the latest findings from laboratories and research institutions around the world.

This year, the technical program encourages participants to focus on one of three thrust areas to be featured prominently at the conference: climate simulation, heterogeneous computing and data-intensive computing.

Climate simulation spotlights the tremendous importance of research in global climate change, including HPC-based climate simulation techniques which help scientists understand global warming, climate change and other environmental processes.

SC10’s other thrusts highlight important emerging HPC technologies. Heterogeneous computing covers the technological and research advances in software that are required for accelerator-based computing, which is now occurring on large-scale machines and could propel supercomputing to the exascale level, where machines are capable of running a million trillion calculations per second.

As scientists depend more and more on supercomputing in their research, they are generating massive amounts of data that must be shared, stored and analyzed by teams of remotely located collaborators. This global trend underlines the importance of data-intensive computing, SC10's third main thrust, highlighting research into innovative solutions for managing data across distributed high-performance computing systems, especially hardware and software requirements for effective data transfer.

Submissions for most areas of the SC10 technical program will be accepted beginning March 1. Technical paper abstracts are due April 2 and final papers as well as submissions for Tutorials and the ACM Gordon Bell Prize are due April 5.

Other upcoming submission deadlines include: Workshops, due April 15, 2010; the Student Cluster Competition, due April 16, 2010; and Panel submissions, due April 26, 2010.

All submissions can be made online via: https://submissions.supercomputing.org/

For the entire list of technical program deadlines, visit:
http://sc10.supercomputing.org/?pg=dates.html

For any questions about the Technical program, email: program (at) info.supercomputing (dot) org

About SC10
SC10, sponsored by IEEE Computer Society and ACM (Association for Computing Machinery), offers a complete technical education program and exhibition to showcase the many ways high-performance computing, networking, storage and analysis lead to advances in scientific discovery, research, education and commerce. This premier international conference includes a globally attended technical program, workshops, tutorials, a world-class exhibit area, demonstrations and opportunities for hands-on learning. For more information on SC10, please visit http://sc10.supercomputing.org

Wednesday, March 3, 2010

Cray's Custom Engineering Group to Work with Microsoft Research on Cloud Computing

Cray Inc. announced that its custom engineering group will work with Microsoft Research to explore and prototype a system that could provide a glimpse into the future of cloud computing infrastructure. The initiative represents the custom engineering group's first move into the commercial market.

The objective of the technology development initiative is to design a supercomputing architecture that dramatically lowers the total cost of ownership for cloud computing datacenters. Cray's custom engineering group will design a system infrastructure that combines super efficient power delivery, high-density packaging and innovative cooling technologies. This solution is intended to significantly reduce facility, power and hardware costs.

Cray's custom engineering group delivers "technology-led" professional services. The group designs and delivers customized computing, data management and consulting solutions developed specifically to fit the individual needs of the customer. Cray is an innovation-driven company, and the custom engineering group provides customers with the ability to leverage Cray's research and development expertise and more than 25 years of broad supercomputing experience to develop unique solutions when currently available technology will not meet a customer's requirements.

(This news is summarized from HPCwire; the full text is available on their site.)

Thursday, February 18, 2010

A Strategic Application Collaboration for Molecular Dynamics

Over the last two decades, an increasing number of chemists have turned to the computer to predict the results of experiments beforehand or to help interpret the results of experiments. Skepticism on the part of laboratory chemists has gradually evaporated as the computational results have made contact with, and even anticipated, experimental findings. When the 1998 Nobel Prize in Chemistry was awarded to two scientists, Walter Kohn and John Pople, who originated some of the first successful methods in computational chemistry, the award was seen as an affirmation of the value of computational chemistry to the field as a whole.

"We've come a long way," said Peter Kollman of the Department of Pharmaceutical Chemistry at UC San Francisco (UCSF). "But while we've come a long way, we can see that we've still got a long way to go."

Now, as part of an NPACI Strategic Application Collaboration, AMBER's performance is being improved by 50 percent to 65 percent.

AMBER stands for Assisted Model Building with Energy Refinement. The code's successes include its use to study protein folding, to study the relative free energies of binding of two ligands to a given host (or two hosts to a given ligand), to investigate the sequence-dependent stability of proteins and nucleic acids, and to find the relative solvation free energies of different molecules in various liquids. Hundreds of contributions to the scientific literature reflect the use of AMBER.

(This news is summarized from the San Diego Supercomputer Center; the original full text is available on their web site.)

Tuesday, February 16, 2010

Appro HyperPower™ Cluster - Featuring Intel Xeon CPU and NVIDIA® Tesla™ GPU computing technologies

The amount of raw data that must be processed for research analysis in drug discovery, oil and gas exploration, and computational finance creates a huge demand for computing power. In addition, 3D visualization datasets have grown substantially in recent years, moving visualization work from the desktop to GPU clusters. To meet these performance and memory-capacity needs, Appro clusters and supercomputers combine the latest CPUs with GPUs based on NVIDIA® Tesla™ computing technologies, delivering better performance at lower cost and with fewer systems than standard CPU-only clusters. With 240 processing cores per GPU, a C-language development environment for the GPU, a suite of developer tools, and the world's largest GPU computing ISV development community, the Appro HyperPower GPU clusters give scientific and technical professionals the opportunity to develop applications faster and to deploy them across multiple generations of processors.


The Appro HyperPower cluster features high-density 1U servers based on Intel® Xeon® processors with NVIDIA® Tesla™ GPU cards onboard. It also includes interconnect switches for node-to-node communication, a master node, and clustering software, all integrated in a 42U standard rack configuration. It supports up to 304 CPU cores and 18,240 GPU cores, with up to 78 TF single-precision / 6.56 TF double-precision GPU performance. By using fewer systems than standard CPU-only clusters, the HyperPower delivers more computing power in an ultra-dense architecture at a lower cost.
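Taken together with the 240 processing cores per GPU mentioned above, the 18,240 GPU cores imply roughly 76 Tesla cards in the rack (a quick sanity check on the spec sheet numbers; the per-rack interpretation is my own reading):

echo "18240 / 240" | bc   # 76 Tesla GPUs behind the 18,240 GPU cores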

In addition, the Appro HyperPower cluster gives customers a choice of configurations with open-source, commercially supported cluster management solutions that can easily be tested and pre-integrated as part of a complete package that includes HPC professional services and support.

Ideal Environment:
An ideal solution for small and medium-sized HPC deployments. The target markets are government, research labs, universities, and vertical industries such as oil and gas, finance, and bioinformatics, where the most computationally intensive applications are needed.

Installed Software
The Appro HyperPower is preconfigured with the following software:
- Redhat Enterprise Linux 5.x, 64-bit
- CUDA 2.2 Toolkit and SDK
- Clustering software (Rocks Roll)
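A quick way to confirm that the preinstalled CUDA stack is visible on a node is to run the standard toolkit and driver utilities from a shell (a minimal sketch; it assumes the CUDA 2.2 toolkit binaries and the NVIDIA driver utilities are on the PATH of the installed image):

nvcc --version   # reports the CUDA toolkit compiler release (2.2 expected here)
nvidia-smi       # lists the Tesla GPUs visible to the NVIDIA driver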

CUDA Applications
The CUDA-based Tesla GPUs give speed-ups of up to 250x on applications ranging from MATLAB to computational fluid dynamics, molecular dynamics, quantum chemistry, imaging, signal processing, bioinformatics, and more.



(This news is sourced from Appro and is available on their web site.)

Monday, February 15, 2010

SC10 Conference

The SC Conference is the premier international conference for high performance computing (HPC), networking, storage and analysis. The conference will be held this year in New Orleans, LA, USA, on November 15-18, 2010.

For more info visit sc10.supercomputing.org

Tuesday, February 9, 2010

Power of Desktop: Cray CX1

Affordably priced, the award-winning Cray CX1 is the right size in performance, functionality and cost for a wide range of users, from a single user with a personal supercomputer to a department of users accessing shared clustered resources.

The brilliant brochure is here.




Wednesday, December 30, 2009

NumaConnect SMP Adapter Card

Numascale's SMP Adapter is an HTX card designed for commodity servers with AMD processors that expose an HTX connector to the HyperTransport interconnect.


Highlights
- Scalable, directory-based cache coherence protocol
- Write-back cache for remote data: 2-4-8-(16) GB options, standard SDIMMs
- ECC protected with background scrubbing of soft errors
- 16 coherent + 16 non-coherent outstanding memory transactions
- Support for single-image or multi-image OS partitions
- 3-way on-chip distributed switching for 1D, 2D or 3D torus topologies
- 30 GB/s switching capacity per node
- HTX connected - 6.4 GB/s
- <20 W power dissipation


For a detailed review, see Numascale's web pages.
The PDF manual is available there as well.

Thursday, December 10, 2009

PRACE is Ready for the Next Phase



PRACE is eligible to apply for a grant under the European Union’s 7th Framework Programme to start the implementation phase.

In October 2009 PRACE demonstrated to a panel of external experts and the European Commission that the project had made “satisfactory progress in all areas” and “that PRACE has the potential to have real impact on the future of European HPC, and the quality and outcome of European research that depends on HPC services”. Two months before the end of the project, it became eligible to apply for a grant of 20 million Euros for the implementation phase of the permanent PRACE Research Infrastructure.

The future PRACE Research Infrastructure (RI) will consist of several world-class top-tier centers, managed as a single European entity. The infrastructure to be created by PRACE will form the top level of the European HPC ecosystem. It will offer competent support and a spectrum of system architectures to meet the requirements of different scientific domains and applications. It is expected that the PRACE RI will provide European scientists and technologists with world-class leadership supercomputers with capabilities equal to or better than those available in the USA, Japan, China, India and elsewhere in the world, in order to stay at the forefront of research.

About PRACE:  The Partnership for Advanced Computing in Europe (PRACE) prepares the creation of a persistent pan-European HPC service, consisting of several tier-0 centres providing European researchers with access to capability computers and forming the top level of the European HPC ecosystem. PRACE is a project funded in part by the EU’s 7th Framework Programme (FP7/2007-2013) under grant agreement n° RI-211528.

Tuesday, November 10, 2009

What to do with an old nuclear silo?

Question: What do you do with a nuclear-grade silo 36 feet wide and 65 feet high, with 2-foot-thick concrete walls?
Answer: An HPC center!


A supercomputing center in Quebec has transformed a huge concrete silo into the CLUMEQ Colossus, a data center filled with HPC clusters.

The silo, which is 65 feet high with two-foot thick concrete walls, previously housed a Van de Graaf accelerator dating to the 1960s. It was redesigned to house three floors of server cabinets, arranged so cold air can flow from the outside of the facility through the racks and return via an interior 'hot core'. The construction and operation of the unique facility are detailed in a presentation from CLUMEQ.

Link: http://www.datacenterknowledge.com/archives/2009/12/10/wild-new-design-data-center-in-a-silo/

(This news is sourced from Slashdot.)

Monday, March 9, 2009

San Diego Supercomputer Center has built a high-performance computer with solid-state drives

The San Diego Supercomputer Center has built a high-performance computer with solid-state drives, which the center says could help solve science problems faster than systems with traditional hard drives.

The flash drives will provide faster data throughput, which should help the supercomputer analyze data an "order-of-magnitude faster" than hard drive-based supercomputers, said Allan Snavely, associate director at SDSC, in a statement. SDSC is part of the University of California, San Diego.

"This means it can solve data-mining problems that are looking for the proverbial 'needle in the haystack' more than 10 times faster than could be done on even much larger supercomputers that still rely on older 'spinning disk' technology," Snavely said.

Solid-state drives, or SSDs, store data on flash memory chips. Unlike hard drives, which store data on magnetic platters, SSDs have no moving parts, making them rugged and less vulnerable to failure. SSDs are also considered to be less power-hungry.

Flash memory provides faster data transfer times and better latency than hard drives, said Michael Norman, interim director of SDSC, in the statement. New hardware like sensor networks and simulators is feeding lots of data to the supercomputer, and flash memory lets it store and analyze that data more quickly.

The system uses Intel's SATA solid-state drives, with four special I/O nodes serving up 1TB of flash memory to any other node. The university did not immediately respond to a query about the total available storage in the supercomputer.

SSDs could be better storage technology than hard drives as scientific research is time-sensitive, said Jim Handy, director at Objective Analysis, a semiconductor research firm. The quicker read and write times of SSDs compared to hard drives contribute to providing faster results, he said.

SSDs are also slowly making their way into larger server installations that do online transaction processing, like stock market trades and credit-card transactions, he said.

Many data centers also employ a mix of SSDs and hard drives to store data, Handy said. Data that is frequently accessed is stored on SSDs for faster processing, while hard drives are used to store data that is less frequently needed.

"Hard drives are still the most cost-effective way of hanging on to data," Handy said. But for scientific research and financial services, the results are driven by speed, which makes SSDs makes worth the investment.

(This news is sourced from http://www.goodgearguide.com.au)

Tuesday, September 23, 2008

Making portable GridStack 4.1 (Voltaire OFED) drivers.

Remove any previously installed IB RPMs. To do this:
rpm -e kernel-ib-1.0-1 \
dapl-1.2.0-1.x86_64 \
libmthca-1.0.2-1.x86_64 \
libsdp-0.9.0-1.x86_64 \
libibverbs-1.0.3-1.x86_64 \
librdmacm-0.9.0-1.x86_64

lsmod
Then remove all remaining "ib_" modules by hand with the "rmmod <modulename>" command.
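If there are many of them, a small loop can do the same job (a convenience sketch; it assumes no other loaded module still depends on the ib_ modules, otherwise rmmod will refuse and the loop may need to be repeated):

# unload every loaded module whose name starts with ib_
for m in $(lsmod | awk '/^ib_/ {print $1}'); do rmmod "$m"; done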


*** If you previously installed OFED IB from the same package, you can run the ./uninstall.sh
script included in the GridStack-4.1.5_9.tgz package instead of the steps above.
The script performs the same cleanup (and more) automatically, so you may prefer it.



1. First, obtain the GridStack source code from Voltaire.
Then:
mkdir /home/setup
cp GridStack-4.1.5_9.tgz /home/setup
cd /home/setup
tar -zxvf GridStack-4.1.5_9.tgz
All of the files will be extracted into "/home/setup/GridStack-4.1.5_9"

cd GridStack-4.1.5_9

2. Install the GridStack drivers

./install.sh --make-bin-package

This process takes about 30 minutes.
Time for a coffee or tea, but not a cigarette...
....
.......
..........
INFO: wrote ib0 configuration to /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0 ONBOOT=yes BOOTPROTO=static IPADDR=192.168.129.9 NETWORK=192.168.0.0 NETMASK=255.255.0.0 BROADCAST=192.168.255.255 MTU=2044

Installation finished
Please logout from the shell and login again in order to update your PATH environment variable

3. Finishing the driver settings
First, edit the IP settings for IB.
Edit "/etc/sysconfig/network-scripts/ifcfg-ib0" as shown below:

DEVICE=ib0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.129.50.9
NETMASK=255.255.0.0
MTU=2044

Save the file and reboot the system.
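If you want to check the new address without waiting for the reboot, bouncing just the interface usually works once the IB modules are already loaded (my assumption; the reboot above remains the safer route):

ifdown ib0 && ifup ib0   # re-read ifcfg-ib0 and bring the interface back up
ifconfig ib0             # confirm the new IPADDR and MTU took effect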

4. The GridStack installation registers an init.d service that runs at system startup.
After boot you should see the ib0 device in the ifconfig output, and the
LEDs on the HCA cards should be on or blinking. Check this...

After the reboot, check the state of the connection with ifconfig:
eth0      Link encap:Ethernet  HWaddr 00:19:BB:XX:XX:XX  
          inet addr:10.128.129.9  Bcast:10.128.255.255  Mask:255.255.0.0
          inet6 addr: fe80::219:bbff:fe21:b3a8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:177 errors:0 dropped:0 overruns:0 frame:0
          TX packets:148 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:16829 (16.4 KiB)  TX bytes:21049 (20.5 KiB)
          Interrupt:169 Memory:f8000000-f8011100 

ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.129.50.9  Bcast:10.129.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:892 (892.0 b)  TX bytes:384 (384.0 b)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:336 (336.0 b)  TX bytes:336 (336.0 b)

If you see output similar to the above, you have won. Ping a neighbor's IP address if one is available:
ping 10.129.50.1
PING 10.129.50.1 (10.129.50.1) 56(84) bytes of data.
64 bytes from 10.129.50.1: icmp_seq=0 ttl=64 time=0.094 ms
64 bytes from 10.129.50.1: icmp_seq=1 ttl=64 time=0.057 ms
64 bytes from 10.129.50.1: icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from 10.129.50.1: icmp_seq=3 ttl=64 time=0.056 ms

If you do not see ib0 or cannot ping, the gridstack service may not have started.
Start it manually: /etc/init.d/gridstack start
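To make sure the service also comes up on every boot (assuming the installer registers it with chkconfig under the name "gridstack"), you can check and enable it like this:

chkconfig --list gridstack   # show which runlevels start the service
chkconfig gridstack on       # enable it for the default runlevels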

If everything is OK, you can make an image of this system for a
central deployment mechanism such as TFTP.

6. Installing the newly compiled GridStack driver on identical machines.
It is easy. The GridStack compilation process automatically creates a new bz2 file and
its md5 checksum. You can find these two files one level above the source folder.
In our example, the two files are waiting for you there:

ls -al /home/setup
-rw-r--r--   1 root root       88 Nov 23 19:11 GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64.md5sum
-rw-r--r--   1 root root 43570798 Nov 23 19:11 GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64.tar.bz2

Copy these two files to every IB host on which you plan to install GridStack.
Unlike the previous steps, this installation does not take long.
Just copy the files to the new machine with scp:

cd /home/setup
scp GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64.tar.bz2 \
    GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64.md5sum root@10.128.129.10:/home

Switch to the target machine's console and type these commands:

cd /home
First, verify the integrity of the bz2 file:
md5sum -c GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64.md5sum
GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64.tar.bz2: OK

If you see the OK sign, type this:
tar -jxvf GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64.tar.bz2

A folder called "GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64" will be created.
cd GridStack-4.1.5_9-rhas-k2.6.9-42.ELsmp-x86_64/
./install.sh

The GridStack binary RPMs will be installed automatically.
Configure ifcfg-ib0 as above, reboot, and check for IP connectivity.


7. As a bonus tip:
After the GridStack installation there are many IB diagnostic tools available under the
/usr/local/ofed/bin directory. For example, running ./ibv_devinfo gives brief
and useful information about HCA connectivity, board model, FW level, etc.

Here is sample output from my machine:
hca_id: mthca0
        fw_ver:                         4.7.400
        node_guid:                      0017:08ff:ffd0:XXXX
        sys_image_guid:                 0017:08ff:ffd0:XXXX
        vendor_id:                      0x1708
        vendor_part_id:                 25208
        hw_ver:                         0xA0
        board_id:                       HP_0060000001
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 29
                        port_lid:               75
                        port_lmc:               0x00

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 29
                        port_lid:               261
                        port_lmc:               0x00





---=== HCA DDR EXP-D FW upgrade after GridStack 4.1 install ===---

ib-burn -y -i VLT-EXPD -a /usr/voltaire/fw/HCA400Ex-D-25208-4_7_6.img 

INFO: Using alternative image file /usr/voltaire/fw/HCA400Ex-D-25208-4_7_6.img
Burning : using fw image file: /usr/voltaire/fw/HCA400Ex-D-25208-4_7_6.img VSD extention : -vsd1 VLT-EXPD -vsd2 VLT0040010001
    Current FW version on flash:  N/A
    New FW version:               N/A

    Burn image with the following GUIDs:
        Node:      0019bbffff00XXXX
        Port1:     0019bbffff00XXXX
        Port2:     0019bbffff00XXXX
        Sys.Image: 0019bbffff00XXXX

    You are about to replace current PSID in the image file - "VLT0040010001" with a different PSID - "VLT0040010001".
    Note: It is highly recommended not to change the image PSID.

 Do you want to continue ? (y/n) [n] : y

Read and verify Invariant Sector               - OK
Read and verify PPS/SPS on flash               - OK
Burning second    FW image without signatures  - OK  
Restoring second    signature                  - OK  

Note that /usr/local/bin/ib-burn is really just a BASH script;
here is another, lower-level way to burn the HCA card FW:

lspci -n | grep -i "15b3:6278" | awk '{print $1}'
if you see "13:00.0" as output type this;

mstflint -d 13:00.0 -i /usr/voltaire/fw/HCA400Ex-D-25208-4_7_6.img -vsd1 "" -psid HP_0060000001 -y burn > /root/hca-fw-ugr.log
This command does not prompt for confirmation.

To check the FW currently on the flash, type this:
mstflint -d 13:00.0 q
