Osaka University: OCTOPUS Supercomputer

Heterogeneous architecture on new cluster delivers more computing capacity at lower cost.

Executive Summary
Osaka University (Osaka U) is a leading research university in Japan. Its Cybermedia Center (CMC) hosts the university’s supercomputing resources. Historically, supercomputers at Osaka U were built to support both research and general education needs. To continue to attract leading researchers, CMC built a world-class, heterogeneous cluster targeted at scientific computing for a variety of workloads programmed for different architectures. The OCTOPUS cluster now attracts new users running a wide variety of workloads, from simulation to AI and machine learning.

Challenge
Innovation in research often begins with brilliant minds supported by latest-generation High-Performance Computing (HPC) resources. Osaka U’s CMC supports a wide range of scientific fields that rely on supercomputing resources for breakthroughs, including high-energy physics, molecular dynamics, and the materials, life, dental, and social sciences. Recently, a researcher used CMC systems to understand vortex breakdowns in supersonic flows. His breakthroughs are expected to contribute to a supersonic combustion ramjet engine for air and space planes. Other activities are described in the university’s research profile.

“There is a growing demand for supercomputing in every field of science,” stated Susumu Date, Associate Professor at Osaka U’s CMC, “because researchers today rely heavily on scientific computing prior to the experimental stage and afterwards to analyze and correlate the results of observations.”

CMC’s earlier systems were designed to support both HPC and non-research needs. Partitioning a single system between general users and parallel computing users created conflicts, which made it an unreliable resource for scientific computing. Seeking to continue to support important research areas, and guided by feedback from its users, Osaka U needed to expand its parallel computing capabilities beyond the existing systems in its data center.

“Our users’ biggest challenge, in most cases, is to achieve inter-node and intra-node parallelism,” added Professor Date. “Many are working with MPI and OpenMP coding to achieve greater parallelism. We needed to deliver more resources that supported their work.”
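
The two levels of parallelism Professor Date mentions can be pictured as a two-level work split: first across processes on different nodes (the role MPI ranks usually play), then across threads inside each node (the role OpenMP threads usually play). The following is only a minimal Python sketch of that decomposition, not MPI or OpenMP code and not taken from any CMC user's application. The function names `block_range` and `two_level_ranges`, and the item, rank, and thread counts, are hypothetical.

```python
def block_range(n_items, n_parts, part):
    """Return the half-open [start, stop) range of items owned by `part`.

    Items are split into contiguous blocks; when the split is uneven,
    the first (n_items % n_parts) parts each get one extra item.
    """
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def two_level_ranges(n_items, n_ranks, n_threads):
    """Split items across ranks (inter-node), then across threads (intra-node)."""
    plan = {}
    for rank in range(n_ranks):
        # Level 1: the block of items this rank (process) owns.
        r_start, r_stop = block_range(n_items, n_ranks, rank)
        for thread in range(n_threads):
            # Level 2: the slice of the rank's block that this thread works on.
            t_start, t_stop = block_range(r_stop - r_start, n_threads, thread)
            plan[(rank, thread)] = (r_start + t_start, r_start + t_stop)
    return plan


# Hypothetical sizes: 100 work items, 4 ranks, 3 threads per rank.
plan = two_level_ranges(n_items=100, n_ranks=4, n_threads=3)
print(len(plan), "rank/thread slices")
```

In a real hybrid code, each rank would typically compute only its own block and the threading runtime would split that block across threads. The sketch builds the whole plan in one place only to make the two levels visible.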

CMC’s research and user feedback led it to build a new petascale heterogeneous supercomputer that supports a variety of scientific computing domains (simulation, visualization, AI/machine learning, and high-performance data analytics, or HPDA) on a single system.

Built on Intel® Xeon® Scalable Processors, Intel® Xeon Phi™ processors, and GPUs, OCTOPUS supports a wide range of scientific research.

Solution
The Osaka University Cybermedia Center’s Over-Petascale Universal Supercomputer (OCTOPUS) supports researchers using a wide variety of coding and application environments, from open source and commercial codes written for x86 Intel® Architecture (IA) to CUDA codes written for GPUs, targeting traditional simulation, AI frameworks, genomics, and other fields of research.

“We had to explore the architecture of a new HPC system in terms of both hardware and software,” explained Professor Date, “so more people could take advantage of supercomputing resources. In particular, we had to look at an integrated architecture approach for HPC and HPDA, using x86 and other architectures.”

One of the key challenges in designing the system was to increase compute capacity within the data center’s power and cooling budget. By leveraging the performance and power efficiency of latest-generation CPUs and GPUs and integrating Asetek’s RackCDU Direct-to-Chip liquid cooling on all compute nodes (including GPUs), CMC could maintain reliable and stable performance across the cluster without increasing operational and power budgets.

The new system delivers 1.463 petaFLOPS¹ of peak performance using multiple types of processor architectures and a Lustre* file system, interconnected with InfiniBand* Architecture at 100 Gbps. OCTOPUS was built by NEC using Intel® Xeon® Scalable Processors, Intel® Xeon Phi™ 7210 processors based on the Many Integrated Core (MIC) architecture, NVIDIA Tesla* P100 GPUs (CUDA architecture), and a DataDirect Networks (DDN) EXAScaler* storage system. It went into production in December 2017.
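
The 1.463 petaFLOPS figure can be reproduced by adding up the per-partition peak values listed in footnote 1. The short Python sketch below does only that arithmetic. The dictionary keys and the function name `total_peak_petaflops` are illustrative labels, not names used by CMC.

```python
# Peak TFLOPS per partition, as listed in footnote 1 of this case study.
PARTITION_PEAK_TFLOPS = {
    "general_purpose_cpu": 471.24,   # 236 nodes
    "gpu": 858.28,                   # 37 nodes
    "xeon_phi": 117.14,              # 44 nodes
    "large_shared_memory": 16.38,    # 2 nodes
}


def total_peak_petaflops(partitions):
    """Sum per-partition peak TFLOPS and convert to petaFLOPS."""
    return sum(partitions.values()) / 1000.0


total = total_peak_petaflops(PARTITION_PEAK_TFLOPS)
gpu_share = PARTITION_PEAK_TFLOPS["gpu"] / sum(PARTITION_PEAK_TFLOPS.values())

print(f"Total peak: {total:.3f} petaFLOPS")
print(f"GPU partition share of peak: {gpu_share:.1%}")
```

These are theoretical peak numbers, so the sum says nothing about what a given application will achieve.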

Osaka University OCTOPUS supercomputer at a glance:

  • Heterogeneous supercomputer to meet widely diverse research needs in simulation, visualization, AI/machine learning, and high-performance data analytics (HPDA)
  • Intel® Xeon® Gold 6126 processors (236 nodes), Intel® Xeon® Platinum 8153 processors (2 nodes), Intel Xeon Phi 7210 processors (44 nodes)
  • Intel Xeon Gold 6126 processors with four NVIDIA Tesla P100 GPUs per node, connected using NVIDIA NVLink* (37 nodes)
  • 5X the compute capacity of the previous system at lower cost¹

Osaka University Cybermedia Center.

Results
The new supercomputer gives Osaka U five times the scientific computing capacity of its previous system, which has given researchers a new level of resources to work with.

“The new system is leading to an increase of users, which is a good impact,” concluded Professor Date.

Because OCTOPUS is heterogeneous, users can choose the resources that fit their particular codes and research: Intel CPUs based on IA or MIC, or CUDA GPUs. In user surveys completed by CMC, users have reported higher performance than on the previous system.

“Today, OCTOPUS is running machine learning and other AI-related jobs, which we have not seen before,” said Professor Date. “Plus, we are seeing other new types of work from users. We designed the new system for these new workloads.”

Solution Summary
Osaka U’s CMC needed to enhance its computing capabilities to keep and attract researchers from around the world. Based on research and user feedback, it specified a one-plus petaFLOPS supercomputer with a heterogeneous architecture. Built on Intel Xeon Scalable Processors, Intel Xeon Phi processors, and the latest GPUs, the new OCTOPUS cluster delivers 1.463 petaFLOPS, supporting a wide variety of workloads across many scientific fields and drawing new users to the university.

Osaka’s OCTOPUS cluster supports a wide variety of workloads, from simulation to AI and machine learning.

Solution Ingredients

  • NEC LX* 406Rh-2 servers with Intel Xeon Scalable Processors
  • NEC LX* 102Rh-1G servers with Intel Xeon Scalable Processors and NVIDIA P100 GPUs
  • NEC Express5800/HR110c-M* servers with Intel Xeon Phi processors
  • NEC LX* 116Rg servers with Intel Xeon Scalable Processors
  • DDN EXAScaler (3.1 PB) Lustre storage cluster

Notices and Disclaimers

Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at https://www.intel.it.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit https://www.intel.it/benchmarks.

Performance results are based on testing as of the date set forth in the configurations and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Cost reduction scenarios described are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Product and Performance Information

¹ OCTOPUS system configuration per www.hpc.cmc.osaka-u.ac.jp/en/octopus/: General-purpose CPU nodes: 236 nodes (471.24 TFLOPS); CPU: Intel Xeon Gold 6126 (Skylake/2.6 GHz, 12 cores), 2 CPUs; Memory: 192 GB. GPU nodes: 37 nodes (858.28 TFLOPS); CPU: Intel Xeon Gold 6126 (Skylake/2.6 GHz, 12 cores), 2 CPUs; GPU: NVIDIA Tesla P100 (NVLink), 4 units; Memory: 192 GB. Xeon Phi nodes: 44 nodes (117.14 TFLOPS); CPU: Intel Xeon Phi 7210 (Knights Landing/1.3 GHz, 64 cores), 1 CPU; Memory: 192 GB. Large-scale shared-memory nodes: 2 nodes (16.38 TFLOPS); CPU: Intel Xeon Platinum 8153 (Skylake/2.0 GHz, 16 cores), 8 CPUs; Memory: 6 TB. For the previous HCC system configuration, see http://www.hpc.cmc.osaka-u.ac.jp/en/hcc-sys/. The 1.463 petaFLOPS figure is peak performance, not application performance.
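
As a worked check, the per-partition peak figures in the footnote can be approximated from the node counts, core counts, and clock speeds it lists. The Python sketch below relies on two values that are assumptions and do not appear in this case study: 32 double-precision floating-point operations per cycle per core for the Xeon Gold 6126 and Xeon Phi 7210, and 5.3 TFLOPS of double-precision peak per Tesla P100 (NVLink). The function name `cpu_node_tflops` is hypothetical. The large-scale shared-memory nodes are left out because the sketch makes no assumption about them.

```python
# ASSUMPTIONS (not stated in the case study):
FLOPS_PER_CYCLE_PER_CORE = 32     # assumed double-precision FLOPs/cycle/core
P100_NVLINK_PEAK_TFLOPS = 5.3     # assumed double-precision peak per GPU


def cpu_node_tflops(sockets, cores_per_socket, ghz,
                    flops_per_cycle=FLOPS_PER_CYCLE_PER_CORE):
    """Theoretical peak of one node's CPUs in TFLOPS.

    sockets * cores * GHz * FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS.
    """
    return sockets * cores_per_socket * ghz * flops_per_cycle / 1000.0


# Node configurations taken from the footnote.
gold_node = cpu_node_tflops(sockets=2, cores_per_socket=12, ghz=2.6)
phi_node = cpu_node_tflops(sockets=1, cores_per_socket=64, ghz=1.3)
gpu_node = gold_node + 4 * P100_NVLINK_PEAK_TFLOPS   # 2 CPUs plus 4 GPUs

estimated = {
    "general_purpose_cpu": 236 * gold_node,
    "gpu": 37 * gpu_node,
    "xeon_phi": 44 * phi_node,
}
published = {
    "general_purpose_cpu": 471.24,
    "gpu": 858.28,
    "xeon_phi": 117.14,
}

for name, value in estimated.items():
    print(f"{name}: estimated {value:.4f} TFLOPS, published {published[name]} TFLOPS")
```

Under these assumptions, the three estimates agree with the published figures to within 0.01 TFLOPS, which suggests the published values are theoretical peaks computed in a similar way.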