With Cloud HPC Toolkit, Google Pursues HPC, Intel Pushes OneAPI

People who use Google Cloud can see the HPC tides changing and know that, as we discussed just a few months ago, there’s a good chance more HPC workloads will shift to the cloud builders over time, as their scale increasingly dictates future chip and system designs and the economics of processing.

Google also knows it needs to do more to take market share from its biggest rivals – Amazon Web Services and silver medalist Microsoft Azure – so the company has introduced a new open-source toolkit that helps HPC shops build reproducible, flexible clusters for simulation and modeling.

Called Cloud HPC Toolkit (a name that likely saved Google Cloud’s marketing department money), the system software has a modular design that lets users create everything from simple clusters to advanced ones that benefit from the cloud’s ability to slice and dice disaggregated resources as needs change. This is called composability, and it is starting to gain traction in the HPC industry.


Here’s what the Cloud HPC Toolkit components look like:

[Figure: Google Cloud HPC Toolkit block diagram]

Google Cloud thinks most users will want to start with one of the toolkit’s predefined blueprints for infrastructure and software setups that suit HPC environments. But for those with their own configuration preferences, these blueprints can be modified by changing a few lines of text in the configuration files.
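To make the idea of changing a few lines concrete, here is a minimal sketch of how a predefined blueprint might be customized by overriding only a handful of settings. It is purely illustrative: the blueprint keys, the values, and the `customize` helper are all assumptions made for this example and do not reflect the toolkit’s actual configuration schema or tooling.

```python
import copy

# Hypothetical blueprint represented as nested settings.
# Every key and value here is an illustrative assumption,
# not the Cloud HPC Toolkit's real configuration format.
BASE_BLUEPRINT = {
    "name": "simple-cluster",
    "compute": {"machine_type": "c2-standard-60", "node_count": 4},
    "storage": {"kind": "filestore", "size_gb": 1024},
    "scheduler": "slurm",
}


def customize(blueprint, overrides):
    """Return a copy of the blueprint with nested overrides applied."""
    result = copy.deepcopy(blueprint)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections so untouched settings are preserved.
            result[key] = customize(result[key], value)
        else:
            result[key] = value
    return result


# Change only the lines that differ from the predefined setup.
custom = customize(
    BASE_BLUEPRINT,
    {"compute": {"node_count": 16}, "storage": {"kind": "local-ssd"}},
)
```

The point of the sketch is that the predefined blueprint stays intact while a small set of overrides produces a site-specific variant, which is what makes such setups reproducible.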

These blueprints support a variety of building blocks needed to create an HPC environment, from compute and storage to networking and schedulers. On the compute side, this includes all of Google Cloud’s virtual machines as well as its GPU-based instances and its HPC virtual machine image, based on the CentOS variant of Red Hat Enterprise Linux. For storage, the toolkit supports Intel’s DAOS system and DDN’s Lustre-based EXAScaler system as well as Filestore, local SSDs, and persistent storage on Google Cloud. Additionally, blueprints can be configured to run on a 100 Gbps network using Google Cloud placement policies to reduce latency between virtual machines.

There is, however, only one scheduler choice available in the toolkit for now: Slurm. Given that Google Cloud currently supports Altair’s PBS Pro and Grid Engine schedulers as well as IBM’s Spectrum LSF and Slurm, it seems reasonable to expect Cloud HPC Toolkit to add the others as well.

Both Intel and AMD have backed the Cloud HPC Toolkit, but it’s the former, currently trying to catch up with the latter in making faster, higher-quality processors, that’s particularly eager to use Google Cloud’s latest HPC offering as a showcase for the semiconductor giant’s growing investments in software, particularly on the HPC side.

Among the blueprints in Google Cloud’s new toolkit is a predefined configuration of hardware and software for simulation and modeling workloads from Intel itself, which is promoted under the Intel Select Solutions brand. Whatever happened between Google Cloud and Intel behind the scenes, the cloud builder made sure to promote Intel’s simulation and modeling blueprint as the only detailed example in its blog post announcing the toolkit.

A key part of Intel’s simulation and modeling blueprint is the company’s oneAPI toolkit, the cross-platform parallel programming model that aims to simplify development on a wide range of compute engines, including those from Intel’s rivals.

In a statement, Intel said access to oneAPI and its HPC-focused toolkit can help optimize performance for simulation and modeling workloads by improving compile times and accelerating results. It also lets users take advantage of chips from Intel and its competitors using SYCL, the royalty-free, cross-architecture programming abstraction layer that underpins oneAPI’s Data Parallel C++ language.

Intel and its rivals know that the real gold in the semiconductor industry is found among the cloud builders and hyperscalers, so we wouldn’t be surprised to see more and more HPC software announcements of this ilk in the cloud world, with Intel peddling oneAPI, AMD pushing its open ROCm platform, and Nvidia finding new ways to extend the software hydra that is CUDA.
