Neu.ro Deploys Nvidia Supercomputers to the Cloud

Neuromation is pleased to announce that we have massively expanded the capabilities of our Neu.ro cloud, especially with regard to our Deep Learning as a Service (DLaaS) offering, with the deployment of multiple supercomputers powered by Nvidia’s revolutionary A100 GPU and supplied by Gigabyte in their latest G492-ZD0 server.

Given the global chip shortage that has impacted supply chains across multiple industries in recent times, the reader will not be surprised to hear that these Nvidia A100 GPUs are almost impossible to buy through the usual channels, so it is a major coup for Neuromation to be able to deploy them.

In this article I intend to focus on how Neuromation has harnessed this computing power to create a best-in-class cloud offering for our global platform client base.

But first, let’s remind ourselves of what we mean by GPU as compared to a CPU, and why a GPU is perfect for deep learning compute jobs. GPU is, of course, the acronym for Graphics Processing Unit, while CPU refers to a Central Processing Unit.

What relation does Graphics Processing have to Deep Learning?

At first glance, graphics processing as used in 3D video games would seem to have little in common with AI and Deep Learning. In the term “Graphics Processing Unit”, the word “graphics” relates to presenting an image in a defined place in a 2D or 3D space, and how this is perceived by the viewer. For an experience that mimics reality, an object on the screen must vary its size and definition as it moves away from or towards the viewer, and this must apply to all objects in view on the screen simultaneously, exactly as our eyes portray the real world to the human brain.

In order to achieve this, the GPU must compute processes in parallel at very high speed, as opposed to a CPU (Central Processing Unit), which by design operates in a sequential manner. A CPU is typically divided into multiple cores, and each core addresses one task at a time, but the processes still execute in a serial manner. A GPU, in comparison, has hundreds or even thousands of cores, each of which is focused on a single task. This gives rise to the term parallel computing, as each core executes simultaneously but independently of the others, storing the data that is in constant use in its own memory cache under a concept known as locality of reference.
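For the technically curious reader, the contrast can be sketched in a few lines of Python. This is a minimal illustration only, assuming the open-source CuPy library and a CUDA-capable GPU are available (neither is specified in this article):

```python
# Minimal sketch: the same arithmetic on CPU and GPU.
# Assumes the CuPy library and a CUDA-capable Nvidia GPU are installed;
# both are illustrative assumptions, not details from the article.
import numpy as np
import cupy as cp  # NumPy-compatible array library that runs on Nvidia GPUs

n = 10_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# CPU: the addition is handled by a handful of cores
c_cpu = a + b

# GPU: the same operation is split across thousands of CUDA cores,
# each processing its own elements at the same time
c_gpu = cp.asarray(a) + cp.asarray(b)

assert np.allclose(c_cpu, cp.asnumpy(c_gpu))
```

The point is not that the code looks different (CuPy deliberately mirrors NumPy), but that the GPU version distributes the work across thousands of cores in parallel.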

Nvidia GPUs are supplied with CUDA cores, where CUDA is an acronym for Compute Unified Device Architecture, and Tensor cores that are dedicated to the extreme demands of Deep Learning. The Nvidia A100 board has 6,912 FP32 CUDA cores, 3,456 FP64 CUDA cores and 432 Tensor cores, with FP standing for Floating Point. Underlying all of this, the A100 boasts 54 billion transistors, which Nvidia has managed to squeeze into a die size of 826 square mm.

To put some numbers on the capabilities of the A100 board, we need to look at the teraflops that apply to the cores mentioned in the previous paragraph, a teraflop being one trillion floating point calculations per second. The acronym TOPS in the table below, pertaining to the INT8 Tensor Core, stands for Trillions of Operations Per Second.

Designation              Performance
FP64                     9.7 TFLOPS
FP64 Tensor Core         19.5 TFLOPS
FP32                     19.5 TFLOPS
Tensor Float 32          156 TFLOPS (up to 312 TFLOPS with sparsity)
BFloat16 Tensor Core     312 TFLOPS (up to 624 TFLOPS with sparsity)
FP16 Tensor Core         312 TFLOPS (up to 624 TFLOPS with sparsity)
INT8 Tensor Core         624 TOPS (up to 1,248 TOPS with sparsity)
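As a back-of-the-envelope sanity check, the FP32 figure in the table follows from the CUDA core count given earlier, the clock speed, and the fact that a fused multiply-add counts as two floating point operations. The boost clock of roughly 1.41 GHz used below is an assumption taken from Nvidia’s public specifications, not a figure from this article:

```python
# Back-of-the-envelope check of the FP32 line in the table above.
# The ~1.41 GHz boost clock is an assumed figure from Nvidia's public
# spec sheet; it does not appear in this article.
fp32_cuda_cores = 6_912
boost_clock_ghz = 1.41           # assumed A100 boost clock
flops_per_core_cycle = 2         # one fused multiply-add = 2 FLOPs

peak_fp32_tflops = fp32_cuda_cores * boost_clock_ghz * flops_per_core_cycle / 1_000
print(f"{peak_fp32_tflops:.1f} TFLOPS")  # ~19.5 TFLOPS, matching the table
```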

In order to maximize the Neuromation platform experience, we have fine-tuned and adapted the A100 GPU powered supercomputers with 4th generation Peripheral Component Interconnect Express (PCIe Gen4).

A Neuromation Nvidia-powered 8 X A100 GPU supercomputer, as seen from above

In particular, we are able to simultaneously render synthetic data and carry out distributed training of deep learning models in a closed feedback loop.

It is also worth noting that on the same machine we can simultaneously run 64 separate containers, effectively dividing each A100 board into 8 fractional GPUs, which on the 8-board computer gives rise to the equivalent of 64 extremely powerful GPU servers.
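The arithmetic behind that figure is straightforward; the memory split in the sketch below is illustrative only, assuming the 40 GB variant of the A100, which this article does not specify:

```python
# Fractional-GPU arithmetic for one 8 x A100 machine, as described above.
boards_per_machine = 8
fractions_per_board = 8          # each A100 shared by 8 containers

concurrent_containers = boards_per_machine * fractions_per_board
print(concurrent_containers)     # 64

# Illustrative only: assuming the 40 GB A100 variant (not specified in
# this article), each fractional share would see ~5 GB of GPU memory.
gpu_memory_gb = 40               # assumed
print(gpu_memory_gb / fractions_per_board)  # 5.0
```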

The Neuromation supercomputer, showing the 8 X A100 GPUs, at the Advanced HPC testing facility

The question then arises, why have we done this?

Readers of my previous Neuromation articles may remember that I have a predilection for drawing analogies with the automotive industry in order to get my point across, as most people understand different types of vehicles and what they are used for. It is no surprise then that with this article I will continue that theme.

As our new supercomputers are at the cutting edge of High Performance Computing (HPC), it is only right and proper that I should use a high performance vehicle as a comparison point. On that understanding, I would invite the reader to imagine one of my favourite cars, the Porsche 911. Ultimately this is a high performance car with a very comfortable interior, equally at home cruising along the motorway at high speed as it is heading down to the supermarket on a Saturday afternoon for the weekly shop.

So far so good, but what if you don’t need your 911 for daily use? What if you want to re-purpose and re-engineer it for a single purpose: to go racing? Here you would need to strip the car down to what lies at the very heart of the vehicle, namely the chassis, engine and transmission, and rebuild it from the ground up. Your new 911 racing car will need improved suspension, a roll cage, a quick-release clutch for quick getaways when the race starts, bigger carburettors, a free-flowing exhaust system that reduces back pressure, wider wheels, bigger brake discs and callipers, a full racing harness, automatic fire extinguishers and a lighter body shell to save weight, plus a great deal of engine tuning.

You now have a vehicle that would be useless as a daily driver but which stands a very good chance of winning races, the single purpose for which you designed it.

In this analogy, the supercomputer equivalent of the chassis, engine and transmission would be the 8 X A100 Nvidia GPU boards. Around these, Neuromation has constructed a tailor-made High Performance Computer that has been specifically configured and built to offer maximum performance in a DLaaS / MLOps / AI cloud environment.

To give the reader a visual reference of the car analogy Neuromation has achieved the HPC equivalent of turning one of these…

Into one of these…

According to Advanced HPC, a San Diego based company who assembled and tested the machine for Neuromation, this is the most powerful compute system they have ever built. 

Neu.ro’s AI Cloud launch is an extremely timely entry to the AI Cloud market as the availability of Nvidia DGX, HGX and A100 systems is at historically low levels and competition for these resources is intense. 

Our offering also dramatically reduces operational overhead for AI developers by providing native integration with Neu.ro’s MLOps and ML interoperability platform, simplifying all stages of AI data management, training and monitoring.

Storage is another important component of ML pipelines that requires provisioning, configuration and management. The Nvidia systems in use by the Neu.ro AI Cloud utilize Mellanox switches and InfiniBand networking to ensure that data is moved and operated on with maximum efficiency and speed. Our cloud GPU systems are fully integrated with Nvidia-spec networking and storage for the fastest AI compute solutions on the market.

How will our new Nvidia supercomputers impact revenue generation?

Readers who have taken notice of my previous Neuromation articles will also be aware of multiple forecasts predicting that the Deep Learning sector of AI is rapidly on the ascent, with huge growth predicted over the coming years; indeed, specialist AI / Deep Learning researchers forecast a consensus of 40% CAGR and a market size of up to $93bn by 2026.

It is quite clear, then, that there is a massive demand for compute power to be able to process all the data that today’s business environment produces. As the great majority of this data is unlabelled and unstructured, it is only to Deep Learning that the owners of the data can turn.

The problem for the data owners is that acquiring all the Deep Learning infrastructure needed to process the data is expensive and not very balance sheet friendly, especially if it is only needed sporadically during the year; a single Nvidia DGX supercomputer boasting 8 X A100 GPUs, for example, retails for US$199,000.

In its simplest form, there are effectively only two types of end user when it comes to Deep Learning. On one hand are companies that have been established for decades, or longer, and who as a result have all their data sitting in legacy data centres which they own; these could be established pharmaceutical, aerospace or automobile manufacturers, for example. On the other hand, we have companies who, as a result of being established more recently, have all their data in the cloud.

For the long-established companies with their data held in in-house data centers, it makes absolute sense to acquire their own Deep Learning infrastructure, and these companies are not target clients for the Neu.ro cloud.

For the newer companies who rely on the cloud for storage and applications, it is a completely different story. These organisations are continually adding to their data sets, especially since the onset of IoT, and almost all of this data will be unlabelled and unstructured. It is to these data owners that Neuromation is now able to offer best-of-class Deep Learning as a Service on demand, when they need it and for as long or as short a time as they need it.

As described in previous articles, Neuromation has created a comprehensive MLOps and Deep Learning suite of software tools that allows computational scientists to maximise their return on processing all their data sets, whether they be completely unstructured, a mixture of labelled and unlabelled, or completely labelled. These tools allow the scientist to both analyze and adjust their algorithmic inputs with a view to achieving a successful outcome for any one job. After all, those who achieve the best insights into their unstructured, unlabelled data, and who are able to draw the best possible conclusions from it, will have much improved knowledge of what their customers are doing and will be able to design better products and services more quickly as a result, leading to a clear market advantage.

It is these companies and data owners that Neuromation will be looking to add as platform clients.

To give the reader an idea of the revenue generating potential of Nvidia A100 powered supercomputers, it is worth having a look at Amazon Web Services (AWS), as they have a number of Nvidia DGX supercomputers (Nvidia’s off-the-shelf model featuring 8 X A100 GPUs) in their own data centers and are already offering DLaaS options to their clients.

Typically AWS offers DLaaS on Nvidia DGX on the following terms:

Hourly use rate: $32 per hour

Software tools: up to a 40% premium, or $12.80 per hour

Total: up to $44.80 per hour

It is worth bearing in mind that Deep Learning involves a lot of model training as well as processing of data sets, so the compute jobs undertaken in a Deep Learning scenario can last quite a long time; it is not unheard of for processing times on very large jobs to run to days, or even longer.

Under the AWS example above, the client would be charged US$1,075.20 per day for the full package or $7,526.40 per week.
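For transparency, that arithmetic is reproduced below using only the figures already quoted:

```python
# Reproducing the AWS cost arithmetic quoted above.
aws_hourly = 32.00                        # base rate for an 8 x A100 DGX instance
tools_hourly = aws_hourly * 0.40          # up to 40% software-tools premium = $12.80
total_hourly = aws_hourly + tools_hourly  # up to $44.80

per_day = total_hourly * 24               # $1,075.20
per_week = per_day * 7                    # $7,526.40
print(f"${total_hourly:.2f}/hour, ${per_day:,.2f}/day, ${per_week:,.2f}/week")
```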

For Neuromation’s Nvidia GPU supported cloud offering, we will be undercutting the AWS price by a substantial margin that budget-conscious scientific teams will find extremely attractive. In addition, we are proposing not to charge an extra premium for the use of our comprehensive, best-in-class suite of software tools. The reason here is that we want the AI community to embrace our software toolset as a key part of their AI activities; once they do, they will be more likely to return to the Neuromation platform for their next cloud job, and Neuromation will benefit from repeat business, which will reduce the cost of our customer acquisition over time.

It is also theoretically possible to utilize the fractional GPU capabilities of the A100 boards by servicing multiple jobs at once on each machine without a noticeable drop-off in client-facing performance. This opens the possibility for Neuromation, in some circumstances, to boost revenue generation, although it is fair to say that the reality of this possibility will only become known once we start to fully load the system with client jobs.

For now, then, we will be charging our platform clients US$24 per hour, inclusive of the suite of software tools, for each 8 X A100 machine. This represents a discount of roughly 45% to, for example, the Amazon Web Services (AWS) offering on similar Nvidia DGX machines. Our two 4 X A100 machines can be run together to create one 8 X A100 machine or can be run separately, but we are still looking at US$24 per hour as the base price per 8 GPU machine, with the strong probability that this base cost will be exceeded over time as client use characteristics become apparent.

Taking 8 GPUs as the minimum base case revenue generation point, we get the following minimum revenue prediction once our new supercomputers are fully loaded.

Revenue per machine: US$24 per hour X 24 hours = US$576 per day, or US$4,032 per week

Of course, as we effectively have 16 A100 GPUs deployed across our new machines, this total can be doubled, giving a base case revenue forecast of $1,152 per day, or $8,064 per week.
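Again for transparency, the sketch below reproduces this arithmetic, together with the implied discount to the AWS hourly rate quoted earlier, using only figures from this article:

```python
# Base-case revenue arithmetic from the figures above.
neuro_hourly = 24.00                      # per 8 x A100 machine, tools included
eight_gpu_equivalents = 2                 # 16 A100s across the new machines

per_day = neuro_hourly * 24               # $576 per machine per day
per_week = per_day * 7                    # $4,032 per machine per week
fleet_per_week = per_week * eight_gpu_equivalents  # $8,064

discount_vs_aws = 1 - neuro_hourly / 44.80  # ~46%, the "roughly 45%" above
print(per_day, per_week, fleet_per_week, f"{discount_vs_aws:.0%}")
```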

These fees will ultimately need to be paid in NTK, as this is the utility token of the Neuromation ecosystem. Platform clients will either need to buy their NTK on the market, and it would certainly make sense for repeat clients to do so over time, or they can pay Neuromation in fiat currency using a credit or debit card; in this case, Neuromation will use the funds so received to buy NTK on the market to close the process, as the NTK utility token is the only currency accepted by the Neuromation platform.

Neu.ro solves the environmental problem for platform clients

There has recently been a lot of press globally describing how data centers, and the power-hungry “e” world in general with its voracious demand for electricity, are seriously bad news for planet Earth, as the majority of the electricity that powers them comes from coal-fired power stations. In one well known example, Tesla, the US electric car manufacturer, first announced it would accept Bitcoin as payment for its vehicles, only to suspend that offer a few months later when it became apparent how environmentally unfriendly bitcoin mining is, given its use of electricity that in many cases comes from pollution-causing generating sources.

Neuromation’s Neu.ro cloud has solved this issue by using only 100% renewable, carbon-free energy. Unlike major cloud providers who tout their sustainability by utilizing carbon offsets to achieve carbon neutral status, our offering utilizes 100% renewable geothermal and hydro energy that releases no CO2 into the atmosphere. Furthermore, our location in a dedicated Tier 3, ISO27001-certified data center facility in Iceland offers the advantage of free-air cooling thanks to its location near the Arctic Circle. In the average data center, cooling represents around 40% of electricity consumption, and can be as high as 80% in warmer climates; the Neu.ro AI Cloud, running on 100% clean geothermal and hydro energy, spends only 20% of its electricity consumption on cooling.

Conclusion

There is a massive and growing demand for Deep Learning compute services provided on a DLaaS model. With the deployment of our first three Nvidia supercomputers, which lie at the heart of the Neu.ro cloud, our data center operations in Iceland have taken a quantum leap in our ability to service client demands across the AI space, and have moved Neuromation into a new league in our ability to generate substantial revenue, which will ultimately be paid in our utility token, NTK.

Moving forward it is our intention to acquire further Nvidia A100 GPU boards and deploy more of these massively powerful supercomputers into the Neu.ro cloud as and when opportunities present themselves.

We now have a complete offering including custom AI system development, MLOps as a service and AI-optimized cloud infrastructure.

Our services will include: 

  • MLOps as a service (CI/CD, infrastructure management, pipeline creation, project management and tool interoperability) via our Neu.ro platform 
  • Custom AI system research and development (including use case consulting, model selection and experimentation, data management, iterative testing) through our experienced AI development team of AI research PhDs and developers 
  • And now our own high speed 100% renewable energy Nvidia GPU-based cloud platform 

As always, Neu.ro remains committed to infrastructure portability: all solutions and our MLOps infrastructure platform can be deployed to any cloud, or on-premises. Given Nvidia’s leadership position in GPUs for AI and HPC, however, we have undertaken specific optimization for these assets, which we are able to take advantage of using our own Nvidia GPU-based cloud infrastructure.

Furthermore, full visibility and control over every aspect of the physical infrastructure, the MLOps management platform and the specific AI projects running on them will give users a high level of confidence in our ability to take AI Transformation consulting projects from concept through to development and deployment.

Finally, I would like to say that with our new Nvidia powered supercomputers we are now able to look the likes of AWS straight in the eye, as an equal, based on the comprehensive quality of our cloud offering, even though we are not as big as they are … yet!

Martin Birch

Kyiv, September 2021

About the author: Martin serves as the Non-Executive Chairman of The Neuromation Group and is the Managing Partner of Eastern Europe-focused investment bank, Empire State Capital.