Retail is an excellent example of an industry in which artificial intelligence and data-analysis technologies are applied and in great demand. Forecasting models and neural networks for applied tasks help retailers, in practice and not just in theory, to improve their product range, plan warehouse logistics and enhance the customer experience. Today we are talking about the 'bare metal' that makes these intelligent solutions work.
How is AI computed?
Mathematical modeling, running neural networks, training predictive models and similar tasks all require serious computing power and specific high-performance hardware and software architectures.
In this field it is quite typical for calculations to take hours or even days. For a business that can be too long, and even risky: the reaction often has to be immediate, and proactive measures are frequently needed to save money, health, resources and so on.
The first step towards speeding up computation was the use of graphics processing units (GPUs): cards originally created for graphics work which proved to be highly effective for other, non-graphics calculations as well.
Such cards can be installed into standard slots (for instance, PCIe), more than one can be installed at a time, and any modern computationally intensive application can use them. Alternatively, the SXM4 form factor places the GPU directly on the system board rather than on a plug-in card, which keeps data-transfer delays across the bus to a minimum.
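To make this concrete, here is a minimal sketch (our illustration, not part of the original material) of how a GPU-aware application can enumerate the accelerators available to it, whether they sit in PCIe slots or on SXM4 modules; it assumes the PyTorch library is installed:

```python
# Minimal sketch: enumerate the GPUs visible to an application.
# Assumes PyTorch is installed; output depends on the actual hardware.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # Report each card's model name and on-board memory.
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
else:
    print("No CUDA-capable GPU detected")
```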
NVIDIA's technological contribution
The world's leading developer and manufacturer of graphics accelerators is NVIDIA. Its flagship product is the A100 accelerator, with very impressive specifications: 6912 CUDA cores plus 432 tensor cores, up to 80 GB of memory and FP64 performance of up to 9.7 TFLOPS.
The weak point of systems with graphics accelerators is data exchange. To combine many GPUs into a single system, NVIDIA proposed its own NVLink interconnect. In its most recent versions it reaches a bandwidth of 600 gigabytes per second per GPU (roughly ten times faster than PCIe 4.0).
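As an illustration of what such an interconnect means at the application level, the sketch below (again assuming PyTorch, which the article does not mention) checks whether two GPUs can exchange data directly, peer to peer, without a round trip through host memory:

```python
# Illustration: check for direct (peer-to-peer) GPU-to-GPU access and
# copy a tensor between devices. On NVLink-connected GPUs this transfer
# avoids the slower hop through CPU memory over PCIe.
import torch

if torch.cuda.device_count() >= 2:
    p2p = torch.cuda.can_device_access_peer(0, 1)
    print(f"GPU 0 -> GPU 1 peer-to-peer access: {p2p}")

    x = torch.randn(1024, 1024, device="cuda:0")
    y = x.to("cuda:1")  # device-to-device copy over the fastest available link
    print(y.device)
else:
    print("Fewer than two GPUs available")
```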
The idea was developed further with NVSwitch, which can combine up to 12 NVLink buses and opens the way to using graphics accelerators to build powerful, easily scalable computing clusters.
NVIDIA would not be the market leader if it had not proposed its own implementation for such a cluster: the DGX A100.
This is a server in which the A100 modules are not installed in slots but mounted directly on the board (SXM4) and connected through NVSwitch. High-speed Mellanox network switches (NVIDIA acquired the company in 2020) make it possible to build a cluster from several DGX A100 systems and concentrate a remarkable volume of computing resources in one place.
What makes DGX good for AI?
A closed, predictable ecosystem. DGX is not just a server but a fully fledged hardware and software system. NVIDIA maintains an immense software landscape for training neural networks, running them in production and analyzing data. The software stack of NVIDIA DGX runs on top of one of the most technologically advanced hardware platforms in the world.
NVIDIA's competence in data science. NVIDIA employs several hundred developers and analysts around the world who train neural networks and build predictive models. In other words, NVIDIA has strong in-house data-science expertise to support clients working in this field, and gives them a single point of entry for resolving all issues.
NGC Catalog. It contains many ready-made solutions optimized for graphics processors, developed both by NVIDIA and by its partners: solutions for artificial intelligence, machine learning and high-performance computing tasks. Thanks to pre-trained AI models and SDKs, clients can solve their tasks faster than ever before.
Accelerated computing for Sportmaster
Sportmaster is a large retailer facing the classic challenges of its business. Sportmaster uses artificial intelligence models to forecast demand and optimize inventory management, in order to avoid both shortages of goods in shops and the accumulation of unsaleable stock in warehouses. For instance, every day the retail chain receives orders from thousands of shops and tens of thousands of points of sale.
The next challenge is to increase sales conversion and improve the efficiency of loyalty programs: marketing campaigns, especially targeted ones, require constant evaluation and optimization. Another business challenge is recruiting enough staff to work in shops during periods of peak demand.
The volume of data and the sheer number of factors that affect the forecasts are huge and ever-increasing, while the time available for making decisions keeps shrinking. No human can solve such formidable optimization problems alone, so artificial intelligence systems come to the rescue.
Sportmaster's data-analysis team has a wealth of experience in modeling the company's complex business processes, but training the models on the existing infrastructure (incidentally, a 10-node Big Data cluster) took far too long. Having decided to adopt a specialized platform with GPU-accelerated computing, the company turned its attention to the NVIDIA DGX-2 system.
As a pilot project, NVIDIA installed the DGX-2's 'younger brother', the NVIDIA DGX Station server, in the client's data center. To get maximum performance out of the DGX's resources, the data-preparation and training scripts had to be rewritten, and selected production machine-learning models were tested.
The results turned out to be extremely encouraging. The key factor was the GPU support of specific algorithms in particular machine-learning frameworks. Algorithms with GPU support (especially those with a fully implemented multi-GPU mode) demonstrated a dramatic improvement in performance: gradient boosting in CatBoost showed a 30-fold speed-up, and XGBoost in H2O a 20-fold one.
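For readers who want to see where that GPU support is switched on, the sketch below shows the relevant parameters in CatBoost and XGBoost; the synthetic data and parameter values are our own illustration, not Sportmaster's actual pipeline:

```python
# Sketch: enabling GPU training in CatBoost and XGBoost.
# Synthetic data; parameter values are illustrative only.
import numpy as np
import xgboost as xgb
from catboost import CatBoostRegressor

X = np.random.rand(100_000, 50).astype(np.float32)
y = np.random.rand(100_000).astype(np.float32)

# CatBoost: task_type="GPU" moves gradient boosting onto the accelerator;
# devices="0:1" would spread the work across two GPUs.
cb = CatBoostRegressor(iterations=500, task_type="GPU", devices="0", verbose=0)
cb.fit(X, y)

# XGBoost: the "gpu_hist" tree method builds split histograms on the GPU
# (newer XGBoost releases express this as tree_method="hist" plus device="cuda").
dtrain = xgb.DMatrix(X, label=y)
params = {"tree_method": "gpu_hist", "max_depth": 8}
booster = xgb.train(params, dtrain, num_boost_round=500)
```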
Following the tests, and taking into account the active development of the open-source RAPIDS project supported by NVIDIA (in which more and more algorithms are being moved to the GPU), the client decided to adopt the NVIDIA DGX-2. Once the exact technical requirements had been formulated, two DGX-2 systems were installed in the data center.
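To give a flavor of the RAPIDS approach mentioned above, the sketch below keeps the familiar pandas/scikit-learn style of workflow but runs it on the GPU with cuDF and cuML; the file name and column names are hypothetical:

```python
# Sketch of a RAPIDS workflow: dataframe operations and model training on the GPU.
# "sales.csv" and its columns are hypothetical placeholders.
import cudf
from cuml.linear_model import LinearRegression

df = cudf.read_csv("sales.csv")            # loaded straight into GPU memory
X = df[["price", "promo", "weekday"]]
y = df["units_sold"]

model = LinearRegression()
model.fit(X, y)                            # training runs on the GPU
predictions = model.predict(X)
print(predictions.head())
```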
Migrating the production AI models to the DGX-2 platform confirmed the findings of the pilot and showed a further improvement in performance thanks to the more powerful DGX-2 hardware. A window of opportunity opened for expanding functionality: for instance, moving from weekly to daily full model retraining, a significant increase in the number of factors, the possibility of model stacking, and the use of new resource-intensive algorithms to improve the quality of forecasting and classification.
A few figures: where model training on the previous 'bare metal' took 12 hours, on the DGX-2 it takes only 2.5 hours. Inference, which took 36 hours on the old hardware, now completes on the DGX-2 in 4 hours: roughly a five-fold and a nine-fold speed-up respectively.
Softline's expertise
The software and hardware system we installed has no equivalent in Russia, and only a few of NVIDIA's partners are involved in promoting it on the domestic market, Softline among them. Softline currently holds the vendor's highest partner status: ELITE.
Softline's product and services portfolio is constantly being expanded with its own solutions and those of vendors in the field of artificial intelligence. We can offer our clients the SuperPOD solution: a cluster with DGX compute nodes, storage systems and a high-speed network infrastructure.
A sizeable and important category of our clients is the university sector. ASIP (the Almetyevsk State Petroleum Institute) uses NVIDIA hardware to solve problems arising in oil and gas exploration; the Institute of Applied Semiotics in Tatarstan is developing a Russian-Tatar machine translation system; and at the Saint Petersburg State Electrotechnical University 'LETI', young specialists are being trained in accordance with a Presidential Decree.