AI on the Fly®

OSS's unique experience and capabilities bring the power of the datacenter to the most demanding edge applications. AI on the Fly defines the intersection of high-performance GPU-accelerated computing and AI at the edge. As demand for artificial intelligence applications increases across a wide set of industries, high-speed data acquisition, AI model training and split-second AI inference in the field become more valuable than reliance on slow network connections to sterile, fixed datacenters. These industries include deep learning, autonomous vehicles, military & aerospace and media & entertainment.

AI on the Fly edge applications have unique requirements beyond those of traditional embedded computing.

Delivering the high performance required in edge applications necessitates native PCIe interconnectivity, providing a fast, low-latency data highway between high-speed processors, NVMe storage and compute accelerators built on GPUs or application-specific FPGAs.

AI on the Fly Applications

The next decade will see a fundamental change in the way we get from point A to point B in our automobiles. The quest to remove humans from behind the wheel with truly autonomous vehicles will drive billions of dollars in investment by car manufacturers and transportation service providers to develop and acquire the required technology. According to the SAE International classifications for autonomous capabilities, we are only at Level 2, meaning only basic driver-assistance automation is being deployed in commercial vehicles today. However, many of the key players in the industry project that Level 5 vehicles, providing full automation of all dynamic driving tasks under all roadway and environmental conditions, will be on the road by 2028. Additionally, it is projected that by 2040 virtually all vehicles on the road will be fully automated, saving thousands of lives a year from automobile accidents and bringing the brief 150-year history of human driving to an end.

To reach this milestone, major car manufacturers and rideshare companies are starting to deploy fleets of development and prototype cars. These fleets are being used to gather the data required to develop and test the artificial intelligence algorithms that will eventually be deployed in millions of commercial vehicles. The cars in these fleets must be outfitted with specialized high-performance edge computing equipment: high-bandwidth data ingest systems tied to the myriad video, radar and LIDAR sensors in the car, high-capacity, low-latency storage subsystems, and high-performance compute engines that can perform the machine learning and inference tasks needed to enable the vehicle to see, hear, think and make decisions just like human drivers.

In addition to performance requirements, this computing equipment must be specialized in form factor, cooling and ruggedization to survive the harsh environment of cars driving hundreds of thousands of miles in all road and weather conditions. This combination of requirements is ideally addressed with AI on the Fly technologies, where specialized high-performance accelerated computing resources for deep learning training are deployed in the field near the data source; in this case, inside the vehicles themselves. In typical AI solutions, deep learning training has been a centralized datacenter process, and only inferencing occurs in the field. In contrast, AI on the Fly moves this capability to the edge and allows rapid response to new data with continual reinforcement and transfer learning. This is critical to effectively performing fundamental autonomous vehicle tasks such as obstacle detection and collision avoidance.

AI on the Fly is made of three modular sub-systems: data ingest, data storage and compute engines. These sub-systems support high-speed components, including data capture hardware, NVMe SSD storage, and GPU and FPGA compute accelerators, all with PCI Express interfaces for flexible scaling while maintaining high bandwidth and low latency. The data ingest system must be capable of absorbing the vast amounts of data continually flowing in from the sensors and processing the data for efficient delivery to both persistent storage and the compute engines. PCIe features allow the data to be multicast simultaneously to multiple sub-systems using RDMA transfers, avoiding system-memory bottlenecks without adding network protocol latency. The compute functions include machine learning tasks using traditional data science tools, data analysis, deep learning training using neural network frameworks, and inference engines for prediction using trained models against newly sourced data. Each of these elements may require specialized GPU resources. AI on the Fly provides all of these elements in flexible building block components that are easily customized to the specific requirements of the autonomous vehicle developer. The figure below illustrates an example of AI on the Fly configurations for autonomous vehicles.

Figure 1. AI on the Fly Hardware Configurations
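
As a purely illustrative sketch of this fan-out, the Python below models one capture stage feeding storage and compute consumers in parallel. Real systems multicast sensor data over PCIe using RDMA and peer-to-peer DMA rather than in-process queues; the names here are ours, not OSS software.

```python
# Purely illustrative: the fan-out pattern described above, modeled with
# in-process queues. Real systems multicast over PCIe with RDMA and
# peer-to-peer DMA; Python threads only sketch the data flow.
import queue
import threading

storage_q = queue.Queue()
compute_q = queue.Queue()

def ingest(frames):
    """Capture stage: fan each sensor frame out to storage and compute."""
    for frame in frames:
        storage_q.put(frame)   # persist raw data (NVMe in the real system)
        compute_q.put(frame)   # feed training/inference (GPUs in the real system)
    storage_q.put(None)        # end-of-stream sentinels
    compute_q.put(None)

def store():
    while storage_q.get() is not None:
        pass                   # write the frame to NVMe storage here

def compute():
    while compute_q.get() is not None:
        pass                   # run training or inference on the frame here

frames = [b"frame-%d" % i for i in range(8)]   # stand-in sensor data
threads = [threading.Thread(target=ingest, args=(frames,)),
           threading.Thread(target=store),
           threading.Thread(target=compute)]
for t in threads: t.start()
for t in threads: t.join()
```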

One Stop Systems is working with industry leaders to provide technology for their autonomous vehicle development programs. These companies look to OSS as their trusted development partner because of its technical expertise in specialized high-performance edge computing. They rely on OSS's experience in developing scalable PCI Express based systems that tie together high-bandwidth sensor data ingest sub-systems with low-latency NVMe storage and ultra-high-performance multi-GPU compute, all packaged in specialized rugged form factors. OSS recently announced a collaborative engineering design win with a major international network transportation company for deployment of AI on the Fly components in its 150-vehicle autonomous driving development fleet.

AI on the Fly is playing a key role in the development of fully autonomous vehicles and will help usher in fundamental changes to human transport over the next decade.

AI on the Fly puts computing and storage resources for the entire AI workflow not in the datacenter, but at the edge near the sources of data. Applications are emerging for this new AI paradigm in diverse areas including autonomous vehicles, predictive personalized medicine, battlefield command and control, and industrial automation. The common elements of these solutions are high-data-rate acquisition, high-speed low-latency storage and efficient high-performance AI training and inference computing. All of these building block elements are connected seamlessly over a memory-mapped PCI Express fabric, interconnected and customized as appropriate to meet the specific environmental requirements of in-the-field installations.

At the front end of these systems is high-speed data acquisition technology. Depending on the application, the data can be generated by a wide array of sensors. In autonomous vehicles, data is generated by arrays of video and LIDAR sensors. In battlefield applications, radar, sonar, FLIR (infrared) and RF sensors are deployed. Medical applications use MRI or CT sensors. In security applications, networks of security cameras produce high volumes of video data. Industrial automation draws on telemetry data from IoT sensors and video feeds at a wide spectrum of frame rates.

Figure 1. AI on the Fly Hardware Configurations

Although data rates vary across these applications depending on the sensors deployed, the fundamental requirements for the ingest subsystem are that it sustain high data rates, never lose data, and impose no flow control on the sensor data stream. For many AI on the Fly applications, local data rates can be extremely high, as much as 100 Gbps per stream or more, requiring specialized PCIe data capture hardware. As part of the capture process, the data is often processed in real time into a usable format prior to movement to the storage devices (see Figure 1: Data Clean & Prep). Capture hardware can take the form of PCIe FPGAs, video capture cards (encoded and raw), frame grabbers, or smart NICs performing a range of functions including tagging, encoding, sorting, analog-to-digital conversion, filtering, time stamping and channel synchronization. The data rates of the acquisition front end drive high performance requirements for the storage subsystem, necessitating direct PCIe-attached NVMe SSD storage devices. In these systems, storage needs to scale to potentially tens or hundreds of terabytes. Additional storage subsystem requirements include high availability/redundancy, security and optional support for removable storage media.
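
A minimal sketch of such a capture loop is shown below, assuming a simple sequence-number/time-stamp tag and a local file standing in for NVMe storage; it illustrates the append-only, no-flow-control ingest pattern, not any particular OSS capture card.

```python
# Minimal sketch of a lossless, append-only capture loop (not OSS capture
# firmware). Tagging and time stamping stand in for the FPGA/smart-NIC
# processing; a local file stands in for the NVMe storage subsystem.
import struct
import time

def ingest_stream(frames, out_path="capture.bin"):
    with open(out_path, "ab", buffering=0) as out:     # append-only, unbuffered
        for seq, frame in enumerate(frames):
            ts_ns = time.time_ns()                     # time stamp at capture
            header = struct.pack("<QQI", seq, ts_ns, len(frame))  # tag + length
            out.write(header + frame)  # never blocks on a downstream consumer,
                                       # so no flow control reaches the sensor

ingest_stream([b"\x00" * 4096 for _ in range(16)])     # stand-in sensor frames
```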

Few companies have the range of expertise to develop and deploy these edge-focused data acquisition systems in support of AI workflows. One Stop Systems (OSS), with expertise in large-scale, high-performance specialized PCI Express, NVMe storage and AI system architecture, is one such company. OSS recently announced a flexible AI data acquisition platform based on end-to-end PCIe Gen 4, using servers built on AMD's latest-generation EPYC™ 7002 processors and Gen 4 NVMe SSDs to achieve 56 GB/s data ingress capability. The complete solution includes a 1U server based on the 2nd-generation AMD EPYC 7002 processor and an OSS 4UV PCIe Gen 4 scale-out expansion system with up to sixteen 32 TB NVMe cards. The expansion system attaches directly to the server over two OSS PCIe x16 Gen 4 links providing 512 Gbps of bandwidth, and can provide up to 512 TB of storage capacity. Variations of the configuration can support up to eight x16 PCIe Gen 4 data acquisition cards.
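
As a rough sanity check on those figures (our arithmetic, not an OSS specification): PCIe Gen 4 signals 16 GT/s per lane with 128b/130b encoding, so two x16 links carry 512 Gbps raw and roughly 63 GB/s usable, comfortably above the 56 GB/s ingress figure.

```python
# Back-of-envelope check of the quoted numbers (our arithmetic, not an OSS
# specification). PCIe Gen 4 signals 16 GT/s per lane with 128b/130b encoding.
gt_per_lane = 16            # GT/s per PCIe Gen 4 lane
lanes, links = 16, 2        # two x16 links
encoding = 128 / 130        # 128b/130b line-coding efficiency

raw_gbps = gt_per_lane * lanes * links     # 512 Gbps raw, as quoted
usable_gbps = raw_gbps * encoding          # ~504 Gbps after encoding
usable_gbytes = usable_gbps / 8            # ~63 GB/s, above the 56 GB/s figure
print(raw_gbps, round(usable_gbps), round(usable_gbytes, 1))
```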

A representative example of AI on the Fly data acquisition is autonomous vehicle development fleets. OSS recently announced a collaborative engineering design win with a major international network transportation company for deployment of AI on the Fly components in its 150-vehicle autonomous driving development fleet. This fleet is being used to gather the data required to develop and test the artificial intelligence algorithms that will eventually be deployed in thousands of commercial vehicles. In this case, the AI on the Fly data ingest system is tied to a myriad of video, radar and LIDAR sensors in the car, aggregated through redundant 50 Gbps Ethernet connections to the storage subsystem, which in turn connects directly to multi-GPU machine learning training and inference compute engines. The entire system is deployed in the trunk of the automobile.

In many AI applications, transporting large amounts of data back to a remote datacenter is impractical and undesirable. With AI on the Fly, the entire AI workflow resides at the edge, at the data source. High-performance, scalable data acquisition is a fundamental and enabling component of this emerging paradigm.

Ingest, Data Clean & Prep, and Storage

U.2 Datacenter

The 2U NVMe storage array offers flexible capacity while maintaining the high-bandwidth and low-latency pedigree of Ion Accelerator™ arrays deployed in hundreds of global installations.

Learn More

HHHL Rugged

The high-performance, field-ready FSAn-4R (Ruggedized) NVMe All-Flash Array provides a new level of performance for applications such as real-time data storage, high-speed data recording, data analytics and big data.

Learn More

Data Exploration (ETL) and Model Prototype Development

DS-Pro

Data scientists require the best tools to transform large data sets from the field into workable AI models. The Data Science Pro workstation comes packed with the AI frameworks and utilities required in a truly "AI Ready" workstation for fast time to market. The DS-Pro also comes with full-service NVIDIA support for AI model creation to keep the AI project moving forward.
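
As an illustration of that workflow, the sketch below runs a toy ETL-and-prototype loop with pandas and PyTorch; the file name and column names are invented, and the frameworks are typical examples rather than a statement of what ships on the DS-Pro.

```python
# Illustrative only: a toy ETL-and-prototype loop of the kind such a
# workstation supports. pandas and PyTorch are typical tools here, not a
# statement of what ships on the DS-Pro; the CSV and its columns are invented.
import pandas as pd
import torch
from torch import nn

# ETL: load field data, drop bad rows, normalize features.
df = pd.read_csv("field_capture.csv").dropna()    # hypothetical capture dump
x = torch.tensor(df[["speed", "range"]].values, dtype=torch.float32)
y = torch.tensor(df["label"].values, dtype=torch.float32).unsqueeze(1)
x = (x - x.mean(0)) / x.std(0)

# Prototype: a small classifier trained for a quick sanity check.
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```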

Learn More

Machine Learning/Deep Learning

16x SXM3 Datacenter (HGX-2)

The 10U GPU-accelerated server (OSS-VOLTA16) is an HGX-2 platform with unprecedented compute power, bandwidth and memory topology to train massive models, analyze datasets and solve simulations faster and more efficiently than previously possible in a single server.

Learn More

16x PCIe Datacenter

Our compute accelerators support up to sixteen PCIe GPUs connected by a 4 GB/s PCIe switched fabric, providing high performance at a value price compared to proprietary SXM architectures. Compute accelerators expand any Intel, AMD, Power or ARM server with the latest GPU, FPGA and NVMe accelerator and storage products.

Learn More

16x SXM3 Rugged (HGX-2)

Accelerated computing for AI training and re-training can occur in harsh environments where standard servers dare not go. The OSS AI on the Fly® HGX products can operate up to 16 GPUs with a 360 GB/s NVLink hybrid mesh in the harshest environments. Whether datacenter power is needed in the trunk of a car or in a data center in the sky, the OSS HGX products handle the highest-performance tasks without sacrificing performance to "low-voltage" or "low-power" components.

Rackscale Datacenter Composable Infrastructure

The GPUltima-CI is a power-optimized HPC rack-level solution that combines AMD EPYC or Intel Xeon Scalable architecture compute nodes with hundreds of PCIe network adapters, NVIDIA® Volta® GPUs with NVLink or Quadro RTX GPUs, and NVMe drives on a composable switched PCIe fabric. This highly configurable fabric can compose bare-metal servers in virtually any configuration in seconds using job scheduling software, increasing server resource utilization by up to 100%. Put the right amount of server and GPU resources to work in mixed-workload datacenters on AI training, scientific modeling, architectural rendering and traditional HPC applications without purchasing costly, underutilized, hyper-converged servers.
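
To make the composability idea concrete, here is a self-contained toy model of a device pool from which per-job nodes are composed and then released; the class and method names are invented for illustration and are not the GPUltima-CI management interface.

```python
# Self-contained toy of the composability idea: per-job nodes are composed
# from a shared device pool and released afterward. Class and method names
# are invented for illustration; this is not the GPUltima-CI interface.
from collections import defaultdict

class Fabric:
    def __init__(self, **pool):                 # e.g. Fabric(gpu=16, nvme=32)
        self.free = defaultdict(int, pool)

    def compose(self, **need):
        """Claim devices for one job's bare-metal node."""
        if any(self.free[k] < n for k, n in need.items()):
            raise RuntimeError("pool cannot satisfy request")
        for k, n in need.items():
            self.free[k] -= n
        return dict(need)                       # the composed node's devices

    def release(self, node):
        for k, n in node.items():               # devices return to the pool
            self.free[k] += n

fabric = Fabric(gpu=16, nvme=32)
node = fabric.compose(gpu=8, nvme=4)            # compose for an AI training job
fabric.release(node)                            # freed for the next workload
```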

Learn More

Inferencing with Real-Time Datasets

32x T4 Scale-out Datacenter

Large scale-out inferencing platforms can make tens of thousands of decisions in short periods of time thanks to the ability to connect up to 32 Turing architecture GPUs to a single server. The OSS 3US uses a 512 Gb/s connection to Intel, AMD, Power or ARM processor servers to provide AI inference at scale and in harsh environments. Removable canisters aid upgrades and serviceability, along with the low size, weight and power (SWaP) expected in edge applications.
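
As a sketch of how software might spread inference across that many GPUs, the snippet below round-robins batches over the visible CUDA devices using PyTorch; the framework choice and the model file are our assumptions, since the platform itself is framework-agnostic hardware.

```python
# Sketch of spreading inference over many GPUs with round-robin dispatch.
# PyTorch and the model file are assumptions; the OSS platform itself is
# framework-agnostic hardware.
import itertools
import torch

n_gpus = torch.cuda.device_count()              # up to 32 T4s in this system
models = [torch.jit.load("model.pt", map_location=f"cuda:{i}")  # hypothetical
          for i in range(n_gpus)]                               # model file
rr = itertools.cycle(range(n_gpus))

def infer(batch: torch.Tensor) -> torch.Tensor:
    i = next(rr)                                # next GPU, round-robin
    with torch.no_grad():
        return models[i](batch.to(f"cuda:{i}")).cpu()
```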

5x RTX6000 Rugged

Let OSS design the perfect-fit AI on the Fly system for today's HPC-at-the-edge applications using full-performance, commercially available building blocks with no "embedded" compromises. OSS engineers design solutions that power and cool up to five RTX6000 GPUs in the trunk of an autonomous vehicle, rack solutions for mobile data centers, and MIL-STD rugged compute and storage systems for data centers in the sky. Call or email our highly skilled sales engineers with any design challenge by clicking here.

Other HPC Applications

Our products can be customized to fit a wide range of other high-performance computing applications. See how our products can be used for other applications, such as Machine Learning, Defense, Finance, Medical, Research, Oil and Gas, Media & Entertainment, and VDI.

Learn More