NVIDIA has announced that it is launching its brand new HGX A100 systems today, incorporating the updated A100 PCIe GPU accelerator that features twice the memory and faster bandwidth for HPC users.
NVIDIA Upgrades HGX A100 Systems With Flagship Ampere Based A100 HPC GPU Accelerators - 80 GB HBM2e Memory & 2 TB/s Bandwidth
The existing NVIDIA A100 HPC accelerator was introduced last year in June, and it looks like the green team is planning to give it a major spec upgrade. The chip is based on NVIDIA's largest Ampere GPU, the GA100, which measures 826mm2 and houses an insane 54 billion transistors. NVIDIA typically gives its HPC accelerators a spec boost mid-cycle, which means we will be hearing about next-generation accelerators at GTC 2022.
NVIDIA A100 Tensor Core GPUs deliver unprecedented HPC acceleration to solve complex AI, data analytics, model training and simulation challenges relevant to industrial HPC. A100 80GB PCIe GPUs increase GPU memory bandwidth 25 percent compared with the A100 40GB, to 2 TB/s, and provide 80 GB of HBM2e high-bandwidth memory.
The A100 80GB PCIe's enormous memory capacity and high memory bandwidth allow more data and larger neural networks to be held in memory, minimizing internode communication and energy consumption. Combined with the faster memory bandwidth, this enables researchers to achieve higher throughput and faster results, maximizing the value of their IT investments.
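To put that capacity in perspective, here is a rough, illustrative Python estimate (our own arithmetic, not an NVIDIA figure) of how many model parameters fit in GPU memory when weights are stored in FP16:

```python
# Back-of-the-envelope check (illustrative only): how large a model fits in
# GPU memory if weights are stored in FP16 (2 bytes per parameter).
def max_params_billions(memory_gb: float, bytes_per_param: int = 2) -> float:
    """Rough upper bound on parameter count, ignoring activations,
    gradients, and optimizer state, which cut this substantially."""
    return memory_gb * 1e9 / bytes_per_param / 1e9

for mem in (40, 80):
    print(f"{mem} GB -> ~{max_params_billions(mem):.0f}B FP16 params (weights only)")
# 40 GB -> ~20B params, 80 GB -> ~40B params
```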
A100 80GB PCIe is powered by the NVIDIA Ampere architecture, which features Multi-Instance GPU (MIG) technology to deliver acceleration for smaller workloads such as AI inference. MIG allows HPC systems to scale compute and memory down with guaranteed quality of service. In addition to PCIe, there are four- and eight-way NVIDIA HGX A100 configurations.
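For context, here is a minimal Python sketch, assuming the `nvidia-ml-py` (pynvml) bindings and an NVML-capable driver, that queries whether MIG mode is enabled on a device; the actual partitioning into GPU instances is normally done administratively (e.g. via nvidia-smi's MIG commands):

```python
# Minimal sketch using the nvidia-ml-py bindings (pip install nvidia-ml-py)
# to check MIG mode on the first GPU. On non-MIG-capable parts the MIG
# query raises an NVML error, which we catch below.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
name = pynvml.nvmlDeviceGetName(handle)          # bytes on older pynvml, str on newer
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"{name}: {mem.total / 1e9:.0f} GB total")

try:
    current, pending = pynvml.nvmlDeviceGetMigMode(handle)
    print("MIG enabled" if current == pynvml.NVML_DEVICE_MIG_ENABLE else "MIG disabled")
except pynvml.NVMLError:
    print("MIG not supported on this device")
pynvml.nvmlShutdown()
```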
NVIDIA partner support for the A100 80GB PCIe includes Atos, Cisco, Dell Technologies, Fujitsu, H3C, HPE, Inspur, Lenovo, Penguin Computing, QCT and Supermicro. The HGX platform featuring A100-based GPUs interconnected via NVLink is also available via cloud services from Amazon Web Services, Microsoft Azure and Oracle Cloud Infrastructure.
In terms of specifications, the A100 80 GB PCIe GPU accelerator doesn't change much in its core configuration. The GA100 GPU retains the specifications we saw on the 250W variant: 6912 CUDA cores arranged in 108 SM units, 432 Tensor Cores, and 80 GB of HBM2e memory delivering a higher bandwidth of 2.0 TB/s, compared to 1.55 TB/s on the 40 GB variant.
The A100 SXM variant already comes with 80 GB of memory, but it doesn't feature the faster HBM2e dies found on this new A100 PCIe variant. This is also the most memory ever featured on a PCIe-based graphics card, but don't expect consumer graphics cards to feature such high capacities any time soon. What's interesting is that the power rating remains unchanged, which means we are looking at higher-density chips binned for high-performance use cases.
The FP64 performance is still rated at 9.7 TFLOPs (19.5 TFLOPs via Tensor Cores), FP32 performance at 19.5 TFLOPs with TF32 Tensor performance of 156 TFLOPs (312 TFLOPs with sparsity), FP16 performance at 312 TFLOPs (624 TFLOPs with sparsity), and INT8 at 624 TOPs (1248 TOPs with sparsity). NVIDIA is planning to release its latest HPC accelerator next week, and we can expect pricing of over $20,000 US, considering the 40 GB A100 variant sells for around $15,000 US.
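Those rated figures follow directly from the core counts and the 1410 MHz boost clock; the quick Python arithmetic below (our own sanity check, not NVIDIA's) reproduces them:

```python
# Deriving the rated peak throughput from the published core counts
# and the 1410 MHz boost clock (illustrative arithmetic only).
clock_hz = 1.410e9
fp32_cores, fp64_cores, tensor_cores = 6912, 3456, 432

# Standard CUDA cores: one FMA (2 FLOPs) per core per clock.
print(f"FP32: {2 * fp32_cores * clock_hz / 1e12:.1f} TFLOPs")  # ~19.5
print(f"FP64: {2 * fp64_cores * clock_hz / 1e12:.1f} TFLOPs")  # ~9.7

# Each third-gen Tensor Core executes 256 FP16 FMAs (512 FLOPs) per clock.
print(f"FP16 Tensor: {512 * tensor_cores * clock_hz / 1e12:.0f} TFLOPs")  # ~312
# Structured sparsity doubles the effective Tensor rates (624 TFLOPs, etc.).
```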
In addition to these announcements, NVIDIA has also introduced its new InfiniBand solution, which provides configurations of up to 2048 ports of NDR 400 Gb/s (or 4096 ports of NDR 200) with a total bi-directional throughput of 1.64 Pb/s. That alone is a 5x increase over the previous generation and offers 32x higher AI acceleration.
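That throughput figure checks out from the port math alone, as the short Python snippet below illustrates (our own arithmetic):

```python
# Checking the quoted switch throughput: 2048 ports of NDR 400 Gb/s,
# counted bidirectionally.
ports, gbps_per_port = 2048, 400
total_pbps = ports * gbps_per_port * 2 / 1e6  # Gb/s -> Pb/s, x2 bidirectional
print(f"{total_pbps:.2f} Pb/s")  # ~1.64 Pb/s, matching NVIDIA's figure
```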
NVIDIA Ampere GA100 GPU Based Tesla A100 Specs:
NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (SXM2) | Tesla V100S (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4)
---|---|---|---|---|---|---|---|---
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GA100 (Ampere) | GA100 (Ampere)
Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 7nm | 7nm
Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 54.2 Billion | 54.2 Billion
GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 | 815 mm2 | 815 mm2 | 826 mm2 | 826 mm2
SMs | 15 | 24 | 56 | 56 | 80 | 80 | 108 | 108
TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 54 | 54
FP32 CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 | 64
FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 | 32
FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 6912 | 6912
FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 3456 | 3456
Tensor Cores | N/A | N/A | N/A | N/A | 640 | 640 | 432 | 432
Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 432 | 432
Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1601 MHz | 1410 MHz | 1410 MHz
TOPs (DNN/AI) | N/A | N/A | N/A | N/A | 125 TOPs | 130 TOPs | 1248 TOPs (2496 TOPs with Sparsity) | 1248 TOPs (2496 TOPs with Sparsity)
FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs | 312 TFLOPs (624 TFLOPs with Sparsity) | 312 TFLOPs (624 TFLOPs with Sparsity)
FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard)
FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard)
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 6144-bit HBM2e
Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s | Up To 40 GB HBM2 @ 1.6 TB/s / Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s / Up To 80 GB HBM2e @ 2.0 TB/s
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 40960 KB | 40960 KB
TDP | 235W | 250W | 250W | 300W | 300W | 250W | 400W | 250W