The Intel Nervana processor goes beyond terabit bandwidth with a refined architecture


Thursday, 7 December 2017

What's new in artificial intelligence: Intel Nervana processor

Carey Kloss, vice president of hardware for Intel's artificial intelligence product group, gave an update on the refinements made to the Nervana architecture. The starting point is understanding what a neural network processor (NNP) must do. Training a neural network requires massive numbers of arithmetic and memory operations to produce useful results. Scalability, power consumption and maximum utilization are the main considerations behind Nervana's spatial architecture. To save as much energy as possible, data should not move within the system unless absolutely necessary. Vector data can be split across memory modules so that the data a computation needs is always close to where it is used.
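
As a rough illustration of keeping data close to where it is used, the Python sketch below splits a vector into contiguous shards, one per memory module, and computes a partial result on each shard before combining them. The function names and the even-split policy are assumptions made for this example; the article does not say how Nervana actually partitions data.

```python
def shard_vector(data, num_modules):
    """Split data into num_modules contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(data), num_modules)
    shards = []
    start = 0
    for i in range(num_modules):
        # The first `extra` shards take one additional element.
        size = base + (1 if i < extra else 0)
        shards.append(data[start:start + size])
        start += size
    return shards


def local_partial_sums(shards):
    """Each (hypothetical) module reduces only the shard it holds locally."""
    return [sum(shard) for shard in shards]


shards = shard_vector(list(range(10)), 4)
partials = local_partial_sums(shards)
total = sum(partials)  # only the small partial results need to move between modules
```

In this toy model only the four partial sums cross module boundaries, rather than all ten elements.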

With high-bandwidth memory (HBM), bandwidth between the external memory banks and the die can exceed 1TB/s. Although this is an impressive figure, memory bandwidth is still a limiting factor for deep learning workloads. Because Intel cannot wait for new memory technologies to mature, it has developed other creative options.

Software control of memory usage lets the on-die memory load data from external memory once and then move it between local memory modules. Each module is approximately 2MB, for a total of about 30MB per Nervana chip. Reducing reads from external memory helps avoid saturating the memory bandwidth and makes it possible to prefetch the next set of data an operation needs.
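
The figures above can be turned into a back-of-the-envelope estimate. The Python sketch below uses the approximate sizes quoted in the article (about 2MB per module, about 30MB per chip) and a deliberately simple reuse model, which is an assumption of this example: a tensor that fits on the die is read from external memory once, while an uncached tensor is read again on every use.

```python
MODULE_BYTES = 2 * 1024 * 1024   # about 2MB per local memory module (figure from the article)
CHIP_BYTES = 30 * 1024 * 1024    # about 30MB of local memory per chip (figure from the article)


def approx_module_count():
    """Rough number of local memory modules implied by the two figures."""
    return CHIP_BYTES // MODULE_BYTES


def external_bytes_read(tensor_bytes, reuse_count, cached_on_die):
    """Bytes fetched from external memory when a tensor is used reuse_count times.

    Simplified model (an assumption): cached data is loaded once and then served
    from local modules; uncached data is re-read on every use.
    """
    if cached_on_die and tensor_bytes <= CHIP_BYTES:
        return tensor_bytes
    return tensor_bytes * reuse_count


tensor = 8 * 1024 * 1024  # a hypothetical 8MB tensor
with_cache = external_bytes_read(tensor, reuse_count=10, cached_on_die=True)
without_cache = external_bytes_read(tensor, reuse_count=10, cached_on_die=False)
```

Under these assumptions the two figures imply roughly 15 modules, and reusing the 8MB tensor ten times costs 8MB of external traffic instead of 80MB.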

An update to the Flexpoint data type delivers performance similar to 32-bit floating-point operations while using only 16 bits of storage. Because each value takes half as many bits, the available memory bandwidth is effectively doubled. Flexpoint is also modular, so future Nervana generations can further reduce the number of bits it requires.
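
The article does not describe Flexpoint's internals. As a rough illustration of how 16-bit storage can cover a wide dynamic range, the Python sketch below assumes a block format in which every value in a tensor is stored as a 16-bit signed integer and the whole tensor shares one exponent. The function names and the rounding scheme are hypothetical and should not be read as Intel's implementation.

```python
import math


def encode_shared_exponent(values, mantissa_bits=16):
    """Quantize floats to signed integers that all share a single power-of-two exponent."""
    limit = 2 ** (mantissa_bits - 1) - 1  # largest signed mantissa, 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # Smallest exponent for which the largest value still fits in the mantissa range.
    exponent = math.ceil(math.log2(max_abs / limit))
    scale = 2.0 ** exponent
    # Clamp to guard against floating-point rounding at the range boundary.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return mantissas, exponent


def decode_shared_exponent(mantissas, exponent):
    """Recover approximate floats from the integer mantissas and the shared exponent."""
    scale = 2.0 ** exponent
    return [m * scale for m in mantissas]


values = [0.5, -1.25, 3.0, 0.001]
mantissas, exponent = encode_shared_exponent(values)
decoded = decode_shared_exponent(mantissas, exponent)

bytes_16bit = 2 * len(values)  # 16 bits per value, plus one shared exponent per tensor
bytes_32bit = 4 * len(values)  # 32-bit floating point for comparison
```

In this sketch each value costs two bytes instead of four, which is where the "effectively doubled" bandwidth comes from. The trade-off is that small values in a tensor with a large maximum lose precision, as with 0.001 in the example.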

Communication between the chip and external components has been significantly improved to offer bidirectional terabit-class performance. Thanks to the high-speed links between chips, a cluster of Nervana chips can work on a single task as if it were one large processor.
