MATRIX Spring Seminar Series – Dr. Roland Green
March 4, 2022 • 11:00 am - 12:00 pm
Training of Giant Neural Networks with Weight Streaming
Roland Green
Cerebras Systems
https://utsa.zoom.us/j/92387759081
Friday, March 4, 2022
11 AM – 12 PM CST
State-of-the-art language models have grown in parameter count by three orders of magnitude over the last two years. This growth has made training challenging in terms of both storage and compute requirements. In this talk, we survey existing approaches to scaling training across clusters of compute units and explore the limitations of each when applied to giant models. All of these approaches store the model parameters on the compute units, which we find to be a primary driver of complexity and communication overhead. We present a new paradigm for giant model training, called weight streaming, based on the disaggregation of model storage and compute, and describe an implementation of this paradigm using Cerebras wafer-scale systems. Our weight streaming architecture enables the training of models two orders of magnitude larger than the current state of the art, with a simple scaling model. Combined with built-in support for weight sparsity, our solution can make training giant networks tractable for the first time.