Om Communication optimization in Machine Learning
Recent studies showed that for large models, such as GPT-3 which requires 355 years to complete the training using one fastest GPU, it is necessary to use thousands of GPUs to finish the training. Therefore the design of scalable distributed training system imposes a significant implication for the future development of machine learning. One major bottleneck for the scalability of the training system is the communication cost, which could totally overweight the computation cost on commodity systems with that offer limited network bandwidth or high network latency.
Visa mer