Microsoft Strung Together Tens of Thousands of Chips in a Pricey Supercomputer for OpenAI
When Microsoft invested $1 billion in OpenAI in 2019, it agreed to build a massive, cutting-edge supercomputer for the artificial intelligence research startup. The only problem: Microsoft didn’t have anything like what OpenAI needed and wasn’t totally sure it could build something that big in its Azure cloud service without it breaking. From a report: OpenAI was trying to train an increasingly large set of artificial intelligence programs called models, which were ingesting greater volumes of data and learning more and more parameters, the variables the AI system has sussed out through training and retraining. That meant OpenAI needed access to powerful cloud computing services for long periods of time. To meet that challenge, Microsoft had to find ways to string together tens of thousands of Nvidia’s A100 graphics chips — the workhorse for training AI models — and change how it positions servers on racks to prevent power outages. Scott Guthrie, the Microsoft executive vice president who oversees cloud and AI, wouldn’t give a specific cost for the project, but said “it’s probably larger” than several hundred million dollars. […] Now Microsoft uses that same set of resources it built for OpenAI to train and run its own large artificial intelligence models, including the new Bing search bot introduced last month. It also sells the system to other customers. The software giant is already at work on the next generation of the AI supercomputer, part of an expanded deal with OpenAI in which Microsoft added $10 billion to its investment.
Read more of this story at Slashdot.