Musk hits back at OpenAI: the world's largest behemoth model Grok-1 is open source! 314 billion parameters, 8 MoE experts, and 6k stars snapped up on GitHub
Original report by Zhiyuan Xin
Editor: Editorial Department
[Zhiyuan Xin Introduction] xAI's Grok went open source on schedule, seven days after the announcement! By opening up the code behind Grok, Musk gets to slap back at OpenAI. The model has 8 MoE experts and 314 billion parameters, with the weights and architecture fully open. As soon as the project went live, it racked up 6k stars on GitHub.
True to his word, Musk's xAI really did open-source Grok on schedule!
Just now, xAI officially released the weights and architecture of Grok-1, a 314-billion-parameter mixture-of-experts model.
With 314 billion parameters, Grok-1 is the largest open-source LLM to date, roughly four times the size of Llama 2.
For now, xAI has not disclosed much else about Grok-1.
The information released on the official website is as follows:
- The base model was trained on a large amount of text data and has not been fine-tuned for any specific task.
- The 314B-parameter MoE model has 25% of its weights active on any given token (a quick check after this list shows where that figure comes from).
- In October 2023, xAI began training the model from scratch, using a custom training stack built on JAX and Rust.
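As a back-of-the-envelope check (our own rough reading, since xAI has not published the exact parameter breakdown), the 25% figure follows directly from the routing setup:

# Rough arithmetic only; xAI has not published the exact parameter breakdown.
total_params = 314e9       # total parameters, from the announcement
num_experts = 8            # experts per MoE layer
active_experts = 2         # experts each token is routed to

# With 2 of 8 experts used per token, 25% of the expert (feed-forward) weights
# are touched for any given token.
expert_fraction = active_experts / num_experts
print(f"Expert weights active per token: {expert_fraction:.0%}")                 # 25%
print(f"25% of 314B is roughly {expert_fraction * total_params / 1e9:.1f}B")     # ~78.5B

# Note: the 86B "active parameters" figure quoted later in this article is somewhat
# higher than 78.5B, presumably because non-expert weights (attention, embeddings)
# are always active.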
Once it went live on GitHub, Grok quickly racked up 6k stars and 586 forks.
Project address: https://github.com/xai-org/grok-1.
Musk did not forget to mock OpenAI: "Tell us more about the 'open' part of OpenAI …"
The New York Times commented that open-sourcing the original code behind Grok escalates the battle among the world's richest men over who will control the future of AI.
Will open source make the technology safer, or will it make it easier to abuse?
Musk, an "open source supporter", set an example and got involved in this heated debate in the AI community, and gave the answer with actions.
Zuckerberg also just weighed in on Grok: "It isn't really that impressive. 314 billion parameters is too many; you need a pile of H100s, and I've already bought them up."
One magnet link for the world's largest open-source model
This time, xAI has open-sourced Grok-1 under the Apache 2.0 license, so users are free to use, modify, and distribute the software.
The repository contains JAX example code for loading and running the open-weight Grok-1 model.
Users need to download the checkpoint, place the ckpt-0 directory in the checkpoint folder, and then run the following commands to test it:
pip install -r requirements.txt
python run.py
This script loads the checkpoint and samples from the model on a test input.
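The repository's run.py is not reproduced in this article; the snippet below is only a minimal stand-in showing the shape of that load-and-sample flow. The helper names (load_checkpoint, sample) are placeholders, not the repository's actual API.

from pathlib import Path

# Placeholder sketch, NOT the actual run.py from xai-org/grok-1.
CKPT_DIR = Path("./checkpoints/ckpt-0")     # where the downloaded weights are placed

def load_checkpoint(path: Path):
    """Stand-in loader; the real script restores the sharded 314B JAX weights."""
    if not path.exists():
        raise FileNotFoundError(f"Download the weights and place them in {path}")
    return {"params": "..."}                # placeholder for the weight pytree

def sample(params, prompt: str, max_new_tokens: int = 100) -> str:
    """Stand-in sampler; the real script runs the MoE transformer on the prompt."""
    return prompt + " ..."                  # placeholder output

if __name__ == "__main__":
    params = load_checkpoint(CKPT_DIR)
    print(sample(params, "The answer to life, the universe, and everything is"))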
Because the model is huge, with 314B parameters, a machine with enough GPU memory is needed to test it with the example code.
Moreover, the MoE layer implementation in this repository is not particularly efficient; it was chosen to avoid the need for custom kernels while still validating the model's correctness.
The weights can be downloaded with a torrent client via the following magnet link:
magnet:?xt=urn:btih:5f96d43576e3d386c9ba65b883210a393b68210e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
More details
Andrew Kean Gao, a Stanford researcher, went through model.py and shared more details about Grok's architecture; the 314 billion parameters come with no strings attached. (The key hyperparameters are listed below, with a small illustrative sketch after the list.)
- 8 mixture-of-experts layers (2 experts active per token), 86B active parameters. It uses rotary position embeddings (RoPE) rather than fixed positional embeddings.
- Tokenizer vocabulary size: 131,072 (similar to GPT-4), i.e. 2^17
- Embedding size: 6,144 (48 × 128)
- Transformer layers: 64 (each layer contains a decoder block: a multi-head attention block and a dense block)
- Key size: 128
Multi-head attention block: 48 heads for queries and 8 for keys/values.
Dense block (dense feed-forward block):
- Widening factor: 8
- Hidden layer size: 32,768
Each token selects 2 out of 8 experts.
- The rotary position embedding size is 6,144, which makes sense: it matches the model's input embedding size.
- Context length: 8,192 tokens
- Precision: bf16
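For reference, here is a small, illustrative sketch that collects the hyperparameters above into a plain config object and demonstrates a toy top-2 routing step (the "each token selects 2 of 8 experts" point). The routing variant shown (softmax over the two selected experts' logits) is a common MoE choice and an assumption here, not xAI's confirmed implementation.

from dataclasses import dataclass

import jax
import jax.numpy as jnp

@dataclass(frozen=True)
class GrokArchitecture:
    # Hyperparameters as reported in the breakdown above.
    vocab_size: int = 131_072        # 2**17
    emb_size: int = 6_144            # 48 * 128
    num_layers: int = 64
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8
    widening_factor: int = 8
    ffn_hidden_size: int = 32_768
    num_experts: int = 8
    num_selected_experts: int = 2    # top-2 routing
    context_length: int = 8_192
    dtype: str = "bfloat16"

def top2_route(router_logits: jnp.ndarray, k: int = 2):
    """Pick the top-k experts per token and renormalize gate weights over them."""
    top_logits, expert_ids = jax.lax.top_k(router_logits, k)   # [tokens, k]
    gate_weights = jax.nn.softmax(top_logits, axis=-1)         # per-token mixture weights
    return expert_ids, gate_weights

if __name__ == "__main__":
    cfg = GrokArchitecture()
    # Fake router logits for 4 tokens over 8 experts, just to show the routing shape.
    logits = jax.random.normal(jax.random.PRNGKey(0), (4, cfg.num_experts))
    expert_ids, gates = top2_route(logits, cfg.num_selected_experts)
    print(expert_ids)  # which 2 experts each of the 4 tokens is routed to
    print(gates)       # mixture weights for those 2 experts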
Finally, a summary chart is attached.
Netizens: the open-source battle is on
The AI community is already buzzing!
The technical community points out that Grok's highlights include the use of GeGLU in its feed-forward layers, its normalization approach, and the interesting "sandwich norm" technique.
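For readers unfamiliar with GeGLU: it is a gated variant of the feed-forward block in which one projection of the input is passed through GELU and used to gate a second projection elementwise before the result is projected back down ("sandwich norm" generally refers to normalizing both before and after a sublayer). The JAX snippet below is a minimal illustration with toy shapes, not Grok's actual code.

import jax
import jax.numpy as jnp

def geglu_ffn(x, w_gate, w_up, w_down):
    """GeGLU feed-forward: GELU(x @ w_gate) gates (x @ w_up), then project down."""
    gate = jax.nn.gelu(x @ w_gate)   # [tokens, hidden]
    up = x @ w_up                    # [tokens, hidden]
    return (gate * up) @ w_down      # [tokens, emb]

if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    k1, k2, k3, k4 = jax.random.split(key, 4)
    emb, hidden = 8, 32              # toy sizes (Grok's are 6,144 and 32,768)
    x = jax.random.normal(k1, (4, emb))
    w_gate = jax.random.normal(k2, (emb, hidden))
    w_up = jax.random.normal(k3, (emb, hidden))
    w_down = jax.random.normal(k4, (hidden, emb))
    print(geglu_ffn(x, w_gate, w_up, w_down).shape)   # (4, 8)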
Even OpenAI employees have expressed their strong interest in Grok.
Jim Fan, a senior scientist at NVIDIA, said: "The largest open-source model in history, trained by a world-class team and released via magnet link under Apache 2.0. 314B parameters, mixture-of-experts (8 experts, 2 active). Even the active parameter count (86B) exceeds the largest Llama model. Can't wait to see the benchmark results and what kinds of applications people will build with it."
Sebastian Raschka, an AI researcher, said that Grok is more open-source than other open-weight models, which usually come with usage restrictions. It is, however, less open source than Pythia, BLOOM, and OLMo, which ship with training code and reproducible datasets.
Boris Dayma, the founder of Craiyon, analyzed the code of Grok-1 in detail.
Netizen indigo remarked that, in keeping with its goal to "understand the universe," the xAI team seems to have deliberately set the parameter count at "314B, as in pi," making it the largest open-source model to date. Llama 3 is expected to join Grok's open-source battle in June this year.
Now that Grok is open source, a big wave of fine-tuned variants is on the way.
The first-generation Grok already surpassed Llama-2-70B
In November 2023, xAI launched its first-generation large language model, Grok, formally joining the large-model war.
At the time, Grok was offered on X (Twitter) as part of the "Premium+" subscription, at $16 per month.
xAI said that Grok's design was inspired by The Hitchhiker's Guide to the Galaxy: it can answer almost any question and helps people pursue understanding and knowledge, regardless of their background or political stance.
Grok's original version, Grok-0, had 33 billion parameters. xAI then introduced Grok-1, which, after several rounds of improvement, powers the Grok chatbot on X.
According to the figures released by xAI, across a series of benchmarks such as GSM8K, HumanEval, and MMLU, Grok-1 outperformed Llama-2-70B and GPT-3.5, though it still fell well short of GPT-4.
At the time, Grok could not only draw on content generated by users in real time on the X platform, but also had a bit of a sense of humor, which injected some life into otherwise dry AI.
While providing information on the latest hot topics (whether in politics or sports), it could also be witty and even occasionally sarcastic.
Why did Musk choose open source?
After mocking OpenAI as "CloseAI" several times, Musk has actually chosen to open-source his own large model.
Of course, there must be commercial considerations behind this.
Open-sourcing Llama has brought Meta many benefits, practically helping Zuckerberg climb out of the metaverse quagmire.
Mistral AI, still a small startup, has also made its name with an open-source strategy and is regarded in the industry as "Europe's OpenAI."
An open-source version can encourage developers and potential customers to adopt the company's models more quickly, which effectively serves as marketing.
Feedback and improvements from the developer community on the open-source version of Grok may also help xAI speed up the development of new versions, which it can then choose to open source or keep proprietary.
For example, like Mistral, it can promise to keep releasing open-source versions while reserving its most advanced models for paying customers.
Musk has long been a supporter of open-source technology: even Tesla has open-sourced parts of its cars' code, and his social media platform X has published some of its content-ranking algorithms.
"There is still work to be done, but this platform is by far the most transparent, truth-oriented and not a high threshold platform," Musk said today in response to comments on the open source X recommendation algorithm.
Although OpenAI is still far ahead in the AI field, the war between open source and closed source is far from over.
Should AI models be open source? Some believe this powerful technology must be kept out of the wrong hands, while others insist the advantages of open source decisively outweigh the disadvantages.
As a market leader, OpenAI has no reason to open source the model code behind ChatGPT.
Now, by releasing Grok's code, Musk has firmly planted himself in the latter camp.
This decision may enable his xAI to finally surpass Meta and Mistral AI.
References:
https://x.ai/blog/grok-os
Andrew Gao (@itsandrewgao), "here's your DEEP DIVE into @grok's architecture!", March 17, 2024: https://t.co/8Y5cjeImg6
https://www.wired.com/story/elon-musk-no-choice-open-chatbot-grok/
Original title: "Musk hits back at OpenAI: the world's largest behemoth model Grok-1 is open source! 314 billion parameters, 8 MoE experts, 6k stars snapped up on GitHub"