Chinese LLMs Tracker

As the best-known large language models, such as OpenAI’s ChatGPT, Google’s Gemini, and Meta’s Llama, are not officially available in China and can be accessed only via VPN, Chinese conglomerates and recently founded start-ups are racing to provide domestic alternatives. Unlike their US counterparts, Chinese LLMs must first be approved by Chinese authorities before becoming publicly available. Since August 2023, 46 models developed by 44 different companies and research institutions have been approved. As of November 2023, media reports indicated that there were about 200 LLMs in China; compiling a database of all of them, however, is beyond the scope of this project, which instead focuses on the approved models that Chinese users are most likely to encounter. The database is based on this article, which provided the first comprehensive overview of all the LLMs approved by the authorities.

This database aims to offer a basis for researchers and journalists exploring the Chinese AI sector, highlighting dominant companies and trends in LLM development. It includes brief descriptions of the companies behind the LLMs, details on the models, including release dates and specifications, and the hardware, specifically GPUs, used for training. This information is particularly relevant in light of the ongoing debate on the impact of US export controls on Chinese companies’ access to advanced chips, which could slow the growth of the AI sector, including LLM development.

This database is undoubtedly a work in progress, as information remains limited. While some developers have released detailed technical reports covering their models’ training processes, limitations, and performance benchmarks, information on other LLMs is scarce, often limited to brief media mentions.

Notes: Click on the company or LLM mentioned in the table to see details about the specific model, and use the filters to see the various batches of approvals and the GPUs used for training. The release date refers to the most up-to-date version of the model, where available, and the parameters refer to the number of trainable weights the model learns during training, which determine how it processes human language. Usually, the more parameters a model has, the more complex it can be; newer models, however, tend to use better algorithms that achieve strong learning abilities even with fewer parameters.

Should you identify any inaccuracies or have updated information, please reach out to Veronika Blablová (veronika.blablova@amo.cz).

Written by

Veronika Blablová

Veronika Blablová is an analyst at CEIAS.