Chinese artificial intelligence pioneer SenseTime announced the launch of its latest model on Tuesday – Intern 2.5, which it claims is the “largest multimodal open-source large language model.” The release is the latest push by SenseTime, and China as a whole, to make AI technology more powerful and applicable across different sectors.
Intern 2.5 was jointly developed by SenseTime, the Shanghai Artificial Intelligence Laboratory, Tsinghua University, the Chinese University of Hong Kong, and Shanghai Jiao Tong University. The model has 3 billion parameters, which its developers say makes it the largest and most accurate open-source model on ImageNet. It is also, according to SenseTime, the only open-source model to exceed 65.0 mAP (mean average precision) on the COCO object-detection benchmark – a remarkable achievement.
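For readers unfamiliar with the metric behind that COCO figure, the sketch below (illustrative only, not SenseTime's evaluation code) shows how average precision (AP) for a single object class is computed from detections ranked by confidence; COCO's headline mAP then averages AP over all classes and over IoU thresholds from 0.50 to 0.95.

```python
def average_precision(is_match, num_ground_truth):
    """Compute AP for one class.

    is_match: booleans for detections sorted by descending confidence,
              True where the detection matched a ground-truth box.
    num_ground_truth: total ground-truth boxes for this class.
    Returns the area under the interpolated precision-recall curve.
    """
    precisions, recalls = [], []
    true_positives = 0
    for rank, matched in enumerate(is_match, start=1):
        if matched:
            true_positives += 1
        precisions.append(true_positives / rank)   # precision at this rank
        recalls.append(true_positives / num_ground_truth)

    # Interpolate: make precision non-increasing from right to left,
    # so each recall level gets the best precision achievable beyond it.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area of each recall step under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


# Example: 4 ground-truth boxes, detections ranked by confidence where
# the 1st, 2nd, and 4th are correct matches.
print(average_precision([True, True, False, True], 4))  # → 0.6875
```

A score of 65.0 mAP means this quantity, averaged across 80 COCO classes and ten IoU strictness levels, comes out to 0.65 – a level no other open-source model had reached, per SenseTime's claim.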
The ImageNet project, a large visual database built for visual object-recognition research, served as a basis for Intern 2.5’s development. The model’s cross-modal, open-task processing ability allows it to provide efficient and accurate perception and understanding for general scenarios such as autonomous driving and robotics, according to the company.
Intern 2.5 is a higher-level visual system with universal scene perception and complex problem-solving capabilities. Tasks are defined through text, so the requirements of different scenarios can be specified flexibly: given a visual image and a prompt, the model produces instructions or answers. This supports general tasks such as image description, visual question answering, visual reasoning, and text recognition.
The significance of Intern 2.5 is hard to overstate: it marks another leap toward a world powered by artificial intelligence. SenseTime and its partners have done remarkable work in developing a model that could reshape how AI is used across different sectors.