In a landmark achievement for India’s AI ecosystem, the BharatGen initiative — a government-backed effort to build India-centric multimodal large language models — has unveiled Patram-7B-Instruct, the country’s first vision-language foundational model designed specifically for complex document understanding.
Developed by a team from the International Institute of Information Technology, Hyderabad (IIIT-H) and the Indian Institute of Technology, Bombay (IIT-B), Patram is part of the BharatGen suite of multimodal AI models funded by the Department of Science and Technology (DST). With 7 billion parameters, the model has been built from the ground up and trained on a massive collection of diverse Indian documents — enabling it to interpret scanned and photographed documents and respond to natural-language instructions with precision.
Despite its compact size, Patram outperforms several global models, including DeepSeek-VL-2, on benchmark tasks like DocVQA and VisualMRC, and shows robust results on Patram-Bench, a custom benchmark based on real-world Indian document scenarios.
Officially launched on June 2, 2025, at the BharatGen National Summit in New Delhi, Patram was unveiled by Jitendra Singh, Hon’ble Minister of State for Science and Technology, in the presence of key dignitaries including Abhay Karandikar (DST secretary), Kris Gopalakrishnan (chair, MGB-NMICPS), and Abhishek Singh (additional secretary, MeitY). The launch was also attended by P. J. Narayanan, director of IIIT-Hyderabad.
“Patram marks a significant step as India designs state-of-the-art foundational models,” said Prof. Narayanan. “With this launch, we integrate language in all forms — text, speech, and images — unlocking powerful new capabilities for multimodal applications.”
The project was developed in just five months by a dynamic team of engineers, alumni, and student interns at IIIT-Hyderabad, with support from TiH-IoT at IIT Bombay, and was co-led by Dr. Ravi Kiran Sarvadevabhatla (IIIT-H) and Dr. Ganesh Ramakrishnan (IIT-B).
“With Patram, we’ve built a model that truly understands the structure and diversity of Indian documents,” said Dr. Sarvadevabhatla. “This is only the beginning of what India can achieve in vision-language AI.”
Alongside Patram, the team also launched DocBodh, a generative AI suite tailored for Indic document intelligence. Built for use in governance, education, law, and business, DocBodh promises to accelerate digital transformation across key sectors.
Patram-7B-Instruct is now available as a fully open-source model on Hugging Face and on IndiaAI’s AIKosh platform by MeitY, reinforcing India’s vision of building inclusive, sovereign AI infrastructure aligned with Digital India and Atmanirbhar Bharat.