Who We Are
Indic-Scripts Research Forum is a newly formed research group based in Pune, India,
focused on building machine-readable systems for ancient and historic Indian scripts.
Our work currently centers on Marathi written in the Modi script and on its transliteration and translation
into modern Marathi, with a roadmap that includes other historic Indic scripts and languages, such as Brahmi and Pali.
Why We Exist
Existing efforts to transliterate Modi script into Devanagari have been limited in scope and accuracy.
Our own tests on publicly announced models show that sentence-level approaches trained on small datasets
often produce output that is unintelligible and unusable for historical work.
We are addressing this gap by investing the time and effort required to construct large,
high-quality character- and word-level datasets and robust AI pipelines grounded in
dictionaries and expert knowledge.
What We Do
- Build on past character-level research on the Modi script
- Test how recognized characters combine to form words
- Revise our code to improve word-construction algorithms
- Test algorithms that transliterate Modi-script words into Marathi (Devanagari)
- Build word-level paired datasets:
  - Modi script → transliterated Marathi (Devanagari)
  - Transliteration → modern Marathi equivalents
- Digitize, code, and build datasets using:
  - Historic dictionaries (e.g., Aitihasik Shabdakosh and other Marathi/old Marathi dictionaries)
  - Marathi Vishwakosh and other reference works
  - Thousands of Maratha history books and documents
- Use vector databases and retrieval-augmented generation (RAG) to:
  - Link words, meanings, places, people, events
  - Provide context-aware answers and translations
- Design and fine-tune specialized models and a future Maratha history LLM
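As a rough illustration of the character-level step and the word-level paired records listed above, the sketch below maps each Modi code point to the Devanagari character that has the same Unicode name (for example, MODI LETTER MA to DEVANAGARI LETTER MA) and stores the result in a small record. It is a minimal sketch under stated assumptions, not a description of the actual pipeline: it assumes the Modi text is already encoded in Unicode, and the names `transliterate` and `WordPair` are hypothetical. A name-for-name mapping also ignores the spelling conventions of historic documents, which is where dictionaries and expert review would come in.

```python
import unicodedata
from dataclasses import dataclass


def modi_to_devanagari_char(ch: str) -> str:
    """Map one Modi code point to the same-named Devanagari code point.

    Assumption: a plain name swap (MODI ... -> DEVANAGARI ...) is only a
    first approximation of transliteration, not a historically aware one.
    """
    name = unicodedata.name(ch, "")
    if not name.startswith("MODI "):
        return ch  # spaces, punctuation, and non-Modi text pass through
    try:
        return unicodedata.lookup("DEVANAGARI " + name[len("MODI "):])
    except KeyError:
        return ch  # no same-named Devanagari character; leave for review


def transliterate(word: str) -> str:
    """Transliterate a Unicode-encoded Modi word character by character."""
    return "".join(modi_to_devanagari_char(ch) for ch in word)


@dataclass(frozen=True)
class WordPair:
    """Hypothetical record for one entry in a word-level paired dataset."""
    modi: str                 # word as written in Modi script
    devanagari: str           # transliterated Marathi (Devanagari)
    modern_marathi: str = ""  # modern equivalent, filled in later
    source: str = ""          # dictionary or document the word came from


if __name__ == "__main__":
    # Build a sample Modi word from Unicode character names.
    names = (
        "MODI LETTER MA",
        "MODI LETTER RA",
        "MODI VOWEL SIGN AA",
        "MODI LETTER TTHA",
        "MODI VOWEL SIGN II",
    )
    modi_word = "".join(unicodedata.lookup(n) for n in names)
    pair = WordPair(modi=modi_word, devanagari=transliterate(modi_word))
    print(pair.devanagari)
```

Characters with no same-named Devanagari counterpart are left unchanged so that they can be flagged for manual review instead of being silently guessed.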
Vision
Our long-term vision is to create reliable AI tools that can read, understand,
and explain historic Indian documents, not just at the surface level of the text
but with awareness of period-specific language, idioms, and historical context.
We want historians, archivists, students, and citizens to be able to access
centuries of material that is today locked away in scripts very few people can read.