Decoding India’s Historic Scripts with AI

From Modi Script to Modern Marathi:
Building new datasets for Indic-script AI models.

50,000+ Word-Level Pairs (In Progress)

300+ Years of Maratha History

Authentic Manuscripts & Historical Sources

AI for Language, History & Heritage

Preserving History. Empowering AI. Enriching the Future.

Indic-Scripts Research Forum is a Pune-based AI research company building ground‑up, word‑level datasets and models to transliterate and translate historic Indian scripts — starting with Modi-scripted Marathi — into modern languages like Marathi, Hindi, and English.

We believe that accurate AI for historic scripts is only possible when it is built on carefully curated, high‑quality datasets. Our team works with verified Modi manuscripts, transliterated documents, and historic Marathi dictionaries to build large, paired word-level datasets. These datasets will power robust transliteration and translation models, working toward our goal of 80% accuracy in machine translation. Because we focus on Modi-script manuscripts from the last three hundred years, this work also lays the foundation for a future large language model of Maratha history.

Data First

We are building a large paired dataset of Modi-script Marathi and modern Marathi from authentic sources such as Peshwe-period letters, farmans, pothis, biographies, dictionaries, and other Modi documents.

Expert-Validated

We collaborate with Modi script experts and historians to verify transliterations and word pairings, ensuring that outputs are legible, historically accurate, and context-aware.

Purpose-Built Models

Our goal is to build AI systems that go beyond basic TSOCR — models that understand context, relationships, and historical terminology to support serious research and heritage preservation.

Our Key Projects

Modi-Marathi Word-Level Dataset

Building a dataset of 50,000+ word-level pairs from authentic Modi manuscripts, transliterated Marathi texts, and modern equivalents.

Historic Dictionary Digitization & Vector DB

Converting historic dictionaries into a machine-readable format and building a vector database for semantic search and RAG applications.
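To illustrate what "machine-readable" can mean here, the sketch below turns transcribed dictionary lines into structured records and serializes them as JSON Lines. It assumes a hypothetical transcription format of one `headword | definition` entry per line; the names `DictEntry`, `parse_entries`, and `to_jsonl` are illustrative, not the project's actual tooling.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DictEntry:
    headword: str
    definition: str
    source: str  # hypothetical field: which dictionary or page the entry came from


def parse_entries(lines, source):
    """Parse lines in an ASSUMED 'headword | definition' transcription format."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or "|" not in line:
            continue  # skip blank lines and lines that don't match the assumed format
        head, definition = line.split("|", 1)
        entries.append(DictEntry(head.strip(), definition.strip(), source))
    return entries


def to_jsonl(entries):
    # ensure_ascii=False keeps Devanagari text human-readable in the output file
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in entries)
```

JSON Lines is one convenient choice because each entry can later be embedded and loaded into a vector database independently.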

Evaluation of Existing Modi AI Models

Independently evaluating public datasets and models to identify their gaps and to show where larger, curated, word-level datasets are needed.
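Evaluations like this usually rest on simple, reproducible metrics. The sketch below computes two common ones for transliteration output: character error rate (CER, edit distance divided by reference length) and exact word accuracy. These metrics are a standard choice we assume for illustration; the source does not specify which metrics are used. Note that Python's `len` counts Unicode code points, not visual grapheme clusters, so Devanagari vowel signs count as separate characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def character_error_rate(predictions, references):
    """Total edits needed, divided by the total number of reference code points."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be aligned one-to-one")
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total = sum(len(r) for r in references)
    return edits / total if total else 0.0


def word_accuracy(predictions, references):
    """Fraction of words transliterated exactly right."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be aligned one-to-one")
    if not references:
        return 0.0
    return sum(p == r for p, r in zip(predictions, references)) / len(references)
```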

Maratha History LLM (Planned)

Fine-tuning a domain-specific language model specializing in Maratha history using curated datasets, dictionaries, and historical corpora.

Our High-Level Technical Approach

1 Digitization

Scan Modi manuscripts and printed transliterated Marathi works.


2 Segmentation

Use custom JS tools to split Modi and Devanagari text into word-level units.
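The project's segmentation tools are written in JavaScript; as a language-neutral illustration only, here is a minimal Python sketch of word-level splitting for text that is already digitized as Unicode. It assumes word boundaries are marked by whitespace, dandas (Devanagari U+0964/U+0965, Modi U+11641/U+11642), or ASCII punctuation. It does not handle text where word boundaries are not marked, which is where purpose-built tools are needed. The name `split_words` is hypothetical.

```python
import re

# ASSUMED word-boundary markers in digitized Unicode text: whitespace,
# Devanagari danda (U+0964) and double danda (U+0965),
# Modi danda (U+11641) and double danda (U+11642), plus common ASCII punctuation.
_SEPARATORS = re.compile(r"[\s\u0964\u0965\U00011641\U00011642,.;:!?]+")


def split_words(text: str) -> list[str]:
    """Split Unicode Modi or Devanagari text into word-level tokens."""
    # re.split can yield empty strings at the edges; filter them out.
    return [token for token in _SEPARATORS.split(text) if token]
```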


3 Pairing & Annotation

Pair Modi script words with modern Marathi words and add meanings & historical tags.
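One possible shape for a paired, annotated record is sketched below. The field names (`modi`, `marathi`, `meaning`, `tags`, `source`) and the helpers `pair_words` and `to_jsonl_line` are hypothetical. The pairing step also assumes that a segmented Modi passage and its Marathi transliteration contain the same words in the same order; a count mismatch is flagged for human review rather than guessed.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class WordPair:
    modi: str            # word in Modi script (Unicode block U+11600–U+1165F)
    marathi: str         # modern Marathi (Devanagari) equivalent
    meaning: str = ""    # gloss, e.g. in English, added during annotation
    tags: list[str] = field(default_factory=list)  # historical tags; fresh list per record
    source: str = ""     # hypothetical manuscript or document identifier


def pair_words(modi_words, marathi_words, source=""):
    """Pair words by position, ASSUMING both sides were segmented identically."""
    if len(modi_words) != len(marathi_words):
        raise ValueError("word counts differ; segmentation needs review")
    return [WordPair(m, d, source=source) for m, d in zip(modi_words, marathi_words)]


def to_jsonl_line(pair: WordPair) -> str:
    return json.dumps(asdict(pair), ensure_ascii=False)
```

Annotators can then fill in `meaning` and append to `tags` on each record before export.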


4 Vectorization

Convert dictionary and word entries into vectors and store in a vector database.
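The sketch below shows the vectorize-and-store idea with a deliberately simple stand-in: character-bigram count vectors compared by cosine similarity, held in an in-memory list. This is an assumption for illustration only. Bigram vectors capture spelling similarity, not meaning; a production setup would use a trained embedding model and a real vector database. `InMemoryVectorStore` and its methods are hypothetical names.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Sparse vector of character n-gram counts (padded so word edges count)."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing keys, so absent n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a vector database: stores (id, vector, payload) tuples."""

    def __init__(self):
        self._items = []

    def add(self, item_id, text, payload=None):
        self._items.append((item_id, char_ngrams(text), payload or {}))

    def search(self, query, k=3):
        """Return the k most similar items as (score, id, payload), best first."""
        q = char_ngrams(query)
        scored = [(cosine(q, vec), item_id, payload) for item_id, vec, payload in self._items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]
```

Because the bigram vectors reward shared spellings, a query with a small vowel-sign variation (for example कारकुन) still retrieves the stored entry कारकून.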


5 Model Training & RAG

Use the datasets to train transliteration and translation models and to support retrieval-augmented generation (RAG).
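In retrieval-augmented generation, entries retrieved from the vector database are placed into the model's prompt so its answer is grounded in curated dictionary knowledge. The minimal sketch below assembles such a prompt from already-retrieved entries. The function name `build_rag_prompt`, the entry keys `headword` and `definition`, and the prompt wording are all hypothetical choices for illustration.

```python
def build_rag_prompt(query: str, retrieved: list[dict], max_entries: int = 3) -> str:
    """Assemble a grounded prompt from retrieved dictionary entries (hypothetical format)."""
    lines = [
        "Use the dictionary entries below to transliterate and translate the text.",
        "",
        "Dictionary entries:",
    ]
    # Keep only the top entries so the prompt stays within the model's context budget.
    for entry in retrieved[:max_entries]:
        lines.append(f"- {entry['headword']}: {entry['definition']}")
    lines += ["", f"Text: {query}", "Answer:"]
    return "\n".join(lines)
```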

Our Vision

Our long-term vision is to create reliable AI tools that can read, understand, and explain historic Indian documents — not just at the surface level of text, but with awareness of period-specific language, idioms, and historical context. We want historians, archivists, students, and citizens to be able to access centuries of material that is today locked away in scripts very few people can read.

Join us in building the foundation for AI that understands India’s past.

Together, we can preserve heritage and empower the future.

Collaborate with Us →
