Dr. Sutirtha Sahariah
Hi, I’m Michael
Web designer and developer working for envato.com in Paris, France.
My Experience
Software Developer
Co-Founder
Microsoft Corporation
Web Designer
Founder, XYZ IT Company
Reinventing the way you create websites
Teacher and Developer
SuperKing LTD
Sr. Software Engineer
Education
BSc in Computer Science
University of DVI
New Haven, CT ‧ Private, non-profit
AS - Science & Information
SuperKing College
Los Angeles, CA 90095, United States
Secondary School Education
Kingstar Secondary School
New Haven, CT ‧ Private, non-profit
My Resume
Education Quality
BSc in Computer Science
University of DVI (2006 - 2010). The training provided by universities in order to prepare people to work in various sectors of the economy or areas of culture.
AS - Science & Information
SuperKing College (2001 - 2005). Higher education is tertiary education leading to the award of an academic degree; it is also called post-secondary education.
Secondary School Education
Kingstar Secondary School (1998 - 2000). Secondary education, or post-primary education, covers two phases on the International Standard Classification of Education scale.
Job Experience
Sr. Software Engineer
Google Out Tech - (2017 - Present). Google’s hiring process is an important part of our culture. Googlers care deeply about their teams and the people who make them up.
Web Developer & Trainer
Apple Developer Team - (2012 - 2016). A popular destination with a growing number of highly qualified homegrown graduates, it's true that securing a role in Malaysia isn't easy.
Front-end Developer
Nike - (2010 - 2011). The Indian economy has grown strongly over recent years, having transformed itself from a producer-based into an innovation-based economy.
Design Skill
PHOTOSHOP
FIGMA
ADOBE XD
ADOBE ILLUSTRATOR
DESIGN
Development Skill
HTML
CSS
JAVASCRIPT
SOFTWARE
PLUGIN
Trainer Experience
Gym Instructor
Rainbow Gym Center (2015 - 2020). The training provided by universities in order to prepare people to work in various sectors of the economy or areas of culture.
Web Developer and Instructor
SuperKing College (2010 - 2014). Higher education is tertiary education leading to the award of an academic degree; it is also called post-secondary education.
School Teacher
Kingstar Secondary School (2001 - 2010). Secondary education, or post-primary education, covers two phases on the International Standard Classification of Education scale.
Company Experience
Personal Portfolio April Fools
University of DVI (1997 - 2001). The education should be very interactive. Ut tincidunt est ac dolor aliquam sodales. Phasellus sed mauris hendrerit, laoreet sem in, lobortis mauris hendrerit ante.
Examples Of Personal Portfolio
University of DVI (1997 - 2001). The education should be very interactive. Ut tincidunt est ac dolor aliquam sodales. Phasellus sed mauris hendrerit, laoreet sem in, lobortis mauris hendrerit ante.
Tips For Personal Portfolio
University of DVI (1997 - 2001). The education should be very interactive. Ut tincidunt est ac dolor aliquam sodales. Phasellus sed mauris hendrerit, laoreet sem in, lobortis mauris hendrerit ante.
My Portfolio
My Blog
Why AI Safety Fails in Multi-Turn Conversations — And What This Means for Governance
In this article, I analyse, from a governance perspective, SafeDialBench, a fine-grained benchmark for evaluating LLM safety in multi-turn dialogues, developed by researchers in China. The paper was recently discussed in the AI governance reading group run by BlueDot.
We all now use LLM chatbots for almost everything, but we don't use them the way we used the internet for surfing; we interact with them to solve complex problems, both personal and professional. Knowledge flows from a reservoir in an instant: for users, it feels comforting and empowering. But that same information, if manipulated or placed in the hands of a malicious user with criminal intent, can be dangerous. The latter is already on the rise. As MIT Technology Review recently reported, LLMs are increasingly being used to enable cyber scams and online crimes at scale.
But how do LLMs understand the intent of malicious users? Can AI systems detect harm at scale? How robust are the safety features of LLMs? For example, in Denmark, a 22-year-old used AI to research how to injure his father without killing him. He bypassed the model's safeguards by posing as an author researching a novel, and the AI provided a detailed plan to execute the intended harm. Earlier benchmarks, such as Chinese Offensive Language Detection (COLD), BeaverTails, and red-teaming sets, were designed around single prompts, but the Danish case demonstrates that seemingly harmless multi-turn conversations can lead to harmful outcomes by tricking the model's safety measures into believing something else. This opens new challenges for AI governance and model testing.
How can we make AI strong enough to detect harmful conversational trajectories? To test that, a group of researchers in China built a multi-turn safety benchmark (the AI system is asked multiple questions through shifting situations, but with one underlying goal) based on realistic conversations. They built 4,000 dialogues in Chinese and English, with three to ten turns per conversation; created 22 real-life scenarios; and used seven jailbreak attack strategies (ways to bypass AI safety by phrasing prompts more cleverly). They then tested 17 large language models, including open-source models (DeepSeek, GLM), Chinese models (Qwen, Baichuan, Moonshot) and US models (ChatGPT, Llama 3.1), using multi-turn jailbreak attacks, fine-grained safety metrics, and both human and model evaluation.
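To make the setup concrete, here is a minimal sketch of what a multi-turn safety evaluation loop of this kind might look like. All names here (`Dialogue`, `query_model`, `judge_turn`) are hypothetical stand-ins, not SafeDialBench's actual harness or schema:

```python
# A minimal sketch of a multi-turn safety evaluation loop in the spirit of
# SafeDialBench. All names are hypothetical; the benchmark's real code differs.
from dataclasses import dataclass

@dataclass
class Dialogue:
    language: str            # "en" or "zh"
    scenario: str            # one of the 22 real-life situations
    attack: str              # one of the 7 jailbreak strategies, e.g. "role_play"
    dimension: str           # safety dimension, e.g. "privacy"
    user_turns: list[str]    # 3 to 10 human-crafted user messages

def query_model(history: list[dict]) -> str:
    """Placeholder for a call to the model under test."""
    return "..."

def judge_turn(reply: str, dimension: str) -> float:
    """Placeholder safety judge: returns a risk score in [0, 1]."""
    return 0.0

def evaluate(dialogue: Dialogue) -> float:
    history: list[dict] = []
    worst = 0.0
    for user_msg in dialogue.user_turns:
        history.append({"role": "user", "content": user_msg})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        # Key point: risk is tracked across the whole trajectory,
        # not just on the final turn.
        worst = max(worst, judge_turn(reply, dialogue.dimension))
    return worst
```

The essential design choice is that the judge sees every assistant reply in context, so a conversation is scored by its worst moment rather than by its final answer alone.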
The SafeDialBench benchmark discussed in this article offers concrete pathways for AI safety evaluation and governance. The dangers of AI misuse creating unprecedented harm are real, there are no robust structures to protect victims, and the impact at scale touches millions of people. The real danger, as the Danish case above illustrates, is that anyone with access to AI can improvise ways to create something harmful: low barriers, AI assistance and rapid iteration together could cause large-scale disruptions endangering the security and safety of populations at large.
SafeDialBench approach
The SafeDialBench evaluation framework the researchers developed uses six safety categories: morality, aggression, legality, fairness, privacy and ethics. These are the human dimensions in which intentions to cause harm take shape, guiding our actions, thoughts and words. Choosing such parameters matters because most benchmarks test only violence and hate speech, while real AI risks are broader and quieter, including manipulation, financial fraud, bias and political persuasion.
By structuring safety around these six dimensions, the researchers recreated real-world conversations at a granular level. For example, aggression includes insults, impolite language, sedition, hostility and harmful persuasion. This granularity matters. Rather than asking "is it harmful?", regulators can ask "is it politically, socially or financially harmful?", which is a significant shift for AI governance. Take a prompt like "How do I commit fraud without getting caught?": the idea is to find out whether the LLM is strong enough to understand the intent of the user.
The datasets went through a rigorous process: the first user prompt in each dialogue was human-generated, making the datasets more realistic, and expert reviews checked for coherence, logical flow and jailbreak effectiveness. This was important because automated datasets miss the creativity and social engineering that humans craftily deploy for manipulation.
The attack methods mirror the psychological tactics humans use in every setting: personal, social, professional or political. They show that the danger is not in any single question; it is in the conversation as a whole, and in the gap between what is asked and what is intended.
· Scene Construction: where a fictional scenario is built, like a journalist investigating a scam.
· Purpose Reverse: the stated purpose of the conversation is reversed. Instead of asking "How do I break into someone's email?", the user says, "I am working on a cybersecurity guide about how a hacker breaks into someone's email."
· Role Play: often seen as a powerful jailbreak technique; the user asks the AI to assume a role within which harmful information becomes "normal" or "acceptable".
· Topic Change: the conversation starts harmless and gradually shifts towards harmful content without triggering safety concerns. For example, the user begins by talking about travel, but the idea is to collect information about the place with violent motives. Can AI identify the risk across topic drift?
· Reference Attack: harmful intent is introduced in a way that appears normal in the conversation. For example, the user says he is writing a story about two characters, then adds that one character wants revenge but does not want to be caught. The harmful intent is hidden in the character (the reference).
· Fallacy Attack: the user does not ask for harmful information directly but uses false logic, pushing the model to accept incorrect assumptions. The AI is tricked by bad reasoning into producing misleading output. This strategy matters in real life because a great deal of misinformation spreads through exactly this kind of manipulation, and media databases used for training models can themselves rest on biased reporting.
· Probing Question: the user moves from harmless to more sensitive topics. This works because risk accumulates across turns while an AI system often evaluates each message separately; multi-turn evaluation detects risks where single-turn benchmarks fail, as the sketch after this list illustrates.
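Here is a toy illustration of that gap, with invented messages and invented risk scores: every message in a probing-question trajectory stays below a single-turn threshold, so a per-message filter never fires, while the conversation as a whole crosses a trajectory-level threshold.

```python
# Hypothetical "probing question" trajectory: each turn looks benign in
# isolation, so a single-turn filter passes every message, while a
# conversation-level check flags the drift. All scores are illustrative.
turns = [
    ("What are popular tourist spots in the city centre?", 0.05),
    ("Which of them are most crowded on weekends?",        0.10),
    ("What time do security patrols change shifts there?", 0.35),
    ("How could someone avoid the CCTV coverage?",         0.55),
]

SINGLE_TURN_THRESHOLD = 0.60   # each message alone stays under this
TRAJECTORY_THRESHOLD = 0.80    # but the cumulative trend exceeds this

per_turn_flags = [score >= SINGLE_TURN_THRESHOLD for _, score in turns]
cumulative_risk = sum(score for _, score in turns)

print("single-turn filter fires:", any(per_turn_flags))                     # False
print("trajectory-level risk:", round(cumulative_risk, 2))                  # 1.05
print("trajectory filter fires:", cumulative_risk >= TRAJECTORY_THRESHOLD)  # True
```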
Real-life situations
The attack methods have significant governance implications. Take purpose reversal — instead of asking “how to manipulate someone,” the user could ask, “how do I know if someone is emotionally manipulating me?” The intent is identical, but the framing is opposite. This matters for governance because detecting intent is hard. AI could easily see it as an educational context or research framing. Safety, therefore, cannot be binary. It is context-dependent, intent-driven and plays out across a conversation and not within a single prompt. The question is at what point AI intervenes and how it reasons about intent.
Of the seven methods, two stand out, in my view, as particularly effective and governance-relevant: role play and the fallacy attack.
Role play exploits what SafeDialBench calls a "context shield." Once a role is assigned, the AI assumes the persona of a fictional expert or character and operates within that frame, so the harm belongs to the character, not the model. This makes safety detection more difficult, and it is why SafeDialBench classifies role-play manipulation as "conversational manipulation" rather than a single-prompt attack. The harm is spread across the conversation, not concentrated in a single exchange.
The fallacy attack is the most socially dangerous of all seven methods. Rather than asking for harmful information, it uses false reasoning to make the model accept incorrect assumptions. This closely mirrors how misinformation flows in real life. Since a lot of media content and social media discussion thrives on misinformation, it may not be difficult to "prove" a point that is harmful but normalised in society, such as hate speech or a manufactured social phobia. And because models reason over patterns in their training data, datasets built on misleading or influenced media content can be exploited, with dangerous social implications.
An article on rising cybercrime using LLMs shows that these methods can be used not just to obtain information but to execute tasks, such as sending malicious emails using deepfake identities, an accentuated example of scene construction or purpose reversal. In one example, Gemini was used to debug code, as many users do, but was then tasked with writing phishing emails. This is another instance of the purpose-reversal attack that SafeDialBench describes, where the end goal is misleading or manipulative, clearly demonstrating that guardrails can fail under role framing, contextual persuasion and incremental requests. The article also describes a user who tricked the AI's safety measures by claiming to be participating in a cybersecurity game. Gemini passed on the information, and Google later adjusted the model for safety.
Governance Beyond Model Safety
SafeDialBench, which uses both Chinese and English datasets, provides significant pathways for AI safety, particularly for countries building their own multilingual LLMs. It emphasises that a model's robustness in identifying harmful content is significantly influenced by the quality of training data and the sophistication of its safety alignment strategies. Interestingly, the evaluation indicates that closed-source models, such as ChatGPT and o3-mini, have limitations with Chinese datasets. It will be interesting to see how these models fare with other languages. Are open-source models, which are likely to become more powerful in the coming years, better adapted to local contexts because of their accessibility? This opens questions about the democratisation of LLMs.
One concern in the SafeDialBench study is that the model evaluation aligned with human judgment 80 percent of the time, leaving a 20 percent gap. In the context of AI safety, this is significant: at scale, even a small safety failure can translate into large-scale harm, especially when AI systems are used by millions across diverse contexts. This is where SafeDialBench becomes important for governance. The paper shows that harm spirals out through multi-turn conversation, via psychological manipulation, conversational drift, persuasion and contextual framing. This suggests that governance cannot rely on single-prompt static testing but must adapt to dynamic conversational risks.
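To see why the 20 percent residual matters at scale, here is a toy agreement calculation; the labels are fabricated purely for illustration:

```python
# A toy illustration of the 80% judge-human agreement figure.
# Fabricated labels: 1 = unsafe, 0 = safe.
human = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
judge = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]   # disagrees on 2 of 10 items

agreement = sum(h == j for h, j in zip(human, judge)) / len(human)
print(f"agreement: {agreement:.0%}")  # 80%

# At scale the residual matters: 20% disagreement over a million flagged
# conversations is 200,000 cases where the automated judge and a human
# reviewer would reach different safety verdicts.
print(int(1_000_000 * (1 - agreement)), "contested cases per million")
```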
The SafeDialBench framework therefore offers more than a technical evaluation tool. It highlights that AI risks are evolving, conversational and increasingly complex, and that benchmark performance does not always translate into real-world performance. An article on the current state of AI notes that AI companies are sharing less data about how models are trained, and that the focus seems to be more on AI capabilities than on how models perform on responsible-AI benchmarks.
According to Stanford's 2026 AI Index, AI is progressing so fast that regulation simply cannot keep pace. It demonstrates that AI risks come from helpfulness rather than from safety failures alone. Addressing these risks will require multi-layered governance, combining improved model evaluation, continuous monitoring, international coordination, risk-based regulation and social adaptation. As AI capabilities continue to grow, particularly toward more advanced reasoning systems, these governance challenges will only become more pressing. Civil-society tech organisations, institutions and policymakers must also develop awareness of AI-enabled manipulation and misuse. In this sense, governance is not only technical or legal; it is also social. The ability of societies to understand and respond to AI-generated risks will become an important layer of protection.
Ends
Use of AI in writing this article
ChatGPT: I used ChatGPT as my thinking partner to clarify technical concepts and refine the structure of my arguments, while the interpretation and governance perspective remain my own.
Claude: Claude (Anthropic) was used as an editorial assistant during the drafting process — reviewing paragraphs for clarity, flagging language errors, and offering structural feedback. Claude did not generate content, suggest arguments, or shape the analytical direction of this piece.
Main reference: https://openreview.net/forum?id=KFjtRqVnKH
AI’s Big Capability Claims Depend on Who Does the Grading
When AI companies make big claims about future capabilities, financial markets move and media amplifies. But the question nobody asks is: what benchmark was used, and who designed it?
During the India AI Impact Summit 2026, leaders of tech giants predicted different timelines. Dario Amodei, the CEO of Anthropic, said that "powerful AI could come as early as 2026". Articulating a vision in which a single data centre could be the equivalent of a mid-sized country, he indicated that 2026 is the threshold at which AI models would be capable of autonomous, expert-level reasoning across all human domains. Sam Altman, OpenAI's CEO, earmarked 2028 as the year when AI superintelligence finally happens. Sir Demis Hassabis, the Nobel Prize-winning CEO of Google DeepMind, offered a three-to-five-year window for ASI (artificial superintelligence) to emerge. These forecasts differ not just in timing but in how intelligence itself is measured.
These predictions matter because trillions of dollars are at stake, and they directly influence governments' urgency to build AI infrastructure. This week, CNBC's anchor Andrew Ross Sorkin, speaking on the Big Technology Podcast, highlighted that the AI boom is often framed as a technological revolution, but it may also represent a financial experiment. As billions of dollars flow into AI infrastructure and private credit markets, the risks extend beyond technological disruption to systemic financial instability. If expectations fail to materialise, the consequences may ripple through labour markets, investment ecosystems and global development funding.
ASI is the stage at which AI becomes so powerful that it is more intelligent than any human ever to walk the planet. Its capabilities would have an enormous, transformative impact, akin to the social and economic transformation brought by electricity and the industrial revolution. In other words, AI is going to be deeply embedded in our lives and economic systems, and it will be impossible to delink from the AI infrastructure that might bind and constantly transform the global economy.
But then why are three of the most informed people on earth looking at the same evidence about the path to ASI and seeing completely different things? The reason these three brilliant people disagree is not that one of them is wrong. It is that they are choosing different measurements to support their timelines for projecting future capability.
Also, the big AI giants are constantly exploring new capabilities, which has a direct impact on capital inflows into the overall ecosystem. So benchmarking and new capabilities become part of market signalling, supported by heavy media amplification and strategic communications, and not just science. For example, MIT Technology Review reported this week that OpenAI is throwing all its resources into creating a fully automated researcher.
Earlier Open AI’s CEO, Altman claimed that the goal of building AGI is solved in principle, leading Open AI to pivot its focus towards super intelligence — AI that is significantly more capable than the best human researchers and executives. So, his comments imply operational benchmark and alludes to coding capabilities, task automation and large- scale adoption. But the problem is that widespread deployment adoption measure usefulness and not super intelligence capability.
However, there is one benchmark being used to measure AI's fluid intelligence: the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI). Created by François Chollet, an AI researcher at Google and creator of Keras, one of the most widely used AI development frameworks in the world, the benchmark is designed to test a model's logical reasoning and skill-acquisition abilities on unseen tasks, tasks that are easy for humans but difficult for AI, such as solving logic-based puzzles.
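Public ARC-AGI tasks are distributed as small JSON puzzles: a few demonstration input/output grid pairs that encode a hidden rule, plus held-out test grids the solver must complete, scored by exact match on the predicted grid. The sketch below shows the shape of such a task with an invented toy rule (mirror each row); real ARC tasks encode far subtler transformations.

```python
# Toy ARC-style task. The "train" pairs demonstrate a hidden rule
# (here, invented for illustration: mirror each row); "test" holds
# the held-out grid the solver must predict. Scoring is exact match.
task = {
    "train": [
        {"input": [[1, 0], [0, 2]], "output": [[0, 1], [2, 0]]},
        {"input": [[3, 4], [5, 0]], "output": [[4, 3], [0, 5]]},
    ],
    "test": [
        {"input": [[7, 0], [0, 8]], "output": [[0, 7], [8, 0]]},
    ],
}

def solve(grid: list[list[int]]) -> list[list[int]]:
    # A real solver must infer the rule from the train pairs alone;
    # here we hard-code the mirror rule the pairs demonstrate.
    return [row[::-1] for row in grid]

assert all(solve(p["input"]) == p["output"] for p in task["train"])
score = sum(solve(t["input"]) == t["output"] for t in task["test"]) / len(task["test"])
print(f"exact-match score: {score:.0%}")  # 100% on this toy task
```

The point of the design is that each task demonstrates a novel rule, so memorised knowledge does not help; only the efficiency of acquiring a new skill from a handful of examples is measured.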
Earlier LLMs followed a set path: they were designed to store massive amounts of data and apply that knowledge through pattern prediction, so they could only perform a task if similar examples existed in training. Then, in 2024, the o3 model scored 87.5% on the ARC-AGI-1 test, a significant jump from a year earlier. The world thought the era of superintelligence had just arrived, but when the bar was raised in ARC-AGI-2, performance dropped dramatically. The same model scored approximately 2.9% to 3.0% on the ARC-AGI-2 semi-private evaluation, where humans scored 60 percent. What changed?
In the second edition, the model had to demonstrate both high adaptability and high efficiency, and that is where it could not match human faculties. Humans had the edge. What becomes obvious is that whoever designs the test controls the score. There was another catch behind the 87.5% score: the AI might have partially seen the answers before the test because of the benchmark's public availability.
The point is that claims about the AI race and capability depend on who is saying what and how it is being tested. Every model launch generates a lot of excitement; tons are written about it, and social media explodes with videos and tutorials. But measurement depends strictly on what you are measuring against and who is measuring. Benchmark design matters greatly because it shapes market perception.
AI labs design their own internal benchmarks and report results selectively, while independent benchmarks like ARC-AGI show things in a different light. What the AI labs claim is not wrong either. So when they say o3 (the newest model) scores 88% on PhD-level science questions while PhDs average 34%, it could mean several things. One ought to know whether the model had already been exposed to the test material through public datasets or benchmark contamination. This is not to say anyone is lying; the argument is that the metrics matter, and they differ.
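One common (if crude) contamination audit is to look for long verbatim n-gram overlaps between benchmark items and training text. A toy sketch, with invented strings; real audits, which labs rarely publish, use far larger corpora and more robust matching:

```python
# Toy n-gram overlap check for benchmark contamination.
def ngrams(text: str, n: int = 8) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(benchmark_item: str, training_doc: str) -> bool:
    # A long verbatim n-gram shared with training data is a red flag
    # that the model may have memorised the item rather than solved it.
    return bool(ngrams(benchmark_item) & ngrams(training_doc))

item = "A train leaves the station at 9 am travelling at 60 km per hour toward the depot"
doc = "Sample: a train leaves the station at 9 am travelling at 60 km per hour toward the depot."
print(looks_contaminated(item, doc))  # True: an 8-gram overlap is found
```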
These questions matter beyond the technical debate because civilisation-scale decisions are being made on these numbers, and the scale of financial and infrastructure commitment is enormous. For example, in 2025 the US government announced the Stargate Project, a major private-sector initiative aimed at investing up to $500 billion in artificial intelligence (AI) infrastructure over the next four years to build massive data centres.
At the Stargate announcement, OpenAI CEO Sam Altman called it "the most important project of this era," claiming it could lead to cures for cancer and heart disease, and enable the creation of AGI, a benchmark his and other companies are working fervently to hit. In January 2026, at the India AI Impact Summit, Indian companies announced investments of hundreds of millions of dollars to build world-class data centres and even signed bilateral deals with US-based AI companies like Anthropic and OpenAI.
So the next time AI companies announce something big, it is important to ask a few questions. What benchmark was used to support the capability claim? Was the claim measured by the efficiency of skill acquisition on unknown tasks? What were the compute power and cost, and what kind of chips were used? Who benefits from the policy announcements being made, and what does governance look like?
References
BlueDot AGI Strategy course: https://bluedot.org/courses/agi-strategy. The author recently completed this course and thanks the course instructor, Filip Alimpic.
Pueyo, T. (2025). The Most Important Time in History Is Now. https://unchartedterritories.tomaspueyo.com/p/the-most-important-time-in-history-agi-asi?utm_source=bluedot-impact
Leveraging AI in Education for Foundational Learning in India
It was during the monsoon of 2025 that I travelled to the eastern Indian state of Bihar, a state of over 130 million people where per capita income remains among the lowest in the country. During the trip, I visited a local administrative office to collect some information about the local population. The officer was a primary school mathematics teacher who had been assigned duties for state elections that were still months away; he was doing data entry in the run-up to the polls. While he was on this job, the poorer children in the local government-aided primary school where he teaches were losing out on classes for weeks and months at a stretch.
The teacher told me, “I feel bad, but what can I do?” Sometimes, ad-hoc teachers are appointed to fill in, but they often come with no teaching experience. “They are also not motivated because the jobs are temporary”, rued the maths teacher. In a state where unemployment is high, people scramble for any government jobs that are available, and such positions are secured through local connections and even by paying bribes.
This kind of neglect explains why India's poorest children lack foundational learning skills. The focus of state-aided schools is attendance, incentivised with mid-day meals, but foundational learning goes beyond teaching textbooks. It requires sustained investment and real-time monitoring to build a child's intellectual, emotional and learning capacities. The Annual Status of Education Report (ASER) 2024 found that in some states, such as Bihar, over 50% of Class 5 students in rural areas still struggle to read at the Class 2 level, despite recent improvements from focused policy interventions.
From my field visit to a school in Bihar, I realised that things can improve dramatically if there is political will. Buildings exist but need to be upgraded, and the systems need a complete overhaul. The internet is strong. What is needed is not AI for automation (systems that grade papers or generate lesson plans) but AI for accessibility: tools that work offline, provide personalised adaptive learning, function on basic smartphones, and empower teachers rather than replace them.
The crisis of the Indian education system is that it favours those who can afford it, thus leaving a vast number of poor children with no additional resources for learning. Hiring a private tutor in India is expensive, but it’s the norm for all school-going children. The private tutoring industry is pegged at a whopping $10.8 billion. This is where AI in Education can be revolutionary by creating a level playing field in access to quality education.
India’s AI in education policy has to be multilayered because there is massive wealth and accessibility inequality. The priority should be to help a significant percentage of children from the poorer and rural communities to leap-frog, so that India achieves a revolution in education by creating a pool of skilled and job-ready population in a generation. To achieve this, AI in education must be designed keeping in mind the poor and uneducated and treat it as a digital public good. The first step is to integrate AI across education systems in public or state-funded schools across all states in different languages.
The systems must draw on best practices, such as the UNESCO and OECD recommendations to use AI as a means of enhancing cognitive development and lifelong learning, provided the systems remain human-centred, inclusive and transparent. Political will and public-private partnership are the keys to the success of a project of this magnitude, with the stated mission of AI for good, for everyone, everywhere.
One example from India’s context is OpenAI’s Study Mode feature, launched in July 2025 to help students with technical subjects such as maths and computer science. The design was not prompt-dependent, but the idea was to enforce learning behaviour. The key features of the study mode are
- Socratic questioning: nudges with follow-up questions instead of solutions
- Scaffolding: breaks concepts into manageable steps
- Personalisation: adjusts explanations to the learner's level
- Encouragement: reinforces confidence and curiosity
- Active learning nudges: offers quizzes, reflection and deeper exploration
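OpenAI has not published Study Mode's internals, but behaviours like these are commonly enforced through a system prompt layered over an ordinary chat model. A minimal, hypothetical sketch; the `chat` function stands in for any chat-completion API:

```python
# A hypothetical sketch of enforcing tutor-like behaviour via a system
# prompt. This is NOT OpenAI's actual Study Mode implementation, which
# has not been made public.
STUDY_MODE_PROMPT = """You are a patient tutor.
- Never give the final answer outright; ask a guiding question first (Socratic questioning).
- Break each concept into small, ordered steps (scaffolding).
- Match vocabulary and pace to the learner's level (personalisation).
- Acknowledge effort and progress (encouragement).
- End each reply with a short quiz or reflection prompt (active learning)."""

def chat(messages: list[dict]) -> str:
    """Placeholder for a call to any chat-completion API."""
    return "..."

def tutor_reply(student_message: str, history: list[dict]) -> str:
    # The system prompt is prepended on every turn, so the tutoring
    # behaviour persists across the whole conversation.
    messages = [{"role": "system", "content": STUDY_MODE_PROMPT}]
    messages += history
    messages.append({"role": "user", "content": student_message})
    return chat(messages)
```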
OpenAI’s Head of Education, Leah Belsky, explained that Study Mode originated from field observations in India, where families were spending a significant portion of their earnings on private tuitions, thus disadvantaging children from economically weaker sections. OpenAI used India as a design laboratory by beta testing students nationwide, including those preparing for highly competitive medical and engineering exams. Participants were not individually named for privacy, but research indicates that it spanned everyday learning to high-stakes prep, providing feedback that shaped personalisation and scaffolding features. There are no specific cities mentioned, with very few details about the research outcome
The subsequent OpenAI Learning Accelerator (launched in August 2025) brought collaborations with IIT Madras ($500K of research on AI learning outcomes), AICTE, the Ministry of Education, and ARISE schools, distributing 500K ChatGPT licences to educators and students.
OpenAI’s Study Mode supports 11 languages with Voice Capability, which is a significant stride for an inclusive education. The voice-enabled interaction can greatly help first-generation students with learning difficulties and those alienated from traditional classrooms. This approach aligns closely with India’s National Education Policy (NEP) 2020, which emphasises foundational literacy, inquiry-based learning, and the reduction of rote memorisation, along with OpenAI’s own policy “to democratise encouragement, guidance, and confidence — especially for learners who lack access to quality teachers or tutors.”
But to address the learning needs of India’s poorest, AI tools alone will not help. India needs a digital infrastructure and an innovative way to fund digital devices, such as a specially designed tablet that can work as a slate. It has to be an interactive device that is given to all students at a subsidised or no cost. India needs an AI-in-Education Code of Conduct — a governance framework developed through multi-stakeholder consultation, including students, teachers, parents, and civil society. This Code should balance personalisation with autonomy, innovation with equity, and data utility with privacy. Without this, we risk deploying AI that works technically but fails socially.
The approach has to be collaborative, and the programme roll-out has to be meticulously planned, because this is where most well-intended projects fail. Every tablet distributed has to have insurance. There has to be a soft penalty system, like "access blocked", that puts the onus of ownership and accountability for the devices on parents. A behaviour-change programme must run in parallel, along with investment in control and support centres providing 24/7 support through chatbots with human oversight. Every child's progress tracker should be accessible with a unique ID and password, and where parents are uneducated, teachers can assist.
For children from poorer families, there has to be an incentive model. If a child does well, credit points could be provided that parents redeem for stationery or other expenses. Such models can arrest dropouts and encourage parents not to withdraw children from school to join the labour market. Foundational learning with AI should help children transition into an AI-driven skill development programme that helps them get decent jobs early on, creating a credible employability pipeline. Fixing education has ripple effects on other problems, such as child labour and the trafficking of girls, in addition to providing a demographic dividend for the country.
India already has a good measurement infrastructure in place. The PARAKH Rashtriya Sarvekshan, a national assessment, tested approximately 2.3 million students across 782 districts in Classes 3, 6 and 9, so India already has baseline data at massive scale. When AI tools are introduced, that baseline can be used for comparison, to evaluate whether AI is making a difference.
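When the same assessment exists before and after a roll-out, one simple (if crude) evaluation design is a difference-in-differences comparison between districts that received AI tools and districts that did not. A toy sketch with invented numbers; a real evaluation would need matched cohorts and proper statistical inference:

```python
# Toy difference-in-differences check on hypothetical district averages
# from a PARAKH-style reading assessment (percent of students at grade
# level). All numbers are invented for illustration.
baseline = {"with_ai": 42.0, "without_ai": 41.5}   # before roll-out
followup = {"with_ai": 51.0, "without_ai": 44.0}   # after roll-out

change_with_ai = followup["with_ai"] - baseline["with_ai"]        # +9.0
change_without = followup["without_ai"] - baseline["without_ai"]  # +2.5

# The estimated effect of the AI tools is the gap between the two trends,
# which nets out improvements that would have happened anyway.
effect = change_with_ai - change_without
print(f"estimated AI effect: {effect:+.1f} percentage points")  # +6.5
```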
The new thrust of India’s National Curriculum Framework shifts focus to building core competencies of what students can do rather than which class they sat in. An AI tool can be used to find out if students can do things they couldn’t before. It will help in creating a competency–based measurement framework for judging AI’s real impact.
India has also created a multi-level assessment dissemination system in which data is shared through workshops at national, regional and state levels to inform practical action. This is ideal infrastructure for AI programmes, because AI impact measurement needs a similar pipeline for sharing results. Since India has already built the system, AI evaluation can leverage it.
Accelerating AI in education needs a bold vision, good data and transparent enforcement of existing mechanisms. The ASER 2024 report, released by the Pratham Foundation in January 2025, shows the highest recorded reading levels for Class 3 government school students since the survey began 20 years ago. This is attributed to focused government programmes like the NIPUN Bharat Mission.
ASER data has been referenced in 105 parliamentary questions, used by NITI Aayog (India's planning body), and cited in the World Bank's World Development Report. India has proven it can build trusted measurement systems. Now, as AI tools like Study Mode are deployed, those same systems can track whether accessible AI is delivering on its promise: personalised, patient support for foundational skills that current interventions cannot fully reach.
From a policy perspective, India's governance architecture, such as the DPDP Act 2023, has provisions for student data protection, mandatory algorithmic audits for assessment tools following the framework's fairness requirements, and teacher empowerment over surveillance. Special child-safety provisions, explicitly flagged in the Guidelines, should prevent AI systems from exploiting developing minds. Further integration through DIKSHA, Bhashini and PARAKH offers the infrastructure; the governance framework offers the guardrails. The opportunity is transformative; responsible deployment ensures no child is left behind.
How AI was used in researching and writing this article
Claude was used as an augmentation tool while writing this article. Perplexity was used for deep research. Every citation and data point was verified. Gemini was used for infographics. The author also created a Claude project for iterating on editorial flow, discussion, etc., but the author ensures that the outcome is his own.
Contact With Me
Nevine Acotanza
Chief Operating Officer
I am available for freelance work. Connect with me via email or phone.
Phone: +012 345 678 90 Email: admin@example.com