Simran Arora: Responsible AI and Governance in the Product Lifecycle

Cathy Campo
Apr 26
5 min read

By: Kevin Shi, Staff Writer & KTech Co-President

“AI product success doesn't come from having the best model. It comes from defining the right problem, building the right system, and continuously holding it accountable in the real world.”

AI has slowly taken hold of everything that used to be time consuming. You can draft emails, summarize meetings, plan trips, and even get advice about your personal health. All with enough confidence to probably ruin all of those things too. We have witnessed incredible speed, scale, and efficiency, but what AI hasn’t promised is judgment or safety.

This tension was at the center of a recent talk hosted by KTech with Simran Arora, who broke down AI not as a silver bullet that can be shoehorned everywhere, but as a component of the product lifecycle shaped by human judgment at every stage. Arora, a Brown University graduate who has worked in Product Management at Microsoft and now at Meta’s Instagram, brought both enterprise and consumer-facing experience to the conversation. At Microsoft, she worked on Azure AI Studio in the B2B machine learning enterprise space. At Instagram, she moved into a consumer AI role tied to recommendation systems, where the stakes are less about selling custom chatbots and more about shaping what billions of people see every day.

Before diving into her talk for an eager crowd of more than 70 students, Arora opened with a practical question: favorite AI tools. Her answer, like many in the room, was Anthropic’s Claude, particularly because she can deploy different agents for different jobs. One agent helps with deep research, another with writing, and another exists mainly to challenge her thinking and copy—essentially a built-in critic to pressure test her output. The larger point was not just tool preference, but the distinction between efficiency and effectiveness. AI absolutely creates things faster; that does not mean those things are better.

From there, Arora laid out the AI product lifecycle in five parts: problem definition, data strategy, model development and integration, evaluation, and deployment and monitoring. It’s a framework that sounds obvious, but her vision was refreshingly clear. If teams want responsible AI, they can’t bolt it on at the end and expect success; it must be embedded from the start.

The first step, problem definition, was the most basic, yet somehow one of the easiest for teams to skip. “Some problems don’t need AI and that is okay,” Arora said. She pushed the audience to be specific: what exact problem are we solving, why now, who is the primary user, and what kind of failure would be unacceptable? In other words: before announcing that AI will revolutionize your industry and ways of working, maybe first decide which emails should actually be answered by AI, which should not, and whether anyone wants it in the first place.

*Simran speaks to a crowd of 70 engaged Kellogg students*

Then came data strategy, where Arora made one of the talk’s strongest points: “Dataset is the product specification.” Many product managers, she noted, think the model is the product. In reality, data often matters just as much, if not more. If a dataset underrepresents certain users, the product will underperform for them. This is not an ML problem; it is a product decision with social consequences. When she opened the floor to discuss what data would be best for training a model, the familiar consultant tagline of “it depends” reared its head. There truly is no one-size-fits-all answer. More precise data in one language may be useful in one setting; noisier, more regionally diverse data may be better in another. Pretending there is a universal answer is shortsighted; choosing the data is, once again, a product decision that requires judgment.

In model development and product integration, Arora turned to the question of tradeoffs. “PMs own tradeoffs,” she said, even if they are not the ones coding the models. That means deciding whether recall matters more than precision, whether latency matters more than accuracy, and, as she put it, “how imperfect is acceptable.” These are often framed as technical questions, but Arora argued they are product sense and judgment questions disguised as ML ones. A spam system, for instance, might tolerate some junk getting through if the alternative is burying an important email. A medical tool would make very different choices. “Even if a model is 95% accurate it still fails 5% of the time,” she noted. Users, of course, do not always get hurt when they interact with “95% accurate.” But is that remaining 5% okay?
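The precision-versus-recall choice can be made concrete with a small sketch. The scores and labels below are invented for illustration and are not from Arora's talk; the function name `precision_recall` is likewise hypothetical. The point is only that moving one threshold trades one kind of error for the other.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when items scoring >= threshold are flagged as spam."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)      # spam correctly flagged
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)  # real mail wrongly flagged
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)  # spam that got through
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up model scores and true labels (True = spam), purely illustrative.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, True, False, True, False, False, False]

strict = precision_recall(scores, labels, threshold=0.75)   # flags less mail
lenient = precision_recall(scores, labels, threshold=0.50)  # flags more mail

print("strict :", strict)
print("lenient:", lenient)
```

On this toy data the strict threshold never buries a real email but lets one spam message through, while the lenient threshold catches all the spam at the cost of flagging one legitimate message. Which of those is "acceptable" is the product decision Arora describes.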

That’s why product integration matters. “Users don’t talk to the model; they talk to the interface,” Arora explained. Trust is often built or destroyed at the UX layer. Can users override a suggestion and prompt inject your tool? Are low-confidence outputs flagged? Do failures feel graceful, or do they feel like polished nonsense delivered with unsettling confidence? In high-stakes settings, that design work becomes the difference between useful automation and polished chaos.

Arora’s section on evaluation pushed this even further. “AI systems don’t fail loudly; they fail quietly, and we often don’t know that they are failing,” she said. The example she gave was an AI loan approval system that showed strong overall accuracy but denied Black and Hispanic applicants at higher rates even when creditworthiness was similar. The issue was rooted in historical lending data. “92% accuracy doesn’t mean for everyone; it just means 92% overall.” Evaluation, in her framing, has to go beyond accuracy and include fairness, safety, robustness, transparency, privacy, and security. Ask yourself: if I made the same judgment, would others think that is okay?
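A minimal sketch of what "92% overall" can hide, assuming a simple list of (group, prediction-was-correct) records. The groups "A" and "B" and all counts are invented for illustration; they are not the figures from the loan system Arora described, and `accuracy_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy separately for each group in (group, is_correct) records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


# Invented evaluation set: group A is 80% of the data, group B is 20%.
records = (
    [("A", True)] * 78 + [("A", False)] * 2
    + [("B", True)] * 14 + [("B", False)] * 6
)

overall = sum(ok for _, ok in records) / len(records)
per_group = accuracy_by_group(records)

print(f"overall: {overall:.1%}")
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.1%}")
```

Here the headline number is 92%, yet the smaller group sees 70% accuracy while the larger one sees 97.5%. Accuracy is only one of the dimensions Arora listed; the same disaggregation can be applied to approval rates or error types.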

By the time Arora reached deployment and monitoring, the central message had become hard to miss: launch is no longer the finish line as it may have been for other software products. “Shipping is just the beginning,” she said. AI systems drift, user behavior changes, and models behave differently in production than they did in testing. That is why gradual rollouts, canary releases, A/B tests, feature flags, and constant monitoring matter. Traditional software may become stable after release. AI keeps moving.
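One common way to implement a gradual rollout behind a feature flag is deterministic bucketing: hash each user into a stable bucket and enable the feature only for buckets below the current percentage. The sketch below is an assumption about how a team might do this, not something described in the talk; `in_rollout` and the feature name `ai_trip_planner` are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the first `percent` of buckets.

    The hash is stable, so a user keeps the same bucket as the rollout
    widens: anyone enabled at 5% stays enabled at 50%.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999, i.e. 0.01% steps
    return bucket < percent * 100


users = [f"user-{i}" for i in range(1_000)]  # made-up user ids
canary = [u for u in users if in_rollout(u, "ai_trip_planner", 5)]
wider = [u for u in users if in_rollout(u, "ai_trip_planner", 50)]

print(len(canary), "users in the 5% canary")
print(len(wider), "users at 50%")
```

Because exposure grows in small steps, monitoring has a chance to catch drift or quiet failures while only a fraction of users are affected, and setting the percentage back to zero acts as a kill switch.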

She ended with a case study on an AI travel planning assistant, and it captured the talk’s broader logic. If a system over-indexes toward expensive hotels, trains mostly on positive outcomes, and encourages users to trust itineraries as “final truth,” then hallucinations become financial risks to users. The obvious product temptation would be to automate more. Arora’s recommendation went the other way: why do we need an autonomous planner? Would a recommendation-only version with strict guardrails be something people actually use and want more? In other words, just because AI acts like it knows what it is doing does not mean it should be handed your credit card to book your fourth trip to Japan.

The most grounded line of the talk was also the simplest: “AI product success doesn’t come from having the best model.” It comes from defining the right problem, building the right system, and continuously holding it accountable in the real world. At a moment when the culture of AI still rewards speed and a new shiny tool comes along every hour, it was a useful reminder. The best AI products are not the ones that sound the smartest; they are the ones designed to deserve your trust.
