Your AI Startup is a Demo Without Rigorous EVALS | Ameya B, CTO Braintrust
What if the real bottleneck in AI is not the model, but how rigorously you test it? After nearly two decades of building AI systems at Microsoft, Facebook, and Dropbox, Ameya Bhatawdekar is now Field CTO at Braintrust, the AI observability platform used by Airtable, Notion, Stripe, Dropbox, Vercel, Cloudflare, Lovable, and Replit.
We discuss a shift that most teams underestimate. The winners in AI are not just shipping faster. They are building systems that behave predictably, improve continuously, and earn user trust over time. As traditional monitoring breaks down in a probabilistic world, observability now requires learning how an AI system reasons, not just how it performs. This leads to a new paradigm where agents are no longer just executing tasks, but also analyzing and debugging other agents.
The episode also traces the evolution of machine learning itself. From feature engineering to deep learning to transformers, each leap increased capability and reduced control. Evaluation is now where control sits.
Ameya is clear on one point. Moving fast with weak evaluations feels like velocity, but it compounds into technical debt, unpredictable failures, and ultimately a loss of user trust. The teams that win are the ones that invest early in rigor, especially in understanding context, which is quickly becoming the hardest and most critical layer in AI systems.
If you are a founder or engineer moving beyond the demo phase and trying to build durable, high-quality AI systems, this episode will change how you think about shipping.
0:00 — Trailer
00:55 — What’s Braintrust?
05:01 — What agents are shipping today
07:54 — What evals look like in practice for Notion & Zapier
09:44 — Evals vs Classic monitoring
11:33 — Who is the Field CTO?
16:35 — What goes wrong when agents fail
18:26 — Agents analyzing other agents
24:17 — Evals are existential in vibecoding
25:52 — Ship fast with weak evals or slow with strong evals?
25:41 — What makes enterprises trust an LLM?
29:25 — Do AI startups know how good their product is?
30:23 — 3 ML systems: Microsoft, Dropbox, Meta
36:30 — How the 2017 transformer paper changed everything
38:20 — All algorithms are predicting the next word
43:40 — What LLMs will do in 1 year
-------------
India’s talent has built the world’s tech—now it’s time to lead it.
This mission goes beyond startups. It’s about shifting the center of gravity in global tech to include the brilliance rising from India.
What is Neon Fund?
We invest in seed and early-stage founders from India and the diaspora building world-class Enterprise AI companies. We bring capital, conviction, and a community that’s done it before.
Subscribe for real founder stories, investor perspectives, economist breakdowns, and a behind-the-scenes look at how we’re doing it all at Neon.
-------------
Check us out on:
Website: https://neon.fund/
Instagram: https://www.instagram.com/theneonshoww/
LinkedIn: https://www.linkedin.com/company/beneon/
Twitter: https://x.com/TheNeonShoww
Connect with Siddhartha on:
LinkedIn: https://www.linkedin.com/in/siddharthaahluwalia/
Twitter: https://x.com/siddharthaa7
-------------
This video is for informational purposes only. The views expressed are those of the individuals quoted and do not constitute professional advice.