3 Comments
Pucheng Yang

Thanks for sharing.

A couple of thoughts:

- Cost might be a concern if AI agents can launch a large cluster for experimentation.

- If cost is a concern, then figuring out how to run experiments on a smaller sample becomes important.

Zheng Shao

That's a very good point! Infra engineers have a special skill for minimizing the size of the cluster needed to reproduce a bug. That's something AI agents need to learn as well.

Pucheng Yang

What you said is a good point as well! When troubleshooting a client's issue, I always ask the client to share a minimal reproducible job so I can debug it remotely. Maybe that could be one of the AI agent's actions as well.