Harvard DSR
AI/ML
Teaching AI to Test Itself in the Real World
Right now, we evaluate AI the way we'd judge a swimmer in a lab pool. This work asks: what happens when we run field tests instead, watching how the system performs when the water temperature, currents, and rules all change mid-swim?
As a result, AI deployed in hospitals, courts, or schools can be held accountable to real-world evidence, something lab testing alone can't guarantee.