Giving Feedback on Interactive Student Programs with Meta-Exploration

Evan Z. Liu*
Moritz Stephan*
Allen Nie
Chris Piech

Emma Brunskill
Chelsea Finn

Neural Information Processing Systems (NeurIPS), 2022
(Selected as Oral)

Abstract. Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science. However, teaching and giving feedback on such software is time-consuming: standard approaches require instructors to manually grade student-implemented interactive programs. As a result, online platforms that serve millions are unable to provide any feedback on assignments for implementing interactive programs, which critically hinders students’ ability to learn. One approach toward automatic grading is to learn an agent that interacts with a student’s program and explores states indicative of errors via reinforcement learning. However, existing work on this approach only provides binary feedback on whether a program is correct or not, while students require finer-grained feedback on the specific errors in their programs to understand their mistakes. In this work, we show that exploring to discover errors can be cast as a meta-exploration problem. This enables us to construct a principled objective for discovering errors and an algorithm for optimizing this objective, which provides fine-grained feedback. We evaluate our approach on a set of over 700K real anonymized student programs from an interactive assignment. Our approach provides feedback with 94.3% accuracy, improving over existing approaches by 17.7% and coming within 1.5% of human-level accuracy.

Visualizations of DreamGrader


Scoring a goal.

Hitting the wall.

Deliberately missing the ball.

We visualize some of the exploration behaviors that DreamGrader learns on Bounce above. Specifically, DreamGrader learns to test what happens when it hits the ball into the goal, when it hits the ball into the wall, and when it deliberately misses the ball. The center example also shows a case where there are multiple balls on the screen at the same time, which can make it challenging to find other errors.
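The difference between binary and fine-grained feedback can be illustrated with a toy sketch. This is not DreamGrader itself: the event names, the reactions, and the `ToyProgram` and `explore` helpers are all hypothetical, and the events are triggered directly rather than discovered by a learned exploration policy. It only shows the shape of the output, which is one label per error type instead of a single correct/incorrect bit.

```python
from dataclasses import dataclass, field

# Hypothetical event names and reactions, invented for this sketch only.
EVENTS = ("goal", "wall", "miss")
REFERENCE = {"goal": "score_point", "wall": "bounce", "miss": "opponent_point"}


@dataclass
class ToyProgram:
    """Stand-in for a student program: maps each event to a reaction."""
    reactions: dict = field(default_factory=lambda: dict(REFERENCE))

    def step(self, event: str) -> str:
        return self.reactions[event]


def explore(program: ToyProgram) -> dict:
    """Trigger each event once and flag reactions that differ from the reference.

    A learned policy would have to work out how to cause these events by
    moving the paddle; here they are triggered directly for simplicity.
    """
    return {event: program.step(event) != REFERENCE[event] for event in EVENTS}


# A toy program that wrongly awards a point when the ball hits the wall.
buggy = ToyProgram({"goal": "score_point", "wall": "score_point", "miss": "opponent_point"})
feedback = explore(buggy)
print(feedback)                # {'goal': False, 'wall': True, 'miss': False}
print(any(feedback.values()))  # True (the binary "has an error" signal)
```

The per-event dictionary tells the student which behavior is wrong, while the single boolean on the last line only says that something is.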


The visualization above shows the exploration behavior DreamGrader learns for the common "skewer paddle" error type in Breakout. DreamGrader learns to deliberately hit the ball from the side, which causes the ball to become skewered on the paddle, exposing that the student program indeed has this error.


