Manuel F. Castillo
December 12, 2011


In this project, two agents explore a partially observable maze, updating Q-values as they go, until they eventually find the exit.

Concepts Demonstrated

  • Q-learning is used to update the value of each cell traversed with respect to the goal.
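
As a rough illustration of the concept above, the sketch below applies the standard tabular Q-learning rule, Q(s, a) += alpha * (reward + gamma * max Q(s', a') - Q(s, a)), to a single move between cells. It is not taken from the project's code: the names q_update, ALPHA, GAMMA and ACTIONS, the reward of 1.0, and the cell coordinates are all assumptions made for the example.

```python
from collections import defaultdict

ALPHA = 0.5   # learning rate (assumed value, not from the project)
GAMMA = 0.9   # discount factor (assumed value, not from the project)
ACTIONS = ("up", "down", "left", "right")


def q_update(q, state, action, reward, next_state):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Value of the best action available from the cell we moved into.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    # Move the old estimate part of the way toward the target.
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


# Q-table keyed by (cell, action); unseen entries default to 0.0.
q = defaultdict(float)

# Hypothetical move: stepping right from cell (0, 0) into cell (0, 1),
# with an assumed reward of 1.0 for that step.
q_update(q, (0, 0), "right", 1.0, (0, 1))
```

Repeating this update along every path an agent walks is what gradually spreads the goal's value back to the cells that lead to it.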


I am sure two or more agents have been used together in a maze before, but I have not seen it done. I see the project more as a learning experiment, a chance to come up with something that is innovative at least to me. With that said, the program uses two agents that collaborate to update what they know about reaching the goal.
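
One way the collaboration could work is for both agents to read and write a single shared Q-table, so that whatever one agent learns about a cell is immediately available to the other. The sketch below is only an assumption about how that might look, not the project's actual code: the maze layout, the rewards, the hyperparameters and the names train, choose, step and greedy_path are all hypothetical. It also simplifies partial observability by using the agent's current cell as its whole state.

```python
import random
from collections import defaultdict

# Hypothetical 4x4 maze: '#' is a wall, 'G' is the exit, '.' is open floor.
MAZE = ["..#.",
        ".##.",
        "....",
        "#..G"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed hyperparameters


def step(pos, action):
    """Return (next_pos, reward, done); bumping a wall leaves the agent in place."""
    r, c = pos[0] + MOVES[action][0], pos[1] + MOVES[action][1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        return pos, -1.0, False
    if MAZE[r][c] == "G":
        return (r, c), 10.0, True
    return (r, c), -0.1, False


def choose(q, pos, rng):
    """Epsilon-greedy action choice over the shared table."""
    if rng.random() < EPSILON:
        return rng.choice(list(MOVES))
    return max(MOVES, key=lambda a: q[(pos, a)])


def train(episodes=300, max_steps=100, seed=0):
    """Let two agents explore in turn, both updating the same Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)       # one table shared by both agents
    starts = [(0, 0), (0, 3)]    # assumed start cells, one per agent
    for _ in range(episodes):
        agents = [{"pos": s, "done": False} for s in starts]
        for _ in range(max_steps):
            for agent in agents:
                if agent["done"]:
                    continue
                pos = agent["pos"]
                action = choose(q, pos, rng)
                nxt, reward, done = step(pos, action)
                # No future value is added once the exit has been reached.
                best_next = 0.0 if done else max(q[(nxt, a)] for a in MOVES)
                q[(pos, action)] += ALPHA * (reward + GAMMA * best_next - q[(pos, action)])
                agent["pos"], agent["done"] = nxt, done
            if all(a["done"] for a in agents):
                break
    return q


def greedy_path(q, start, limit=20):
    """Follow the highest-valued action from start until the exit or limit moves."""
    path, pos = [start], start
    for _ in range(limit):
        action = max(MOVES, key=lambda a: q[(pos, a)])
        pos, _, done = step(pos, action)
        path.append(pos)
        if done:
            break
    return path
```

Calling q = train() and then greedy_path(q, (0, 0)) traces the route the learned values suggest from the first agent's start cell. Because the table is shared, cells near the exit that the second agent visits early also guide the first agent once it reaches them.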

Evaluation of Results

I feel the results cannot be fully evaluated because I have not yet been able to complete the program. This section should be updated soon to reflect the results of the finished project.

Additional Remarks

Note: This is not the finished product.