# LectureBlog


Lectures are recorded:

http://echo360.uml.edu/martin201617/artificialintelligence.html

**Meeting 36, Fri Dec 2**

- FPDemo assignment
- MDP problem, Q1 from Spring 2012 CS188 midterm 2
- Search and representation problem, Q2 from Spring 2012 CS188 midterm 1

**Meeting 35, Wed Nov 30**

- fill out project title form: https://goo.gl/forms/sATywEtZPRhYBtin2
- HMM problem, pp. 18–19, Fall 2011 CS188 final exam
- project work time

**Meeting 34, Mon Nov 28**

- Exam 2 grading histogram

- Final project schedule of deliverables
- FPPaper
- team project work time
- course evaluation

**Meeting 33, Wed Nov 23**

- Prof. Heidi Furey and ethics of AI of autonomous cars

**Meeting 32, Mon Nov 21**

- Prof. Heidi Furey and ethics of AI of autonomous cars

**Meeting 31, Fri Nov 18**

- MDP modeling problem, pp. 11–12, Fall 2011 CS188 final exam
- HMM problem, pp. 18–19, Fall 2011 CS188 final exam
- project work time

**Meeting 30, Wed Nov 16**

- Exam 2

**Meeting 29, Mon Nov 14**

- FPC1 is due Sun Nov 20
- student final project proposal presentations https://groups.google.com/d/msg/uml-ai-fall16/9PdtMyKTM2I/5XSJFnfvBQAJ

**Meeting 28, Wed Nov 9**

- student final project proposal presentations

**Meeting 27, Mon Nov 7**

- student final project proposal presentations

**Meeting 26, Fri Nov 4**

- if you don't have a team, fill out single-partner form by 3p today (or make a team and fill out team form)
- review of formalization of search problem and MDP problem
- including goal test or evaluation function

- FPP has been updated with new requirements for *literature review* and *references*, and explicit detail of what is expected in *problem analysis*
- in-class exercise: review of MDP paper and of ideas for state-space representation for specific domains
- group work time: formulate the state space for your project, and report it here.

**Meeting 25, Wed Nov 2**

- going over M&Ms problem
- how to find relevant literature to your problem
- how to properly cite academic work
- group work time

**Meeting 24, Mon Oct 31**

**Meeting 23, Fri Oct 28**

- MDPs and final project brainstorming https://groups.google.com/d/msg/uml-ai-fall16/fVLL2czlq5s/szwQPdWxBgAJ

**Meeting 22, Wed Oct 26**

- recap of inference and HMMs

*from 28:20*
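The recap can be made concrete: exact inference in an HMM is the forward algorithm, a time update through the transition model followed by an observation update against the sensor model. A minimal sketch, using textbook-style rain/umbrella numbers (an illustrative assumption, not the exam problem):

```python
# Forward-algorithm sketch for exact HMM inference, using the
# textbook-style rain/umbrella model (illustrative numbers).

def forward(prior, transition, sensor, observations):
    """Return the belief P(X_t | e_1:t) after each observation."""
    belief = dict(prior)
    for obs in observations:
        # Time update: push the belief through the transition model.
        predicted = {s: sum(transition[prev][s] * belief[prev] for prev in belief)
                     for s in belief}
        # Observation update: weight by the sensor model, then normalize.
        unnorm = {s: sensor[s][obs] * predicted[s] for s in predicted}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}
    return belief

prior = {"rain": 0.5, "sun": 0.5}
transition = {"rain": {"rain": 0.7, "sun": 0.3},
              "sun": {"rain": 0.3, "sun": 0.7}}
sensor = {"rain": {"umbrella": 0.9, "none": 0.1},
          "sun": {"umbrella": 0.2, "none": 0.8}}

belief = forward(prior, transition, sensor, ["umbrella", "umbrella"])
# After two umbrella observations, P(rain) comes out to about 0.883.
```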

**Meeting 21, Mon Oct 24**

- written Bayes assignment renamed to PS4b
- PS4c on approximate inference and particle filters assigned
- in-class exercise: sensor model
- introduction to particle filtering

- in-class exercise: particle filter
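The particle-filter exercise follows three steps: elapse time, weight by the evidence, resample. A minimal sketch on a made-up two-state weather HMM (all numbers are illustrative assumptions, not the assignment's model):

```python
import random

# Particle-filter sketch matching the elapse-time / weight / resample
# steps, on a made-up two-state weather HMM.

transition = {"rain": {"rain": 0.7, "sun": 0.3},
              "sun": {"rain": 0.3, "sun": 0.7}}
sensor = {"rain": {"umbrella": 0.9, "none": 0.1},
          "sun": {"umbrella": 0.2, "none": 0.8}}

def particle_filter_step(particles, obs, rng):
    # 1. Elapse time: sample each particle's successor state.
    moved = [rng.choices(list(transition[s]),
                         weights=list(transition[s].values()))[0]
             for s in particles]
    # 2. Weight each particle by the sensor model for the evidence.
    weights = [sensor[s][obs] for s in moved]
    # 3. Resample a fresh population in proportion to weight.
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(0)
particles = ["rain"] * 500 + ["sun"] * 500
for obs in ["umbrella", "umbrella"]:
    particles = particle_filter_step(particles, obs, rng)

# The fraction of "rain" particles approximates the exact posterior (~0.88).
frac_rain = particles.count("rain") / len(particles)
```

With enough particles this fraction tracks the answer the forward algorithm computes exactly.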

**Meeting 20, Fri Oct 21**

- Bayes’ Theorem

Image: Bayes’ Theorem (licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons) – http://commons.wikimedia.org/wiki/File:Bayes%27_Theorem_MMB_01.jpg

- concept of HMM and its applicability to robot localization;

*Probabilistic Robotics (2006)*:

- HMMs and robot localization (including robot movement)

- in-class exercise: robot localization
- Berkeley tracking Q1 through Q3
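The theorem itself is one line of arithmetic once P(E) is expanded by total probability. A sketch with made-up sensor numbers (nothing from the tracking project):

```python
# Bayes' Theorem: P(H|E) = P(E|H) P(H) / P(E), with P(E) computed by
# total probability over H and not-H. All numbers here are made-up.

def bayes(p_e_given_h, p_h, p_e_given_not_h):
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# A sensor that reports a detection 90% of the time when the target is
# present and 20% of the time when it is absent, with a 0.5 prior:
posterior = bayes(0.9, 0.5, 0.2)   # 0.45 / 0.55, i.e. about 0.818
```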

**Meeting 19, Wed Oct 19**

- symbol for conditional independence ⊥ is called “up tack”
- PS4b Bayes Theorem assigned (due Sunday) – this is a written assignment
- in-class exercise: Bayes’ theorem

**Meeting 18, Mon Oct 17**

- debrief on PS3b?
- PS4a Exact inference assigned
- knowitvideos intro to probability (through *Unit 3 5d Question*)

**Meeting 17, Fri Oct 14**

- in-class exercise: bridge crossing redux
- Q-learning

- epsilon-greedy random exploration
- change ϵ over time?
- other ideas for exploration?
- impact of exploring on final computed utilities?
- `python crawler.py`

- Approximate Q-learning (from slide 20 of Berkeley Lecture 11 slides)
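The tabular Q-learning update with epsilon-greedy exploration (the exact version, before the approximate variant) fits in a few lines. The three-state chain MDP below is a made-up toy, not the crawler or gridworld environments:

```python
import random

# Tabular Q-learning with epsilon-greedy exploration on a made-up
# three-state chain (move left/right; reward 1 at the right end).

def take_step(state, action):            # action is -1 (left) or +1 (right)
    nxt = min(2, max(0, state + action))
    return nxt, (1.0 if nxt == 2 else 0.0), nxt == 2   # terminal at 2

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(3) for a in (-1, 1)}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: occasionally explore a random action
            if rng.random() < epsilon:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2, r, done = take_step(s, a)
            target = r if done else r + gamma * max(q[(s2, -1)], q[(s2, 1)])
            q[(s, a)] += alpha * (target - q[(s, a)])   # the Q-update
            s = s2
    return q

q = q_learning()   # q[(1, +1)] approaches 1.0; q[(0, +1)] approaches 0.9
```

Note how exploration still matters: with all-zero initial Q-values the greedy choice never finds the reward, so the epsilon moves do the initial discovery.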

**Meeting 16, Wed Oct 12**

**Meeting 15, Tue Oct 11**

- exams and solutions handed back
- discussion of PS3a problems Q2 and Q3
- PS3b assigned; due Sun Oct 16
- what if you don't know the T and R?

- value iteration used the Bellman equation
- now let's adapt it for “temporal difference learning”
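The adaptation works like this: keep the Bellman-style update, but drive it with sampled transitions instead of a known T and R. A TD(0) sketch on a made-up two-state chain (the reward probabilities are illustrative assumptions):

```python
import random

# TD(0) value learning: nudge V(s) toward sampled Bellman targets
# instead of computing an expectation over a known T and R.

def td_learning(episodes=2000, gamma=1.0, seed=0):
    rng = random.Random(seed)
    v = {"A": 0.0, "B": 0.0}
    counts = {"A": 0, "B": 0}
    for _ in range(episodes):
        # One episode under a fixed policy: A -> B (reward 0), then
        # B -> terminal with reward 1 (prob 0.8) or 0 (prob 0.2).
        final_r = 1.0 if rng.random() < 0.8 else 0.0
        for s, s2, r in [("A", "B", 0.0), ("B", None, final_r)]:
            counts[s] += 1
            alpha = 1.0 / counts[s]            # decaying learning rate
            sample = r + (gamma * v[s2] if s2 is not None else 0.0)
            v[s] += alpha * (sample - v[s])    # TD update toward the sample
    return v

v = td_learning()    # both values should approach 0.8
```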

**Exam 1 results**

*n* = 67; maximum possible 46; high 44; median 33

**Meeting 14, Fri Oct 7**

- one-third of the way through the semester!
- Exam 1 results will be posted to Bottlenose this afternoon; handed back Tuesday
- We have class on Tuesday
- in-class exercises state-action-pairs-and-value-iteration.pdf
- how it should look when running
- Problems 1 through 3 http://ai.berkeley.edu/reinforcement.html

**Meeting 13, Wed Oct 5**

- Echo360 lecture cap usage poll http://pollev.com/fredm
- things that aren't MDPs: people; relationships; value of history
- MDP is analogous to control theory in engineering:

- are MDPs observable?

- Some real-world applications of MDPs:
- population harvesting
- agriculture
- water resources
- inspection, maintenance, and repair
- purchasing, inventory, and production
- finance and investment; queues
- sales promotion
- search
- motor insurance claims
- overbooking
- epidemics
- credit
- sports
- patient admissions
- location
- design of experiments
- others.

Source: *Journal of the Operational Research Society*, **44**(11), pp. 1073–1096. Attach:MDPApplications3.pdf

- additive vs. multiplicative discounting

- in-class exercises discount.pdf
- demo of MDP game:
`python gridworld.py -m`

- Q-values are values of state-action pairs
- value iteration formula
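Putting those two bullets together, here is a minimal value-iteration sketch that computes a Q-value for each state-action pair and takes a max over actions. The two-state MDP (with reward collected on arrival) is a made-up example, not the class gridworld:

```python
# Value iteration with Q-values on a made-up two-state MDP; reward is
# collected on arrival at a state (a modeling assumption for this toy).

GAMMA = 0.9
ACTIONS = {"a": ["stay", "go"], "b": ["stay"]}
# T[(s, a)] = list of (next_state, probability)
T = {("a", "stay"): [("a", 1.0)],
     ("a", "go"):   [("b", 0.8), ("a", 0.2)],
     ("b", "stay"): [("b", 1.0)]}
R = {"a": 0.0, "b": 1.0}

def q_value(values, s, a):
    """Q(s, a) = sum over s' of T(s, a, s') * (R(s') + GAMMA * V(s'))."""
    return sum(p * (R[s2] + GAMMA * values[s2]) for s2, p in T[(s, a)])

values = {"a": 0.0, "b": 0.0}
for _ in range(100):                      # repeat the Bellman update
    values = {s: max(q_value(values, s, a) for a in ACTIONS[s])
              for s in values}
```

Here V(b) converges to 1/(1 − 0.9) = 10, and V(a) to its own fixed point just below that.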

**Meeting 12, Mon Oct 3**

- PS3a assigned; due Mon Oct 10 (Mon is a holiday)
- next week, we meet Tue Wed and Fri

- Markov Decision Processes (MDPs), *aka* “sequential decision problems” (Ch 17)
  - a set of states (with initial state *s₀*);
  - a set ACTIONS(*s*) of actions in each state;
  - a transition model *P(s′ | s, a)*;
  - a reward function *R(s)* — sometimes, reward-for-state-transition: *R(s, a, s′)*
- task is to find a *policy* π, which recommends an action for each state; i.e. π(*s*)
- the optimal policy (the one which produces the largest reward) is π*
- “A policy represents the agent function explicitly and is therefore a description of a simple reflex agent.”
- in-class exercises: gridworld.pdf
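The components above translate directly into code. A sketch for a made-up two-state toy problem (the names and numbers are illustrative assumptions, not the gridworld):

```python
# The MDP components from the list above, transcribed into plain
# Python for a made-up two-state toy problem.

states = {"cool", "warm"}
s0 = "cool"                                   # initial state
actions = {"cool": ["slow", "fast"], "warm": ["slow"]}

# transition model P(s' | s, a)
table = {("cool", "slow"): {"cool": 1.0},
         ("cool", "fast"): {"cool": 0.5, "warm": 0.5},
         ("warm", "slow"): {"cool": 0.5, "warm": 0.5}}

def P(s2, s, a):
    return table[(s, a)].get(s2, 0.0)

def R(s):                                     # reward function R(s)
    return {"cool": 1.0, "warm": 2.0}[s]

# A policy recommends one action per state: pi(s) -> action
pi = {"cool": "fast", "warm": "slow"}
```

A policy being just a state-to-action table is exactly why the book can call it “a description of a simple reflex agent.”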

**Meeting 11: Wed Sep 28**

- No office hours tomorrow; Friday
- policy: no posting of solution code
- Material that Exam 1 may cover
- see sample exams at http://ai.berkeley.edu/exams.html

- consistency (and an example of an inconsistent heuristic)
- continuing min-max.pdf: do you need to expand all nodes?
- Alpha-beta simulation (similar to Fig 5.5)
- Alpha-beta pruning: similar to minimax, but “prunes away branches that cannot possibly influence the final decision.”
- α — value along best path (highest) choice for MAX
- β — value along best path for MIN

- Let’s go over odd (but explainable) behavior of Pacman with fixed search depths
  - Why doesn't it eat the food? `python pacman.py -p MinimaxAgent -l minimaxClassic -a depth=4`
  - Why does it rush the ghost and die quickly if `depth=3`, but not if `depth=1`? `python pacman.py -p MinimaxAgent -l trappedClassic -a depth=3`
- expectimax
- in-class exercises: expectimax.pdf
- http://ai.berkeley.edu/multiagent.html#Q4 modeling your opponent as selecting randomly among its choices
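Minimax with alpha-beta pruning can be sketched over an explicit tree, with nested lists standing in for game states (the leaf values below are arbitrary illustrative numbers). The pruning test is exactly the “cannot possibly influence the final decision” condition:

```python
import math

# Minimax with alpha-beta pruning over an explicit game tree.
# Leaves are numbers (static evaluations); internal nodes are lists.

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):      # leaf: static evaluation
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:               # prune: MIN will avoid this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:                   # prune: MAX will avoid this branch
            break
    return best

# A MAX node over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = alphabeta(tree, True)               # minimax value is 3
```

On this tree the second MIN node is abandoned after its first leaf (2), since MAX already has 3 available: the answer is unchanged but fewer leaves are evaluated.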

**Meeting 10: Mon Sep 26**

- reflections on PS1c
- Exam 1 this Friday; may include everything through Fri Sep 23
- PS2 on multi-agent search assigned; due next Sun
- zero-sum games—do they represent the world?

- min-max.pdf
- Fig 5.3 from book (p. 166) on min-max algorithm
- evaluation functions—in general; for PS2

**Meeting 9: Fri Sep 23**

- more about admissibility—
*NEVER OVERESTIMATE* - astar-questions.pdf
- creating heuristics: use a *relaxed problem*
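As one instance of the relaxed-problem recipe: for navigating a grid with walls, relax away the walls. The optimal cost of the relaxed problem is the Manhattan distance, which is a provable lower bound on the true path cost and therefore never overestimates:

```python
# Heuristic from a relaxed problem: drop the walls from a grid
# navigation problem. The relaxed optimum is Manhattan distance,
# a lower bound on the true path cost, hence admissible.

def manhattan(pos, goal):
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

# Any real path from (0, 0) to (3, 4) needs at least 7 grid steps,
# no matter where the walls are.
h = manhattan((0, 0), (3, 4))
```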

**Meeting 8: Wed Sep 21**

- Welcome our TA, Pengfei Zhang. His office hours are Thursdays from 2:30p to 4:30p in Olsen 307
- PS1c due date is extended to Sun Sep 25
- let's look at it

- today: A* (pronounced “a star”)
- a best-first search
- like UCS, except instead of *f(n) = g(n)*, it uses *f(n) = g(n) + h(n)*
- heuristic: an estimate, but not a guess. If “admissible”:
  - a mathematically provable lower bound for actual path cost
  - a guarantee that a solution with cost lower than the reported value is not possible
  - an admissible heuristic may underestimate cost (report a value lower than what's achievable), but it cannot *overestimate* (make a point along the frontier seem worse than it is)
- A* search proceeds like a contour map (Fig. 3.25)
- in `util.py`, use `PriorityQueueWithFunction`
  - make sure you are pushing `Node` objects into the queue
  - provide a function that implements the *f(n) = g(n) + h(n)* operation
- in-class exercises: astar-cost-search.pdf astar-questions.pdf
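A minimal A* sketch. Python's `heapq` stands in here for `util.py`'s `PriorityQueueWithFunction`, and the graph, step costs, and heuristic values are made-up (chosen to be admissible):

```python
import heapq

# Minimal A*: pop nodes from the frontier in order of f(n) = g(n) + h(n).

def astar(start, goal, neighbors, h):
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    explored = set()
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        if state in explored:
            continue
        explored.add(state)
        for nxt, cost in neighbors(state):
            if nxt not in explored:
                g2 = g + cost
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)],
         "B": [("G", 1)], "G": []}
h_values = {"S": 4, "A": 2, "B": 1, "G": 0}    # admissible estimates

path, cost = astar("S", "G", lambda s: graph[s], lambda s: h_values[s])
```

On this graph A* returns S → A → B → G with cost 4, skipping the more expensive direct routes.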

**Meeting 7: Mon Sep 19**

- What was your DFS implementation? pollev.com/fredm
- `node.py` and `util.py`: you will need them for the rest of PS1c
- MOSS is working; identifying people who collaborated; few false positives
- search problems are ones which can be stated in the form presented in PS1a
- examples of strong PS1a's:
- PS1c assigned; due Friday
- time and space analysis of BFS (p. 82–83) and DFS (p. 86–87)
- UCS
- intro to heuristic search
- map-based travel
- 15-tile puzzle
- some PS1c problems

**Meeting 6: Fri Sep 16**

- tree search

- in-class exercises: tree-search.pdf
- how to do DFS efficiently with graph search situation
- discussion of space and time of BFS and DFS
- in-class exercises: uniform-cost-search.pdf

**Meeting 5: Wed Sep 14**

- example of very strong PS0c
- new pair programming policy for PS1b
- trees vs. graphs
- states vs. nodes
- state is a configuration of the world
- node is an internal data structure of the search process, which includes:
- present state
- prior (parent) node
- action taken to get from parent to self
- accumulated path cost

- graph search algorithm
- in-class exercises: graph-search.pdf

- util.py
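The node fields listed above make a natural small class; walking the parent pointers backward recovers the solution. A sketch (the states and actions are made-up labels):

```python
from dataclasses import dataclass
from typing import Optional

# A search node: present state, parent link, the action that produced
# this node, and accumulated path cost.

@dataclass
class Node:
    state: object
    parent: Optional["Node"] = None
    action: Optional[str] = None
    path_cost: float = 0.0

    def child(self, state, action, step_cost):
        """Successor node reached from self by taking `action`."""
        return Node(state, self, action, self.path_cost + step_cost)

    def solution(self):
        """Sequence of actions from the root to this node."""
        actions, node = [], self
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return list(reversed(actions))

root = Node("A")
goal = root.child("B", "right", 1).child("C", "down", 2)
```

This is the state-vs-node distinction in code: the world configuration is just `state`; everything else is search bookkeeping.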

**Meeting 4: Mon Sep 12**

- deterministic vs. stochastic
  - in backgammon, you roll dice first and then move. The moves always succeed. Hence, *deterministic*.
  - in Risk, you attack, then roll dice. The attack succeeds or fails partly based on the dice roll. Hence, *stochastic*.
  - after-class conversation: worlds with backgammon-like randomness can be characterized as deterministic and *dynamic*. The dice roll is an example of the world changing between your turns.
- fully vs. partially observable
- Rush Hour is fully observable—all moves available from any state are disclosed; no information is hidden
- Battleship is partially observable—you learn/discover where your opponent’s battleships are as you attack them

- in-class exercises: search.pdf
- search trees: paths from state to state ending at goal
- search proceeds by *expanding* a state, which generates a set of successor states, and then expanding those until the goal is reached
- how do they work
- what are their properties (space, time, optimality, completeness)

- Pac-Man search problem: find shortest path to the food (demo)
- diving into the code and what you have to do

**Meeting 3: Fri Sep 9**

- play Rush Hour online Android iOS
- worksheet on traffic game rushhour.pdf
- discussion of uninformed search problem configuration:
- set of states
- initial state
- set of actions
- transition model
- step cost
- goal test

- what are the properties of an environment that is best solved by a search agent?
- fully observable
- deterministic
- discrete
- static
- sequential
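The six-part configuration from the list above can be written out concretely. A sketch for a made-up one-dimensional toy puzzle (not Rush Hour), just to show the shape any search algorithm needs:

```python
# The six components of an uninformed search problem, for a made-up
# toy: positions 0..4 on a line, start at 0, goal at 4.

problem = {
    "states": set(range(5)),
    "initial": 0,
    "actions": lambda s: [a for a in (-1, +1) if 0 <= s + a <= 4],
    "transition": lambda s, a: s + a,      # deterministic outcome
    "step_cost": lambda s, a: 1,
    "goal_test": lambda s: s == 4,
}
```

Any of the uninformed algorithms (BFS, DFS, UCS) only ever touch these six pieces, which is what makes the formalization useful.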

**Meeting 2: Wed Sep 7**

- Poll: Eight Perspectives on AI

- Course grading rubric adjusted (please review Home).
- You are responsible for everything in the syllabus, even if I haven't discussed it in class yet
- Attendance policy / lecture recordings
- Final exam is scheduled: Fri Dec 16 from 3p to 6p
- Email list is set to immediate delivery (you can change later)
- R&N agent model: environment properties Attach:environments.pdf

**Meeting 1: Fri Sep 2**

- What do you want to do with AI? What opportunities will you create / what problems will you solve?

- Sketch of the semester—problem sets then term project (done in a team)
- Grading categories
- 5-agent model of AIMA in-class exercises

- Demo of Bottlenose