CPT302 Multi-Agent Systems 题型

发布于:2025-05-31 ⋅ 阅读:(15) ⋅ 点赞:(0)

Agent games

Wumpus World 乌普斯世界

设定

环境:一个二维网格状的洞穴(cave),由多个房间(rooms)组成。
起点:智能体(agent)总是从左下角的 Room[1,1] 开始。
连接方式:每个房间通过上下左右的通道walkways 与其他房间连接(不包括对角线diagonally)。

内容与感知线索

Wumpus(怪兽)
静止不动,但致命:进入其所在房间的智能体会被吃掉,游戏失败。
它周围相邻的房间会**发出恶臭(stench)**作为提示。
智能体有箭(arrow),如果面朝它可以使用空格键射杀。

Pits(陷阱)
掉进坑里会卡住,游戏结束。
陷阱周围相邻的房间会有**微风(breeze)**提示。

Gold(宝藏)
目标是不死的情况下尽可能收集所有宝藏。
有宝藏的房间会发出**闪光(glitter)**提示。
使用 Enter 键捡起宝藏。


游戏规则与限制
1. 每次移动(↑↓←→)都会扣分(代表“代价”)( The agent is penalized and must pay to move to a room.
),鼓励智能体做出最优路径决策
2. 需要利用感知(微风、恶臭、闪光)来推理未知房间是否安全。
3. 游戏目标:在不死的情况下尽可能多地收集宝藏

控制方式

↑ ↓ ← →:移动智能体。
Enter:拾取当前房间中的金子。
Space:射箭(如果面前是 Wumpus 则可以杀死)。

The Agent starts from Room[1, 1].

four aspecs of rationality

Lec1
o    Performance measure

The criteria by which the agent’s success is judged(评估智能体“表现好坏”的标准).

  • Collect gold

  • Stay alive (avoid Wumpus and pits)

  • Use minimal actions and resources

o    Percept sequence

所有感知信息的历史序列

This is the complete history of what the agent has perceived. At each step, the agent receives sensory input from the environment.

Stench (Wumpus nearby)
Breeze (Pit nearby)
Glitter (Gold here)
Bump (Hit a wall)
Scream (Wumpus killed)

(1,1,E),(1,1,N),shooting, capture

o    Agent's knowledge about environment 智能体对环境的知识

  • Starts at Room[1,1]

  • Rules: Stench → Wumpus nearby, Breeze → Pit nearby

o    Actions

  • Move (↑ ↓ ← →)

  • Grab (Enter)

  • Shoot arrow (Space)

sensors

Sensor Percept Meaning
Stench Smell A Wumpus is in an adjacent room
Breeze Wind A pit is in an adjacent room
Glitter Visual Gold is in the current room

properties of the environment

From T2

1. Identify the properties of the environment:
        o    Fully observable vs partially observable
        o    Deterministic vs non-deterministic
        o    Static vs dynamic
        o    Discrete vs continuous
        o    Episodic vs non-episodic
        o    Real Time

Property Wumpus World Description
Partially Observable The agent only perceives local clues (e.g., stench, breeze), not the full map.
Deterministic Actions (e.g., move, shooting an arrow) may have certain results.
Static The environment doesn’t change.
Discrete The world is a grid of distinct cells.
Non-Episodic Current decisions depend on the entire history (percepts, moves).
not Real-Time 如果有timer, 则是real time


2. What are the elements of the set E and Ac?

Elements of the set E (Percepts)

Each percept can include:

  • stench, breeze, glitter, bump, scream
    So an element e ∈ E could look like:
    e = ⟨stench, breeze, none, none, none⟩

Elements of the set Ac (Actions)

Possible actions:

  • Up, Down, Left, Right, Grab, Shoot, Climb
    So Ac = {Up, Down, Left, Right, Grab, Shoot, Climb}


    Give an example of e0.

3. Example of e₀ (initial percept)

The agent starts at Room[1,1], which is safe and clean.
So:
e₀ = ⟨none, none, none, none, none⟩


    Give an example of a run r.

A run is a sequence of percepts and actions.
Example:
r = ⟨e₀, Right, e₁, Up, e₂, Grab, e₃, Left, e₄⟩

This represents:

  • perceive e₀ → move Right (action, a0)→ perceive e₁ → move Up  (action, a1)→ perceive e₂Grab gold → ...

Environment and Notation 2

from T3

1. An agent can do the following actions (one at a time): Turn(Right), Turn(Left), Forward, Shoot, Grab. Suggest predicates 谓词 to describe the domain!

    An agent can do the following actions (one at a time):
Turn(Right), Turn(Left), Forward, Shoot, Grab

    Predicates to describe the domain:
In(i, j) means the agent is in square (i, j).
Facing(d) the agent is facing direction d = N, E, W or S.
W(i, j) means there is a Wumpus in square (i,j).
S(i, j) means there is a stench in square(i,j).
P(i, j) means there is a pit in square(i,j).
B(i, j) means there is a breeze in square (i,j).
G(i, j) means there is a treasure (and a glitter) in square (i,j).
V(i, j) means that square (i,j) has been visited.
OK(i, j) means that square(i,j) is safe.
Arrow() means that there is still an arrow left (assume there's only one arrow).

2.    Describe the following rules:
o    A square is safe iff it contains no Wumpus and no pit.

OK(i, j) :- ¬W(i, j) ∧ ¬P(i, j)
 


o    A stench iff a Wumpus is in an adjacent square.

S(i, j) :- W(i+1, j) ∨ W(i-1, j) ∨ W(i, j+1) ∨ W(i, j-1)
 


o    A breeze iff a pit is in an adjacent square.

B(i, j) :- P(i+1, j) ∨ P(i-1, j) ∨ P(i, j+1) ∨ P(i, j-1)

    Create rules for grabbing a treasure and shooting an arrow.

Grab :- In(i, j) ∧ G(i, j)

Shoot :- In(i, j) ∧ Arrow() ∧ Facing(d) ∧ W(i1, j1)
 

Predicate

In(i, j) means the agent is in square (i, j).
Facing(d) the agent is facing direction d = N, E, W or S.
W(i, j) means there is a Wumpus in square (i,j).
S(i, j) means there is a stench in square(i,j).
P(i, j) means there is a pit in square(i,j).
B(i, j) means there is a breeze in square (i,j).
G(i, j) means there is a treasure (and a glitter) in square (i,j).
Arrow() means that there is still an arrow left (assume there's only one arrow).

safe(i, j) means that square(i,j) is safe.

V(i, j) means that square (i,j) has been visited.

Actions

Turn () : P: Facing(d) D: Facing(d) A: Facing(d')

Forward, P: In(i, j) ∧ Facing(d) ∧ safe(i', j')   D: In(i, j)  A: In(i', j'), V(i', j')

Shoot, P: Arrow()  D: Arrow()

Grab  P: In(i, j) ∧ G(i, j)  D: G(i, j)

Plan

From T4

Q1. Give the pre-condition list(执行该动作前需要满足的条件, delete list(动作执行后变为不成立的事实, and the add list(动作执行后变为成立的事实 of the actions.

Pre: In(i,j) ∧ V(i,j)∧G(i,j)

  • In(i,j):代理必须在该格子;

  • V(i,j):说明该格子已被访问,表示代理“已经感知到这里有宝藏”,这是一种常规约束;

  • G(i,j):该格子确实有金子,否则就不需要执行 Grab。

Del:G(i,j)

G(i,j):金子被拿走后,格子不再含有金子

Add:--

没有新增谓词(即空 Add 列表),因为在你提供的谓词系统中没有定义像 HasGold() 这样的谓词,因此不能添加它。


Q2. Formally describe the initial state / beliefs B0 of the figure above.

Initial State 整个环境的“真实状态”,包括所有格子的内容(哪有坑、金子、Wumpus 等)
Initial B0 代理(Agent)在游戏开始时“知道的部分信息”,通常是当前格子和感知到的提示

initial state:

In(1,1)  
Facing(E)  
Arrow()  
W(1,3)  
G(2,3)  
OK(1,1) ∧ OK(1,2) ∧ OK(2,1)  
¬W(1,1), ¬W(1,2), ¬W(2,1)  
¬P(1,1), ¬P(1,2), ¬P(2,1)  

...(总之是初始状态就行)

Beliefs B0​,也叫做初始信念状态,表示智能代理在游戏一开始时对世界的已知信息

In(1,1)
Facing(E)
Arrow()
V(1,1)
OK(1,1)
¬W(1,1)
¬P(1,1)

 


Q3  Calculate a plan π that would achieve killing the Wumpus and getting the treasure without dying, given the beliefs B0.

e.g 这里随便编的
Forward         // (1,1) → (1,2)
Forward         // (1,2) → (1,3)
Shoot           // kill Wumpus at (1,3)
Turn(Right)     // now facing South
Forward         // (1,3) → (2,3)
Grab            // pick up the treasure

 

2324 q1 ab

(a) Is the environment of the Wumpus World above static or dynamic? Explain your answer!

static, the environment doesn’t change. 

(b) Given the following possible actions: Turn_Right, Turn_Left, Forward, Shoot, Grab;
And the following predicates:
In(i, j) means the agent is in square (i, j).
Facing(d) the agent is facing direction d = N, E, W or S.
W(i, j) means there is a Wumpus in square (i, j).
S(i, j) means there is a stench in square(i, j).
P(i, j) means there is a pit in square(i, j).
B(i, j) means there is a breeze in square (i, j).
G(i, j) means there is a treasure (and a glitter) in square (i, j).
Give a rule to the deductive reasoning agent in the Wumpus World above for the case of "if there is a treasure to the right of the agent while it is already facing east, and the next room to the right is safe, then move towards the treasure".    ???

G(i+1, j): In(i,j) ∧ Facing(E) ∧ ¬W(i+1, j) ∧ ¬P(i+1, j) → Do(Forward)

Vacuum World

Compact Rules 

from T3

The goal in the Vacuum World is for the robot to clear up all the dirt. There are 3 domain predicates: 
    In(x,y) agent is at (x,y).
    Dirt(x,y) there is dirt at (x,y).
    Facing(d) the agent is facing direction d = N, E, W or S.
    The actions are Ac = {turn, forward, suck}, where turn means "turn right". 

Suggest compact decision making rules for the vacuum world using first-order logic that works for arbitrary grid sizes.  
Note that you can use the usual quantifiers for all ∀ and there exists ∃, equality =, integers and logical operations on them as well as a constant s to denote denote the size of the grid. 

Actions

Go(x, y) P:In(Shakey, x) ∧ Location(x) ∧ Location(y), D: In(Shakey, y), A: In(Shakey, y)

Push(b, x, y) P:In(Shakey, x) ∧ In(b, x) ∧ Box(b) ∧ Location(y) D: In(Shakey, x), In(b, x) A: In(Shakey, y), In(b, y)

ClimbUp(b) P:In(Shakey, x) ∧ In(b, x) ∧ Box(b) D: In(Shakey, x) A: On(Shakey, b)

ClimbDown(b) P:On(Shakey, b) ∧ In(b, x) D: On(Shakey, b) A: In(Shakey, x)

TurnOn(s)   P:On(Shakey, b) ∧ In(b, s) A: light(s)

TurnOff(s)   P: On(Shakey, b) ∧ In(b, s) ∧ light(s), D:light(s)

Rules

(a) 吸尘规则(如果当前位置有脏点,则吸尘)

(b) 向前移动规则(如果当前位置没有脏点,且前方格子在网格内且未清理,则向前移动)

(c) 转向规则(当前格子无脏点,且前方无脏点或不可走,则转向)

subsumption architecture

Design a solution for the vacuum-world example following Brooks’ subsumption architecture. Identify how the design needs to change to allow an additional action to charge the robot at the home location whenever it runs low on power. Discuss how this design compares with the design based on deductive reasoning. 

Overview of action selection in the subsumption architecture:

  • Set of all behaviours Beh = {(c, a) | c ⊆ P er and a ∈ Ac}

  • Behaviour will fire in state s iff see(s) ∈ c

  • Agent’s set of behaviours R ⊆ Beh, inhibition relation <⊆ R × R

  • < is a strict total ordering (transitive, irreflexive, antisymmetric)

  • If b1 < b2, b1 will get priority over b2

  • Action selection in the subsumption architecture:

1.  function action(p: P er) : Ac
2.      var fired := ∅, selected : A
3.      begin
4.          fired ← {(c, a) | (c, a) ∈ R and p ∈ c}
5.          for each (c, a) ∈ fired do
6.              if ¬∃(c', a') ∈ fired such that (c', a') < (c, a) then
7.                  return a
8.          return null
9.      end

Layer Behavior Description
0 Obstacle Avoidance Immediately avoids obstacles using sensors; cannot be interrupted.
1 Recharge When battery is low, suppresses all non-survival behaviors and returns to home to charge.
2 Clean Dirty Spots Uses dirt sensors to clean dirty locations.
3 Random Wandering Roams randomly when no higher-priority task is active.

In deductive systems, charging must be explicitly defined as a goal. When the battery is low, the system re-plans its actions, which allows flexibility but adds complexity and slower response.
In contrast, the subsumption design handles charging by priority—it's faster, simpler, and more robust in dynamic environments.

inhibition relation

Suppose we allow an additional action “move in random direction” to avoid having to specify conditions for each and every situation and to analyse whether the robot can get stuck.
We consider the following percepts: “There is dirt in the robot current location”, and “The robot is facing a wall”.
Describe the robot's behaviour and the corresponding inhibition relation.
Modify further and introduce behaviors related to battery state and the corresponding inhibition relation!

Domain Predicates:
In(x, y): Robot is at (x, y).
Dirt(x, y): There is dirt at (x, y).
Facing(d): Robot is facing direction d ∈ {N, E, W, S}.

Actions (original):
suck: clean dirt at current position
turn: turn right(90)
forward: move forward (if no wall)

Actions (New):
move_random: pick a random direction and move (used as a fallback)
go_home: go to charging station
charge: recharge battery

Rules:

In(x, y) ∧ Dirt(x, y) → do(suck)
FacingWall(d) → do(turn)
¬Dirt(x, y) ∧ ¬FacingWall(d) → do(forward)
Default → do(move_random)
LowBattery ∧ AtHome → do(charge)
LowBattery ∧ ¬AtHome → do(go_home)

 

Priority

charge < turn < go_home < suck < forward < move_random
 

22-23 q1a

(a) Consider a 4-by-4 cell Vacuum World as follows:

where “R” represents a robot agent, “H” represents a hole, “B” represents a battery station and “*” represents dirt. 

1.    Develop a set of rules (including predicates and actions) that can be used to describe the above 4-by-4 cell Vacuum World. 

Domain Predicates:
In(x, y): Robot is at (x, y).
Dirt(x, y): There is dirt at (x, y).
Facing(d): Robot is facing direction d ∈ {N, E, W, S}.
Hole(x,y) there is hole at(x,y)
Battery(x,y) there is battery at(x,y)

Actions (original):
suck: clean dirt at current position
turn: turn right(90)
forward: move forward (if no wall)

Actions (New):
move_random: pick a random direction and move (used as a fallback)
charge: recharge battery
go_home: go to charging station


2.    Use these rules to instruct the following: plan
    the robot to clean up all the dirt starting from (4, a) while avoiding falling into any hole;
    the robot must pass the battery station to get recharged for at least one time;
    after having cleaned all the dirt, the robot is located at (3,d).

3. Design a subsumption architecture for the 4-by-4 cell Vacuum World and use inhibition to coordinate the behaviors. 

Layer Behavior Can Inhibit Inhibited by
1 Avoid Hole 2, 3, 4
2 Charge Battery 3, 4 1
3 Clean Dirt 4 1, 2
4 Explore/Move 1, 2, 3

Shakey's World 

Shakey’s world consisting of four rooms lined up along a corridor

each room has a door and a light switch.

The actions in Shakey’s world include moving from place to place, pushing movable objects (such as boxes), climbing onto and down from rigid objects (such as boxes), and turning light switches on and off. To turn a light on or off, Shakey must be on top of a box at the light switch’s location. 
Shakey’s six actions are the following: 
    Go(x, y) form x to y, which requires that Shakey be at location x.
    Push a box b from location x to location y:  Push(b, x, y).
    Climb onto a box: ClimbUp(b); climb down from a box: ClimbDown(b).
    Turn a light switch on: TurnOn(s); turn it off: TurnOff(s).

e.g desceibe predicates

You are given the following constants: Room1,... , Room4, Corridor, Shakey, Box1,... , Box4, Switch1, ... , Switch4, and Floor.  
Suggest predicates to describe the domain!  

1) In (o,x) object o Shakey/box/switch is in location x e.g In(Shakey, Room3)=True
2) light(x) light is on in location x,
3) Box(o) denote the object o is a box
4)On(o1, o2) object o1 is stacked on object o2
5)  Location(o) denote the object o is a location

plan

Q1 Give the pre-condition list, delete list, and the add list of the six actions.


Q2 Formally describe the initial state / beliefs B0 of the figure above.


Q3 Calculate a plan π that would achieve in intention i = Box2 in Room2, given the beliefs B0.

form T4

Blocks world

from T4

Observe the picture for the Initial State/Beliefs B0 and Goal/Intention i in a particular Blocksworld. Calculate an efficient plan π that would achieve i, given the beliefs B0. 

predicates

On(x,y) object x on top of object y

OnTable(x) object x is on the table

Clear(x) nothing is on top of object x

Holding(x) arm is holding x

ArmEmpty()

actions

Stack(x, y)

Clear(y) ∧ Holding(x)

Clear(y) ∧ Holding(x)

ArmEmpty ∧ On(x, y)

Place object x (held) on top of object y

UnStack(x, y)

On(x, y) ∧ Clear(x) ∧ ArmEmpty

On(x, y) ∧ ArmEmpty

Holding(x) ∧ Clear(y)

Pick up object x from on top of object y

Pickup(x)

Clear(x) ∧ OnTable(x) ∧ ArmEmpty

OnTable(x) ∧ ArmEmpty

Holding(x)

Pick up object x from the table

PutDown(x)

Holding(x)

Holding(x)

OnTable(x) ∧ ArmEmpty ∧ Clear(x)

Put down object x onto the table

plan

1. UnStack(A, B) 2. PutDown(A) 3. Pickup(B) 4. Stack(B, C) 5. Pickup(A) 6. Stack(A, B)

Utilitie function

Tutorial 3

Redo the Lecture Example 2 on finding the best agent where we have non-deterministic/stochastic world and utilities over runs (Page 71 of Lecture 2 Slides). 

Concurrent MetateM 

lec 4

In Concurrent MetateM, each agent is programmed by giving it a temporal logic specification so the truth of proposition changes over time. 
We describe some of the operators with example in the lecture, as summarized here: 

Beginning of time: special nullary operator start satisfied only at the beginning. Agent interface consists of 

    unique agent identifier
    (environment propositions) i.e. messages accepted by the agent
    [component propositions] i.e. messages agent will send

For example: stack(pop, push)[popped, full] 

Computational engine based on MetateM, based on program rules: antecedent about past ⇒ consequent about present and future 

Agents are trying to make present and future true given past 

    Collect constraints with old commitments
    These taken together form current constraints
    Next state is constructed by trying to fulfil these
    Disjunctive formula ⇒ choices
    Unsatisfied commitments are carried over to the next cycle

Given the following program "Snow White" in Concurrent MetateM. 

    The idea is that Snow White has some sweets (resources), which she will give to the Dwarves (resource consumers) who ask.
    She will only give to one dwarf at a time.
    She will always eventually give to a dwarf that asks.

Describe more completely what the program does (the dwarves do) and trace its operation in a table for the first three time steps! 

commitment

2425 mock q2a

Explain what you understand by Blind commitment, Single-minded commitment and Open-minded commitment. (Lec 4)

1. Blind Commitment

Keeps an intention until it believes the goal is achieved, no matter what.

Ignores whether the goal is still possible.

2. Single-minded Commitment

Keeps an intention until it believes the goal is achieved or impossible.

Gives up only if success is clearly not possible.

3. Open-minded Commitment

Keeps an intention as long as it believes the goal is still possible.

More flexible and responsive to new information.

2425 mock q2a

The following pseudo-code defines a control loop for a practical reasoning(BDI) agent:

Recall that “Practical Reasoning = deliberation + means ends reasoning”. With reference to the above code, answer the following questions:
1.    What commitment protocol is used in this code?


2.    What should be modified in this code if the commitment protocol “Opened-minded commitment” is used?


3.    What should be modified in this code if the commitment protocol “Single-minded commitment” is used?


4.    Assume the commitment protocol “Single-minded commitment” is used in the above code. When should an agent stop to reconsider its intentions?

subsumption architecture

Describe what is an “subsumption architecture”. Give an example of subsumption architecture and explain how such a subsumption architecture operates. 

definition: A subsumption architecture is a hierarchy of task-accomplishing behaviours. Each behavior is a basic rule that competes for control.
Lower layers (e.g., obstacle avoidance) take priority over higher ones.

auction

wk9 

Q1. Consider the statement:  “Bidding one’s own valuation in a Vickrey  auction is the dominant strategy for a rational agent.”. If you think the  statement is true, show that statement is true. Otherwise, give a counter- example to show the statement is false.  

wk9

Briefly comment on the English, Vickrey, first-price sealed bid, or Dutch auction protocols which guards better against bidder collusion.

2425 Mock Q1b auction

Explain briefly the principal characteristics and differences between a Vickrey auction and a Dutch auction. 

A Vickrey auction is a sealed-bid auction where the highest bidder wins but pays the second-highest price.

A Dutch auction is an open descending-price auction where the price drops until a bidder accepts. 

The key difference is that Vickrey auctions use sealed bids and promote honest bidding, while Dutch auctions reveal prices publicly and rely on timing decisions.

2324 Q3c

(e) If you believe that there is a dominant strategy for bidders in the English auction, then justify that. If not, give a counterargument.

Dominant strategy: bid a small amount above highest current bid until one’s own valuation is reached

2223 Q4 c auction

What do you understand by a Vickrey auction and a Dutch auction. Also, how can auctioneer and bidder cheat in an English auction?

Vickrey auction: - Highest bidder wins but pays second-highest bid 
Dutch auction: price decreases until a bidder accepts
Cheat in English auction: Highest bidder wins but pays second-highest bid

payoff matrix

2425 Mock Q5b

 Player 1 can play A, B, C or D; and Player 2 can play X or Y. Consider the following payoff matrix:

Determine if either player has any dominant strategy and justify your answer.

Player 1 (row) dominant strategy:

  • Compare strategy A with others:

    • A vs B: A better in both X (6>4) and Y (5>4)

    • A vs C: A better or equal in both X (6>3) and Y (5=5)

    • A vs D: A better in X (6>5), but worse in Y (5<6)

  • Since A is not better than D in all cases, Player 1 has no strictly dominant strategy


Player 2 (column) dominant strategy:

  • Compare strategy Y with X:

    • For Player 2’s payoffs: Y is equal or better than X in all cases, and strictly better for B (3>2) and D (5>4).

  • Therefore, Player 2’s strategy Y is strictly dominant.

2425 Mock Q5 c

Consider the game where Player 1 plays A, B, C, D; and Player 2 plays R, Q, S, T with the following payoff matrix:

With reference to the matrix above, answer the following questions:

1.    Identify with justification, if any, the pairs that are in Nash equilibrium.

Step 1: Player 1's best responses to Player 2's strategies:

  • To R: best response is A (payoff 2)

  • To Q: best response is B (payoff 4)

  • To S: best response is C (payoff 3)

  • To T: best response is D (payoff 5)

Step 2: Player 2's best responses to Player 1's strategies:

  • To A: best response is R (payoff 2)

  • To B: best response is Q (payoff 3)

  • To C: best response is S (payoff 4)

  • To D: best response is T (payoff 5)

nash equilibria are strategy pairs where both players' strategies are mutual best responses:

(A,R),(B,Q),(C,S),(D,T)

At these points, neither player can improve payoff by unilaterally deviating.


2.    State whether any outcomes are Pareto optimal. Justify your answer.

To find Pareto optimal outcomes, eliminate all strategy pairs that are dominated by others (where both players get equal or higher payoffs and at least one strictly higher).

In this game, all low-payoff pairs like (1,1) are dominated.

The four Nash equilibria (A,R),(B,Q),(C,S),(D,T) with payoffs (2,2),(4,3),(3,4),(5,5)are not dominated by any other pairs.

Therefore, these four are Pareto optimal.


3.    Identify with justification, if any, the pairs that maximizes the social welfare.

Pair Payoffs (uj,ui) Total W
(A, R) (2, 2) 4
(B, Q) (4, 3) 7
(C, S) (3, 4) 7
(D, T) (5, 5) 10
Others (1, 1) 2

The pair (D, T) maximizes social welfare with a total payoff of 10.

2324 Q2  abc 

(a) Is the strategy (Invest, Invest) a Pareto optimal outcome? Justify your answer.

(b) Give all Pareto optimal outcomes. Justify your answer.

(c) Give all pure strategy Nash equilibria. Justify your answer.

2324  Q2d

 (d) Give an example of a payoff matrix where all individual payoff values are different, but all strategies are equally good in terms of social welfare. Justify your answer.

2324  Q2e

 Tutorial 5

Given the following payoff matrix. 

a)    For the matrix above, is (2, 1) a Pareto optimal outcome? Explain. If not, find one or more Pareto optimal outcomes!
b)    Find one or more pure strategies NE!

Tutorial 5

Find a pure strategy Nash Equilibrium of this game/payoff matrix: 

Game of Chicken 

tutorial5

In the Game of Chicken, two people drive two very fast cars towards each other from opposite ends of a long straight road. If one of them swerves before the other, he is called a chicken. Of course, if neither swerves, they will crash.  

    The worst possible payoff is to crash into each other, so we assign this a value 0.
    The best payoff is to have your opponent be the chicken, so we assign this a value 3.
    The next to worst possibility is to be the chicken, so we assign this a value 1.
    The last possibility is that both drivers swerve. Then, neither has less honor than the other, so this is preferable to being the chicken. However, it is not quite as good as being the victor, so we assign it a value 2.

S: Swerve 
D: Drive straight 

Find a pure strategy Nash Equillibrium! 

 The Prison's Dilemma

Lec5

Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. 
They are told that: 

    If one confesses and the other denies (C,D) or (D,C), the confessor will be freed, and the other will be jailed for three years;
    If both confess (C,C), then each will bejailed for two years.
    Both prisoners know that if neither confesses (D, D), then they will each bejailed for one year

The corresponding payoff matrix is given below. 

a)    An outcome is Pareto optimal if there is no other outcome where at least one player is better off and no player is worse off compared to that outcome.
For the matrix above, is (-2, -2) a Pareto optimal outcome? Explain.
If not, what is the Pareto optimal outcome?
b)    A strategy is a pure strategy Nash Equilibrium (NE) if neither player can improve their outcome by switching their strategy.
Is (D, D) a pure strategy NE? Explain.
If not, what is a pure strategy NE for that matrix?
c)    What strategy maximizes the social welfare?

gpt

X Y Z
P (3, 2) (1, 4) (0, 1)
Q (2, 3) (2, 2) (1, 0)
R (0, 1) (3, 3) (4, 4)

Nash:

j-i best: X-P,Y-R, Z-R

i-j best: P-Y, Q-X, R-Z

Nash e: Z-R

Pareto:
Z-R

social warfare

组合 收益对 (uj,ui) 社会福利总和 uj+ui
(P, X) (3, 2) 5
(P, Y) (1, 4) 5
(P, Z) (0, 1) 1
(Q, X) (2, 3) 5
(Q, Y) (2, 2) 4
(Q, Z) (1, 0) 1
(R, X) (0, 1) 1
(R, Y) (3, 3) 6
(R, Z) (4, 4) 8

dominant

j: 

比较p和q,没有绝对占优

q和r,没有绝对占优

p和r,没有绝对占优

i:

x-y,没有

y-z,没有

z-x 没有

无玩家列占优策略

coalitional game

concepts

2425 mock Q4a coalitional game

Explain what you understand by the Core of a coalitional game and what happens when the core is empty?

2324 Q3 ad

(a) Explain the two concepts: the Core of a coalitional game and the Shapley value.

(d) For the statement: The Shapley value of any characteristic function belongs to the core. If you think that the statement is correct, then justify it. Otherwise, give a counterargument.

Shapley

wk8

 

calculate Sh({1,2},1) and Sh({1,2},2),分别计算玩家 1 和 2 的 Shapley 值

1.对每个排列,计算该玩家加入时带来的 边际贡献(Marginal Contribution),然后对所有排列求平均(因为两个玩家,有 2!=22! = 22!=2 个排列,所以平均就是除以 2)

wk 8 

 Consider the following characteristic function:  

v({1})=100,v({2})=125,v({3})=50,

v({1,2})=270,v({1,3})=375,v({2,3})=350 and v({1,2,3})=500

Compute the Shapley values for the agents 1, 2 and 3.

sol

天哪这个三个的太多了,到时候慢慢写
 

wk8

Three students share a taxi. Here are the costs for each individual journey: 

    Player 1 - Zhao: 6 (rmb)
    Player 2 - Wang: 12 (rmb)
    Player 3 - Xu: 42 (rmb)

(a). Construct the characteristic function of the above game. 

 (b)Find a fair way of sharing taxi fare for Zhao, Wang and Xu. 

徐的 Shapley 值 ϕ3​:

2425 mock q4c

(c)    Consider the coalitional game with agents Ag = {a, b, c } and characteristic function ν defined by the following:

Compute the Shapley values for the agents a, b, and c. You are required to show the relevant steps in your answers about how you have obtained the values. 

weighted subgraph

25 mock Q4b Weight graph of characteristic function

Consider the following weighted subgraph representation of a characteristic function:

Let µ be the characteristic function defined by the above subgraph. Give the values of µ({v1}), µ({v2}), µ({v3}), µ({v1, v2}), µ({v1, v3}), µ({v2, v3}) and µ({v1, v2, v3}).

µ({v1}) = 1,µ({v2})= 1, µ({v3}) = 1

µ({v1, v2}) = 2+1+1 = 4  , µ({v1, v3})= 4+1+1 = 6 , µ({v2, v3})= 3+1+1 = 5

µ({v1, v2, v3}) =2 + 4 + 3 + 1 + 1 + 1 = 12

Voting

wk 9.5

Plurality voting is NOT a good approach to select among four candidates. If you think it is true, explain the statement. Otherwise, give a counterexample to show that the statement is false.

wk10

A majority candidate: A candidate with more than half of the first place votes is a majority candidate. 
Plurality-With-Elimination Method: This method is a preferential voting method and is a variation of the Plurality Method. Plurality with Elimination is carried out in rounds. After each round of voting the candidate (or alternative) with the fewest first place votes is eliminated and a new round of voting is done with the remaining candi-dates. When only two candidates remain in a round, the candidate with the most votes wins the election. 
The Method of Pairwise Comparisons: Each candidate (or alternative) is matched head-to-head (one-on-one) with each of the other candidates. Each candidate (alternative) gets 1 point for a one-on-one win and a half a point for a tie. The candidate (alternative) with the most total points is the winner. 

Q1.  How do you decide the winner of an election using the Plurality Method
The candidate with the most first place votes wins. 
Q2. How do you decide the winner of an election using the Borda Count Method?
1 point is awarded for last place, 2 points for second to last place, etc. You can then find a point total for every candidate and the candidate with the highest point total wins. 
Q3. How do you decide the winner of an election using the Method of Pairwise Com-parisons
Every candidate goes head to head. The candidate with the most victories wins. 
Q4. Does every election have a majority candidate
Not every election has one. 

wk10

What is a Condorcet candidate? Does every election have a Condorcet candidate?
A candidate who wins against every other candidate in a pairwise comparison (head to head match up) is a Condorcet candidate. Not every election has one. 
 

wk10

For this question, use the following preference schedule: 

(a).  Find the winner of the election using the Plurality Method

(b). Find the winner of the election using the Borda Count Method.

(c). Find the winner of the election using the Method of Pairwise Comparisons. Is there any Condorcet winner in the election? 

Pair Winner Score
A vs B B 23–14
A vs C C 23–14
A vs D D 23–14
B vs C C 19–18
B vs D B 28–9
C vs D C 25–12

Candidate C wins the most pairwise comparisons so he wins as a Condorcet candidate 

wk 9.5

A class of 18 CPT302 students uses the Borda count method (starting with 0) to select one of four candidates: Adam, Boby, Chuck or David. If Adam receives 35 Borda counts, Boby receives 28 Borda counts, and Chuck receives 20 Borda counts, how many Borda counts does David receive? Who wins this Borda election? 

wk 9.5

There are 6 PhD students are for the best Teaching Assistant (TA) of the year award. The decision is made by 100 module leaders using the plurality method. After the first 60 votes have been cast, the situation is as follows:

What is the minimal numbers of the remaining votes Peter needs to be guaranteed to win? Note that you are required to include the relevant steps in your answers to show all your work.

p denotes the number of votes Peter needs to make sure at least a tie with Diana for first place.
Then p+22=18+(40-p) and p=18.
So, if Peter gets 18+1=19 votes from the remaining 40 to guarantee that Peter will be the best TA.

 

Contract Net

wk11

Explain the five stages of task-sharing protocol in Contract Net. 

Recognition: An agent realizes it needs help to complete a task due to lack of resources or skills.
Announcement: The agent broadcasts a task description and requirements to other agents.
Bidding: Interested agents evaluate the task and decide whether to submit a bid based on their capabilities and costs.
Awarding: The announcing agent selects the best bid that meets its criteria and assigns the task.
Expediting: The chosen agent executes the task and may initiate further contracts if needed.


 

CDPS

wk11

This question considers Cooperative Distributed Problem Solving (CDPS). Answer the following two questions

With the aid of an example, explain what is meant by CDPS and why the concept of “cooperation” plays a relevant role in CDPS.

Ans: 
CDPS (Cooperative Distributed Problem Solving) refers to a system where multiple intelligent, independent agents work together to solve problems too complex for any one of them alone. Cooperation is essential because each agent has only partial knowledge or ability. For example, aircraft tracking by sensors in different locations requires combining their distinct data to understand overall movements.


Discuss also the main issues that arise in CDPS.
How to divide problems into sub-tasks for agents.
How to combine sub-solutions into a complete answer.
How to optimize agent activities for coherent results.
How to coordinate agents to avoid conflict and enhance cooperation.