【Notes】Multi-Behavior Sequential Recommendation with Temporal Graph Transformer (2022)
Xia L, Huang C, Xu Y, et al. Multi-Behavior Sequential Recommendation with Temporal Graph Transformer[J]. IEEE Transactions on Knowledge and Data Engineering, 2022.
Key challenges:
(1) How to model the relations across different behavior types (different behaviors may convey complementary signals)
(2) How to fuse temporal multi-behavior information
Problem definition:
Given a user $u_i\in U$, the behavior sequence $S_i\in S$ consists of triples $(v_{k,i},b,t)$, where the behavior type $b\in B$.
Given a sequence $S_i$, predict the items the user is likely to interact with after time $t_{K,i}$.
1. Each user $u_i$ is split into several sub-users, each corresponding to a sub-sequence $S_i^k$ (how to split is a tunable hyperparameter). (In my view, the purpose of splitting into sub-sequences is to handle short-term and long-term interests separately.)
2. Use the Transformer architecture to capture short-term information. First split the sequence into sub-sequences (short-term); then: (a) Behavior-Aware Context Embedding: for each sub-sequence, the time embedding uses sinusoidal (sine/cosine) functions and is summed with the item and behavior embeddings to form the input embedding (so that the embedding carries both temporal and behavioral information). (b) Item-Wise Sequential Dependency: captured with the Transformer's multi-head self-attention: $\overline{E}_k^r=\Vert_{h=1}^H\sum^K_{k'=1}\alpha_{k,k'}V^hE_{k'}^r$
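A minimal NumPy sketch of step 2, assuming ReLU-free single-head attention for clarity (the paper uses $H$ heads whose outputs are concatenated); the toy dimensions, random weights, and the use of the raw timestamp in the sinusoidal encoding are my own illustrative assumptions:

```python
import numpy as np

def sinusoidal_time_embedding(t, d):
    """Encode a timestamp t into a d-dim vector with sine/cosine
    functions, in the spirit of the Transformer positional encoding."""
    emb = np.zeros(d)
    for i in range(d // 2):
        freq = 1.0 / (10000 ** (2 * i / d))
        emb[2 * i] = np.sin(t * freq)
        emb[2 * i + 1] = np.cos(t * freq)
    return emb

def self_attention(E, Wq, Wk, Wv):
    """One attention head over a sub-sequence: E is (K, d)."""
    Q, K_, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K_.T / np.sqrt(K_.shape[1])
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)   # row-wise softmax
    return alpha @ V                             # sum_k' alpha_{k,k'} V E_{k'}

# toy sub-sequence: K interactions, embedding = item + behavior + time
K, d = 4, 8
rng = np.random.default_rng(0)
item_emb = rng.normal(size=(K, d))
behavior_emb = rng.normal(size=(K, d))
time_emb = np.stack([sinusoidal_time_embedding(t, d) for t in [0.0, 1.0, 2.0, 3.0]])
E = item_emb + behavior_emb + time_emb           # behavior-aware context embedding

out = self_attention(E, rng.normal(size=(d, d)),
                     rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Summing the three embeddings (rather than concatenating) keeps the dimensionality fixed, which is why the time and behavior embeddings must share the item embedding dimension.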
3. Aggregation (local): two steps: (a) encode interactions while distinguishing the importance of different behavior types, and (b) aggregate them.
Construct the bipartite graph $G_r=\{u_i^r\cup S_{i,r},\xi_r\}$
$H_b^r=\mathrm{Aggre}(\overline{E}^r,b)=\sigma\left(\sum_{k=1}^K\phi(b_k^r=b)\overline{E}_k^rW_b\right)$
Multi-channel Projection: distinguishes the effect (importance) of different behavior types via $W_b$, which is also used to distinguish global item-wise dependencies.
"After the behavior embedding projection over multiple base transformations, the behavior-aware semantics are encoded with the developed channel-wise aggregation layer, which endows our TGT method to preserve the inherent behavior semantics of different types of user-item interactions." (It is not entirely clear to me why this allows TGT to preserve this information.)
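The channel-wise aggregation $H_b^r$ above can be sketched as follows; the indicator $\phi(b_k^r=b)$ becomes a boolean mask, and the choice of ReLU for $\sigma$ plus the toy sizes are my assumptions:

```python
import numpy as np

def aggregate_behavior(E_bar, behaviors, b, W_b):
    """H_b^r = sigma( sum_k phi(b_k^r = b) * E_bar_k @ W_b ):
    sum the projected embeddings of the interactions whose behavior
    type equals b, then apply a nonlinearity (ReLU here)."""
    mask = (behaviors == b).astype(float)            # phi(b_k^r = b)
    summed = ((mask[:, None] * E_bar) @ W_b).sum(axis=0)
    return np.maximum(summed, 0.0)                   # sigma = ReLU

rng = np.random.default_rng(1)
K, d = 5, 8
E_bar = rng.normal(size=(K, d))                      # transformer outputs E_bar_k^r
behaviors = np.array([0, 1, 0, 2, 1])                # behavior type of each interaction
W = {b: rng.normal(size=(d, d)) for b in range(3)}   # behavior-specific projections W_b
H = {b: aggregate_behavior(E_bar, behaviors, b, W[b]) for b in range(3)}
```

Each behavior type $b$ gets its own projection $W_b$, so the resulting channels $H_b^r$ encode behavior-specific semantics separately before they are fused.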
Aggregation over Cross-Type Relations: (the focus here is dynamic user preference, hence an adaptive attention network is proposed). Building on the aggregation above, attention weights are computed to obtain $\overline{H}^r=\sum_{b=1}^B\gamma_bH_b^r$, where
$\gamma_b=\sigma_1\left({H_b^r}^T\sigma_2\left(\sum_{b=1}^BH_b^rW_A+\mu_A\right)\right)$
The resulting representation incorporates cross-behavior contextual information.
$\overline{E}_j^b=\sigma\left(\sum_{v_k^r=j}\phi(b_k^r=b)\overline{H}^rW_b\right);\quad \overline{E}_j=\sum_{b=1}^B\gamma_b\overline{E}_{j,b}$
This is the result of message passing from the user node to the item nodes.
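A sketch of the cross-type attention $\gamma_b$ and the fused representation $\overline{H}^r$; the concrete choices of $\sigma_1$ (softmax over behavior types) and $\sigma_2$ (tanh) are my assumptions, since the note does not pin them down:

```python
import numpy as np

def cross_type_attention(H, W_A, mu_A):
    """gamma_b = sigma1( H_b^T  sigma2( sum_b H_b W_A + mu_A ) );
    H_bar   = sum_b gamma_b H_b.
    sigma1 = softmax over behavior types, sigma2 = tanh (assumptions)."""
    ctx = np.tanh(H.sum(axis=0) @ W_A + mu_A)   # shared context vector
    logits = H @ ctx                            # one score per behavior type
    gamma = np.exp(logits - logits.max())
    gamma /= gamma.sum()                        # softmax -> attention weights
    H_bar = gamma @ H                           # weighted sum over channels
    return gamma, H_bar

rng = np.random.default_rng(2)
B, d = 3, 8
H = rng.normal(size=(B, d))                     # per-behavior representations H_b^r
gamma, H_bar = cross_type_attention(H, rng.normal(size=(d, d)), rng.normal(size=d))
```

Because $\gamma$ is recomputed per sub-sequence, the relative importance of each behavior type can shift over time, which is what "dynamic user preference" refers to here.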
4. Global context learning (this part handles the long-term side, so the representation will contain both short- and long-term information):
Global user representation: $\Gamma_i$ is the user embedding, and $\overline{\Gamma}_i=\phi(\overline{H}^r)=\sigma\left(\sum_{r=1}^{R_i}\eta_r\overline{H}^r\right)$, with
$\eta_r=\Gamma_i^T\overline{H}^r;\quad \overline{H}^r=\Gamma_i+t_r$
$\eta_r$ weights the relation between $u_i$ and the sub-user $u_i^r$.
As shown in the figure: (1) global to local: compute the embedding of each sub-user;
(2) local to global: compute the weights, then take the weighted sum.
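The local-to-global step above can be sketched as follows; using ReLU for $\sigma$ and leaving the $\eta_r$ scores un-normalized (as the formula $\eta_r=\Gamma_i^T\overline{H}^r$ suggests) are my assumptions:

```python
import numpy as np

def global_user_repr(Gamma_i, H_bars):
    """Gamma_bar_i = sigma( sum_r eta_r * H_bar^r ),
    with eta_r = Gamma_i^T H_bar^r: each sub-user representation is
    weighted by its affinity to the global user embedding, then summed."""
    eta = H_bars @ Gamma_i                  # eta_r = Gamma_i^T H_bar^r
    return np.maximum(eta @ H_bars, 0.0)    # sigma = ReLU (assumption)

rng = np.random.default_rng(3)
R, d = 4, 8
Gamma_i = rng.normal(size=d)                # global user embedding
H_bars = rng.normal(size=(R, d))            # R_i sub-user representations H_bar^r
Gamma_bar = global_user_repr(Gamma_i, H_bars)
```

This is where short-term (sub-sequence) and long-term (global) information meet: the fused $\overline{\Gamma}_i$ depends on every sub-user representation, weighted by its relevance to the user's overall profile.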
High-order Relation Aggregation: from the analysis above, we obtain:
i) short-term multi-behavior interactions between user and item; ii) long-range dynamic structural dependencies of user interest across time durations.
The final step describes how information is propagated (transformed) from layer $l$ to layer $(l+1)$.
5. Model prediction and optimization
Observations from the experimental results:
BPR, NCF, DeepFM: confirm the importance of modeling multiple behavior types
Bert4Rec's strong performance also shows that the Transformer outperforms other attention mechanisms
Compared with other GNN-based methods, this paper additionally encodes the influence (heterogeneity) of behavior types
Compared with networks that do consider heterogeneity, those baselines fail to capture the dynamic dependencies among behaviors
The goal of TGT is to aggregate dynamic relation contextual signals from different types of user behaviors and generate contextualized representations for making predictions on target behaviors.