独立同步分布的观测数据 { W i = ( Y i , D i , X i ) ∣ i ∈ { 1 , . . . , n } } \{W_i=(Y_i,D_i,X_i)| i\in \{1,...,n\}\} {Wi=(Yi,Di,Xi)∣i∈{1,...,n}},其中 Y i Y_i Yi表示结果变量, D i D_i Di表示因变量, X i X_i Xi表示控制变量。
目标参数 θ 0 \theta_0 θ0的一般定义形式为:
E [ m ( W ; θ 0 , η 0 ) ] = 0 E[m(W;\theta_0,\eta_0)] = 0 E[m(W;θ0,η0)]=0
W W W为观测到的变量, θ 0 ∈ Θ \theta_0\in \Theta θ0∈Θ为目标参数, η 0 ∈ T \eta_0\in \mathcal{T} η0∈T为辅助参数
例如,ATE 的定义为:
θ 0 A T E ≡ E [ E [ Y i ∣ D i = 1 , X i ] − E [ Y i ∣ D i = 0 , X i ] ] \theta_0^{ATE}\equiv E[E[Y_i|D_i=1,X_i] - E[Y_i|D_i=0,X_i]] θ0ATE≡E[E[Yi∣Di=1,Xi]−E[Yi∣Di=0,Xi]]
ATE的IPW估计定义为:
m I P W ( W i ; θ , α ) ≡ α ( D i , X i ) Y i − θ ≡ [ D i E [ D i ∣ X i ] − 1 − D i 1 − E [ D i ∣ X i ] ] Y i − θ m_{IPW}(W_i;\theta,\alpha)\equiv \alpha(D_i,X_i)Y_i - \theta \equiv [\frac{D_i}{E[D_i|X_i]} - \frac{1-D_i}{1-E[D_i|X_i]}]Y_i - \theta mIPW(Wi;θ,α)≡α(Di,Xi)Yi−θ≡[E[Di∣Xi]Di−1−E[Di∣Xi]1−Di]Yi−θ
ATE的Doubly Robust估计的定义为:
m D R ( W i ; θ , η ) ≡ α ( D i , X i ) ( Y i − E [ Y i ∣ D i , X i ] ) Y i + E [ Y i ∣ D i = 1 , X i ] − E [ Y i ∣ D i = 0 , X i ] − θ m_{DR}(W_i;\theta,\eta)\equiv \alpha(D_i,X_i)(Y_i - E[Y_i|D_i,X_i])Y_i + E[Y_i|D_i=1,X_i]- E[Y_i|D_i=0,X_i]-\theta mDR(Wi;θ,η)≡α(Di,Xi)(Yi−E[Yi∣Di,Xi])Yi+E[Yi∣Di=1,Xi]−E[Yi∣Di=0,Xi]−θ
≡ [ D i E [ D i ∣ X i ] − 1 − D i 1 − E [ D i ∣ X i ] ] Y i + E [ Y i ∣ D i = 1 , X i ] − E [ Y i ∣ D i = 0 , X i ] − θ \equiv [\frac{D_i}{E[D_i|X_i]} - \frac{1-D_i}{1-E[D_i|X_i]}] Y_i + E[Y_i|D_i=1,X_i]- E[Y_i|D_i=0,X_i]-\theta ≡[E[Di∣Xi]Di−1−E[Di∣Xi]1−Di]Yi+E[Yi∣Di=1,Xi]−E[Yi∣Di=0,Xi]−θ
一般情况下,目标参数 θ 0 \theta_0 θ0的估计值定义为:
θ ^ : 1 n ∑ i = 1 n m ( W i ; θ ^ , η ^ ) = 0 \hat{\theta}:\frac{1}{n}\sum_{i=1}^nm(W_i;\hat{\theta},\hat{\eta}) = 0 θ^:n1i=1∑nm(Wi;θ^,η^)=0
一阶泰勒展得出:
1 n ∑ i = 1 n m ( W i ; θ ^ , η ^ ) ≈ 1 n ∑ i = 1 n m ( W i ; θ 0 , η 0 ) + 1 n ∑ i = 1 n ∂ ∂ θ m ( W i ; θ 0 , η 0 ) ( θ ^ − θ 0 ) + 1 n ∑ i = 1 n ∂ ∂ η m ( W i ; θ 0 , η 0 ) ( η ^ − η 0 ) ≈ 0 \frac{1}{n}\sum_{i=1}^nm(W_i;\hat{\theta},\hat{\eta}) \approx \frac{1}{n}\sum_{i=1}^nm(W_i;\theta_0,\eta_0) + \frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)(\hat{\theta} - \theta_0) + \frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\eta}m(W_i;\theta_0,\eta_0)(\hat{\eta} - \eta_0) \approx 0 n1i=1∑nm(Wi;θ^,η^)≈n1i=1∑nm(Wi;θ0,η0)+n1i=1∑n∂θ∂m(Wi;θ0,η0)(θ^−θ0)+n1i=1∑n∂η∂m(Wi;θ0,η0)(η^−η0)≈0
( θ 0 − θ ^ ) ≈ [ 1 n ∑ i = 1 n ∂ ∂ θ m ( W i ; θ 0 , η 0 ) ] − 1 1 n ∑ i = 1 n m ( W i ; θ 0 , η 0 ) + [ 1 n ∑ i = 1 n ∂ ∂ θ m ( W i ; θ 0 , η 0 ) ] − 1 ( η ^ − η 0 ) 1 n ∑ i = 1 n ∂ ∂ η m ( W i ; θ 0 , η 0 ) (\theta_0 - \hat{\theta})\approx [\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)]^{-1}\frac{1}{n}\sum_{i=1}^nm(W_i;\theta_0,\eta_0) + [\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\theta}m(W_i;\theta_0,\eta_0)]^{-1}(\hat{\eta} - \eta_0)\frac{1}{n}\sum_{i=1}^n\frac{\partial}{\partial\eta}m(W_i;\theta_0,\eta_0) (θ0−θ^)≈[n1i=1∑n∂θ∂m(Wi;θ0,η0)]−1n1i=1∑nm(Wi;θ0,η0)+[n1i=1∑n∂θ∂m(Wi;θ0,η0)]−1(η^−η0)n1i=1∑n∂η∂m(Wi;θ0,η0)
目标参数的估计偏差 ( θ 0 − θ ^ ) (\theta_0 - \hat{\theta}) (θ0−θ^)将受到辅助参数估计偏差 ( η ^ − η 0 ) (\hat{\eta} - \eta_0) (η^−η0)的影响,说明目标参数的估计偏差的两种来源分别是:
- 辅助参数的估计偏差 ( η ^ − η 0 ) (\hat{\eta} - \eta_0) (η^−η0)本身,称之为正则化偏差
- 辅助参数的估计偏差 ( η ^ − η 0 ) (\hat{\eta} - \eta_0) (η^−η0)与 W i W_i Wi的强相关性,称之为过拟合偏差
Neyman Orthogonality
∂ ∂ λ { E [ ψ ( W i ; θ 0 , η 0 + λ ( η − η 0 ) ) ] } ∣ λ = 0 = 0 , ∀ η ∈ T \frac{\partial}{\partial\lambda}\{E[\psi(W_i;\theta_0,\eta_0 + \lambda(\eta-\eta_0))]\}|_{\lambda=0}= 0,\forall\eta\in \mathcal{T} ∂λ∂{E[ψ(Wi;θ0,η0+λ(η−η0))]}∣λ=0=0,∀η∈T
m I P W m_{IPW} mIPW is not Neyman orthogonal, m D R m_{DR} mDR is Neyman orthogonal.
Cross Fitting
θ ^ : 1 n ∑ k = 1 K ∑ i ∈ I k m ( W i ; θ ^ , η ^ − k ) = 0 \hat{\theta}:\frac{1}{n}\sum_{k=1}^K\sum_{i\in I_k}m(W_i;\hat{\theta},\hat{\eta}_{-k}) = 0 θ^:n1k=1∑Ki∈Ik∑m(Wi;θ^,η^−k)=0
DML
θ ^ : 1 n ∑ k = 1 K ∑ i ∈ I k ψ ( W i ; θ ^ , η ^ − k ) = 0 \hat{\theta}:\frac{1}{n}\sum_{k=1}^K\sum_{i\in I_k}\psi(W_i;\hat{\theta},\hat{\eta}_{-k}) = 0 θ^:n1k=1∑Ki∈Ik∑ψ(Wi;θ^,η^−k)=0
直接回归不满足 Neyman 正交性
Y = θ T + g ( X ) + ϵ Y = \theta T + g(X) + \epsilon Y=θT+g(X)+ϵ
m ( W ; θ , g ) = Y − θ T − g ( X ) + ϵ m(W;\theta,g) = Y - \theta T - g(X) + \epsilon m(W;θ,g)=Y−θT−g(X)+ϵ
∂ ∂ λ E [ m ( w ; θ , g + λ Δ g ) ] ∣ λ = 0 = E [ − Δ g ( x ) ] ≠ 0 \frac{\partial }{\partial \lambda}E[m(w;\theta,g + \lambda\Delta g)]|_{\lambda=0} = E[-\Delta g(x)] \ne 0 ∂λ∂E[m(w;θ,g+λΔg)]∣λ=0=E[−Δg(x)]=0
DML 满足Neyman正交性
Y − l ( x ) = θ ( T − m ( x ) ) + ϵ ′ , l ( x ) = E [ Y ∣ X = x ] , m ( x ) = E [ T ∣ X = x ] Y-l(x) = \theta (T - m(x)) + \epsilon',l(x) = E[Y|X=x],m(x)=E[T|X=x] Y−l(x)=θ(T−m(x))+ϵ′,l(x)=E[Y∣X=x],m(x)=E[T∣X=x]
m ( W ; θ , η ) = Y − l ( x ) − θ ( T − m ( x ) ) − ϵ ′ , η = ( l , m ) m(W;\theta,\eta) = Y-l(x) - \theta (T - m(x)) - \epsilon',\eta = (l, m) m(W;θ,η)=Y−l(x)−θ(T−m(x))−ϵ′,η=(l,m)
∂ ∂ λ E [ W ; θ , η + λ Δ η ] ∣ λ = 0 = E [ − Δ l ( x ) + θ Δ m ( x ) ] = 0 \frac{\partial}{\partial\lambda}E[W;\theta,\eta + \lambda\Delta\eta]|_{\lambda=0} = E[-\Delta l(x) + \theta\Delta m(x)] = 0 ∂λ∂E[W;θ,η+λΔη]∣λ=0=E[−Δl(x)+θΔm(x)]=0
Example
模拟数据
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import dowhy.datasets, dowhy.plotter
rvar = 1 if np.random.uniform() > 0.2 else 0
is_linear = False # A non-linear dataset. Change to True to see results for a linear dataset.
data_dict = dowhy.datasets.xy_dataset(10000, effect=rvar,
num_common_causes=2,
is_linear=is_linear,
sd_error=0.2)
df = data_dict['df']
print(df.head())
dowhy.plotter.plot_treatment_outcome(df[data_dict["treatment_name"]], df[data_dict["outcome_name"]],
df[data_dict["time_val"]])
因果关系假设:
- 基于领域知识提出因果关系的假设,定义模型结构
from dowhy import CausalModel
model= CausalModel(
data=df,
treatment=data_dict["treatment_name"],
outcome=data_dict["outcome_name"],
common_causes=data_dict["common_causes_names"],
instruments=data_dict["instrument_names"])
model.view_model(layout="dot")
因果关系识别:
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
因果关系估计:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor
dml_estimate = model.estimate_effect(identified_estimand, method_name="backdoor.econml.dml.DML",
control_value = 0,
treatment_value = 1,
confidence_intervals=False,
method_params={"init_params":{'model_y':GradientBoostingRegressor(),
'model_t': GradientBoostingRegressor(),
"model_final":LassoCV(fit_intercept=False),
'featurizer':PolynomialFeatures(degree=2, include_bias=True)},
"fit_params":{}})
print(dml_estimate)
因果关系反驳测试:
res_placebo=model.refute_estimate(identified_estimand, dml_estimate,
method_name="placebo_treatment_refuter", placebo_type="permute",
num_simulations=20)
print(res_placebo)