Evaluation Metrics for Classification, Clustering, and Regression

Published: 2025-02-11

In cross_validate and cross_val_score, the scoring parameter selects the evaluation metric; its possible values differ across classification, clustering, and regression.

3.4.3. The scoring parameter: defining model evaluation rules

For the most common use cases, you can designate a scorer object with the scoring parameter via a string name; the tables below show all possible values. All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as 'neg_mean_squared_error', which returns the negated value of the metric.
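A minimal sketch of the negation convention described above: passing scoring='neg_mean_squared_error' to cross_val_score returns one negated MSE per fold, so higher is still better. The data and model here are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy regression problem, purely for illustration
X, y = make_regression(n_samples=100, n_features=3, noise=0.5, random_state=0)

# Each fold's score is -MSE, so all values are <= 0
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print(scores)          # five negated MSE values, one per fold
print(-scores.mean())  # average MSE across folds (positive)
```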

1. Classification

| String | Function | Formula |
| --- | --- | --- |
| accuracy | metrics.accuracy_score | $\mathrm{accuracy}(y,\hat{y}) = \frac{1}{n}\sum_{i=0}^{n-1}1(\hat{y}_i=y_i)$ |
| balanced_accuracy | metrics.balanced_accuracy_score | $\text{balanced-accuracy}=\frac{1}{2}\left(\frac{TP}{TP+FN}+\frac{TN}{TN+FP}\right)$ |
| top_k_accuracy | metrics.top_k_accuracy_score | $\text{top-}k\ \mathrm{accuracy}(y,\hat{f}) = \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{k}1(\hat{f}_{i,j}=y_i)$ |
| average_precision | metrics.average_precision_score | $AP = \sum_{n}(R_n-R_{n-1})P_n$ |
| neg_brier_score | metrics.brier_score_loss | $BS=\frac{1}{n}\sum_{i=0}^{n-1}(y_i-p_i)^2$, where $p_i=\text{predict\_proba}(y=1)$ |
| f1 | metrics.f1_score | $F_1=\frac{2\times TP}{2\times TP+FP+FN}$ (average: 'micro', 'macro', 'samples', 'weighted', 'binary' or None; default='binary') |
| neg_log_loss | metrics.log_loss | binary: $L_{\log}(y,p)=-\log\Pr(y\mid p)=-(y\log(p)+(1-y)\log(1-p))$; multiclass: $L_{\log}(Y,P)=-\log\Pr(Y\mid P)=-\frac{1}{N}\sum_{i=0}^{N-1}\sum_{k=0}^{K-1}y_{i,k}\log p_{i,k}$ |
| precision | metrics.precision_score | $P=\frac{TP}{TP+FP}$ |
| recall | metrics.recall_score | $R=\frac{TP}{TP+FN}$ |
| jaccard | metrics.jaccard_score | $J(y,\hat{y})=\frac{\lvert y\cap\hat{y}\rvert}{\lvert y\cup\hat{y}\rvert}$ |
| roc_auc | metrics.roc_auc_score | Area under the Receiver Operating Characteristic curve (ROC AUC), computed from prediction scores (average: 'micro', 'macro', 'samples', 'weighted' or None; default='macro') |
| d2_log_loss_score | metrics.d2_log_loss_score | $D^2(y,\hat{y})=1-\frac{\mathrm{dev}(y,\hat{y})}{\mathrm{dev}(y,y_{\text{null}})}$ |
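A quick sketch of the basic classification metrics above, computed directly from sklearn.metrics on a toy binary problem (the label values are illustrative only):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]  # one false negative at index 2

# Here TP=3, FP=0, FN=1, TN=2
acc = accuracy_score(y_true, y_pred)   # fraction of exact matches: 5/6
p = precision_score(y_true, y_pred)    # TP / (TP + FP) = 3/3
r = recall_score(y_true, y_pred)       # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)          # 2*TP / (2*TP + FP + FN) = 6/7
print(acc, p, r, f1)
```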

2. Clustering

| String | Function | Formula |
| --- | --- | --- |
| mutual_info_score | metrics.mutual_info_score | $MI(U,V)=\sum_{i=1}^{\lvert U\rvert}\sum_{j=1}^{\lvert V\rvert}\frac{\lvert U_i\cap V_j\rvert}{N}\log\frac{N\lvert U_i\cap V_j\rvert}{\lvert U_i\rvert\,\lvert V_j\rvert}$ |
| adjusted_mutual_info_score | metrics.adjusted_mutual_info_score | $AMI(U,V)=\frac{MI(U,V)-E[MI(U,V)]}{\mathrm{avg}(H(U),H(V))-E[MI(U,V)]}$ |
| normalized_mutual_info_score | metrics.normalized_mutual_info_score | $NMI(U,V)=\frac{2\times I(U;V)}{H(U)+H(V)}$ |
| rand_score | metrics.rand_score | $RI=\frac{a+b}{C_n^2}$, where $a$ is the number of sample pairs assigned to the same class in both the true and predicted clusterings, and $b$ is the number of pairs assigned to different classes in both |
| adjusted_rand_score | metrics.adjusted_rand_score | $ARI=\frac{RI-E[RI]}{\max(RI)-E[RI]}$ |
| completeness_score | metrics.completeness_score | $c=1-\frac{H(K\mid C)}{H(K)}$ |
| homogeneity_score | metrics.homogeneity_score | $h=1-\frac{H(C\mid K)}{H(C)}$ |
| v_measure_score | metrics.v_measure_score | $v=\frac{(1+\beta)\times h\times c}{\beta\times h+c}$, with $h$ = homogeneity, $c$ = completeness |
| fowlkes_mallows_score | metrics.fowlkes_mallows_score | $FMI=\frac{TP}{\sqrt{(TP+FP)\times(TP+FN)}}$ |
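A small sketch of how these clustering metrics behave: they compare two partitions of the samples and are invariant to permuting cluster ids, so relabeling the clusters does not change the score. The labels below are illustrative only.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]  # same partition, cluster ids 0 and 1 swapped

ari = adjusted_rand_score(labels_true, labels_pred)
nmi = normalized_mutual_info_score(labels_true, labels_pred)
print(ari, nmi)  # both 1.0: identical partitions score perfectly
```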

3. Regression

| String | Function | Formula |
| --- | --- | --- |
| explained_variance | metrics.explained_variance_score | $\text{explained\_variance}(y,\hat{y})=1-\frac{\mathrm{Var}\{y-\hat{y}\}}{\mathrm{Var}\{y\}}$ |
| neg_max_error | metrics.max_error | $\text{MaxError}(y,\hat{y})=\max_i\lvert y_i-\hat{y}_i\rvert$ |
| neg_mean_absolute_error | metrics.mean_absolute_error | $MAE(y,\hat{y})=\frac{1}{n}\sum_{i=0}^{n-1}\lvert y_i-\hat{y}_i\rvert$ |
| neg_mean_squared_error | metrics.mean_squared_error | $MSE(y,\hat{y})=\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\hat{y}_i)^2$ |
| neg_root_mean_squared_error | metrics.root_mean_squared_error | $RMSE(y,\hat{y})=\sqrt{\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\hat{y}_i)^2}$ |
| neg_root_mean_squared_log_error | metrics.root_mean_squared_log_error | $RMSLE(y,\hat{y})=\sqrt{\frac{1}{n}\sum_{i=0}^{n-1}\big(\log_e(1+y_i)-\log_e(1+\hat{y}_i)\big)^2}$ |
| neg_median_absolute_error | metrics.median_absolute_error | $MedAE(y,\hat{y})=\mathrm{median}(\lvert y_1-\hat{y}_1\rvert,\dots,\lvert y_n-\hat{y}_n\rvert)$ |
| r2 | metrics.r2_score | $R^2(y,\hat{y})=1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$ |
| neg_mean_poisson_deviance / neg_mean_gamma_deviance | metrics.mean_poisson_deviance / metrics.mean_gamma_deviance | mean Tweedie deviance $D(y,\hat{y})$ below, with $p=1$ (Poisson) or $p=2$ (Gamma) |
| neg_mean_absolute_percentage_error | metrics.mean_absolute_percentage_error | $MAPE(y,\hat{y})=\frac{1}{n}\sum_{i=0}^{n-1}\frac{\lvert y_i-\hat{y}_i\rvert}{\max(\epsilon,\lvert y_i\rvert)}$ |
| d2_absolute_error_score | metrics.d2_absolute_error_score | $D^2(y,\hat{y})=1-\frac{\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert}{\sum_{i=1}^{n}\lvert y_i-y_{\text{null}}\rvert}$, where $y_{\text{null}}$ is the optimal constant prediction (the median of $y$) |

The mean Tweedie deviance used by the Poisson and Gamma scorers is:

$$D(y,\hat{y})=\frac{1}{n}\sum_{i=0}^{n-1}\begin{cases}(y_i-\hat{y}_i)^2, & p=0\ \text{(Normal)}\\ 2\big(y_i\log(y_i/\hat{y}_i)+\hat{y}_i-y_i\big), & p=1\ \text{(Poisson)}\\ 2\big(\log(\hat{y}_i/y_i)+y_i/\hat{y}_i-1\big), & p=2\ \text{(Gamma)}\\ 2\left(\frac{\max(y_i,0)^{2-p}}{(1-p)(2-p)}-\frac{y_i\hat{y}_i^{1-p}}{1-p}+\frac{\hat{y}_i^{2-p}}{2-p}\right), & \text{otherwise}\end{cases}$$