Python implementation of LASSO core gene (feature) selection and importance ranking

Published: 2025-07-09

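This study uses Lasso regression for feature selection (in the code excerpts, scikit-learn's LogisticRegression with an L1 penalty) and uses the fitted models to plot the coefficient paths (Figure 1) and the cross-validation error curve (Figure 2). The best regularization parameter, lambda_min, is 4.328761, and the value under the 1-SE rule, lambda_1se, is 23.101297. The feature importance analysis (Figures 3-4) shows that 12 elements, including Mn, Co, and Ga, contribute notably to the model: lambda_min selects 12 features, and lambda_1se narrows these down to 9 key features. The code covers the full workflow of regularization path visualization, cross-validation, and feature selection.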

Full code download link: https://mbd.pub/o/bread/YZWVlJ9saA==

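Figure 1: Lasso coefficient path plot
# Figure 1: Lasso coefficient path plot
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# X_scaled, y, C_vals and lambda_vals are defined in the full code, not in this excerpt
coef_paths = []  # One row of per-feature coefficients for each C value
for C in C_vals:
    model = LogisticRegression(penalty='l1', solver='saga', multi_class='multinomial', C=C, max_iter=10000)
    model.fit(X_scaled, y)
    coef = model.coef_  # Shape: (n_classes, n_features)
    max_abs_coef = np.max(np.abs(coef), axis=0)  # Max absolute coefficient per feature
    coef_paths.append(max_abs_coef)  # Without this, the path array stays empty

coef_paths = np.array(coef_paths)  # Shape: (100, n_features)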

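# Plot coefficient paths
fig, ax = plt.subplots(figsize=(10, 6))
log_lambda = np.log(lambda_vals)
# The plotting call is omitted in this excerpt; one line per feature is assumed here
for j in range(coef_paths.shape[1]):
    ax.plot(log_lambda, coef_paths[:, j], label=f'Feature {j}')

ax.set_xlabel('log(lambda)')
ax.set_ylabel('Maximum Absolute Coefficient')
ax.set_title('Lasso Coefficient Path Plot')
ax.legend(loc='upper right')

plt.show()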
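The excerpt above uses both C_vals and lambda_vals without showing how they relate or where the data comes from. The following is a minimal, self-contained sketch of the same coefficient-path idea on synthetic data. It assumes C = 1 / lambda (scikit-learn's C is an inverse regularization strength, but the exact mapping used in the full code is not shown), and the helper name `lasso_path`, the synthetic dataset, and the grid are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def lasso_path(X, y, lambda_vals):
    """Return an array of shape (len(lambda_vals), n_features) holding the
    largest absolute coefficient of each feature across classes."""
    paths = []
    for lam in lambda_vals:
        # Assumption: C = 1 / lambda; the original mapping is not shown.
        model = LogisticRegression(penalty="l1", solver="saga", C=1.0 / lam,
                                   max_iter=5000, random_state=0)
        model.fit(X, y)
        paths.append(np.max(np.abs(model.coef_), axis=0))
    return np.array(paths)


# Synthetic stand-in data (hypothetical; not the element data from the post)
X, y = make_classification(n_samples=150, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

lambda_vals = np.logspace(-2, 2, 8)        # ascending lambda grid
coef_paths = lasso_path(X_scaled, y, lambda_vals)
n_selected = (coef_paths > 0).sum(axis=1)  # non-zero features per lambda
print(dict(zip(np.round(lambda_vals, 3), n_selected)))
```

As lambda grows, the L1 penalty drives more coefficients to exactly zero, which is what the path plot visualizes.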
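Figure 2: Cross-validation error curve
# Figure 2: Cross-validation error curve
from sklearn.model_selection import GridSearchCV

param_grid = {'C': C_vals}
model = LogisticRegression(penalty='l1', solver='saga', multi_class='multinomial', max_iter=10000)
# The grid search itself is omitted in this excerpt; scoring='neg_log_loss' is inferred from
# the variable names below, and the fold count is left at scikit-learn's default
grid_cv = GridSearchCV(model, param_grid, scoring='neg_log_loss')
grid_cv.fit(X_scaled, y)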

# Extract mean and std of log loss
mean_neg_log_loss = grid_cv.cv_results_['mean_test_score']
std_neg_log_loss = grid_cv.cv_results_['std_test_score']
mean_log_loss = -mean_neg_log_loss  # Convert to positive log loss
std_log_loss = std_neg_log_loss

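# Find lambda.min and lambda.1se
# (assumes lambda_vals is ascending and aligned index-by-index with C_vals)
idx_min = np.argmin(mean_log_loss)
lambda_min = lambda_vals[idx_min]
# Note: this band is the across-fold standard deviation; glmnet's 1-SE rule uses the standard error
threshold = mean_log_loss[idx_min] + std_log_loss[idx_min]
candidates = np.where(mean_log_loss <= threshold)[0]
idx_1se = candidates[-1] if len(candidates) > 0 else idx_min  # Largest lambda within the band
lambda_1se = lambda_vals[idx_1se]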

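# Plot CV error curve
fig, ax = plt.subplots(figsize=(10, 6))
# The curve itself is omitted in this excerpt; mean log loss with a +/- std band is assumed here
ax.errorbar(np.log(lambda_vals), mean_log_loss, yerr=std_log_loss, fmt='o-', capsize=3)

ax.axvline(np.log(lambda_min), color='blue', linestyle='--', label='lambda.min')
ax.axvline(np.log(lambda_1se), color='green', linestyle='--', label='lambda.1se')
ax.set_xlabel('log(lambda)')
ax.set_ylabel('Cross-Validation Log Loss')
ax.set_title('Cross-Validation Error Curve')
ax.legend()

plt.show()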
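To make the lambda_min / lambda_1se selection easier to follow, here is a small worked sketch in plain NumPy. The function name `select_lambdas` and the toy numbers are hypothetical. The optional `n_folds` argument converts the across-fold standard deviation into a standard error (std / sqrt(n_folds)), which is the convention glmnet uses; leaving it as `None` reproduces the band used in the excerpt above.

```python
import numpy as np


def select_lambdas(lambda_vals, mean_loss, std_loss, n_folds=None):
    """Return (lambda_min, lambda_1se) from per-lambda CV results."""
    lambda_vals = np.asarray(lambda_vals, dtype=float)
    mean_loss = np.asarray(mean_loss, dtype=float)
    band = np.asarray(std_loss, dtype=float)
    if n_folds is not None:
        band = band / np.sqrt(n_folds)  # standard deviation -> standard error

    idx_min = int(np.argmin(mean_loss))            # lowest CV loss
    threshold = mean_loss[idx_min] + band[idx_min]
    candidates = np.where(mean_loss <= threshold)[0]
    # Largest lambda (sparsest model) whose loss stays within the band;
    # taking max() avoids relying on the ordering of lambda_vals.
    lambda_1se = lambda_vals[candidates].max()
    return lambda_vals[idx_min], lambda_1se


# Toy values, chosen only to illustrate the rule
lam_min, lam_1se = select_lambdas(
    lambda_vals=[0.1, 1.0, 10.0, 100.0],
    mean_loss=[0.50, 0.40, 0.43, 0.90],
    std_loss=[0.10, 0.10, 0.10, 0.10],
    n_folds=4,
)
print(lam_min, lam_1se)
```

With these toy numbers the minimum loss is at lambda = 1, the threshold is 0.40 + 0.05 = 0.45, and the largest lambda still under that threshold is 10, so the 1-SE rule trades a slightly higher loss for a more strongly regularized model.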
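Figure 3: Feature (gene) coefficient ranking
# Figure 3: Directional feature importance

# sorted_coefs (signed coefficients, sorted) is computed in the full code, not in this excerpt
plt.figure(figsize=(10, 6))
colors = ['red' if c > 0 else 'blue' for c in sorted_coefs]  # Red for positive, blue for negative
# The bar call is omitted in this excerpt; a horizontal bar chart is assumed here
plt.barh(range(len(sorted_coefs)), sorted_coefs, color=colors)
plt.xlabel('Coefficient Value')
plt.title('Directional Feature Importance Plot')
plt.show()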
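Figure 4: Feature (gene) importance ranking
# Figure 4: Absolute feature importance
# sorted_absolute_coefs (absolute coefficients, sorted) is computed in the full code, not in this excerpt
plt.figure(figsize=(10, 6))
cmap = plt.get_cmap('viridis')
norm = plt.Normalize(vmin=min(sorted_absolute_coefs), vmax=max(sorted_absolute_coefs))
colors = cmap(norm(sorted_absolute_coefs))  # Darker-to-brighter color by magnitude
# The bar call is omitted in this excerpt; a horizontal bar chart is assumed here
plt.barh(range(len(sorted_absolute_coefs)), sorted_absolute_coefs, color=colors)
plt.xlabel('Absolute Coefficient Value')
plt.title('Absolute Feature Importance Plot')
plt.show()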
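The two importance plots rely on sorted_coefs and sorted_absolute_coefs, whose construction is not shown in the excerpts. One plausible way to build them is sketched below, assuming a one-dimensional coefficient vector (for example a single class row) and sorting by absolute value. The helper `rank_features` and the placeholder feature names are hypothetical.

```python
import numpy as np


def rank_features(coefs, names):
    """Drop zero coefficients and return (name, coefficient) pairs
    sorted by decreasing absolute value."""
    coefs = np.asarray(coefs, dtype=float)
    names = np.asarray(names)
    keep = coefs != 0                     # Lasso sets unselected features to 0
    kept_coefs, kept_names = coefs[keep], names[keep]
    order = np.argsort(-np.abs(kept_coefs))
    return [(str(kept_names[i]), float(kept_coefs[i])) for i in order]


# Placeholder names and values, for illustration only
ranking = rank_features([0.8, 0.0, -1.5, 0.3], ["A", "B", "C", "D"])
sorted_features = [name for name, _ in ranking]
sorted_coefs = [c for _, c in ranking]                  # signed, for Figure 3
sorted_absolute_coefs = [abs(c) for c in sorted_coefs]  # magnitudes, for Figure 4
print(ranking)
```

Keeping the sign in one list and the magnitude in another lets the same ranking drive both the directional plot and the absolute-importance plot.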
# Output best alpha (lambda_min) and 1-SE alpha (lambda_1se)
print(f"Best alpha (lambda_min): {lambda_min:.6f}")
print(f"1-SE alpha (lambda_1se): {lambda_1se:.6f}")
Best alpha (lambda_min): 4.328761
1-SE alpha (lambda_1se): 23.101297
print(f"Features selected with lambda_min: {features_min}")
print(f"Features selected with lambda_1se: {features_1se}")
Features selected with lambda_min: ['Mn', 'Fe', 'Co', 'Cu', 'Ga', 'Ge', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Pb']
Features selected with lambda_1se: ['Mn', 'Co', 'Ga', 'Ge', 'Ag', 'Cd', 'Sn', 'Sb', 'Pb']
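How features_min and features_1se are computed is not shown in the excerpts. A minimal sketch of one reasonable definition follows: after refitting at a given lambda, a feature counts as selected if its coefficient is non-zero for at least one class. The helper `select_features`, the tolerance parameter, and the toy coefficient matrix are assumptions for illustration.

```python
import numpy as np


def select_features(coef, names, tol=0.0):
    """Return the names of features whose largest absolute coefficient
    across classes exceeds tol."""
    coef = np.atleast_2d(np.asarray(coef, dtype=float))  # (n_classes, n_features)
    max_abs_coef = np.max(np.abs(coef), axis=0)
    return [name for name, c in zip(names, max_abs_coef) if c > tol]


# Toy coefficient matrix for 3 classes and 4 features (illustrative values)
coef = np.array([[0.0,  0.7, 0.0, -0.2],
                 [0.0, -0.4, 0.0,  0.0],
                 [0.0,  0.0, 0.0,  0.1]])
selected = select_features(coef, ["f0", "f1", "f2", "f3"])
print(selected)
```

Applied to the models refit at lambda_min and lambda_1se, a rule like this would produce lists of the kind printed above, with the larger lambda giving the shorter list.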


