[machine learning] Model Evaluation

Jasonify97 2023. 3. 10. 21:51

1. Feature Importance

Checking feature importance for a tree-based model - DecisionTree

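# Plot feature importance
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
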
def plot_feature_importance(importance, names, topn = 'all'):
    feature_importance = np.array(importance)
    feature_names = np.array(names)

    data={'feature_names':feature_names,'feature_importance':feature_importance}
    fi_temp = pd.DataFrame(data)

    fi_temp.sort_values(by=['feature_importance'], ascending=False,inplace=True)
    fi_temp.reset_index(drop=True, inplace = True)

    if topn == 'all' :
        fi_df = fi_temp.copy()
    else :
        fi_df = fi_temp.iloc[:topn]

    plt.figure(figsize=(10,8))
    sns.barplot(x='feature_importance', y='feature_names', data = fi_df)

    plt.xlabel('importance')
    plt.ylabel('feature names')
    plt.grid()

    return fi_df

result = plot_feature_importance(model.feature_importances_, list(x))

result

Checking feature importance for a tree-based model - RandomForest
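
A RandomForest exposes the same `feature_importances_` attribute as a single decision tree, so the values can be read and ranked in the same way. The following is a minimal sketch under assumptions that are not part of the original post: it uses scikit-learn's bundled iris dataset, and the names `forest` and `ranked` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumption: the iris dataset stands in for the post's own data.
data = load_iris()

# Fix the seed so the ranking is reproducible between runs.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# One importance score per feature; the scores are normalized to sum to 1.
importances = forest.feature_importances_

# Sort feature indices from most to least important.
order = np.argsort(importances)[::-1]
ranked = [(data.feature_names[i], float(importances[i])) for i in order]

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The same array could be passed to `plot_feature_importance` together with the feature names to get the bar chart.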

Shapley Value Visualization 1

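In the table above, the value for 214 looks close to an outlier compared with the other values.
If a customer asks why the model predicted that value, SHAP can be used to explain why 214 ended up with it and which factors pushed the prediction there.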

Unlike the methods above, SHAP works at the level of the unit of analysis: it lets you examine each row, one at a time.
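
To make the row-level idea concrete, here is a minimal sketch that computes exact Shapley values for a single row by enumerating every feature subset. Everything in it is an assumption for illustration: the toy `predict` function, the feature names, and the single `baseline` row are hypothetical, and replacing missing features with one baseline row is a simplification of what the shap library does with background data.

```python
from itertools import combinations
from math import factorial


def predict(row):
    # Hypothetical toy model, not the model from the post.
    return 2.0 * row["rm"] - 1.0 * row["lstat"] + 0.5 * row["rm"] * row["age"]


def value(subset, row, baseline):
    # Features in `subset` keep the row's value; the rest fall back to the baseline.
    mixed = {name: (row[name] if name in subset else baseline[name]) for name in row}
    return predict(mixed)


def shapley_values(row, baseline):
    names = list(row)
    n = len(names)
    result = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        # Average the marginal contribution of `name` over all subsets of the other features.
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_feature = value(set(subset) | {name}, row, baseline)
                without_feature = value(set(subset), row, baseline)
                total += weight * (with_feature - without_feature)
        result[name] = total
    return result


# Hypothetical reference row and the row to be explained.
baseline = {"rm": 6.0, "lstat": 12.0, "age": 0.5}
row = {"rm": 8.0, "lstat": 4.0, "age": 0.9}

phi = shapley_values(row, baseline)
base_value = predict(baseline)

for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
print("base value + contributions:", base_value + sum(phi.values()))
print("model prediction:", predict(row))
```

The per-feature contributions add up to the gap between the baseline prediction and the prediction for this row, which is what makes an explanation for a single row possible.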

Code

# In Google Colab, shap has to be installed again every time the runtime starts
!pip install shap

from sklearn.ensemble import RandomForestClassifier
import shap


# 1. Build the model
model1 = RandomForestClassifier()
model1.fit(x_train, y_train)
pred = model1.predict(x_val)


# 2. SHAP
explainer1 = shap.TreeExplainer(model1)  # create the explainer
shap_values1 = explainer1.shap_values(x_val)  # compute the SHAP values

# 3.

SHAP Value Visualization 2

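A plot that shows the contributions of the SHAP values at a glance.

What this plot tells us

The narrower the spread of a feature's SHAP values, the less that feature affects the prediction, so the plot also shows feature importance.
For rm, for example, the positive side contributes strongly, while the negative side contributes little.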

shap_values1 = explainer1.shap_values(x_train)
shap.summary_plot(shap_values1, x_train)
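
The reading of the summary plot described above can also be checked numerically. The sketch below uses a small, made-up matrix of SHAP values (the numbers and the `lstat` and `age` feature names are hypothetical) and ranks features by mean absolute SHAP value, a common way to turn the width of each row in the plot into one importance number.

```python
# Hypothetical SHAP values: one row per sample, one column per feature.
feature_names = ["rm", "lstat", "age"]
shap_matrix = [
    [4.0, -1.5, 0.2],
    [2.5, 3.0, -0.1],
    [-0.5, -2.0, 0.3],
    [6.0, 1.0, -0.2],
]


def mean_abs_shap(matrix, names):
    # Average |SHAP| per feature: a wider spread gives a larger score.
    n_rows = len(matrix)
    return {
        name: sum(abs(sample[j]) for sample in matrix) / n_rows
        for j, name in enumerate(names)
    }


def shap_range(matrix, names):
    # Smallest and largest SHAP value per feature, i.e. how far each side extends.
    return {
        name: (min(sample[j] for sample in matrix), max(sample[j] for sample in matrix))
        for j, name in enumerate(names)
    }


importance = mean_abs_shap(shap_matrix, feature_names)
ranges = shap_range(shap_matrix, feature_names)
ranking = sorted(importance, key=importance.get, reverse=True)

for name in ranking:
    low, high = ranges[name]
    print(f"{name}: mean |SHAP| = {importance[name]:.3f}, range = [{low}, {high}]")
```

In this made-up data, rm reaches much further on the positive side than on the negative side, which mirrors the asymmetry noted for rm in the plot.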

SHAP Value Visualization 3

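shap.initjs()  # load the JS visualization library; in Colab this must be included in every cell that draws a plot
# Note: for a classifier, expected_value and the SHAP values hold one entry per class,
# so a single class may need to be selected before plotting.
shap.force_plot(explainer1.expected_value, shap_values1, x_train)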