[DACON/Reference] SVM Reference Notes

려우 2023. 4. 26. 21:32

Another model: SVM

from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the SVM model: of the seven models in sklearn.svm, use svm.SVC
svm_clf = svm.SVC(kernel="linear")

# Load the training dataset
train_data = pd.read_csv("C:/Users/1ayou/PycharmProjects/dacon_astronomy/dataset/train.csv")

X = train_data.iloc[:, 2:]
y = train_data.iloc[:, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=11)

# Train the model. But why won't it run?
svm_clf.fit(X_train, y_train)

# Predict with the model
svm_pred = svm_clf.predict(X_test)

# Print the accuracy
print(accuracy_score(y_test, svm_pred))
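
One possible reason the fit above appears to hang: the training time of svm.SVC grows at least quadratically with the number of samples, so a large train.csv can take a very long time, especially with unscaled features. The following is a minimal sketch, not a verified fix for this dataset. It assumes synthetic data from make_classification as a stand-in for train.csv, and it swaps in svm.LinearSVC, which fits a linear model with a different solver and a slightly different loss than SVC(kernel="linear"), so the results are not guaranteed to match.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for train.csv (assumption: the real DACON data is not used here).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           n_classes=3, random_state=11)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=11)

# Standardize the features, then fit a linear SVM.
# dual=False solves the primal problem, usually preferable when n_samples > n_features.
linear_clf = make_pipeline(StandardScaler(), LinearSVC(dual=False, random_state=11))
linear_clf.fit(X_train, y_train)

# Predict and report accuracy, mirroring the steps of the original snippet.
linear_pred = linear_clf.predict(X_test)
linear_acc = accuracy_score(y_test, linear_pred)
print(linear_acc)
```

If a non-linear kernel is needed, training on a random subsample first is a cheap way to check whether the full fit is feasible.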
  • Advantages of SVM models
    • Perform well in high-dimensional spaces
    • Remain effective even when the number of dimensions exceeds the number of samples
    • The decision function uses only a subset of the training points, called support vectors → memory efficient
    • Different kernel functions can be specified for the decision function; common kernels are provided, and custom kernels can also be written
  • Disadvantages of SVM models
    • If the number of features is much larger than the number of samples, choosing the kernel function and the regularization term carefully is essential to avoid overfitting
    • SVMs do not provide probability estimates directly; computing them relies on an expensive five-fold cross-validation
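
Two of the points above, custom kernels and probability estimates, can be sketched as follows. This is only an illustration on synthetic data. The function name my_linear_kernel is hypothetical, and the kernel it implements is a plain dot product chosen for simplicity. Probability estimates are switched on with the probability=True option of svm.SVC, which makes fitting noticeably slower because of the internal cross-validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic binary problem (assumption: illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)


def my_linear_kernel(A, B):
    """Hypothetical custom kernel: the Gram matrix of plain dot products."""
    return A @ B.T


# A callable can be passed as the kernel instead of a built-in kernel name.
custom_clf = SVC(kernel=my_linear_kernel).fit(X, y)
custom_pred = custom_clf.predict(X[:3])

# probability=True enables predict_proba at the cost of extra training time.
prob_clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)
proba = prob_clf.predict_proba(X[:3])

print(custom_pred)
print(proba)
print(np.allclose(proba.sum(axis=1), 1.0))
```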

Analyzing SVM hyperparameters

  • The available SVM models are listed below.
  • Some of the SVM models support multi-class classification
    • ovo (One-Versus-One): each classifier is trained on the data for only two of the classes; after a classifier has been trained for every pair of classes, their results are combined to decide the final class
    • ovr (One-Versus-Rest): each classifier is trained to separate one class from all the other classes
  • svm.LinearSVC
    • Supports both binary and multi-class classification
    • Has no kernel parameter, because the model is always assumed to be linear
    • fit takes two arguments (X_train, y_train)
    • Uses the ovr scheme for multi-class classification
      • The alternative is multi_class="crammer_singer"
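
The difference between the ovo and ovr schemes shows up in the shape of the decision function. The sketch below assumes a synthetic 4-class dataset (not the DACON data): with 4 classes, ovo trains one classifier per pair of classes, 4 * 3 / 2 = 6 in total, while ovr trains one per class. svm.SVC is used for the ovo side because svm.LinearSVC only offers ovr and crammer_singer.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic 4-class problem (assumption: illustration only).
X, y = make_classification(n_samples=300, n_features=8, n_informative=6,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)

# One-versus-one: one score per pair of classes.
ovo_clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
ovo_scores = ovo_clf.decision_function(X[:5])

# One-versus-rest: one score per class.
ovr_clf = LinearSVC(multi_class="ovr", dual=False).fit(X, y)
ovr_scores = ovr_clf.decision_function(X[:5])

# Crammer-Singer: a single joint multi-class objective instead of ovr.
cs_clf = LinearSVC(multi_class="crammer_singer", max_iter=5000).fit(X, y)

print(ovo_scores.shape)   # (n_samples, n_classes * (n_classes - 1) / 2)
print(ovr_scores.shape)   # (n_samples, n_classes)
print(cs_clf.coef_.shape) # (n_classes, n_features)
```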

  • svm.LinearSVR
    • Linear Support Vector Regression
  • svm.NuSVC
    • Nu-Support Vector Classification
    • Supports both binary and multi-class classification
    • Works much like svm.SVC, but the mathematical formulation and the set of parameters differ slightly
    • fit takes two arguments (X_train, y_train)
    • Uses the ovo scheme for multi-class classification
  • svm.NuSVR
    • Nu-Support Vector Regression
  • svm.OneClassSVM
    • Unsupervised Outlier Detection
  • svm.SVC
    • C-Support Vector Classification
    • Supports both binary and multi-class classification
    • fit takes two arguments (X_train, y_train)
    • Uses the ovo scheme for multi-class classification
  • svm.SVR
    • Epsilon-Support Vector Regression
  • svm.l1_min_c
    • Return the lowest bound for C
  • The decision function of an SVM model is determined by a subset of the training dataset, called the support vectors
  • Some properties of these support vectors can be read from three attributes
    • support_vectors_, support_, n_support_
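
The three attributes can be inspected on any fitted svm.SVC. A minimal sketch, assuming a small synthetic dataset rather than the competition data: support_vectors_ holds the support vectors themselves, support_ holds their row indices in the training data, and n_support_ holds the number of support vectors per class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic binary problem (assumption: illustration only).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# The support vectors: an array of shape (n_support_vectors, n_features).
print(clf.support_vectors_.shape)

# Indices of the support vectors within the training data.
print(clf.support_)

# Number of support vectors belonging to each class.
print(clf.n_support_)

# Indexing the training data with support_ recovers support_vectors_.
same_rows = np.allclose(X[clf.support_], clf.support_vectors_)
print(same_rows)
```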

Sources