An Overview of Scikit-LLM – S-Analysis

1. What is Scikit-LLM?

Scikit-LLM is a Python package that integrates large language models (LLMs) into the scikit-learn framework to make enhanced text analysis tasks easier.

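2. Features

The main points about Scikit-LLM, gathered from various sources, are as follows:

– Scikit-LLM integrates powerful language models such as ChatGPT into the scikit-learn framework, which makes it a useful tool for enhanced text analysis tasks.

– Because it combines powerful language models with scikit-learn into a single toolkit for understanding and examining text, it has been described as a game changer for text analysis.

– The package is designed specifically to work within the scikit-learn framework, so anyone familiar with scikit-learn can start working with Scikit-LLM easily.

– Scikit-LLM is an open-source project that combines the capabilities of large language models such as ChatGPT with the flexibility of scikit-learn, a popular machine learning library.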

3. Experiment (code)

Environment setup: the experiments were run on Google Colab.

%%capture
!pip install scikit-llm

An OpenAI API_KEY and ORGANIZATION_ID are required. They are available from the following pages:

https://platform.openai.com/account/api-keys

https://platform.openai.com/account/org-settings

from skllm.config import SKLLMConfig

SKLLMConfig.set_openai_key("<API_KEY>")
SKLLMConfig.set_openai_org("<ORGANIZATION_ID>")
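Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `OPENAI_API_KEY` and `OPENAI_ORG_ID` and the helper `read_openai_credentials` are assumptions made for this illustration; they are not part of Scikit-LLM.

```python
import os


def read_openai_credentials(env=None):
    """Return (api_key, org_id) read from environment variables.

    The variable names used here are assumptions chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    org_id = env.get("OPENAI_ORG_ID", "")
    if not api_key or not org_id:
        raise RuntimeError(
            "Set OPENAI_API_KEY and OPENAI_ORG_ID before running the notebook."
        )
    return api_key, org_id


# Usage in the notebook (requires scikit-llm to be installed):
# from skllm.config import SKLLMConfig
# api_key, org_id = read_openai_credentials()
# SKLLMConfig.set_openai_key(api_key)
# SKLLMConfig.set_openai_org(org_id)
```

The two `SKLLMConfig` calls in the comment are the same ones used above; only the source of the values changes.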

3.1 Zero Shot GPTClassifier

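– Performs text classification without any retraining.

– Scikit-LLM automatically queries the OpenAI API and converts the responses into an ordinary list of labels.

– A zero-shot classifier depends heavily on how the labels themselves are worded: they must be expressed in natural language and be descriptive and self-explanatory.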
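To make the role of the label wording concrete, the following is a rough sketch of how a zero-shot prompt could be assembled from the candidate labels and how a free-text reply could be mapped back to a label. The prompt wording and the helpers `build_zero_shot_prompt` and `parse_label` are hypothetical; they illustrate the idea and are not the prompt or parsing logic that Scikit-LLM actually uses.

```python
def build_zero_shot_prompt(text, labels):
    """Build a prompt asking the model to pick exactly one of `labels`.

    The wording is a hypothetical illustration, not Scikit-LLM's real prompt.
    """
    label_list = ", ".join(f"'{label}'" for label in labels)
    return (
        f"Classify the text into exactly one of these categories: {label_list}.\n"
        f"Text: {text}\n"
        "Answer with the category name only."
    )


def parse_label(response, labels, default=None):
    """Map a free-text model reply back onto one of the known labels."""
    cleaned = response.strip().strip("'\".").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return default  # the reply did not match any candidate label


labels = ["positive", "negative", "neutral"]
prompt = build_zero_shot_prompt("Excellent film.", labels)
print(prompt)
print(parse_label(" Positive.\n", labels))
```

Because the labels are the only task description the model sees in such a prompt, a label like "positive" gives it far more to work with than an opaque code such as "class_1".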

Load the dataset.

from skllm import ZeroShotGPTClassifier
from skllm.datasets import get_classification_dataset

# get the demo classification dataset bundled with skllm
X, y = get_classification_dataset()

Inspect the data.

print(len(X))
print(X[1])
print(y)

The special effects in 'Star Battles: Nebula Conflict' were out of this world. I felt like I was actually in space. The storyline was incredibly engaging and left me wanting more. Excellent film.

['positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral', 'neutral']

Fit the model and run inference.

# defining the model
clf = ZeroShotGPTClassifier(openai_model="gpt-3.5-turbo")

# fitting the data
clf.fit(X, y)

# predicting the data
y_predict = clf.predict(X)
print(y_predict)

['positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'neutral', 'neutral', 'neutral', 'neutral', 'negative', 'negative', 'neutral', 'neutral', 'neutral']

Model evaluation

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

y_actual = y
labels = list(set(y))
print(labels)

# Calculating Accuracy
accuracy = accuracy_score(y_actual, y_predict)
print(f"Accuracy: {accuracy:.2f}")

# Calculating Precision
# 'macro' calculates metrics for each label, and finds their unweighted mean.
precision = precision_score(y_actual, y_predict, labels=labels, average='macro')
print(f"Precision: {precision:.2f}")

# Calculating Recall
recall = recall_score(y_actual, y_predict, labels=labels, average='macro')
print(f"Recall: {recall:.2f}")

# Calculating F1 Score
f1 = f1_score(y_actual, y_predict, labels=labels, average='macro')
print(f"F1 Score: {f1:.2f}")

# Calculating Confusion Matrix (rows: actual, columns: predicted)
labels = ["positive", "negative", "neutral"]
conf_matrix = confusion_matrix(y_actual, y_predict, labels=labels)
print("\nConfusion Matrix:")
print(conf_matrix)

['positive', 'neutral', 'negative']
Accuracy: 0.90
Precision: 0.92
Recall: 0.90
F1 Score: 0.90

Confusion Matrix:
[[10  0  0]
 [ 0 10  0]
 [ 0  3  7]]
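The reported scores can be checked by hand from the confusion matrix. The sketch below recomputes accuracy and the macro-averaged precision, recall, and F1 in plain Python, using only the matrix printed above (rows are actual labels, columns are predicted labels, in the order positive, negative, neutral).

```python
labels = ["positive", "negative", "neutral"]
conf_matrix = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 3, 7],
]

n = len(labels)
total = sum(sum(row) for row in conf_matrix)
accuracy = sum(conf_matrix[i][i] for i in range(n)) / total

precisions, recalls, f1_scores = [], [], []
for i in range(n):
    tp = conf_matrix[i][i]
    predicted = sum(conf_matrix[r][i] for r in range(n))  # column sum
    actual = sum(conf_matrix[i])                          # row sum
    p = tp / predicted if predicted else 0.0
    r = tp / actual if actual else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    precisions.append(p)
    recalls.append(r)
    f1_scores.append(f)

# Macro averaging: unweighted mean of the per-label scores.
macro_precision = sum(precisions) / n
macro_recall = sum(recalls) / n
macro_f1 = sum(f1_scores) / n

print(f"Accuracy: {accuracy:.2f}")          # Accuracy: 0.90
print(f"Precision: {macro_precision:.2f}")  # Precision: 0.92
print(f"Recall: {macro_recall:.2f}")        # Recall: 0.90
print(f"F1 Score: {macro_f1:.2f}")          # F1 Score: 0.90
```

The three neutral reviews predicted as negative lower the precision of the negative label to 10/13 and the recall of the neutral label to 7/10; every other per-label score is 1.0.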

Fitting without labeled data

from skllm import ZeroShotGPTClassifier
from skllm.datasets import get_classification_dataset

X, _ = get_classification_dataset()

clf = ZeroShotGPTClassifier()
# only the list of candidate labels is passed; no training texts are needed
clf.fit(None, ["positive", "negative", "neutral"])
y_predict = clf.predict(X)

3.2 Few-Shot Text Classification

– Few-shot classification: the training samples are added to the prompt and passed to the model.

– The training set must be small enough to fit into a single prompt (at most 10 samples per label is recommended).
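The two points above can be sketched as follows: cap the number of training samples per label, then prepend the selected samples to the query text. The helpers `select_examples` and `build_few_shot_prompt` and the prompt wording are hypothetical and only illustrate the mechanism; Scikit-LLM's internal prompt may differ.

```python
from collections import defaultdict


def select_examples(X, y, max_per_label=10):
    """Keep at most `max_per_label` training samples per label, in input order."""
    counts = defaultdict(int)
    selected = []
    for text, label in zip(X, y):
        if counts[label] < max_per_label:
            selected.append((text, label))
            counts[label] += 1
    return selected


def build_few_shot_prompt(examples, text, labels):
    """Prepend labeled examples to the query text (hypothetical wording)."""
    lines = [f"Classify the text into one of: {', '.join(labels)}.", ""]
    for sample, label in examples:
        lines.append(f"Text: {sample}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Label:")
    return "\n".join(lines)


# Toy training data invented for this sketch.
X_train = ["Great movie.", "Terrible plot.", "It was fine.", "Loved it."]
y_train = ["positive", "negative", "neutral", "positive"]

examples = select_examples(X_train, y_train, max_per_label=1)
prompt = build_few_shot_prompt(
    examples, "Not bad at all.", ["positive", "negative", "neutral"]
)
print(prompt)
```

Since every example is sent with every prediction request, the cap keeps both the prompt length and the per-request cost bounded.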

Model fitting and inference

from skllm import FewShotGPTClassifier
from skllm.datasets import get_classification_dataset

X, y = get_classification_dataset()

clf = FewShotGPTClassifier(openai_model="gpt-3.5-turbo")
clf.fit(X, y)
y_predict = clf.predict(X)

Model evaluation

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

y_actual = y
labels = list(set(y))
print(labels)

# Calculating Accuracy
accuracy = accuracy_score(y_actual, y_predict)
print(f"Accuracy: {accuracy:.2f}")

# Calculating Precision
precision = precision_score(y_actual, y_predict, labels=labels, average='macro')
print(f"Precision: {precision:.2f}")

# Calculating Recall
recall = recall_score(y_actual, y_predict, labels=labels, average='macro')
print(f"Recall: {recall:.2f}")

# Calculating F1 Score
f1 = f1_score(y_actual, y_predict, labels=labels, average='macro')
print(f"F1 Score: {f1:.2f}")

# Calculating Confusion Matrix (rows: actual, columns: predicted)
labels = ["positive", "negative", "neutral"]
conf_matrix = confusion_matrix(y_actual, y_predict, labels=labels)
print("\nConfusion Matrix:")
print(conf_matrix)

['positive', 'neutral', 'negative']
Accuracy: 0.97
Precision: 0.97
Recall: 0.97
F1 Score: 0.97

Confusion Matrix:
[[10  0  0]
 [ 0 10  0]
 [ 0  1  9]]
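In the run above, the same samples are passed to both fit and predict, so each text being classified also appears, with its label, among the few-shot examples in the prompt. One way to get a less optimistic estimate is to hold out part of the data. The following is a minimal pure-Python sketch of a stratified split; the helper `stratified_split` and the toy data are assumptions for illustration (scikit-learn's `train_test_split` with `stratify=y` serves the same purpose).

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_fraction=0.3, seed=0):
    """Split (X, y) into train/test parts with the same share of each label."""
    by_label = defaultdict(list)
    for index, label in enumerate(y):
        by_label[label].append(index)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Toy stand-in for the 30-sample dataset (10 samples per label).
X = [f"review {i}" for i in range(30)]
y = ["positive"] * 10 + ["negative"] * 10 + ["neutral"] * 10

X_train, X_test, y_train, y_test = stratified_split(X, y)
print(len(X_train), len(X_test))
```

With such a split, the classifier would be fitted with `clf.fit(X_train, y_train)` and scored by comparing `clf.predict(X_test)` against `y_test`.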

3.3 Multi-Label Zero-Shot Text Classification

Load the data.

# importing Multi-Label zeroshot module and classification dataset
from skllm import MultiLabelZeroShotGPTClassifier
from skllm.datasets import get_multilabel_classification_dataset

# get the demo multi-label classification dataset bundled with skllm
X, y = get_multilabel_classification_dataset()

Inspect the data.

print(len(X))
print(X[1])

The delivery was super fast, but the product did not match the information provided on the website.

print(len(y))
print(y)

[['Quality', 'Packaging'], ['Delivery', 'Product Information'], ['Product Variety', 'Customer Support'], ['Price', 'User Experience'], ['Delivery', 'Packaging'], ['Customer Support', 'Return Policy'], ['Product Information', 'Return Policy'], ['Service', 'Delivery', 'Quality'], ['Price', 'Quality', 'User Experience'], ['Product Information', 'Delivery']]

Build the model

# defining the model
clf = MultiLabelZeroShotGPTClassifier(max_labels=3)

# fitting the model
clf.fit(X, y)

# making predictions
y_predict = clf.predict(X)

Model evaluation

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics import accuracy_score, multilabel_confusion_matrix

# Convert your lists into a binary matrix format using MultiLabelBinarizer
mlb = MultiLabelBinarizer()
y_bin = mlb.fit_transform(y)
y_predict_bin = mlb.transform(y_predict)

# Label-based metrics
precision = precision_score(y_bin, y_predict_bin, average='micro')
recall = recall_score(y_bin, y_predict_bin, average='micro')
f1 = f1_score(y_bin, y_predict_bin, average='micro')

print("Precision: ", precision)
print("Recall: ", recall)
print("F1-Score: ", f1)

# Exact Match Ratio (share of samples whose full label set is predicted correctly)
exact_match_ratio = accuracy_score(y_bin, y_predict_bin)

# Average Accuracy (share of individual sample/label cells predicted correctly)
average_accuracy = (y_bin == y_predict_bin).mean()

# Multi-label Confusion Matrix
confusion_matrices = multilabel_confusion_matrix(y_bin, y_predict_bin)

print("Exact Match Ratio: ", exact_match_ratio)
print("Average Accuracy: ", average_accuracy)
print("Confusion Matrices:")
for label_index, matrix in enumerate(confusion_matrices):
    print(f"Label: {mlb.classes_[label_index]}")
    print(matrix)

Precision: 1.0
Recall: 0.9545454545454546
F1-Score: 0.9767441860465117
Exact Match Ratio: 0.9
Average Accuracy: 0.99
Confusion Matrices:
Label: Customer Support
[[8 0]
 [0 2]]
Label: Delivery
[[6 0]
 [0 4]]
Label: Packaging
[[8 0]
 [0 2]]
Label: Price
[[8 0]
 [0 2]]
Label: Product Information
[[7 0]
 [0 3]]
Label: Product Variety
[[9 0]
 [0 1]]
Label: Quality
[[7 0]
 [0 3]]
Label: Return Policy
[[8 0]
 [0 2]]
Label: Service
[[9 0]
 [1 0]]
Label: User Experience
[[8 0]
 [0 2]]
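The micro-averaged scores and the average accuracy follow directly from the per-label matrices. The sketch below copies the ten matrices printed above and pools their counts; each matrix is laid out as [[TN, FP], [FN, TP]], the layout that scikit-learn's `multilabel_confusion_matrix` returns.

```python
# Per-label 2x2 matrices from the output above: [[TN, FP], [FN, TP]].
matrices = {
    "Customer Support": [[8, 0], [0, 2]],
    "Delivery": [[6, 0], [0, 4]],
    "Packaging": [[8, 0], [0, 2]],
    "Price": [[8, 0], [0, 2]],
    "Product Information": [[7, 0], [0, 3]],
    "Product Variety": [[9, 0], [0, 1]],
    "Quality": [[7, 0], [0, 3]],
    "Return Policy": [[8, 0], [0, 2]],
    "Service": [[9, 0], [1, 0]],
    "User Experience": [[8, 0], [0, 2]],
}

# Micro averaging pools the counts over all labels before computing the scores.
tn = sum(m[0][0] for m in matrices.values())
fp = sum(m[0][1] for m in matrices.values())
fn = sum(m[1][0] for m in matrices.values())
tp = sum(m[1][1] for m in matrices.values())

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
average_accuracy = (tp + tn) / (tp + tn + fp + fn)

print("Precision: ", precision)
print("Recall: ", recall)
print("F1-Score: ", f1)
print("Average Accuracy: ", average_accuracy)
```

The pooled counts are 21 true positives, 1 false negative (the missed "Service" label), and no false positives, which gives a recall of 21/22 and an average accuracy of 99/100.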

3.4 Multi-Label Few-Shot Text Classification

For multi-label few-shot classification, the code is as follows.

from skllm.models.gpt.gpt_few_shot_clf import MultiLabelFewShotGPTClassifier
from skllm.datasets import get_multilabel_classification_dataset

X, y = get_multilabel_classification_dataset()

clf = MultiLabelFewShotGPTClassifier(max_labels=2, openai_model="gpt-3.5-turbo")
clf.fit(X, y)
y_predict = clf.predict(X)

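4. Summary

Scikit-LLM is a Python package that integrates large language models (LLMs) into the scikit-learn framework to enhance text analysis tasks. In this experiment it classified text with high accuracy: zero-shot classification reached an accuracy of 0.90, and few-shot classification reached 0.97 for accuracy, precision, recall, and F1 score alike. Because predictions were made on the same 30 samples that were passed to fit, the few-shot figures in particular are likely optimistic. Within that limit, the results suggest that Scikit-LLM is a capable and flexible tool for text analysis tasks.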

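Author: KW
Data scientist from Bangkok, Thailand
Specialist in PSI production management, inventory forecasting and optimization analysis, customer loyalty analysis, sentiment analysis, and building SaaS, PaaS, IaaS, and AI at the Edge environments for industries such as manufacturing, marketing, finance, and AI research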

References:

– GitHub: https://github.com/iryna-kondr/scikit-llm
