This project is modified from rjk-git/self-cognition-instuctions (github.com): a dataset template for guiding chat models to self-cognition, covering information about the model's identity, capabilities, usage, limitations, etc.

The OpenAI API calls have been replaced with ollama, so the approach works with a much wider range of models.

1. Self-cognition dataset format

See the self-cognition-instuctions project for the details.
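The authoritative schema lives in the upstream repo; purely for orientation, instruction-tuning records of this kind commonly look like the following (illustrative only, not the project's exact field names):

```python
# Illustrative sample record — check the upstream self-cognition-instuctions
# repo for the actual schema used by generate.py.
sample = {
    "instruction": "Who are you?",
    "input": "",
    "output": "I am {name}, an AI assistant developed by {company}.",
}
```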

2. Install ollama

Models are invoked through ollama's API.

First, install ollama (Linux):

curl -fsSL https://ollama.com/install.sh | sh

Then pull the model you want through ollama; here qwen1.5-32b is used as the example:

ollama pull qwen:32b

Start the ollama service:

ollama serve

Install the ollama Python package, which the API calls below depend on:

pip install ollama
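With the package installed, a minimal wrapper around the chat call looks like the sketch below. The `ask` helper is an illustrative name, not part of the project, and actually running it requires `ollama serve` to be up and the model to be pulled:

```python
def build_messages(prompt: str) -> list:
    # ollama.chat takes OpenAI-style message dicts.
    return [{"role": "user", "content": prompt}]

def ask(model: str, prompt: str) -> str:
    # Requires a running `ollama serve`; the import is deferred so
    # build_messages stays usable without the package installed.
    import ollama
    response = ollama.chat(model=model, messages=build_messages(prompt))
    return response["message"]["content"]
```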

3. Modify generate.py in the project

Original code:

import traceback

import openai
import yaml
from template.prompts import prompt_template
from template.questions import questions
from tqdm import tqdm

CONFIG = yaml.load(open("./config.yml", "r", encoding="utf-8"), Loader=yaml.FullLoader)

openai.api_base = CONFIG["openai"]["api_url"]
openai.api_key = CONFIG["openai"]["api_key"]


def main():
    samples = []
    max_samples = CONFIG["data"]["num_samples"]
    pbar = tqdm(total=max_samples, desc="Generating self cognition data")

    while True:
        exit_flag = False
        for question in questions:
            prompt = prompt_template.format(
                name=CONFIG["about"]["name"],
                company=CONFIG["about"]["company"],
                version=CONFIG["about"]["version"],
                date=CONFIG["about"]["date"],
                description=CONFIG["about"]["description"],
                ability=CONFIG["about"]["ability"],
                limitation=CONFIG["about"]["limitation"],
                author=CONFIG["about"]["author"],
                user_input=question,
                role=CONFIG["about"]["role"],
            )
            try:
                chat_completion = openai.ChatCompletion.create(
                    model=CONFIG["openai"]["model"],
                    messages=[{"role": "user", "content": prompt}],
                )
                sample = chat_completion.choices[0].message.content
                json_sample = eval(sample)
                samples.append(json_sample)

Modified code:

import ollama  # ollama Python client
import os
import yaml
import json
import time
from tqdm import tqdm
import traceback

from template.prompts import prompt_template
from template.questions import questions


CONFIG = yaml.load(open("./config.yml", "r", encoding="utf-8"), Loader=yaml.FullLoader)

def main():
    samples = []
    max_samples = CONFIG["data"]["num_samples"]
    pbar = tqdm(total=max_samples, desc="Generating self cognition data")

    while True:
        exit_flag = False
        for question in questions:
            prompt = prompt_template.format(
                name=CONFIG["about"]["name"],
                company=CONFIG["about"]["company"],
                version=CONFIG["about"]["version"],
                date=CONFIG["about"]["date"],
                description=CONFIG["about"]["description"],
                ability=CONFIG["about"]["ability"],
                limitation=CONFIG["about"]["limitation"],
                author=CONFIG["about"]["author"],
                user_input=question,
                role=CONFIG["about"]["role"],
            )
            try:
                response = ollama.chat(model='qwen:32b', messages=[{"role": "user", "content": prompt}])  # call qwen1.5-32b
                sample = response['message']['content']
                #print(sample)
                sample = sample.replace("```json", "").replace("```", "")  # the model often wraps its output in ```json fences, which breaks parsing; strip them here
                #print(sample)
                json_sample = json.loads(sample)  # json.loads instead of eval: json is already imported and is safer on untrusted model output
                samples.append(json_sample)
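Both listings are cut off inside the try block; the except handler, progress updates, and the final save live in the full generate.py. As a hedged sketch of the parsing-and-saving side (helper names here are illustrative, not from the project):

```python
import json

def clean_model_output(sample: str) -> dict:
    """Strip the markdown code fences the model sometimes emits, then parse.

    Mirrors the fence-stripping step in the modified generate.py;
    json.loads is used as a safer parser than eval for JSON output.
    """
    sample = sample.replace("```json", "").replace("```", "").strip()
    return json.loads(sample)

def save_samples(samples: list, path: str = "self_cognition.json") -> None:
    # Illustrative helper: persist the collected samples as a JSON array,
    # keeping Chinese text readable with ensure_ascii=False.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, ensure_ascii=False, indent=2)
```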
