Uploading a local knowledge base to Dify (with dynamic updates) and specifying the chunking method
import requests
import pymysql
import pandas as pd
from datetime import datetime, timedelta
import yaml
import json
from tidbCon import *
"""
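Purpose:
1. Upload text to a Dify knowledge base (supports dynamic updates) and specify the chunking separator.
   The default separator is the newline character. If no separator is given, or if chunking is
   reconfigured later on the Dify settings page, the setting most likely has no effect
   (this appears to be a bug in Dify 1.7.2).
2. Query selected fields from a database and use them as the knowledge-base content.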
"""
current_time = datetime.now()
time_24_hours_ago = current_time - timedelta(hours=24)
# Midnight (00:00:00) of the day that was 24 hours ago
midnight_of_yesterday = time_24_hours_ago.replace(hour=0, minute=0, second=0, microsecond=0)
current_time = current_time.strftime('%Y-%m-%d %H:%M:%S')
print("\n\n********** Current time / update time **********\n", current_time)
print("** Current time - 24h **\n", time_24_hours_ago)
edate = current_time
# Fixed value that overrides the computed one above (edate is not used by the query below)
edate = '2026-01-07 20:03:00'
# sdate = time_24_hours_ago
sdate = '2026-01-07 05:41:19'
with open('JZ24hconfig.yml', 'r', encoding='utf-8') as f:
    dbresult = yaml.load(f, Loader=yaml.FullLoader)
api_key = dbresult['api_key']  # API key of the Dify knowledge base
dataset_id = dbresult['dataset_id']  # ID of the Dify knowledge base (dataset)
tidb_host = dbresult['tidb_host'] if dbresult['tidb_host'] else ''
tidb_port = dbresult['tidb_port'] if dbresult['tidb_port'] else 0
tidb_user = dbresult['tidb_user'] if dbresult['tidb_user'] else ''
tidb_password = dbresult['tidb_password'] if dbresult['tidb_password'] else ''
tidb_database = dbresult['tidb_dbname'] if dbresult['tidb_dbname'] else ''
tidb_table_name = dbresult['tidb_table'] if dbresult['tidb_table'] else ''
select_field = dbresult['select_field']
# Connect to TiDB
tidb_connection = pymysql.connect(host=tidb_host, port=tidb_port, user=tidb_user, password=tidb_password,
                                  database=tidb_database)
query = f"""
select {select_field} from {tidb_table_name} WHERE sj >= STR_TO_DATE('{sdate}', '%Y-%m-%d %H:%i:%s')"""
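# origdata = selectidb(query, tidb_connection)
# Run the query and load the result into a DataFrame
df = pd.read_sql_query(query, tidb_connection)
# Close the database connection
tidb_connection.close()
# Convert the DataFrame into a list of JSON strings, one per row
formatted_data = df.to_dict(orient='records')
# default=str keeps datetime or Decimal columns from raising TypeError during serialization
formatted_data = [json.dumps(item, ensure_ascii=False, default=str) for item in formatted_data]
# first_data = formatted_data[0]
# rest_data = formatted_data[1:]
# Join the rows with the same separator that is passed to Dify below
result = '***'.join(formatted_data)
# Print the joined text
print(result)
print('----------------------')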
# The host below is the value from the original post; point it at your own Dify instance.
post_url = f"http://127.0.0.0/v1/datasets/{dataset_id}/document/create-by-text"
data = {
    "name": "text22222222.txt",
    "text": result,
    "indexing_technique": "high_quality",
    "process_rule": {
        "mode": "custom",
        "rules": {
            "pre_processing_rules": [
                {"id": "remove_extra_spaces", "enabled": True},
                {"id": "remove_urls_emails", "enabled": False},
            ],
            # "separator": "***" is what splits the document into chunks
            "segmentation": {"separator": "***", "max_tokens": 4000},
        },
    },
}
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
response = requests.post(post_url, json=data, headers=headers)
print("Response:", response.json())
'''
{'document': {'id': '94b871c6-8a72-4096-880d-a2c364e9abe9', 'position': 7, 'data_source_type': 'upload_file',...}
'''
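
The script calls the create endpoint, so each run would add another document instead of refreshing the existing one. The following is a minimal sketch of how the same payload could be reused for an update. It assumes that your Dify instance exposes an `update-by-text` endpoint under `/v1/datasets/{dataset_id}/documents/{document_id}/` (worth checking on the API page of your own instance). `build_text_payload` and `document_url` are hypothetical helper names introduced here for illustration; they are not part of Dify or of the script above. The helper also rejects rows that already contain the separator, since such rows would be cut in the wrong place.

```python
import json

SEPARATOR = "***"  # same separator the script passes to Dify


def build_text_payload(records, name, separator=SEPARATOR, max_tokens=4000):
    """Join records into one text and build the request body (hypothetical helper)."""
    chunks = [json.dumps(r, ensure_ascii=False, default=str) for r in records]
    # A row that already contains the separator would be split in the wrong place.
    clashes = [c for c in chunks if separator in c]
    if clashes:
        raise ValueError(f"{len(clashes)} record(s) contain the separator {separator!r}")
    return {
        "name": name,
        "text": separator.join(chunks),
        "indexing_technique": "high_quality",
        "process_rule": {
            "mode": "custom",
            "rules": {
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": False},
                ],
                "segmentation": {"separator": separator, "max_tokens": max_tokens},
            },
        },
    }


def document_url(base_url, dataset_id, document_id=None):
    """Return the create URL, or the (assumed) update URL when document_id is given."""
    base = base_url.rstrip("/")
    if document_id is None:
        return f"{base}/v1/datasets/{dataset_id}/document/create-by-text"
    return f"{base}/v1/datasets/{dataset_id}/documents/{document_id}/update-by-text"
```

To refresh an existing document, the payload could be posted to `document_url(base_url, dataset_id, document_id)` with the same headers as in the script. The `document_id` would be the `id` field inside the `document` object of the create response shown above.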