•  AUC

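The key to computing AUC is to count the positive-negative sample pairs in which the positive sample's predicted score is higher than the negative sample's. In the table below, suppose the recall model returns top-k = 4 items, A, B, C and D, of which B and D are positives (label 1 means the user clicked the item, 0 means no click). AUC is then computed as: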

AUC=\frac{\sum_{i \in positive} Rank_{i}-\frac{m(m+1)}{2}}{m(N-m)}

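N is the total number of samples and m is the number of positives. Rank_i is the rank of positive sample i when the samples are sorted by predicted score in ascending order (the lowest score has rank 1), that is, N + 1 minus the rank shown in the table. B and D therefore have ranks 3 and 1, and for the table AUC=\frac{3+1-\frac{2(2+1)}{2}}{2 \times 2}=0.25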

Example table
Sample   True label   Predicted score   Rank (descending score)
A        0            0.8               1
B        1            0.7               2
C        0            0.6               3
D        1            0.5               4

This can be implemented as follows:

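# Compute the AUC of a top-k recall list
def calculate_auc(recall_items: list, true_items: list):
    # recall_items is expected to be sorted by predicted score, highest first
    N = len(recall_items)
    if N == 0:
        return 0
    # hit_item = set(recall_items) & set(true_items)  # would ignore repeated clicks
    hit_item = [item for item in true_items if item in recall_items]
    m = len(hit_item)
    # AUC is undefined without at least one positive and one negative sample
    if m == 0 or m == N:
        return 0
    # Ascending-score rank: the last item in the recall list gets rank 1
    rank_i = [N - recall_items.index(i) for i in hit_item]
    return (sum(rank_i) - (m + 1) * m / 2) / (m * (N - m))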
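
As a cross-check on the rank formula, AUC can also be computed by counting pairs directly, which is the definition given at the start of this section. The sketch below is only an illustration: the name `pairwise_auc` is hypothetical, it takes scores and labels rather than item lists, and counting a tie as half a win is an assumption the original text does not make.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC by brute force: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # convention assumed here: return 0 when AUC is undefined
    # Assumption: a tie between a positive and a negative counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Table example: A, B, C, D with scores 0.8, 0.7, 0.6, 0.5; B and D are positives.
print(pairwise_auc([0.8, 0.7, 0.6, 0.5], [0, 1, 0, 1]))  # 0.25
```

Only the pair (B, C) has the positive scored above the negative, so 1 of 4 pairs is correct, which agrees with the 0.25 obtained from the rank formula.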
  • HR

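HR (hit rate) is the easiest of these metrics to understand: it is the share of the clicked items that appear in the recall list.

HR = \frac{\sum_{i} hit_{i}}{N}

Here hit_i is 1 if clicked item i is in the recall list and 0 otherwise, and N is the number of clicked items (the variable M in the code below).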

# HR of a top-k recall list
def calulate_HR(recall_items: list, true_items: list):
    N = len(recall_items)
    M = len(true_items)
    if N == 0 or M == 0:
        return 0
    hit_num = 0
    for item in true_items:
        if item in recall_items:
            hit_num += 1
    return hit_num / M
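
To make the hit indicator in the HR formula concrete, here is a small sketch on made-up data. The helper name `hit_indicators` and the item lists are hypothetical and serve only as an illustration.

```python
def hit_indicators(recall_items, true_items):
    """Return one 0/1 flag per clicked item: 1 if it appears in the recall list."""
    recalled = set(recall_items)
    return [1 if item in recalled else 0 for item in true_items]


recall_items = ["A", "B", "C", "D"]   # hypothetical top-4 recall list
true_items = ["B", "D", "E"]          # hypothetical clicked items
hits = hit_indicators(recall_items, true_items)
print(hits)                           # [1, 1, 0]
print(sum(hits) / len(true_items))    # 2 of the 3 clicked items were recalled
```

Summing the flags and dividing by the number of clicked items gives the hit rate for this one user.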
  • Precision

Precision: of the K items that were recalled, n were clicked, so Precision = n / K

# Precision of a top-k recall list
def calulate_Precision(recall_items: list, true_items: list):
    N = len(recall_items)
    M = len(true_items)
    if N == 0 or M == 0:
        return 0
    hit_items = set(recall_items) & set(true_items)  # ignores repeated clicks
    return len(hit_items) / N
  • Recall

Recall: of the M items the user clicked, k appear in the recall model's recommendation list, so Recall = k / M

# Recall of a top-k recall list
def calulate_Recall(recall_items: list, true_items):
    N = len(recall_items)
    M = len(true_items)
    if N == 0 or M == 0:
        return 0
    hit_items = [item for item in recall_items if item in true_items]
    return len(hit_items) / M
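
Precision and Recall share the same numerator and differ only in the denominator, which a small worked example makes visible. The sketch below is an illustration, not the author's code: `precision_recall_at_k` is a hypothetical helper, the data is made up, and both lists are treated as sets, so duplicates on either side are ignored.

```python
def precision_recall_at_k(recall_items, true_items):
    """Return (precision, recall) for one user, treating both lists as sets."""
    recalled, clicked = set(recall_items), set(true_items)
    if not recalled or not clicked:
        return 0.0, 0.0
    hits = len(recalled & clicked)  # items that were both recalled and clicked
    return hits / len(recalled), hits / len(clicked)


# Hypothetical data: 4 recalled items, 3 clicked items, 2 in common.
precision, recall = precision_recall_at_k(["A", "B", "C", "D"], ["B", "D", "E"])
print(precision)  # 0.5
print(recall)     # 2 hits out of 3 clicked items
```

With 2 hits, Precision divides by the 4 recalled items and Recall divides by the 3 clicked items.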

 

  • F1

F1 = 2 * Precision*Recall / (Precision + Recall)

# F1 of a top-k recall list
def calulate_F1(recall_items, true_items):
    Recall = calulate_Recall(recall_items, true_items)
    Precision = calulate_Precision(recall_items, true_items)
    # Without any hit both values are 0; return 0 instead of dividing by zero
    if Recall + Precision == 0:
        return 0
    return 2 * Precision * Recall / (Recall + Precision)
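
For a quick numeric check of the F1 formula, the sketch below evaluates it on precomputed values. The helper `f1_from` is hypothetical, the inputs are made-up numbers, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: precision 2/4 and recall 2/3 give F1 = 4/7.
print(f1_from(0.5, 2 / 3))
print(f1_from(0.0, 0.0))  # 0.0
```

The result, 4/7, lies between the two inputs and closer to the smaller one, as expected of a harmonic mean.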
