Natural Language Processing -- Measuring the Overlap Between Texts with the Dot Product
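If we can measure the overlap between two texts, we get a good estimate of how similar the words they use are, and that in turn is a reasonable first estimate of how much they overlap in meaning. With binary bag-of-words vectors (1 if a word occurs in a sentence, 0 otherwise), the dot product of two sentence vectors is exactly the number of distinct words the two sentences share, so it serves directly as this overlap measure.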
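Written out for two sentences $A$ and $B$, each encoded as a 0/1 vector over a shared vocabulary $V$ (the symbols $\mathbf{a}$, $\mathbf{b}$, $a_i$, $b_i$ are notation introduced here for illustration), each product $a_i b_i$ is 1 only when word $i$ appears in both sentences, so the sum simply counts the shared words:

$$
\mathbf{a}\cdot\mathbf{b} = \sum_{i=1}^{|V|} a_i b_i = \left|\{\, i : a_i = b_i = 1 \,\}\right| = |A \cap B|
$$

For example, two sentences whose only common token is "Monticello" have a dot product of exactly 1, however long they are; the raw count is not normalized for sentence length.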
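import pandas as pd

# Four example sentences, separated by newlines
sentences = """Thomas Jefferson began building Monticello at the age of 26.\n"""
sentences += """Construction was done mostly by local masons and carpenters.\n"""
sentences += "He moved into the South Pavilion in 1770.\n"
sentences += """Turning Monticello into a neoclassical masterpiece was Jefferson's obsession."""

# Binary bag-of-words per sentence: every whitespace-separated token maps to 1
corpus = {}
for i, sent in enumerate(sentences.split('\n')):
    corpus['sent{}'.format(i)] = dict((tok, 1) for tok in sent.split())

# pd.DataFrame() builds a table from a dict of dicts: outer keys (sentences) become
# columns, inner keys (tokens) become the index, and tokens missing from a sentence
# become NaN, which fillna(0) turns into 0. Transpose so each row is one sentence.
df = pd.DataFrame(corpus).fillna(0).astype(int).T
print(df)

# Transpose back so the columns are sentences and df.sent0 is sentence 0's 0/1 vector
df = df.T
# The dot product counts the tokens two sentences share
print(df.sent0.dot(df.sent1))  # 0: no shared tokens
print(df.sent0.dot(df.sent2))  # 1: "the"
print(df.sent0.dot(df.sent3))  # 1: "Monticello"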
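The three calls above compare sentence 0 with each of the others one at a time. A minimal sketch, assuming the same four sentences and the same whitespace tokenization, computes every pairwise overlap with a single matrix product and cross-checks each entry against the size of the token-set intersection (the names `overlap` and `token_sets` are illustrative, not from the original):

```python
import pandas as pd

# Same four sentences as above, one per list entry
sentences = [
    "Thomas Jefferson began building Monticello at the age of 26.",
    "Construction was done mostly by local masons and carpenters.",
    "He moved into the South Pavilion in 1770.",
    "Turning Monticello into a neoclassical masterpiece was Jefferson's obsession.",
]

# Binary bag-of-words: rows = sentences, columns = whitespace tokens
corpus = {f"sent{i}": {tok: 1 for tok in sent.split()} for i, sent in enumerate(sentences)}
df = pd.DataFrame(corpus).fillna(0).astype(int).T

# One matrix product gives every pairwise dot product:
# overlap.loc[a, b] is the number of distinct tokens shared by sentences a and b
overlap = df.dot(df.T)
print(overlap)

# Cross-check: for 0/1 vectors the dot product equals the size of the set intersection
token_sets = {name: set(sent.split()) for name, sent in zip(df.index, sentences)}
for a in df.index:
    for b in df.index:
        assert overlap.loc[a, b] == len(token_sets[a] & token_sets[b])
```

The diagonal holds each sentence's number of distinct tokens (10, 9, 8, 9), and row `sent0` gives 0, 1, 1 against `sent1` to `sent3`, matching the three print calls above. Because tokens come from `str.split()`, "Jefferson's" and "Jefferson" (or "26." with its trailing period) count as different words, so the overlap depends directly on the tokenizer.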