ES 7.5.2 relevance scoring question: none of the material I found gives the right numbers, so I'm asking the instructor

Source: 7-2 Relevance scoring

苦瓜苦也

2020-03-15

GET test_search_relevance/_settings
{
  "test_search_relevance" : {
    "settings" : {
      "index" : {
        "creation_date" : "1584190500424",
        "number_of_shards" : "3",
        "number_of_replicas" : "1",
        "uuid" : "SnNbrQ_JR7GKa0tQPqSgZg",
        "version" : {
          "created" : "7050299"
        },
        "provided_name" : "test_search_relevance"
      }
    }
  }
}
PUT test_search_relevance/_bulk
{"index":{"_id":1}}
{"name":"hello"}
{"index":{"_id":2}}
{"name":"hello,world!"}
{"index":{"_id":3}}
{"name":"hello,world! a beautiful world"}
GET /test_search_relevance/_search
{
  "explain": true,
  "query":{
    "match":{
      "name":"hello"
    }
  }
}

Excerpt from the response:

      {
        "_shard" : "[test_search_relevance][2]",
        "_node" : "cSYFpZ2ZSwWfwgItKZ8zHA",
        "_index" : "test_search_relevance",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "hello"
        },
        "_explanation" : {
          "value" : 0.2876821,
          "description" : "weight(name:hello in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.2876821,
              "description" : "score(freq=1.0), product of:",
              "details" : [
                {
                  "value" : 2.2,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.2876821,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 1,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }

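Question 1) How is max_score in the response calculated?
Question 2) How is the hits._score value 0.2876821 calculated?
Question 3) How is the _explanation.value value 0.2876821 calculated?
The IDF value in the explanation, at least, I can reproduce as 0.2876821.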

import math
print(math.log(1 + (1 - 1 + 0.5) / (1 + 0.5)))  # 0.28768207245178085

The TF value in the explanation I can also reproduce as 0.45454544.

print(1.0 / (1.0 + 1.2 * (1 - 0.75 + 0.75 * 1.0 / 1.0)))  # 0.45454545454545453

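One document I found while researching the calculation expresses the score in terms of IDF(qi) * R(qi, d).

But IDF(qi) * R(qi, d) = 0.2876821 * 0.45454544 = 0.13076458672462402, which clearly differs from the values in the three questions above. I don't know how those three values are calculated.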

1 answer

rockybean

2020-03-20

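max_score is the score of the highest-scoring document among all hits.

You should be able to reproduce the scores in the response by following the detailed explanation it contains.

If your final result doesn't match, there is probably a problem in your calculation steps. I'll need to find time to look into this.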
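
Following the explanation in the response literally: the "score(freq=1.0), product of:" node lists three factors, boost (2.2), idf, and tf, while the calculation in the question multiplies only idf and tf. Below is a minimal Python sketch of the full product, using only the values printed in the explain excerpt. The function names are made up for illustration. The value 2.2 happens to equal k1 + 1 = 1.2 + 1; reading it that way is an assumption, since the thread does not confirm where the boost comes from.

```python
import math

def bm25_idf(n, N):
    # idf exactly as printed in the explain output
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq, k1, b, dl, avgdl):
    # tf exactly as printed in the explain output
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

def explain_score(boost, n, N, freq, k1, b, dl, avgdl):
    # "score(freq=1.0), product of:" boost, idf and tf
    return boost * bm25_idf(n, N) * bm25_tf(freq, k1, b, dl, avgdl)

# All inputs are copied from the explain excerpt for document _id 1.
params = dict(n=1, N=1, freq=1.0, k1=1.2, b=0.75, dl=1.0, avgdl=1.0)
score = explain_score(boost=2.2, **params)
without_boost = bm25_idf(1, 1) * bm25_tf(1.0, 1.2, 0.75, 1.0, 1.0)

print(score)          # close to the reported _score 0.2876821
print(without_boost)  # close to 0.1307646, the value from the question
```

Here tf is 1 / 2.2, so the boost of 2.2 cancels it and the score ends up equal to the idf, which would explain why _score, _explanation.value, and idf all show 0.2876821. Note also that N is 1 even though three documents were indexed; with number_of_shards set to 3, this is most likely because term statistics are computed per shard by default.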

