ElasticSearch2 Basic Operations (05: About Search)

Search in ES is not the LIKE of a relational database. It works through the relevance between the search criteria and each document.

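For every document in a search result there is a floating-point field, _score, that represents how relevant the document is to the query. The higher the _score, the more relevant the document.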

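How the score is calculated depends on the query type:
a fuzzy query scores by how similar the spelling of the found word is to the search keyword;
a terms query scores by the percentage of the query's terms that were found;
full-text search scores by how similar the content of a field is to the query text.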

ES uses TF/IDF (Term Frequency/Inverse Document Frequency) as its relevance measure. It depends mainly on the following three factors:
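Term frequency (TF): the more often a search term appears in the queried field of a document, the more relevant that document is. For example, if a term appears 4 times in the field of the first document and only once in the second, the first document gets a higher TF. How many different query terms a document matches is a separate factor (coord), which shows up as coord(2/4) in the explain output later in this article.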
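ES 2.x scores with Lucene's classic TF/IDF similarity by default, and in that similarity the raw term frequency is dampened with a square root. A minimal Python sketch of that factor (the function name `tf` is only illustrative):

```python
import math


def tf(freq):
    """Term-frequency factor of Lucene's classic TF/IDF similarity: sqrt(freq)."""
    return math.sqrt(freq)


# A term occurring 4 times in a field vs. once: 2.0 vs. 1.0.
# The first document scores higher, but not 4 times higher.
print(tf(4), tf(1))
```

The square root means each extra occurrence adds less than the previous one, so repeating a word many times does not dominate the score.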

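Inverse document frequency (IDF): the more often a search term appears in this field across all documents, the less weight that term carries. For example, among 5 search terms, if one term appears in every document and another appears only once, the term found in every document can almost be ignored, while the term that appears only once gets a very high weight.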
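In the same classic similarity, IDF is computed as 1 + ln(numDocs / (docFreq + 1)). The sketch below (illustrative function name `idf`) reproduces the `idf(docFreq=1, maxDocs=2)` value of 1 that appears in the explain output later in this article, and compares a ubiquitous term with a rare one in a hypothetical index of 1000 documents:

```python
import math


def idf(doc_freq, num_docs):
    """IDF in Lucene's classic TF/IDF similarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Same value as idf(docFreq=1, maxDocs=2) in the explain output: prints 1.0
print(idf(1, 2))

# 1000 documents: a term found in every document (idf just under 1)
# vs. a term found in only one document (idf about 7.2).
# idf enters the classic score twice (query side and field side), so the gap is squared.
print(idf(1000, 1000), idf(1, 1000))
```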

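Field length: the longer the queried field of a document, the lower the relevance. For example, if one document's field is 10 words long and another's is 100 words long, and a keyword appears once in each, the 10-word document is considerably more relevant than the 100-word one.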
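In the classic similarity the field-length factor is 1 / sqrt(numTerms). A sketch of the 10-word vs. 100-word comparison (the function name is illustrative; note that Lucene encodes this norm into a single byte at index time, so the stored value is only an approximation):

```python
import math


def length_norm(num_terms):
    """Field-length norm of Lucene's classic TF/IDF similarity: 1 / sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)


short_field = length_norm(10)   # about 0.316
long_field = length_norm(100)   # 0.1
# The 10-word field weighs about sqrt(10), i.e. roughly 3.16 times, more.
print(short_field / long_field)
```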

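Understanding TF/IDF lets you explain results that seemingly should not have appeared. Keep in mind that this is not an exact-match algorithm but a scoring algorithm: results are ranked by relevance.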

If a score looks unreasonable, you can use the following request to see how it was computed:

# Explain how the query is scored
curl -XPOST 'http://127.0.0.1:9200/myindex/user/_search?explain' -d'
{
   "query"   : { "match" : { "家庭住址" : "魔都大街" }}
}'

# Result:
{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 4,
        "max_score": 4,
        "hits": [
            {
                "_shard": 4,
                "_node": "5Tv2a5YaQDqmzUFbTp4iaw",
                "_index": "myindex",
                "_type": "user",
                "_id": "u002",
                "_score": 4,
                "_source": {
                    "用户ID": "u002",
                    "姓名": "李四",
                    "性别": "男",
                    "年龄": "25",
                    "家庭住址": "上海市闸北区魔都大街007号",
                    "注册时间": "2015-02-01 08:30:00"
                },
                "_explanation": {
                    "value": 4,
                    "description": "sum of:",
                    "details": [
                        {
                            "value": 4,
                            "description": "sum of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:魔 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:都 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:大街 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:街 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value": 0,
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "value": 0,
                                    "description": "# clause",
                                    "details": []
                                },
                                {
                                    "value": 0.5,
                                    "description": "_type:user, product of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "boost",
                                            "details": []
                                        },
                                        {
                                            "value": 0.5,
                                            "description": "queryNorm",
                                            "details": []
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard": 0,
                "_node": "5Tv2a5YaQDqmzUFbTp4iaw",
                "_index": "myindex",
                "_type": "user",
                "_id": "u003",
                "_score": 0.71918744,
                "_source": {
                    "用户ID": "u003",
                    "姓名": "王五",
                    "性别": "男",
                    "年龄": "26",
                    "家庭住址": "广州市花都区花城大街010号",
                    "注册时间": "2015-03-01 08:30:00"
                },
                "_explanation": {
                    "value": 0.71918744,
                    "description": "sum of:",
                    "details": [
                        {
                            "value": 0.71918744,
                            "description": "product of:",
                            "details": [
                                {
                                    "value": 1.4383749,
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "value": 0.71918744,
                                            "description": "weight(家庭住址:大街 in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "value": 0.71918744,
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "value": 0.35959372,
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 0.35959372,
                                                                    "description": "queryNorm",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "value": 1,
                                                                            "description": "termFreq=1.0",
                                                                            "details": []
                                                                        }
                                                                    ]
                                                                },
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 2,
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": []
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        },
                                        {
                                            "value": 0.71918744,
                                            "description": "weight(家庭住址:街 in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "value": 0.71918744,
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "value": 0.35959372,
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 0.35959372,
                                                                    "description": "queryNorm",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "value": 1,
                                                                            "description": "termFreq=1.0",
                                                                            "details": []
                                                                        }
                                                                    ]
                                                                },
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 2,
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": []
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 0.5,
                                    "description": "coord(2/4)",
                                    "details": []
                                }
                            ]
                        },
                        {
                            "value": 0,
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "value": 0,
                                    "description": "# clause",
                                    "details": []
                                },
                                {
                                    "value": 0.35959372,
                                    "description": "_type:user, product of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "boost",
                                            "details": []
                                        },
                                        {
                                            "value": 0.35959372,
                                            "description": "queryNorm",
                                            "details": []
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            },
            ......
        ]
    }
}
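The `_explanation` tree is deeply nested and hard to read as raw JSON. A small helper that flattens it into an indented outline, one node per line, may help. It is a sketch that assumes the response has already been parsed into a Python dict (for example with `json.loads`); `explanation_lines` is a hypothetical name, and the sample is a trimmed fragment of the u003 explanation above with its inner details omitted.

```python
def explanation_lines(node, depth=0):
    """Flatten an ES `_explanation` node (value, description, details) into indented lines."""
    lines = ["  " * depth + f"{node['value']}  {node['description']}"]
    for child in node.get("details", []):
        # Each child is indented one level deeper than its parent.
        lines.extend(explanation_lines(child, depth + 1))
    return lines


# Trimmed fragment of the u003 explanation (inner details omitted).
sample = {
    "value": 0.71918744,
    "description": "product of:",
    "details": [
        {"value": 1.4383749, "description": "sum of:", "details": []},
        {"value": 0.5, "description": "coord(2/4)", "details": []},
    ],
}

for line in explanation_lines(sample):
    print(line)
```

For a full response, loop over `response["hits"]["hits"]` and pass each hit's `"_explanation"` to the helper.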

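As you can see, not only the record containing "魔都大街" was returned: every record containing "大街" was returned too. The explanation also shows why "u002" ranks first. The analyzer split "魔都大街" into the terms 魔, 都, 大街 and 街. u002 matched all four terms, while u003 matched only 大街 and 街, so its summed weight was multiplied by coord(2/4) = 0.5. Each matched term of u003 also weighs less (0.719 instead of 1), because u003 sits on a different shard (0 instead of 4), where the queryNorm is 0.35959372 instead of 0.5.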
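The explain values above can be recomputed with Lucene's classic scoring formula: score = coord * sum(queryWeight * fieldWeight), where queryWeight = idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and queryNorm = 1 / sqrt(sum of idf squared over all query terms). Treat this as a sketch under stated assumptions. All boosts are taken as 1. fieldNorm = 2 and maxDocs = 2 are read off the explain output. The docFreq of 0 for 魔 and 都 on shard 0 is an inference: it is the value that reproduces the queryNorm of 0.35959372. `classic_score` is a hypothetical name. Lucene computes in 32-bit floats, so the last digits can differ. The formula only applies to ES 2.x and earlier, since ES 5.0 switched the default similarity to BM25.

```python
import math


def idf(doc_freq, max_docs):
    # Classic Lucene IDF: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(shard_doc_freqs, matched_freqs, max_docs, field_norm):
    """Recompute a classic TF/IDF score for one document (all boosts assumed to be 1).

    shard_doc_freqs: docFreq on the document's shard for every query term
    matched_freqs:   frequency of each query term that occurs in the document
    """
    idfs = {term: idf(df, max_docs) for term, df in shard_doc_freqs.items()}
    # queryNorm = 1 / sqrt(sum of squared idf over ALL query terms)
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    total = 0.0
    for term, freq in matched_freqs.items():
        query_weight = idfs[term] * query_norm                    # "queryWeight"
        field_weight = math.sqrt(freq) * idfs[term] * field_norm  # "fieldWeight"
        total += query_weight * field_weight
    # coord = matched query terms / all query terms, e.g. coord(2/4)
    coord = len(matched_freqs) / len(shard_doc_freqs)
    return coord * total


terms = ["魔", "都", "大街", "街"]
# u002 on shard 4: every term has docFreq=1 and occurs once in the document.
u002 = classic_score({t: 1 for t in terms}, {t: 1 for t in terms}, 2, 2.0)
# u003 on shard 0: 魔 and 都 assumed absent from that shard (docFreq=0).
u003 = classic_score({"魔": 0, "都": 0, "大街": 1, "街": 1}, {"大街": 1, "街": 1}, 2, 2.0)
print(u002, round(u003, 6))  # explain output reports 4 and 0.71918744
```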

Another use is to have ES tell you what is wrong with a query:

curl -XPOST 'http://127.0.0.1:9200/myindex/user/_validate/query?explain' -d'
{
   "query"   : { "matchA" : { "家庭住址" : "魔都大街" }}
}'

{
    "valid": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "explanations": [
        {
            "index": "myindex",
            "valid": false,
            "error": "org.elasticsearch.index.query.QueryParsingException: No query registered for [matchA]"
        }
    ]
}

ES tells you that matchA is the problem: no query named matchA is registered.
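If you run the validation from a script, a small helper can pull the error messages out of the response. This is a sketch that assumes the response body is available as a string and has the shape shown above; `validation_errors` is a hypothetical name.

```python
import json


def validation_errors(response_text):
    """Return the error messages in a `_validate/query?explain` response (empty list if valid)."""
    body = json.loads(response_text)
    if body.get("valid"):
        return []
    return [
        item["error"]
        for item in body.get("explanations", [])
        if not item.get("valid", False) and "error" in item
    ]


# The response from the matchA example above, compacted into one string.
response = (
    '{"valid": false, "_shards": {"total": 1, "successful": 1, "failed": 0},'
    ' "explanations": [{"index": "myindex", "valid": false,'
    ' "error": "org.elasticsearch.index.query.QueryParsingException:'
    ' No query registered for [matchA]"}]}'
)
print(validation_errors(response))
```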
