About neohope

Always working hard, never once thought of giving up...

ElasticSearch2 Basic Operations (06: Query Conditions and Filters)

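Filters

match_all matches every document without filtering (the default)
term exact match on a single value
terms exact match on any of several values
range range match
exists the document has a value for the field
missing the document has no value for the field
bool combines several filter conditions

A bool filter accepts the following clauses:

must every condition must match, equivalent to AND.
must_not no condition may match, equivalent to NOT.
should at least one condition must match, equivalent to OR.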
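In ES 2.x queries and filters were merged, and filter conditions go into the filter (or must_not) clause of a bool query. The sketch below combines terms and exists, and uses must_not + exists to get the same effect as missing. The index, type and field names come from this article; the ES_URL variable and the concrete user IDs are illustrative assumptions, so adjust them to your own cluster and data.

```shell
# Assumption: an Elasticsearch 2.x node at the address used throughout this article.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# terms:  用户ID (user ID) is any one of the listed values, matched exactly
# exists: the document has a 家庭住址 (home address) field
# must_not + exists: the document has no 年龄 (age) field, the same effect as missing
BODY='{
  "query": {
    "bool": {
      "filter": [
        { "terms":  { "用户ID": ["u001", "u002", "u003"] } },
        { "exists": { "field": "家庭住址" } }
      ],
      "must_not": [
        { "exists": { "field": "年龄" } }
      ]
    }
  }
}'

curl -s --max-time 10 -XPOST "$ES_URL/myindex/user/_search?pretty" -d "$BODY" \
  || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
```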

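Queries

match_all matches every document (the default)
match analyzes the query text first, then scores the matches with TF/IDF
multi_match like match, but runs the query against several fields at once
bool combines several query conditions

A bool query accepts the following clauses:

must every condition must match, equivalent to AND.
must_not no condition may match, equivalent to NOT.
should conditions that should match. If the bool query has no must clause, at least one should clause must match (OR); when a must clause is present, matching should clauses only raise the score.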
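#Find users whose gender (性别) is male (男) and whose age (年龄) is not 25, preferring those whose home address (家庭住址) contains 魔都 ("Magic City", a nickname for Shanghai)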
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "bool": {
            "must": {
                "term": {
                    "性别": "男"
                }
            },
            "must_not": {
                "match": {
                    "年龄": "25"
                }
            },
            "should": {
                "match": {
                    "家庭住址": "魔都"
                }
            }
        }
    }
}'

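#Find users whose registration time (注册时间) is from 2015-04-01 up to (but not including) 2016-04-01 (assumes 注册时间 is mapped as a date with format yyyy-MM-dd HH:mm:ss)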
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "bool": {
            "must": {
                "range": {
                    "注册时间": {
                        "gte": "2015-04-01 00:00:00",
                        "lt": "2016-04-01 00:00:00"
                    }
                }
            }
        }
    }
}'

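#Find records that have no age (年龄) field (missing is deprecated as of ES 2.2; a bool must_not clause containing exists replaces it)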
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "bool": {
            "must": {
                "missing": {
                    "field": "年龄"
                }
            }
        }
    }
}'

#Find users whose home address (家庭住址) or work address (工作地址) contains 北京 (Beijing)
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "multi_match": {
            "query": "北京",
            "type": "most_fields",
            "fields": [
                "家庭住址",
                "工作地址"
            ]
        }
    }
}'

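#Find male users (the filtered query is deprecated as of ES 2.0; a bool query with a filter clause replaces it)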
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "term": {
                    "性别": "男"
                }
            }
        }
    }
}'

#Find users who registered within the last two years
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "range": {
                    "注册时间": {"gt" : "now-2y"}
                }
            }
        }
    }
}'
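Because the filtered query is deprecated from ES 2.0 on, the two filtered examples above can most likely be rewritten as bool queries whose only clause is filter. This is a sketch assuming an ES 2.x node at the article's address; ES_URL is an illustrative override, not something the original setup defines.

```shell
# Assumption: an Elasticsearch 2.x node at the address used throughout this article.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Equivalent of the first filtered example: gender (性别) is male, applied as a filter
MALE_BODY='{"query":{"bool":{"filter":{"term":{"性别":"男"}}}}}'

# Equivalent of the second filtered example: registered within the last two years (now-2y is date math)
RECENT_BODY='{"query":{"bool":{"filter":{"range":{"注册时间":{"gt":"now-2y"}}}}}}'

for body in "$MALE_BODY" "$RECENT_BODY"; do
  curl -s --max-time 10 -XPOST "$ES_URL/myindex/user/_search?pretty" -d "$body" \
    || echo "request failed: is Elasticsearch running at $ES_URL?" >&2
done
```

Leaving out the query part is fine here: a bool query with only a filter clause matches every document that passes the filter, just like match_all combined with a filter.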

Sorting

#Return all users sorted by registration time (注册时间), newest first
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "match_all": {}
    },
    "sort": {
        "注册时间": {
            "order": "desc"
        }
    }
}'

Pagination

#Return the first three records
curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d'
{
    "query": {
        "match_all": {}
    },
    "from": 0,
    "size": 3
}'
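from is the number of hits to skip and size is the page length, so page N (counting from 1) starts at (N - 1) × size. The helper below is a hypothetical convenience function that only prints the request body. Note that from ES 2.1 on, from + size is capped by index.max_result_window (10,000 by default), which is one reason to use scroll for deep paging.

```shell
# Hypothetical helper: print a search body for 1-based page number $1 with $2 hits per page.
# from = (page - 1) * size, i.e. how many hits to skip before this page starts.
page_body() {
  page=$1
  size=$2
  from=$(( (page - 1) * size ))
  printf '{"query":{"match_all":{}},"from":%d,"size":%d}\n' "$from" "$size"
}
```

For example, the third page of three records would be requested with curl -XPOST http://127.0.0.1:9200/myindex/user/_search -d "$(page_body 3 3)".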

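Scroll pagination (server-side cursor)

#Open a scroll context kept alive for 5 minutes; quote the URL so the shell does not treat & as a background operator (with search_type=scan, size applies per shard; scan is deprecated as of ES 2.1 in favor of sorting by _doc)
curl -XPOST 'http://127.0.0.1:9200/myindex/user/_search?search_type=scan&scroll=5m' -d'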
{
    "query": { "match_all": {}},
    "size":  10
}'

#The response contains a _scroll_id; a scan search returns no hits in this first response
{"_scroll_id":"c2Nhbjs1OzE1MzE6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMzOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTUzNDo1VHYyYTVZYVFEcW16VUZiVHA0aWF3OzE1MzU6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMyOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTt0b3RhbF9oaXRzOjc7","took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":7,"max_score":0.0,"hits":[]}}

#Send the _scroll_id to fetch the next batch; repeat with the most recently returned _scroll_id until hits comes back empty
curl -XPOST 'http://127.0.0.1:9200/_search/scroll?scroll=5m' -d 'c2Nhbjs1OzE1MzE6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMzOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTUzNDo1VHYyYTVZYVFEcW16VUZiVHA0aWF3OzE1MzU6NVR2MmE1WWFRRHFtelVGYlRwNGlhdzsxNTMyOjVUdjJhNVlhUURxbXpVRmJUcDRpYXc7MTt0b3RhbF9oaXRzOjc7'
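Repeating the scroll request by hand gets tedious, so here is a sketch of a loop that walks every batch. It assumes an ES 2.x node at the article's address and compact (non-pretty) JSON responses like the one shown above. Parsing is done with a naive sed expression rather than a real JSON parser, and the function names are hypothetical.

```shell
# Assumption: an Elasticsearch 2.x node at the address used throughout this article.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Print the _scroll_id value found in a compact JSON response read from stdin.
# Naive sed parsing: fine for the response shape above, not a general JSON parser.
extract_scroll_id() {
  sed -n 's/.*"_scroll_id":"\([^"]*\)".*/\1/p'
}

# Open a scan/scroll search, then print every non-empty batch until the scroll is exhausted.
scroll_all() {
  resp=$(curl -s --max-time 10 -XPOST "$ES_URL/myindex/user/_search?search_type=scan&scroll=5m" \
    -d '{"query":{"match_all":{}},"size":10}') || return 1
  while :; do
    id=$(printf '%s' "$resp" | extract_scroll_id)
    [ -n "$id" ] || break
    # Always send the most recently received _scroll_id
    resp=$(curl -s --max-time 10 -XPOST "$ES_URL/_search/scroll?scroll=5m" -d "$id") || return 1
    # An empty hits array means every document has been returned
    printf '%s' "$resp" | grep -qF '"hits":[]' && break
    printf '%s\n' "$resp"
  done
}
```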

ElasticSearch2 Basic Operations (05: About Search)

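Search in ES is not the LIKE of a relational database; results are ranked by how relevant each document is to the search conditions.

In a search, every matching document gets a floating-point _score that expresses its relevance to the query; the higher the _score, the more relevant the document.

How the score is computed depends on the query type:
a fuzzy query measures how similar the spelling of the found term is to the search term;
a terms query measures the percentage of the search terms that were found;
full-text search measures how similar the content is to the search terms.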

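By default ES measures relevance with TF/IDF (Term Frequency/Inverse Document Frequency), which depends on the following three factors:
Term frequency (TF): the more often a search term occurs in the queried field of a document, the more relevant that document is. For example, a field that contains the term four times scores higher than one that contains it once. (Matching more of the search terms, as u002 does in the explain output below, is rewarded separately by the coord factor.)

Inverse document frequency (IDF): the more documents contain a search term in the queried field, the less that term counts. For example, if one search term appears in every document and another appears in only one, the term found everywhere can be almost ignored, while the rare term gets a very high weight.

Field length: the longer the queried field, the lower the relevance. For example, if a search term appears once in a 10-word field and once in a 100-word field, the 10-word record is considered much more relevant.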
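In Lucene's classic TF/IDF similarity, which ES 2.x uses by default, the three factors are computed as tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)) and field-length norm = 1 / √(number of terms in the field). The awk-based functions below are just a small calculator for these formulas, with hypothetical names. Lucene stores the norm in a single byte, so the fieldNorm values in real explain output are lossy approximations.

```shell
# Term frequency: square root of how often the term occurs in the field
tf() { awk -v f="$1" 'BEGIN { printf "%.4f\n", sqrt(f) }'; }

# Inverse document frequency: 1 + ln(numDocs / (docFreq + 1)); awk's log() is the natural log
idf() { awk -v n="$1" -v df="$2" 'BEGIN { printf "%.4f\n", 1 + log(n / (df + 1)) }'; }

# Field-length norm: 1 / sqrt(number of terms in the field)
norm() { awk -v len="$1" 'BEGIN { printf "%.4f\n", 1 / sqrt(len) }'; }
```

For instance, idf 2 1 prints 1.0000, the same value as the idf(docFreq=1, maxDocs=2) lines in the explain output below. idf 100 1 (4.9120) versus idf 100 99 (1.0000) shows a rare term outweighing a common one, and norm 10 versus norm 100 (0.3162 versus 0.1000) corresponds to the 10-word versus 100-word example.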

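Understanding TF/IDF helps you explain results that seemingly should not have appeared. Keep in mind that this is not an exact-match algorithm but a scoring algorithm: results are ranked by relevance.

If a score looks unreasonable, the following request shows how it was computed: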

#Explain how each hit was scored
curl -XPOST 'http://127.0.0.1:9200/myindex/user/_search?explain' -d'
{
   "query"   : { "match" : { "家庭住址" : "魔都大街" }}
}'

#Result:
{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 4,
        "max_score": 4,
        "hits": [
            {
                "_shard": 4,
                "_node": "5Tv2a5YaQDqmzUFbTp4iaw",
                "_index": "myindex",
                "_type": "user",
                "_id": "u002",
                "_score": 4,
                "_source": {
                    "用户ID": "u002",
                    "姓名": "李四",
                    "性别": "男",
                    "年龄": "25",
                    "家庭住址": "上海市闸北区魔都大街007号",
                    "注册时间": "2015-02-01 08:30:00"
                },
                "_explanation": {
                    "value": 4,
                    "description": "sum of:",
                    "details": [
                        {
                            "value": 4,
                            "description": "sum of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:魔 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:都 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:大街 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 1,
                                    "description": "weight(家庭住址:街 in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "score(doc=0,freq=1.0), product of:",
                                            "details": [
                                                {
                                                    "value": 0.5,
                                                    "description": "queryWeight, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 0.5,
                                                            "description": "queryNorm",
                                                            "details": []
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value": 2,
                                                    "description": "fieldWeight in 0, product of:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "tf(freq=1.0), with freq of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "termFreq=1.0",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 1,
                                                            "description": "idf(docFreq=1, maxDocs=2)",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldNorm(doc=0)",
                                                            "details": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value": 0,
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "value": 0,
                                    "description": "# clause",
                                    "details": []
                                },
                                {
                                    "value": 0.5,
                                    "description": "_type:user, product of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "boost",
                                            "details": []
                                        },
                                        {
                                            "value": 0.5,
                                            "description": "queryNorm",
                                            "details": []
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard": 0,
                "_node": "5Tv2a5YaQDqmzUFbTp4iaw",
                "_index": "myindex",
                "_type": "user",
                "_id": "u003",
                "_score": 0.71918744,
                "_source": {
                    "用户ID": "u003",
                    "姓名": "王五",
                    "性别": "男",
                    "年龄": "26",
                    "家庭住址": "广州市花都区花城大街010号",
                    "注册时间": "2015-03-01 08:30:00"
                },
                "_explanation": {
                    "value": 0.71918744,
                    "description": "sum of:",
                    "details": [
                        {
                            "value": 0.71918744,
                            "description": "product of:",
                            "details": [
                                {
                                    "value": 1.4383749,
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "value": 0.71918744,
                                            "description": "weight(家庭住址:大街 in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "value": 0.71918744,
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "value": 0.35959372,
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 0.35959372,
                                                                    "description": "queryNorm",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "value": 1,
                                                                            "description": "termFreq=1.0",
                                                                            "details": []
                                                                        }
                                                                    ]
                                                                },
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 2,
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": []
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        },
                                        {
                                            "value": 0.71918744,
                                            "description": "weight(家庭住址:街 in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "value": 0.71918744,
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "value": 0.35959372,
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 0.35959372,
                                                                    "description": "queryNorm",
                                                                    "details": []
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value": 2,
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "value": 1,
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "value": 1,
                                                                            "description": "termFreq=1.0",
                                                                            "details": []
                                                                        }
                                                                    ]
                                                                },
                                                                {
                                                                    "value": 1,
                                                                    "description": "idf(docFreq=1, maxDocs=2)",
                                                                    "details": []
                                                                },
                                                                {
                                                                    "value": 2,
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": []
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value": 0.5,
                                    "description": "coord(2/4)",
                                    "details": []
                                }
                            ]
                        },
                        {
                            "value": 0,
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "value": 0,
                                    "description": "# clause",
                                    "details": []
                                },
                                {
                                    "value": 0.35959372,
                                    "description": "_type:user, product of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "boost",
                                            "details": []
                                        },
                                        {
                                            "value": 0.35959372,
                                            "description": "queryNorm",
                                            "details": []
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            },
            ......
        ]
    }
}

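As you can see, the query returned not only the record containing "魔都大街" but also every record that merely contains "大街". The explanation also tells you why "u002" ranks first.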
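The numbers in the explanation come from Lucene's classic TF/IDF similarity, which ES 2.x uses by default. The Python sketch below assumes the Lucene DefaultSimilarity formulas:

- idf = 1 + ln(maxDocs / (docFreq + 1))
- tf = sqrt(freq)
- coord = overlap / maxOverlap

With these, it reproduces two values from the output above: idf(docFreq=1, maxDocs=2) = 1 and coord(2/4) = 0.5. fieldNorm and queryNorm are left out.

```python
import math

# Sketch of Lucene's classic (DefaultSimilarity) scoring factors,
# assumed here to be what ES 2.x reports in its explain output.

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    """Term-frequency factor: square root of the number of occurrences."""
    return math.sqrt(freq)

def coord(overlap, max_overlap):
    """Fraction of the query's clauses that the document matched."""
    return overlap / max_overlap

print(classic_idf(1, 2))  # idf(docFreq=1, maxDocs=2) -> 1.0
print(coord(2, 4))        # coord(2/4) -> 0.5
```

A term found in both of the 2 documents would get idf = 1 + ln(2/3), about 0.59. This is why rarer terms weigh more in the score.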

Another use is to have ES tell you where a query is wrong:

curl -XPOST http://127.0.0.1:9200/myindex/user/_validate/query?explain -d'
{
   "query"   : { "matchA" : { "家庭住址" : "魔都大街" }}
}'

{
    "valid": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "explanations": [
        {
            "index": "myindex",
            "valid": false,
            "error": "org.elasticsearch.index.query.QueryParsingException: No query registered for [matchA]"
        }
    ]
}

ES reports that matchA is the problem: no query type with that name is registered.
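The same check can be scripted. Below is a minimal Python sketch that assumes the ES 2.x endpoint and the response shape shown above. build_validate_request and validation_errors are hypothetical helper names. The request is only built here; sending it (for example with urllib.request) is left out.

```python
import json

def build_validate_request(host, index, doc_type, query):
    """Return (url, body) for an ES 2.x _validate/query?explain request."""
    url = f"{host}/{index}/{doc_type}/_validate/query?explain"
    # ensure_ascii=False keeps Chinese field names readable in the body
    body = json.dumps({"query": query}, ensure_ascii=False)
    return url, body

def validation_errors(response):
    """Return (index, error) pairs from a parsed _validate/query response.

    An empty list means ES considered the query valid.
    """
    if response.get("valid", False):
        return []
    errors = [
        (item.get("index"), item.get("error"))
        for item in response.get("explanations", [])
        if not item.get("valid", True)
    ]
    # Without ?explain, ES only says "valid": false, with no details
    return errors or [(None, "invalid query (no explanation returned)")]

url, body = build_validate_request(
    "http://127.0.0.1:9200", "myindex", "user",
    {"matchA": {"家庭住址": "魔都大街"}},
)
```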

ElasticSearch2 Basic Operations (04: Tokenization)

Well, starting to get a feel for it? Now let's step back and look at the most basic things:

Common data types in ES:

Category              ES type
String                string
Integer               byte, short, integer, long
Floating point        float, double
Boolean               boolean
Date                  date
Object                object
Nested structure      nested
Geo point (lat/lon)   geo_point

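Common analysis options for a string field (its index setting in ES 2.x):

Option         Meaning
analyzed       Analyze the string first, then index it; in other words, index the field as full text.
not_analyzed   Index the field so that it can be searched, but index the value exactly as given, without analysis.
no             Do not index the field; it cannot be searched.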

Next, let's test the analyzers.

1. First, tokenize with the standard analyzer

# Quote the URL: an unquoted & would make the shell run the command in the background
curl -XPOST 'http://localhost:9200/_analyze?analyzer=standard&text=小明同学大吃一惊'

{
    "tokens": [
        {
            "token": "小",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "明",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "同",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "学",
            "start_offset": 3,
            "end_offset": 4,
            "type": "<IDEOGRAPHIC>",
            "position": 3
        },
        {
            "token": "大",
            "start_offset": 4,
            "end_offset": 5,
            "type": "<IDEOGRAPHIC>",
            "position": 4
        },
        {
            "token": "吃",
            "start_offset": 5,
            "end_offset": 6,
            "type": "<IDEOGRAPHIC>",
            "position": 5
        },
        {
            "token": "一",
            "start_offset": 6,
            "end_offset": 7,
            "type": "<IDEOGRAPHIC>",
            "position": 6
        },
        {
            "token": "惊",
            "start_offset": 7,
            "end_offset": 8,
            "type": "<IDEOGRAPHIC>",
            "position": 7
        }
    ]
}
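The standard analyzer emits one token per Chinese character. That is why "小明同学大吃一惊" becomes eight single-character tokens.

The Python sketch below is only a toy approximation of that behavior, built on these assumptions:
- it treats the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as ideographs;
- it groups other letters and digits into lowercase words;
- it drops everything else.

The real tokenizer follows Unicode UAX #29 word-boundary rules and handles many more cases.

```python
def is_cjk_ideograph(ch):
    # Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
    return "\u4e00" <= ch <= "\u9fff"

def toy_standard_tokens(text):
    """Toy approximation of the standard analyzer (not the real UAX #29 rules).

    CJK ideographs become one token each; runs of other letters/digits become
    one lowercased token; whitespace and punctuation are dropped. Offsets are
    Python string indices, which match ES offsets for BMP characters.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if is_cjk_ideograph(ch):
            start, end, kind = i, i + 1, "<IDEOGRAPHIC>"
        elif ch.isalnum():
            start, end = i, i
            while end < n and text[end].isalnum() and not is_cjk_ideograph(text[end]):
                end += 1
            kind = "<NUM>" if text[start:end].isdigit() else "<ALPHANUM>"
        else:
            i += 1  # separators produce no token
            continue
        tokens.append({
            "token": text[start:end].lower(),
            "start_offset": start,
            "end_offset": end,
            "type": kind,
            "position": len(tokens),
        })
        i = end
    return tokens

for t in toy_standard_tokens("小明同学大吃一惊"):
    print(t)
```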

2. Then compare with the IK analyzer

curl -XGET 'http://localhost:9200/_analyze?analyzer=ik&text=小明同学大吃一惊'

{
    "tokens": [
        {
            "token": "小明",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "同学",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "大吃一惊",
            "start_offset": 4,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "大吃",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "吃",
            "start_offset": 5,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "一惊",
            "start_offset": 6,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "一",
            "start_offset": 6,
            "end_offset": 7,
            "type": "TYPE_CNUM",
            "position": 6
        },
        {
            "token": "惊",
            "start_offset": 7,
            "end_offset": 8,
            "type": "CN_CHAR",
            "position": 7
        }
    ]
}
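IK emits overlapping tokens: "大吃一惊", "大吃" and "吃" all cover the character 吃. In both outputs, start_offset and end_offset index into the original text, with the end exclusive, so text[start:end] should equal the token.

The Python sketch below uses the IK output above as data. It checks that property and lists the tokens covering a given character. offset_mismatches and tokens_covering are hypothetical helper names.

```python
def offset_mismatches(text, tokens):
    """Return tokens whose [start_offset, end_offset) slice of text differs from the token."""
    return [t for t in tokens if text[t["start_offset"]:t["end_offset"]] != t["token"]]

def tokens_covering(tokens, offset):
    """Return the token strings whose span covers the character at `offset`."""
    return [t["token"] for t in tokens if t["start_offset"] <= offset < t["end_offset"]]

# The IK output above, reduced to token, start_offset and end_offset
IK_TOKENS = [
    {"token": tok, "start_offset": s, "end_offset": e}
    for tok, s, e in [
        ("小明", 0, 2), ("同学", 2, 4), ("大吃一惊", 4, 8), ("大吃", 4, 6),
        ("吃", 5, 6), ("一惊", 6, 8), ("一", 6, 7), ("惊", 7, 8),
    ]
]

text = "小明同学大吃一惊"
print(offset_mismatches(text, IK_TOKENS))  # []
print(tokens_covering(IK_TOKENS, 5))       # ['大吃一惊', '大吃', '吃']
```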

3. Analyze text with the analyzer configured for the "家庭住址" (home address) field

curl -XGET 'http://localhost:9200/myindex/_analyze?field=家庭住址&text=我爱北京天安门'

{
    "tokens": [
        {
            "token": "我",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "爱",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "北京",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "京",
            "start_offset": 3,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "天安门",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "天安",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "门",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 6
        }
    ]
}

4. Analyze text with the "性别" (gender) field

curl -XGET 'http://localhost:9200/myindex/_analyze?field=性别&text=我爱北京天安门'

{
    "tokens": [
        {
            "token": "我爱北京天安门",
            "start_offset": 0,
            "end_offset": 7,
            "type": "word",
            "position": 0
        }
    ]
}

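As you can see, analyzers differ in the scenarios and languages they are designed for, so choose one that fits.
Choosing the right analysis mode and analyzer for each field will also save you a great deal of effort.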
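Per-field choices are made in the mapping. The minimal Python sketch below assumes:
- ES 2.x string-field syntax;
- the IK plugin registers an analyzer named ik, as the _analyze call above implies.

It builds a mapping for the myindex/user type in which:
- 性别 is not_analyzed;
- 家庭住址 uses IK;
- 注册时间 is a date in the format used by the range queries.

The printed body would be sent with curl -XPUT http://localhost:9200/myindex when creating the index. The analysis of an existing field cannot be changed in place.

```python
import json

def build_user_mapping(address_analyzer="ik"):
    """Index-creation body with an ES 2.x mapping for the myindex/user type.

    Assumptions: ES 2.x string-field syntax, and an analyzer named "ik"
    provided by the IK plugin (as used with _analyze above).
    """
    return {
        "mappings": {
            "user": {
                "properties": {
                    # Exact values only: the whole string becomes one term
                    "性别": {"type": "string", "index": "not_analyzed"},
                    # Full-text field, segmented by the IK analyzer
                    "家庭住址": {"type": "string", "analyzer": address_analyzer},
                    # Matches values such as "2015-04-01 00:00:00"
                    "注册时间": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
                }
            }
        }
    }

print(json.dumps(build_user_mapping(), ensure_ascii=False, indent=2))
```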

ElasticSearch2 Basic Operations (03: CRUD via REST)

Continued from the previous post:

11. Updating documents in bulk (_bulk)

curl -XPOST http://localhost:9200/_bulk -d'
{ action: { metadata }}\n
{ request body        }\n
{ action: { metadata }}\n
{ request body        }\n
'
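Action   Description
create   Create the document only if it does not already exist.
index    Create a new document or replace an existing one.
update   Partially update a document.
delete   Delete a document (no request-body line follows).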
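_bulk is newline-delimited, which imposes three rules:
- each action line and each request-body line must be a single line of JSON;
- delete has no body line;
- the whole body must end with a newline.

The minimal Python sketch below builds a body that follows these rules. build_bulk_body is a hypothetical helper name.

```python
import json

ACTIONS_WITH_BODY = {"create", "index", "update"}

def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into a _bulk request body.

    Each JSON object goes on its own line; delete takes no source line
    (pass None); the returned body always ends with a newline.
    """
    lines = []
    for action, metadata, source in actions:
        if action == "delete":
            if source is not None:
                raise ValueError("delete takes no source line")
        elif action not in ACTIONS_WITH_BODY:
            raise ValueError(f"unknown bulk action: {action}")
        elif source is None:
            raise ValueError(f"{action} needs a source line")
        # json.dumps without indent never emits raw newlines
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)

meta = {"_index": "myindex", "_type": "user", "_id": "u004"}
body = build_bulk_body([
    ("delete", meta, None),
    ("update", meta, {"doc": {"年龄": "28"}}),
])
print(body, end="")
```

When sending such a body from a file, use curl --data-binary @file rather than -d @file. The -d option strips newlines from file input.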

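For example, the following request:
first deletes a document,
then creates the document again,
then replaces the whole document,
and finally partially updates it.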

curl -XPOST http://localhost:9200/_bulk -d'
{ "delete": { "_index": "myindex", "_type": "user", "_id": "u004" }}
{ "create": { "_index": "myindex", "_type": "user", "_id": "u004" }}
{"用户ID": "u004","姓名":"赵六","性别":"男","年龄":"27","家庭住址":"深圳市龙岗区特区大街011号","注册时间":"2015-04-01 08:30:00"}
{ "index": { "_index": "myindex", "_type": "user", "_id": "u004" }}
{"用户ID": "u004","姓名":"赵六","性别":"男","年龄":"28","家庭住址":"深圳市龙岗区特区大街012号","注册时间":"2015-04-01 08:30:00"}
{ "update": { "_index": "myindex", "_type": "user", "_id": "u004"} }
{ "doc" : {"年龄" : "28"}}
'

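The result is below. The partial update was not executed, and the cause was not found at the time. One likely cause, unverified here: the _bulk body must end with a newline, and ES 2.x can silently skip a final action whose last line has none. Keep the closing quote on its own line after the last JSON line.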

{
    "took": 406,
    "errors": false,
    "items": [
        {
            "delete": {
                "_index": "myindex",
                "_type": "user",
                "_id": "u004",
                "_version": 10,
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "status": 200,
                "found": true
            }
        },
        {
            "create": {
                "_index": "myindex",
                "_type": "user",
                "_id": "u004",
                "_version": 11,
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "status": 201
            }
        },
        {
            "index": {
                "_index": "myindex",
                "_type": "user",
                "_id": "u004",
                "_version": 12,
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "status": 200
            }
        }
    ]
}

Specifying the native library path for Tomcat

Normally, to set the JVM's native library path, all you need is the following option:

-Djava.library.path=PATH_TO_NATIVE_LIBRARY

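For Tomcat, however, it is better not to use this option.
Instead, put the .dll or .so files in a directory on the system library search path.
This works because, on Windows, the JVM's default java.library.path includes the directories on PATH. On Linux the default comes from LD_LIBRARY_PATH instead, so put .so files in a directory listed there.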

ElasticSearch2 Basic Operations (02: CRUD via REST)

Continued from the previous post:

7. Updating documents

# Create u004
curl -XPUT http://localhost:9200/myindex/user/u004 -d'
{
"用户ID": "u004",
"姓名":"赵六",
"性别":"男",
"年龄":"27",
"家庭住址":"深圳市龙岗区特区大街011号",
"注册时间":"2015-04-01 08:30:00"
}'

# Update u004 (replaces the whole document)
curl -XPUT http://localhost:9200/myindex/user/u004 -d'
{
"用户ID": "u004",
"姓名":"赵六",
"性别":"男",
"年龄":"27",
"家庭住址":"深圳市龙岗区特区大街011号",
"注册时间":"2015-04-01 08:30:00"
}'

# Force-create u004; returns an error if it already exists
curl -XPUT http://localhost:9200/myindex/user/u004/_create -d'
{
"用户ID": "u004",
"姓名":"赵六",
"性别":"男",
"年龄":"27",
"家庭住址":"深圳市龙岗区特区大街012号",
"注册时间":"2015-04-01 08:30:00"
}'

The responses:

# Created; version is 1
{
    "_index": "myindex",
    "_type": "user",
    "_id": "u004",
    "_version": 1,
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true
}

# Updated; version is 2
{
    "_index": "myindex",
    "_type": "user",
    "_id": "u004",
    "_version": 2,
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": false
}

# Force-create failed (HTTP 409)
Http Error: Conflict

8. Delete the document; note how the version number changes

# Delete the document
curl -XDELETE http://localhost:9200/myindex/user/u004
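The version numbers follow a simple pattern:
- every successful index, create or delete bumps the document's version;
- create fails with a conflict if the document exists;
- a document re-created shortly after deletion continues from the old version, as the _bulk result in part 03 shows (delete at version 10, create at 11).

The Python class below is a toy in-memory model of that pattern, for illustration only. ToyVersionedStore and ConflictError are hypothetical names. Real ES remembers deleted versions only for a limited time (the index.gc_deletes setting).

```python
class ConflictError(Exception):
    """Stands in for the HTTP 409 Conflict returned by _create."""

class ToyVersionedStore:
    """Toy model of the per-document version numbers seen above.

    Not how Elasticsearch is implemented; deleted versions are kept forever
    here, whereas ES only remembers them for a limited time.
    """

    def __init__(self):
        self._docs = {}      # id -> source of live documents
        self._versions = {}  # id -> last version, kept after delete

    def _bump(self, doc_id):
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        return self._versions[doc_id]

    def index(self, doc_id, source):
        """Like PUT /index/type/id: create or replace."""
        created = doc_id not in self._docs
        self._docs[doc_id] = dict(source)
        return {"_id": doc_id, "_version": self._bump(doc_id), "created": created}

    def create(self, doc_id, source):
        """Like PUT /index/type/id/_create: fail if the document exists."""
        if doc_id in self._docs:
            raise ConflictError(doc_id)
        return self.index(doc_id, source)

    def delete(self, doc_id):
        """Like DELETE /index/type/id: the version still advances."""
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        del self._docs[doc_id]
        return {"_id": doc_id, "_version": self._bump(doc_id), "found": True}

store = ToyVersionedStore()
print(store.index("u004", {"姓名": "赵六"}))  # version 1, created
print(store.index("u004", {"姓名": "赵六"}))  # version 2, replaced
```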

9. Create it again, then apply a partial update; note how the version number changes

# Create
curl -XPUT http://localhost:9200/myindex/user/u004 -d'
{
"用户ID": "u004",
"姓名":"赵六",
"性别":"男",
"年龄":"27",
"家庭住址":"深圳市龙岗区特区大街011号",
"注册时间":"2015-04-01 08:30:00"
}'

# Partial update
curl -XPOST http://localhost:9200/myindex/user/u004/_update -d'
{
    "doc": {
        "家庭住址": "深圳市龙岗区特区大街013号"
    }
}'

# Retrieve
curl -XGET http://localhost:9200/myindex/user/u004
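With a partial update, ES reads the stored _source, merges the doc fragment into it, and reindexes the result. That is why the version increases.

The toy Python sketch below models the merge step. It assumes the usual behavior: nested objects are merged key by key, while other values are replaced. merge_partial is a hypothetical name.

```python
import copy

def merge_partial(source, doc):
    """Toy model of how an update's "doc" is merged into the stored _source.

    Nested objects are merged key by key; any other value (string, number,
    list) replaces the old one. The stored source is not modified.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

stored = {
    "用户ID": "u004", "姓名": "赵六", "性别": "男", "年龄": "27",
    "家庭住址": "深圳市龙岗区特区大街011号", "注册时间": "2015-04-01 08:30:00",
}
updated = merge_partial(stored, {"家庭住址": "深圳市龙岗区特区大街013号"})
print(updated["家庭住址"], updated["姓名"])
```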

10. Batch retrieval (_mget)

# Full form: give the index for each document
curl -XGET http://localhost:9200/_mget -d'
{
   "docs" : [
      {
         "_index" : "myindex",
         "_type" :  "user",
         "_id" :    "u001"
      },
      {
         "_index" : "myindex",
         "_type" :  "user",
         "_id" :    "u002",
         "_source": "家庭住址"
      }
   ]
}'

# All documents in the same index
curl -XGET http://localhost:9200/myindex/_mget -d'
{
   "docs" : [
      {  "_type" : "user", "_id" :   "u001"},
      { "_type" : "user", "_id" :   "u002" }
   ]
}'

# All documents in the same index and type
curl -XGET http://localhost:9200/myindex/user/_mget -d'
{
   "ids" : [ "u001", "u002" ]
}'
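The three _mget forms differ only in how much the URL fixes in advance. The Python sketch below mirrors the three requests above: given a list of (index, type, id) references, it picks the most compact form. build_mget_request is a hypothetical helper, and the per-document _source option is not modeled.

```python
def build_mget_request(host, refs):
    """Return (url, body) for _mget, given (index, type, id) triples.

    Same index and type -> /index/type/_mget with {"ids": [...]};
    same index only     -> /index/_mget with _type/_id per doc;
    otherwise           -> /_mget with _index/_type/_id per doc.
    """
    refs = list(refs)
    if not refs:
        raise ValueError("need at least one document reference")
    indices = {index for index, _, _ in refs}
    types = {doc_type for _, doc_type, _ in refs}
    if len(indices) == 1 and len(types) == 1:
        index, doc_type, _ = refs[0]
        return f"{host}/{index}/{doc_type}/_mget", {"ids": [i for _, _, i in refs]}
    if len(indices) == 1:
        return f"{host}/{refs[0][0]}/_mget", {
            "docs": [{"_type": t, "_id": i} for _, t, i in refs]
        }
    return f"{host}/_mget", {
        "docs": [{"_index": x, "_type": t, "_id": i} for x, t, i in refs]
    }

url, body = build_mget_request(
    "http://localhost:9200",
    [("myindex", "user", "u001"), ("myindex", "user", "u002")],
)
print(url, body)
```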

10 types of programmers you’ll encounter in the field


by Justin James

Programmers enjoy a reputation for being peculiar people. In fact, even within the development community, there are certain programmer archetypes that other programmers find strange. Here are 10 types of programmers you are likely to run across.

#1: Gandalf
This programmer type looks like a short-list candidate to play Gandalf in The Lord of the Rings. He (or even she!) has a beard halfway to his knees, a goofy-looking hat, and may wear a cape or a cloak in the winter. Luckily for the team, this person is just as adept at working magic as Gandalf. Unluckily for the team, they will need to endure hours of stories from Gandalf about how he or she had to walk uphill both ways in the snow to drop off the punch cards at the computer room. The Gandalf type is your heaviest hitter, but you try to leave them in the rear and call them up only in times of desperation.

#2: The Martyr
In any other profession, The Martyr is simply a “workaholic.” But in the development field, The Martyr goes beyond that and into another dimension. Workaholics at least go home to shower and sleep. The Martyr takes pride in sleeping at the desk amidst empty pizza boxes. The problem is, no one ever asked The Martyr to work like this. And he or she tries to guilt-trip the rest of the team with phrases like, “Yeah, go home and enjoy dinner. I’ll finish up the next three weeks’ worth of code tonight.”

#3: Fanboy
Watch out for Fanboy. If he or she corners you, you’re in for a three-hour lecture about the superiority of Dragonball Z compared to Gundam Wing, or why the Playstation 3 is better than the XB 360. Fanboy’s workspace is filled with posters, action figures, and other knick-knacks related to some obsession, most likely imported from Japan. Not only are Fanboys obnoxious to deal with, they often put so much time into the obsession (both in and out of the office) that they have no clue when it comes to doing what they were hired to do.

#4: Vince Neil
This 40-something is a throwback to 1984 in all of the wrong ways. Sporting big hair, ripped stonewashed jeans, and a bandana here or there, Vince sits in the office humming Bon Jovi and Def Leppard tunes throughout the workday. This would not be so bad if “Pour Some Sugar on Me” was not so darned infectious.

Vince is generally a fun person to work with, and actually has a ton of experience, but just never grew up. But Vince becomes a hassle when he or she tries living the rock ‘n roll lifestyle to go with the hair and hi-tops. It’s fairly hard to work with someone who carries a hangover to work every day.

#5: The Ninja
The Ninja is your team’s MVP, and no one knows it. Like the legendary assassins, you do not know that The Ninja is even in the building or working, but you discover the evidence in the morning. You fire up the source control system and see that at 4 AM, The Ninja checked in code that addresses the problem you planned to spend all week working on, and you did not even know that The Ninja was aware of the project! See, while you were in Yet Another Meeting, The Ninja was working.

Ninjas are so stealthy, you might not even know their name, but you know that every project they’re on seems to go much more smoothly. Tread carefully, though. The Ninja is a lone warrior; don’t try to force him or her to work with rank and file.

#6: The Theoretician
The Theoretician knows everything there is to know about programming. He or she can spend four hours lecturing about the history of an obscure programming language or providing a proof of how the code you wrote is less than perfectly optimal and may take an extra three nanoseconds to run. The problem is, The Theoretician does not know a thing about software development. When The Theoretician writes code, it is so “elegant” that mere mortals cannot make sense of it. His or her favorite technique is recursion, and every block of code is tweaked to the max, at the expense of timelines and readability.

The Theoretician is also easily distracted. A simple task that should take an hour takes Theoreticians three months, since they decide that the existing tools are not sufficient and they must build new tools to build new libraries to build a whole new system that meets their high standards. The Theoretician can be turned into one of your best players, if you can get him or her to play within the boundaries of the project itself and stop spending time working on The Ultimate Sorting Algorithm.

#7: The Code Cowboy
The Code Cowboy is a force of nature that cannot be stopped. He or she is almost always a great programmer and can do work two or three times faster than anyone else. The problem is, at least half of that speed comes by cutting corners. The Code Cowboy feels that checking code into source control takes too long, storing configuration data outside of the code itself takes too long, communicating with anyone else takes too long… you get the idea.

The Code Cowboy’s code is a spaghetti code mess, because he or she was working so quickly that the needed refactoring never happened. Chances are, seven pages’ worth of core functionality looks like the “don’t do this” example of a programming textbook, but it magically works. The Code Cowboy definitely does not play well with others. And if you put two Code Cowboys on the same project, it is guaranteed to fail, as they trample on each other’s changes and shoot each other in the foot.

Put a Code Cowboy on a project where hitting the deadline is more important than doing it right, and the code will be done just before deadline every time. The Code Cowboy is really just a loud, boisterous version of The Ninja. While The Ninja executes with surgical precision, The Code Cowboy is a raging bull and will gore anything that gets in the way.

#8: The Paratrooper
You know those movies where a sole commando is air-dropped deep behind enemy lines and comes out with the secret battle plans? That person in a software development shop is The Paratrooper. The Paratrooper is the last resort programmer you send in to save a dying project. Paratroopers lack the patience to work on a long-term assignment, but their best asset is an uncanny ability to learn an unfamiliar codebase and work within it. Other programmers might take weeks or months to learn enough about a project to effectively work on it; The Paratrooper takes hours or days. Paratroopers might not learn enough to work on the core of the code, but the lack of ramp-up time means that they can succeed where an entire team might fail.

#9: Mediocre Man
“Good enough” is the best you will ever get from Mediocre Man. Don’t let the name fool you; there are female varieties of Mediocre Man too. And he or she always takes longer to produce worse code than anyone else on the team. “Slow and steady barely finishes the race” could describe Mediocre Man’s projects. But Mediocre Man is always just “good enough” to remain employed.

When you interview this type, they can tell you a lot about the projects they’ve been involved with but not much about their actual involvement. Filtering out the Mediocre Man type is fairly easy: Ask for actual details of the work they’ve done, and they suddenly get a case of amnesia. Let them into your organization, though, and it might take years to get rid of them.

#10: The Evangelist
No matter what kind of environment you have, The Evangelist insists that it can be improved by throwing away all of your tools and processes and replacing them with something else. The Evangelist is actually the opposite of The Theoretician. The Evangelist is outspoken, knows an awful lot about software development, but performs very little actual programming.

Original article:
http://www.techrepublic.com

10 Types of Programmers You Will Meet


Author: Justin James
Translator: unknown (found online; sorry, I could not track down the exact source)

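Programmers have long been regarded as a peculiar group. In fact, even within the developer community itself, there are certain kinds of people whom other programmers find strange. Here I list 10 kinds of programmers you may have run into. Can you think of more?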

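#1: Gandalf
This kind of programmer looks like the ideal candidate to play Gandalf in The Lord of the Rings. He (or even she) has a beard that nearly reaches the knees, a silly-looking hat, and in winter may wear a cape or a cloak. Fortunately for the team, this person is as skilled at the job as Gandalf. Unfortunately, the team often has to endure hours of Gandalf's stories, mostly about how he or she had to trudge uphill and downhill through the snow to deliver punched tape to the computer room. The Gandalf type is your ultimate weapon, but you always hope to keep them in the rear and turn to them only when you are close to despair.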

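#2: The Martyr
In any other profession, The Martyr would simply be a workaholic. But in the developer world, The Martyr enters another realm entirely. Workaholics at least go home to shower and sleep, whereas Martyrs take pride in sleeping under the desk amid piles of empty pizza boxes. The problem is that nobody asked them to work like this. And he or she always tries to make the rest of the team feel guilty with lines like, "Sure, you all go home and have dinner. I'll get three weeks' worth of work done tonight."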

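#3: The Gamer
Watch out for The Gamer. If he or she notices you, you are likely in for a three-to-four-hour lecture on whether Dragon Ball Z beats Gundam, or whether the PlayStation 3 is better than the Xbox 360. The Gamer's desk is always piled with postcards, action figures, and all kinds of related knick-knacks, most of them probably imported from Japan. Not only are Gamers hard to get along with, they sometimes spend so much time on these things (in and out of the office) that they have no idea when to do the work they were hired for.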

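#4: Vince Neil (a fairly well-known rock singer)
This 40-something seems to have tumbled back to 1984: sporty big hair, wrinkled and faded jeans, and a large bandana. Vince also sits in the office during working hours humming Bon Jovi and Def Leppard songs, which would not be so bad if "Pour Some Sugar on Me" were not so catchy.

Overall, Vince is fun to work with and actually has plenty of experience; he just never grew up. But things get difficult when Vince, he or she, decides to carry the rock style beyond the hair and into daily life, because working with someone who arrives hungover every day is hard.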

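#5: The Ninja
The Ninja is a key player on your team, yet nobody realizes it. As with a legendary assassin, you never know when The Ninja works, but you always find the results the next morning. You hurry to open the source control system and discover that at 4 AM The Ninja committed code solving a problem you had been working on for a week, and you did not even know The Ninja was aware your project existed. You see, while you were sitting through one meeting after another, The Ninja was working.

Ninjas are so stealthy that you may not even know their names, but you know that every project they join goes more smoothly. Be careful, though: The Ninja is a lone warrior, so do not try to force them to work under a rigid hierarchy and documentation regime.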

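#6: The Theoretician
The Theoretician knows everything there is to know about programming. He or she can spend four hours discussing an obscure language, or proving how imperfect your code is and that it might take an extra three nanoseconds to run. The problem is that The Theoretician has no idea what software development actually involves. When The Theoretician writes code, it is so "elegant" that we mere mortals cannot understand it. His or her favorite technique is recursion, and every block of code is pushed to the extreme, at the cost of schedule and readability.

The Theoretician is also easily distracted. A task that should take an hour often takes a Theoretician three months, because they decide the current development tools are not good enough and must build new tools to build new libraries to build a whole new system that meets their high standards. The Theoretician can become your best team member, provided you can get them to focus on the project itself instead of spending all their time on the Ultimate Sorting Algorithm.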

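#7: The Code Cowboy
The Code Cowboy is a force of nature that cannot be stopped. He or she is almost always a strong programmer and can finish work two to three times faster than anyone else. The problem is that at least half of that speed comes from cutting corners. The Code Cowboy thinks committing code to source control is too much trouble, storing configuration outside the code is too much trouble, talking to other people is too much trouble... you get the idea.

The Code Cowboy's code is tangled like spaghetti, because he or she works so fast that the necessary refactoring never happens. Quite possibly, seven pages of core functionality look like a textbook's "don't do this" example, yet the code somehow still works. The Code Cowboy simply cannot work with others. And if you put two Code Cowboys on the same project, the project is bound to fail: each is constantly disrupted by the other's changes, and they keep shooting each other in the foot.

When finishing on time matters more than doing the job well, add a Code Cowboy to the project, and it will be done before the deadline. The Code Cowboy is really just a noisy version of The Ninja: The Ninja codes with surgical precision, while The Code Cowboy is like an uncontrollable bull that knocks over everything in its path.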

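#8: The Paratrooper
You know those movies where a commando is airdropped behind enemy lines and comes back with the secret battle plans? In software development, that person is The Paratrooper. The Paratrooper is your last resort for a project that is about to fail. Paratroopers lack the patience for long-term assignments; their greatest value is an astonishing ability to quickly learn a pile of completely unfamiliar code and work with it. Other programmers may need weeks or even months to get familiar enough with a project to contribute effectively; Paratroopers need only hours or days. What they learn so quickly may not be enough to write core code, but needing so little time to get up to speed lets them succeed where the whole team has failed.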

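#9: Mediocre Man
"Good enough" is the best you will ever hear from Mediocre Man. He or she always spends more time producing worse code than anyone else on the team. "Slow, and just barely meets the requirements" describes Mediocre Man's projects. Yet Mediocre Man always does things just "well enough" to avoid getting fired.

When you interview this type, they can tell you a lot about the projects they were part of, but say little about what they actually did on them. Screening out Mediocre Men is easy: ask for details of their work, and they suddenly come down with amnesia. But once you let such a person into your organization, it may take years to get rid of them.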

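#10: The Evangelist
Whatever programming environment you use, The Evangelist insists that throwing away your current tools and processes and replacing them with something else would help you enormously. The Evangelist is actually the opposite of The Theoretician: outspoken and knowledgeable about software development, but rarely writing any code.

The Evangelist has the heart of a project manager or department manager, but lacks the knowledge or experience to make that leap. So until The Evangelist finally moves into a pure management role, everyone else has to put up with their attempts to overhaul the work environment completely.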

Curve Fitting in R

1. Linear fitting

# Generate test data
x = seq(-5,5,0.1)
y = 3*x^2+6*x+9+rnorm(length(x))*3;

# Wrap x^2 in I() so the formula treats it as a single predictor,
# then fit the linear model
z=lm(y~I(x^2)+x)

# Plot the data points
# and the fitted curve
plot(x,y)
lines(x,fitted(z))
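lm() finds the coefficients that minimize the sum of squared residuals. For comparison, here is a Python sketch (standard library only) that fits the same model, y = b0 + b1*x + b2*x^2. It solves the normal equations X^T X b = X^T y.

This is a swapped-in method: R's lm() uses a QR decomposition, which is numerically more robust. Note also that lm(y~I(x^2)+x) reports its coefficients in a different order: (Intercept), I(x^2), x.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares coefficients (b0, b1, b2) of y = b0 + b1*x + b2*x^2."""
    rows = [(1.0, x, x * x) for x in xs]  # rows of the design matrix X
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

# Same grid as seq(-5, 5, 0.1), without noise so the answer is known
xs = [-5 + 0.1 * i for i in range(101)]
ys = [3 * x**2 + 6 * x + 9 for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
fitted = [b0 + b1 * x + b2 * x**2 for x in xs]  # counterpart of fitted(z)
print(b0, b1, b2)
```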

2. Local polynomial regression fitting (loess)

# Generate test data
x = seq(-5,5,0.1)
y = 3*x^2+6*x+9+rnorm(length(x))*3;

# Local polynomial regression fit; predict() returns the fitted values
z=predict(loess(y~x))

# Plot the data points
# and the fitted curve
plot(x,y)
lines(x,z)

# lowess() likewise performs locally weighted polynomial regression
#z1=lowess(x,y)
#lines(z1)
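loess() fits a weighted regression in a moving neighbourhood around each point. The Python sketch below is a simplified version, built on these assumptions:
- a local straight line (degree 1, as in lowess);
- tricube weights over the nearest span*n points;
- no robustness iterations.

R's loess() defaults to span = 0.75 and degree = 2, so its curve will differ somewhat.

```python
import math

def local_linear_smooth(xs, ys, span=0.75):
    """Simplified loess/lowess: at each x0, fit a weighted straight line.

    Uses the nearest ceil(span * n) points with tricube weights
    (1 - (d/h)^3)^3, where h is the distance to the farthest of them.
    Degree 1 and no robustness iterations (simplifying assumptions).
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        if h == 0:  # degenerate: k or more points sit exactly at x0
            same = [y for x, y in zip(xs, ys) if x == x0]
            fitted.append(sum(same) / len(same))
            continue
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted

xs = [-5 + 0.1 * i for i in range(101)]
smoothed = local_linear_smooth(xs, [3 * x**2 + 6 * x + 9 for x in xs])
```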

3. Nonlinear least-squares fitting

# Generate test data
x = seq(-5,5,0.1)
y = 3*x^2+6*x+9+rnorm(length(x))*3
ds <- data.frame(x=x, y=y)

# Nonlinear least-squares fit
f=function(x, a, b, c) {a+b*x+c*x^2}
z=nls(y~f(x, a, b, c), data=ds, start=list(a=9, b=6, c=3))

# Print the fit summary
summary(z);

# Plot the data points
# and the fitted curve
plot(x,y)
lines(x,fitted(z))
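nls() uses the Gauss-Newton algorithm by default. Each iteration does three things:
- linearize the model around the current parameters;
- solve a linear least-squares problem for the step;
- update the parameters and repeat.

The Python sketch below implements that loop with a forward-difference Jacobian; the differencing scheme is an assumption of this sketch. Because this particular model is linear in a, b and c, it converges within a few steps. For genuinely nonlinear models, good start values matter, as they do for nls().

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gauss_newton(model, start, xs, ys, max_iter=50, tol=1e-10, h=1e-6):
    """Least-squares fit of ys ~ model(x, *params) by Gauss-Newton iterations."""
    p = list(start)
    for _ in range(max_iter):
        residuals = [y - model(x, *p) for x, y in zip(xs, ys)]
        # Forward-difference Jacobian: d model / d p_j at each x
        jac = []
        for x in xs:
            base = model(x, *p)
            row = []
            for j in range(len(p)):
                shifted = list(p)
                shifted[j] += h
                row.append((model(x, *shifted) - base) / h)
            jac.append(row)
        k = len(p)
        jtj = [[sum(r[i] * r[j] for r in jac) for j in range(k)] for i in range(k)]
        jtr = [sum(r[i] * e for r, e in zip(jac, residuals)) for i in range(k)]
        step = solve(jtj, jtr)  # linearized least-squares step
        p = [pi + si for pi, si in zip(p, step)]
        if max(abs(s) for s in step) < tol:
            break
    return p

def f(x, a, b, c):
    return a + b * x + c * x ** 2

xs = [-5 + 0.1 * i for i in range(101)]
ys = [f(x, 9, 6, 3) for x in xs]  # noise-free data so the answer is known
a, b, c = gauss_newton(f, [1.0, 1.0, 1.0], xs, ys)
print(a, b, c)
```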