Elasticsearch Facets


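Although the official documentation stresses that facets will be removed from Elasticsearch in a future release and recommends aggregations instead, I still use facets at work. While reading "Mastering Elasticsearch" I came across an introduction to facets that was practical and easy to follow, so I translated part of it here for anyone who needs it.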

 

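When using the Elasticsearch faceting mechanism, keep one thing in mind: facet results are computed only over the query results. If you add a filter outside the query element, that filter does not narrow the set of documents the facet counts.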

Consider an example.

First, index a few documents into the books index with the following commands:

curl -XPUT 'localhost:9200/books/book/1' -d '{
"id":"1", "title":"Test book 1", "category":"book",
"price":29.99

}'

curl -XPUT 'localhost:9200/books/book/2' -d '{
"id":"2", "title":"Test book 2", "category":"book",
"price":39.99

}'

curl -XPUT 'localhost:9200/books/book/3' -d '{
"id":"3", "title":"Test comic 1","category":"comic",
"price":11.99

}'

curl -XPUT 'localhost:9200/books/book/4' -d '{
"id":"4", "title":"Test comic 2","category":"comic",
"price":15.99

}'

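Let's see how faceting behaves when a query and a filter are used together. We run a simple query that returns every document in the books index. We also add a filter that restricts the results to the book category, and a range facet on the price field to see how many documents are priced below 30 and how many at 30 or above. The full request looks like this: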

{

    "query": {

       "match_all": {}

    },

    "filter": {

        "term": {

           "category": "book"

        }

    },

    "facets": {

        "price": {

           "range": {

               "field": "price",

               "ranges": [

                    {

                       "to": 30

                    },

                    {

                       "from": 30

                    }

                ]

            }

        }

    }

}

Executing it returns the following result:

{

"hits":{

        "total":2,

       "max_score": 1.0,

        "hits": [

            {

                "_index": "books",

               "_type": "book",

               "_id": "1",

               "_score": 1.0,

               "_source": {

                   "id": "1",

                   "title": "Test book 1",

                   "category": "book",

                   "price": 29.99

                }

            },

            {

               "_index": "books",

               "_type": "book",

               "_id": "2",

               "_score": 1.0,

                "_source": {

                   "id": "2",

                   "title": "Test book 2",

                   "category": "book",

                   "price": 39.99

                }

            }

        ]

    },

    "facets": {

        "price": {

           "_type": "range",

           "ranges": [

                {

                   "to": 30.0,

                   "count": 3,

                   "min": 11.99,

                   "max": 29.99,

                   "total_count": 3,

                   "total": 57.97,

                   "mean": 19.323333333333334

                },

                {

                   "from": 30.0,

                   "count": 1,

                   "min": 39.99,

                   "max": 39.99,

                    "total_count": 1,

                   "total": 39.99,

                   "mean": 39.99

                }

            ]

        }

    }

}
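Before interpreting this response, the numbers in the price facet can be checked by hand. The sketch below is plain Python, not Elasticsearch code; the helper name range_stats is made up for illustration, and it assumes that "from" is inclusive and "to" is exclusive (no document here is priced at exactly 30, so that assumption does not affect the counts).

```python
# Prices of the four documents indexed above (ids 1 to 4).
prices = [29.99, 39.99, 11.99, 15.99]


def range_stats(values, lower=None, upper=None):
    """Summarise the values v with lower <= v < upper (either bound may be None)."""
    bucket = [
        v for v in values
        if (lower is None or v >= lower) and (upper is None or v < upper)
    ]
    stats = {"count": len(bucket), "total": sum(bucket)}
    if bucket:
        stats["min"] = min(bucket)
        stats["max"] = max(bucket)
        stats["mean"] = sum(bucket) / len(bucket)
    return stats


below_30 = range_stats(prices, upper=30)
from_30 = range_stats(prices, lower=30)
print(below_30)
print(from_30)
```

The first bucket only reaches a count of 3 and a minimum of 11.99 if the two comics are included, even though neither of them appears in the hits.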

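The response shows that, even though the filter keeps only documents whose category field is book, the facet was not computed over just those documents. It was computed over every document in the books index, because that is what the match_all query matches. In other words, faceting ignores the filter when it computes its counts. But what if the filter is part of the query, as in a filtered query? Here is the next example: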

{

    "query": {

       "filtered": {

           "query": {

               "match_all": {}

            },

           "filter": {

               "term": {

                   "category": "book"

                }

            }

        }

    },

    "facets": {

        "price": {

           "range": {

               "field": "price",

               "ranges": [

                    {

                       "to": 30

                    },

                    {

                       "from": 30

                    }

                ]

            }

        }

    }

}

Response:

{

...

"hits":{

        "total": 2,

       "max_score": 1.0,

        "hits": [

            {

               "_index": "books",

               "_type": "book",

               "_id": "1",

               "_score": 1.0,

               "_source": {

                   "id": "1",

                   "title": "Test book 1",

                   "category": "book",

                   "price": 29.99

                }

            },

            {

               "_index": "books",

               "_type": "book",

               "_id": "2",

               "_score": 1.0,

               "_source": {

                   "id": "2",

                   "title": "Test book 2",

                   "category": "book",

                   "price": 39.99

                }

            }

        ]

    },

    "facets": {

        "price": {

           "_type": "range",

           "ranges": [

                {

                   "to": 30.0,

                   "count": 1,

                   "min": 29.99,

                   "max": 29.99,

                   "total_count": 1,

                   "total": 29.99,

                   "mean": 29.99

                },

                {

                   "from": 30.0,

                   "count": 1,

                    "min": 39.99,

                   "max": 39.99,

                   "total_count": 1,

                   "total": 39.99,

                   "mean": 39.99

                }

            ]

        }

    }

}

The response shows that this time the filter does limit the documents the facet is computed over.
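The difference between the two requests so far can be modelled in a few lines of Python. This is only a toy model of the scoping rule, not how Elasticsearch is implemented; the search helper and its top_level_filter parameter are hypothetical names chosen for this sketch.

```python
# Toy model (not Elasticsearch code): facets are counted over the query scope,
# while a top-level filter only trims the returned hits.
DOCS = [
    {"id": "1", "title": "Test book 1", "category": "book", "price": 29.99},
    {"id": "2", "title": "Test book 2", "category": "book", "price": 39.99},
    {"id": "3", "title": "Test comic 1", "category": "comic", "price": 11.99},
    {"id": "4", "title": "Test comic 2", "category": "comic", "price": 15.99},
]


def match_all(doc):
    return True


def is_book(doc):
    return doc["category"] == "book"


def price_facet(docs):
    """Count documents priced below 30 and at 30 or above."""
    return {
        "to_30": sum(1 for d in docs if d["price"] < 30),
        "from_30": sum(1 for d in docs if d["price"] >= 30),
    }


def search(docs, query, top_level_filter=None):
    """Hypothetical helper: returns (hit ids, price facet counts)."""
    query_scope = [d for d in docs if query(d)]
    facet = price_facet(query_scope)  # computed before the top-level filter
    hits = [
        d for d in query_scope
        if top_level_filter is None or top_level_filter(d)
    ]
    return [d["id"] for d in hits], facet


# Case 1: match_all query plus a top-level filter.
hits_top, facet_top = search(DOCS, match_all, top_level_filter=is_book)

# Case 2: the same filter moved inside the query (the "filtered" query).
hits_filtered, facet_filtered = search(
    DOCS, lambda d: match_all(d) and is_book(d)
)

print(hits_top, facet_top)
print(hits_filtered, facet_filtered)
```

Both cases return the same two hits. Only the facet counts differ, because only in the second case does the category restriction belong to the query scope.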

 

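Now suppose we want the facet computed only for books whose title field contains the term "2". We could add a second filter to the query, but that would also narrow the search results, which is not what we want. What we need instead is a facet filter.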

 

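The facet_filter element is placed at the same level as the facet type, and it restricts the documents over which the facet is computed. For example, to compute the facet only over documents whose title field contains the term "2", the request can be changed to: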

{

    "query": {

       "filtered": {

            "query": {

               "match_all": {

                   

                }

            },

           "filter": {

               "term": {

                   "category": "book"

                }

            }

        }

    },

    "facets": {

        "price": {

           "range": {

               "field": "price",

               "ranges": [

                    {

                       "to": 30

                    },

                    {

                       "from": 30

                    }

                ]

            },

           "facet_filter": {

               "term": {

                   "title": "2"

                }

            }

        }

    }

}

Response:

{

...

"hits":{

        "total":2,

       "max_score": 1.0,

        "hits": [

            {

               "_index": "books",

               "_type": "book",

               "_id": "1",

               "_score": 1.0,

               "_source": {

                   "id": "1",

                   "title": "Test book 1",

                   "category": "book",

                   "price": 29.99

                }

            },

            {

               "_index": "books",

               "_type": "book",

               "_id": "2",

               "_score": 1.0,

                "_source": {

                   "id": "2",

                   "title": "Test book 2",

                   "category": "book",

                   "price": 39.99

                }

            }

        ]

    },

    "facets": {

        "price": {

           "_type": "range",

           "ranges": [

                {

                   "to": 30.0,

                   "count": 0,

                   "total_count": 0,

                   "total": 0.0,

                   "mean": 0.0

                },

                {

                   "from": 30.0,

                   "count": 1,

                   "min": 39.99,

                   "max": 39.99,

                   "total_count": 1,

                   "total": 39.99,

                    "mean": 39.99

                }

            ]

        }

    }

}

As shown above, the facet is now computed over a single document, while the query results are unchanged.
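The effect of facet_filter can be sketched the same way. Again this is a toy Python model rather than Elasticsearch code: the search helper is hypothetical, and title_has_2 assumes the title was split into whitespace-separated terms when it was indexed.

```python
# Toy model (not Elasticsearch code): a facet filter narrows only the
# documents the facet sees, never the hits.
DOCS = [
    {"id": "1", "title": "Test book 1", "category": "book", "price": 29.99},
    {"id": "2", "title": "Test book 2", "category": "book", "price": 39.99},
    {"id": "3", "title": "Test comic 1", "category": "comic", "price": 11.99},
    {"id": "4", "title": "Test comic 2", "category": "comic", "price": 15.99},
]


def is_book(doc):
    return doc["category"] == "book"


def title_has_2(doc):
    # Assumption: the title is matched term by term, split on whitespace.
    return "2" in doc["title"].split()


def price_facet(docs):
    """Count documents priced below 30 and at 30 or above."""
    return {
        "to_30": sum(1 for d in docs if d["price"] < 30),
        "from_30": sum(1 for d in docs if d["price"] >= 30),
    }


def search(docs, query, facet_filter=None):
    """Hypothetical helper: returns (hit ids, price facet counts)."""
    query_scope = [d for d in docs if query(d)]
    facet_scope = [
        d for d in query_scope
        if facet_filter is None or facet_filter(d)
    ]
    return [d["id"] for d in query_scope], price_facet(facet_scope)


hits, facet = search(DOCS, is_book, facet_filter=title_has_2)
print(hits, facet)
```

"Test comic 2" also contains the term "2", but it never reaches the facet because the query has already excluded it. The facet filter narrows the query scope further; it does not widen it.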

 

Now suppose we want to query for all documents whose category field is "book", but compute the facet over every document in the index. How do we do that?

Here is the request:

{

   "query": {

       "term": {

           "category": "book"

       }

   },

   "facets": {

       "price": {

           "range": {

                "field":"price",

                "ranges": [

                    {

                        "to": 30

                    },

                    {

                        "from": 30

                    }

                ]

           },

           "global": true

       }

    }

}

Response:

{

...

"hits":{

        "total":2,

       "max_score": 0.30685282,

        "hits": [

            {

               "_index": "books",

               "_type": "book",

               "_id": "1",

               "_score": 0.30685282,

                "_source": {

                   "id": "1",

                   "title": "Test book 1",

                   "category": "book",

                   "price": 29.99

                }

            },

            {

               "_index": "books",

               "_type": "book",

               "_id": "2",

               "_score": 0.30685282,

               "_source": {

                   "id": "2",

                   "title": "Test book 2",

                   "category": "book",

                    "price": 39.99

                }

            }

        ]

    },

    "facets": {

        "price": {

           "_type": "range",

           "ranges": [

                {

                   "to": 30.0,

                   "count": 3,

                    "min": 11.99,

                   "max": 29.99,

                   "total_count": 3,

                   "total": 57.97,

                   "mean": 19.323333333333334

                },

                {

                   "from": 30.0,

                   "count": 1,

                   "min": 39.99,

                   "max": 39.99,

                   "total_count": 1,

                   "total": 39.99,

                   "mean": 39.99

                }

            ]

        }

    }

}

This is what the global option gives a facet: its scope becomes the whole index, regardless of the query.
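As a final sketch, the global option can be modelled as a switch that makes the facet ignore the query scope. This is a toy Python model under the same assumptions as before, and the global_facet parameter is a hypothetical name, not an Elasticsearch API.

```python
# Toy model (not Elasticsearch code): a global facet is counted over every
# document in the index, whatever the query matched.
DOCS = [
    {"id": "1", "title": "Test book 1", "category": "book", "price": 29.99},
    {"id": "2", "title": "Test book 2", "category": "book", "price": 39.99},
    {"id": "3", "title": "Test comic 1", "category": "comic", "price": 11.99},
    {"id": "4", "title": "Test comic 2", "category": "comic", "price": 15.99},
]


def is_book(doc):
    return doc["category"] == "book"


def price_facet(docs):
    """Count documents priced below 30 and at 30 or above."""
    return {
        "to_30": sum(1 for d in docs if d["price"] < 30),
        "from_30": sum(1 for d in docs if d["price"] >= 30),
    }


def search(docs, query, global_facet=False):
    """Hypothetical helper: returns (hit ids, price facet counts)."""
    query_scope = [d for d in docs if query(d)]
    facet_scope = docs if global_facet else query_scope
    return [d["id"] for d in query_scope], price_facet(facet_scope)


hits_global, facet_global = search(DOCS, is_book, global_facet=True)
hits_scoped, facet_scoped = search(DOCS, is_book)

print(hits_global, facet_global)
print(hits_scoped, facet_scoped)
```

With the switch on, the facet counts match the ones from the very first example, while the hits are still limited to the two books.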
