Django Haystack EdgeNgramField Giving Different Results Than Elasticsearch

I'm currently running Haystack with an Elasticsearch backend, and now I'm building an autocomplete for city names. The problem is that SearchQuerySet is giving me different resul

Solution 1:

After a deep look into the code I found that the search generated by haystack was:

{"query":{"filtered":{"filter":{"fquery":{"query":{"query_string":{"query":"django_ct:(csi.geoname)"}},"_cache":false}},"query":{"query_string":{"query":"name_auto:(mid)","default_operator":"or","default_field":"text","auto_generate_phrase_queries":true,"analyze_wildcard":true}}}},"from":0,"size":6}

Running this query directly in Elasticsearch returned the same 6 objects that Haystack was showing, but if I added the following to the "query_string":

"analyzer":"standard"

it worked as desired. So the idea was to be able to setup a different search analyzer for the field.
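
To illustrate why the search-time analyzer matters, here is a simplified pure-Python simulation. It is only a sketch: the helper names (`edge_ngrams`, `edgengram_analyze`, `standard_analyze`, `search`) and the sample city list are made up for illustration, and the tokenization only approximates what Elasticsearch does. When the query "mid" is itself cut into edge n-grams ("mi", "mid") and the terms are combined with "or", every name starting with "mi" matches.

```python
import re


def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the front edge n-grams of one token (same bounds as the settings)."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def edgengram_analyze(text):
    """Rough stand-in for edgengram_analyzer: lowercase, split, emit edge n-grams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {gram for token in tokens for gram in edge_ngrams(token)}


def standard_analyze(text):
    """Rough stand-in for a standard-style analyzer: lowercase whole words only."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, docs, search_analyzer):
    """Match a doc when any query term equals any indexed gram ("or" semantics)."""
    query_terms = search_analyzer(query)
    return [doc for doc in docs if query_terms & edgengram_analyze(doc)]


cities = ["Midland", "Middletown", "Milan", "Miami", "Minsk"]

# Query analyzed with edge n-grams: "mi" alone is enough to match everything.
loose = search("mid", cities, edgengram_analyze)

# Query kept as the single term "mid": only true prefix matches remain.
strict = search("mid", cities, standard_analyze)
```

In this toy model `loose` contains all five cities, while `strict` keeps only the names that really start with "mid".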

Based on the link in @user954994's answer and the explanation in this post, what I finally did to make it work was:

  1. I created my own custom Elasticsearch backend, adding a new custom analyzer based on the standard one.
  2. I added a custom EdgeNgramField that lets me set one analyzer for indexing (index_analyzer) and another for searching (search_analyzer).

So, my new settings are:

ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                },
                "suggest_analyzer": {
                    "type":"custom",
                    "tokenizer":"standard",
                    "filter":[
                        "standard",
                        "lowercase",
                        "asciifolding"
                    ]
                },
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15
                }
            }
        }
    }
}

My new custom build_schema method looks as follows:

def build_schema(self, fields):
    # Start from the mapping that Haystack generates by default.
    content_field_name, mapping = super(ConfigurableElasticBackend,
                                          self).build_schema(fields)

    for field_name, field_class in fields.items():
        field_mapping = mapping[field_class.index_fieldname]

        index_analyzer = getattr(field_class, 'index_analyzer', None)
        search_analyzer = getattr(field_class, 'search_analyzer', None)
        field_analyzer = getattr(field_class, 'analyzer', self.DEFAULT_ANALYZER)

        # Plain string fields (not facets, not n-gram fields) get the default analyzer.
        if field_mapping['type'] == 'string' and field_class.indexed:
            if not hasattr(field_class, 'facet_for') and field_class.field_type not in ('ngram', 'edge_ngram'):
                field_mapping['analyzer'] = field_analyzer

        # When both analyzers are given, use them instead of the single 'analyzer'.
        if index_analyzer and search_analyzer:
            field_mapping['index_analyzer'] = index_analyzer
            field_mapping['search_analyzer'] = search_analyzer
            # pop() avoids a KeyError if no 'analyzer' key was set for this field.
            field_mapping.pop('analyzer', None)

        mapping.update({field_class.index_fieldname: field_mapping})
    return (content_field_name, mapping)
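
The per-field step of that method can be tried in isolation. The following is a minimal sketch on plain dictionaries, with no Haystack import: `apply_split_analyzers` is a hypothetical helper name, the field is a `SimpleNamespace` stand-in rather than a real Haystack field, and the starting mapping for an edge n-gram field is an assumption about what the default backend produces.

```python
from types import SimpleNamespace


def apply_split_analyzers(field_mapping, field):
    """Return a copy of field_mapping with separate index/search analyzers applied."""
    updated = dict(field_mapping)
    index_analyzer = getattr(field, "index_analyzer", None)
    search_analyzer = getattr(field, "search_analyzer", None)

    # Only switch to the split form when both analyzers were configured.
    if index_analyzer and search_analyzer:
        updated["index_analyzer"] = index_analyzer
        updated["search_analyzer"] = search_analyzer
        updated.pop("analyzer", None)  # the single analyzer is no longer wanted
    return updated


# Assumed default mapping for an edge n-gram field (illustrative only).
base_mapping = {"type": "string", "analyzer": "edgengram_analyzer"}

# Stand-in for a field declared with both analyzers.
custom_field = SimpleNamespace(
    index_analyzer="edgengram_analyzer",
    search_analyzer="suggest_analyzer",
)

# Stand-in for an ordinary field with no analyzer overrides.
plain_field = SimpleNamespace()

split_mapping = apply_split_analyzers(base_mapping, custom_field)
untouched_mapping = apply_split_analyzers(base_mapping, plain_field)
```

Here `split_mapping` carries `index_analyzer` and `search_analyzer` and no longer has an `analyzer` key, while `untouched_mapping` is an unchanged copy of the input.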

After rebuilding the index, my mapping looks like this:

modelresult: {
   _boost: {
       name: "boost",
       null_value: 1
   },
   properties: {
       django_ct: {
           type: "string"
       },
       django_id: {
           type: "string"
       },
       name_auto: {
           type: "string",
           store: true,
           term_vector: "with_positions_offsets",
           index_analyzer: "edgengram_analyzer",
           search_analyzer: "suggest_analyzer"
       }
   }
}

Now everything is working as expected!

UPDATE:

Below you'll find the code to clarify this part:

  1. I created my own custom Elasticsearch backend, adding a new custom analyzer based on the standard one.
  2. I added a custom EdgeNgramField that lets me set one analyzer for indexing (index_analyzer) and another for searching (search_analyzer).

In my app's search_backends.py:

from django.conf import settings
from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend
from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine
from haystack.fields import EdgeNgramField as BaseEdgeNgramField


# Custom backend
class CustomElasticBackend(ElasticsearchSearchBackend):

    DEFAULT_ANALYZER = None

    def __init__(self, connection_alias, **connection_options):
        super(CustomElasticBackend, self).__init__(
                                connection_alias, **connection_options)
        # Let the Django settings override the index settings and default analyzer.
        user_settings = getattr(settings, 'ELASTICSEARCH_INDEX_SETTINGS', None)
        self.DEFAULT_ANALYZER = getattr(settings, 'ELASTICSEARCH_DEFAULT_ANALYZER', "snowball")
        if user_settings:
            setattr(self, 'DEFAULT_SETTINGS', user_settings)

    def build_schema(self, fields):
        content_field_name, mapping = super(CustomElasticBackend,
                                              self).build_schema(fields)

        for field_name, field_class in fields.items():
            field_mapping = mapping[field_class.index_fieldname]

            index_analyzer = getattr(field_class, 'index_analyzer', None)
            search_analyzer = getattr(field_class, 'search_analyzer', None)
            field_analyzer = getattr(field_class, 'analyzer', self.DEFAULT_ANALYZER)

            # Plain string fields (not facets, not n-gram fields) get the default analyzer.
            if field_mapping['type'] == 'string' and field_class.indexed:
                if not hasattr(field_class, 'facet_for') and field_class.field_type not in ('ngram', 'edge_ngram'):
                    field_mapping['analyzer'] = field_analyzer

            # When both analyzers are given, use them instead of the single 'analyzer'.
            if index_analyzer and search_analyzer:
                field_mapping['index_analyzer'] = index_analyzer
                field_mapping['search_analyzer'] = search_analyzer
                # pop() avoids a KeyError if no 'analyzer' key was set for this field.
                field_mapping.pop('analyzer', None)

            mapping.update({field_class.index_fieldname: field_mapping})
        return (content_field_name, mapping)


class CustomElasticSearchEngine(ElasticsearchSearchEngine):
    backend = CustomElasticBackend


# Custom field
class CustomFieldMixin(object):

    def __init__(self, **kwargs):
        # Remove the analyzer options before the base field sees the kwargs.
        self.analyzer = kwargs.pop('analyzer', None)
        self.index_analyzer = kwargs.pop('index_analyzer', None)
        self.search_analyzer = kwargs.pop('search_analyzer', None)
        super(CustomFieldMixin, self).__init__(**kwargs)


class CustomEdgeNgramField(CustomFieldMixin, BaseEdgeNgramField):
    pass
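
The order of the base classes in the custom field matters: the mixin has to come first so it can remove the analyzer options before the base field's constructor runs. Here is a small self-contained sketch of that pattern. `StrictBaseField`, `AnalyzerOptionsMixin`, `GoodField` and `BadField` are hypothetical stand-ins, not Haystack classes, and the assumption is that the real base field would reject keyword arguments it does not know.

```python
class StrictBaseField:
    """Stand-in for a base field whose constructor accepts only known kwargs."""

    def __init__(self, model_attr=None):
        self.model_attr = model_attr


class AnalyzerOptionsMixin:
    """Pops the analyzer options, then passes the remaining kwargs up the MRO."""

    def __init__(self, **kwargs):
        self.index_analyzer = kwargs.pop("index_analyzer", None)
        self.search_analyzer = kwargs.pop("search_analyzer", None)
        super().__init__(**kwargs)


class GoodField(AnalyzerOptionsMixin, StrictBaseField):
    """Mixin first: its __init__ runs before the strict base constructor."""


class BadField(StrictBaseField, AnalyzerOptionsMixin):
    """Base first: the strict constructor receives the unknown kwargs."""


good = GoodField(
    model_attr="name",
    index_analyzer="edgengram_analyzer",
    search_analyzer="suggest_analyzer",
)

try:
    BadField(model_attr="name", index_analyzer="edgengram_analyzer")
    bad_order_failed = False
except TypeError:
    # The strict base constructor rejected the unexpected keyword argument.
    bad_order_failed = True
```

With the mixin listed first, `good` ends up with both analyzer attributes set; with the order reversed, construction fails with a `TypeError` in this sketch.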

My index definition goes something like:

from haystack import indexes
from my_app.search_backends import CustomEdgeNgramField

class MyIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name_auto = CustomEdgeNgramField(model_attr='name', index_analyzer="edgengram_analyzer", search_analyzer="suggest_analyzer")

Finally, the settings point the Haystack connection at the custom backend:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'my_app.search_backends.CustomElasticSearchEngine',
        'URL': 'http://localhost:9200',
        'INDEX_NAME': 'index'
    },
}
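
With the backend configured, the autocomplete query itself could look roughly like the sketch below. `suggest_cities` is a hypothetical helper. It assumes the queryset passed in behaves like Haystack's `SearchQuerySet` (an `autocomplete()` method and slicing) and that each result exposes the stored `name_auto` value as an attribute. The minimum length of 2 mirrors the `min_gram` of the edge n-gram filter, and the limit of 6 mirrors the `size` in the generated query.

```python
MIN_GRAM = 2  # same as min_gram of the haystack_edgengram filter


def suggest_cities(term, queryset, limit=6):
    """Return up to `limit` indexed names matching the typed prefix."""
    term = term.strip()
    # Prefixes shorter than min_gram were never indexed, so skip the search.
    if len(term) < MIN_GRAM:
        return []
    results = queryset.autocomplete(name_auto=term)[:limit]
    # Assumption: the stored field value is available as an attribute.
    return [result.name_auto for result in results]
```

In a view this would be called with a real queryset, for example `suggest_cities("mid", SearchQuerySet())` after `from haystack.query import SearchQuerySet`.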

Solution 2:

I had a similar problem, and my strategy was to write a custom backend.

The complete instructions can be found at:

http://www.wellfireinteractive.com/blog/custom-haystack-elasticsearch-backend/

It works for me!

Hope this helps.
