- Galih Laras Prakoso
I've spent a few weeks learning about Search Relevancy and trying to become a Relevance Engineer. Part of that effort has been reading "Relevant Search: With applications for Solr and Elasticsearch" by Doug Turnbull and John Berryman. The book explains that search relevancy is not only about customers: we are serving two masters, customers and the business. Because business users have their own goals, such as increasing sales and revenue, we should expect that we will sometimes need to improve business metrics, for example by boosting highly profitable items to the top of the search engine results page.
Relevance is the practice of improving search results for users by satisfying their information needs in the context of a particular user experience while balancing how ranking impacts our business's needs. - Relevant Search, Doug Turnbull, John Berryman
Today, I want to put my theoretical knowledge into practice with a use case I know well: an e-groceries app (I've been building e-groceries apps for more than four years, so it's the closest use case for me to start with). Let's start by analyzing the problems.
Sometimes there seem to be no urgent issues raised by business users or customers. Everything looks fine, so we just continue our daily work without looking for new solutions, because everybody seems to be happy! That's how I felt before I learned about Search Relevancy and started practicing to become a Relevance Engineer.
My previous perspective wasn't able to see these problems. Luckily, I decided to broaden it, and after a deep-dive analysis of an e-groceries platform over the last two weeks, I found several problems that can be grouped into three perspectives:
- Customer
- Product and Business
- Technical
I will try to explain the problems one by one for each perspective.
Customer
To understand the problems customers face when using our search engine, the book recommends a top-down approach rather than a bottom-up one. If we work bottom-up, starting from our source data model, we easily get stuck.
The source data model keeps you stuck in a bottom-up, source-data-model-first view of the searchable data. To undergo full signal modeling, you need to think top-down and user first. - Relevant Search, Doug Turnbull, John Berryman
A top-down approach means taking the customer's perspective first. We might start by asking these questions:
- What do customers care about when searching?
- How do customers intuitively expect products to be ordered in the SERP (Search Engine Result Page)?
- How do they translate what they have in mind into a search query?
Answering these questions gives us at least a better understanding of our customers.
What do customers care about when searching?
On an e-groceries platform, people are usually straightforward about what they want to find. For example, when they search for "apple," they expect to see apples (the fruit) at the top of the search results, perhaps in the first two rows. Similarly, when they search for "chicken," they expect to see actual chicken in the first two or three rows of the SERP.
Let's say this is the list of items the existing search engine returns for the query "apple":
- Imported Red Apple
- Premium Red Apple
- Envy Apple
As shown above, customers can easily find what they are looking for with a single-keyword query; there seem to be no issues with the search results in this case.
In this section, we will focus solely on matching; ordering is discussed in the following sections.
However, let's consider a two-keyword query such as "red apple", and say this is the list of items returned for it:
- Imported Red Apple
- Imported Red Plum
- Red Onion
- Red Tomato
- Red Dragon Fruit
- Red Chili
Why are red onions, red chilies, red dragon fruit, and red tomatoes appearing in the SERP? We might easily assume that this is because of the keyword "red". Our search engine appears to be "confused" and unable to understand the context of the customer's query.
It is clear that when customers search for "red apple," they do not care about anything else that happens to be "red". In this example, our search engine fails to satisfy the customer's needs because it does not stay within the context of the search.
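A minimal sketch can make the likely cause concrete. It assumes, as suspected above, that the engine returns any product sharing at least one term with the query (OR semantics); Elasticsearch's `match` query, for example, uses `or` as its default `operator`. The whitespace tokenizer and the function names are hypothetical simplifications, not the engine's real analyzer.

```python
# Toy catalogue: the "red apple" results listed above.
PRODUCTS = [
    "Imported Red Apple",
    "Imported Red Plum",
    "Red Onion",
    "Red Tomato",
    "Red Dragon Fruit",
    "Red Chili",
]


def tokenize(text: str) -> set[str]:
    """Naive analyzer (hypothetical): lowercase and split on whitespace."""
    return set(text.lower().split())


def match_any(query: str, products: list[str]) -> list[str]:
    """OR semantics: keep products sharing at least one term with the query."""
    terms = tokenize(query)
    return [p for p in products if terms & tokenize(p)]


def match_all(query: str, products: list[str]) -> list[str]:
    """AND semantics: keep products containing every query term."""
    terms = tokenize(query)
    return [p for p in products if terms <= tokenize(p)]


print(match_any("red apple", PRODUCTS))  # all six products: each contains "red"
print(match_all("red apple", PRODUCTS))  # ['Imported Red Apple']
```

Requiring every term fixes this particular query, but it is blunt: `match_all("red apple 1kg", PRODUCTS)` returns an empty list because no product name contains "1kg", so it is a starting point for reasoning about matching rather than a finished solution.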
Some customers on an e-groceries app also use the search engine for more than finding items: they use it to compare prices. Because multiple brands might sell the same item, customers who want the cheapest option search for it and compare the prices in the results.
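The price-comparison behaviour can be sketched as filtering by the query and then sorting by price. This is only an illustration: the `Item` fields, brand names, and prices are hypothetical, and matching reuses a naive all-terms rule.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    brand: str
    price: int  # hypothetical price in the smallest currency unit


def cheapest_first(query: str, items: list[Item]) -> list[Item]:
    """Keep items whose name contains every query term, cheapest first."""
    terms = query.lower().split()
    matches = [i for i in items if all(t in i.name.lower().split() for t in terms)]
    return sorted(matches, key=lambda i: i.price)


# Hypothetical catalogue: the same product sold by two brands, plus a distractor.
catalogue = [
    Item("Shirataki Rice 5kg", "Brand A", 210_000),
    Item("Shirataki Rice 5kg", "Brand B", 185_000),
    Item("White Rice 5kg", "Brand C", 75_000),
]

for item in cheapest_first("shirataki rice", catalogue):
    print(item.brand, item.price)  # Brand B first, then Brand A
```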
How do customers intuitively expect the order of products to be displayed in SERP (Search Engine Result Page)?
Based on my observations of customer behaviour on the e-groceries platform, when customers search for "apple," they expect items to be ordered as follows:
- Primary Items [red apple, green apple]
- Secondary Items [apple juice, apple flavoured tea, hampers that contain apple]
- Tertiary Items [pear, some other items recommended by the recommendation algorithm]
The main idea here is context - customers want to see items that are relevant to their search context. Therefore, items should be ordered based on their context relevancy.
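One way to express this tiering, under stated assumptions, is to give every result a tier and sort by it. The rules here are hypothetical: a result whose name contains the (single-keyword) query and whose category matches a hand-written `QUERY_CATEGORY` entry is primary, any other name match is secondary, and everything else is tertiary. Because Python's `sorted` is stable, the engine's existing order is kept within each tier.

```python
from enum import IntEnum


class Tier(IntEnum):
    PRIMARY = 0    # the product itself (red apple, green apple)
    SECONDARY = 1  # products built around it (apple juice, apple-flavoured tea)
    TERTIARY = 2   # related items (pear, recommendations)


# Hypothetical mapping from a query term to the category of the "real" product.
QUERY_CATEGORY = {"apple": "Fruits"}


def assign_tier(query: str, name: str, category: str) -> Tier:
    """Name match in the expected category beats a name match elsewhere."""
    mentions_query = query.lower() in name.lower().split()
    if mentions_query and category == QUERY_CATEGORY.get(query.lower()):
        return Tier.PRIMARY
    if mentions_query:
        return Tier.SECONDARY
    return Tier.TERTIARY


def order_by_context(query: str, items: list[tuple[str, str]]) -> list[str]:
    """Stable sort by tier; items are (name, category) pairs."""
    ranked = sorted(items, key=lambda item: assign_tier(query, item[0], item[1]))
    return [name for name, _ in ranked]


results = [
    ("Apple Juice", "Beverages"),
    ("Green Pear", "Fruits"),
    ("Red Apple", "Fruits"),
    ("Apple Flavoured Tea", "Beverages"),
    ("Green Apple", "Fruits"),
]
print(order_by_context("apple", results))
# ['Red Apple', 'Green Apple', 'Apple Juice', 'Apple Flavoured Tea', 'Green Pear']
```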
But context relevancy is not the only thing customers care about. As we found in the preceding section, they also care about price and relatability, and they might want to be inspired by other product recommendations. To sum up, we can map what customers care about and how they expect products to be ordered into this table:
It's important to note that we are not discussing personalization or machine learning yet (which would address cases where a particular customer wants an item that seems irrelevant to the search context). In this article, we start from the foundation before exploring fancier solutions.
How do customers translate what they want in their mind into a search query?
This last question is also important for us to answer. Using a top-down approach, we should understand how our users translate what they want into a search query.
When searching for a product, our customers mostly use single-keyword or two-keyword queries, adding more keywords only when they want to be more specific about packaging size, color, flavor, etc.
| Single Keyword | Double Keyword | Triple Keyword |
|---|---|---|
| apple | red apple | red apple 1kg |
| rice | shirataki rice | shirataki rice 5kg |
| chicken | frozen chicken | frozen chicken fillet |
Customers add keywords to describe the product they are looking for more precisely: they know that if they want a specific product to appear on the SERP, they should include more descriptive keywords in their query.
Related to the preceding section, our search engine in this case can't yet focus on the context of a query, so when users search with more keywords, the results currently become even less relevant. For example, a query containing "frozen" might also return frozen meat or frozen fruit simply because they contain the keyword "frozen".
In fact, we could formalize that on an e-groceries platform, a customer's search query might contain these terms:
[Product Name][Brand Name][Variant Name][Category Name]
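As a rough sketch of how this formula could be used, the function below tags each query token as a product, brand, variant, or category term. The vocabularies (including the brand "acme"), the size regex, and the rule that unknown tokens count as product terms are all hypothetical assumptions; a real system would derive them from the catalogue. Once "frozen" is recognized as a variant, the engine could require the product term ("chicken") to match and treat "frozen" as a refinement, instead of letting it match frozen fruit on its own.

```python
import re

# Hypothetical vocabularies; a real system would build these from catalogue data.
BRANDS = {"acme"}
VARIANTS = {"red", "green", "frozen", "fillet"}
CATEGORIES = {"fruit", "vegetables", "meat"}
# Pack sizes such as "1kg", "500g", "1.5l", "250ml" are treated as variants.
SIZE_PATTERN = re.compile(r"^\d+(?:\.\d+)?(?:kg|g|l|ml)$")


def tag_query(query: str) -> dict[str, list[str]]:
    """Split a query into [Product][Brand][Variant][Category] terms."""
    tags: dict[str, list[str]] = {"product": [], "brand": [], "variant": [], "category": []}
    for token in query.lower().split():
        if token in BRANDS:
            tags["brand"].append(token)
        elif token in CATEGORIES:
            tags["category"].append(token)
        elif token in VARIANTS or SIZE_PATTERN.match(token):
            tags["variant"].append(token)
        else:
            tags["product"].append(token)  # assumption: unknown tokens name the product
    return tags


print(tag_query("red apple 1kg"))
# {'product': ['apple'], 'brand': [], 'variant': ['red', '1kg'], 'category': []}
print(tag_query("frozen chicken fillet"))
# {'product': ['chicken'], 'brand': [], 'variant': ['frozen', 'fillet'], 'category': []}
```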
Product and Business
For product and business, we can start by asking "What do business users care about in our SERP?" By asking the same kind of question we asked from the customer's perspective, we can map the answers into this table:
Profitability and retention are, of course, always a priority. Besides that, business users sometimes want the search results to strengthen their branding (e.g., boosting promo items to give the impression that our store has plenty of promos), or they might just want to run experiments (e.g., boosting items with few impressions, or boosting items from a specific category). These priorities can easily shift over time, depending on the current situation.
To be clear, the metrics from a product and business perspective can be very dynamic. Sometimes they want highly profitable products to be prioritized, sometimes they want to introduce and prioritize new products, sometimes they want to conduct experiments, and there could be many other cases.
Therefore, business users need a flexible and reliable way to execute their initiatives. To solve this, we need to build a new interface designed specifically for product and business users, so they can run their experiments and initiatives efficiently and reliably.
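To illustrate what such an interface could drive, here is a minimal sketch of business-editable boost rules stored as JSON and applied as multiplicative factors on top of a relevance score. The rule format, attribute names (`promo`, `margin`), and scores are hypothetical. Multiplying keeps relevance as the base, so with modest factors a boost mostly reorders items of similar relevance rather than lifting weakly matching ones; Elasticsearch's `function_score` query is one real mechanism built on a similar idea.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    relevance: float  # score from the matching/ordering stage
    attributes: dict = field(default_factory=dict)


@dataclass
class BoostRule:
    attribute: str
    value: object
    factor: float  # > 1 boosts, < 1 demotes


# Hypothetical rules a business user could edit without a code change.
RULES_JSON = """
[
  {"attribute": "promo", "value": true, "factor": 1.2},
  {"attribute": "margin", "value": "high", "factor": 1.1}
]
"""


def boosted_score(product: Product, rules: list[BoostRule]) -> float:
    """Multiply the relevance score by every rule the product satisfies."""
    score = product.relevance
    for rule in rules:
        if product.attributes.get(rule.attribute) == rule.value:
            score *= rule.factor
    return score


def apply_boosts(products: list[Product], rules: list[BoostRule]) -> list[Product]:
    """Order products by boosted score, highest first."""
    return sorted(products, key=lambda p: boosted_score(p, rules), reverse=True)


rules = [BoostRule(**r) for r in json.loads(RULES_JSON)]
products = [
    Product("Imported Red Apple", 2.0),
    Product("Premium Red Apple", 1.9, {"margin": "high"}),
    Product("Envy Apple", 1.8, {"promo": True}),
]
ranked = apply_boosts(products, rules)
print([p.name for p in ranked])
# ['Envy Apple', 'Premium Red Apple', 'Imported Red Apple']
```

Keeping the rules in data rather than code is the point of the design: a new experiment becomes an edit to `RULES_JSON` (or whatever store backs it) instead of an engineering ticket.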
Search Relevancy involves multiple teams with different principles and responsibilities. From an organizational point of view, a problem that might arise on the product and business side is a framework that tightly couples those teams. Such a framework can cause further problems:
- We need to hold a lot of meetings for every experiment and initiative.
- There is a lot of back-and-forth communication.
- Engineers don't have time to pay off tech debt or explore new solutions to the problems.
To make collaboration across teams more efficient, we should build or adopt a better framework that reduces the dependencies between teams.
Technical
Most of the technical problems are probably related to implementing solutions for the problems mentioned in the previous sections, and may only surface once we start implementing the proposed solutions, or after we do. However, another foundational problem might be a lack of proper documentation explaining the framework our search engine uses.
In my opinion, the first technical problem in Search Relevancy, even before implementing any solution, is that we fail to communicate our technical perspective to other teams. Perhaps that's because we lack an understanding of what is truly going on, and there is no formal definition of our workflow and framework.
We might never have formally designed our search engine at a conceptual level that could be presented to other teams, giving them a high-level overview of how the machine works. To solve this, we could start with the very basic steps: conceptualizing things like signal/feature modeling, redefining the processes of indexing, extraction, enrichment, and analysis, and documenting them properly.
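As an example of what that documentation could describe, here is a minimal sketch that gives each stage its own named function: extraction, enrichment, analysis, and indexing into an inverted index. The source field names (`sku`, `product_name`, `category`), the enrichment step (appending the category to the searchable text), and the whitespace analyzer are hypothetical placeholders; signal modeling would decide what each stage really does.

```python
from collections import defaultdict


def extract(raw: dict) -> dict:
    """Extraction: keep only the fields the search engine needs."""
    return {"id": raw["sku"], "name": raw["product_name"], "category": raw["category"]}


def enrich(doc: dict) -> dict:
    """Enrichment: derive extra searchable text (here, append the category)."""
    return {**doc, "search_text": f"{doc['name']} {doc['category']}"}


def analyze(text: str) -> list[str]:
    """Analysis: lowercase and split into tokens (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_index(raw_records: list[dict]) -> dict[str, set[str]]:
    """Indexing: map each analyzed token to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for raw in raw_records:
        doc = enrich(extract(raw))
        for token in analyze(doc["search_text"]):
            index[token].add(doc["id"])
    return dict(index)


records = [
    {"sku": "A1", "product_name": "Imported Red Apple", "category": "Fruits", "stock": 12},
    {"sku": "B2", "product_name": "Apple Juice", "category": "Beverages", "stock": 40},
]
index = build_index(records)
print(sorted(index["apple"]))   # ['A1', 'B2']
print(sorted(index["fruits"]))  # ['A1']
```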
So, at least right now, we understand the existing problems before going into deeper investigations and implementations. I know that based on some books and articles, search problems have never been easy, and we face different monsters in different places and situations. But it just makes the situation more exciting! We can imagine it as an endless adventure, and in every new place or situation, we should learn something new to be able to defeat the monsters!
As Doug Turnbull and John Berryman noted in their book Relevant Search, search applications differ greatly from one another. Search problems are case by case, and every search engine will have its own preferences to serve the needs of its customers and business. Even the same grocery app might have different preferences based on various factors such as branding, business strategy, and customer behavior.
If you are curious about my next article, please stay updated on my personal blog. Or, if you want to discuss something, feel free to contact me via my social media accounts! Thanks for reading!