Interested in the challenges of managing large catalogs? Then this Elasticon talk “Surfacing the Products You love on Spring” by Julie Qiu from Spring (@jqiu25) might interest you. Spring is a marketplace (lots of diverse data sources) focussing on fashion. The following is my quick summary of the presentation – for details, please watch the video.
The talk starts with the basics of trying to keep an Elasticsearch index in sync with the master data stored in a relational database. Product data generally updates more slowly, but inventory levels and pricing data need to be updated in real time. (The talk goes into how they addressed some of these challenges.)
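To make the slow-versus-fast update split concrete, here is a minimal sketch of sending only the changed fields (price, stock) as partial updates in the Elasticsearch bulk format, rather than reindexing whole product documents. This is my own illustration, not how Spring implemented it; the index name "products" and the field names "price" and "in_stock" are assumptions.

```python
import json


def build_bulk_update(index, changes):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    `changes` maps a product id to only the fields that changed, so
    fast-moving data (price, stock) can be patched without resending
    the slower-moving product description.
    """
    lines = []
    for product_id, fields in changes.items():
        # Action line: a partial update of one document.
        lines.append(json.dumps({"update": {"_index": index, "_id": product_id}}))
        # Payload line: only the changed fields, merged into the existing doc.
        lines.append(json.dumps({"doc": fields}))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical index and field names, for illustration only.
body = build_bulk_update("products", {"sku-1": {"price": 49.0, "in_stock": 3}})
```

The resulting string would be posted to the `_bulk` endpoint; batching many small price and stock changes this way keeps the frequent updates cheap.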
Being a marketplace, they have many different sources of data with different structure and quality. So sometimes they get nice feeds, other times they scrape sites. But the talk also addressed challenges like what if the supplier is offering a discount (for a period of time), but it's only surfaced at checkout and not on the product page? The talk had a few real-life nuggets embedded throughout.
The talk got particularly interesting when it turned to machine learning techniques for improving the user experience.
For example, not all product lines called blue dresses blue. “Baby powder” was one example color name! One technique they used was machine learning to examine the images they had of products, to help enrich and standardize product attributes. For example, they reduced color names to a set of 16 common color names.
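As a rough illustration of reducing colors to a small standard set, the sketch below maps an RGB value (say, the dominant color extracted from a product image) to the nearest name in a 16-color palette. The talk did not list which 16 names Spring used, so the 16 basic CSS color keywords stand in here as an assumption, and plain nearest-neighbor matching in RGB space is a much simpler technique than the image-based machine learning described in the talk.

```python
# The 16 basic CSS color keywords, used as a stand-in palette (an assumption).
PALETTE = {
    "black": (0, 0, 0), "silver": (192, 192, 192), "gray": (128, 128, 128),
    "white": (255, 255, 255), "maroon": (128, 0, 0), "red": (255, 0, 0),
    "purple": (128, 0, 128), "fuchsia": (255, 0, 255), "green": (0, 128, 0),
    "lime": (0, 255, 0), "olive": (128, 128, 0), "yellow": (255, 255, 0),
    "navy": (0, 0, 128), "blue": (0, 0, 255), "teal": (0, 128, 128),
    "aqua": (0, 255, 255),
}


def nearest_color(rgb):
    """Return the palette name closest to `rgb` by squared Euclidean distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))
    return min(PALETTE, key=distance)


# An off-white swatch and a slightly-off blue both snap to standard names.
off_white = nearest_color((254, 254, 250))
near_blue = nearest_color((10, 20, 240))
```

With names standardized this way, a search for "white" or "blue" can match products whose suppliers used more creative color names.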
With this enriched data, they then used Twiggle to do natural language parsing of queries to identify categories, attributes, utilization etc. (“Blue dress shirt” is an example on their homepage – where “dress” is the “utilization” and “shirt” is the “category”, which makes sure that the search does not return dresses.)
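To show why "blue dress shirt" should not return dresses, here is a toy vocabulary-based parser. It is not how Twiggle works (that was not described in the talk); the word lists and the "last known category word is the head noun" rule are my own simplifying assumptions.

```python
# Tiny hypothetical vocabularies; a real system would use far richer data.
COLORS = {"blue", "red", "black", "white"}
CATEGORIES = {"shirt", "dress", "shoes", "pants"}
UTILIZATIONS = {"dress", "running", "casual"}


def parse_query(query):
    """Tag each query token as category, utilization, color, or other."""
    tokens = query.lower().split()
    parsed = {"category": None, "utilization": [], "color": [], "other": []}

    # Treat the last token that is a known category as the head noun.
    head = None
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in CATEGORIES:
            head = i
            break

    for i, token in enumerate(tokens):
        if i == head:
            parsed["category"] = token
        elif token in COLORS:
            parsed["color"].append(token)
        elif token in UTILIZATIONS:
            # "dress" before the head noun modifies it rather than naming a category.
            parsed["utilization"].append(token)
        else:
            parsed["other"].append(token)
    return parsed


shirt_query = parse_query("Blue dress shirt")
dress_query = parse_query("blue dress")
```

Here `shirt_query` ends up with category "shirt" and utilization "dress", while `dress_query` has category "dress", so the two queries can be turned into different filters against the enriched product attributes.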
So by combining machine learning and natural language processing techniques, Spring made the product data easier to search (e.g. standardized color names), then leveraged that richer data with query support. The talk also went on to discuss personalization challenges and techniques – such as watching which products the shopper clicked on to guess what style the user was looking for and incorporate that into ordering search results for the user.
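The click-based personalization idea can be sketched as a simple re-ranking step. This is only an illustration under my own assumptions: each result carries a base relevance "score" and a list of "styles" tags, and the boost weight of 0.5 is arbitrary. The talk did not describe Spring's actual formula.

```python
from collections import Counter


def rerank(results, clicked_styles, weight=0.5):
    """Reorder results, boosting items whose styles match past clicks.

    `results` is a list of dicts with "score" and "styles" keys
    (hypothetical field names); `clicked_styles` lists the style tags
    of products the shopper clicked on.
    """
    style_counts = Counter(clicked_styles)
    total = sum(style_counts.values()) or 1  # avoid dividing by zero

    def personalized_score(item):
        # Fraction of past clicks that share a style with this item.
        affinity = sum(style_counts[s] for s in item["styles"]) / total
        return item["score"] + weight * affinity

    return sorted(results, key=personalized_score, reverse=True)


results = [
    {"id": "a", "score": 1.0, "styles": ["formal"]},
    {"id": "b", "score": 0.9, "styles": ["boho"]},
]
ranked = rerank(results, clicked_styles=["boho", "boho", "formal"])
```

Because two of the three clicks were on "boho" products, item "b" moves ahead of item "a" despite its slightly lower base score.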
That made me curious to see how Google tools could help. (The talk did not cover all the internal implementation details; it was a talk on Elasticsearch, not machine learning.)
I quickly discovered Google’s Cloud Vision API. The demo page allowed me to drag an image onto the page and it came back with various product attributes. It was that simple. Note: It was not always strictly correct so I would not show end users the attribute values, but certainly good enough to help classify different types of clothing with zero human effort for “similar product” searches. Something to consider for large product collections.
I also discovered the Google Cloud Natural Language service. Again, there was a demo area on the page so I typed in a few queries. It automatically detected “dress shirt” as an entity with zero configuration effort, but more work would be needed for a production site. For example, should “trousers” and “pants” be considered synonyms? On some sites yes, on others no.
But it was great to see technologies such as machine learning becoming the side topics in a talk rather than something novel or unusual.
Disclaimer: I have no relationship with Spring or Twiggle. I just found the talk interesting.
A very interesting topic Alan, thanks for sharing your finding and ideas.
We have a start-up here in Melbourne working in that space, you may check http://www.okkular.io
They also help to do product classification by images using machine learning.