Vigya Sharma

I’m a Principal Engineer at Amazon Search and a Committer & PMC member for the Apache Lucene project. I specialize in building and scaling distributed search systems. The best way to reach me is through a direct message on my LinkedIn profile.


Professional Journey

Amazon Product Search (2021 – Present)

I currently lead technical initiatives in Amazon’s Product Search platform, where I architect systems that serve billions of customer queries across billions of products on Amazon’s e-commerce platform. My work sits at the intersection of distributed systems, traditional information retrieval and modern AI-powered search, focusing on making search more scalable, intelligent, and responsive to natural language queries.

While specifics of my work are confidential to Amazon, my projects have broadly been in one of the following buckets:


Amazon OpenSearch Service, AWS (2014 – 2021)

I find it useful to describe my time in Amazon OpenSearch Service (formerly known as Amazon Elasticsearch Service) in two phases:

Early Days (2014 – 2018): As a founding engineer for the service, I was directly involved in building it from the ground up. I took it to launch, saw it take off, and then watched everything we’d built get outnumbered and outscaled within a few months, leaving us scrambling to change engine parts while the jet was at full throttle!

I wrote core data plane components to ensure cluster membership, prevent and detect split-brain issues, expose and integrate with control plane APIs, and provide failure detection and crash recovery. I built the observability layer for the managed, cloud-native, distributed search engine. Over the next three years, I would rewrite the monitoring layer twice, as the service scaled by another order of magnitude each time. This taught me some hard-learned lessons on responsive caches, API contracts and limits, efficient aggregations, push vs. pull models, async communication, and dividing responsibilities across independently scalable systems. I also designed and led the development of an in-memory configuration management system that reduced the time to update cluster access policies from 16 minutes to under 3 seconds, while cutting deployment costs in half. The framework continues to serve as a vehicle for major configuration changes.

This era was like drinking from a technical firehose, and in three short years, I grew from a “green behind the ears” SDE-1 to an SDE-3 (Senior Engineer) responsible for multiple technical teams.


Later Days (2018 – 2021): Even though the service kept growing at an insane rate, it was now more mature. The org had grown and we had a lot of help across geographies. As our problems shifted from survival and scaling to deep core innovation, I shifted my focus to Elasticsearch (later forked into OpenSearch) internals.

Given the diversity of customer scenarios and businesses the service supports, optimal workload management was central to ensuring stable, efficient, and cost-effective clusters. That job falls to the ‘shard management’ engine, which handles shard allocation and movement within OpenSearch.

Over the next several years, I acted as the area lead for this space, leading multiple technical efforts to optimize cluster performance and scale shard management.


Speaking, Writing, and Community Work

Publications


Conference Talks


Others


Open Source Work

As a committer and PMC member for Apache Lucene, I develop features, review code, and help shape project direction. Some of my contributions are also in Elasticsearch (primarily bug fixes) and in OpenSearch, where my past internal projects have been open-sourced.

Public artifacts for my open source work are readily available on the internet. Below is a living summary of my more notable patches.


Apache Lucene


OpenSearch:


Miscellaneous:


Education