Hi, I'm Ruchir 👋

Building scalable systems and elegant solutions

Engineer who loves building all things tech.

Experience

My professional journey and roles

Here Technologies

Software Development Engineer

Mar 2023 - May 2024

Mumbai, India

  • Led the development of a Scala-based component that specializes in attribute validation and geometry point recalibration, integrated as a step in EMR workflows.
  • Developed an algorithm-driven library that detects and resolves road geometry kinks by analyzing the angles between shape points, reducing kinks by 95%.
  • Automated configuration file generation so that jobs can be set up quickly from specified parameters, reducing manual input errors.
  • Designed and implemented a comprehensive test suite for configuration validation, improving CI/CD reliability.
Scala, EMR, AWS, Algorithm Design, CI/CD
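
To illustrate the angle-based kink detection mentioned above, here is a minimal sketch. It is written in Python rather than Scala and is not the production library: it assumes shape points are planar (x, y) pairs, and the names `turning_angle_deg`, `find_kinks`, and the 60-degree threshold are hypothetical choices made for this example.

```python
import math

def turning_angle_deg(a, b, c):
    """Turning angle in degrees at point b, between segments a->b and b->c.

    0 means the road continues straight; 180 means it doubles back.
    """
    v1 = (b[0] - a[0], b[1] - a[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        # Duplicate points carry no direction, so treat them as straight.
        return 0.0
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))

def find_kinks(points, max_turn_deg=60.0):
    """Return indices of interior shape points whose turn exceeds the threshold."""
    return [
        i
        for i in range(1, len(points) - 1)
        if turning_angle_deg(points[i - 1], points[i], points[i + 1]) > max_turn_deg
    ]

# Usage: a straight stretch followed by two right-angle turns.
road = [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1)]
print(find_kinks(road))
```

Real road geometry is usually stored as latitude/longitude, so a production version would likely project the coordinates or use geodesic bearings before comparing angles.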

Amazon Web Services

Software Development Engineer

Jul 2022 - Mar 2023

Bangalore, India

  • Contributed to Amazon Go's IHM Operational Intelligence team, developing a data and ML platform for operational optimization.
  • Engineered extensible data connectors for an ETL library using established design patterns.
  • Spearheaded the migration of the Weight Sensor Validation service to AWS CDK.
  • Developed Lambda-based custom resources for dynamic API Gateway configuration.
AWS CDK, Lambda, API Gateway, CloudFormation, TypeScript

PayPal

Software Development Engineer Intern

Feb 2022 - Jun 2022

Chennai, India

  • Developed a React hook library for real-time tracking of widget performance metrics.
  • Designed and built a React performance dashboard for visualizing widget metrics.
  • Engineered a NodeJS application for periodic data synchronization between Looker and MySQL.
  • Optimized data retrieval, resulting in 10x faster response times.
React, NodeJS, MySQL, TypeScript, Performance Optimization

Projects

Here are some of my recent projects and experiments

High-Performance Neural Search Engine

Neural search engine over 35M Wikipedia embeddings with sub-100ms latency.

  • Sub-100ms query latency
  • Speedup through memory access pattern optimization
  • 16x index size reduction (53.76 GB → 3.36 GB) using IVF-PQ quantization, so the index fits in GPU memory
  • Production-ready FastAPI server with async processing
NVIDIA cuVS, Python, FastAPI, CUDA, Vector Quantization, LMDB, NumPy, CuPy
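
The index sizes quoted above can be sanity-checked with a little arithmetic. The sketch below takes only the three figures stated in the project description (35M vectors, 53.76 GB, 3.36 GB) and derives the per-vector footprint. The float32 storage assumption, decimal gigabytes, and the resulting embedding dimension are inferences for illustration, not details stated by the project, and the calculation ignores IVF centroids and other index overhead.

```python
NUM_VECTORS = 35_000_000          # stated: 35M Wikipedia embeddings
RAW_INDEX_BYTES = 53.76e9         # stated: 53.76 GB (assuming decimal GB)
COMPRESSED_INDEX_BYTES = 3.36e9   # stated: 3.36 GB after IVF-PQ

def bytes_per_vector(total_bytes, num_vectors):
    """Average storage cost of one vector, ignoring index overhead."""
    return total_bytes / num_vectors

raw_per_vec = bytes_per_vector(RAW_INDEX_BYTES, NUM_VECTORS)
pq_per_vec = bytes_per_vector(COMPRESSED_INDEX_BYTES, NUM_VECTORS)

# Assumption: uncompressed vectors are float32, i.e. 4 bytes per dimension.
implied_dims = raw_per_vec / 4

compression_ratio = RAW_INDEX_BYTES / COMPRESSED_INDEX_BYTES

print(f"raw bytes/vector:        {raw_per_vec:.0f}")
print(f"compressed bytes/vector: {pq_per_vec:.0f}")
print(f"implied float32 dims:    {implied_dims:.0f}")
print(f"compression ratio:       {compression_ratio:.1f}x")
```

Under these assumptions, each product-quantized code occupies a small, fixed number of bytes per vector, which is what lets the whole index sit in GPU memory.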

Neural Ingestion Engine

End-to-end ingestion pipeline that transforms raw web pages into embedding-ready chunks for neural search.

  • Built a modular pipeline covering scraping, parsing to an intermediate representation, context-aware chunking, and embedding generation
  • Experimented with chunking strategies including structural context enrichment and classifier-guided grouping for higher retrieval relevance
  • Explored late chunking with token-level embeddings and adaptive pooling to minimise context loss while keeping latency low
Python, Web Scraping, NLP, Vector Embeddings

Synapse

Rust-based N-dimensional array library inspired by NumPy, featuring SIMD-optimised kernels and flexible broadcasting.

  • Implemented multi-dimensional array abstractions from scratch to mirror NumPy
  • Wrote custom matrix multiplication kernels using SIMD intrinsics to squeeze out additional throughput
Rust, SIMD, Numerical Computing, Linear Algebra

Stanford CS336 Systems Implementations

Hands-on reimplementation of LLM training components and systems concepts from Stanford's CS336 course.

  • Recreated core building blocks such as BPE tokenizers, attention modules, transformer layers, and optimizer primitives
  • Trained a miniature language model end-to-end to validate the training stack
  • Studied FlashAttention 2 in depth and reproduced the kernel
  • Built a from-scratch data parallel training setup to explore distributed training trade-offs
PyTorch, CUDA, FlashAttention 2, Distributed Training

Image Fusion

High-performance browser-based image transformation tool with AI segmentation.

  • Developed real-time image processing using Web Workers and WebAssembly
  • Implemented AI-powered image segmentation for precise transformations using the Segment Anything Model
  • Optimized performance using parallel processing and efficient memory management
  • Event-based architecture for real-time embedding generation
React, TypeScript, Web Workers, WebAssembly, Python, Flask

Write-Heavy Key-Value Store

Distributed append-only key-value store with concurrent operations support, automatic background compaction, and in-memory indexing for optimal write performance.

  • Implemented an append-only storage mechanism with O(1) write performance using memory-mapped files
  • Designed thread-safe data structures supporting concurrent reads/writes with lock-free algorithms
  • Built an automatic background compaction system for efficient space management and data consistency
Java, Concurrent Programming, Data Structures
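
The core idea behind the store above (append every write to a log, keep an in-memory index of the latest offset per key, and compact away stale records) can be sketched in a few lines. This is a simplified, single-threaded Python illustration, not the Java implementation: it uses ordinary file I/O instead of memory-mapped files, has no locking or background thread, and the class name `AppendOnlyStore` and the record layout are hypothetical.

```python
import os
import struct
import tempfile

class AppendOnlyStore:
    # Each record: 4-byte key length, 4-byte value length, key bytes, value bytes.
    HEADER = struct.Struct(">II")

    def __init__(self, path):
        self.path = path
        self.index = {}              # key -> byte offset of its latest record
        open(path, "ab").close()     # create the log file if it is missing
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the log once; later records overwrite earlier index entries."""
        self.index.clear()
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                header = f.read(self.HEADER.size)
                if len(header) < self.HEADER.size:
                    break
                klen, vlen = self.HEADER.unpack(header)
                key = f.read(klen)
                f.seek(vlen, os.SEEK_CUR)   # skip the value
                self.index[key] = offset

    def put(self, key: bytes, value: bytes):
        """Append a record and point the index at it; old data is never rewritten."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.index[key] = offset

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            klen, vlen = self.HEADER.unpack(f.read(self.HEADER.size))
            f.seek(klen, os.SEEK_CUR)       # skip the key
            return f.read(vlen)

    def compact(self):
        """Rewrite only the latest record per key, then swap the files."""
        tmp_path = self.path + ".compact"
        live = {k: self.get(k) for k in self.index}
        new_index = {}
        with open(tmp_path, "wb") as f:
            for k, v in live.items():
                new_index[k] = f.tell()
                f.write(self.HEADER.pack(len(k), len(v)) + k + v)
        os.replace(tmp_path, self.path)
        self.index = new_index

# Usage: overwrite a key, then compact away the stale record.
with tempfile.TemporaryDirectory() as d:
    store = AppendOnlyStore(os.path.join(d, "data.log"))
    store.put(b"a", b"1")
    store.put(b"a", b"2")
    store.put(b"b", b"3")
    size_before = os.path.getsize(store.path)
    store.compact()
    size_after = os.path.getsize(store.path)
    latest_a = store.get(b"a")
    latest_b = store.get(b"b")
```

Writes stay cheap because they only append and update a dictionary entry; the cost of reclaiming space is deferred to compaction, which a real system would run in the background.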

Let's Connect

I'm always interested in hearing about new opportunities, interesting projects, or just connecting with fellow developers. Feel free to reach out!