Senior Data Engineer


At KhataBook, we’re building utility solutions for small & medium businesses (SMBs). Our first offering – an Android app enabling businesses to digitally record the credit they extend to customers – went viral. In a matter of months, KhataBook has been downloaded 2M times, and more and more businesses are becoming a part of this network.

Some context on the market: 90% of India’s ~$1 Trillion retail market is controlled by the traditional/unorganized sector. That means ~$900B worth of commerce flows through ~50M small & medium shops/warehouses/kiosks/homes, scattered all over the country – from mighty metros to tiny villages. A network powered by millions of businesses, built in turn to power those businesses – that’s our goal at KhataBook.

We’re a small team based in Bengaluru, funded by Sequoia, Info Edge & Y Combinator, and we’re always looking for great folks to come and join us on this amazing adventure.


Responsibilities

  • Own the technical solution design, act as technical architect, and implement all stages from data acquisition to integration, both batch and real-time
  • Drive continuous improvements in moving, aggregating, profiling, sampling, testing gigabytes/terabytes of data
  • Build data pipelines using data processing tools and technologies, both open source and proprietary, such as Oracle, Redshift, Hadoop, Pig, Hive, HBase, Spark, MongoDB, etc.
  • Be the go-to person for product owners and analysts on ETL design and other related big data and programming technologies
  • Quickly create functioning prototypes to address rapidly changing business needs, and later revamp those prototypes into production-ready data flows
  • Proactively identify performance & data quality problems and drive the team to remediate them
  • Champion operational excellence & continuous improvement with a can-do attitude


Requirements

  • 3+ years of intense experience with large-scale data delivery platforms and solutions, and with designing modern data systems to support exponential data growth
  • Hands-on experience with an emphasis on data lake and data warehouse solutions, business intelligence, big data analytics, and enterprise-scale custom data products
  • Redshift, Hadoop, and Spark platform experience is a must
  • Knowledge of data modeling techniques and high-volume ETL/ELT design.
  • Strong SQL optimization and performance tuning experience in a high-volume data environment that utilizes parallel processing
  • Experience with version control systems (GitLab), deployment and orchestration tools (e.g. Airflow, Jenkins), and cloud platforms (AWS, GCP, Azure)
  • Hands-on experience with big data technologies like Hadoop MapReduce, Spark, Hive, Pig, HBase, Elasticsearch, and others.
  • Experience with programming languages like Java and Scala, and scripting in Python and Bash
  • Ability to work effectively in an unstructured and fast-paced environment, both independently and in a team setting, with a high degree of self-management, clear communication, and commitment to delivery timelines


Compensation

Market Rate + ESOP


Location

HSR Layout, Bengaluru

We’re always looking for talented people.

Send your resume to