Senior Data Engineer
At KhataBook, we’re building utility solutions for small and medium businesses (SMBs). Our first offering – an Android app that lets businesses digitally record the credit they extend to customers – went viral. In a matter of months, KhataBook has been downloaded 2M times, and more and more businesses are joining this network.
Some context on the market: 90% of India’s ~$1 trillion retail market is controlled by the traditional/unorganized sector. That means ~$900B worth of commerce flows through ~50M small and medium shops/warehouses/kiosks/homes scattered all over the country – from mighty metros to tiny villages. A network powered by millions of businesses, built in turn to power those businesses – that’s our goal at KhataBook.
We’re a small team based in Bengaluru, funded by Sequoia, Info Edge, and Y Combinator, and we’re always looking for great folks to come and join us on this amazing adventure.
Responsibilities:
- Own the technical solution design, act as the technical architect, and implement all stages from data acquisition to integration, both batch and real-time
- Drive continuous improvements in moving, aggregating, profiling, sampling, and testing gigabytes to terabytes of data
- Build data pipelines using data processing tools and technologies, both open source and proprietary, such as Oracle, Redshift, Hadoop, Pig, Hive, HBase, Spark, and MongoDB
- Be the go-to person for product owners and analysts on ETL design and related big data and programming technologies
- Quickly create functioning prototypes to address rapidly changing business needs, then harden them into production-ready data flows
- Proactively identify performance and data-quality problems and drive the team to remediate them
- Champion operational excellence and continuous improvement with a can-do attitude
Requirements:
- 3+ years of hands-on experience with large-scale data delivery platforms and solutions, and with designing modern data systems to support exponential data growth
- Hands-on experience with data lake and data warehouse solutions, business intelligence, big data analytics, and enterprise-scale custom data products
- Redshift, Hadoop, and Spark platform experience is a must
- Knowledge of data modeling techniques and high-volume ETL/ELT design.
- Strong SQL optimization and performance-tuning experience in a high-volume data environment that uses parallel processing
- Experience with version control systems (e.g. GitLab), orchestration and deployment tools (e.g. Airflow, Jenkins), and cloud platforms (AWS, GCP, Azure)
- Hands-on experience with big data technologies such as Hadoop MapReduce, Spark, Hive, Pig, HBase, and Elasticsearch
- Experience with programming languages such as Java and Scala, and scripting in Python and Bash
- Ability to work effectively in an unstructured, fast-paced environment, both independently and in a team setting, with a high degree of self-management, clear communication, and commitment to delivery timelines
Compensation: Market Rate + ESOP
Location: HSR Layout, Bengaluru
We’re always looking for talented people.
Send your resume to firstname.lastname@example.org