Aaron Beppu
I’m an experienced engineer with interests in machine learning, large-scale data analysis, data engineering, real-time systems and related product development.
I’m seeking to bring these skills to applications which can benefit society.
Contact me
Experience
2013 - 2021, Sr to Principal Engineer, Sift (San Francisco, CA)
I have been a key contributor within engineering during a period of continued and transformative growth. During that period I have driven many of the projects underpinning that transformation. Here’s a sample:
Data hacking:
- Measuring and analysing ML bias without user-level ground-truth about sensitive group membership.
- A model calibration system which supports switching between ML models without disturbing the behavior of downstream systems which consume predictions. Simultaneously preserves specific model accuracy metrics while also matching marginal score distributions.
- Online accuracy metrics for payment fraud. Designed analysis and implemented pipeline and reporting for tracking accuracy, in the presence of incomplete and delayed ground truth data.
- Led efforts to speed up our production training pipelines while preserving semantics. Achieved up to 3x speedup.
- Redesign of monolithic scoring service, to multiple services supporting different model versions. Supports “instant” changes to model serving based on live configuration and routing system.
Product:
- A flexible workflow automation product. Customers can describe “if X then Y else Z” conditions and automations which have access to our ML features and predictions.
- A real-time reporting product. Customers can see up-to-the-moment reports and aggregates describing their use of our product.
- Several large, cross-cutting migrations: Zero-downtime migration between incompatible versions of HBase. Migration between distributed queue systems (SQS to Kafka). Encrypted all data at rest; involved replacing every instance in our fleet. Key contributor in inter-cloud migration (AWS to GCP)
Org Hacking:
- Led a rotation-based system to pay down technical debt.
- Organized and launched a trial of “project-based teams” – temporary, inter-disciplinary teams scoped to key projects.
- Ethics Committee founding member.
Feb - July 2013, Software Engineer, Prismatic (San Francisco, CA)
Built and improved a range of backend services, including topic modeling, document life-cycle, and social media integrations.
Jan 2011 - Jan 2013, Software Engineer, Etsy (New York, NY)
Data-mining system to improve search ranking. (e.g. see my Hadop World 2011 presentation)
Big data tools and infrastructure:
- Migrating from EMR to a Hadoop cluster on our own hardware, including a large codebase of legacy jobs
- Breaking performance bottlenecks in our ongoing processing
- Tooling for creating, scheduling and running workflows of jobs
- Tooling for monitoring and error-recovery
Wrote and improved jobs for large-scale click-stream analysis.
Aggregating and reporting data about product search and search quality.
Education
- 2005 - 2008
- BA, Cognitive Science; UC Berkeley, Departmental Citation
Public communications
While most of my work has not been directed at public release, here are some publicly visible artifacts:
Technologies
Languages I have used in production:
java, scala, python, clojure, javascript, php
libraries/frameworks/services:
hadoop, spark, kafka, hbase, elastic search, solr, mysql, postgres, memcached, airflow, oozie, protobuf, avro, thrift, AWS services (many), GCP services (several)