About the Role:
Our Data Platform group at CrowdStrike stands out among its peers for being uncommonly customer-focused. Like classic Data Platform groups, we build and operate systems that centralize all of the data from Falcon Sensors and third-party sources, derived from trillions of events per day. We also drive industry-leading innovation on a hyperscale security data lake that helps find bad actors and stop breaches.
Setting ourselves apart, we also make it easy for all customers to use the platform for batch and streaming analytics, machine learning, and threat hunting by producing and delivering self-service platforms (Query Platform, Analytics Platform, Enrichment Platform) built on Spark and Flink as the runtimes. These self-service platforms allow customers to apply their own custom schemas, syntax, data models, and more to our historical cyberattack data, modeling threats in a way that empowers them to predictively build, among other things, behavioral automations that defend against such threats before they appear in their own environments.
In this role, you will be a Principal Engineer (E9, typically called 'Distinguished Engineer' elsewhere) within Data Platform, owning, in an entirely hands-on capacity, the design, build, and delivery of a new self-service data enrichment platform that will take our customers' proactive defense measures to the next level.
As a leader in Data Platform, you will contribute to the full spectrum of our systems, including query processing; scalable pipelines built on largely Apache-based ingestion, materialized view, transformation, and data storage frameworks; and tools/applications that make data available to thousands of users and hundreds of internal systems.
What You'll Do:
Shape the vision of our Analytics Data Platform for its next phase of growth: building a Unified Data Catalog and Query Analysis for structured data stored in different forms (columnar vs. graph), and optimizing query performance through indexing and data partitioning techniques.
Design, develop, and maintain a data platform that processes petabytes of data
Participate in technical reviews of our products and help us develop new features and enhance stability
Continually help us improve the efficiency of our services so that we can delight our customers
Help us research, evolve, and implement new ways for both internal stakeholders and customers to query their data efficiently and extract results in the format they desire
What You'll Need:
Any of: 15+ years of experience with a B.S. in a related field, 12+ years with an M.S. in a related field, or 10+ years with a PhD in a related field
6+ years of production experience building self-service platforms based upon Spark, Flink or equivalent
Strong familiarity (and ample hands-on experience) with the Apache Hadoop ecosystem: Spark, Kafka, Hive/Iceberg/Delta Lake, Presto/Trino, Pinot, etc.
3+ years coding in Java, Scala, Kotlin or another JVM language.
Production experience with relational SQL and NoSQL databases, including Postgres/MySQL, Cassandra/DynamoDB, etc.
Proven expertise with multiple big data frameworks, especially handling data volume at multi-petabyte scale.
Proven expertise with algorithms, distributed systems design and the software development lifecycle
Strong test-driven development discipline
Reasonable proficiency with Linux administration tools
Proven ability to work effectively with remote teams
Bonus Points:
Familiarity with Go
Familiarity with Kubernetes/Mesos or equivalent
Production experience with Flink, especially as the runtime for a self-service platform