About the role
The project is expected to run for 5+ years, and its main goal is to update and deploy all new software on set-top boxes for cable companies across Europe. Any video streamed through any portal (set-top box, web, app) will be processed through this project.
Teams average 6 people, including a PO. The teams are distributed across Europe and Asia, with a core team working from the HQ in Amsterdam.
For this role, we are looking for a big data applications software engineer.
Essential Technical Skills:
Apache Spark: extensive expertise with all currently existing public APIs, both batch (RDD and Spark SQL DataFrame) and streaming (DStream and Spark Structured Streaming DataFrame), and deep knowledge of its internals: how to design, develop, troubleshoot and optimise complex cases. Knowledge of how to use the framework's advanced features to solve cases where the standard APIs do not provide the required built-in functionality.
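By way of illustration, a minimal Structured Streaming sketch of the kind of pipeline this skill covers: windowed event counts read from Kafka. The topic name, broker address and window sizes are hypothetical, not taken from the project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ViewEventCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("view-event-counts")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of raw viewing events from Kafka (topic name is hypothetical).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "view-events")
      .load()

    // Count events per key in 5-minute windows, tolerating 10 minutes of late data.
    val counts = events
      .select($"key".cast("string"), $"timestamp")
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window($"timestamp", "5 minutes"), $"key")
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```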
Apache Kafka Streams and Apache Kafka client APIs: extensive expertise and deep knowledge of their internals, which helps when troubleshooting complex cases of applications failing in production. These are currently the only APIs with declared support for exactly-once delivery semantics in Kafka-to-Kafka data processing applications (important for correct computation of metrics on the platform). They have no Python bindings, so Java or Scala is required to use them.
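As a sketch of the exactly-once point above, a minimal Kafka Streams topology that counts events per key and writes the totals back to Kafka, with the `processing.guarantee` set to exactly-once. The application id, broker address and topic names are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
import org.apache.kafka.streams.kstream.{Consumed, KStream, Produced}

object MetricCounter {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "metric-counter")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    // Exactly-once processing for this Kafka-to-Kafka pipeline.
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE)

    val builder = new StreamsBuilder()
    val events: KStream[String, String] =
      builder.stream("raw-events", Consumed.`with`(Serdes.String(), Serdes.String()))

    // Count events per key and publish the running totals to an output topic.
    events
      .groupByKey()
      .count()
      .toStream()
      .mapValues(count => count.toString)
      .to("event-counts", Produced.`with`(Serdes.String(), Serdes.String()))

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())
  }
}
```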
Akka and Akka Streams: this framework lets us spend less time on multithreaded code and concentrate on creating tools that help us improve the quality of our pipelines (the Spark lag checker tool, the Kafka offset tool).
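To show the "less time on multithreaded code" idea, a small Akka Streams sketch that totals consumer lag across partitions without any hand-written thread management. The per-partition offset numbers are invented for illustration and are not from any real tool.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object OffsetLagReport {
  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("lag-checker")

    // Hypothetical per-partition offsets: (partition, consumed offset, log-end offset).
    val offsets = List((0, 100L, 180L), (1, 250L, 250L), (2, 40L, 95L))

    // A stream that computes lag per partition and folds it into a total;
    // Akka Streams handles the concurrency and backpressure for us.
    val totalLagF = Source(offsets)
      .map { case (_, consumed, logEnd) => logEnd - consumed }
      .runWith(Sink.fold(0L)(_ + _))

    val totalLag = Await.result(totalLagF, 5.seconds)
    println(s"total lag: $totalLag") // 80 + 0 + 55 = 135
    system.terminate()
  }
}
```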
Good knowledge of and extensive experience with both Java and Scala. Most of the technologies used on the platform were developed in JVM-based languages. This lets us use the latest and most stable versions of the processing frameworks (useful because we can take advantage of the latest features and hot fixes). It also helps us troubleshoot complex cases, since we understand how JVM-based applications behave in different corner cases.
Good knowledge of and extensive experience with Python, including experience with the Python bindings of popular data processing APIs (for example, PySpark).
Good understanding of the internals of all major technologies used on the platform: Apache Kafka, Apache Hadoop (HDFS), ElasticSearch, Kibana, Grafana, Apache Mesos (Marathon and Chronos), Docker, Apache Cassandra and ClickHouse.
What we offer