Senior Java Spark Developer

Company: Diamondpick

Location: Sunnyvale, CA 94087

Description:

Job Summary:

We are seeking a Senior Java Spark Developer with expertise in Java, Apache Spark, and the Cloudera Hadoop Ecosystem to design and develop large-scale data processing applications. The ideal candidate will have strong hands-on experience in Java-based Spark development, distributed computing, and performance optimization for handling big data workloads.

Key Responsibilities:
Java & Spark Development:
  • Develop, test, and deploy Java-based Apache Spark applications for large-scale data processing.
  • Optimize and fine-tune Spark jobs for performance, scalability, and reliability.
  • Implement Java-based microservices and APIs for data integration.

Big Data & Cloudera Ecosystem:
  • Work with Cloudera Hadoop components such as HDFS, Hive, Impala, HBase, Kafka, and Sqoop.
  • Design and implement high-performance data storage and retrieval solutions.
  • Troubleshoot and resolve performance bottlenecks in Spark and Cloudera platforms.

Collaboration & Data Engineering:
  • Collaborate with data scientists, business analysts, and developers to understand data requirements.
  • Implement data integrity, accuracy, and security best practices across all data processing tasks.
  • Work with Kafka, Flume, NiFi, and Oozie for real-time and batch data ingestion and workflow orchestration.

Software Development & Deployment:
  • Implement version control (Git) and CI/CD pipelines (Jenkins, GitLab CI) for Spark applications.
  • Deploy and maintain Spark applications in cloud or on-premises Cloudera environments.
Required Skills & Experience:
  • 8+ years of experience in application development, with a strong background in Java and Big Data processing.
  • Strong hands-on experience in Java, Apache Spark, and Spark SQL for distributed data processing.
  • Proficiency in Cloudera Hadoop (CDH) components such as HDFS, Hive, Impala, HBase, Kafka, and Sqoop.
  • Experience building and optimizing ETL pipelines for large-scale data workloads.
  • Hands-on experience with SQL & NoSQL databases like HBase, Hive, and PostgreSQL.
  • Strong knowledge of data warehousing concepts, dimensional modeling, and data lakes.
  • Proven ability to troubleshoot and optimize Spark applications for high performance.
  • Familiarity with version control tools (Git, Bitbucket) and CI/CD pipelines (Jenkins, GitLab).
  • Exposure to real-time streaming and data ingestion technologies such as Kafka, Flume, NiFi, and Oozie.
  • Strong problem-solving skills, attention to detail, and ability to work in a fast-paced environment.