AI Evaluation Data Scientist - Health, Mid Level

Jobright.ai

Date: 2 weeks ago

City: Cupertino, CA

Contract type: Full time

Jobright is an AI-powered career platform that helps job seekers discover the top opportunities in the US. We are NOT a staffing agency. Jobright does not hire directly for these positions. We connect you with verified openings from employers you can trust.

Job Summary:

Apple is a leading technology company focused on health technologies that support users in living healthier lives. The AI Evaluation Data Scientist in the Health team will develop and validate evaluation methodologies for Generative AI systems, design human annotation frameworks, and conduct statistical analyses to enhance the quality of health products.

Responsibilities:

• Design and analyze human evaluations of AI systems to create reliable annotation frameworks and ensure the validity and reliability of measurements of latent constructs

• Develop and refine benchmarks and evaluation protocols, using statistical modeling, test theory, and task design to capture model performance across diverse contexts and user needs

• Conduct statistical analysis of evaluation data to extract meaningful insights, identify systematic issues, and inform improvements to both models and evaluation processes

• Analyze model behavior, identify weaknesses, and drive design decisions through failure analysis. Examples include, but are not limited to: model experimentation, adversarial testing, counterfactual analysis, and creating tools to assess model behavior and user impact

• Collaborate with engineers to translate evaluation methods and analysis techniques into scalable, adaptable, and reliable solutions that can be reused across different features, use cases, and evaluation workflows

• Work cross-functionally with designers, clinical experts, and engineering teams across Hardware and Software to apply methods to real-world applications

• Independently run and analyze experiments that lead to real improvements

Qualifications:

Required:

• Bachelor's degree (or equivalent experience) in an empirical field with an emphasis on quantitative methodologies of human behavior, including HCI, Psychometrics, Quantitative or Experimental Psychology, Educational Measurement, Language Assessment, or a related field

• Proficiency in Python and the ability to write clean, performant code and collaborate using standard software development practices (e.g., Git)

• Strong statistical analysis skills and experience in designing experiments and validating data quality and model performance

• Experience building and extending data and inference pipelines to process large-scale datasets

Preferred:

• MS with a minimum of 3 years of relevant industry experience, or a PhD in a relevant field

• Real-world experience with LLM-based evaluation systems, human annotation, and human evaluation methodologies

• Experience with rigorous, evidence-based approaches to test development, e.g., quantitative and qualitative test design and reliability and validity analysis

• Customer-focused mindset with experience or strong interest in building consumer digital health and wellness products

• Strong communication skills and ability to work cross-functionally with technical and non-technical stakeholders

Company:

Apple is a technology company that designs, manufactures, and markets consumer electronics, personal computers, and software. Founded in 1976 and headquartered in Cupertino, California, USA, Apple has 10,001+ employees and is a public company. Apple has a track record of offering H1B sponsorships.
