Senior Research Engineer, Post-training & Evaluation
Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.
What this role actually needs.
This is a Senior Research Engineer, Post-training & Evaluation role at Reddit, based remotely in the United States. UpJobz lists it for applicants targeting high-tech roles across the United States, Canada, and Mexico.
Day-to-day expectations
A clear list of the work this role is designed to cover.
- Architect and maintain the "Reddit Benchmark" evaluation suite: A comprehensive harness that rigorously tests model capabilities across Safety, Reasoning, and Reddit-specific knowledge (slang, norms).
- Build scalable SFT (Supervised Fine-Tuning) pipelines: Implement efficient, distributed training loops for instruction tuning, converting raw base models into helpful assistants.
- Develop Model-as-a-Judge systems: Engineer automated evaluation pipelines using strong models (e.g., GPT-5, Nova, Claude) to grade the outputs of our internal models, enabling rapid iteration cycles.
- Execute Synthetic Data generation strategies: Create and curate high-quality instruction sets to improve model generalization where human data is scarce.
- Collaborate with Safety Engineering: Translate high-level safety policies into concrete evaluation metrics and unit tests that run in our CI/CD pipelines.
- Debug post-training instability: Dive deep into loss curves and evaluation logs to identify when fine-tuning is causing alignment tax or capability degradation.
What a strong candidate brings
- 4+ years of professional experience in machine learning engineering, with a focus on LLM fine-tuning or evaluation.
- Fluency in Python and PyTorch, with experience using libraries like Hugging Face Transformers, vLLM, or lm-eval-harness.
- Deep understanding of Instruction Tuning (SFT) and how data quality impacts model behavior.
- Experience building Evaluation Pipelines: You know the difference between MMLU and GSM8K, and you know how to build a custom domain-specific benchmark.
- Familiarity with distributed training (FSDP/DeepSpeed) for fine-tuning jobs.
- Strong data engineering skills for curating and cleaning instruction datasets.
Why people would want this job
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401k with Employer Match
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
Turn this listing into an application plan.
This is the first pass at the premium UpJobz layer: a fast brief that helps serious applicants move with more clarity.
Next moves
- Tailor your resume around AI and LLM work instead of sending a generic application.
- Use the first two bullets of your application to connect your background directly to the Senior Research Engineer, Post-training & Evaluation role. It is a remote role in the United States and is most realistic for United States residents.
- Open the role quickly if it fits and bookmark three similar jobs before you leave the page.
Watchouts
- Compensation is hidden, so get range clarity in the first recruiter conversation.
- State that you are a United States resident as part of your positioning so the recruiter does not have to infer it.
- Lead with distributed collaboration, async delivery, and timezone discipline.