Solutions Architect, MLOps

Remote
Full Time
Experienced

In the midst of a compute crunch, Salad is sitting on an enormous supply of tens of thousands of GPUs! We have the capacity, we have the tech stack, and now we need experienced developers to help onboard a growing list of customers who need to serve AI inference at massive scale on Salad’s platform.

We are looking for a Solutions Architect with experience deploying AI models in containers and making those deployments succeed at large scale. Our customers have found product-market fit and need back-end architectures that support hundreds of GPUs with high reliability and operability. Every deployment is different, and we are growing a team of experienced developers and solutions architects to help onboard customers to Salad’s fully managed container service.

This role offers an incredible opportunity to join a GPU cloud provider with massive growth potential in the ‘picks and shovels’ sweet spot of the hottest market, one that shows no sign of slowing down! Your work will help push Salad to become one of the largest GPU cloud providers in the world, simultaneously running millions of GPUs for thousands of customers!

What you’ll be doing:

  • Spending a considerable amount of time interacting directly with customers, helping architect their AI deployments to be highly performant and reliable across Salad’s distributed GPU infrastructure
  • Keeping up with the latest AI trends and benchmarking new open-source models as they drop on Hugging Face to demonstrate Salad’s unmatched cost-performance
  • Documenting what you know and sharing it with others: providing reference architectures, GitHub repos, how-to guides, benchmarks, and blog posts to support our marketing efforts
  • Supporting our business development team by quickly turning around benchmarks and performance numbers to help close deals in the moment!
  • Answering questions and providing mentorship to both customers and our developer community on Reddit, Hugging Face, Twitter, Hacker News, and elsewhere
  • Sharing feedback and customer requests with our product team, perhaps even working directly alongside our engineering team to further the SaladCloud platform and our managed container service

What we need to see:

  • A Computer Science or Engineering background, or equivalent experience
  • 2+ years of work-related experience with machine/deep learning and AI frameworks (TensorFlow or PyTorch)
  • 5+ years of work-related experience with containers, cloud infrastructure, and system architecture
  • Experience working with DevOps in cloud environments, including but not limited to Docker/containers, Kubernetes, cloud APIs, and other cloud technologies
  • Deep understanding of high-performance computing supporting AI inference at scale, GPU computing, networking, storage, and system design
  • Ability to multitask efficiently in a dynamic environment, and a willingness to roll up your sleeves and help out the team where needed
  • Strong analytical and problem-solving skills
  • The ability to communicate effectively with customers and colleagues
  • Clear written and oral communication skills to produce engaging blog posts and easy-to-follow reference architectures

To stand out from the crowd:

  • Recent experience deploying AI models at scale, such as anything from the Hugging Face trending list
  • Excellent customer-facing skills and experience
  • Development experience with GPUs and knowledge of Generative AI, LLMs, MLOps, and cloud-oriented workflows using Kubernetes/containers
  • Ability to think creatively to debug and solve complex problems

Comp Range: $150k-$180k
