Sr Site Reliability Engineer, AI Platform Inference
Company: Marketo, an Adobe Company
Location: San Jose
Posted on: November 14, 2024
Job Description:
Our Company
Changing the world through digital experiences is
what Adobe's all about. We give everyone, from emerging artists to
global brands, everything they need to design and deliver
exceptional digital experiences! We're passionate about empowering
people to create beautiful and powerful images, videos, and apps,
and transform how companies interact with customers across every
screen. We're on a mission to hire the very best and are committed
to creating exceptional employee experiences where everyone is
respected and has access to equal opportunity. We realize that new
ideas can come from everywhere in the organization, and we know the
next big idea could be yours!
The Opportunity
We're looking for an
outstanding Site Reliability Engineer for Adobe's AI Inference
Platform, Adobe Firefly. You will be part of a team of Site
Reliability Engineers closely working with the Engineering teams on
building, scaling, and securing the AI Platform. This enables the
Firefly product teams to easily manage and deploy Machine Learning
capabilities used by Adobe client applications.
The Applied Research
groups from Adobe Research and other App Teams in Adobe will deploy
thousands of models onto this platform in a variety of lifecycle
stages (early research, development, productization, optimization,
etc.). This platform will offer ML model serving at scale, with
high cost efficiency, on a wide variety of hardware platforms
across multiple clouds.
What You'll Do
- Identify and implement methodologies and solutions to increase
reliability, scalability, security, and efficiency.
- Ensure the highest uptime and Quality of Service (QoS) for
Adobe's customers through operational excellence.
- Define service level objectives (SLOs) and indicators (SLIs) to
represent and measure service quality.
- Support and maintain globally distributed, multi-cloud (public
and/or private) environments.
- Automate common, repeatable tasks at scale to
streamline operational procedures.
- Identify areas to improve service resiliency through techniques
such as chaos engineering, performance/load testing, etc.
- Coordinate with other Adobe platform teams and service
providers (primarily AWS) to innovate on Generative AI as a
Service.
What You'll Need to Succeed
- A Bachelor's or Master's degree in Computer Science, Electrical
Engineering, a related field, or equivalent industry
experience.
- You excel in undefined environments and get excited about
finding pragmatic solutions to complex technical or organizational
challenges.
- You keep up with the industry trends and grow your knowledge
and skills to solve technical problems.
- Experience in building and scaling distributed systems, as well
as experience with containerization and orchestration technologies
like Kubernetes.
- Production-level expertise with container orchestration
engines (e.g. Kubernetes) and proven understanding of modern,
continuous development techniques and pipelines (IaC, CI/CD,
ArgoCD, Git).
- Fundamental programming skills, ideally with practical experience in
one (and preferably both) of the following languages: Python,
Go.
- Good knowledge of infrastructure configuration management tools
like Ansible and Terraform.
- Experience in using observability and tracing-related tools
like InfluxDB, Prometheus, and Elastic Stack.
- An understanding of AI/ML, including ML frameworks, public
cloud, and commercial AI/ML solutions. Familiarity with PyTorch,
SageMaker, Hugging Face, NVIDIA TensorRT, or OpenAI Triton is a
plus.
#FireflyGenAI
Our compensation reflects the cost of labor
across several U.S. geographic markets, and we pay differently
based on those defined markets. The U.S. pay range for this
position is $154,000 to $278,800 annually. Pay within this range
varies by work location and may also depend on job-related
knowledge, skills, and experience. Your recruiter can share more
about the specific salary range for the job location during the
hiring process.
At Adobe, for sales roles, starting salaries are
expressed as total target compensation (TTC = base + commission),
and short-term incentives are in the form of sales commission
plans. For non-sales roles, starting salaries are expressed as base
salary, and short-term incentives are in the form of the Annual
Incentive Plan (AIP).
In addition, certain roles may be eligible for
long-term incentives in the form of a new hire equity award.
Adobe
will consider qualified applicants with arrest or conviction
records for employment in accordance with state and local laws and
"fair chance" ordinances.
Adobe is proud to be an Equal Employment Opportunity and affirmative
action employer. We do not discriminate based on gender, race or
color, ethnicity or national origin, age, disability, religion,
sexual orientation, gender identity or expression, veteran status,
or any other applicable characteristics protected by law. Adobe
aims to make Adobe.com accessible to any and all users. If you have
a disability or special need that requires accommodation to
navigate our website or complete the application process, email or
call (408) 536-3015.
Adobe values a free and open marketplace for
all employees and has policies in place to ensure that we do not
enter into illegal agreements with other companies to not recruit
or hire each other's employees.
Keywords: Marketo, an Adobe Company, Sunnyvale, Sr Site Reliability Engineer, AI Platform Inference, Professions, San Jose, California