11th March, 2026
Platform Engineer #1051708
Job Description:
Overview:
- Platform Engineering builds and operates shared infrastructure and paved paths that help product teams deliver securely, reliably, and quickly.
- This role leans toward cloud infrastructure, DevOps, and Site Reliability Engineering (SRE), with strong software development skills.
What you'll do:
- Design, build, and operate cloud infrastructure and platform capabilities (networking, compute, Kubernetes, CI/CD, secrets, certificates, identity).
- Define and improve reliability using service-level indicators (SLIs), service-level objectives (SLOs), and error budgets.
- Implement observability (metrics, logs, traces) with actionable alerting focused on user impact.
- Create self-service workflows and automation (infrastructure as code, GitOps, build/release pipelines) that reduce toil.
- Improve security and compliance through least-privilege access, secure defaults, policy-as-code, and continuous hardening.
- Participate in on-call rotation, incident response, and post-incident reviews; drive systemic fixes and runbook quality.
- Partner with application teams to improve deployability, resilience, and cost efficiency (capacity planning, autoscaling, graceful degradation).
What we're looking for:
Required:
- Experience operating production cloud platforms and services (e.g., GCP/AWS/Azure) with an SRE mindset.
- Strong fundamentals in Linux, networking, distributed systems, and debugging complex production issues.
- Proficiency with infrastructure as code and automation (e.g., Terraform, Helm/Kustomize, GitOps tooling).
- Experience with containers and orchestration (Docker, Kubernetes) and modern CI/CD.
- Programming and scripting ability (e.g., Go, Python, Java, TypeScript) to build tooling and automate workflows.
- Clear communication, effective incident leadership, and a customer-focused approach to platform work.
Preferred:
- Experience defining SLIs/SLOs and implementing SLO-based alerting and dashboards.
- Observability platform experience (e.g., Prometheus/Grafana, OpenTelemetry, centralized logging).
- Policy-as-code and supply chain security (e.g., OPA/Rego, SLSA concepts, SBOMs, artifact signing).
- Experience building golden paths (container images, templates, reference architectures, paved pipelines) adopted by multiple teams.
- Cost optimization experience (FinOps practices, capacity forecasting, right-sizing, multi-tenant platform controls).
How we work:
- Automate first: eliminate repeatable manual work; measure and reduce toil.
- Reliability is a feature: design for failure with timeouts, retries with jitter, idempotency, and graceful degradation.
- Small, safe changes: incremental delivery, clear rollback strategies, and continuous improvement.
- Engineering excellence: design reviews, blameless postmortems, and strong documentation/runbooks.
What success looks like:
- Platform capabilities are easy to adopt, well-documented, and measurably reduce lead time for change.
- Reliability improves over time (SLO attainment, reduced incident frequency/severity, faster MTTR).
- Security posture improves via secure-by-default patterns and automated controls.
Skills Required: Cloud Infrastructure, Python, GCP, Platform Support, Kubernetes
- Cloud Infrastructure Expectation: A candidate has provisioned and operated production-grade infrastructure on a major cloud provider. For example, they designed a multi-region GCP network topology using VPCs, subnets, firewall rules, and Cloud NAT, managed with Terraform and deployed via a GitOps pipeline. They understand networking primitives, IAM boundaries, and compute options, and can explain the tradeoffs between managed and self-hosted services.
- Python Expectation: A candidate has written production Python tooling or automation. For example, a script that queries the GCP Asset Inventory API to identify over-provisioned IAM bindings, generates a report, and opens a Jira ticket for remediation. The code is structured, testable (pytest), and handles errors and retries gracefully. These are not just glue scripts but maintainable tools used by a team.
- GCP Expectation: A candidate has hands-on experience operating GCP services in a real platform context. For example, running workloads on Cloud Run, using Workload Identity for pod-level IAM, configuring policies, managing secrets in Secret Manager, and setting up VPC Service Controls. They can reason about GCP-specific reliability and security patterns, not just surface-level console familiarity.
- Platform Support Expectation: A candidate has acted as a platform team member supporting internal developer customers. For example, they owned an on-call rotation, triaged and resolved incidents for shared Kubernetes or CI/CD infrastructure, led a blameless postmortem, and shipped a runbook improvement or systemic fix that prevented recurrence. They approach support as an engineering problem, not just a queue.
- Kubernetes Expectation: A candidate has operated Kubernetes clusters in production. For example, they have managed cluster upgrades on GKE, written and debugged Helm charts or Kustomize overlays, configured RBAC and Network Policies, implemented HPA/VPA for autoscaling, and troubleshot pod scheduling failures, OOMKills, or service mesh connectivity issues. They understand the control plane well enough to debug it, not just deploy to it.
Skills Preferred: Go, Cloud Architecture, Reliability Engineering
- Go Expectation: A candidate has written Go for platform tooling or infrastructure automation. For example, a Kubernetes admission webhook (validating or mutating) that enforces security policies on workloads, or a CLI tool that wraps kubectl and Vault APIs to simplify developer secret management. Code should be idiomatic Go with proper error handling, context propagation, and unit tests.
- Cloud Architecture Expectation: A candidate has contributed to or led the design of a multi-team or multi-service platform architecture. For example, they designed a hub-and-spoke shared-services network on GCP, defined the golden path for how product teams onboard to the platform (container image standards, CI/CD templates, service mesh configuration), and documented reference architectures adopted by multiple teams. They can articulate tradeoffs and present designs for review.
- Reliability Engineering Expectation: A candidate has formally implemented SRE practices, beyond conceptual familiarity. For example, they defined SLIs (e.g., request success rate, p99 latency) and SLOs for a shared platform service, configured SLO-based alerting in Prometheus/Grafana that pages on burn rate rather than raw errors, maintained an error budget, and used that budget to gate or slow feature releases. They can explain how reliability engineering changes team behavior around change management.
Experience Required:
- Engineering experience: practical experience in 2 coding languages, or advanced practical experience in 1 language
- 6+ years in IT
- 4+ years in development
Additional Info:
At FastTek Global, Our Purpose is Our People and Our Planet. We come to work each day and are reminded we are helping people find their success stories. Also, doing the right thing is our mantra. We act responsibly, give back to the communities we serve and have a little fun along the way.
We have been doing this with pride, dedication and plain, old-fashioned hard work for 24 years!
FastTek Global is a financially strong, privately held company that is 100% consultant and client focused.
We've differentiated ourselves by being fast, flexible, creative and honest. Throw out everything you've heard, seen, or felt about every other IT consulting company. We do unique things and we do them for Fortune 10, Fortune 500, and technology start-up companies.
Our benefits are second to none, and thanks to our flexible benefit options you can choose the benefits you need or want. Options include:
- Medical and Dental (FastTek pays majority of the medical program)
- Vision
- Personal Time Off (PTO) Program
- Long Term Disability (100% paid)
- Life Insurance (100% paid)
- 401(k) with immediate vesting and 3% (of salary) dollar-for-dollar match
Plus, we have a lucrative employee referral program and an employee recognition culture.
FastTek Global was named one of the Top Work Places in Michigan by the Detroit Free Press in 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, and 2023!
To view all of our open positions, go to: https://www.fasttek.com/fastswitch/findwork
Follow us on Twitter: https://twitter.com/fasttekglobal
Follow us on Instagram: https://www.instagram.com/fasttekglobal
Find us on LinkedIn: https://www.linkedin.com/company/fasttek
You can become a fan of FastTek on Facebook: https://www.facebook.com/fasttekglobal/
AI & Hiring Disclosure:
We use AI tools to support parts of our hiring process, such as reviewing applications and identifying potential matches. These tools are designed to promote efficiency, consistency, and fairness, and they are always used under human oversight.
All personal data collected is used solely for recruitment purposes, and you have the right to know, access, or request deletion of your data at any time, subject to legal limits.
If AI will be used in a video interview, you'll be informed in advance and asked for your consent, with the option to opt out.
Our tools are regularly reviewed to detect potential bias and to ensure compliance with all applicable laws and our commitment to inclusive hiring.
To learn more or exercise your rights, please contact us at info@fasttek.com.