Oscar Baldenebro

Senior Site Reliability Engineer

Reliability · Observability · Automation · Platform Engineering

Senior SRE with 15+ years supporting and operating revenue-critical, highly regulated systems in financial services and large enterprises. Specialized in incident reduction, observability, automation, and platform reliability across Linux/Windows environments and Kubernetes-based infrastructure. Proven track record of reducing outages, improving MTTR, and translating engineering work into measurable risk reduction and business outcomes.

01 Core Expertise

Reliability & Operations

Incident Response · On-Call · Root Cause Analysis · SLA/SLO Management · Change & Risk Management

Observability

Splunk · Dynatrace · Prometheus · Grafana

Cloud & Platforms

GCP (GKE) · AWS · Kubernetes

Automation & IaC

Python · Bash · PowerShell · Terraform · Ansible

Operating Systems

Linux (enterprise) · Windows (enterprise)

Data & Systems

Oracle · SQL Server · API-driven platforms

Security & Compliance

Vulnerability remediation · CVE management · Regulated environments

02 Professional Experience

Application Lead Engineer

Senior SRE / Platform Focus

TEKsystems — Client: Bank of America

Nov 2024 – Present
  • Own reliability, performance, and operational stability of the iManage Document Management platform supporting a global financial user base
  • Designed and expanded enterprise observability strategy using Splunk and Dynatrace, improving proactive detection by 35% and reducing incident noise
  • Lead AIOps adoption to enable anomaly detection and predictive failure prevention in production systems
  • Directed large-scale infrastructure initiatives, including the iManage RAVN expansion from 6 to 48 servers and a 95 TB corporate PST migration with full audit and compliance integrity
  • Partnered with Global Information Security to remediate vulnerabilities, achieving 100% CVE compliance and reducing exposure risk by 40%
  • Act as senior escalation point during high-severity incidents, ensuring calm execution and rapid recovery
  • Lead and mentor cross-functional onshore/offshore teams, improving operational consistency and delivery quality
Splunk · Dynatrace · AIOps · iManage · CVE Management

DevOps Engineer / Site Reliability Engineer

TEKsystems — Client: Bank of America

Aug 2022 – Aug 2024
  • Automated AWS infrastructure provisioning using Terraform, reducing manual changes and deployment risk
  • Built Python and PowerShell automation leveraging iManage APIs, eliminating ~50% of repetitive operational work
  • Served as Tier-3 escalation owner for critical incidents, improving MTTR by 50% and sustaining 98% SLA compliance
  • Developed automated ingestion pipelines for Outlook PST migrations, preserving full metadata for regulatory compliance
  • Designed automated document purging workflows (Python + SQL), reclaiming ~500 GB/month in storage and reducing costs
  • Optimized deployment workflows across Linux and Windows servers, reducing downtime by 40% and improving service continuity
Terraform · AWS · Python · PowerShell · SQL

Systems Engineer II

SRE / Platform Support

Charter Communications

Nov 2019 – Aug 2022
  • Supported production GKE clusters delivering enterprise-scale services
  • Improved billing and data workflows using Python, Pandas, and SQL, increasing performance by ~30%
  • Designed executive dashboards integrating Python, Oracle, and visualization tools to support C-suite decision-making
  • Resolved complex Tier-3 production incidents, reducing resolution time by 40%
  • Designed and implemented a new call-billing process that eliminated revenue leakage from previously unbilled services
GKE · Python · Pandas · Oracle · Kubernetes

Systems Administrator

MIC Customs Solutions

May 2018 – Nov 2019
  • Supported SaaS and on-prem customer environments with high availability requirements
  • Implemented logging and monitoring using Elasticsearch and New Relic
  • Supported CI/CD pipelines with Jenkins and Ansible
  • Automated operational tasks using Python, Bash, and SQL
  • Reduced recurring incidents by ~30% through root cause analysis and proactive remediation
Elasticsearch · New Relic · Jenkins · Ansible

IT Operations & Infrastructure Leader

Club Premier (Aeroméxico Loyalty Program)

Jun 2014 – Apr 2018
  • Led IT Operations and on-call rotations for customer-facing platforms
  • Designed and implemented disaster recovery architecture in AWS
  • Improved platform security posture across applications, servers, and databases
  • Migrated Oracle databases from Windows to Linux, improving performance and reducing licensing costs
  • Increased production uptime from 99.3% to 99.8% for ClubPremier.com and CRM systems
AWS · Oracle · Linux · DR Architecture

Earlier Roles

  • IT Operations Specialist — Club Premier Aeroméxico
  • Junior Developer — GoNet (Aeroméxico)

03 Education

Bachelor of Engineering in Electronics

Instituto Tecnológico de Sonora

Information Security Diploma

Instituto Tecnológico de Estudios Superiores de Monterrey (ITESM)

04 Certifications

Introduction to Site Reliability Engineering

Google

SRE Fundamentals

Google