Our client is a leading technology company specializing in AI-centric cloud infrastructure, which is reshaping the landscape of artificial intelligence. The company operates one of the most powerful commercially available supercomputers and provides scalable AI cloud infrastructure optimized for AI/ML workloads, leveraging thousands of NVIDIA GPUs, high-performance InfiniBand networking, and managed Kubernetes or Slurm orchestration. Their platform supports AI developers and enterprises requiring large-scale GPU compute power and sustainable, energy-efficient data centers.
Their mission is to democratize access to high-performance AI infrastructure by delivering innovative, scalable, and sustainable cloud solutions that empower AI developers and enterprises worldwide to accelerate their AI and machine learning workloads with unparalleled efficiency and reliability.
We are seeking a Senior Network Engineer to join our client’s infrastructure team. This role is critical in ensuring the smooth and stable operation of the company’s large-scale data center networks and global backbone. You will design, develop, and maintain advanced network architectures supporting thousands of server ports and GPU cluster interconnects, contributing directly to the company’s mission of delivering cutting-edge AI cloud services. This is a remote US-based role requiring frequent travel to data center sites, as well as the ability to work European time zone hours and collaborate with international teams.
Ensure stable operation of data center infrastructure, points of presence, and global backbone networks
Design and develop large-scale data center networks, including InfiniBand-based GPU cluster interconnects
Develop and maintain monitoring and automation tools to improve network operations
Provide technical design and operational support, collaborating cross-functionally with R&D, SRE, ITDC, and network development teams
Lead and support major network infrastructure upgrades and new region launches
Liaise with vendors for troubleshooting and network infrastructure testing
Maintain comprehensive network documentation and testing plans
Participate in on-call rotations and travel 2-3 times monthly to sites in New Jersey, Kansas City, and occasionally Amsterdam
Work European time zone hours and coordinate with global teams
At least 5 years of experience working in large, complex technology environments
Proven ability to manage and support critical infrastructure serving large user bases
Strong analytical and troubleshooting skills for resolving complex network issues
Hands-on, proactive approach to maintaining and improving network systems
Self-motivated and able to work independently while contributing to a high-performing team
Excellent communication skills with experience collaborating across diverse teams and cultures
Ability to travel domestically and internationally 2-3 times monthly
Legal authorization to work full-time in the U.S. without visa sponsorship
Networking Certifications: CCNP, CCIE, JNCIE, or equivalent expert-level qualifications
Routing & Switching: BGP, IS-IS, Segment Routing MPLS (with IPv6), Ethernet switching, VXLAN, ECMP, L3 MPLS VPNs
Data Center & Cloud Networking: Troubleshooting TCP/IPv4/IPv6 in complex data center topologies (e.g., Clos networks), cloud overlay network technologies, software-defined networking (SDN)
Vendor Ecosystem: Hands-on experience with Juniper, Arista, Huawei, and Mellanox network equipment
Cloud Platforms: AWS, Azure, Google Cloud
Bonus: Knowledge of GPU and InfiniBand networking
Bonus: Programming skills in Python or Go for network automation
Bonus: Experience working in Linux environments
Passion for staying current with HPC and AI infrastructure domains
Commitment to maintaining the highest standards of network reliability and performance in environments where milliseconds matter
Comfort working with international teams and adapting to different operational requirements across global data center locations
Drive to expand technical expertise in cutting-edge areas like GPU networking, software-defined infrastructure, and cloud-native networking
Competitive base salary ranging from $115,000 to $145,000 per year plus quarterly performance bonuses
100% company-paid medical, dental, and vision insurance for employees and families
401(k) plan with up to 4% company match and immediate vesting
Generous parental leave (20 weeks for primary caregivers, 12 weeks for secondary caregivers)
Remote work reimbursement up to $85/month for mobile and internet
Travel allowance and support for frequent business travel
Company-paid short-term, long-term disability, and life insurance
Work with a team operating one of the most powerful commercially available supercomputers
Contribute to sustainable AI infrastructure with energy-efficient data centers that reuse waste heat
Enjoy a culture that blends startup innovation with the resources of an established company
Level 1: Interview with the Hiring Manager
Level 2: Internal Routing Skills Interview
Level 3: External Routing Skills Interview
Reference and Background Checks: Conducted after the successful completion of all interview stages
Job Offer: Extended to the selected candidate following successful checks
We are proud to be an equal opportunity workplace and are committed to equal employment opportunity regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or any other characteristic protected by applicable federal, state, or local law.
Please ensure that you regularly check the email address that you provide during the application process for any updates from potential employers. Your application status, interview invitations, or job offers will be sent via email. Respond promptly to maintain your candidacy.