Designing production systems @ TikTok

Marcus Peh.

Software Engineer @ TikTok.

Building scalable data platforms and algorithmic trading systems. Previously Google, ByteDance, and GovTech.

01 — Experience

Now, and before.

Currently shipping production systems. Previously: engineering internships where I learned to operate at scale.

Now · Current role

Backend Software Engineer

@ TikTok

Apr 2024 — Present

Singapore

Owning production systems on TikTok's real-time user intelligence platform — a distributed rule evaluation engine and a designed-from-scratch user group analysis system. Responsible for end-to-end reliability: rule-tree execution semantics, async fan-out across downstream services, latency budgets, and observability across millions of QPS with P99 under 50ms. Shipped incrementally through a continuously monitored, fault-tolerant rollout pipeline.

Go · Python · MySQL · Redis · ClickHouse
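For illustration only, and not the production design: a minimal Python sketch of how a rule tree can be evaluated with concurrent fan-out under a latency budget. The `Leaf`, `Node`, `evaluate`, and `evaluate_within_budget` names, the "all"/"any" operators, and the fallback-to-default behaviour are all assumptions made for this example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """A leaf rule that asks one downstream source for a boolean (hypothetical)."""
    name: str
    fetch: object  # async callable: user_id -> bool


@dataclass
class Node:
    """An inner node that combines child results with 'all' or 'any'."""
    op: str
    children: list = field(default_factory=list)


async def evaluate(rule, user_id):
    if isinstance(rule, Leaf):
        return await rule.fetch(user_id)
    # Fan out: evaluate every child concurrently, then combine the results.
    results = await asyncio.gather(*(evaluate(c, user_id) for c in rule.children))
    return all(results) if rule.op == "all" else any(results)


async def evaluate_within_budget(rule, user_id, budget_s=0.05, default=False):
    """Return the rule result, or `default` if the latency budget is exceeded."""
    try:
        return await asyncio.wait_for(evaluate(rule, user_id), timeout=budget_s)
    except asyncio.TimeoutError:
        return default


# Usage with stand-in downstream calls (hypothetical).
async def is_even_id(user_id):
    return user_id % 2 == 0


async def is_active(user_id):
    return True


async def slow_lookup(user_id):
    await asyncio.sleep(1)
    return True


tree = Node("all", [Leaf("even", is_even_id), Node("any", [Leaf("active", is_active)])])
fast_result = asyncio.run(evaluate_within_budget(tree, 42))
slow_tree = Node("all", [Leaf("slow", slow_lookup)])
slow_result = asyncio.run(evaluate_within_budget(slow_tree, 42, budget_s=0.01))
```

In this sketch `asyncio.gather` waits for every child, so there is no short-circuiting; a real engine would likely cancel siblings once the outcome is decided.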

Before · Engineering internships

Google

Software Engineering Intern · Google Pay

May 2023 — Aug 2023

Cut QR code load time by 93.5% (1.38s → 0.09s) by removing redundant RPCs on the hot path.

Java · Protocol Buffers · gRPC

ByteDance

Software Engineering Intern · Global Payments

Aug 2023 — Dec 2023

Improved API performance by 90%+ for a payments channel platform processing $7.5B+ monthly.

Go · MongoDB

GovTech

Software Engineering Intern

May 2022 — Jul 2022

Reduced server query response time under load by 200× by restructuring the API.

TypeScript · ReactJS · PostgreSQL

02 — Side work

Infrastructure on the side.

Three things I run or maintain outside of production.

01 / 03

Home Lab & Observability

Self-hosted infrastructure running Docker containers, Prometheus, Grafana, and an internal service mesh — built as a sandbox for distributed-systems experimentation.

Used as a personal staging environment to validate deployment patterns, failure modes, and observability wiring before applying them at work.

Ubuntu Server · Docker · Prometheus · Grafana

02 / 03

Multi-Agent Orchestration

Agent runtime with bounded execution, tool-routing, and observability primitives — designed for operating production systems with autonomous workflows.

Applying lessons from real-time distributed systems to AI agents: explicit timeouts, retry policies, structured outputs, and full traces.

Python · Docker · LLMs
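A minimal sketch of what explicit timeouts, bounded retries, structured outputs, and per-attempt traces can look like around a single tool call. This is an illustration under stated assumptions, not the runtime itself: `ToolResult`, `call_with_bounds`, and the linear backoff are hypothetical names and choices.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    """Structured output for one tool call (hypothetical schema)."""
    ok: bool
    value: object = None
    error: str = ""
    attempts: int = 0
    trace: list = field(default_factory=list)  # one entry per attempt


async def call_with_bounds(tool, arg, timeout_s=1.0, max_attempts=3, backoff_s=0.0):
    """Run an async tool with a per-attempt timeout and a bounded retry count."""
    result = ToolResult(ok=False)
    for attempt in range(1, max_attempts + 1):
        started = time.monotonic()
        result.attempts = attempt
        try:
            result.value = await asyncio.wait_for(tool(arg), timeout=timeout_s)
            result.ok, result.error, outcome = True, "", "ok"
        except asyncio.TimeoutError:
            result.error, outcome = "timeout", "timeout"
        except Exception as exc:  # tool failures are recorded, not raised
            result.error, outcome = repr(exc), "error"
        result.trace.append({"attempt": attempt, "outcome": outcome,
                             "elapsed_s": time.monotonic() - started})
        if result.ok:
            break
        if attempt < max_attempts:
            await asyncio.sleep(backoff_s * attempt)  # linear backoff (assumed)
    return result


# Usage: a stand-in tool that fails twice, then succeeds.
async def flaky_tool(state):
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result = asyncio.run(call_with_bounds(flaky_tool, {"calls": 0}))
```

Returning a result object instead of raising keeps every call's outcome and attempt history in one place, which is what makes the trace usable afterwards.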

03 / 03

Low-code Widget Framework

Analytics framework that powers scalable, configurable dashboards inside the live operations platform.

Built and maintained as part of the production platform; adopted by the majority of internal use cases.

Java · Spring Boot · Redis

03 — Currently

Where my attention is.

Research and infra work happening alongside production systems.

01

Multi-agent trading infrastructure

Designing agent orchestration for live market execution.

In progress

02

Portfolio optimization research

Factor models, risk decomposition, and backtest pipelines.

Exploring

03

Home server & observability stack

Prometheus, Grafana, and containerized self-hosted services.

Building

04

OMSCS — Computer Systems

Studying computer systems, databases, and software engineering.

Studying

04 — About

In short.

I'm a software engineer at TikTok building distributed systems for user segmentation and decisioning.

I enjoy designing scalable infrastructure, working on complex systems problems, and shipping things that operate reliably at scale.

Based in: Singapore
Focus: Complex systems
Background: NUS · CS
Studying: Computer Systems

05 — Get in touch

Open to senior backend roles and systems conversations.

Especially interested in real-time infrastructure, decisioning platforms, and distributed-systems teams.