Lakehouse Overview
The SMACKZ Lakehouse is the analytical data platform. It separates OLAP workloads from the transactional Yum database, providing near-real-time analytics without impacting production API performance.
Architecture
    Yum API ──[Redis Streams]──> Writer (1 replica)
                                    |
                          [PyArrow -> Parquet]
                                    |
                                    v
                              Cloudflare R2
                                    |
                             [DuckDB httpfs]
                                    |
                                    v
                         Query API (N replicas)
                                    |
                                    v
                         Dashboards / Ad-hoc SQL
Components
| Component | Role | Scale |
|---|---|---|
| Writer | Consumes events from Redis Streams, buffers them, writes Parquet files to R2 | 1 replica (fixed) |
| Query API | FastAPI + embedded DuckDB, reads Parquet from R2, serves analytics endpoints | 1-5 replicas (auto-scaled) |
| Metabase | Self-service dashboard UI with DuckDB driver | 1 replica (fixed) |
Tech Stack
- Python 3.12 + FastAPI + uvicorn
- DuckDB (embedded query engine, reads Parquet via httpfs)
- PyArrow (typed Parquet writing)
- Cloudflare R2 (S3-compatible storage, zero egress fees)
- Redis Streams (event ingestion from Yum API)
Data Tables
The lakehouse currently stores six Parquet-backed tables:
| Table | Description |
|---|---|
| orders | Order headers with status, totals, timestamps |
| order_items | Individual line items per order |
| order_status_changes | Status transition history |
| users | Customer records |
| restaurants | Restaurant profiles |
| locations | Restaurant location data |
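As a rough illustration of how these tables could be exposed to DuckDB, the sketch below builds one view definition per table over its Parquet files. The function name `view_statements`, the bucket name, and the `s3://<bucket>/<table>/*.parquet` layout are all assumptions made for the example; the actual object layout is not described here.

```python
# The six table names listed above.
TABLES = (
    "orders",
    "order_items",
    "order_status_changes",
    "users",
    "restaurants",
    "locations",
)


def view_statements(bucket: str, tables: tuple[str, ...] = TABLES) -> list[str]:
    """Build one CREATE VIEW statement per table.

    Assumes each table's files live under s3://<bucket>/<table>/*.parquet,
    which is a hypothetical layout, not one confirmed by this document.
    """
    return [
        f"CREATE OR REPLACE VIEW {name} AS "
        f"SELECT * FROM read_parquet('s3://{bucket}/{name}/*.parquet')"
        for name in tables
    ]
```

Each returned string can be passed to a DuckDB connection's `execute` method once httpfs is loaded, after which the tables can be queried by name.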
Data Freshness
Events flow from Yum through Redis Streams to the Writer, which buffers by time and count thresholds before flushing to Parquet. Typical end-to-end latency is 60-120 seconds.
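The time-and-count buffering described above can be sketched roughly as follows. This is a minimal illustration, not the Writer's actual code: the class name `EventBuffer`, its methods, and the default thresholds are hypothetical, and it assumes a flush is triggered by whichever threshold is reached first.

```python
import time
from typing import Any, Callable


class EventBuffer:
    """Collects events and reports when a flush is due.

    A flush is due when either the count threshold or the age threshold
    is reached, whichever comes first. Both defaults are illustrative.
    """

    def __init__(
        self,
        max_events: int = 500,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_events = max_events
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._events: list[dict[str, Any]] = []
        self._first_event_at = 0.0

    def add(self, event: dict[str, Any]) -> None:
        # Start the age timer when the first event of a batch arrives.
        if not self._events:
            self._first_event_at = self._clock()
        self._events.append(event)

    def should_flush(self) -> bool:
        # An empty buffer never needs flushing.
        if not self._events:
            return False
        if len(self._events) >= self.max_events:
            return True
        age = self._clock() - self._first_event_at
        return age >= self.max_age_seconds

    def drain(self) -> list[dict[str, Any]]:
        # Hand the batch to the caller (e.g. a Parquet writer) and reset.
        batch, self._events = self._events, []
        return batch
```

A consumer loop would call `add` for each event read from the stream, check `should_flush` on every iteration, and pass the result of `drain` to the Parquet writer. Injecting `clock` keeps the age threshold testable without sleeping.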
Local Development
# Start all services (MinIO, Redis, Writer, Query API)
docker compose up --build
# MinIO console: http://localhost:9001 (minioadmin/minioadmin)
# Writer health: http://localhost:8090/health
# Query API docs: http://localhost:8091/docs
# Metabase UI: http://localhost:3010
MinIO replaces Cloudflare R2 for local development, providing an S3-compatible interface.
Deployment
Three workloads run on Control Plane and are deployed via GitHub Actions on every push to the qa, staging, or main branch:
| Workload | Type | URL |
|---|---|---|
| smackz-lakehouse-writer | Standard (1 replica) | Internal only |
| smackz-lakehouse-query | Serverless (1-5 replicas) | Internal only |
| smackz-lakehouse-metabase | Standard (1 replica) | analytics.{env}.smackz.co |
Key Files
- smackz-lakehouse/README.md -- Full setup and structure
- smackz-lakehouse/config.py -- Settings (R2, Redis, Firebase)
- smackz-lakehouse/writer/ -- Event consumer and Parquet writer
- smackz-lakehouse/query/ -- Analytics API
- smackz-lakehouse/metabase/ -- Metabase configuration and seed scripts