
Lakehouse Overview

The SMACKZ Lakehouse is the analytical data platform. It separates OLAP workloads from the transactional Yum database, providing near-real-time analytics without impacting production API performance.

Architecture

Yum API ──[Redis Streams]──> Writer (1 replica)
                                 |
                           [PyArrow -> Parquet]
                                 |
                                 v
                          Cloudflare R2
                                 |
                           [DuckDB httpfs]
                                 |
                                 v
                          Query API (N replicas)
                                 |
                                 v
                          Dashboards / Ad-hoc SQL

Components

| Component | Role | Scale |
|-----------|------|-------|
| Writer | Consumes events from Redis Streams, buffers them, writes Parquet files to R2 | 1 replica (fixed) |
| Query API | FastAPI + embedded DuckDB, reads Parquet from R2, serves analytics endpoints | 1-5 replicas (auto-scaled) |
| Metabase | Self-service dashboard UI with DuckDB driver | 1 replica (fixed) |

Tech Stack

  • Python 3.12 + FastAPI + uvicorn
  • DuckDB (embedded query engine, reads Parquet via httpfs)
  • PyArrow (typed Parquet writing)
  • Cloudflare R2 (S3-compatible storage, zero egress fees)
  • Redis Streams (event ingestion from Yum API)

Data Tables

The lakehouse currently stores six Parquet-backed tables:

| Table | Description |
|-------|-------------|
| orders | Order headers with status, totals, timestamps |
| order_items | Individual line items per order |
| order_status_changes | Status transition history |
| users | Customer records |
| restaurants | Restaurant profiles |
| locations | Restaurant location data |
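
One way the Query API could expose these tables to DuckDB is to register a view per table over the matching Parquet files. The sketch below only builds the SQL statements. The object layout (`s3://<bucket>/<table>/*.parquet`) and the bucket name are assumptions made for illustration, not the lakehouse's documented layout; the table names come from the list above.

```python
# Table names taken from the Data Tables list.
TABLES = (
    "orders",
    "order_items",
    "order_status_changes",
    "users",
    "restaurants",
    "locations",
)


def parquet_glob(bucket: str, table: str) -> str:
    """Return a glob matching every Parquet file of one table.

    Assumes a hypothetical layout of one prefix per table.
    """
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"s3://{bucket}/{table}/*.parquet"


def create_view_sql(bucket: str, table: str) -> str:
    """Build a DuckDB statement exposing one table as a view over its files."""
    return (
        f"CREATE OR REPLACE VIEW {table} AS "
        f"SELECT * FROM read_parquet('{parquet_glob(bucket, table)}')"
    )


def view_statements(bucket: str) -> list:
    """Build one CREATE VIEW statement per lakehouse table."""
    return [create_view_sql(bucket, table) for table in TABLES]


if __name__ == "__main__":
    # "example-lakehouse-bucket" is a placeholder, not a real bucket name.
    for statement in view_statements("example-lakehouse-bucket"):
        print(statement)
```

Each statement can be passed to `execute()` on a DuckDB connection once the httpfs extension and storage credentials are configured. Checking the table name against a fixed allowlist keeps arbitrary input out of the generated SQL.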

Data Freshness

Events flow from Yum through Redis Streams to the Writer, which buffers by time and count thresholds before flushing to Parquet. Typical end-to-end latency is 60-120 seconds.
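
The buffering described above can be sketched as a small class that reports when a flush is due. This is a minimal illustration, not the Writer's actual code: the class name, the default thresholds, and the rule that either threshold alone triggers a flush are all assumptions.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class EventBuffer:
    """Collects events and reports when a flush is due.

    A flush is due when either threshold is reached:
    - count: the buffer holds at least `max_events` events
    - time: the oldest buffered event is at least `max_age_seconds` old
    """

    def __init__(
        self,
        max_events: int = 1000,          # hypothetical default
        max_age_seconds: float = 60.0,   # hypothetical default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_events = max_events
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._events: List[Dict[str, Any]] = []
        self._first_event_at: Optional[float] = None

    def add(self, event: Dict[str, Any]) -> None:
        # Start the age timer when the first event of a batch arrives.
        if not self._events:
            self._first_event_at = self._clock()
        self._events.append(event)

    def should_flush(self) -> bool:
        if not self._events:
            return False
        if len(self._events) >= self.max_events:
            return True
        return self._clock() - self._first_event_at >= self.max_age_seconds

    def drain(self) -> List[Dict[str, Any]]:
        """Return the buffered events and reset the buffer."""
        events, self._events = self._events, []
        self._first_event_at = None
        return events
```

A consumer loop would call `add()` for each event read from the stream, check `should_flush()` on every iteration (including idle ones, so a quiet stream still flushes on time), and write the result of `drain()` as one Parquet file. The injectable `clock` makes the time threshold testable without sleeping.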

Local Development

# Start all services (MinIO, Redis, Writer, Query API)
docker compose up --build

# MinIO console:   http://localhost:9001 (minioadmin/minioadmin)
# Writer health:   http://localhost:8090/health
# Query API docs:  http://localhost:8091/docs
# Metabase UI:     http://localhost:3010

MinIO replaces Cloudflare R2 for local development, providing an S3-compatible interface.
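
Because both backends speak the S3 API, the same DuckDB setup can target either one by changing only the endpoint and credentials. The sketch below is one possible way to express that, not the project's actual configuration code. The `StorageSettings` class, the secret name `lakehouse_storage`, and MinIO API port 9000 (MinIO's default; only the console port 9001 is listed above) are assumptions. It relies on DuckDB's `CREATE SECRET` statement, which needs a DuckDB version that includes the secrets manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSettings:
    """Hypothetical settings holder; field names are illustrative."""
    access_key: str
    secret_key: str
    endpoint: str        # host[:port] without a scheme
    use_ssl: bool = True


def minio_settings() -> StorageSettings:
    # Credentials match the local MinIO console login listed above.
    # Port 9000 is assumed to be the MinIO API port.
    return StorageSettings("minioadmin", "minioadmin", "localhost:9000", use_ssl=False)


def r2_settings(account_id: str, access_key: str, secret_key: str) -> StorageSettings:
    # R2 serves its S3-compatible API on an account-specific hostname.
    return StorageSettings(access_key, secret_key, f"{account_id}.r2.cloudflarestorage.com")


def sql_literal(value: str) -> str:
    """Quote a string for SQL by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_secret_sql(settings: StorageSettings) -> str:
    """Build a DuckDB CREATE SECRET statement for an S3-compatible store."""
    return (
        "CREATE OR REPLACE SECRET lakehouse_storage ("
        "TYPE s3, "
        f"KEY_ID {sql_literal(settings.access_key)}, "
        f"SECRET {sql_literal(settings.secret_key)}, "
        f"ENDPOINT {sql_literal(settings.endpoint)}, "
        "URL_STYLE 'path', "
        f"USE_SSL {'true' if settings.use_ssl else 'false'}"
        ")"
    )


def connect(settings: StorageSettings):
    """Open an in-memory DuckDB connection configured for the given store."""
    import duckdb  # third-party; imported lazily so the helpers above work without it

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute(create_secret_sql(settings))
    return con
```

With this shape, local development would call `connect(minio_settings())` and a deployed environment would call `connect(r2_settings(...))` with values read from its own configuration. Queries against `s3://` paths stay the same in both cases.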

Deployment

The lakehouse runs as three workloads on Control Plane, deployed via GitHub Actions on push to the qa, staging, or main branch:

| Workload | Type | URL |
|----------|------|-----|
| smackz-lakehouse-writer | Standard (1 replica) | Internal only |
| smackz-lakehouse-query | Serverless (1-5 replicas) | Internal only |
| smackz-lakehouse-metabase | Standard (1 replica) | analytics.{env}.smackz.co |

Key Files

  • smackz-lakehouse/README.md -- Full setup and structure
  • smackz-lakehouse/config.py -- Settings (R2, Redis, Firebase)
  • smackz-lakehouse/writer/ -- Event consumer and Parquet writer
  • smackz-lakehouse/query/ -- Analytics API
  • smackz-lakehouse/metabase/ -- Metabase configuration and seed scripts