# Funnel Tracker

Pulls weekly data from Mailchimp, Facebook Lead Ads, Stripe, and Google Analytics 4, stitches the per-person sources together by email, and writes a per-person funnel, per-campaign rollup, and GA4 traffic-source breakdown to Google Sheets.

## Terminology

To keep paid and free signups distinct everywhere in the code and in the spreadsheet:

- **Lead** — a Facebook Lead Ad submission. Top of funnel.
- **Newsletter subscriber** — a free Mailchimp opt-in (`newsletter_*` in code).
- **Newspaper subscriber** — a paid Stripe customer who has reached a paying status (`newspaper_*` in code).

Source-layer modules (`sources/`) use each platform's native terminology (e.g. "Stripe subscription", "Mailchimp member"). The translation to the domain-layer names above happens in `pipeline/stitch.py`.

GA4 data lives at the aggregate level (sessions, conversion-event counts by source/medium/campaign), not the per-person level — GA4 doesn't expose user identity unless you explicitly set a `user_id`, which we don't. The GA4 numbers go into their own sheet tab as supporting context.

## Setup

1. Create and activate a virtualenv:
   - Linux: `python3 -m venv .venv && source .venv/bin/activate`
   - Windows: `python -m venv .venv && .venv\Scripts\activate`
2. Install dependencies: `pip install -r requirements.txt`
3. Copy `.env.example` to `.env` and fill in credentials.
4. Edit `config.yaml` to set your timezone, Mailchimp audience IDs, Facebook ad account IDs, and Google Sheet ID.
5. Save the Google service-account JSON at the path referenced in `.env` (default `./service-account.json`) and share the target Google Sheet with the service-account email as Editor.

## Run

```
python run.py
```

## Schedule (Linux server, weekly Mondays at 06:00 local time)

```
0 6 * * 1 cd /path/to/Analytic && /path/to/.venv/bin/python run.py >> run.log 2>&1
```

## Adding a new data source

1. Add a fetcher in `sources/<name>.py` that returns raw records.
2. Add a normalizer in `pipeline/normalize.py` that converts raw records into a dataclass from `models.py`.
3. Update `pipeline/stitch.py` if the new source contributes identity information.
4. Wire it into `run.py`.

## Layout

```
run.py               entry point: pull → normalize → stitch → metrics → write
config.yaml          non-secret config (timezone, audience IDs, sheet ID)
.env                 secrets (API keys, service account path) — gitignored
models.py            shared dataclasses
sources/             per-platform API fetchers
pipeline/            normalize, stitch (identity join), metrics
output/              Google Sheets writer
web.py               Flask app that serves dashboard.html behind a login
deploy/              systemd unit for the web service
.deploy-exclude      files rsync must never sync to the server
../.gitlab-ci.yml    at the Shaggy repo root — `deploy_dashboard` job
                     auto-deploys this folder on push to the deploy branch
```

## Deploy

CI/CD pushes to an in-house Linux server on every commit to `main`. The Flask
app (`web.py`) runs under gunicorn behind systemd; the pipeline (`run.py`) runs
via cron on the same host and writes `dashboard.html` to disk where the web app
reads it. `dashboard.html`, `.env`, and credential JSONs live only on the
server — they are never in the git repo.

### One-time server bootstrap

As root on the in-house server:

```bash
# 1. Create deploy user and target directory
useradd -m -s /bin/bash deploy
mkdir -p /opt/funnel-tracker
chown deploy:deploy /opt/funnel-tracker

# 2. Create the venv (as deploy)
sudo -u deploy bash -c 'cd /opt/funnel-tracker && python3 -m venv .venv'

# 3. Let deploy restart the web service without a password prompt
echo 'deploy ALL=(root) NOPASSWD: /bin/systemctl restart funnel-tracker-web' \
  > /etc/sudoers.d/funnel-tracker
chmod 440 /etc/sudoers.d/funnel-tracker
```

From your laptop, push secrets (.env and credential JSONs are gitignored, so CI
will never touch them on the server):

```bash
scp .env deploy@<server>:/opt/funnel-tracker/.env
scp service-account.json oauth-client.json ga4-token.json \
    deploy@<server>:/opt/funnel-tracker/
```

The server's `.env` **must** include `FLASK_SECRET_KEY=<random hex>`. Without
it each gunicorn worker generates its own secret and sessions break across
workers. Generate one with `python3 -c "import secrets; print(secrets.token_hex(32))"`.

After the first CI deploy populates `/opt/funnel-tracker/`, install and enable
the systemd unit on the server:

```bash
cp /opt/funnel-tracker/deploy/funnel-tracker-web.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now funnel-tracker-web
```

The unit binds gunicorn to `127.0.0.1:5000` — assumes a Cloudflare Tunnel (or
nginx) sits in front. For direct LAN access, edit the `--bind` to `0.0.0.0:5000`.

Add the pipeline cron (as deploy) per the Schedule section above, pointing at
`/opt/funnel-tracker/.venv/bin/python /opt/funnel-tracker/run.py`.

### GitLab CI runner access

On the server, generate an SSH key the GitLab Runner will use to log in as
deploy:

```bash
sudo -u deploy ssh-keygen -t ed25519 -f /home/deploy/.ssh/gitlab_ci -N ""
cat /home/deploy/.ssh/gitlab_ci.pub >> /home/deploy/.ssh/authorized_keys
```

Capture the contents of `/home/deploy/.ssh/gitlab_ci` (the private key) and run
`ssh-keyscan <server-hostname>` from anywhere that can reach the server.

In the project on git.lmtribune.com → Settings → CI/CD → Variables, add:

| Variable | Type | Protected | Value |
|---|---|---|---|
| `DEPLOY_SSH_PRIVATE_KEY` | File | Yes | Paste the private key contents. |
| `DEPLOY_HOST_KEY` | Variable | Yes | Paste the `ssh-keyscan` output. |
| `DEPLOY_HOST` | Variable | Yes | Server hostname or IP. |
| `DEPLOY_USER` | Variable | Yes | `deploy` |
| `DEPLOY_PATH` | Variable | Yes | `/opt/funnel-tracker` |

### Update workflow

After bootstrap, every `git push origin main` triggers `.gitlab-ci.yml` →
rsyncs the working tree to `/opt/funnel-tracker/` (preserving `.env` and
credential JSONs via `.deploy-exclude`) → installs any new requirements →
restarts the systemd service. Watch the job at
`git.lmtribune.com/<group>/funnel-tracker/-/pipelines`.

### Two unknowns to verify before relying on this

1. Your self-hosted GitLab must have a Runner that can reach the in-house
   server over SSH. If you don't see Runners under Settings → CI/CD → Runners,
   ask your GitLab admin to register one (a small Docker runner on the same
   server works fine).
2. The in-house server must have outbound network access to mailchimp.com,
   stripe.com, graph.facebook.com, and googleapis.com for the pipeline to run.
