PHOENIX research log

An open, honest record of our Sicily wildfire-detection science — methods, results, negatives, and limits.

Bottom line up front. This is PHOENIX's open research log: the methods we use to detect Sicilian wildfires from satellites and ground sensors, and what we've learned building and testing them. It is deliberately honest — negative results, corrections, and preliminary findings are recorded as first-class entries, because a detection system you can trust is one whose limits are stated plainly. Each entry gives a plain-language summary, then the technical methodology and its caveats. PHOENIX is a small grassroots effort; nothing here is a guarantee of detection, and operational claims are gated behind validation.

subpixel_v1 FRP formula — Wooster audit (non-canonical T⁸ formula found and corrected)

Date: 2026-06-05   Status: correction shipped (defensible after patch)

PHOENIX's `subpixel_v1` Fire Radiative Power (FRP) estimator was audited and the live formula `_wooster_frp_mw` was confirmed WRONG: it used a non-canonical `FRP = σ · A_pix · (T⁸_fire − T⁸_bg)` form with σ=1.89e-19, instead of the canonical Wooster Stefan–Boltzmann pixel-integrated form `FRP = A_fire · σ_SB · (T⁴_fire − T⁴_bg)` with σ_SB = 5.6704e-8. An earlier 2026-05-25 fix had only patched the constant (from 1.89e-7), never the wrong T⁸ exponent.

Synthetic test (1000 K fire / 0.01 km² in a 3 km² SEVIRI pixel): canonical Wooster gives 563 MW; the old PHOENIX function gave 28.6 MW (~20× under at fire-temperature inputs), while in the 295–315 K band PHOENIX actually operated in it over-predicted canonical by ~7×. The retraction scope was narrow: live `/api/detections` was returning zero `subpixel_v1_alpha` rows at the time, and served FRP came from `wind_diff` and `fci_l1c` paths that do not use the audited formula.

The transparency page had publicly asserted "FRP matches Wooster-2005 to within numerical precision," which was false even after the May patch. On 2026-06-05 the formula was corrected to the canonical Stefan–Boltzmann T⁴ form (σ_SB = 5.6704e-8), shipped with the transparency-page amendment and a `/change-log` entry per the no-quiet-corrections policy.

entry 0001

dozier_v1 — bi-spectral sub-pixel fire solver shipped

Date: 2026-06-05   Status: defensible (live, parallel-emit shadow)

PHOENIX shipped `dozier_v1`, a bi-spectral (Dozier) sub-pixel fire solver that inverts paired MIR/TIR brightness temperatures to recover the sub-pixel fire temperature, fire fractional area, and FRP. The implementation (`src/detectors/dozier_v1.py`, ~415 lines) uses `fsolve` + `brentq` numerical root-finding with environment-keyed band selection.

Validation: the unit-test suite passes 5/5, and on the canonical synthetic case (1000 K fire / 0.01 km² in a SEVIRI pixel) it returns 562.7 MW — consistent with the canonical Wooster Stefan–Boltzmann benchmark (563 MW), confirming the inversion is physically grounded. It runs LIVE in parallel-emit mode alongside `subpixel_v1`. A 72-hour `dozier_v1` vs `subpixel_v1` shadow comparison was scheduled. The detector recovers `(T_fire, A_fire, FRP)` whenever the upstream LST gate next admits a candidate, so its yield is bounded by the same SEVIRI/FCI sensitivity floor as the rest of the thermal stack.

entry 0002

PHOENIX satellite honest baseline (recall / precision / lead-time)

Date: 2026-06-13   Status: defensible (accepted baseline; "build up from here")

PHOENIX established an honest, accepted satellite mission-eval baseline after fixing the evaluation harness. The old eval scored real hits as misses because it demanded a detection match a fire's exact minute, but confirmed truth comes from Sentinel-2 burn scars whose timestamp is location-true / time-approximate. Matching on LOCATION (4 km, time-loose) instead flipped gold recall from ~0 to ~99.8%.

Baseline scores (event-level, confidence sweep):

| Detector | gold recall | precision | median lead vs FIRMS | note | |---|---|---|---|---| | subpixel_v1_alpha_retro | 99.8% | 38% | negative (later) | finds all fire locations; too many false alarms; no time edge in this measure | | s2_primary_v1_alpha | 16% | 73% | later | precise but low recall |

Honest read: PHOENIX reliably LOCATES fires, but (a) the fast detector's false-alarm rate is too high and (b) it was not yet measurably faster than FIRMS on the naive metric. The earlier "+624 min before FIRMS / 1687 sole catches" figure was shown to be an over-firing/subset artifact and was corrected. Mark accepted this as the baseline ("that is our score… we build on it and build up"), with the build-up plan being: cut false alarms without losing near-total recall, then win lead-time. Harness: `sat_mission_eval_v3.py` / `sat_mission_eval_v3b_precision.py`.

entry 0003

False-alarm reduction arc — terrain-patch CNN reaches FA 0.017 on held-out core

Date: 2026-06-13   Status: preliminary (soak-ready candidate; NOT promoted)

PHOENIX ran a 12-iteration false-alarm (FA) reduction arc on the retro subpixel detector, taking per-event FA from a baseline of 0.688 down to 0.017 on a held-out dense-coverage core. Key milestones, all on honest time-ordered holdouts:

Comparator delta at the dense core: FA 0.0171 vs VIIRS 0.085 (~5× lower), MODIS 0.064, level with SLSTR 0.019, approaching FCI-L2 ~0.00. Adding static land-cover rasters plateaued at 0.017 (iter 12). The honest limitation: it does not extend to the far-north band (a gold-data shortage, not a model failure). Best candidate = iter-11 terrain-patch CNN, soak-ready south of 38N. NOTHING promoted; promotion is Mark's gate.

entry 0004

FCI ensemble-inclusion recall win SURVIVES false-alarm suppression

Date: 2026-06-15   Status: defensible (eval only; NOT promoted)

Including MTG-FCI in the thermal ensemble was shown to deliver defensibly MORE real fires at matched-or-better precision, even after applying a false-alarm suppression gate to both ensembles. Pre-gate, WITH-FCI beat WITHOUT-FCI on recall by +0.251 (raw pixel precision dropped as FCI floods false positives, as expected).

After applying the FA gate (PHOENIX land/terrain mask + persistence + contextual-confidence threshold — the pod-available equivalent of the iter-11 terrain-CNN) to BOTH ensembles: post-gate recall WITH-FCI still beats WITHOUT by +0.20 (CIs non-overlapping at every gate point). At MATCHED precision the FCI recall gain is +0.10 to +0.32 across the operating curve — never negative, never zero. The gate lifts WITH-FCI pixel precision from 0.016 to ~0.049–0.070 (≈3×) while keeping the small fires: 77 of the 96 sole-FCI real-fire recoveries survive the gate at the loosest operating point (n_truth = 383; FIRMS-VIIRS 4 km clusters + dNBR≥0.27).

Verdict: FCI-in = defensibly more real fires at matched-or-better precision; this closes the one open caveat from the FCI investigation. Caveat: absolute precision here is pixel-level (~0.05–0.09) because the full terrain-CNN was not staged on the pod — but the FCI delta cancels that limitation since WITH vs WITHOUT use the SAME gate. EVAL ONLY, NOT promoted. Artifacts: `ensemble_fci_gated_v1/`.

entry 0005

Localization "7 km NNE bias" REFUTED — residual is isotropic, near the sensor-physics floor

Date: 2026-06-11   Status: refuted (prior claim) / defensible (corrected finding)

A 2026-06-09 public claim of a "systematic ~7 km NNE bias vs FIRMS" (n=9) and its 2026-06-10 reframing as a "bimodal residual" (also n=9) were BOTH REFUTED by a large-sample audit at n=752 matched pairs. The Sicily SEVIRI residual is:

Conclusion: PHOENIX subpixel_v1 detections are in the CORRECT SEVIRI pixels, not the wrong ones. The residual is dominated by the instrument's own spatial resolution at Sicily's viewing geometry — a physics floor no detection algorithm can shrink below ~3 km. There is no systematic direction to push toward; any static Δlat/Δlon calibration would hurt as often as it helps, so the calibration drop-in was rolled back. Honest public framing: "PHOENIX SEVIRI/FCI detections operate near the instrument-physics floor (~6 km median residual vs FIRMS); FIRMS-confirmed positions are <1 km." A public retraction shipped to `/transparency` in English and Italian, with both prior entries kept visible for provenance.

entry 0006

Terrain-parallax geolocation correction — REAL but small, data-capped

Date: 2026-06-14 &nbsp; Status: defensible (EFFIS-validated); NOT promoted

PHOENIX tested DEM-based terrain parallax (`Δ = h·tan(view_zenith)`, shifting the reported position TOWARD Meteosat-0E) against EFFIS 2026 Sicily burn-scar centroids (Copernicus/JRC, 162 perimeters). Terrain parallax is a REAL, directionally-correct, monotone-with-elevation term in PHOENIX's geolocation residual — the earlier "parallax near-zero" reading was confirmed to be a low-terrain artifact.

Numbers (geostationary Meteosat-0E geometry, view-zenith ~45.4–46.9°, azimuth-to-sat ~203.7° SSW over Sicily):

Reconciliation with self-cal: the self-cal translation points NE (+1.78, +2.84 km, mag 3.36) while the parallax shift points SW (mag 0.34) — OPPOSITE directions, so they are additive, separate corrections (not aliased). Verdict: parallax is a free, deterministic correction worth shipping gated to PHOENIX-internal detectors only (never FIRMS/VIIRS/MODIS — double-correction), because it scales with `h·tan(46°)` and becomes a 1–3 km lever the moment an Etna/Madonie/Nebrodi fire is detected. It is NOT the big geoloc win for the current low-terrain residual (that remains FRP-weighted within-pixel de-quantization, ~2–3 km). NOT promoted.

entry 0007

Pre-ignition risk model does NOT verifiably beat FWI — REFUTED headline

Date: 2026-06-14 &nbsp; Status: REFUTED (the "beats FWI" claim); the concentration claim survives

An independent adversarial rebuild REFUTED the headline that PHOENIX's pre-ignition risk model "beats the FWI standard." The original 0.83 figure came from comparing the model to a raw-FWI-threshold strawman (0.757). Against a FAIR strong baseline — FWI's own components run through the SAME HistGradientBoosting model — FWI scored AUC 0.827 vs the full model's 0.815 (the model is WORSE). Paired bootstrap: P(full beats FWI-only) = 8.5%; ΔAUC 95% CI [−0.031, +0.005] (includes/below 0). FWI-only also won on AUPRC and on lift@100.

The study was underpowered (~22–23 days / ~126–266 test positives; model-vs-FWI delta not significant) and the headline figures (0.83 / 4.9× / 33.6%) were each cherry-picked from DIFFERENT configs and splits. No hard data leakage was found, but the model's skill is largely persistent spatial fire-proneness that FWI already encodes — explaining why FWI alone matches it.

THE ONLY DEFENSIBLE CLAIM (survives all attacks): "A daily fire-weather risk map over Sicily concentrates next-2-day ignitions: the top ~10% of cells (~5,000 km²) captures ~30% (honest holdout 31.5%; AUC ≈ 0.79–0.84) — comparable to, NOT better than, FWI run through the same model." Rule going forward: NEVER tell academics or publish "pre-ignition beats FWI." Pre-ignition remains a pinnacle PHOENIX goal; the correction is about honesty of the current claim, and the largest untapped lever is the anthropogenic (human-ignition) layer plus far more data, benchmarked fairly.

entry 0008

Pre-ignition pillar — real targeting signal, bounded by the anthropogenic ceiling

Date: 2026-06-14 &nbsp; Status: preliminary (proof-of-signal, data-limited)

PHOENIX built a first-ever pre-ignition risk forecaster over Sicily on a per-cell-per-day ~3 km grid (57×100), labeling ignition within t+1..t+2 days from FIRMS VIIRS/MODIS + manual truth, on an honest time-ordered holdout. Inputs are per-cell rasters: JRC FWI/FFMC/ISI/DMC/DC/BUI, LSA-SAF FAPAR, ERA5 soil water, and MTG-LI lightning. The gradient-boosted model reached AUC ~0.775 with precision@100 = 0.060 (5.4× lift over the 1.1% base rate), and in v2 the top-10% of cells (~5,000 km²) captured 33.6% of next-2-day ignitions (top-20% 60%, top-30% 82%).

The decisive finding from error analysis: ~2/3 of Sicily's June ignitions are human-caused with no remotely-sensed precursor — 0/40 missed ignitions had same-day lightning and only 6/40 had any Hawkes signal. This is a structural ceiling, independently corroborated by a 35-reference research synthesis concluding Sicily ignition is overwhelmingly anthropogenic (>70% arson + stubble/pasture burning). The weather+fuel+lightning+vegetation model captures the flammability envelope but is structurally BLIND to WHERE a human lights a fire.

Honest caveats: data-limited (~22 days, one season onset; FWI grids only span 2026-05-23→06-14); `fuel_moisture_grid` was degenerate (constant 17.5) and dropped; the literature's 0.90+ AUCs are balanced-sampling/random-split inflated and not apples-to-apples with this time-ordered holdout. Highest-value next lever (named by every research angle): the anthropogenic ignition layer (population/GHSL, roads, WUI/wildland-ag interface, night-lights), used to GATE the daily flammability score. MESOGEOS (Mediterranean datacube, 1 km daily 2006–22) is the top dataset pick for multi-year training.

entry 0009

PHOENIX detection-stack ship inventory (9 components)

Date: 2026-06-05 &nbsp; Status: defensible (shipped; mostly shadow-mode pending signal)

In a single ship day PHOENIX deployed 9 detection-stack components to the DGX pipeline:

1. subpixel_v1 FRP correction — canonical Stefan–Boltzmann T⁴ (σ_SB = 5.6704e-8) — LIVE, with transparency-page + change-log entries. 2. dozier_v1 bi-spectral sub-pixel solver (fsolve + brentq) — LIVE parallel-emit; tests 5/5 PASS; 1000 K/0.01 km² → 562.7 MW. 3. LST resampler (LSA-SAF → SEVIRI/FCI grid, ~220× speedup) — LIVE; 853/904 SEVIRI + 534/540 FCI frames augmented. 4. Multi-sensor voter (≥2-source coincidence, ±30 min / ±5 km) — LIVE; 18 voted events in a 7-day backfill (1.24% rate). 5. FIRMS-anchored prior (relax gate at FIRMS lat/lon ±1 px for 90 min) — SHADOW; 2 TP / 0 FP in a 14-day backfill. 6. Persistence gate (multi-frame, per-env tolerance) — SHADOW; HOLD verdict (+4.11 pp precision but 33% recall loss, below ship bars). 7. Sat thermal validator (conformal singleton, α=0.01) — LIVE; 65.7% validation coverage at 100% precision. 8. Triage UI at adr-wildfire.com/triage (Basic Auth) — LIVE; daily flywheel cron feeds the hard-negative dataset. 9. Prod tree → private GitHub backup — LIVE; auto-push on every commit.

Three gate sweeps ran the same day (SEVIRI LST-augmented, FCI LST-augmented, and a historical 2026-04-23..06-05 retrospective) — all returned NO-WINNER verdicts, confirming the binding constraint is the SEVIRI 3 km / FCI 2 km IFOV physical sensitivity floor (best FCI precision 4.149e-03, ~200× below the 0.80 ship bar), not gate tuning. The morning hypothesis that detections were in the wrong SEVIRI pixels was falsified that evening (projection traced correct to within 1.6 km). Forward paths carried (not shipped): MTG-FCI 2 km, dozier_v1 inversion, candidate-level multi-frame persistence, VIIRS/MODIS-anchored prior expansion.

entry 0010

Persistent-thermal-sources Sicily false-positive catalog (open data, Zenodo DOI)

Date: 2026-05-24 (catalog v1.0.0 + DOI) &nbsp; Status: defensible (published, citable open data)

PHOENIX publishes an open-data catalog of persistent thermal anomalies in Sicily — volcanoes, refineries, glasshouses, solar farms, and quarries — that repeatedly cause false-positive wildfire detections. It is released under CC-BY-4.0 (data) + MIT (scripts) at `github.com/markl02us/persistent-thermal-sources-sicily` and is permanently citable via DOI 10.5281/zenodo.20369891.

How the catalog is built (6 steps): (1) mine the last 30 days of PHOENIX `internal_fires` + `external_fires`, flagging cells with ≥6 hits / ≥3 distinct days / no Sentinel-2-verified burn scar; (2) download a 250 m Esri World Imagery tile per candidate; (3) classify each tile with Claude Sonnet 4.5 into categories (volcanic vent, industrial, glasshouse, solar farm, quarry, urban, ag-burn, fire scar, other) with confidence; auto-promote at confidence ≥0.85 in the auto-annotate categories; (4) enrich with OpenStreetMap Overpass tags + Wikidata; (5) emit a per-source JSON card; (6) route confidence <0.85 candidates to daily human review.

Known anchor sources include Mt. Etna summit craters (15 km radius mask, FP-confidence 1.0), Stromboli, Vulcano (La Fossa), the Augusta-Priolo-Melilli petrochemical complex, and the Gela and Milazzo refineries. A full end-of-day re-review classified the catalog as 19 mask (2 glasshouse + 17 water) / 19 real-fire / 64 ag-burn / 12 unsure on origin, with 14 burn-scar sources wired in. The catalog feeds PHOENIX's `land_mask` FP suppression and is maintained by autonomous scheduled jobs (MODIS daily, FCI 6h, OLCI proxy daily, borderline-recheck daily, weekly SemVer bump).

entry 0011

Alessandria della Rocca ground sensor v1.0 — scoping

Date: 2026-06-05 &nbsp; Status: scoping (build NOT authorized)

PHOENIX scoped a v1.0 ground-based wildfire detection sensor for Alessandria della Rocca (Agrigento, Sicily), the project's anchor pilot site, as the ground pillar that complements the satellite track with on-site optical confirmation. This is the PHOENIX "Track 2" model — the only model artifact pushed to the Sicily collaborator (Gaetano) — distinct from the satellite detector models that run online on PHOENIX infrastructure.

Reference bill of materials: Raspberry Pi 5 + AI HAT (Hailo-8/8L NPU) running a YOLO smoke/flame detector on one RTSP IP-camera stream, deployed as a Hailo Executable Format (`.hef`) compiled from a PyTorch checkpoint; an AREDN mesh node (ham-licensed-operator only, treated as an optional path); a LoRa radio backup (region-locked EU868 for Sicily) for telemetry when AREDN/internet is down; and field hardening (IP66 enclosure, PoE or solar power, camera mount). The ground-sensor vision model must be trained clean-slate on Sicilian footage — it is a PHOENIX-specific model, not a shared module.

Onboarding follows a Wunderground analogue: power on → configure via a captive-portal web UI → device claims itself and reports into the PHOENIX backend over mTLS. Scoping notes: ~80% of the work is non-software (hardware BOM/supply, field hardening, regulatory split); AREDN's ham-license requirement means the unlicensed path must stay clean by keeping AREDN optional; the LoRa frequency plan is region-locked. Mark has NOT authorized the physical build — this remains a feasibility scope awaiting an explicit go.

entry 0012

Seismic-puck addendum to the Alessandria ground sensor

Date: 2026-06-07 &nbsp; Status: scoping (build NOT authorized)

PHOENIX scoped a ground-coupled seismic ("P-wave") puck as an add-on to the Alessandria della Rocca ground wildfire sensor. The rationale: earthquakes are a leading indicator for fires (broken gas lines, downed power, post-quake ignitions), and Sicily/Etna seismic and wildfire risk are co-located, so one node legitimately serves both INGV and PHOENIX. The puck sits 3–5 m from the camera pole on a poured concrete pad (pole-mounting is useless due to wind sway). Power impact is modest: +0.2 to +1.7 W on a ~10–12 W baseline.

Sensor tiers were costed: Tier A (ADXL355 MEMS, +$60 / +0.2 W); Tier B (SM-24 geophone + ADS1256 ADC, +$150 / +0.3 W); Tier C (Raspberry Shake RS1D, +$415 / +0.5 W, INGV-queryable out of the box); Tier D (RS3D 3-component, +$740 / +1.7 W). On 2026-06-08 Mark chose the DIY Raspberry-Shake-equivalent path (so PHOENIX owns the sensor identity), co-located on the camera Pi 5, with calibration by cross-comparison against a known INGV station on Etna; estimated BOM ~$260 (1D) or ~$430 (3D).

Ingestion path: SeedLink over TCP (MiniSEED Steim2), channels EHZ/HHZ/ENZ, via either the AM-network (RS1D) or a new EIDA network code (INGV-Catania as natural host). On-device STA/LTA triggering (ObsPy `recursive_sta_lta`) runs at <5% of one core. A mandatory cross-validation gate requires either a neighbouring AREDN node trigger within ±30 s or an INGV FDSN event within ±60 s / ≤100 km before raising PHOENIX detection sensitivity in the affected cells (`post_seismic_window` = 24 h, 72 h for M≥4.5); a single node never autonomously raises sensitivity. Two questions remain open before procurement: 1D vs 3D, and whether single-node "unverified" raises are allowed at v1.0.

entry 0013

PHOENIX Citizen Android app — delivery-to-citizens pillar

Date: 2026 (active development) &nbsp; Status: defensible (public repo; offline-first reporting app)

The PHOENIX Citizen Android app is the project's citizen-reporting pillar, letting Sicilian (and eventually other) residents instantly report fires from their phone. Reports are validated on the PHOENIX backend against satellite detections (FIRMS / VIIRS / EUMETSAT / SLSTR) and farmer-network corroboration. The app is native Kotlin + Jetpack Compose (Material 3), min SDK 26 (Android 8.0, ~95% of Italian Android users), target SDK 34, MIT-licensed, package `com.phoenix.citizen`, source at `github.com/markl02us/phoenix-citizen-android`.

Architecture is offline-first: every report is written to a local Room database first, then submitted, with a WorkManager periodic SyncWorker (15-minute) draining the queue and refreshing corroboration whenever the device has network. The app calls four backend endpoints on adr-wildfire.com: `/api/citizen_report` (POST), `/api/citizen_report_status` (GET), `/api/detections` (GET, for the map), and `/api/source_health` (GET). A report carries a stable per-install device hash, lat/lon, UTC timestamp, observation type (flame / smoke / unsure), optional observed wind direction, photo, and note. Every POST includes an `X-Integrity-Token` (Play Integrity API) for device attestation.

This app is goal-level, not peripheral: PHOENIX's foundational objective is delivering wildfire information TO citizens better/faster/more accurately than anyone else, so the citizen app, public site, and alerting are core performance surface. The app also makes citizen submissions a high-trust ground-truth source that can confirm satellite detections, especially for the weak-signal small/grass fires that sit below the satellite burn-scar floor.

entry 0014

PHOENIX foundational goal (canonical statement)

Date: 2026-06-13 &nbsp; Status: defensible (canonical mission statement)

PHOENIX's foundational goal, in Mark's own words: "Our goal is to be better, faster, and more accurate than everyone else at delivering this information to the citizens. Through specialized processing we will combine, develop, and innovate ways to deliver wildfire smoke, flame, pre-ignition, or other signals before others, with greater precision on location, and with a higher confidence and assurance that it is an actual fire."

This corrects the too-narrow framing of PHOENIX as "sub-pixel thermal extraction out of SEVIRI/FCI" — sub-pixel is ONE mechanism, not the goal. Decomposed:

Posture: a long-horizon goal with abundant resources to build on. The external voice stays humble and partnership-framed (credit FIRMS/EUMETSAT; "the teammate to have"), while the internal bar is relentlessly "have the best." The quality bar is that the larger fire-detection community never has to ignore a PHOENIX delivery because it is lower quality than what someone else already has — which is why honest, de-inflated win-reporting matters.

entry 0015

Sole-reporter confirmation rate — full-scale defensible result

Date: 2026-06-14 &nbsp; Status: defensible (full-scale, with caveats)

PHOENIX measured how often its "sole-reporter" detections — fires PHOENIX saw that no independent VIIRS/MODIS/SLSTR pass corroborated within 7 km / 12 h — were later confirmed real. Over PHOENIX's full ~7-week record (2026-04-26 → 06-14; the project has no detection history before 2026-04-26), of 6,690 PHOENIX-led detections, 5,563 (83.2%) were truly sole-reporter (the earlier "100% sole" claim was overstated; ~17% had a polar pass nearby).

After maximizing post-hoc verification coverage (PHOENIX's own Sentinel-2 dNBR engine via Microsoft Planetary Computer, plus local VVF/ANSA/SAR), the unverifiable fraction dropped from ~46% to 13.2%. Results (bootstrap 2000×, 95% CI):

The true rate lies between P1 and P2; the gap is the NO_BURN_S2 bucket (dNBR median 0.049, just below the 0.10 scar floor) — weak/partial scars consistent with small/grass fires, concentrated in `wind_diff`, not clean false positives. Per-detector P1: fci_l1c 99.8%, subpixel 92.9%, wind_diff 94.9%. Discipline: the population is large (tight CIs) but the time span is fixed at 7 weeks, and no per-AOI claim is made for thin/cloud-blocked areas (e.g. ADR n=131 reads P1 100% but only 59 adjudicable). The irreducible band (78.9–96.5%) is where Gaetano's ground sensor + citizen reports become the missing field truth.

entry 0016

Delivery-clock lead time — PHOENIX SEVIRI detectors deliver ahead of FIRMS VIIRS

Date: 2026-06-14 &nbsp; Status: defensible (delivery clock), with detector-specific caveats

PHOENIX established a defensible "earlier than FIRMS" claim on the DELIVERY clock — the time a detection actually reaches the citizen (measured from real `ingested_at` minus sensing timestamp), among co-detected cell-day fire events. This superseded an earlier −601-minute figure that was shown to be a measurement artifact (the retro ran daytime-only SEVIRI and paired across calendar days, structurally unable to beat FIRMS NOAA-20's ~00–02h UTC night overpass).

Measured comparator delivery latencies (median): FIRMS VIIRS ~513–560 min (~9 h), FIRMS MODIS ~387 min, SLSTR ~461 min, FCI-L2 ~23.5 min, ground vigili_fuoco ~0 min. PHOENIX delivery latencies (median): wind_diff 0.2 min, subpixel_v1_alpha 0.6 min (SEVIRI geostationary ≈ near-instant), fci_l1c ~240 min, s2_swir ~1225 min (~20 h).

Defensible lead results: PHOENIX subpixel_v1 vs VIIRS = +424 min / 80% PHOENIX-first (n=10, confirms a prior +414/80%); fci_l1c vs VIIRS = +129 min / 72% (n=43, most statistically robust). Honest caveats kept: `wind_diff` vs VIIRS = −58 min / 45% — NOT defensibly earlier, do not claim. And `s2_swir` (~20 h delivery) is SLOWER than FIRMS VIIRS (~9 h), so per-fire s2_swir lead claims are false on the delivery clock. The defensible public "we're faster than FIRMS-VIIRS" story is SEVIRI subpixel + fci_l1c only. An honest-wins reporter vets every public win claim with 5 gates (dedup to one fire event, require multi-family corroboration, both sensing+delivery clocks, same-sensor → sensitivity-not-speed, wind_diff guardrail), dropping ~1,740 overclaims from a recent 3-week window down to 31 real delivery-lead speed wins.

entry 0017

SEVIRI physical sensitivity floor — the binding constraint on thermal detection

Date: 2026-05-22 (confirmed across later gate sweeps) &nbsp; Status: defensible (physics floor)

PHOENIX confirmed that the binding constraint on its geostationary thermal detection is the instrument's physical sensitivity floor, not algorithm or threshold tuning. Backtesting against VIIRS-confirmed fires showed the SEVIRI mid-infrared (MIR) brightness temperature 99th percentile peaks around 305 K and never reaches the ~320 K absolute fire-detection floor — so SEVIRI alone cannot beat FIRMS-VIIRS on small fires; reaching them requires MTG-FCI (2 km) and Sentinel-2 SWIR (20 m).

This floor is reaffirmed by repeated no-winner gate sweeps: on a 2026-06-05 sweep, the hottest MIR pixel in a representative frame was 312.98 K (below the 320 K floor), and 12–15 threshold configurations across LST-augmented SEVIRI and FCI corpora produced no shippable winner (best FCI precision 4.149e-03, ~200× below the 0.80 ship bar). The detector-level conclusion (later confirmed by the n=752 localization audit) is that the detections sit in the correct SEVIRI pixels; those pixels are simply, correctly, not hot enough at SEVIRI's 3 km / FCI's 2 km IFOV versus a VIIRS 375 m sub-pixel fire.

Implication: the lever is not more threshold sweeping (a documented dead end) but better resolution (FCI 2 km, S2 SWIR 20 m), sub-pixel inversion (dozier_v1), multi-sensor fusion, and richer per-candidate features (contextual background tests, terrain morphology) — which is exactly where the false-alarm-reduction and FCI-inclusion work delivered gains.

entry 0018

FIRMS-anchored prior was born-expired for every VIIRS detection — found via a citizen report, fixed

Date: 2026-06-17 &nbsp; Status: correction shipped

BLUF. A citizen report of a fire near the Raffadali wind farm (Agrigento) led us to confirm a real, multi-satellite-detected wildfire that PHOENIX was not surfacing — and, more importantly, to find and fix a systemic bug that had silently disconnected every VIIRS satellite fire-detection (0 of ~1,251 in two weeks) from PHOENIX's own geostationary detectors. The fix is deployed; the critical service stayed healthy throughout.

The fire (ground truth). At 37.433/13.486 on 2026-06-17, three VIIRS passes across two satellites detected fire with rising radiative power (SNPP 11:45 UTC 3.8 MW; NOAA-21 12:51 UTC 5.5 and 11.7 MW, high-confidence, daytime), plus an MTG geostationary active-fire detection at 18:20 UTC. The location shows fire on exactly one day in the archive (not a recurring industrial source). It is a real wildfire below the geostationary mid-infrared detection floor — which is why PHOENIX's own SEVIRI/FCI detectors never independently fired on it.

The bug. PHOENIX uses a "FIRMS-anchored prior": when a polar-orbiting satellite (VIIRS/MODIS) reports fire, the prior temporarily relaxes the detection thresholds of our geostationary detectors at that location so they can confirm a fire the polar sensor already saw. The prior's active window was anchored to the satellite sensing time with a 90-minute lifetime. But VIIRS/MODIS near-real-time data arrives ~154 minutes after the overpass — so by the time the detection is ingested and the emitter runs, the 90-minute window (e.g. 12:51→14:21 UTC) is already in the past. A guard then skipped the prior as "already expired." Result: the prior was born expired for every VIIRS detection. Only low-latency geostationary sources (MTG, SLSTR) ever anchored. Quantified: 0 of 1,251 distinct VIIRS detection-locations over two weeks were anchored; 244 MTG detections were.

The fix. For operationally-recent detections, the active window is now anchored forward from ingestion (+150 min, with a 6-hour staleness cap) so the next SEVIRI/FCI frames fall inside it. Unit-tested (a 130-minute-old VIIRS detection now emits two forward-live priors where the old code emitted zero); the live emitter ran clean; the PHOENIX service stayed active. Sensing-anchored low-latency sources are unaffected.

Why it matters. This re-arms PHOENIX's entire sub-pixel thermal detection path against the highest-precision near-real-time fire source. It does not, by itself, make a sub-floor fire self-confirm — so a complementary surfacing safety-net was added [0020]. The discovery underlines the value of ground reports: a single local tip surfaced a class of misses affecting hundreds of detections.

entry 0019

Multi-sensor surfacing safety-net — so a real fire below the IR floor is never invisible

Date: 2026-06-17 &nbsp; Status: preliminary (shadow; promotion gated)

BLUF. The Raffadali fire [0019] was real and seen by multiple satellites, yet PHOENIX showed nothing because our own detectors couldn't confirm a fire below the geostationary infrared floor, and a single polar-satellite report is treated as a confidence cue, not a confirmation. We added a surfacing safety-net that flags strong, repeated, multi-sensor satellite fire-clusters our own detectors miss, so such a fire is never hidden. It runs in shadow (computed but not yet public) pending a precision gate.

The rule. A cluster is flagged (would-surface, tier = CAUTION) when it has ≥2 distinct sensors OR ≥2 passes, peak fire radiative power ≥2 kW, the weather plausibility arm is clear, and no co-located PHOENIX own-detection confirmed it — i.e. exactly the case where a real fire would otherwise stay invisible.

False-positive control (learned in evaluation). A first run flagged 6 clusters/day, of which evaluation showed ~3 were persistent industrial thermal sources (the Priolo/Augusta petrochemical complex at 21.6 MW recurring; a Catania-area source active 30+ days). We added a persistent-source filter: any location with satellite fire-hits on more than 5 distinct days in the trailing 60 is treated as an industrial flare, not a wildfire, and excluded. After the filter: 4 clusters/day, all genuine one-time fires (recurrence 1–5 days); the large industrial flares are correctly removed.

Evaluation method. Each flagged cluster is labeled against independent ground truth: sustained multi-day detection, later own-detector confirmation, a post-fire Sentinel-2 dNBR burn scar (≥0.10), or a confirmed truth event. A flag is a false positive only when a post-fire Sentinel-2 validation (computed *after* the fire — a pre-fire scene reads ≈0 and must not be mistaken for "no burn") shows no scar. Precision is tracked daily.

Status. Shadow only — no public-site or alerting change. Current precision 1 true-positive / 0 false-positives / 3 pending, with 7 industrial false-positives avoided by the filter. Promotion to a live public CAUTION tier is gated on ≥3 labeled clusters at ≥85% precision with zero unexplained false-positives and a sane (~≤10/day) volume. Until then it informs evaluation only.

entry 0020

Hyperspectral burn detection — PRISMA fails on faded ag scars, EMIT succeeds on real fires

Date: 2026-06-17 &nbsp; Status: defensible (eval only; not promoted)

BLUF. We tested whether hyperspectral satellites (ASI's PRISMA, NASA's EMIT — ~230–285 narrow bands, 30–60 m) can independently locate burn scars as a precision truth source for PHOENIX. On *faded agricultural* scars the answer is no — but that is a property of the target, not the sensor: on *real severe* burns, hyperspectral detection works and is confirmed by independent Sentinel-2 change-detection. Localization is at the current ~1.36 km scar-centroid floor at 60 m, plausibly below it at 30 m (not yet settled). Nothing was promoted.

PRISMA, faded ag scars — negative, three ways. A near-cloud-free PRISMA scene (2026-06-09) over 22 late-May 2026 burn scars in the Sicani failed to separate any of them from the surrounding farmland using (a) seven standard burn indices, (b) a full-spectral adaptive-cosine-estimator char detector trained on confirmed burns, and (c) a cross-sensor change index (pre-fire Sentinel-2 minus post-fire PRISMA). The cause is mechanistic: these were ephemeral agricultural/grass burns that recover or are tilled within ~2 weeks, in a summer-harvest landscape where bare fields already look char-like. A single-date absolute index cannot recover a scar that is physically gone; the original detection worked only because Sentinel-2 used pre-vs-post *change* detection at the time of burn.

EMIT, real July-2023 mega-fire burns — positive. Pivoting to the fair arena, an EMIT scene (2023-08-10) over the Palermo/Sicani hinterland after the catastrophic July-2023 Sicily fires shows ~800 coherent burn regions; the large scars separate at 1.3–2.0 standard-deviations with negative-NBR char cores. The top three were validated against independent Sentinel-2 differenced-NBR (pre-June vs post-August 2023): all three confirmed at dNBR +0.32 / +0.33 / +0.43 (moderate-to-high severity). Hyperspectral reliably detects real persistent burns.

Localization. EMIT (60 m) burn centroids sit a median 1.38 km from the Sentinel-2 burn centroids — at the ~1.36 km scar-centroid floor, but this is an upper bound (it compares different delineations of large irregular burns at coarse resolution). On the burns it could see, 30 m PRISMA localized to ~0.35 km. So below-floor hyperspectral localization is plausible at 30 m but not conclusively demonstrated; settling it needs 30 m imagery with matched like-for-like delineation.

Also validated: a PRISMA live-fuel/canopy-moisture layer (narrow water-absorption bands) is physically consistent (water > vegetation > bare, mutually agreeing, spatially coherent) — a differentiated pre-ignition fuel-state input, with predictive value pending a fire-season scene. EMIT's open fire-season Sicily archive (2023–2025) is a historical hyperspectral corpus we can mine for severity and fuel.

entry 0021

Lightning is a spatially-specific ignition precursor — quantified by case-crossover

Date: 2026-06-18 &nbsp; Status: preliminary (one window; needs cross-season replication)

BLUF. Motivated by dry lightning near the Raffadali fire, we asked whether lightning predicts fire ignition in our data. Using a case-crossover design with two controls, lightning is a genuine, spatially-specific precursor: a fire location is ~2.5× more likely than a 50 km-shifted location to have had a recent nearby strike, and ~18–20% of confirmed fires carry a spatially-specific lightning precursor beyond the ambient rate. This is a candidate pre-ignition signal, not yet operational.

Method. For 1,991–2,109 confirmed Sicily fires within the lightning-data window (2026-05-25 to 06-16) we measured, for each fire, whether any lightning struck within R km and in the preceding window. Two controls isolate the signal from confounds: a time control (same location, random times in the coverage window — removes static spatial confounding) and a space control (same time, location shifted ~50 km east — tests whether the lightning must be *at* the fire rather than just a stormy day everywhere). A real precursor requires the fire rate to exceed both controls.

Result. At a 48-hour window the spatial test is clean across a radius sweep: even requiring a strike within 2 km, fires show a 0.351 precursor rate versus 0.177 at the +50 km control (odds ratio 2.52); the excess (fire minus space-control) is a stable ~0.17–0.20 across all radii 2–20 km, while the random-time control is far lower (0.069 at 2 km). The 24-hour window is weaker (odds ratio ~1.3). Interpretation: the period was lightning-saturated (high ambient rate), but the *spatially-specific* excess is robust and not an artifact.

Caveats (first-class). One ~3-week window; the +50 km control could carry terrain/coast bias (mitigated by radius-stability); association is not causation (holdover and dry-lightning ignition are plausible); most Sicily fires are human/agricultural, so this is a real but minority subset. This refines the earlier "lightning is an additive timing signal" finding with a concrete effective scale (~2–5 km / 48 h) and attributable fraction.

Forward. Prototype a lightning-anchored ignition-risk prior — the pre-ignition analogue of the FIRMS-anchored detection prior [0019] — that pre-elevates detection sensitivity near recent strikes, most valuable fused with dry-fuel and fire-weather indices. Gated on cross-season replication before any operational use.

entry 0022

Which fires we miss — characterizing the sub-floor detection gap

Date: 2026-06-18 &nbsp; Status: defensible (eval only)

BLUF. PHOENIX's own geostationary detectors independently confirm about 85% of the wildfire detections reported by polar-orbiting satellites (NASA VIIRS) over Sicily; the ~15% we miss are systematically smaller and more nocturnal — exactly the class of fire that sits below the geostationary infrared detection floor (the Raffadali windmill fire was one). The gap has a clear, separable signature, which is good news: it means it can be targeted, and it is already partly closed by two changes made this week.

Method. Over a two-week window we took every distinct VIIRS fire detection (deduplicated by location and day) and asked whether PHOENIX's own detectors produced a co-located detection (within 3.5 km). We split the detections into "caught" and "missed" and compared their fire radiative power (a proxy for fire size/intensity) and day/night timing. Read-only on the production database; nothing changed.

Result. Of 1,215 distinct VIIRS detections, 1,033 were caught and 182 missed (15.0%). The missed fires are ~2.4× smaller in radiative power (median 2,570 W vs 6,170 W; the missed quartiles 1,462–5,950 W sit largely below the caught range) and more nocturnal (65% of misses at night vs 52% of catches). This is the expected physics: a small or cool fire radiates too little to cross the mid-infrared threshold of a 2–3 km geostationary pixel, so a 375 m polar sensor sees it first.

Caveats (first-class). The "caught" test is generous (it ignores timing within the window), so 15% is a rough upper bound on the miss rate; a detection counted as missed here may still be confirmed by other channels. One two-week window. A per-pass persistence metric was discarded because the polar feed re-inserts each detection many times, contaminating pass counts.

Why it matters / recovery path. The miss class is separable (size + timing), so it is addressable rather than random. Two changes already target it: (1) the FIRMS-anchored-prior fix [0019], which re-arms our own detectors with relaxed thresholds exactly where a polar sensor reports fire; and (2) the multi-sensor surfacing safety-net [0020], which surfaces a strong polar-detected fire even when our own detectors cannot confirm it. A simple size-and-time rule — or a trained classifier — can additionally prioritise the high-confidence small-fire detections for review. The goal remains detecting the large majority of fires, and this quantifies precisely which ones we currently don't.

entry 0023

Night misses are a size effect, not a blind spot — the detection floor is diurnally flat

Date: 2026-06-18 &nbsp; Status: defensible (eval only)

BLUF. Our previous entry [0023] found that the fires PHOENIX misses skew nocturnal, which invites a tempting conclusion — that our detectors are weaker at night. They are not. At matched fire size, day and night confirmation rates are the same. The real reason is that the polar sensor (VIIRS) detects many more *small* fires at night, and small fires are the below-floor class. So the lever to pull is fire-size sensitivity, not a nighttime-specific fix. This is the kind of confound worth catching before building the wrong thing.

Method. Using the same two-week VIIRS-vs-own-detector dataset, we split detections into day/night (from the polar product's day/night flag) and computed our own-detector confirmation rate within fire-radiative-power bins. If night were a genuine blind spot, confirmation would be lower at night *at matched power*. Read-only.

Result. Overall confirmation is 0.89 by day vs 0.83 at night — but that gap vanishes once you control for fire size:

| Power bin (W) | day confirm (n) | night confirm (n) | |---|---|---| | 1,500–3,000 | 0.76 (55) | 0.78 (59) | | 3,000–6,000 | 0.84 (127) | 0.78 (18) | | 6,000–12,000 | 0.86 (173) | 1.00 (12) | | 12,000+ | 0.98 (200) | 1.00 (58) |

Within each size band the day/night rates are comparable (differences sit inside the small-sample noise). The overall night deficit comes entirely from the *distribution*: the smallest bin (0–1,500 W) holds 120 night detections versus only 5 by day — VIIRS's stronger thermal contrast against a cool nighttime background surfaces far more small/smouldering fires at night, and those are precisely the ones near our geostationary floor.

Caveats (first-class). A few day cells have small samples (e.g. 5 detections in the smallest day bin), so individual cells are noisy; the robust finding is the matched-size comparability plus the night-skewed size distribution. One two-week window.

Implication. Don't build a night-specific detector boost — it would chase a confound. The genuine lever is small-fire sensitivity (the infrared floor itself): temporal accumulation across consecutive geostationary frames, the polar-anchored threshold relaxation [0019], and the multi-sensor surfacing safety-net [0020]. Those target the actual missed population — small fires — regardless of the hour.

entry 0024

Lightning comes with storms, not drought — why the "dry-lightning" prior isn't testable yet

Date: 2026-06-18 &nbsp; Status: preliminary (inconclusive; data-limited)

BLUF. We found earlier that lightning is a spatially-specific precursor to fire [0022], and the natural next step was a *conditional* prior: trust lightning most when fuel is dry (classic dry-lightning ignition). The data does not support that — and shows why. Lightning precursors cluster under humid, stormy conditions, not dry ones, because lightning requires thunderstorms, which bring moisture. The clean test is also blocked by weather-station representativeness. So we are not operationalising a dryness-gated lightning prior; the honest next step is better fuel-moisture data.

Method. For confirmed fires within the lightning-coverage window we compared fire-local relative humidity and recent precipitation for fires *with* versus *without* a nearby recent strike (within 5 km / 48 h), and computed the lightning-precursor rate in a dry stratum (RH < 45% and precip < 0.5 mm) versus the rest. Read-only.

Result. Fire-local humidity is essentially identical with vs without a lightning precursor (median RH ~66% both; median precip 0 mm both). Stratified, the lightning-precursor rate is 0.73 in the wet/humid stratum and 0.00 in the dry stratum (n = 27 dry). Two things follow: (1) lightning is associated with *humid* conditions, as expected physically — storms produce both the strikes and the moisture — so a simple "lightning + dry" gate would almost never fire; (2) only 27 of 624 fire-sites register as "dry," which is implausibly few for a Sicilian fire season and indicates the available weather stations (likely coastal-biased) do not represent inland fire-site fuel dryness.

Why it matters. This pumps the brakes, correctly, on two things. First, the dry-lightning ignition mechanism is not demonstrable here, and the [0022] lightning-precursor signal should be held cautiously — part of it may be seasonal/terrain co-occurrence (mountains get both more storms and more fires) rather than strikes causing fires. Second, the binding limitation is fuel-moisture data at the fire site, not the lightning analysis. The right path is to bring in site-representative dryness — the PRISMA/EMIT hyperspectral fuel-moisture layer [0021] or a reanalysis product (ERA5-Land) — and re-test the lightning signal conditioned on *measured* fuel state, ideally using conditions at the strike time rather than the fire time.

Status. Inconclusive by design — a negative on the simple hypothesis plus a clear data requirement. Nothing promoted. The lightning-anchored prior is parked behind the fuel-moisture acquisition track.

entry 0025

Anatomy of our false positives — the raw candidate stream, and why multi-sensor agreement is near-perfect

Date: 2026-06-18 &nbsp; Status: defensible (eval only)

BLUF. This entry looks at where PHOENIX's *raw* satellite fire-candidates go wrong, using Sentinel-2 as the burn arbiter. Important framing first: these are raw candidates — the input our voting, persistence, weather and validator gates filter — not our shipped detections. Of the raw candidates that get Sentinel-2-checked, about 69% come back as no-burn, and crucially those false positives are transient one-offs (cloud, dust, sun-glint, warm bare soil), not industrial flares — only ~3% sit at recurring thermal sites. The standout positive: candidates corroborated by an independent satellite (FIRMS) are 99% real, which makes multi-sensor agreement our single strongest precision lever.

Method. We used the Sentinel-2-adjudicated truth table (a detection is "real" if a post-fire differenced-NBR burn scar is found, "false" if the surface is unburned). We computed the real-vs-false split for the raw candidate stream overall and per reporting source, the severity breakdown of the false ones, and how often false vs real events sit at recurring (industrial-like) hotspots. Read-only.

Result. - Raw S2-checked candidates: 2,467 real vs 5,255 no-burn — i.e. the raw candidate stream is ~31% real before filtering. Again: this is the gate *input*, not the shipped output. - By source: independent-satellite (FIRMS) corroborated candidates are 99% real (80/81). The bulk internal-detector candidate stream is ~31% real on its own — which is exactly why it is gated, not shipped directly. - False positives are dominated by "unburned" surfaces (3,612) and "negative" (1,130) — transient warm/bright pixels, not persistent heat. Only 3% of false positives are at recurring hotspots (vs 7% of real fires), so industrial flares are a small part of the problem.

Why it matters. Two clear implications. (1) Multi-sensor agreement is the highest-value precision signal — a candidate seen by an independent satellite is almost always real. This is the principle behind the polar-anchored prior [0019] and the surfacing safety-net [0020], and it's now quantified. (2) The persistent-source filter we added [0020] only addresses ~3% of false positives (the industrial ones); the majority are transient atmospheric/surface confusers (cloud edges, dust, glint, hot bare soil) that the literature attacks with spectral dust/smoke discrimination. Building that needs per-pixel spectral data, which isn't in our event database — so it's a data-acquisition step, not just an algorithm.

Caveat (load-bearing). The 31% figure is the raw-candidate validation rate, not PHOENIX's public detection accuracy; the gating stack (voting, persistence, weather plausibility, satellite validator) exists precisely to convert this noisy candidate stream into high-precision shipped detections. Nothing here changes a shipped number.

entry 0026

What the literature says to try next — and the bar it has to clear here

Date: 2026-06-18 &nbsp; Status: scoping (methodology; not yet tested locally)

BLUF. A scan of recent academic and industry work surfaced three concrete, replicable methods worth testing for PHOENIX: (1) contextual + temporal convolutional networks on 10-minute geostationary imagery, which in published work detect fires within ~12 minutes; (2) U-Net "super-resolution" that downscales coarse geostationary thermal imagery toward polar-orbiter resolution to recover small fires; and (3) a deep-learning next-day fire-danger model (vegetation + weather + soil moisture) that outperformed the Fire Weather Index *in the Eastern Mediterranean* — our own region. We have queued all three. But each must clear a local bar before we would adopt it — and that bar is higher than "beats FWI," because we have already shown that simpler baselines (recurrence climatology) beat FWI here.

The methods and how they map to our work. - Contextual + temporal CNN on geostationary data. Published Himawari-8 work (10-minute cadence) reports a CNN detecting all test fires within ~12 minutes, with one case 9 minutes ahead of the official record. The reusable principle — *temporal information reduces detection latency, spatial/contextual information reduces false alarms* — matches our own finding that multi-sensor agreement is the strongest precision lever [0026]. Architectures: ConvLSTM and attention for joint space-time modelling. This feeds our multi-sensor temporal-fusion track. - U-Net super-resolution for small fires. Downscaling geostationary multispectral imagery to near-polar resolution to keep watching small fires between polar overpasses — the same idea as the AI thermal super-resolution used by commercial constellations. This feeds our small-fire/sub-floor recovery track, the population we showed we currently miss [0023]. - DL fire-danger beating FWI (Eastern Mediterranean). Kondylatos et al. (2022) report a deep-learning next-day danger model using vegetation, meteorology and soil moisture that outperformed the Fire Weather Index in our exact region. This feeds our pre-ignition track.

The honest local bar. We do not adopt a method because a paper reports it works elsewhere. Two specifics: our own analysis found that the wildfire-detection floor here is set by sensor physics (a 2–3 km infrared pixel), so a super-resolution or temporal method has to demonstrably recover *real, independently-confirmed* small fires, not just produce sharper pictures. And on the pre-ignition side, we have repeatedly found that a plain recurrence climatology beats the Fire Weather Index in Sicily — so a model only has to beat FWI to match what we already have; to be worth shipping it has to beat *recurrence*, CI-clean, across independent splits. Several of these methods also need data we must acquire first (soil-moisture reanalysis, multi-temporal stacks), which is itself the next concrete step.

Status. Scoping only — methods queued and linked into our research pipeline, each with a pre-registered local baseline to beat. Nothing tested or adopted yet.

entry 0027

Re-testing dry-lightning with real soil moisture — the answer holds, and the data turns out to be in hand

Date: 2026-06-18 &nbsp; Status: defensible (closes the dry-lightning question; pre-ignition fuel data now available)

BLUF. Our first attempt to test whether lightning ignites fires preferentially under dry conditions was limited by humidity stations that don't represent inland fire sites [0025]. We re-ran it properly, sampling actual modelled soil moisture (ERA5) at each fire location. The answer is the same and now robust: lightning precursors cluster under wetter soil, not drier — because lightning comes with storms, which wet the ground. The dry-lightning ignition mechanism is not supported here. The useful by-product: the soil-moisture data we needed was already being collected by PHOENIX, which unblocks the broader fuel-and-weather pre-ignition work.

Method. For every confirmed fire within the soil-moisture coverage window we sampled the top-layer volumetric soil water from the cached ERA5 grid at the fire's location and nearest time, and split fires by whether a lightning strike occurred nearby (within 5 km / 48 h) beforehand. If dry-lightning ignition were real, fires with a lightning precursor should sit on *drier* soil, and the precursor rate should be higher in a dry-soil stratum. Read-only.

Result. Across 2,375 fires with both soil and lightning coverage: fires with a lightning precursor sit on wetter soil (median soil water 0.178) than fires without one (0.156). Stratifying at the scene median, the lightning-precursor rate is 0.44 in the dry-soil half versus 0.65 in the wet-soil half — a dry/wet ratio of 0.68, i.e. lightning is *more* associated with wet ground, not less. This reproduces the earlier finding with site-representative data and removes the coastal-humidity caveat.

Caveats (first-class). The soil-moisture grid is coarse (~3 km) and reflects top-layer soil, a proxy for fuel dryness rather than fine-fuel or live-fuel moisture; the coverage window is partial (about two weeks). The conclusion — lightning co-occurs with wet conditions, so a simple "lightning + dry" ignition gate does not hold here — is consistent across both the humidity test and this soil-moisture test.

What it unblocks. The genuinely valuable outcome is that the soil-moisture and modelled fuel-moisture layers we assumed we'd have to acquire are already in the system. That removes the data block on the pre-ignition fuel-coupling work — testing whether a vegetation + weather + soil-moisture model improves fire-risk targeting. The bar there remains the hard one: it must beat our recurrence-climatology baseline, not merely the Fire Weather Index [0027]. The lightning-anchored prior stays parked; the fuel-coupling track is now open.

entry 0028

Multi-sensor agreement: high-precision recovery of fires we'd otherwise miss

Date: 2026-06-18 &nbsp; Status: defensible (eval only)

BLUF. Agreement between independent satellites is our strongest precision signal, and putting it to work is a clear gain. Requiring two independent platforms to agree nearly doubles precision (from 40% to 74%), and — the important part — it recovers about 12% of the small fires our own detectors miss entirely. Those are real fires that would otherwise be invisible to the system, now surfaced and trustworthy. It is not the *complete* recall solution: the other ~88% of missed fires are seen by a single satellite overpass, so there's no second platform to agree with, and recovering them needs a different mechanism (per-detection threshold relaxation plus temporal persistence). But every fire the multi-sensor rule catches is one we didn't have before — additive recovery at high precision.

Method. We grouped all satellite fire detections over two weeks into location-day clusters and counted how many distinct independent platforms (the two VIIRS satellites, MODIS Aqua/Terra, Sentinel-3 SLSTR, the geostationary MTG fire product) detected each. We then measured the Sentinel-2-adjudicated precision of single-platform versus multi-platform clusters, and asked, of the fires our own geostationary detectors missed, how many had multi-platform corroboration (and so could be surfaced safely by the multi-sensor safety-net). Read-only.

Result. - Single-platform clusters: precision 0.40 (711 labeled). Multi-platform clusters: precision 0.74 (183 labeled). Requiring a second independent platform nearly doubles precision — the quantified version of "agreement is the strongest filter." - Recall on the miss class is the catch. Of 238 VIIRS-detected fires our own detectors missed, only 29 (12%) were multi-platform — the rest were seen by just one satellite. So the multi-sensor safety-net can recover only about an eighth of the misses; the other seven-eighths are genuine single-overpass small fires.

Why it matters. It separates two jobs that are easy to conflate. Precision (don't cry wolf): multi-sensor agreement is excellent and we use it (the safety-net only surfaces clusters seen by two or more sensors). Recall (don't miss real fires): multi-sensor agreement can't help when only one satellite saw the fire, which is the common case for small fires. Recovering those needs mechanisms that work on a *single* detection — the polar-anchored threshold relaxation that re-arms our own detectors at a reported location [0019], and temporal persistence (the same satellite seeing the spot on consecutive passes).

Caveat. Two-week window; clustering at ~1 km may merge or split some events; "single-platform precision 0.40" is candidate-level, not shipped precision (the gating stack lifts it). The robust finding: multi-sensor agreement roughly doubles precision *and* additively recovers ~12% of otherwise-invisible missed fires at that high precision — with the remaining recall to be won by single-detection mechanisms.

entry 0029

A 128 MW "fire" with no scar: adding an industrial-flare filter to the safety-net

Date: 2026-06-18 &nbsp; Status: defensible (shadow; precision 75% → 100% on the labeled set; not promoted)

BLUF. While evaluating whether to promote our multi-sensor safety-net to the live map, it flagged a cluster on the far-west Sicilian coast reporting 128 MW of fire power but leaving no burn scar. That is not a wildfire — it's an industrial gas flare (the spot also registered 884 MW, 368 MW and 72 MW on other days the same week, thousands of detections at the same pixel). Real fires don't sustain tens-to-hundreds of megawatts at one location for a week, and they leave a scar. We added a physically-grounded filter — *a location with very high power (>50 MW) on multiple days is a persistent flare, not a fire* — which removes it while keeping every real fire, lifting the safety-net's precision from 75% to 100% on the labeled set.

Method. The persistent-source filter we already had excludes locations active more than five distinct days, which catches steady industrial sources. But this flare was intermittent enough to slip just under that threshold at the cluster centroid. We added a second, intensity-based test: count the days a location shows a >50 MW detection in the trailing 60 days; two or more such days marks it a flare and excludes it. The threshold is well clear of real fires — the genuine fires in our evaluation peaked under 12 MW (the Raffadali windmill fire was 11.7 MW), whereas this flare ran 72–884 MW.

Result. Re-evaluated, the safety-net's flagged clusters went from 7 (3 real / 1 false / 3 pending) to 6 (3 real / 0 false / 3 pending) — precision 100% on the labeled set, with 36 persistent industrial sources now correctly excluded (up from 32). All three confirmed real fires survived the new filter (the citizen-reported Raffadali fire, one independently confirmed by our own detector, one with a genuine post-fire burn scar). The change is isolated to the shadow tier; nothing is live.

Caveat (load-bearing). The labeled set is still small (three confirmed cases), and we tightened the filter immediately after observing the one false positive — so "100%" is on thin, recently-adjusted evidence. The flare filter itself is general and physically sound (it would catch any persistent high-power source, not just this one), but before promoting the safety-net to the public map we want the three still-maturing clusters to resolve and confirm the precision holds on a larger sample. Promotion stays gated.

Why it matters. A wildfire alerting system that cries "128 MW fire!" at a gas flare loses trust fast. Distinguishing industrial heat from wildfire is one of the oldest false-alarm problems in fire remote sensing; here a simple, interpretable rule grounded in fire physics (intensity-persistence plus the absence of a burn scar) does the job without machine learning or extra data.

entry 0030

Waiting for the second look: temporal persistence recovers a third of the single-pass misses

Date: 2026-06-18 &nbsp; Status: defensible (eval only)

BLUF. Most of the small fires we miss are seen by only a single satellite overpass, so multi-sensor agreement can't help them [0029]. But a different lever can: persistence over time. Of the single-pass fires our own detectors miss, about 29% are detected again on a second pass within the day, and those repeat detections are markedly more trustworthy — 74% real, versus 54% for one-and-done. So a "wait for the second look, then surface" rule could recover roughly a third of the single-pass miss class at decent precision, at the cost of a little latency (you wait for the next overpass). Together with multi-sensor agreement, the recall levers are stacking up.

Method. We took every VIIRS-detected fire our own detectors missed and counted the distinct sensing passes at that location within the day (the feed re-inserts each detection many times with the *same* sensing time, so we de-duplicated to distinct ~10-minute sensing bins to count genuine passes, not re-inserts). We split the missed fires into single-pass versus recurring (two or more passes) and measured each group's Sentinel-2-adjudicated precision. Read-only.

Result. Of 103 missed single-overpass clusters, 30 (29%) recurred on a second pass. Their precision was 0.74 (27 labeled), versus 0.54 for the single-pass group (39 labeled). Recurrence is both a recovery mechanism (the fire is seen again, so it can be surfaced) and a real-ness signal (a spot that lights up twice is more likely a real fire than transient sensor noise).

Why it matters. It maps the recall problem cleanly. The fires we miss split into: a small slice recoverable immediately by multi-sensor agreement (~12%, high precision [0029]); a larger slice — about a third — recoverable by waiting one overpass for a second detection (~29%, 0.74 precision); and the rest, genuine single-pass one-and-done detections, which remain the hard core and will need detector-side sensitivity gains (sub-pixel / super-resolution) rather than logic. The temporal lever trades latency for recall, so it suits a "caution, watch this" tier more than an instant high-confidence alert.

Caveat. 0.74 precision means roughly a quarter of recurring detections are still false, so this lever needs the gating stack (including the industrial-flare filter [0030]) before anything is surfaced; it is not a standalone confirmer. Two-week window; clustering at ~1 km. The multi-sensor and persistence slices partly overlap, so they don't simply add — but they are largely complementary mechanisms.

entry 0031

You can't sharpen what isn't there: super-resolution won't recover the small fires we miss

Date: 2026-06-18 &nbsp; Status: defensible (eval only)

BLUF. A popular edge-AI idea is to "super-resolve" coarse geostationary thermal imagery — sharpen a 2–3 km infrared pixel toward the resolution of a polar sensor — to catch small fires between overpasses. We tested whether that could recover the small fires we miss. It can't: 96% of our missed fires leave no geostationary trace at all — not even a sub-threshold flicker. There is nothing to super-resolve; the fire is simply below what a 2–3 km infrared pixel can register. So the fix for these misses is not better satellite processing — it's trusting the sharper polar sensor that already saw them (our safety-net plus temporal persistence) and the ground-based cameras.

Method. For every VIIRS-detected fire our own geostationary detectors missed, we checked whether *any* geostationary detector output — including sub-threshold candidates — existed nearby. If a missed fire has no geostationary signal at all, super-resolution (which enhances existing signal) cannot recover it; that's a sensor-physics floor, not an algorithm gap. Read-only.

Result. Of 194 missed clusters, only 7 (4%) had any geostationary detector trace within range. The other 96% are invisible to the geostationary instrument entirely. This is consistent with the physical sensitivity floor of these sensors: a small or cool fire radiates too little to lift the brightness temperature of a 2–3 km pixel, no matter how cleverly you upsample it.

Why it matters. It closes a tempting avenue cleanly. We will *not* chase a super-resolution detector for small-fire recall — the signal isn't there to enhance. Instead the recall strategy is settled by physics: (1) trust the polar sensor that does resolve them (375 m), via the multi-sensor safety-net [0029] and the "wait for a second pass" persistence lever [0031]; (2) catch the genuinely sub-pixel cases with a different modality — the ground-based smoke/flame cameras, which see a plume the satellite can't; (3) benefit passively as the newer geostationary hardware (finer pixels) comes online. Super-resolution may still help *localize* fires that are already above the floor, but it is not a recall mechanism for the ones we miss.

Caveat. The geostationary-trace table we used is sparse, so 4% is a lower bound on how much signal exists — but the qualitative conclusion (small fires are largely sub-floor for these sensors) matches the documented physical floor and our whole sub-floor-miss analysis. The honest verdict: a recovery lever blocked by sensor physics, redirecting effort to the levers that aren't.

entry 0032

A stronger flare filter, and the case for fusing citizen reports

Date: 2026-06-18 &nbsp; Status: scoping (methodology; from a literature scan)

BLUF. A scan of recent work surfaced two concrete upgrades. First, our home-grown industrial-flare filter [0030] — which excludes persistent very-high-power sources — has an authoritative alternative: a global catalog built from VIIRS Nightfire has identified 25,045 industrial gas flares worldwide (2012–2025) and can physically distinguish gas flaring from biomass burning. Cross-checking our flagged clusters against that catalog would be a stronger, data-backed exclusion than our intensity rule (the gas flare we caught near Mazara would almost certainly be in it). Second, the broader early-warning literature gives evidence that sparse, imperfect citizen reports carry real value when fused properly — which is exactly the pillar that turned up the Raffadali fire [0019].

The flare-catalog upgrade. Our filter excludes a location showing very high fire power on multiple days, which is physically sound but heuristic. The published approach is to use a curated flare catalog as a mask: known industrial combustion sites are simply never treated as wildfires. Because the catalog is derived from the same VIIRS sensor we ingest and is maintained over many years, it is both authoritative and stable. We would adopt it as a complementary exclusion layer — catalog first, then our intensity-persistence rule for any flare too new or intermittent to be catalogued yet. This is queued, not yet built.

Fusing citizen reports. Outside wildfire, operational flood forecasting has shown that asynchronous and even inaccurate crowdsourced observations measurably improve event detection and situational awareness when combined with physical sensors — through mobile-app ground reports, crowdsourced imagery, and social-media mining. The lesson maps directly onto our situation: a single local report ("smoke near the windmills") is sparse and imprecise, yet it surfaced a real fire and a systemic bug. The methodological direction is to treat such reports as a confirmation-and-localization prior — not a standalone alert, but a signal that raises attention and tightens search around a place — and to make reporting easy enough that more of them arrive.

Why it matters. Both are about doing more with sources we already touch. The flare catalog hardens precision (fewer industrial false alarms) using authoritative external data rather than a local rule of thumb. Citizen fusion hardens recall and trust by formally valuing the eyes on the ground. Neither is built yet; both are now linked in our research pipeline with the bar that any change must demonstrably help on our own Sicily data before it ships.

entry 0033

The flare catalog is thin for Sicily — and the over-tightening trap that nearly hid a real fire

Date: 2026-06-18 &nbsp; Status: defensible (eval only) — includes a self-correction

BLUF. We did what [0033] queued: fetched the authoritative global gas-flare catalog (built from years of VIIRS Nightfire) and cross-checked it against our flagged clusters. Two honest findings. First, the catalog is sparse and outdated for Sicily — it lists only two flares in our region, and it does *not* contain the 128 MW industrial flare we caught near Mazara. So the catalog is a useful *complement*, not a replacement, for our own intensity-persistence rule. Second, while trying to make our filter catch one stubborn western-coast source, we over-tightened it and it briefly excluded the Raffadali fire itself — a real fire, the very one that started this whole thread. We caught the mistake in evaluation, reverted it, and restored the safety-net to 100% precision (7 of 7) on labeled clusters. The lesson is worth more than the catalog.

The catalog cross-check. The published flare catalog identifies industrial gas-flaring sites by their distinct combustion signature, accumulated over many years of the same VIIRS sensor we ingest. We pulled the global list and filtered to the Sicily box. It contains exactly two flares — Gela and Milazzo — both from the older survey. Our two real industrial false positives this month, the 128 MW Mazara flare and a 24 MW source on the far western coast, are in neither the catalog nor any updated entry: they are too new, too intermittent, or offshore. None of our currently-surfaced clusters match a catalogued flare. Conclusion: adopt the catalog as a first-pass mask where it *does* cover a site (it is authoritative there), but keep our own "very high power, repeated over days" rule as the primary filter, because it is what actually caught the flares the catalog missed.

The over-tightening trap (the real lesson). One western-coast cluster (24 MW, active on ~11 separate days over three months) looked like an uncatalogued flare slipping past our filter. Our recurrence check asks: *how many distinct days has this exact spot lit up?* — a persistent source lights up on many days; a wildfire on a few. The cluster's hits were spread across roughly four kilometers, so at our normal one-kilometre radius it didn't register as persistent. The obvious fix was to widen the radius. We did — and it backfired. At four kilometres, the recurrence count stops measuring "one source lighting up repeatedly" and starts measuring "how fire-prone is this neighbourhood." Fire-prone farmland with several *separate* fires over a season suddenly looks identical to a persistent industrial source. The widened filter promptly flagged the Raffadali area as "persistent" and dropped it — excluding a confirmed real fire. We saw it in the evaluation (the labeled set lost Raffadali), reverted immediately, and the safety-net returned to 7 of 7 labeled clusters correct, with 65 genuine persistent sources still excluded.

Why it matters. A precision filter that quietly deletes real fires is far worse than one that occasionally lets an industrial flare through — a flare on the map is an annoyance; a missed fire is the whole failure we exist to prevent. So we keep the conservative filter, add the catalog only where it is authoritative, and leave genuinely ambiguous sources (like that 24 MW western cluster) *visible and pending* rather than risk a rule that hides real fires. The cluster will classify itself: if it is a flare, it will keep burning with no burn scar, and the next evaluation gate will label it cleanly. Restraint, here, is the correct engineering.

entry 0034

A recurring "fire" with no scar — and the filter we refused to ship because it would have hidden real ones

Date: 2026-06-18 &nbsp; Status: refuted (pre-registered filter killed on validation) — eval only

BLUF. Our multi-sensor safety-net surfaced a thermal source in the Sicilian interior (37.301/13.465) that two VIIRS satellites and one of our own thermal detectors all agree is real heat — yet three separate Sentinel-2 looks over three weeks show no burn scar at all. That makes it a recurring *non-wildfire* (managed agricultural burning at a fixed field, or a small industrial source), and it dropped our safety-net's measured precision to 88%. The obvious fix was a filter to exclude "recurring sources that never scar." We pre-registered that rule, validated it read-only before shipping anything — and it failed: it would have excluded not just the nuisance source but the confirmed Raffadali fire itself and a second scar-confirmed real fire. We did not ship it. The discipline of validating a fix against ground truth *before* deploying it is the result here.

The source. Three independent thermal signals at one interior location: our own `wind_diff` detector fired there on 30 and 31 May (and Sentinel-2 came back "unburned," dNBR 0.04–0.07); two VIIRS platforms fired there on 18 June, 47 minutes apart, at 3–6 MW (and Sentinel-2 came back unburned again, dNBR 0.009). FIRMS has lit the spot on three separate days in sixty. The pattern — intermittent moderate heat that *never* leaves a scar across repeated looks — is the signature of repeated managed burning at a fixed field, or a small sub-industrial source, not a spreading wildfire. One no-scar look would be ambiguous (small real fires can burn below the scar-detection floor); three independent no-scar looks are not.

The filter, and why it died. The proposed rule: exclude a surfaced cluster only if it *recurs* (≥3 thermal-days), *and* has ≥2 prior Sentinel-2 looks all "unburned," *and* has zero "burned" looks on record. It reads as conservative — it should only catch sources that demonstrably never scar. We ran it read-only against all twenty currently-surfaced clusters. It flagged seven for exclusion, and two of those seven were real fires: the confirmed Raffadali windmill fire (whose own burn-scar validation is still pending, so it currently shows "prior unburned looks, no burned look yet" — exactly the nuisance signature), and a second cluster that *did* leave a 0.117 scar which simply isn't recorded as a "burned" row in the validation queue the rule reads. In a landscape as fire-prone as interior Sicily, prior "unburned" looks are common (every earlier false candidate leaves one) and fresh fires haven't scarred yet — so a location-history rule cannot distinguish a recurring nuisance from a brand-new real fire. This is the third time in one working session that a geography- or history-based exclusion has collided with a real fire.

What we concluded instead. We keep the conservative behavior and accept the source as surfaced. It appears only at the CAUTION tier — explicitly our lower-confidence layer — and surfacing genuine heat there, even from a managed burn, is low-harm: it is real, and the cost of a cautionary flag on a recurring farm burn is far below the cost of a rule that hides a real fire. The durable lesson is that suppression in this landscape must be decided per event, on that event's own evidence (its own burn scar once Sentinel-2 passes), never on the history of its location. A filter that quietly deletes a confirmed fire is not a precision improvement; it is the failure we exist to prevent — and the only reason we know this one would have is that we tested it against the ground truth before trusting it.

entry 0035

Citizen reports aren't a hint — they're a detection. What the Raffadali fire taught us about fusing the crowd

Date: 2026-06-18 &nbsp; Status: preliminary (mechanism built in shadow; validated on one real case; not promoted)

BLUF. We set out to fuse citizen reports the obvious way: treat a report of smoke as a *prior* that relaxes our satellite detectors near that spot, exactly as we already do for FIRMS hits. We built that mechanism in shadow. Then we validated it against the one real, confirmed case we have — the Raffadali windmill fire a local resident reported on 17 June — and the data corrected our design. At Raffadali that day our own detectors had nothing: not a confirmed fire, not even a sub-threshold candidate to relax onto. The fire was simply below our geostationary sensors' floor. A prior that lowers a detection threshold cannot surface a signal that was never there. The lesson: for the fires that most need the crowd, the citizen report is not a hint that sharpens a detector — it is the detection. The right design is a direct citizen surfacing tier, with the threshold-relaxing prior kept as a secondary aid for borderline cases.

What we built. A shadow citizen-anchored-prior emitter, mirroring our FIRMS-anchored prior: each report becomes a short-lived, spatially-bounded relaxation of nearby detector thresholds, scaled by the reporter's reputation and what they saw (open flame counts for more than "unsure," a trusted device for more than a brand-new one). It reads reports read-only and writes only to its own shadow table — it touches no live detector. The mechanism works and is reputation-weighted as intended.

What the real case showed. The Raffadali fire was genuinely confirmed — VIIRS caught it across multiple passes, building past 5 MW, and a burn signature followed. But on the day it mattered, three facts lined up: (1) our own detectors produced *zero* candidates there, sub-threshold or otherwise; (2) the earliest satellite pass sensed the fire at 11:45 but, with roughly two and a half hours of standard near-real-time data latency, that alert would not have surfaced until past 14:00; (3) the resident's report was actionable at 12:51. So the human was the earliest signal we could have acted on — by around an hour and a half — for a fire our satellites could not yet deliver and our own detectors could not see at all. A threshold-relaxing prior would have had nothing to attach to. The value was the report itself.

What this changes. Citizen reports already flow into our post-event burn-scar validation queue, so the back-end fusion exists. The missing piece is the front end: a first-class citizen tier on the public map, where a sufficiently-corroborated, sufficiently-reputable report stands as its own surfaced observation — independent of whether any satellite has caught up yet — with the burn-scar check then confirming or retiring it after the fact. The anchored-prior remains worth keeping for *near*-floor fires, where a faint sub-threshold signal does exist and a nudge could pull it over the line; but we cannot yet validate that path, because every citizen report we currently hold is uncorroborated test data. We are not promoting either mechanism. We are recording that the crowd's real contribution, on the one case we can check, was not to refine a machine detection but to *be* one — earlier than the machines could manage.

entry 0036
PHOENIX is a grassroots Sicily wildfire-detection project. Entries include negative and preliminary results by design. For inquiries, contact the ADR Wildfire team. Updated automatically as new entries are validated.