
Weather Alpha: $889K in Backtested Returns Across 40 Cities

Veblyn Research - April 5, 2026


Polymarket runs daily temperature markets for 40+ cities worldwide. We built a system that combines OpenMeteo forecasts with Google's GraphCast AI weather model to find where the market disagrees with forecasts. The result: $889K in backtested paper PnL with an 80.92% win rate across 258,700 trades.

This post is a complete walkthrough. We cover how the markets work, every feature in the model, the exact strategy that drives returns, and how to replicate it yourself using the Veblyn API. No hand-waving, no black boxes.

  • +$889,178 paper PnL
  • 258,700 trades backtested
  • 80.92% win rate
  • 40 cities worldwide

The honest picture

  • Overall Brier score is essentially tied with the market's (model 0.1080 vs market 0.1080)
  • Model bets 100% on the NO side
  • Fails formal readiness gates due to side concentration
  • The alpha is real but concentrated in tail buckets

How Polymarket temperature markets work

Polymarket runs daily temperature markets for over 40 cities worldwide. Each city gets a set of binary contracts tied to the next day's observed high temperature. The contracts are structured as buckets covering the full range of plausible temperatures.

For example, New York City might have contracts like "NYC daily high >= 45F", "NYC daily high >= 50F", "NYC daily high >= 55F", and so on. Each one is a simple yes/no binary. You buy YES if you think the temperature will hit that threshold, NO if you think it will not. The contract resolves the next day based on the actual observed temperature.

This means there is a new trading opportunity every single day across 40+ cities. That is hundreds of contracts per day, thousands per month. The market never sleeps - and neither do the mispricings.

Prices reflect the market's implied probability. A contract trading at $0.72 means the market thinks there is a 72% chance that threshold will be hit. If your forecast says the true probability is 85%, you have a 13-cent edge per contract.
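As a sketch, the per-contract edge is just the gap between your forecast probability and the market-implied probability:

```python
def edge(market_price: float, model_prob: float) -> float:
    """Per-contract edge: forecast probability minus market-implied probability."""
    return model_prob - market_price

# The example above: market at $0.72, forecast says 85%
print(f"{edge(0.72, 0.85):.2f}")  # 0.13 -> a 13-cent edge per contract
```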

The 44 features we used

The model ingests 44 features grouped into five categories. Each group captures a different dimension of the trading problem.

Market surface (11)

market_prob, best_bid / best_ask, spread, bid_depth / ask_depth, depth_imbalance, midpoint, volume_24h, liquidity_score

Raw orderbook state. Spread and depth imbalance capture how confident other traders are.

Event structure (11)

bucket_center, bucket_width, entropy, adjacent_gap_above / adjacent_gap_below, bucket_rank, n_buckets, is_edge_bucket, bucket_midpoint_offset

Where this contract sits in the full temperature distribution. Edge buckets behave differently.

Forecast - OpenMeteo (8)

om_daily_max, om_daily_min, om_precipitation, om_wind_speed, om_cloud_cover, om_humidity, om_forecast_age_hours, om_temp_range

Standard numerical weather prediction. Updated every 6 hours. Free, well-calibrated.

Forecast - GraphCast (5)

gc_daily_max, gc_daily_min, gc_forecast_age_hours, gc_om_max_disagreement, gc_om_min_disagreement

Google DeepMind's ML weather model. The disagreement features are the strongest predictors in the entire system.

Derived (4)

delta_from_bucket_center, climatology_baseline, hours_to_close, graphcast_baseline_disagreement

Engineered features. graphcast_baseline_disagreement is the single most important feature - it captures when the AI forecast thinks the market is anchored to the wrong baseline.

The single most important feature is graphcast_baseline_disagreement. When Google's GraphCast AI weather model and the traditional OpenMeteo numerical weather prediction disagree about what the temperature will be, the market tends to anchor on the wrong number. That disagreement is the core signal driving the system's profitability.
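Veblyn does not publish the exact formula, but a minimal sketch of a disagreement feature, assuming it is simply the gap between the two models' daily-high forecasts, looks like this:

```python
def forecast_disagreement(gc_daily_max: float, om_daily_max: float) -> float:
    """Hypothetical disagreement feature: absolute gap (degrees F) between the
    GraphCast and OpenMeteo daily-high forecasts. The production formula may differ."""
    return abs(gc_daily_max - om_daily_max)

# GraphCast at 96F vs OpenMeteo at 91F: a 5-degree disagreement
print(forecast_disagreement(96.0, 91.0))  # 5.0
```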

How to use this yourself

Veblyn exposes weather signals through the same API you use for every other market category. Here is how to pull active weather markets and their signal layer in Python:

python
import httpx

client = httpx.Client(headers={"Authorization": "Bearer vb_YOUR_KEY"})

# Step 1: Get active weather markets
resp = client.get("https://api.veblyn.com/api/v1/market",
    params={"category": "weather", "status": "active"})
markets = resp.json()["market"]

print(f"Found {len(markets)} active weather markets\n")

# Step 2: For each market, pull the signal layer
for m in markets[:5]:
    signals = client.get(
        f"https://api.veblyn.com/api/v1/market/{m['id']}/signal"
    ).json()

    print(f"{m['title']}")
    print(f"  Price: {m['price']:.1%}")

    for s in signals.get("point", []):
        print(f"  {s['signal_name']}: {s['value']}")
    print()

The signal layer returns the raw features the model uses - forecast disagreement, bucket position, depth imbalance, and more. You can feed these into your own model or use them as a screening tool to find contracts where the market is likely wrong.
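As a screening sketch, you could flag markets whose forecast-disagreement signal exceeds a threshold. The signal name `gc_om_max_disagreement` is taken from the feature list above (verify the names your account actually returns), and the threshold is illustrative, not a tuned value:

```python
def is_candidate(signals: dict, threshold: float = 3.0) -> bool:
    """Flag a market when the GraphCast/OpenMeteo disagreement signal is large.
    Expects the {"point": [{"signal_name": ..., "value": ...}]} shape shown above."""
    points = {s["signal_name"]: s["value"] for s in signals.get("point", [])}
    value = points.get("gc_om_max_disagreement")
    return value is not None and abs(value) >= threshold

example = {"point": [{"signal_name": "gc_om_max_disagreement", "value": 4.2}]}
print(is_candidate(example))  # True
```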

The tail bucket strategy

This is where the money is. Of the $889K in total backtested PnL, $894K came from tail bucket trades. Center bucket trades actually lost $5,200. The entire strategy is concentrated in the extremes.

Tail buckets are the contracts at the edges of the temperature distribution - the very hot and very cold outcomes. Think "NYC daily high >= 95F" in summer or "Beijing daily high <= 15F" in winter. These are the outcomes the market systematically underestimates.

Why does this happen? Market makers anchor on the most likely outcome and price the center buckets efficiently. But extreme weather events are driven by different atmospheric dynamics that numerical weather prediction models handle poorly. When GraphCast and OpenMeteo disagree on these tails, it is a strong signal that the market is wrong.

The repeatable strategy

  1. Pull active weather markets from the Veblyn API
  2. Filter for tail buckets (edge contracts in the temperature range)
  3. Check if GraphCast and OpenMeteo disagree on the forecast
  4. If they disagree and the tail bucket is underpriced, take the position
  5. Skip center buckets entirely - the market prices them correctly
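The five steps can be sketched as a single filter. All field names here are illustrative placeholders, not the Veblyn schema; map them onto your own pipeline:

```python
def take_tail_trade(contract: dict, disagreement_threshold: float = 3.0) -> bool:
    """Apply the five filters above to one contract (hypothetical field names)."""
    if not contract["is_edge_bucket"]:                # steps 2 and 5: tails only
        return False
    if abs(contract["gc_om_disagreement"]) < disagreement_threshold:  # step 3
        return False
    return contract["model_prob"] > contract["market_prob"]  # step 4: underpriced

candidate = {"is_edge_bucket": True, "gc_om_disagreement": 4.5,
             "market_prob": 0.08, "model_prob": 0.15}
print(take_tail_trade(candidate))  # True
```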

This is not a theoretical edge. The backtest covers 258,700 trades across 325 training days and 109 validation days. The tail bucket pattern is consistent across cities and seasons.

Full city breakdown

Not every city is profitable. The edge concentrates in cities with volatile, hard-to-predict climates where forecast models disagree most. Here is the full breakdown:

City         PnL      Edge    Note
Beijing      +$142K   +0.062  Volatile continental climate, frequent forecast disagreement
Chongqing    +$118K   +0.051  Humid subtropical, hard to predict, market slow to react
Shenzhen     +$67K    +0.026  Coastal, monsoon season creates model disagreement
Milan        +$58K    +0.023  Mediterranean, sharp seasonal transitions
Madrid       +$57K    +0.023  Continental Mediterranean, summer heat spikes
Tokyo        +$45K    +0.019  Complex coastal-urban dynamics
Istanbul     +$38K    +0.016  Bosphorus effect creates micro-weather
Mexico City  +$31K    +0.014  Altitude-driven patterns, rain surprises
Singapore    -$28K    -0.035  Stable tropical - almost no variance to exploit
Toronto      -$19K    -0.023  Well-modeled by NWP, market gets it right
London       -$12K    -0.014  Oceanic, predictable, no edge
Sydney       -$9K     -0.011  Southern hemisphere coverage is thinner

The pattern is clear: cities with continental or monsoon-influenced climates generate the most alpha. Stable tropical climates (Singapore) and well-modeled oceanic climates (London) offer no edge because the market already prices them correctly.

Model architecture

Algorithm: HistGradientBoostingClassifier
Library: scikit-learn
Training rows: 127,929
Training period: 325 days (Jan-Dec 2025)
Validation rows: 487,042
Validation period: 109 days (Dec 2025-Apr 2026)
Calibration: Temperature scaling (t=1.65)
Brier score: 0.1080 (tied with market)

We chose gradient boosting for practical reasons: it handles mixed feature types natively (no encoding needed for categorical data), trains in seconds rather than hours, and has built-in support for missing values. When GraphCast data is unavailable for a city, the model gracefully degrades instead of crashing.

Temperature scaling with t=1.65 recalibrates the model's raw probabilities. Without it, the model is overconfident on center buckets and underconfident on tails - exactly the opposite of what you want. The calibration step is what turns a mediocre classifier into a profitable trading signal.
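Temperature scaling itself is a one-line transform: divide the logit by t and map back through the sigmoid. A minimal sketch (t > 1 softens overconfident probabilities, and 0.5 is a fixed point):

```python
import math

def temperature_scale(p: float, t: float = 1.65) -> float:
    """Recalibrate a raw probability by dividing its logit by t."""
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit / t))

print(round(temperature_scale(0.95), 3))  # 0.856: overconfidence pulled toward 0.5
print(temperature_scale(0.5))             # 0.5: unchanged
```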

Risk management

Before you trade

  • Never bet more than you can afford to lose. These are prediction markets, not savings accounts.
  • Start with small positions on high-confidence tail buckets. Scale up only after you see the pattern work in your own trading.
  • Diversify across cities. Even the best-performing cities have losing days.
  • The model bets 100% on the NO side - this is a known limitation and a concentration risk.
  • Paper trade first using Veblyn's delayed data before risking real money. The API returns delayed signals for free accounts.

The backtest assumes $10 per contract with no slippage. Real-world execution will differ. Liquidity varies by city and time of day, and large orders will move the market. Start small, track your actual fill rates, and adjust sizing accordingly.
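To make that concrete, here is a minimal sketch of expected value per $1 YES contract with a crude slippage haircut (the 1-cent figure is an assumption for illustration, not a measured number):

```python
def expected_value(model_prob: float, quote: float, slippage: float = 0.01) -> float:
    """Expected profit per $1 YES contract, assuming you fill slightly above the quote.
    The backtest assumed zero slippage; real fills will be worse."""
    fill = quote + slippage
    return model_prob * (1 - fill) - (1 - model_prob) * fill

# The 13-cent edge from the earlier pricing example shrinks to 12 cents
print(round(expected_value(0.85, 0.72), 2))  # 0.12
```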

Access weather signals and 4 other data streams on Veblyn.

Free accounts get delayed signals. Upgrade for real-time.
