Weather Alpha: $889K in Backtested Returns Across 40 Cities
Veblyn Research - April 5, 2026

Polymarket runs daily temperature markets for 40+ cities worldwide. We built a system that combines OpenMeteo forecasts with Google's GraphCast AI weather model to find where the market disagrees with forecasts. The result: $889K in backtested paper PnL with an 80.92% win rate across 258,700 trades.
This post is a complete walkthrough. We cover how the markets work, every feature in the model, the exact strategy that drives returns, and how to replicate it yourself using the Veblyn API. No hand-waving, no black boxes.
- +$889,178 paper PnL
- 258,700 trades backtested
- 80.92% win rate
- 40 cities worldwide
The honest picture
- Overall Brier score is essentially tied with market (0.1080 vs 0.1080)
- Model bets 100% on the NO side
- Fails formal readiness gates due to side concentration
- The alpha is real but concentrated in tail buckets
How Polymarket temperature markets work
Polymarket runs daily temperature markets for over 40 cities worldwide. Each city gets a set of binary contracts tied to the next day's observed high temperature. The contracts are structured as buckets covering the full range of plausible temperatures.
For example, New York City might have contracts like "NYC daily high >= 45F", "NYC daily high >= 50F", "NYC daily high >= 55F", and so on. Each one is a simple yes/no binary. You buy YES if you think the temperature will hit that threshold, NO if you think it will not. The contract resolves the next day based on the actual observed temperature.
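The resolution rule for a threshold contract is mechanical. Here is a minimal sketch (the city and thresholds are the illustrative ones from above, not real listings):

```python
def resolve_contract(observed_high_f: float, threshold_f: float) -> str:
    """A '>= threshold' contract resolves YES when the observed
    daily high meets or exceeds the threshold, NO otherwise."""
    return "YES" if observed_high_f >= threshold_f else "NO"

# Hypothetical NYC ladder on a day whose observed high came in at 53F
for threshold in (45, 50, 55, 60):
    print(f"NYC daily high >= {threshold}F -> {resolve_contract(53, threshold)}")
```

Every contract below the observed high resolves YES and every contract above it resolves NO, so the ladder as a whole encodes the full temperature distribution.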
This means there is a new trading opportunity every single day across 40+ cities. That is hundreds of contracts per day, thousands per month. The market never sleeps - and neither do the mispricings.
Prices reflect the market's implied probability. A contract trading at $0.72 means the market thinks there is a 72% chance that threshold will be hit. If your forecast says the true probability is 85%, you have a 13-cent edge per contract.
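In code, the per-contract edge is just the gap between your forecast probability and the market-implied probability (using the numbers from the example above):

```python
def edge(forecast_prob: float, market_price: float) -> float:
    """Edge in dollars on a $1 binary contract: your forecast
    probability minus the market's implied probability."""
    return forecast_prob - market_price

# Market prices the contract at $0.72; your forecast says 85%.
print(f"Edge: {edge(0.85, 0.72):.2f} per contract")  # 0.13, i.e. 13 cents
```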
The 44 features we used
The model ingests 44 features grouped into five categories. Each group captures a different dimension of the trading problem.
Market surface (11)
market_prob, best_bid / best_ask, spread, bid_depth / ask_depth, depth_imbalance, midpoint, volume_24h, liquidity_score
Raw orderbook state. Spread and depth imbalance capture how confident other traders are.
Event structure (11)
bucket_center, bucket_width, entropy, adjacent_gap_above / adjacent_gap_below, bucket_rank, n_buckets, is_edge_bucket, bucket_midpoint_offset
Where this contract sits in the full temperature distribution. Edge buckets behave differently.
Forecast - OpenMeteo (8)
om_daily_max, om_daily_min, om_precipitation, om_wind_speed, om_cloud_cover, om_humidity, om_forecast_age_hours, om_temp_range
Standard numerical weather prediction. Updated every 6 hours. Free, well-calibrated.
Forecast - GraphCast (5)
gc_daily_max, gc_daily_min, gc_forecast_age_hours, gc_om_max_disagreement, gc_om_min_disagreement
Google DeepMind's ML weather model. The disagreement features are the strongest predictors in the entire system.
Derived (4)
delta_from_bucket_center, climatology_baseline, hours_to_close, graphcast_baseline_disagreement
Engineered features. graphcast_baseline_disagreement is the single most important feature - it captures when the AI forecast thinks the market is anchored to the wrong baseline.
When Google's GraphCast AI model and the traditional OpenMeteo numerical weather prediction disagree about what the temperature will be, the market tends to anchor on the wrong number. That disagreement is the core signal driving the system's profitability.
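The post does not publish the formulas for the disagreement features. A plausible minimal sketch, assuming `gc_om_max_disagreement` and `gc_om_min_disagreement` are simple forecast differences in degrees (the production definitions may differ):

```python
def forecast_disagreement(gc_max_f: float, om_max_f: float,
                          gc_min_f: float, om_min_f: float):
    """Hypothetical: disagreement as the signed gap between the
    GraphCast and OpenMeteo daily max/min forecasts, in degrees F."""
    return gc_max_f - om_max_f, gc_min_f - om_min_f

# GraphCast calls a 97F high; OpenMeteo says 93.5F
max_dis, min_dis = forecast_disagreement(97.0, 93.5, 78.0, 77.0)
print(max_dis, min_dis)  # 3.5 1.0
```

A large gap on the daily max is exactly the situation where, per the post, the market anchors on the wrong baseline.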
How to use this yourself
Veblyn exposes weather signals through the same API you use for every other market category. Here is how to pull active weather markets and their signal layer in Python:
```python
import httpx

client = httpx.Client(headers={"Authorization": "Bearer vb_YOUR_KEY"})

# Step 1: Get active weather markets
resp = client.get(
    "https://api.veblyn.com/api/v1/market",
    params={"category": "weather", "status": "active"},
)
markets = resp.json()["market"]
print(f"Found {len(markets)} active weather markets\n")

# Step 2: For each market, pull the signal layer
for m in markets[:5]:
    signals = client.get(
        f"https://api.veblyn.com/api/v1/market/{m['id']}/signal"
    ).json()
    print(f"{m['title']}")
    print(f"  Price: {m['price']:.1%}")
    for s in signals.get("point", []):
        print(f"  {s['signal_name']}: {s['value']}")
    print()
```

The signal layer returns the raw features the model uses - forecast disagreement, bucket position, depth imbalance, and more. You can feed these into your own model or use them as a screening tool to find contracts where the market is likely wrong.
The tail bucket strategy
This is where the money is. Of the $889K in total backtested PnL, $894K came from tail bucket trades. Center bucket trades actually lost $5,200. The entire strategy is concentrated in the extremes.
Tail buckets are the contracts at the edges of the temperature distribution - the very hot and very cold outcomes. Think "NYC daily high >= 95F" in summer or "Beijing daily high <= 15F" in winter. These are the outcomes the market systematically underestimates.
Why does this happen? Market makers anchor on the most likely outcome and price the center buckets efficiently. But extreme weather events are driven by different atmospheric dynamics that numerical weather prediction models handle poorly. When GraphCast and OpenMeteo disagree on these tails, it is a strong signal that the market is wrong.
The repeatable strategy
- Pull active weather markets from the Veblyn API
- Filter for tail buckets (edge contracts in the temperature range)
- Check if GraphCast and OpenMeteo disagree on the forecast
- If they disagree and the tail bucket is underpriced, take the position
- Skip center buckets entirely - the market prices them correctly
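The steps above can be sketched as a single screening pass. The field names `is_edge_bucket` and `gc_om_max_disagreement` come from the feature list earlier; `model_prob` and both thresholds are illustrative assumptions, not the backtested values:

```python
DISAGREEMENT_F = 2.0   # illustrative: required GraphCast-vs-OpenMeteo gap, degrees F
UNDERPRICED = 0.05     # illustrative: required model-minus-market probability edge

def screen(contracts: list[dict]) -> list[str]:
    """Keep tail buckets where the forecast models disagree and the
    market price sits below the model's probability."""
    picks = []
    for c in contracts:
        if not c["is_edge_bucket"]:
            continue  # skip center buckets entirely
        if abs(c["gc_om_max_disagreement"]) < DISAGREEMENT_F:
            continue  # models agree: no signal
        if c["model_prob"] - c["market_prob"] < UNDERPRICED:
            continue  # not underpriced enough to act
        picks.append(c["id"])
    return picks

contracts = [
    {"id": "nyc-ge-95", "is_edge_bucket": True,
     "gc_om_max_disagreement": 3.1, "model_prob": 0.18, "market_prob": 0.08},
    {"id": "nyc-ge-70", "is_edge_bucket": False,
     "gc_om_max_disagreement": 3.1, "model_prob": 0.60, "market_prob": 0.55},
]
print(screen(contracts))  # ['nyc-ge-95']
```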
This is not a theoretical edge. The backtest covers 258,700 trades across 325 training days and 109 validation days. The tail bucket pattern is consistent across cities and seasons.
Full city breakdown
Not every city is profitable. The edge concentrates in cities with volatile, hard-to-predict climates where forecast models disagree most. Here is the full breakdown:
| City | PnL | Edge | Note |
|---|---|---|---|
| Beijing | +$142K | +0.062 | Volatile continental climate, frequent forecast disagreement |
| Chongqing | +$118K | +0.051 | Humid subtropical, hard to predict, market slow to react |
| Shenzhen | +$67K | +0.026 | Coastal, monsoon season creates model disagreement |
| Milan | +$58K | +0.023 | Mediterranean, sharp seasonal transitions |
| Madrid | +$57K | +0.023 | Continental Mediterranean, summer heat spikes |
| Tokyo | +$45K | +0.019 | Complex coastal-urban dynamics |
| Istanbul | +$38K | +0.016 | Bosphorus effect creates micro-weather |
| Mexico City | +$31K | +0.014 | Altitude-driven patterns, rain surprises |
| Singapore | -$28K | -0.035 | Stable tropical - almost no variance to exploit |
| Toronto | -$19K | -0.023 | Well-modeled by NWP, market gets it right |
| London | -$12K | -0.014 | Oceanic, predictable, no edge |
| Sydney | -$9K | -0.011 | Southern hemisphere coverage is thinner |
The pattern is clear: cities with continental or monsoon-influenced climates generate the most alpha. Stable tropical climates (Singapore) and well-modeled oceanic climates (London) offer no edge because the market already prices them correctly.
Model architecture
| Spec | Value |
|---|---|
| Algorithm | HistGradientBoostingClassifier |
| Library | scikit-learn |
| Training rows | 127,929 |
| Training period | 325 days (Jan-Dec 2025) |
| Validation rows | 487,042 |
| Validation period | 109 days (Dec 2025-Apr 2026) |
| Calibration | Temperature scaling (t=1.65) |
| Brier score | 0.1080 (tied with market) |
We chose gradient boosting for practical reasons: it handles mixed feature types natively (no encoding needed for categorical data), trains in seconds rather than hours, and has built-in support for missing values. When GraphCast data is unavailable for a city, the model gracefully degrades instead of crashing.
Temperature scaling with t=1.65 recalibrates the model's raw probabilities. Without it, the model is overconfident on center buckets and underconfident on tails - exactly the opposite of what you want. The calibration step is what turns a mediocre classifier into a profitable trading signal.
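Temperature scaling is a standard recalibration: divide the logit by t before applying the sigmoid. With t = 1.65 > 1 it pulls raw probabilities toward 0.5, softening overconfident predictions. A sketch:

```python
import math

def temperature_scale(p: float, t: float = 1.65) -> float:
    """Recalibrate a raw probability by dividing its logit by t."""
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit / t))

print(round(temperature_scale(0.90), 3))  # ~0.79, pulled toward 0.5
print(round(temperature_scale(0.50), 3))  # 0.5, unchanged at the midpoint
```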
Risk management
Before you trade
- Never bet more than you can afford to lose. These are prediction markets, not savings accounts.
- Start with small positions on high-confidence tail buckets. Scale up only after you see the pattern work in your own trading.
- Diversify across cities. Even the best-performing cities have losing days.
- The model bets 100% on the NO side - this is a known limitation and a concentration risk.
- Paper trade first using Veblyn's delayed data before risking real money. The API returns delayed signals for free accounts.
The backtest assumes $10 per contract with no slippage. Real-world execution will differ. Liquidity varies by city and time of day, and large orders will move the market. Start small, track your actual fill rates, and adjust sizing accordingly.
Access weather signals and 4 other data streams on Veblyn.
Free accounts get delayed signals. Upgrade for real-time.