"Instead of indexing the geofences using R-tree or the complicated S2..."
"... we first find the desired city with a linear scan of all the city geofences"
Why not use a spatial index? It's not hard, and you wouldn't need to worry about rebuilding your index because your city geofences are not likely to change frequently.
The bottleneck here isn't rooted in I/O but in the algorithm: the problem calls for a better algorithmic approach.
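To make that concrete, here is a rough sketch of about the simplest spatial index I can think of: a uniform grid over the geofences' bounding boxes, with a ray-casting point-in-polygon test run only on the candidates in the query's cell. This is not the article's code. The names (`GridIndex`, `point_in_polygon`, `cell_size`) and the toy square "cities" are made up for illustration, and it assumes planar coordinates and simple polygons without holes.

```python
from collections import defaultdict
from math import floor


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices (no holes)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class GridIndex:
    """Hypothetical uniform-grid index: cell -> names of fences whose bbox overlaps it."""

    def __init__(self, fences, cell_size=1.0):
        self.fences = fences          # dict: name -> list of (x, y) vertices
        self.cell_size = cell_size    # assumed tuning knob, in coordinate units
        self.cells = defaultdict(list)
        for name, poly in fences.items():
            xs = [p[0] for p in poly]
            ys = [p[1] for p in poly]
            # Register the fence in every cell its bounding box touches.
            for cx in range(floor(min(xs) / cell_size), floor(max(xs) / cell_size) + 1):
                for cy in range(floor(min(ys) / cell_size), floor(max(ys) / cell_size) + 1):
                    self.cells[(cx, cy)].append(name)

    def lookup(self, x, y):
        """Return the name of a fence containing (x, y), or None."""
        key = (floor(x / self.cell_size), floor(y / self.cell_size))
        # Exact test only for the few fences registered in this cell.
        for name in self.cells.get(key, ()):
            if point_in_polygon(x, y, self.fences[name]):
                return name
        return None


# Toy data: two square "cities" far apart.
fences = {
    "a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "b": [(10, 10), (12, 10), (12, 12), (10, 12)],
}
index = GridIndex(fences)
print(index.lookup(1, 1), index.lookup(11, 11), index.lookup(5, 5))
```

The index is built once at startup. Each lookup then costs one hash probe plus a handful of polygon tests, instead of a scan over every city. An R-tree gives the same effect with better behavior on unevenly sized fences.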
'Highest QPS?'
This is a trivial geometric problem; any sane implementation should be several orders of magnitude faster than doing network I/O. /rant