You Probably Don't Need a Staging Server
I seek out simple and scalable solutions to various challenges with tech
Most software teams consider a staging environment essential - it's treated as a given, like unit tests or version control. But is it really necessary? Let's challenge this assumption and explore why you might be better off without one.
The Traditional Setup
The typical deployment pipeline looks like this:
Development Environment
Developers write and test code locally
Uses mock data and services
Fast feedback loop
Staging Environment
Mirrors production configuration
Integration point for team changes
Pre-production testing ground
Production Environment
Serves real users
Handles actual load
Uses live data
Why Teams Think They Need Staging
Teams justify staging environments for several reasons:
Deployment Testing
Teams use staging to verify deployment scripts and configuration changes. However, this assumes staging accurately reflects production - it rarely does.
QA Testing
Quality Assurance teams use staging for final checks. But staging data is usually synthetic or outdated, missing real-world edge cases.
Integration Testing
# Traditional staging integration test
def test_payment_flow():
    user = create_test_user()
    product = add_to_cart(user)
    payment = process_payment(product)
    assert payment.status == 'success'
This approach often fails to catch real integration issues because:
Third-party services behave differently in staging
Data patterns don't match production
Load characteristics are different
Load Testing
Teams run performance tests in staging, but:
Staging rarely has production-scale data
Infrastructure often differs
Real user patterns are hard to simulate
The Real Problems with Staging
1. False Sense of Security
Staging environments create dangerous illusions:
# Example of staging vs production difference
# Staging: 100 users, 1000 records
staging_query = "SELECT * FROM users WHERE active = true"
# Production: 1M users, 10M records
# Same query, completely different performance characteristics
production_query = "SELECT * FROM users WHERE active = true"
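To make the point concrete, here is a minimal sketch that runs the same query against two table sizes in an in-memory SQLite database. The table layout, the row counts, and the helper names `build_users_table` and `time_active_query` are assumptions made for illustration; a real production database will behave differently again, which is rather the point.

```python
import sqlite3
import time


def build_users_table(row_count):
    """Create an in-memory users table with row_count rows, half of them active."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")
    conn.executemany(
        "INSERT INTO users (active) VALUES (?)",
        ((i % 2,) for i in range(row_count)),
    )
    conn.commit()
    return conn


def time_active_query(conn):
    """Run the unindexed query; return (rows_returned, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute("SELECT * FROM users WHERE active = 1").fetchall()
    return len(rows), time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sizes: one staging-like, one closer to production scale.
    for label, size in [("staging-sized", 1_000), ("larger", 200_000)]:
        count, seconds = time_active_query(build_users_table(size))
        print(f"{label}: {count} rows in {seconds:.4f}s")
```

The exact timings depend on the machine, but the full table scan grows with the row count, so a query that looks instant on a small staging dataset says little about how it behaves at production scale.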
2. Resource Costs
The hidden costs add up:
Infrastructure: Usually 50-80% of production costs
Engineering time: Environment maintenance
Cognitive overhead: Managing multiple environments
Deployment complexity: Additional pipeline steps
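A rough way to put a number on these costs is sketched below. Every input is hypothetical (the production bill, the maintenance hours, the hourly rate); only the 50-80% infrastructure range comes from the list above.

```python
def monthly_staging_cost(production_infra_cost, infra_percent,
                         maintenance_hours, hourly_rate):
    """Infrastructure share plus the engineering time spent keeping staging alive."""
    infrastructure = production_infra_cost * infra_percent / 100
    engineering = maintenance_hours * hourly_rate
    return infrastructure + engineering


# Hypothetical inputs: a 10,000/month production bill and
# 20 maintenance hours per month at 100 per hour.
low = monthly_staging_cost(10_000, 50, 20, 100)
high = monthly_staging_cost(10_000, 80, 20, 100)
print(f"Estimated staging cost: {low:,.0f} to {high:,.0f} per month")
```

Cognitive overhead and slower deployments are not captured here, so treat the result as a lower bound.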
3. Deployment Delays
Staging creates friction: every change has to pass through an extra deploy-and-verify step before it can reach production.
Better Alternatives
Feature Flags
Modern feature flagging enables safer production deployments:
# Feature flag configuration
flags = {
    'new_payment_system': {
        'enabled': True,
        'rollout_percentage': 10,
        'white_listed_users': ['test@example.com']
    }
}


def process_payment(user, amount):
    if feature_enabled('new_payment_system', user):
        return new_payment_processor(amount)
    return legacy_payment_processor(amount)
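The `feature_enabled` helper used above is not defined in the article. One common way to implement it is a deterministic percentage rollout, sketched here. The `FLAGS` store, the `bucket_for` helper, and the choice to identify users by a string ID are assumptions for illustration, not the API of any particular feature-flag product.

```python
import hashlib

# Hypothetical flag store, mirroring the shape of the configuration above.
FLAGS = {
    "new_payment_system": {
        "enabled": True,
        "rollout_percentage": 10,
        "white_listed_users": ["test@example.com"],
    }
}


def bucket_for(flag_name, user_id):
    """Map (flag, user) to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def feature_enabled(flag_name, user_id, flags=FLAGS):
    """Return True if the flag is switched on for this user."""
    flag = flags.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False  # unknown or globally disabled flags are always off
    if user_id in flag["white_listed_users"]:
        return True  # explicitly allowed users bypass the percentage check
    return bucket_for(flag_name, user_id) < flag["rollout_percentage"]
```

Hashing the flag name together with the user ID keeps each user's experience stable across requests, while different flags select different slices of users. Raising `rollout_percentage` only adds users; nobody who already had the feature loses it.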
Testing in Production
A/B Testing
Test features with real users
Gather actual usage data
Make data-driven decisions
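For the "data-driven decisions" step, one standard option is a two-proportion z-test on conversion counts. The sketch below uses only the standard library; the function name and the visitor counts are made up for illustration, and a real experiment would also need a sample size decided before the test starts.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two_sided_p_value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for a control group (A) and a variant (B).
z, p = two_proportion_z_test(200, 5_000, 260, 5_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be noise; it says nothing about whether the difference is large enough to matter.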
Canary Deployments
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 1h}
        - setWeight: 50
        - pause: {duration: 1h}
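The control flow behind those steps can be sketched in a few lines of Python. This is a simplified model of the idea, not how Argo Rollouts is implemented: `run_canary`, `error_rate_check`, the explicit final 100% step, and the 1% error threshold are all assumptions for illustration.

```python
def run_canary(weights, is_healthy):
    """Walk through traffic weights; return the final weight (0 means aborted).

    weights    -- percentage of traffic for each step, e.g. [10, 50, 100]
    is_healthy -- callable taking the current weight, returning True to continue
    """
    for weight in weights:
        print(f"Routing {weight}% of traffic to the new version")
        if not is_healthy(weight):
            print("Health check failed, sending all traffic back to the old version")
            return 0
    return weights[-1] if weights else 0


def error_rate_check(observed_error_rates, threshold=0.01):
    """Build a health check from hypothetical per-step error rates."""
    def check(weight):
        return observed_error_rates.get(weight, 0.0) <= threshold
    return check
```

For example, `run_canary([10, 50, 100], error_rate_check({10: 0.001, 50: 0.05}))` aborts at the 50% step, so at most half of the traffic ever sees the faulty version.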
Shadow Testing
def process_payment(user, amount):
    # Main payment flow
    result = current_payment_system(amount)
    # Shadow test new system without affecting users
    try:
        new_payment_system(amount)
    except Exception as e:
        log_error(e)
    return result
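The snippet above only catches exceptions from the new system and discards its result. Shadow testing tells you more when the two results are compared. Here is a minimal sketch of that, where `shadow_call` and the `mismatches` list are hypothetical names. It runs the candidate synchronously for simplicity, and it assumes the candidate is side-effect free (for payments, a sandbox or dry-run mode), since otherwise users could be charged twice.

```python
import logging

logger = logging.getLogger("shadow")


def shadow_call(primary, candidate, *args, mismatches=None):
    """Return primary's result; run candidate on the same input and record differences."""
    result = primary(*args)
    try:
        shadow_result = candidate(*args)
    except Exception:
        # A failure in the candidate must never reach the user.
        logger.exception("Shadow call raised; users are unaffected")
        return result
    if shadow_result != result:
        logger.warning("Shadow mismatch: primary=%r candidate=%r", result, shadow_result)
        if mismatches is not None:
            mismatches.append((args, result, shadow_result))
    return result
```

In a latency-sensitive service the candidate call would normally run off the request path, for example on a background worker, so that a slow new system cannot slow down real users.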
Robust Monitoring
Implement comprehensive monitoring:
def payment_endpoint():
    with metrics.timer('payment_processing_time'):
        try:
            result = process_payment()
            metrics.increment('payment_success')
            return result
        except Exception as e:
            metrics.increment('payment_error')
            metrics.event('payment_failure', str(e))
            raise
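The `metrics` object in the snippet above is not defined in the article. To show what such a client needs to provide, here is a hypothetical in-memory stand-in with the same three methods (`timer`, `increment`, `event`) plus an `error_rate` helper. A real setup would send these values to a metrics backend instead of keeping them in process.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class InMemoryMetrics:
    """Hypothetical stand-in for a real metrics client."""

    def __init__(self):
        self.counters = Counter()
        self.timings = defaultdict(list)
        self.events = []

    def increment(self, name, value=1):
        self.counters[name] += value

    def event(self, name, detail):
        self.events.append((name, detail))

    @contextmanager
    def timer(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even when the wrapped block raises.
            self.timings[name].append(time.perf_counter() - start)

    def error_rate(self, success_name, error_name):
        """Fraction of recorded outcomes that were errors (0.0 if none recorded)."""
        total = self.counters[success_name] + self.counters[error_name]
        return self.counters[error_name] / total if total else 0.0
```

With `metrics = InMemoryMetrics()`, calling `metrics.error_rate('payment_success', 'payment_error')` gives the kind of signal that alerts and automated rollbacks can be built on.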
Quick Rollbacks
Ensure fast recovery:
# Kubernetes rollback
kubectl rollout undo deployment/payment-service
# Feature flag rollback
curl -X PATCH api.features.com/flags/new-payment \
  -d '{"enabled": false}'
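Rollbacks are fastest when nobody has to run them by hand. Below is a minimal sketch of an automatic kill switch that turns a flag off once its error rate crosses a threshold. The `auto_disable` function, the in-memory `flags` dictionary, and both thresholds are assumptions for illustration; in practice the final step would call whatever API your flag service exposes.

```python
def auto_disable(flags, flag_name, errors, requests,
                 max_error_rate=0.05, min_requests=100):
    """Turn a flag off when its error rate crosses the threshold.

    Returns True if the flag was disabled by this call.
    """
    if requests < min_requests:
        return False  # too little traffic to judge
    if errors / requests <= max_error_rate:
        return False  # error rate is within the accepted range
    flags[flag_name]["enabled"] = False
    return True
```

The `min_requests` guard keeps a single early failure from disabling a feature that has barely received any traffic.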
When You Actually Need Staging
Some valid use cases remain:
Regulated Industries
Required by compliance
Audit requirements
Certification testing
Hardware Dependencies
IoT devices
Specialized equipment
Physical infrastructure
Complex Third-party Integration
Payment processor certification
External security audits
Partner system testing
Making the Decision
Ask yourself:
What specific problems does staging solve for you?
Could feature flags provide better solutions?
What's your monthly staging infrastructure cost?
When did staging last catch a production issue?
How much developer time goes into maintaining staging?
The answers might surprise you. Most teams can replace staging with a combination of:
Feature flags
Robust monitoring
Canary deployments
Shadow testing
Quick rollback capabilities
This approach often results in:
Faster deployments
Lower costs
More reliable testing
Better production practices
Increased developer productivity
Real-World Example: ProMind.ai's Approach
For my project ProMind.ai, I deploy straight to production using the controls described above. ProMind’s AI agent platform relies heavily on feature flags and canary deployments to roll out new AI capabilities safely. I use monitoring tools such as Sentry to track the AI agents' performance, and shadow testing to validate new agent behaviours before full release. This approach has helped maintain uptime with minimal issues while deploying multiple times. I will publish a follow-up piece soon on how I implemented some of these practices. In the meantime, you can check out the AI agents platform yourself.



