
Your application's backend code and Docker containers run on sherpa.sh's application servers. These servers sit behind our CDN and load balancer within our Kubernetes cluster in your selected region, providing a robust execution environment for your applications.

When you deploy, the application servers take on the compute-intensive work while the CDN and load balancer handle traffic distribution and static content delivery.

Architecture Flow:

  1. User request → CDN (static files) or Load Balancer (dynamic content)
  2. Load Balancer → Application Server instances
  3. Application Servers → Your backend code/containers
  4. Response flows back through the same path

For detailed architecture information, see our Architecture Overview page.

Every application server is deployed as a swarm of Docker containers behind a load balancer, and each individual container has a maximum resource allocation.

# Per-instance limits
CPU: 1 core maximum
Memory: 1GB maximum
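
Because each container is capped at one core and 1GB of memory, it can help to watch your process's footprint from inside the app. Below is a minimal sketch using Node's built-in process.memoryUsage(); the interval and warning threshold are illustrative, not platform requirements.

// lib/memory-watch.js - hypothetical helper: warn as the process nears the 1GB cap
const LIMIT_BYTES = 1024 * 1024 * 1024; // per-instance memory maximum

export function startMemoryWatch(intervalMs = 60000) {
  return setInterval(() => {
    const { rss } = process.memoryUsage(); // resident set size of this process
    const fraction = rss / LIMIT_BYTES;
    if (fraction > 0.8) {
      console.warn(`Memory at ${(fraction * 100).toFixed(1)}% of the 1GB limit`);
    }
  }, intervalMs);
}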

The number of container replicas created depends on your selected plan.

# Hobby and Starter Plan Settings
Minimum instances: 1
Maximum instances: 5
CPU scaling threshold: 80%

How scaling works: When your backend code uses more than 80% CPU across instances, new application servers automatically spin up to handle the load.
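
Scale-down is the flip side of this: instances are also removed when load drops, and Kubernetes sends SIGTERM before stopping a container. If you run a custom Node server, a small shutdown hook like the sketch below lets in-flight requests finish; the server here is whatever http.Server your app creates.

// server.js - hypothetical graceful-shutdown hook for a custom Node server
import http from 'http';

const server = http.createServer((req, res) => {
  res.end('ok');
});
server.listen(process.env.PORT || 3000);

// Kubernetes sends SIGTERM before removing an instance during scale-down:
// stop accepting new connections, let in-flight requests finish, then exit.
process.on('SIGTERM', () => {
  server.close(() => process.exit(0));
});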

If you need more instance replicas or different CPU thresholds, reach out to support at discord.sherpa.sh.

Shared Infrastructure

What you get: Your containers/code run on shared infrastructure alongside other applications.

Benefits:

  • Zero configuration: Deploy immediately
  • Automatic load distribution: Traffic spreads across instances
  • Cost-effective: Shared infrastructure costs
  • Built-in monitoring: Performance metrics included

Limitations:

  • Ephemeral storage: No persistent file writes (see the storage sketch below)
  • Shared resources: CPU/memory shared with other applications
  • Standard resource limits: Fixed allocation per instance

Best for: Stateless APIs, web applications, microservices
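
Because instance storage is ephemeral here, anything your app would write to the local filesystem should go to external storage instead. Below is a sketch using the AWS SDK against an S3-compatible bucket; the environment variable names and helper are assumptions, not part of the platform.

// lib/storage.js - hypothetical upload helper; local writes are lost when instances recycle
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({
  region: process.env.S3_REGION,
  endpoint: process.env.S3_ENDPOINT, // any S3-compatible provider
});

export async function saveUpload(key, body) {
  await s3.send(new PutObjectCommand({
    Bucket: process.env.S3_BUCKET,
    Key: key,
    Body: body,
  }));
}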

Dedicated Servers

What you get: Exclusive physical servers running only your application containers.

Available Configurations:

  • Compute: 2-96 CPU cores, 4-256GB memory
  • Storage: 80GB-300TB persistent disk
  • Network: 1-10Gbps dedicated bandwidth
  • Transfer: 20TB monthly included

Benefits:

  • Guaranteed performance: No resource contention
  • Persistent storage: File system writes supported
  • Custom sizing: Tailored to your workload
  • Isolation: Enhanced security and performance

Best for: Databases, file processing, high-traffic applications

Your application servers run in the region you select during deployment. View our available regions.

Benefits of regional deployment:

  • Reduced latency: Servers closer to your users
  • Data compliance: Meet regional data requirements
  • Improved performance: Faster database connections

To monitor your application servers:

  1. Navigate to your application dashboard
  2. Go to Resources > Application Server
  3. Monitor instance count, CPU usage, and memory consumption

To request a dedicated server:

  1. Visit discord.sherpa.sh
  2. Create a support ticket with your requirements:
    • Expected traffic volume
    • Resource requirements (CPU/memory)
    • Storage needs
    • Performance requirements

Design your application to work seamlessly across multiple server instances by avoiding in-memory state storage.

Good: External State Management

// pages/api/users/[id].js
import { getUser } from '../../../lib/database';

export default async function handler(req, res) {
  const { id } = req.query;

  // Fetch from external database, not server memory
  const user = await getUser(id);
  if (!user) {
    return res.status(404).json({ error: 'User not found' });
  }

  res.status(200).json(user);
}

Avoid: In-Memory State

// Don't do this - data lost during scaling
import { getUser } from '../../../lib/database';

let userCache = {}; // Lost when new instances start

export default async function handler(req, res) {
  const { id } = req.query;
  if (!userCache[id]) {
    userCache[id] = await getUser(id); // Won't persist across instances
  }
  res.status(200).json(userCache[id]);
}

Sherpa.sh automatically checks your application health by requesting the root URL (/). Ensure this endpoint returns a valid response.

Required Health Check Setup

// pages/index.js or app/page.js (App Router)
export default function Home() {
  return (
    <div>
      <h1>Application Status: Healthy</h1>
      <p>Server is running normally</p>
    </div>
  );
}

// Or, for API-only applications, serve the health payload from an API route
// pages/api/index.js
export default function handler(req, res) {
  res.status(200).json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    version: process.env.npm_package_version || '1.0.0',
  });
}
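
Note that an API route lives under /api, while the health check requests /. One way to bridge that, assuming you want the API route to answer the check, is a rewrite in next.config.js:

// next.config.js - hypothetical rewrite so the health check at / reaches pages/api/index.js
module.exports = {
  async rewrites() {
    return [{ source: '/', destination: '/api' }];
  },
};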

Advanced Health Check with Dependencies

// pages/api/index.js
import { checkDatabase } from '../../lib/database';
import { checkExternalAPI } from '../../lib/external-services';

export default async function handler(req, res) {
  try {
    // Check critical dependencies
    await checkDatabase();
    await checkExternalAPI();

    res.status(200).json({
      status: 'healthy',
      checks: {
        database: 'connected',
        external_api: 'responding',
      },
      timestamp: new Date().toISOString(),
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message,
      timestamp: new Date().toISOString(),
    });
  }
}

Optimize your application for auto-scaling by implementing efficient async patterns and resource management.

Database Connection Management

// lib/database.js
import { Pool } from 'pg';

// Use connection pooling for database efficiency
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Maximum connections per instance
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

export async function queryDatabase(query, params) {
  const client = await pool.connect();
  try {
    const result = await client.query(query, params);
    return result.rows;
  } finally {
    client.release(); // Always release the connection back to the pool
  }
}
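
A route handler can then go through the shared pool, for example (the orders query is illustrative):

// pages/api/orders.js - example caller; every request in this instance reuses the pool above
import { queryDatabase } from '../../lib/database';

export default async function handler(req, res) {
  const rows = await queryDatabase(
    'SELECT * FROM orders WHERE user_id = $1',
    [req.query.userId]
  );
  res.status(200).json(rows);
}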

Async API Route Optimization

// pages/api/products/search.js
// searchProducts, getCategories, and getRecommendations are assumed to be
// defined in your own data layer
import { searchProducts, getCategories, getRecommendations } from '../../../lib/catalog';

export default async function handler(req, res) {
  const { query, category } = req.query;

  try {
    // Run independent requests in parallel for better performance
    const [products, categories, recommendations] = await Promise.all([
      searchProducts(query),
      getCategories(category),
      getRecommendations(query),
    ]);

    res.status(200).json({
      products,
      categories,
      recommendations,
    });
  } catch (error) {
    console.error('Search error:', error);
    res.status(500).json({ error: 'Search failed' });
  }
}
Key metrics to monitor:

  • CPU utilization: Monitor for scaling triggers
  • Memory usage: Track memory leaks and optimization opportunities
  • Response time: Measure application performance
  • Instance count: Understand scaling patterns

Optimization tips:

  • Database optimization: Use connection pooling and query optimization
  • Caching: Implement Redis or similar caching
  • Async processing: Use background tasks for heavy operations (see the sketch after this list)
  • Resource monitoring: Set up alerts for resource usage
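
For the async-processing tip, the idea is to acknowledge the request quickly and move heavy work off the request path. Below is a minimal in-process sketch; buildReport is an assumed helper, and across multiple instances a shared queue (for example BullMQ backed by Redis) is the more robust choice.

// pages/api/reports/generate.js - hypothetical fire-and-acknowledge pattern
import { randomUUID } from 'crypto';
import { buildReport } from '../../../lib/reports'; // assumed heavy-work helper

export default async function handler(req, res) {
  const jobId = randomUUID();

  // Respond immediately; the client can poll for the result elsewhere.
  res.status(202).json({ jobId, status: 'queued' });

  // Defer the heavy work so it runs after the response is flushed.
  setImmediate(async () => {
    try {
      await buildReport(jobId, req.body);
    } catch (err) {
      console.error(`Report job ${jobId} failed:`, err);
    }
  });
}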
Next steps:

  • Review architecture: Read our Architecture Overview for complete system understanding
  • Set up monitoring: Configure application performance monitoring
  • Implement caching: Add Redis or a similar caching layer
  • Plan scaling: Consider dedicated servers for growing applications
  • Optimize performance: Profile your application for bottlenecks

Need help optimizing your application server setup? Our support team provides detailed performance analysis and recommendations for your specific use case.