
Performance and scalability

| Metric | Current value | Target |
| --- | --- | --- |
| API response time (p50) | 45 ms | < 100 ms |
| API response time (p95) | 180 ms | < 500 ms |
| AI generation (average) | 8 s | < 15 s |
| Uptime | 99.2% | 99.9% |
| Concurrent users | 100 tested | 10,000 target |

Endpoint: GET /api/v1/collections
Concurrency: 50 users
Duration: 60 seconds
Results:
├── Requests/sec: 450
├── Avg latency: 42ms
├── P95 latency: 156ms
├── P99 latency: 312ms
└── Errors: 0%
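
To make the latency figures above easier to interpret, here is a minimal sketch of how an average and p50/p95/p99 values can be derived from raw samples using the nearest-rank method. The sample values are hypothetical and are not measurements from this load test; the tool that produced the report may use a different interpolation rule.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (illustration only).
latencies_ms = [12, 18, 25, 31, 40, 42, 47, 55, 90, 310]

summary = {
    "avg": sum(latencies_ms) / len(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

A single slow request dominates the tail percentiles in a small sample, which is why the p95 and p99 values are tracked separately from the average.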

Cloud architecture

| Component | Configuration | Provider |
| --- | --- | --- |
| API (Kubernetes) | 2 pods, 1 CPU, 2 GB RAM | OVHcloud |
| PostgreSQL | Managed, 2 vCPU, 4 GB RAM | OVHcloud |
| Redis | Managed, 1 GB RAM | OVHcloud |
| Qdrant | 1 instance, 2 GB RAM | Hetzner |
| MinIO | Object storage, 50 GB | OVHcloud |

Kubernetes HPA

API pods scale automatically based on CPU and memory usage.

Load Balancer

Traffic is distributed across the running instances.

Stateless Design

Any pod can handle any request.

Queue Workers

Heavy tasks are processed asynchronously.

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mindlet-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mindlet-api
  template:
    metadata:
      labels:
        app: mindlet-api  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: api
          image: mindlet/api:latest
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mindlet-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mindlet-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

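
As a rough illustration of how the autoscaler turns these targets into a replica count, the sketch below applies the documented HPA rule `desired = ceil(current * currentMetric / targetMetric)`, evaluated per metric with the largest result winning. Utilization is measured relative to the pod's resource requests. The function names are hypothetical, and the sketch simplifies the real controller: it ignores stabilization windows, pods that are not ready, and missing metrics, and it assumes the default tolerance of 0.1.

```python
import math


def desired_for_metric(current_replicas, current_utilization, target_utilization,
                       tolerance=0.1):
    """Replica count suggested by a single metric (simplified HPA rule)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas, cpu_utilization, memory_utilization,
                     cpu_target=70, memory_target=80,
                     min_replicas=2, max_replicas=10):
    """Combine the CPU and memory suggestions, then clamp to the configured bounds."""
    suggestion = max(
        desired_for_metric(current_replicas, cpu_utilization, cpu_target),
        desired_for_metric(current_replicas, memory_utilization, memory_target),
    )
    return max(min_replicas, min(max_replicas, suggestion))


# 2 pods at 140% CPU (twice the 70% target) and 60% memory: CPU drives the scale-up.
print(desired_replicas(2, cpu_utilization=140, memory_utilization=60))
# 8 pods at 140% CPU would need 16 replicas, but maxReplicas caps the result.
print(desired_replicas(8, cpu_utilization=140, memory_utilization=60))
```

With the manifest above, load is therefore absorbed by horizontal scaling only up to 10 pods; beyond that the limits themselves would need to be revisited.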
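// Card generation job
// (use statements for application classes such as Document, User and AIService are omitted)
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;
use Throwable;

class GenerateCardsJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;      // maximum number of attempts
    public int $timeout = 120;  // seconds allowed per attempt

    public function __construct(
        private Document $document,
        private User $user
    ) {
        $this->onQueue('high');
    }

    public function handle(AIService $aiService): void
    {
        $cards = $aiService->generateCards($this->document);
        event(new CardsGenerated($cards, $this->user));
    }

    // Called once the job has exhausted its attempts
    public function failed(Throwable $exception): void
    {
        Log::error('Card generation failed', [
            'document_id' => $this->document->id,
            'error' => $exception->getMessage(),
        ]);
        $this->user->notify(new GenerationFailedNotification());
    }
}
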
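
The retry behaviour configured by `$tries` can be pictured with the following simplified model: the job is attempted up to a fixed number of times, and the failure hook runs once only if every attempt raises. This is an illustrative sketch, not how the Laravel queue worker is implemented; `run_with_retries` and `flaky_generation` are hypothetical names, and timeouts and backoff are left out.

```python
def run_with_retries(job, tries, on_failed):
    """Call job(attempt) up to `tries` times.

    Returns the first successful result. If every attempt raises,
    on_failed is called once with the last exception and None is returned.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return job(attempt)
        except Exception as exc:
            last_error = exc
    on_failed(last_error)
    return None


attempts_seen = []


def flaky_generation(attempt):
    """Hypothetical job that fails twice before succeeding."""
    attempts_seen.append(attempt)
    if attempt < 3:
        raise RuntimeError("LLM provider timed out")
    return ["card-1", "card-2"]


failures = []
result = run_with_retries(flaky_generation, tries=3, on_failed=failures.append)
print(result, attempts_seen, failures)
```

Because the third attempt succeeds, the failure hook is never invoked here; with `tries=2` the same job would end in the failure path and the user would be notified.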
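| Users | API pods | AI workers | DB size | Estimated cost/month |
| --- | --- | --- | --- | --- |
| 100 (current) | 2 | 1 | 5 GB | €50 |
| 1,000 | 4 | 2 | 20 GB | €150 |
| 10,000 | 8 | 4 | 100 GB | €500 |
| 100,000 | 16+ | 8+ | 500 GB | €2,000+ |
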
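
The projection above implies that the infrastructure cost per user falls as the user base grows. The short calculation below makes that explicit using the figures from the table; the last tier is given as a lower bound in the table ("+"), so its per-user cost is a lower bound as well.

```python
# (users, estimated monthly cost in euros), taken from the projection table.
tiers = [
    (100, 50),
    (1_000, 150),
    (10_000, 500),
    (100_000, 2_000),  # lower bound: the table lists this tier as 2,000+
]

cost_per_user = {users: round(cost / users, 2) for users, cost in tiers}

for users, euros in cost_per_user.items():
    print(f"{users:>7} users: {euros:.2f} EUR per user per month")
```

Under these estimates the monthly cost per user drops from €0.50 at 100 users to about €0.02 at 100,000 users.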
| Component | Risk | Planned solution |
| --- | --- | --- |
| Database | Connection saturation | Connection pooling (PgBouncer) |
| AI service | LLM costs | Response caching, local models |
| Storage | Rapid growth | Retention policy, compression |
| Qdrant | Insufficient memory | Sharding, clustering |

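use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\AnonymousResourceCollection;
use Illuminate\Support\Facades\Cache;

class CollectionController extends Controller
{
    // The return type matches what CollectionResource::collection() returns
    public function index(Request $request): AnonymousResourceCollection
    {
        $userId = $request->user()->id;

        // Cache the user's collections for 5 minutes
        $collections = Cache::remember(
            "user:{$userId}:collections",
            now()->addMinutes(5),
            fn () => Collection::where('user_id', $userId)
                ->with('cards:id,collection_id')
                ->get()
        );

        return CollectionResource::collection($collections);
    }
}
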
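
The caching pattern used in this controller (return the cached value if it is still fresh, otherwise compute and store it) can be sketched in a few lines. `TTLCache` is a hypothetical in-memory stand-in for illustration; the application itself relies on Laravel's cache store, and the injected clock only exists to make the behaviour easy to demonstrate.

```python
import time


class TTLCache:
    """Minimal in-memory cache with per-entry expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def remember(self, key, ttl_seconds, compute):
        """Return the cached value for key, computing and storing it on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + ttl_seconds, value)
        return value

    def forget(self, key):
        """Drop an entry, for example after the underlying data changes."""
        self._entries.pop(key, None)


fake_now = [0.0]   # controllable clock for the demonstration
db_calls = []      # records each simulated database hit


def load_collections():
    db_calls.append("query")
    return ["Biology", "History"]


cache = TTLCache(clock=lambda: fake_now[0])
cache.remember("user:1:collections", 300, load_collections)  # miss: hits the database
cache.remember("user:1:collections", 300, load_collections)  # hit: served from cache
fake_now[0] = 301.0
cache.remember("user:1:collections", 300, load_collections)  # expired: hits the database again
print(len(db_calls))
```

One consequence of a time-based expiry is that a user may see a list that is up to five minutes stale unless the key is explicitly forgotten when a collection is created, renamed, or deleted.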
// Before: N+1 queries
$collections = Collection::all();
foreach ($collections as $collection) {
    echo $collection->cards->count(); // one extra query per iteration
}

// After: a single query with an aggregated count
$collections = Collection::withCount('cards')->get();
foreach ($collections as $collection) {
    echo $collection->cards_count; // no additional query
}

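
The difference between the two approaches is easiest to see by counting the SQL statements each one issues. The sketch below reproduces the idea against an in-memory SQLite database with a hypothetical three-row dataset; the single-query version uses a correlated subquery, which is similar in spirit to what `withCount` produces, though the exact SQL generated by Eloquent may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cards (id INTEGER PRIMARY KEY, collection_id INTEGER);
    INSERT INTO collections (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO cards (collection_id) VALUES (1), (1), (2);
""")

executed = []  # every statement sent to the database


def run(sql, params=()):
    """Execute a query and record it so the statements can be counted."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# N+1: one query for the collections, then one per collection.
executed.clear()
n_plus_one = {}
for (collection_id,) in run("SELECT id FROM collections ORDER BY id"):
    rows = run("SELECT COUNT(*) FROM cards WHERE collection_id = ?", (collection_id,))
    n_plus_one[collection_id] = rows[0][0]
n_plus_one_queries = len(executed)

# Single query: the count is computed by the database alongside each row.
executed.clear()
single = dict(run("""
    SELECT c.id,
           (SELECT COUNT(*) FROM cards WHERE cards.collection_id = c.id) AS cards_count
    FROM collections c
    ORDER BY c.id
"""))
single_queries = len(executed)

print(n_plus_one_queries, single_queries, single)
```

Both versions return the same counts, but the first issues one statement per collection plus one, so its cost grows with the number of collections.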
-- Indexes for frequent queries
CREATE INDEX idx_cards_collection_type ON cards(collection_id, type);
CREATE INDEX idx_cards_user_created ON cards(user_id, created_at DESC);
CREATE INDEX idx_collections_user ON collections(user_id);

-- Index for full-text search
-- (queries must use the same to_tsvector('french', question) expression to benefit from it)
CREATE INDEX idx_cards_question_gin ON cards
    USING gin(to_tsvector('french', question));

| Tool | Usage |
| --- | --- |
| Prometheus | Metrics collection |
| Grafana | Visualization |
| Sentry | Error tracking |
| Laravel Telescope | Debugging in development |

// Metrics middleware
use Closure;
use Illuminate\Http\Request;

class MetricsMiddleware
{
    public function handle(Request $request, Closure $next)
    {
        $start = microtime(true);
        $response = $next($request);
        $duration = microtime(true) - $start; // elapsed time in seconds

        Metrics::histogram('http_request_duration_seconds', $duration, [
            'method' => $request->method(),
            'path' => $request->path(),
            // getStatusCode() is available on every Symfony-based response type
            'status' => $response->getStatusCode(),
        ]);

        return $response;
    }
}

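
For readers unfamiliar with how a Prometheus-style histogram stores durations like the one recorded by this middleware, here is a minimal sketch. Buckets are cumulative: an observation increments every bucket whose upper bound is greater than or equal to the value, plus an implicit +Inf bucket. The class name and the bucket bounds are hypothetical and are not the application's actual configuration.

```python
class Histogram:
    """Simplified cumulative histogram in the style of Prometheus."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One counter per bound, plus a final counter for the +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        """Record one observation (for example a request duration in seconds)."""
        self.total += value
        self.count += 1
        for index, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[index] += 1
        self.bucket_counts[-1] += 1  # the +Inf bucket counts everything


# Hypothetical bucket bounds in seconds.
durations = Histogram([0.1, 0.5, 1.0])
for seconds in (0.045, 0.18, 0.312):
    durations.observe(seconds)

print(durations.bucket_counts, durations.count)
```

Percentiles such as p95 are then estimated from these bucket counts at query time, so the choice of bounds determines how precise those estimates can be.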
| Condition | Severity | Action |
| --- | --- | --- |
| CPU > 80% for 5 min | Warning | Slack notification |
| Latency p95 > 1 s | Critical | Email + Slack alert |
| Error rate > 5% | Critical | Immediate alert |
| Disk usage > 85% | Warning | Notification |
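
A rule such as "CPU > 80% for 5 min" only fires when the condition has held continuously for the whole window, not on a single spike. The sketch below shows one way to evaluate that over timestamped samples; `breached_for` is a hypothetical helper, and in practice the alerting system evaluates this kind of duration condition itself.

```python
def breached_for(samples, threshold, window_seconds):
    """Return True if the value has stayed above `threshold` continuously
    for at least `window_seconds`, up to and including the latest sample.

    `samples` is a list of (timestamp_seconds, value) pairs sorted by time.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # start of the current breach
        else:
            breach_start = None  # any dip resets the window
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= window_seconds


# One CPU sample per minute (hypothetical values, in percent).
sustained = [(t, 85) for t in range(0, 301, 60)]
with_dip = [(0, 85), (60, 90), (120, 70), (180, 88), (240, 91), (300, 86)]

print(breached_for(sustained, threshold=80, window_seconds=300))
print(breached_for(with_dip, threshold=80, window_seconds=300))
```

The second series never triggers the warning, because the dip at the two-minute mark restarts the five-minute window.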

Ready for growth, built for performance.