
Performance and scalability

| Metric | Current value | Target |
| --- | --- | --- |
| API response time (p50) | 45 ms | < 100 ms |
| API response time (p95) | 180 ms | < 500 ms |
| AI generation (average) | 8 s | < 15 s |
| Uptime | 99.2% | 99.9% |
| Concurrent users | 100 tested | 10,000 target |
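
To make the gap between the current and target uptime concrete, the percentages can be converted into allowed downtime. The following is a minimal sketch, assuming a 30-day month; the helper name `downtimeMinutes` is illustrative and not part of the project.

```php
<?php
// Hypothetical helper: converts an uptime percentage into the downtime
// it allows over a period (assumed here to be a 30-day month).
function downtimeMinutes(float $uptimePercent, int $days = 30): float
{
    $totalMinutes = $days * 24 * 60;
    return $totalMinutes * (1 - $uptimePercent / 100);
}

printf("99.2%% -> %.1f min/month\n", downtimeMinutes(99.2)); // 345.6
printf("99.9%% -> %.1f min/month\n", downtimeMinutes(99.9)); // 43.2
```

Under that assumption, moving from 99.2% to 99.9% shrinks the monthly downtime budget from roughly 5.8 hours to about 43 minutes.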

Endpoint: GET /api/v1/collections
Concurrency: 50 users
Duration: 60 seconds

API benchmark
xychart-beta
    title "API benchmark results"
    x-axis ["Req/s", "Avg latency (ms)", "P95 (ms)", "P99 (ms)"]
    y-axis "Value" 0 --> 500
    bar [450, 42, 156, 312]
| Metric | Value |
| --- | --- |
| Requests/sec | 450 |
| Average latency | 42 ms |
| P95 latency | 156 ms |
| P99 latency | 312 ms |
| Errors | 0% |
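
The P95 and P99 figures are percentiles of the per-request latencies. The source does not say which tool or percentile definition produced them, so the sketch below assumes the common nearest-rank method; the function name and the sample values are made up for illustration.

```php
<?php
// Nearest-rank percentile: the smallest sample such that at least
// $p percent of all samples are less than or equal to it.
function percentile(array $samples, float $p): float
{
    if ($samples === [] || $p <= 0 || $p > 100) {
        throw new InvalidArgumentException('Need samples and 0 < p <= 100.');
    }
    sort($samples);
    $rank = (int) ceil($p / 100 * count($samples)); // 1-based rank
    return (float) $samples[$rank - 1];
}

// Made-up latencies in milliseconds, for illustration only.
$latencies = [38, 40, 41, 42, 43, 45, 47, 60, 150, 310];
echo percentile($latencies, 50), "\n"; // 43
echo percentile($latencies, 95), "\n"; // 310
```

A few slow requests dominate the high percentiles while leaving the median almost untouched, which is why the table reports P95 and P99 alongside the average.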
Cloud architecture
flowchart LR
 subgraph Client["Client"]
        RN["React Native"]
  end
 subgraph DNS["DNS"]
        API_DNS["api.mindlet.app"]
        WS_DNS["ws.mindlet.app"]
  end
 subgraph Ingress["Nginx Ingress Controller"]
        LB["Load Balancer"]
        RT["Router"]
  end
 subgraph API["API Deployment"]
        OCT["Laravel Octane"]
        POD_API1["POD"]
        POD_API2["POD"]
        POD_API3["POD"]
  end
 subgraph WS["WS Deployment"]
        REV["Laravel Reverb"]
        POD_WS["POD"]
  end
 subgraph REDIS["Redis Service"]
        REDIS_APP["Redis"]
        POD_REDIS["POD"]
  end
 subgraph WORKERS["Worker supervisors"]
        HZ["Laravel Horizon"]
        POD_W1["POD"]
        POD_W2["POD"]
        POD_W3["POD"]
  end
 subgraph K8S["OVH Cloud Kubernetes Cluster"]
        Ingress
        API_SVC(("api-service"))
        WS_SVC(("ws-service"))
        API
        WS
        REDIS
        WORKERS
  end
 subgraph Neon["Neon"]
        PG[("PostgreSQL")]
  end
 subgraph Qdrant["Qdrant Cloud"]
        VDB[("Vector DB")]
  end
 subgraph LangGraph["LangGraph Cloud"]
        CARD["Card generation"]
        EMB["Embeddings"]
  end
    RN --> API_DNS & WS_DNS
    LB --> RT
    API_DNS --> LB
    WS_DNS --> LB
    RT --> API_SVC & WS_SVC
    OCT --> POD_API1 & POD_API2 & POD_API3
    REV --> POD_WS
    REDIS_APP --> POD_REDIS
    HZ --> POD_W1 & POD_W2 & POD_W3
    API_SVC --> API
    WS_SVC --> WS
    API <--> REDIS
    WS <--> REDIS
    WORKERS <--> REDIS
    API --> PG & VDB & CARD & EMB
| Component | Configuration | Provider |
| --- | --- | --- |
| API (Kubernetes) | 2 pods, 1 CPU, 2 GB RAM | OVHcloud |
| PostgreSQL | Managed, 2 vCPU, 4 GB RAM | OVHcloud |
| Redis | Managed, 1 GB RAM | OVHcloud |
| Qdrant | 1 instance, 2 GB RAM | Hetzner |
| MinIO | Object storage, 50 GB | OVHcloud |

Kubernetes HPA

Autoscaling of the API pods based on CPU and memory utilization

Load Balancer

Distributes traffic across instances

Stateless Design

Any pod can handle any request

Queue Workers

Asynchronous processing of heavy tasks

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mindlet-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mindlet-api
  template:
    metadata:
      labels:
        app: mindlet-api  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: api
          image: mindlet/api:latest
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mindlet-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mindlet-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
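
The HPA picks a replica count by scaling the current count by the ratio of observed to target utilization, taking the largest result across metrics, and clamping it to the configured bounds. The sketch below is a simplified model of that calculation using the targets from the manifest above; it ignores the tolerance band, stabilization windows, and unready pods, and the function name is hypothetical.

```php
<?php
// Simplified model of the HPA replica calculation. Assumption: tolerance,
// stabilization windows and pods that are not ready are all ignored.
function desiredReplicas(
    int $current,
    array $utilization, // observed average utilization per metric, in %
    array $targets,     // target utilization per metric, in %
    int $min = 2,
    int $max = 10
): int {
    $desired = $min;
    foreach ($targets as $metric => $target) {
        $proposal = (int) ceil($current * $utilization[$metric] / $target);
        $desired = max($desired, $proposal); // the largest proposal wins
    }
    return min($max, $desired);
}

$targets = ['cpu' => 70, 'memory' => 80];
echo desiredReplicas(2, ['cpu' => 140, 'memory' => 60], $targets), "\n"; // 4
echo desiredReplicas(4, ['cpu' => 35, 'memory' => 40], $targets), "\n";  // 2
```

With two pods at 140% of requested CPU, the CPU metric alone asks for four pods even though memory is below its target.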
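// Card generation job
// Application classes (Document, User, AIService, CardsGenerated,
// GenerationFailedNotification) are imported from the app's own namespaces.
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class GenerateCardsJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;     // maximum number of attempts
    public int $timeout = 120; // seconds before the worker kills the job

    public function __construct(
        private Document $document,
        private User $user
    ) {
        $this->onQueue('high');
    }

    public function handle(AIService $aiService): void
    {
        $cards = $aiService->generateCards($this->document);

        event(new CardsGenerated($cards, $this->user));
    }

    // Called once all attempts have failed.
    public function failed(Throwable $exception): void
    {
        Log::error('Card generation failed', [
            'document_id' => $this->document->id,
            'error' => $exception->getMessage(),
        ]);

        $this->user->notify(new GenerationFailedNotification());
    }
}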
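| Users | API pods | AI workers | DB size | Estimated cost/month |
| --- | --- | --- | --- | --- |
| 100 (current) | 2 | 1 | 5 GB | €50 |
| 1,000 | 4 | 2 | 20 GB | €150 |
| 10,000 | 8 | 4 | 100 GB | €500 |
| 100,000 | 16+ | 8+ | 500 GB | €2,000+ |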
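
One way to read the scaling table is as a cost per user at each tier. The sketch below only divides the table's own figures; treating the "2,000+" bound as exactly 2,000 is an assumption made for the calculation, and `costPerUser` is an illustrative name.

```php
<?php
// Hypothetical helper: estimated monthly cost divided by user count.
function costPerUser(int $users, float $monthlyCost): float
{
    return $monthlyCost / $users;
}

// Figures copied from the scaling table; "+" bounds are treated as their
// lower value, which is an assumption made for this calculation.
$tiers = [
    100     => 50,
    1000    => 150,
    10000   => 500,
    100000  => 2000,
];

foreach ($tiers as $users => $cost) {
    printf("%6d users: EUR %.2f per user per month\n", $users, costPerUser($users, $cost));
}
// EUR 0.50, 0.15, 0.05 and 0.02 respectively
```

On these estimates the cost per user falls by a factor of about 25 between the current tier and the 100,000-user tier.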
| Component | Risk | Planned solution |
| --- | --- | --- |
| Database | Connection saturation | Connection pooling (PgBouncer) |
| AI service | LLM costs | Response caching, local models |
| Storage | Rapid growth | Retention policy, compression |
| Qdrant | Insufficient memory | Sharding, clustering |
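use Illuminate\Http\Request;
use Illuminate\Http\Resources\Json\AnonymousResourceCollection;
use Illuminate\Support\Facades\Cache;

class CollectionController extends Controller
{
    // Resource::collection() returns an AnonymousResourceCollection,
    // so that is the declared return type rather than JsonResponse.
    public function index(Request $request): AnonymousResourceCollection
    {
        $userId = $request->user()->id;

        // Cache each user's collection list for 5 minutes.
        $collections = Cache::remember(
            "user:{$userId}:collections",
            now()->addMinutes(5),
            fn () => Collection::where('user_id', $userId)
                ->with('cards:id,collection_id')
                ->get()
        );

        return CollectionResource::collection($collections);
    }
}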
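
The controller above follows the cache-aside pattern: read from the cache, and on a miss compute the value and store it with a time-to-live. The sketch below models that behavior in plain PHP to show when the underlying query actually runs; `TtlCache` is a hypothetical stand-in for illustration, not Laravel's cache store, and the explicit `$now` argument exists only to make the example deterministic.

```php
<?php
// Minimal in-memory model of the cache-aside pattern; TtlCache is a
// hypothetical stand-in for illustration, not Laravel's cache store.
final class TtlCache
{
    /** @var array<string, array{value: mixed, expires: int}> */
    private array $items = [];

    public function remember(string $key, int $ttl, callable $compute, ?int $now = null): mixed
    {
        $now ??= time();
        if (isset($this->items[$key]) && $this->items[$key]['expires'] > $now) {
            return $this->items[$key]['value']; // hit: skip the computation
        }
        $value = $compute();                    // miss: compute and store
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
        return $value;
    }

    public function forget(string $key): void
    {
        unset($this->items[$key]);
    }
}

$cache = new TtlCache();
$queries = 0;
$load = function () use (&$queries) {
    $queries++; // stands in for the database query
    return ['Biology', 'History'];
};

$cache->remember('user:1:collections', 300, $load, 1000); // miss, runs $load
$cache->remember('user:1:collections', 300, $load, 1100); // hit
$cache->remember('user:1:collections', 300, $load, 1300); // expired, runs $load
$cache->forget('user:1:collections');                     // e.g. after a write
$cache->remember('user:1:collections', 300, $load, 1310); // miss again
echo $queries, "\n"; // 3
```

Without the `forget` call, a user who creates a collection could keep seeing the old list until the entry expires, so write paths usually need to invalidate the same key.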
// Before: N+1 queries
$collections = Collection::all();
foreach ($collections as $collection) {
    echo $collection->cards->count(); // One query per iteration
}

// After: withCount() fetches every count in the initial query
$collections = Collection::withCount('cards')->get();
foreach ($collections as $collection) {
    echo $collection->cards_count; // No additional query
}
-- Indexes for frequent queries
CREATE INDEX idx_cards_collection_type ON cards(collection_id, type);
CREATE INDEX idx_cards_user_created ON cards(user_id, created_at DESC);
CREATE INDEX idx_collections_user ON collections(user_id);

-- Index for full-text search
CREATE INDEX idx_cards_question_gin ON cards
    USING gin(to_tsvector('french', question));
| Tool | Usage |
| --- | --- |
| Prometheus | Metrics collection |
| Grafana | Visualization |
| Sentry | Error tracking |
| Laravel Telescope | Debugging in development |
// Metrics middleware
use Closure;
use Illuminate\Http\Request;

class MetricsMiddleware
{
    public function handle(Request $request, Closure $next)
    {
        $start = microtime(true);

        $response = $next($request);

        // Elapsed wall-clock time in seconds.
        $duration = microtime(true) - $start;

        Metrics::histogram('http_request_duration_seconds', $duration, [
            'method' => $request->method(),
            'path' => $request->path(),
            // getStatusCode() exists on every Symfony response type.
            'status' => $response->getStatusCode(),
        ]);

        return $response;
    }
}
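
A Prometheus-style histogram does not store individual durations. Each observation increments every cumulative bucket whose upper bound is at least the observed value, plus a running sum and count. The sketch below illustrates that bookkeeping in plain PHP; the `Histogram` class and the bucket bounds are assumptions for illustration and say nothing about how the project's `Metrics` helper is implemented.

```php
<?php
// Sketch of cumulative histogram buckets. Hypothetical class, not the
// project's Metrics helper and not a Prometheus client library.
final class Histogram
{
    private array $bounds;
    private array $counts;
    private float $sum = 0.0;
    private int $count = 0;

    public function __construct(array $upperBounds)
    {
        sort($upperBounds);
        $this->bounds = $upperBounds;
        $this->counts = array_fill(0, count($upperBounds), 0);
    }

    public function observe(float $seconds): void
    {
        foreach ($this->bounds as $i => $le) {
            if ($seconds <= $le) {
                $this->counts[$i]++; // buckets are cumulative
            }
        }
        $this->sum += $seconds;
        $this->count++;
    }

    public function snapshot(): array
    {
        return ['counts' => $this->counts, 'sum' => $this->sum, 'count' => $this->count];
    }
}

$h = new Histogram([0.1, 0.5, 1.0]); // bucket bounds in seconds (assumed)
foreach ([0.045, 0.18, 0.312, 1.2] as $duration) {
    $h->observe($duration);
}
print_r($h->snapshot()['counts']); // 1, 3 and 3 for the three buckets
echo $h->snapshot()['count'], "\n"; // 4
```

The 1.2 s request falls outside every finite bucket and only shows up in the total count, which is how slow outliers remain visible when percentiles are later estimated from the buckets.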
| Condition | Severity | Action |
| --- | --- | --- |
| CPU > 80% for 5 min | Warning | Slack notification |
| Latency p95 > 1 s | Critical | Email + Slack alert |
| Error rate > 5% | Critical | Immediate alert |
| Disk usage > 85% | Warning | Notification |
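
A rule such as "CPU > 80% for 5 min" fires only when the condition holds for the whole window, not on a single spike. The sketch below shows one way to express that check, assuming one sample per minute; the function name and the sample data are hypothetical, and a real setup would normally define this in the alerting tool rather than in application code.

```php
<?php
// Returns true when the last $required consecutive samples all exceed the
// threshold, e.g. 5 one-minute samples for "CPU > 80% for 5 min".
function sustainedAbove(array $samples, float $threshold, int $required): bool
{
    if ($required < 1 || count($samples) < $required) {
        return false; // not enough data to decide
    }
    foreach (array_slice($samples, -$required) as $value) {
        if ($value <= $threshold) {
            return false;
        }
    }
    return true;
}

// Hypothetical one-minute CPU samples, in percent.
var_dump(sustainedAbove([60, 85, 90, 88, 91, 86], 80, 5)); // bool(true)
var_dump(sustainedAbove([85, 90, 70, 88, 91, 86], 80, 5)); // bool(false)
```

In the second series a single dip to 70% inside the window is enough to keep the warning from firing.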

Ready for growth, built for performance.