Posted on May 18, 2026 | by rajeshkumar

Master Guide: Application Error Tracking in EKS using Datadog, DogStatsD, APM, Logs, and Error Tracking

A note on naming first: the component is DogStatsD, not DogStashD. DogStatsD is Datadog’s StatsD-compatible custom metrics service. It works well for counting application errors, but it is not a full Sentry replacement by itself. For Sentry-like error debugging, combine:

DogStatsD metrics
+ Application logs with stack traces
+ Datadog APM traces
+ Datadog Error Tracking
+ Kubernetes / EKS metadata
+ Unified service tagging

This combination is the recommended implementation pattern for an application running in containers/pods on EKS.

1. What we are trying to build

The target outcome is:

Application error happens
        ↓
Datadog captures error count, log, trace, stack trace
        ↓
Datadog links it to service, env, version
        ↓
Datadog adds Kubernetes context
        ↓
You can identify the exact pod/container/deployment/node

Final relationship should look like this:

Error Issue
  ├── service: checkout-api
  ├── env: prod
  ├── version: 1.8.4
  ├── error.type: PaymentTimeoutException
  ├── endpoint: /api/checkout
  ├── kube_namespace: prod
  ├── kube_deployment: checkout-api
  ├── pod_name: checkout-api-7c9d8f98c9-xz2lp
  ├── container_name: checkout-api
  ├── node: ip-10-0-12-25
  └── trace_id / log correlation

Datadog’s unified service tagging is built around the standard env, service, and version tags, which are used to correlate metrics, traces, logs, containers, and deployment versions. (Datadog Monitoring)

2. High-level architecture

flowchart TD
    U[User / Client Request] --> ING[Ingress / ALB / API Gateway]
    ING --> SVC[Kubernetes Service]
    SVC --> POD[Application Pod in EKS]

    POD --> APP[Application Container]

    APP -->|DogStatsD custom error metrics| DSD[Datadog Agent DogStatsD]
    APP -->|APM traces and exceptions| APM[Datadog Agent APM Receiver]
    APP -->|stdout/stderr structured logs| LOGS[Kubernetes Node Log Files]

    LOGS --> AGENT[Datadog Agent DaemonSet]
    DSD --> AGENT
    APM --> AGENT

    KUBE[Kubernetes API / Kubelet Metadata] --> AGENT
    CLUSTER[Datadog Cluster Agent] --> AGENT

    AGENT --> DD[Datadog Platform]

    DD --> METRICS[Metrics Explorer / Dashboards]
    DD --> LOGEXP[Logs Explorer]
    DD --> TRACE[APM Traces / Service Map]
    DD --> ERR[Error Tracking]
    DD --> MON[Monitors / Alerts]

    ERR --> RCA[Root Cause Analysis]
    TRACE --> RCA
    LOGEXP --> RCA
    METRICS --> RCA

3. Sentry to Datadog mapping

Sentry capability	Datadog equivalent
Error issue grouping	Datadog Error Tracking
Stack trace	APM error span or structured error log
Release/version tracking	`version` tag
Environment	`env` tag
Project/service	`service` tag
Error count	DogStatsD custom metric
Request trace	Datadog APM
Breadcrumb-style context	Logs, trace spans, custom tags
Alert on new error	Error Tracking monitor
Alert on error volume	Metric monitor or APM monitor
Find pod/container	Kubernetes tags from Datadog Agent

DogStatsD is useful for custom error counters, but Error Tracking, logs, and APM are what give you the Sentry-like debugging experience.

4. Recommended implementation model

Use four data streams together:

flowchart LR
    A[Application Error] --> B[DogStatsD Metric]
    A --> C[Structured Error Log]
    A --> D[APM Trace / Span Error]
    A --> E[Kubernetes Metadata]

    B --> F[Dashboards and Metric Alerts]
    C --> G[Log Search and Error Tracking]
    D --> H[Trace Debugging and Service Map]
    E --> I[Pod / Container / Deployment / Node Relationship]

    F --> J[Datadog Incident View]
    G --> J
    H --> J
    I --> J

Data type	Purpose	Example
DogStatsD metric	Count and alert	`app.error.count`
Error log	Stack trace and message	JSON log with `error.stack`
APM trace	Request path and dependency failure	`/checkout` → `payment-service` timeout
Kubernetes metadata	Pod/container relationship	`pod_name`, `kube_deployment`, `kube_namespace`
Error Tracking issue	Group similar errors	`PaymentTimeoutException` grouped as one issue

Datadog Error Tracking groups errors into issues and can alert on new, regressed, or high-impact errors. (Datadog Monitoring)

5. Install Datadog Agent in EKS

Datadog supports installation through Datadog Operator, Helm, or manual DaemonSet. Datadog currently recommends the Operator for Kubernetes because it reduces misconfiguration risk, but Helm is also a very common production approach. (Datadog Monitoring)

For EKS, the standard model is:

Datadog Agent = DaemonSet
Datadog Cluster Agent = Deployment
Application Pod sends logs/traces/metrics to local node Agent

5.1 Create namespace and secret

kubectl create namespace datadog

kubectl -n datadog create secret generic datadog-secret \
  --from-literal api-key="$DD_API_KEY"

5.2 Example `datadog-values.yaml`

This is a practical production-style baseline for EKS application error tracking:

targetSystem: linux

datadog:
  apiKeyExistingSecret: datadog-secret

  # Example: datadoghq.com, datadoghq.eu, us3.datadoghq.com, us5.datadoghq.com, ap1.datadoghq.com
  site: datadoghq.com

  clusterName: eks-prod-apne1-01

  kubeStateMetricsCore:
    enabled: true

  collectEvents: true

  logs:
    enabled: true
    containerCollectAll: true

  apm:
    socketEnabled: true
    portEnabled: false

  dogstatsd:
    originDetection: true
    useSocketVolume: true
    socketPath: /var/run/datadog/dsd.socket
    tagCardinality: orchestrator

  tags:
    - cloud:aws
    - platform:eks
    - owner:devops

clusterAgent:
  enabled: true
  admissionController:
    enabled: true
    mutateUnlabelled: false

agents:
  containers:
    agent:
      resources:
        requests:
          cpu: 200m
          memory: 256Mi
        limits:
          memory: 512Mi

Important notes:

logs.enabled and containerCollectAll allow the Agent to collect container logs. Datadog’s Kubernetes log collection docs show enabling features.logCollection.enabled and containerCollectAll with the Operator; the Helm values above express the same intent for Helm-based installs. (Datadog Monitoring)

dogstatsd.originDetection helps the Agent identify which container/pod emitted DogStatsD metrics. Datadog documents that DogStatsD origin detection can tag metrics with the same pod tags as Autodiscovery metrics, but the Agent-side origin detection is not enabled by default unless configured. (Datadog Monitoring)

apm.socketEnabled and dogstatsd.useSocketVolume use Unix Domain Socket communication. For Kubernetes APM, Datadog supports UDS, host IP, or Kubernetes service communication, and recommends UDS for trace submission. (Datadog Monitoring)

5.3 Install or upgrade Agent

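# Register the Datadog chart repository once per workstation or CI runner
helm repo add datadog https://helm.datadoghq.com
helm repo update

helm upgrade --install datadog-agent datadog/datadog \
  -n datadog \
  -f datadog-values.yaml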

Verify:

kubectl -n datadog get pods
kubectl -n datadog get ds
kubectl -n datadog get deploy

Expected resources:

datadog-agent DaemonSet
datadog-agent-cluster-agent Deployment (the name is prefixed with the Helm release name)

6. Add unified service tags to your application

This is the most important part for relationship-building.

Every application Deployment should have:

env
service
version

Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: prod
  labels:
    tags.datadoghq.com/env: "prod"
    tags.datadoghq.com/service: "checkout-api"
    tags.datadoghq.com/version: "1.8.4"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
        tags.datadoghq.com/env: "prod"
        tags.datadoghq.com/service: "checkout-api"
        tags.datadoghq.com/version: "1.8.4"
        # The Admission Controller selects pods by this label, not by an annotation
        admission.datadoghq.com/enabled: "true"
      annotations:
        ad.datadoghq.com/checkout-api.logs: '[{"source":"java","service":"checkout-api"}]'
    spec:
      containers:
        - name: checkout-api
          image: myrepo/checkout-api:1.8.4
          env:
            - name: DD_ENV
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/env']

            - name: DD_SERVICE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/service']

            - name: DD_VERSION
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/version']

            - name: DD_ENTITY_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid

            - name: DD_TRACE_AGENT_URL
              value: "unix:///var/run/datadog/apm.socket"

            - name: DOGSTATSD_SOCKET
              value: "/var/run/datadog/dsd.socket"

          volumeMounts:
            - name: datadog-socket
              mountPath: /var/run/datadog
              readOnly: true

      volumes:
        - name: datadog-socket
          hostPath:
            path: /var/run/datadog

Datadog’s Kubernetes unified service tagging documentation recommends applying tags.datadoghq.com/env, tags.datadoghq.com/service, and tags.datadoghq.com/version labels at the Deployment and pod template levels, and exposing them to the container as DD_ENV, DD_SERVICE, and DD_VERSION. (Datadog Monitoring)

7. Understand the exact error-tracking flow

sequenceDiagram
    participant User
    participant App as App Container
    participant DogStatsD as DogStatsD Socket
    participant Logs as Container Logs
    participant APM as APM Tracer
    participant Agent as Datadog Agent
    participant DD as Datadog
    participant ET as Error Tracking

    User->>App: API request
    App->>App: Exception occurs

    App->>DogStatsD: increment app.error.count
    DogStatsD->>Agent: custom metric with tags

    App->>Logs: write structured ERROR log with stack trace
    Logs->>Agent: collect stdout/stderr logs

    App->>APM: mark span as error
    APM->>Agent: send trace/span data

    Agent->>Agent: attach Kubernetes metadata
    Agent->>DD: send metrics, logs, traces

    DD->>ET: group similar errors into issues
    ET->>DD: issue with service/env/version/pod/container context

8. Implement DogStatsD error metrics

DogStatsD should be used for counting and alerting, not for full stack traces.
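
For intuition about why DogStatsD suits counters and not stack traces, it helps to see what a counter increment looks like on the wire. The sketch below formats a datagram by hand in the `name:value|type|#tags` layout used by DogStatsD. The function name `format_counter` is made up for this illustration; in practice a client library builds and sends the datagram for you.

```python
from typing import List


def format_counter(name: str, value: int, tags: List[str]) -> str:
    """Build a DogStatsD counter datagram: <name>:<value>|c|#<tag>,<tag>."""
    payload = f"{name}:{value}|c"  # "c" marks the metric type as a counter
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


datagram = format_counter(
    "app.error.count",
    1,
    ["service:checkout-api", "error_type:PaymentTimeoutException"],
)
```

The whole message is one short line of text, which is why it carries a name, a number, and a few tags, and nothing as large as a stack trace.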

Recommended metric:

app.error.count

Recommended tags:

env
service
version
error_type
operation
endpoint
http_status
handled

Avoid high-cardinality tags:

request_id
user_id
session_id
order_id
full_url
full_error_message
stack_trace
pod_name unless intentionally needed

Bad metric design:

app.error.count{user_id:12345,request_id:abc,error_message:payment failed for order 998877}

Good metric design:

app.error.count{
  env:prod,
  service:checkout-api,
  version:1.8.4,
  error_type:PaymentTimeoutException,
  operation:checkout,
  endpoint:/api/checkout,
  http_status:500,
  handled:false
}
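
One way to keep high-cardinality values out of metrics is to build tags through an allowlist, so that anything not explicitly approved is dropped. The following is a minimal sketch under that assumption; `ALLOWED_TAG_KEYS` and `build_error_tags` are hypothetical names, and the key list simply mirrors the recommended tags above.

```python
from typing import Dict, List

# Low-cardinality tag keys recommended in this section.
ALLOWED_TAG_KEYS = (
    "env", "service", "version", "error_type",
    "operation", "endpoint", "http_status", "handled",
)


def build_error_tags(fields: Dict[str, object]) -> List[str]:
    """Return "key:value" tags for allowlisted keys only, in a stable order."""
    tags = []
    for key in ALLOWED_TAG_KEYS:
        if key in fields:
            value = fields[key]
            # Render booleans in lowercase so tags read handled:false, not handled:False.
            if isinstance(value, bool):
                value = str(value).lower()
            tags.append(f"{key}:{value}")
    return tags


# user_id is silently dropped because it is not on the allowlist.
example_tags = build_error_tags(
    {"service": "checkout-api", "user_id": 12345, "http_status": 500, "handled": False}
)
```

Routing every `increment` call through a helper like this makes the bad design shown earlier hard to produce by accident.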

8.1 Generic application pattern

try:
    process_request()
except Exception as err:
    dogstatsd.increment(
        "app.error.count",
        tags=[
            "error_type:" + err.class_name,
            "operation:checkout",
            "endpoint:/api/checkout",
            "http_status:500",
            "handled:false"
        ]
    )

    logger.error(
        "Checkout failed",
        error=err,
        stack_trace=true,
        fields={
            "error.kind": err.class_name,
            "error.message": err.message,
            "error.stack": err.stack,
            "operation": "checkout",
            "endpoint": "/api/checkout",
            "http.status_code": 500
        }
    )

    raise

8.2 Python example

import os
import traceback
import logging
from datadog import DogStatsd

logger = logging.getLogger(__name__)

statsd = DogStatsd(
    socket_path=os.getenv("DOGSTATSD_SOCKET", "/var/run/datadog/dsd.socket")
)

def checkout(request):
    try:
        # business logic here
        process_payment(request)

    except Exception as exc:
        error_type = exc.__class__.__name__
        stack = traceback.format_exc()

        statsd.increment(
            "app.error.count",
            tags=[
                f"error_type:{error_type}",
                "operation:checkout",
                "endpoint:/api/checkout",
                "http_status:500",
                "handled:false",
            ],
        )

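        # The `extra` fields reach the output only if the handler's formatter
        # emits them (for example, a JSON formatter).
        logger.error(
            "Checkout failed",
            extra={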
                "status": "error",
                "error.kind": error_type,
                "error.message": str(exc),
                "error.stack": stack,
                "operation": "checkout",
                "endpoint": "/api/checkout",
                "http.status_code": 500,
            },
        )

        raise
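
The standard `logging` module does not print `extra` fields unless a formatter writes them out, so the structured attributes in the example above need a JSON formatter to reach stdout/stderr. Below is a minimal standard-library sketch; `JsonFormatter` and `configure_json_logging` are illustrative names, and a maintained JSON logging library would serve the same purpose.

```python
import json
import logging
from datetime import datetime, timezone

# Attribute names present on every LogRecord; anything else came in via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object, including `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "status": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Copy caller-supplied fields such as error.kind and error.stack.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def configure_json_logging() -> None:
    """Attach the formatter to a stream handler on the root logger."""
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(JsonFormatter())
    logging.getLogger().addHandler(handler)
```

Calling `configure_json_logging()` once at startup is enough; every later `logger.error(..., extra={...})` call is then emitted as one JSON line that the Agent can parse.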

8.3 Node.js example

const StatsD = require("hot-shots");
const logger = require("./logger");

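const dogstatsd = new StatsD({
  // hot-shots only honors `path` when the protocol is "uds";
  // UDS support relies on its optional unix-dgram dependency.
  protocol: "uds",
  path: process.env.DOGSTATSD_SOCKET || "/var/run/datadog/dsd.socket"
});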

async function checkout(req, res) {
  try {
    await processPayment(req.body);
    res.status(200).send({ status: "ok" });
  } catch (err) {
    dogstatsd.increment("app.error.count", 1, [
      `error_type:${err.name}`,
      "operation:checkout",
      "endpoint:/api/checkout",
      "http_status:500",
      "handled:false"
    ]);

    logger.error({
      status: "error",
      message: "Checkout failed",
      "error.kind": err.name,
      "error.message": err.message,
      "error.stack": err.stack,
      operation: "checkout",
      endpoint: "/api/checkout",
      "http.status_code": 500
    });

    throw err;
  }
}

9. Implement structured logs for Error Tracking

This is where you get the Sentry-like stack trace.

For Datadog Error Tracking from backend logs, the log should include:

status = ERROR / CRITICAL / ALERT / EMERGENCY
service
error.kind or error.stack

Datadog documents that backend error logs need either error.kind or a valid error.stack, a service attribute, and an error-level status. For better grouping, include error.message and error.stack. (Datadog Monitoring)

Recommended JSON log:

{
  "timestamp": "2026-05-18T10:00:00.000Z",
  "status": "error",
  "service": "checkout-api",
  "env": "prod",
  "version": "1.8.4",
  "message": "Checkout failed",
  "error.kind": "PaymentTimeoutException",
  "error.message": "Payment provider timed out",
  "error.stack": "PaymentTimeoutException: Payment provider timed out\n    at CheckoutService.pay...",
  "operation": "checkout",
  "endpoint": "/api/checkout",
  "http.status_code": 500
}

Recommended log rule:

Application logs should go to stdout/stderr.
Datadog Agent should collect container logs from the node.
Logs should be JSON if possible.
Each error log should contain service, env, version, error.kind, error.message, error.stack.
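
The minimum fields described in this section can be checked in a unit test before a service ships. The sketch below encodes only the rule as stated in this guide (error-level status, a service, and either `error.kind` or `error.stack`); `error_tracking_gaps` is a hypothetical helper, and Datadog's own pipeline remains the authority on what it accepts.

```python
from typing import Dict, List

# Status values this guide lists as error-level.
ERROR_STATUSES = {"error", "critical", "alert", "emergency"}


def error_tracking_gaps(log: Dict[str, object]) -> List[str]:
    """List the reasons a log entry misses the minimum fields described above."""
    gaps = []
    if str(log.get("status", "")).lower() not in ERROR_STATUSES:
        gaps.append("status is not error-level")
    if not log.get("service"):
        gaps.append("service is missing")
    if not (log.get("error.kind") or log.get("error.stack")):
        gaps.append("neither error.kind nor error.stack is set")
    return gaps
```

An empty list means the entry carries the minimum fields; anything else names what to add.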

For Kubernetes, Datadog recommends Agent-based log collection and can collect logs from Kubernetes log files. File-based collection is preferred over Docker socket-based collection for performance and reliability in containerized environments. (Datadog Monitoring)

10. Implement APM for request-level debugging

APM is what lets you answer:

Which API failed?
Which downstream service failed?
Was it database, cache, third-party API, timeout, or code exception?
Which trace/log belongs to this error?

Flow:

flowchart TD
    REQ[Incoming Request /api/checkout] --> SPAN1[checkout-api span]
    SPAN1 --> SPAN2[payment-service HTTP call]
    SPAN1 --> SPAN3[database query]
    SPAN2 --> ERR[Timeout Exception]
    ERR --> TRACE[Trace marked as error]
    TRACE --> ET[Error Tracking Issue]
    TRACE --> LOG[Connected Logs]
    TRACE --> POD[Pod and Container Metadata]

Recommended APM environment variables:

env:
  - name: DD_ENV
    value: "prod"

  - name: DD_SERVICE
    value: "checkout-api"

  - name: DD_VERSION
    value: "1.8.4"

  - name: DD_TRACE_AGENT_URL
    value: "unix:///var/run/datadog/apm.socket"

  - name: DD_LOGS_INJECTION
    value: "true"

  - name: DD_RUNTIME_METRICS_ENABLED
    value: "true"

Datadog APM on Kubernetes supports UDS, host IP, or Kubernetes service routing for traces. In containerized environments, sending traces to localhost is usually wrong because the Agent is in another container/pod; for Kubernetes, use UDS, node host IP, Admission Controller injection, or a Kubernetes service pattern. (Datadog Monitoring)

11. How Error Tracking groups errors

Datadog Error Tracking groups similar errors into issues based on properties such as:

service
error.type / error.kind
error.message
error.stack
top meaningful stack frame

So two errors may become separate issues if they happen in different services or have different error types/stack-frame locations. (Datadog Monitoring)

Example:

checkout-api + PaymentTimeoutException + CheckoutService.pay()
= One Error Tracking issue

payment-service + PaymentTimeoutException + PaymentClient.call()
= Different Error Tracking issue

This is why service, error.kind, and error.stack matter so much.
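
To make the example concrete, here is a toy grouping keyed on service, error kind, and top stack frame. This is not Datadog's fingerprinting algorithm, which is not reproduced here; it only illustrates why the same exception type raised in two services ends up as two issues. The `top_frame` field and `issue_key` function are assumptions made for the illustration.

```python
from collections import Counter
from typing import Dict, Tuple


def issue_key(error: Dict[str, str]) -> Tuple[str, str, str]:
    """Toy grouping key: (service, error kind, top stack frame)."""
    return (error["service"], error["error.kind"], error["top_frame"])


errors = [
    {"service": "checkout-api", "error.kind": "PaymentTimeoutException",
     "top_frame": "CheckoutService.pay"},
    {"service": "checkout-api", "error.kind": "PaymentTimeoutException",
     "top_frame": "CheckoutService.pay"},
    {"service": "payment-service", "error.kind": "PaymentTimeoutException",
     "top_frame": "PaymentClient.call"},
]

# Count occurrences per key; each distinct key stands in for one issue.
issues = Counter(issue_key(e) for e in errors)
```

Three error events collapse into two keys here, matching the two issues in the example above.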

12. Recommended tag strategy

Mandatory tags

Tag	Example	Purpose
`env`	`prod`	Separate prod/stage/dev
`service`	`checkout-api`	Service-level ownership
`version`	`1.8.4`	Release/deployment tracking

Strongly recommended tags

Tag	Example	Purpose
`team`	`payments`	Ownership
`product`	`motoshare`	Product/application grouping
`component`	`api`	API/worker/consumer grouping
`operation`	`checkout`	Business flow
`endpoint`	`/api/checkout`	API route
`error_type`	`PaymentTimeoutException`	Error classification
`handled`	`true/false`	Handled vs unhandled error
`cloud`	`aws`	Cloud provider
`platform`	`eks`	Runtime platform

Kubernetes tags Datadog can add

kube_cluster_name
kube_namespace
kube_deployment
kube_replica_set
pod_name
container_name
image_name
image_tag
node
availability_zone

For DogStatsD metrics, be careful with tag cardinality. Datadog notes that for UDP DogStatsD, pod_name is not added by default to avoid creating too many custom metrics, and tag cardinality can be controlled globally or per metric. (Datadog Monitoring)

My recommendation:

Use service/version-level DogStatsD metrics for alerting.
Use logs/APM/Error Tracking for exact pod/container investigation.
Use pod-level metric tagging only when you really need it.

13. Complete application telemetry flow

flowchart TD
    A[Exception in Application] --> B{Telemetry Type}

    B --> C[DogStatsD Counter]
    C --> C1[app.error.count]
    C1 --> C2[Alert: Error spike by service/version]

    B --> D[Structured Error Log]
    D --> D1[error.kind]
    D --> D2[error.message]
    D --> D3[error.stack]
    D3 --> D4[Error Tracking Issue]

    B --> E[APM Trace]
    E --> E1[Trace marked error]
    E1 --> E2[Request path]
    E2 --> E3[Downstream dependency failure]

    B --> F[Kubernetes Metadata]
    F --> F1[pod_name]
    F --> F2[container_name]
    F --> F3[kube_deployment]
    F --> F4[node]

    C2 --> G[Datadog Incident / Monitor]
    D4 --> G
    E3 --> G
    F4 --> G

14. Build dashboards

14.1 Error count by service

sum:app.error.count{env:prod} by {service}.as_count()

14.2 Error count by version

sum:app.error.count{env:prod,service:checkout-api} by {version}.as_count()

Use this to answer:

Did the new release increase errors?

14.3 Error count by operation

sum:app.error.count{env:prod,service:checkout-api} by {operation}.as_count()

Use this to answer:

Which business flow is failing?

14.4 Error count by error type

sum:app.error.count{env:prod,service:checkout-api} by {error_type}.as_count()

Use this to answer:

Which exception is most common?

14.5 Error count by Kubernetes deployment

sum:app.error.count{env:prod} by {kube_namespace,kube_deployment}.as_count()

Use this to answer:

Which deployment is producing the errors?

14.6 Pod-level view

Only use this if your DogStatsD metric cardinality/tagging supports it:

sum:app.error.count{env:prod,service:checkout-api} by {pod_name}.as_count()

For exact pod-level investigation, I would rely more on logs/APM/Error Tracking because pod-level metrics can create high cardinality and cost/noise.

15. Build monitors and alerts

15.1 Metric monitor: service error spike

sum(last_5m):sum:app.error.count{env:prod,service:checkout-api}.as_count() > 50

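Alert message (the {{service.name}}, {{env.name}}, and {{version.name}} variables are filled in only when the monitor query groups by those tags, for example by {service,env,version}):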

High application error count detected.

Service: {{service.name}}
Environment: {{env.name}}
Version: {{version.name}}

Check:
- Error Tracking issue
- APM trace
- Logs for error.stack
- Kubernetes pod/container details
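
If monitors are managed as code, the same definition can be generated per service. The sketch below only builds the dictionary; it assumes the usual monitor fields (`name`, `type`, `query`, `message`, `tags`, `options.thresholds`) and leaves the actual API call or Terraform wiring out. `error_spike_monitor` is a hypothetical helper name.

```python
from typing import Dict


def error_spike_monitor(service: str, env: str, threshold: int,
                        window: str = "last_5m") -> Dict[str, object]:
    """Build a metric-monitor definition for an app.error.count spike."""
    query = (
        f"sum({window}):sum:app.error.count"
        f"{{env:{env},service:{service}}}.as_count() > {threshold}"
    )
    return {
        "name": f"[{env}] High application error count on {service}",
        "type": "query alert",
        "query": query,
        "message": "High application error count detected. "
                   "Check Error Tracking, APM traces, and logs for error.stack.",
        "tags": [f"service:{service}", f"env:{env}"],
        # Keep the critical threshold in sync with the number in the query.
        "options": {"thresholds": {"critical": threshold}},
    }


monitor = error_spike_monitor("checkout-api", "prod", 50)
```

Generating the query from the service name keeps thresholds and naming consistent across services.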

15.2 Metric monitor: new version error spike

sum(last_10m):sum:app.error.count{env:prod,service:checkout-api} by {version}.as_count() > 100

Use this after deployments.

15.3 Error Tracking monitor: new issue

Use this for Sentry-like behavior:

Alert when a new backend issue appears for service:checkout-api env:prod

Datadog Error Tracking monitors support alerting on new issues, regressions, and high-impact errors. (Datadog Monitoring)

15.4 APM monitor: error rate

Example logic:

Error rate for checkout-api > 5% during last 5 minutes

Use this for service reliability monitoring.

16. Recommended alerting strategy

Do not create only one giant alert.

Use layered alerting:

flowchart TD
    A[Application Errors] --> B[Metric Alert]
    A --> C[Error Tracking New Issue Alert]
    A --> D[APM Error Rate Alert]
    A --> E[Kubernetes Pod Restart Alert]

    B --> F[High volume problem]
    C --> G[New code issue]
    D --> H[Request failure problem]
    E --> I[Runtime/container problem]

    F --> J[Incident]
    G --> J
    H --> J
    I --> J

Alert type	Detects	Best for
DogStatsD metric alert	Error volume spike	Fast service-level alert
Error Tracking alert	New/regressed grouped error	Sentry-like issue detection
APM error rate alert	Request failure percentage	API/SLO reliability
Log alert	Specific log pattern	Known failure modes
Kubernetes alert	CrashLoopBackOff/restarts	Pod/container health

17. Best practice: use DogStatsD for counters, not stack traces

DogStatsD should answer:

How many errors happened?
Which service/version/operation is failing?
Did errors increase after deployment?

DogStatsD should not answer:

What is the stack trace?
Which line of code failed?
What was the exception body?
What user/request caused this?

Those belong in:

APM
Logs
Error Tracking
Trace/log correlation

18. Best practice: standardize error classification

Create a small taxonomy across all services.

Example:

validation_error
dependency_timeout
database_error
authentication_error
authorization_error
business_rule_error
unexpected_exception

Then tag DogStatsD metrics like this:

error_category:dependency_timeout
error_type:PaymentTimeoutException
operation:checkout

This gives clean dashboards:

Errors by category
Errors by operation
Errors by service
Errors by version
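
A shared helper is one way to keep the taxonomy consistent across services. The sketch below maps exception classes to categories and falls back to `unexpected_exception`; the specific mapping of built-in Python exceptions is an assumption for illustration, and a real service would map its own exception classes.

```python
from typing import List, Tuple, Type

# Hypothetical mapping from exception classes to the taxonomy above.
CATEGORY_BY_EXCEPTION: Tuple[Tuple[Type[BaseException], str], ...] = (
    (TimeoutError, "dependency_timeout"),
    (PermissionError, "authorization_error"),
    (ValueError, "validation_error"),
)


def classify_error(exc: BaseException) -> str:
    """Return the first matching category, or the catch-all category."""
    for exc_type, category in CATEGORY_BY_EXCEPTION:
        if isinstance(exc, exc_type):
            return category
    return "unexpected_exception"


def taxonomy_tags(exc: BaseException, operation: str) -> List[str]:
    """Build the three classification tags shown above for a DogStatsD metric."""
    return [
        f"error_category:{classify_error(exc)}",
        f"error_type:{type(exc).__name__}",
        f"operation:{operation}",
    ]
```

Because the category list is small and fixed, `error_category` stays low-cardinality even when `error_type` grows.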

19. Best practice: release/version tracking

Every deployment should set a unique version.

Good:

version: 1.8.4
version: git-sha-a8f91cd
version: 2026.05.18.1

Bad:

version: latest
version: prod
version: main

Datadog expects version to change with each application deployment so deployment impact can be identified cleanly. (Datadog Monitoring)
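
A deployment pipeline can reject the "bad" values before they reach the cluster. This is a minimal sketch of such a guard; the set of rejected names comes from the list above, and `is_release_version` is a hypothetical function name.

```python
# Values that name a moving target rather than a specific release.
MUTABLE_VERSIONS = {"latest", "prod", "main"}


def is_release_version(version: str) -> bool:
    """Return True when the value is non-empty and not a known moving label."""
    cleaned = version.strip().lower()
    return bool(cleaned) and cleaned not in MUTABLE_VERSIONS
```

A CI step could call this with the value destined for `DD_VERSION` and fail the build when it returns `False`.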

20. Best practice: log format

Use JSON logs.

Recommended fields:

{
  "status": "error",
  "service": "checkout-api",
  "env": "prod",
  "version": "1.8.4",
  "message": "Checkout failed",
  "error.kind": "PaymentTimeoutException",
  "error.message": "Payment provider timed out",
  "error.stack": "...",
  "operation": "checkout",
  "endpoint": "/api/checkout",
  "http.method": "POST",
  "http.status_code": 500,
  "customer_impact": true
}

Avoid logging sensitive data:

password
token
credit card
personal identity data
full request payloads
authorization headers
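
Sensitive values are easiest to keep out of logs when every payload passes through a scrubber before it is logged. The sketch below masks values by key name; the key set and the `redact` function are assumptions for illustration, and key matching alone will not catch secrets embedded in free-text messages.

```python
from typing import Any, Dict

# Hypothetical key names to mask; extend to match your own payloads.
SENSITIVE_KEYS = {"password", "token", "authorization", "credit_card", "card_number"}


def redact(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with sensitive values masked, recursing into nested dicts."""
    clean: Dict[str, Any] = {}
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean
```

The function returns a new dictionary, so the original request data is left untouched for the business logic.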

21. Best practice: deployment annotation for logs

For each application container, add Datadog log annotation:

annotations:
  ad.datadoghq.com/checkout-api.logs: >
    [{
      "source": "java",
      "service": "checkout-api",
      "tags": ["team:payments","component:api"]
    }]

Use the right source value:

App language/runtime	`source`
Java	`java`
Node.js	`nodejs`
Python	`python`
Go	`go`
.NET	`csharp` or configured .NET source
Ruby	`ruby`

The source tag matters because Datadog’s Error Tracking for logs uses language-specific handling, and Datadog recommends ensuring the source tag is properly configured. (Datadog Monitoring)

22. Pod/container relationship design

The relationship is built from three places:

flowchart TD
    A[Application Deployment Labels] --> D[env/service/version]
    B[Datadog Agent Kubernetes Metadata] --> E[pod/container/deployment/node]
    C[Application Logs/APM/DogStatsD] --> F[error/trace/metric]

    D --> G[Unified Datadog View]
    E --> G
    F --> G

    G --> H[Which service failed?]
    G --> I[Which version failed?]
    G --> J[Which pod/container failed?]
    G --> K[Which node hosted it?]

To make this work:

1. Datadog Agent must run in the cluster.
2. App pods must have unified service tags.
3. Logs/APM/DogStatsD must use the same service/env/version.
4. Error logs must include error.kind/error.stack.
5. APM tracer should inject trace/log correlation where supported.
6. DogStatsD origin detection should be enabled.

23. EKS-specific implementation notes

Standard EKS with EC2 worker nodes

Recommended:

Datadog Agent as DaemonSet
Use UDS for APM
Use UDS for DogStatsD
Collect container logs from nodes
Use Cluster Agent
Use Admission Controller where possible

EKS Fargate

Be careful. EKS Fargate does not behave like normal EC2 worker nodes because you do not manage the underlying node the same way. Datadog’s DogStatsD origin detection docs specifically mention shareProcessNamespace:true to assist the Agent for origin detection on EKS Fargate. (Datadog Monitoring)

If you are using Fargate, validate the Datadog deployment pattern separately.

24. End-to-end sample implementation

24.1 Datadog Agent values

targetSystem: linux

datadog:
  apiKeyExistingSecret: datadog-secret
  site: datadoghq.com
  clusterName: eks-prod-apne1-01

  logs:
    enabled: true
    containerCollectAll: true

  apm:
    socketEnabled: true
    portEnabled: false

  dogstatsd:
    originDetection: true
    useSocketVolume: true
    socketPath: /var/run/datadog/dsd.socket
    tagCardinality: orchestrator

  kubeStateMetricsCore:
    enabled: true

  collectEvents: true

clusterAgent:
  enabled: true
  admissionController:
    enabled: true
    mutateUnlabelled: false

24.2 App deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: prod
  labels:
    tags.datadoghq.com/env: "prod"
    tags.datadoghq.com/service: "checkout-api"
    tags.datadoghq.com/version: "1.8.4"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
        tags.datadoghq.com/env: "prod"
        tags.datadoghq.com/service: "checkout-api"
        tags.datadoghq.com/version: "1.8.4"
        # The Admission Controller selects pods by this label, not by an annotation
        admission.datadoghq.com/enabled: "true"
      annotations:
        ad.datadoghq.com/checkout-api.logs: >
          [{
            "source": "java",
            "service": "checkout-api",
            "tags": ["team:payments","component:api"]
          }]
    spec:
      containers:
        - name: checkout-api
          image: myrepo/checkout-api:1.8.4
          env:
            - name: DD_ENV
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/env']

            - name: DD_SERVICE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/service']

            - name: DD_VERSION
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/version']

            - name: DD_ENTITY_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.uid

            - name: DD_TRACE_AGENT_URL
              value: "unix:///var/run/datadog/apm.socket"

            - name: DD_LOGS_INJECTION
              value: "true"

            - name: DD_RUNTIME_METRICS_ENABLED
              value: "true"

            - name: DOGSTATSD_SOCKET
              value: "/var/run/datadog/dsd.socket"

          volumeMounts:
            - name: datadog-socket
              mountPath: /var/run/datadog
              readOnly: true

      volumes:
        - name: datadog-socket
          hostPath:
            path: /var/run/datadog

25. Validation checklist

25.1 Validate Datadog Agent

kubectl -n datadog get pods
kubectl -n datadog get ds
kubectl -n datadog get deploy

25.2 Check Agent status

kubectl -n datadog exec -it <datadog-agent-pod-name> -c agent -- agent status

Look for:

APM Agent: Running
DogStatsD: Running
Logs Agent: Running

Datadog’s APM troubleshooting guide says the Agent status output should show the APM Agent as running; otherwise traces cannot be submitted properly. (Datadog Monitoring)

25.3 Validate app tags

Check pod labels:

kubectl -n prod get pod <pod-name> --show-labels

Expected:

tags.datadoghq.com/env=prod
tags.datadoghq.com/service=checkout-api
tags.datadoghq.com/version=1.8.4
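
The same check can be automated against the JSON form of a pod (`kubectl -n prod get pod <pod-name> -o json`). The sketch below takes the parsed JSON as a dictionary and reports which unified service labels are absent; `missing_unified_labels` is a hypothetical helper name.

```python
from typing import Any, Dict, List

REQUIRED_LABELS = (
    "tags.datadoghq.com/env",
    "tags.datadoghq.com/service",
    "tags.datadoghq.com/version",
)


def missing_unified_labels(pod: Dict[str, Any]) -> List[str]:
    """Return the unified service tagging labels that are absent or empty."""
    labels = (pod.get("metadata") or {}).get("labels") or {}
    return [name for name in REQUIRED_LABELS if not labels.get(name)]
```

Feeding it `json.load(sys.stdin)` from the `kubectl` command above turns the manual label check into a script that can run after each deployment.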

25.4 Validate logs

Generate a test exception, then search logs by:

service:checkout-api env:prod status:error

Expected fields:

error.kind
error.message
error.stack
kube_namespace
pod_name
container_name

25.5 Validate DogStatsD metric

Search metric:

app.error.count

Group by:

service
version
error_type
operation

25.6 Validate APM

Search service:

service:checkout-api env:prod

Expected:

Traces visible
Error traces visible
Service map visible
Trace/log correlation working

25.7 Validate Error Tracking

Search backend issues for:

service:checkout-api env:prod

Expected:

Grouped error issue
Stack trace visible
Occurrences visible
Related logs/traces visible

26. Common problems and fixes

Problem	Likely cause	Fix
Error metric appears but no pod/container	DogStatsD origin detection/cardinality issue	Enable origin detection; use UDS; review tag cardinality
Error Tracking issue not created	Logs missing `error.kind` or `error.stack`	Add structured error fields
Logs visible but service name wrong	Missing log annotation or unified tags	Add `service` in log config and `DD_SERVICE`
APM traces missing	App cannot reach Agent	Use UDS or correct `DD_AGENT_HOST`; check Agent status
Trace/log correlation missing	Log injection not enabled	Enable tracer log injection
Too many custom metrics	High-cardinality metric tags	Remove `request_id`, `user_id`, `pod_name` from metrics
New release not visible	Static or missing `version`	Set unique `DD_VERSION` per deployment
Pod error not visible in metric	Pod tag not included for cardinality reasons	Use logs/APM for pod-level RCA or adjust cardinality carefully
Logs not collected	Agent log collection disabled	Enable container log collection

27. Best implementation pattern for your migration

Do not migrate like this:

Sentry → DogStatsD only

That will give weak debugging.

Migrate like this:

Sentry
  → Datadog Error Tracking
  → Datadog APM
  → Datadog Logs
  → DogStatsD custom error metrics
  → Kubernetes metadata correlation

Recommended production pattern:

flowchart TD
    A[Sentry Replacement Requirement] --> B[Error Tracking]
    A --> C[APM]
    A --> D[Logs]
    A --> E[DogStatsD Metrics]

    B --> F[Grouped Issues]
    C --> G[Trace and Dependency RCA]
    D --> H[Stack Trace and Context]
    E --> I[Fast Error Count Alerts]

    F --> J[Service / Env / Version]
    G --> J
    H --> J
    I --> J

    J --> K[Kubernetes Pod / Container / Deployment / Node]

28. Final recommended standard

For every service running in EKS, implement this standard:

1. Add Datadog unified service labels:
   - tags.datadoghq.com/env
   - tags.datadoghq.com/service
   - tags.datadoghq.com/version

2. Add application env vars:
   - DD_ENV
   - DD_SERVICE
   - DD_VERSION
   - DD_TRACE_AGENT_URL
   - DD_LOGS_INJECTION
   - DD_ENTITY_ID

3. Enable Datadog Agent features:
   - logs
   - APM
   - DogStatsD
   - DogStatsD origin detection
   - Kubernetes metadata
   - Cluster Agent

4. Application must emit:
   - DogStatsD metric: app.error.count
   - Structured error log with error.kind/error.message/error.stack
   - APM trace/span errors

5. Dashboards should show:
   - errors by service
   - errors by version
   - errors by operation
   - errors by error_type
   - errors by namespace/deployment
   - related pods/containers through logs/APM

6. Alerts should include:
   - new Error Tracking issue
   - high error count
   - high APM error rate
   - pod restart/crashloop alerts

29. Final conclusion

The best Datadog design for application error tracking in EKS is:

DogStatsD for custom error counters
Logs for stack traces
APM for request/dependency tracing
Error Tracking for Sentry-like issue grouping
Unified service tagging for service/env/version relationship
Kubernetes metadata for pod/container/node relationship

In short:

DogStatsD tells you how many errors happened.
Logs tell you what exception happened.
APM tells you where in the request path it failed.
Error Tracking groups the issue.
Kubernetes metadata tells you which pod/container/deployment/node caused it.

That combination gives you a clean, production-grade replacement for Sentry while also giving stronger EKS infrastructure correlation than Sentry alone.
