Removing Personal Information (PII) from Sentry Error Monitoring in JavaScript

Removing sensitive data from Sentry exceptions & staying GDPR-compliant in a Next.js application.


@AKQA is committed to being GDPR-compliant. While working on a client's Next.js project, I investigated Sentry to find the best approach to remove all personal information from the monitoring logs.

1.0 - Sentry

Sentry is an American application performance monitoring (APM) tool, offered as a hosted service and an alternative to Datadog, New Relic or AppSignal. Events are ingested through a global CDN, but all events and logs are eventually stored on servers located in the United States, which makes the hosted version of Sentry a poor choice GDPR-wise. Sentry also allows for self-hosting the solution, making you the data controller.

Sentry has a pre-built, plug-and-play library for Next.js that hooks into the server and client runtimes. This library logs server and client exceptions as well as client performance.

You can add the library to a Next.js project by using their installation wizard. A guide can be found here.

The wizard adds the following files to your project:

  • /sentry.client.config.js - Configuration of how client exceptions should be handled.

  • /sentry.server.config.js - Configuration of how server exceptions should be handled.

  • /sentry.properties - Configuration values used by sentry-cli, which is installed by the wizard.

  • /.sentryclirc - Auth token for sentry-cli and for managing releases & source maps. You can use the SENTRY_AUTH_TOKEN environment variable as an alternative.

  • /pages/sentry_sample_error.js - Sample client page that can be used to test the installation/Sentry configs.

When running the project locally with npm run dev, Sentry does not upload source maps, since that would trigger an upload on every file change. You'll therefore get a "Source code was not found (see Troubleshooting for JavaScript)" error in Sentry's dashboard. This error disappears when running in production mode (npm run build && npm run start).

1.1 - Logging & GDPR

Note

The following statements serve as a guideline and are not 100% legally verified. All statements are based on extensive online research with linked sources.

You are allowed to store personally identifiable information (PII), e.g. CPR number, address, e-mail, IP address, ..., in server logs, as long as you have a valid purpose and inform the users of:

  • the purpose of data collection

  • how long the data is stored

  • who has access to the data

Whoever controls the application should never use the data for any purpose other than the one the user agreed to in a privacy notice. As an example, here's reddit's logging policy.

(Source 1, Source 2, Source 3).

That said, it's still your duty to protect as much PII as possible. Therefore, we can:

  • pseudonymize PII as much as possible, e.g. changing all e-mails to example@example.com

  • remove PII from logs altogether, known as Data Scrubbing in Sentry
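To make the pseudonymization option concrete, here is a minimal sketch. The names EMAIL_PATTERN and pseudonymizeEmails are hypothetical helpers, not part of Sentry, and the regex is a deliberately simple approximation of what an e-mail address looks like.

```javascript
// Hypothetical helper (not a Sentry API): a simple approximation of an
// e-mail address. It will not catch every valid address format.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replaces every e-mail-looking substring with a fixed placeholder,
// so the log line stays readable without identifying anyone.
function pseudonymizeEmails(text, placeholder = "example@example.com") {
    return text.replace(EMAIL_PATTERN, placeholder);
}
```

Passing an empty string or a marker such as "[removed]" as the placeholder turns the same helper into a removal step instead of a pseudonymization step.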

Data Scrubbing

With Sentry, you can filter out data that is stored on their side (e.g. personal information). You have the option to filter out:

  • before it's sent to Sentry's server, e.g. in the sentry.client.config.js & sentry.server.config.js files. This is what Sentry calls SDK Data Scrubbing.

  • after it has been sent to Sentry's server, but before storing the data. This is what Sentry calls Server-Side Data Scrubbing.

By default, Sentry filters out values and object properties that match a list of predefined strings or patterns, e.g. password, secret, or anything that looks like a credit card number (matched with a regex).

Some PII, like e-mail, CPR and address, is not covered by the default server-side data scrubbing logic. You can create additional server-side data scrubbing rules to remove such data as well.

2.0 - Server-Side Data Scrubbing

The Sentry team themselves have written a blog post on how to get started with Server-Side scrubbing, which you can find here.

To give a basic idea of what's possible: you can select from a range of pre-defined regexes (e.g., "Username in filepaths", "Email addresses", "IP addresses"), build your own regex, or use Sentry's own rule syntax. Furthermore, you can choose from a list of methods for handling the sensitive data that is found.

Sentry Advanced Data Scrubbing rule

Server-side data scrubbing has many options, but it also has its limitations. It requires you to learn Sentry's custom rule syntax, may not work with all JSON objects, and adds rules that have to be maintained.

Note: IP Address scrubbing

In Sentry's Settings > Security & Privacy, there is an option called Prevent storing of IP addresses. I reached out to their support and asked about the difference between this option and a custom IP address scrubbing rule. The Prevent storing of IP addresses option only removes the IP address from the event.user.ip_address property, not from the whole event. If you want to remove IP addresses completely, you should use a custom rule with wildcards. There's a GitHub issue about possibly changing this behaviour in the future.

Additionally, server-side scrubbing provides two options for scrubbing IP addresses, but they do different things.

Data that can't be covered by server-side data scrubbing rules will have to be removed with SDK data scrubbing.

2.1 - SDK Data Scrubbing

Software Development Kit (SDK) is the term Sentry uses for all their platform libraries (Node.js, PHP, Python, ..., e.g. @sentry/node) and framework-specific libraries (e.g. @sentry/nextjs).

Sentry's SDKs provide the

  • beforeSend hook to modify the exception payload sent to Sentry

  • beforeSendTransaction hook to modify the performance payload (known as a transaction) sent to Sentry

Together, these two hooks cover the error and performance events sent to Sentry. Let's have a look at one of them.

Sentry.init({
    dsn: "https://examplePublicKey@o0.ingest.sentry.io/0",
    beforeSend(event) {
        // Modify the event here
        if (event.user) {
            // Don't send user's email address
            delete event.user.email;
        }
        return event;
    },
});

The event payload

An event is a structured object containing the data for errors, exceptions, crashes and transactions (for performance) sent to Sentry. More about the event payload can be found in the Event Introduction, the Event API, the TypeScript types and the Data Handling pages. Both beforeSend(event) and beforeSendTransaction(event) receive the same event payload object.

Server-side exception

As an example, in a /pages/api/example.js file, we throw the following server error:

export default (req, res) => {
    throw new Error("API throw error test");
    res.status(200).json({ name: "John Doe" });
};

This results in triggering the beforeSend hook with the following event payload:

{
    exception: {
        values: [
            {
                type: "Error",
                value: "API throw error test",
                stacktrace: {
                    frames: [
                        /* ... previous stack traces */
                        {
                            filename:
                                "app:///_next/server/pages/api/example.js",
                            module: "test",
                            function: "test_sentry_wrapped_",
                            lineno: 36,
                            colno: 11,
                            in_app: true,
                            pre_context: [
                                "__webpack_require__.r(test_sentry_wrapped_namespaceObject);",
                                "__webpack_require__.d(test_sentry_wrapped_namespaceObject, {",
                                '  "default": () => (test_sentry_wrapped_)',
                                "});",
                                "",
                                ";// CONCATENATED MODULE: ./src/pages/api/example.js?__sentry_wrapped__",
                                "/* harmony default export */ const test_sentry_wrapped_ = ((req, res)=>{",
                            ],
                            context_line:
                                '    throw new Error("API throw error test");',
                            post_context: [],
                            cpr: "This user had cpr id \"123213123\" and hurray"
                        },
                    ],
                },
                mechanism: {
                    type: "instrument",
                    handled: true,
                    data: {
                        wrapped_handler: "test_sentry_wrapped_",
                        function: "withSentry",
                    },
                },
            },
        ],
    },
    event_id: "b6ac48cfeda8425fb38e2e7e0c7e4e48",
    platform: "node",
    contexts: {
        trace: {
            op: "http.server",
            span_id: "89c2021b062cdaee",
            trace_id: "c3117536a3844e7293e5ab44e38fcb4f",
        },
        runtime: { name: "node", version: "v16.15.0" },
        app: {
            app_start_time: "2022-11-29T10:09:08.819Z",
            app_memory: 146898944,
        },
        os: {
            kernel_version: "21.6.0",
            name: "macOS",
            version: "12.5.1",
            build: "21G83",
        },
        device: {
            boot_time: "2022-11-17T07:20:21.054Z",
            arch: "arm64",
            memory_size: 17179869184,
            free_memory: 72073216,
            processor_count: 10,
            cpu_description: "Apple M1 Pro",
            processor_frequency: 24,
        },
        culture: { locale: "en-US", timezone: "Europe/Berlin" },
    },
    server_name: "my-laptop",
    timestamp: 1669716575.999,
    environment: "production",
    release: "XmP2QZ-ry87BGWjyr78eq",
    sdk: {
        integrations: [
            "InboundFilters",
            "FunctionToString",
            "Console",
            "Http",
            "OnUncaughtException",
            "OnUnhandledRejection",
            "ContextLines",
            "Context",
            "Modules",
            "RequestData",
            "LinkedErrors",
            "RewriteFrames",
        ],
    },
    tags: { transaction: "GET /api/example", runtime: "node" },
    breadcrumbs: undefined,
    sdkProcessingMetadata: {
        request: {
            /* ... */
        },
    },
    modules: {
        next: "12.3.4",
        react: "18.0.0",
        // ...
    },
    request: {
        method: "GET",
        cookies: {
            "next-auth.csrf-token":
                "b5437049f029faf5530c7769958e61f402c4f533d6be123dasawsdbb9a3|3ee95b5987b512ebb0736f48ef9d28a61f4f788502343617573b0a78ff349c75",
            "next-auth.callback-url": "https://example.com/home",
            "next-auth.session-token":
                "eyJhbGciOiJkaXIiLCJl12323jkhkkjhad0NNIn0..F0mvXYeLI3MKKcnG.xfj1A3rS3Arh8DXcn-52-IXMC8RfFcNFEVAsH_dVK-wa_qwjaklHChCRCoONowjrhUXZcqit9bAyVyC4HKVAPpBol4rgek1E9wyxngYdj-mjXtG-138yO1gKLJzsE7gLHq3zXR0EGf477K973K_kA0k4wABmBA4rFN1U5npGUqn-2LI7EAo254T-BDil2Vm_-Qi2SFDdnVTv9Ok7Y_osjKw.qtd1YzN8Yxwr6ntWflunUw",
        },
        headers: {
            host: "example.com",
            connection: "keep-alive",
            pragma: "no-cache",
            "cache-control": "no-cache",
            "sec-ch-ua":
                '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": '"macOS"',
            "upgrade-insecure-requests": "1",
            "user-agent":
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
            accept: "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "sec-fetch-site": "same-origin",
            "sec-fetch-mode": "navigate",
            "sec-fetch-user": "?1",
            "sec-fetch-dest": "document",
            referer: "https://example.com/api/example",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "en-GB,en-US;q=0.9,en;q=0.8,da;q=0.7,la;q=0.6",
            cookie: "next-auth.csrf-token=b5437049f029faf5530c7769958e61f402c4f533d6be123dasawsdbb9a3|3ee95b5987b512ebb0736f48ef9d28a61f4f788502343617573b0a78ff349c75; next-auth.session-token=eyJhbGciOiJkaXIiLCJl12323jkhkkjhad0NNIn0..F0mvXYeLI3MKKcnG.xfj1A3rS3Arh8DXcn-52-IXMC8RfFcNFEVAsH_dVK-wa_qwjaklHChCRCoONowjrhUXZcqit9bAyVyC4HKVAPpBol4rgek1E9wyxngYdj-mjXtG-138yO1gKLJzsE7gLHq3zXR0EGf477K973K_kA0k4wABmBA4rFN1U5npGUqn-2LI7EAo254T-BDil2Vm_-Qi2SFDdnVTv9Ok7Y_osjKw.qtd1YzN8Yxwr6ntWflunUw",
        },
        query_string: {},
        url: "https://example.com/api/example",
    },
    transaction: "GET /api/example",
}

When looking at the above event inside the Sentry Dashboard (sentry.io), it is evident that the Server-Side Data Scrubbing has kicked in. The following screenshot from Sentry's dashboard shows that some values have been filtered out.

About sdkProcessingMetadata

The event may contain the unserializable sdkProcessingMetadata property. Judging from the type definition and tests, this sparsely documented property, which holds the IncomingMessage object, is used only locally to process the event and is never sent to Sentry.

Client-side exception

Next, let's have a look at a client-side exception. Sentry's pages/sentry_sample_error.js page executes throw new Error("Sentry Frontend Error");. The beforeSend(event) hook captures the following event payload.

{
    exception: {
        values: [
            {
                type: "Error",
                value: "Sentry Frontend Error",
                stacktrace: {
                    frames: [
                        /* ... previous stack traces */
                        {
                            filename:
                                "webpack-internal:///./src/pages/sentry_sample_error.js",
                            function: "onClick",
                            in_app: true,
                            lineno: 101,
                            colno: 35,
                        },
                    ],
                },
                mechanism: {
                    type: "instrument",
                    handled: true,
                    data: {
                        function: "addEventListener",
                        handler: "callCallback",
                        target: "EventTarget",
                    },
                },
            },
        ],
    },
    level: "error",
    platform: "javascript",
    event_id: "010azce3dac740369d0026bcd216aa1c",
    timestamp: 1669721899.15,
    environment: "production",
    release: "jYm2QZ-ry87BGWjyr78eq",
    sdk: {
        integrations: [
            "InboundFilters",
            "FunctionToString",
            "TryCatch",
            "Breadcrumbs",
            "GlobalHandlers",
            "LinkedErrors",
            "Dedupe",
            "HttpContext",
            "BrowserTracing",
        ],
        name: "sentry.javascript.nextjs",
        version: "7.21.0",
        packages: [
            {
                name: "npm:@sentry/nextjs",
                version: "7.21.0",
            },
            {
                name: "npm:@sentry/react",
                version: "7.21.0",
            },
        ],
    },
    tags: {
        runtime: "browser",
    },
    breadcrumbs: [
        {
            timestamp: 1669713905.044,
            category: "fetch",
            data: {
                method: "GET",
                url: "/_next/static/development/_devMiddlewareManifest.json",
                __span: "b1c4ce4211d3d216",
                status_code: 200,
            },
            type: "http",
        },
        {
            timestamp: 1669713905.062,
            category: "navigation",
            data: {
                from: "/sentry_sample_error",
                to: "/sentry_sample_error",
            },
        },
        {
            timestamp: 1669721899.146,
            category: "ui.click",
            message:
                'div#__next > div.css-1g4yje1 > div > main > button[type="button"]',
        },
    ],
    request: {
        url: "https://example.com/sentry_sample_error",
        headers: {
            Referer: "https://example.com/sentry_sample_error",
            "User-Agent":
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
        },
    },
    extra: {
        arguments: [
            {
                type: "react-click",
                target: "react",
                currentTarget: "react",
                isTrusted: false,
            },
        ],
    },
}

Comparing the server and client exception events, we can see the following differences in what is attached by default:

  • client exception

    • breadcrumbs - describe which actions occurred before the exception happened. Breadcrumbs are either auto-generated (as in the client event payload above) or added manually

  • server exception

    • contexts - non-indexed data that provides additional information about an exception

Neither property is exclusive to client or server exceptions. Which ones appear depends on the SDK (e.g., @sentry/browser or @sentry/node) and on any custom logging values you add.

Transactions

In contrast to the exception events, the transaction event (beforeSendTransaction(event)) populates the event.spans property. This property is an array of spans, the individual timed operations that make up the transaction; one or more transactions in turn form a trace (Traces, Transaction and Spans).

Below is the output of a transaction event from the beforeSendTransaction(event) hook.

{
    contexts: {
        trace: {
            op: "pageload",
            span_id: "8683baae3fade3e2",
            tags: {
                "routing.instrumentation": "next-router",
                effectiveConnectionType: "4g",
                deviceMemory: "8 GB",
                hardwareConcurrency: "10",
                sentry_reportAllChanges: false,
            },
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
    },
    spans: [
        {
            data: {
                method: "GET",
                url: "/_next/static/development/_devMiddlewareManifest.json",
                type: "fetch",
            },
            description:
                "GET /_next/static/development/_devMiddlewareManifest.json",
            op: "http.client",
            parent_span_id: "8683baae3fade3e2",
            span_id: "8711dce5b94d17db",
            start_timestamp: 1670321646.5132003,
            status: "ok",
            tags: {
                "http.status_code": "200",
            },
            timestamp: 1670321646.5689,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
        {
            description: "Main UI thread blocked",
            op: "ui.long-task",
            parent_span_id: "8683baae3fade3e2",
            span_id: "a499230edab87a13",
            start_timestamp: 1670321645.7012,
            timestamp: 1670321645.8892,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
        {
            description: "Main UI thread blocked",
            op: "ui.long-task",
            parent_span_id: "8683baae3fade3e2",
            span_id: "b2ecbb27ab02e000",
            start_timestamp: 1670321645.9296,
            timestamp: 1670321646.5146,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
        {
            data: {
                method: "GET",
                url: "/_next/static/development/_devPagesManifest.json",
                type: "fetch",
            },
            description: "GET /_next/static/development/_devPagesManifest.json",
            op: "http.client",
            parent_span_id: "8683baae3fade3e2",
            span_id: "804784cf6e3871d7",
            start_timestamp: 1670321646.7549002,
            status: "ok",
            tags: {
                "http.status_code": "200",
            },
            timestamp: 1670321646.8977003,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
        {
            data: {
                method: "GET",
                url: "/api/v1/auth/session",
                type: "fetch",
            },
            description: "GET /api/v1/auth/session",
            op: "http.client",
            parent_span_id: "8683baae3fade3e2",
            span_id: "84b806bb29db004e",
            start_timestamp: 1670321646.768,
            status: "ok",
            tags: {
                "http.status_code": "200",
            },
            timestamp: 1670321646.9006002,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
        /* ... */
        {
            data: {
                "Transfer Size": 1569,
                "Encoded Body Size": 1269,
                "Decoded Body Size": 5430,
            },
            description: "/favicon.ico",
            op: "resource.other",
            parent_span_id: "8683baae3fade3e2",
            span_id: "aad953aaa4498b50",
            start_timestamp: 1670321646.8346999,
            timestamp: 1670321646.8741,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
        {
            data: {
                "Transfer Size": 1569,
                "Encoded Body Size": 1269,
                "Decoded Body Size": 5430,
            },
            description: "/favicon.ico",
            op: "resource.other",
            parent_span_id: "8683baae3fade3e2",
            span_id: "a0522abc460be60a",
            start_timestamp: 1670321646.8974,
            timestamp: 1670321646.9499998,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
        {
            data: {
                "Transfer Size": 1569,
                "Encoded Body Size": 1269,
                "Decoded Body Size": 5430,
            },
            description: "/favicon.ico",
            op: "resource.other",
            parent_span_id: "8683baae3fade3e2",
            span_id: "809b1c43a93d1df6",
            start_timestamp: 1670321646.8974,
            timestamp: 1670321646.9866998,
            trace_id: "b66a6547e8e4418a93a6cc4ba4c5f23b",
        },
    ],
    start_timestamp: 1670321645.208,
    tags: {
        runtime: "browser",
        "routing.instrumentation": "next-router",
        effectiveConnectionType: "4g",
        deviceMemory: "8 GB",
        hardwareConcurrency: "10",
        sentry_reportAllChanges: false,
    },
    timestamp: 1670321647.1433,
    transaction: "/",
    type: "transaction",
    transaction_info: {
        source: "route",
        changes: [],
        propagations: 6,
    },
    measurements: {
        fp: {
            value: 1403.8999999761581,
            unit: "millisecond",
        },
        fcp: {
            value: 1403.8999999761581,
            unit: "millisecond",
        },
        "connection.rtt": {
            value: 0,
            unit: "millisecond",
        },
        ttfb: {
            value: 254.8999786376953,
            unit: "millisecond",
        },
        "ttfb.requestTime": {
            value: 244.99988555908203,
            unit: "millisecond",
        },
    },
    platform: "javascript",
    event_id: "4ce59dcfed3e4f1a80648cb181b5a2ed",
    environment: "local",
    release: "development",
    sdk: {
        integrations: [
            "InboundFilters",
            "FunctionToString",
            "TryCatch",
            "Breadcrumbs",
            "GlobalHandlers",
            "LinkedErrors",
            "Dedupe",
            "HttpContext",
            "BrowserTracing",
        ],
        name: "sentry.javascript.nextjs",
        version: "7.21.0",
        packages: [
            {
                name: "npm:@sentry/nextjs",
                version: "7.21.0",
            },
            {
                name: "npm:@sentry/react",
                version: "7.21.0",
            },
        ],
    },
    breadcrumbs: [
        {
            timestamp: 1670321646.569,
            category: "fetch",
            data: {
                method: "GET",
                url: "/_next/static/development/_devMiddlewareManifest.json",
                __span: "8711dce5b94d17db",
                status_code: 200,
            },
            type: "http",
        },
        {
            timestamp: 1670321646.577,
            category: "navigation",
            data: {
                from: "/",
                to: "/",
            },
        },
        {
            timestamp: 1670321646.897,
            category: "fetch",
            data: {
                method: "GET",
                url: "/_next/static/development/_devPagesManifest.json",
                __span: "804784cf6e3871d7",
                status_code: 200,
            },
            type: "http",
        },
        {
            timestamp: 1670321646.9,
            category: "fetch",
            data: {
                method: "GET",
                url: "/api/v1/auth/session",
                __span: "84b806bb29db004e",
                status_code: 200,
            },
            type: "http",
        },
        {
            timestamp: 1670321646.901,
            category: "fetch",
            data: {
                method: "GET",
                url: "/api/v1/auth/session",
                __span: "89a42cd9c4b3b735",
                status_code: 200,
            },
            type: "http",
        },
        {
            timestamp: 1670321646.971,
            category: "xhr",
            data: {
                method: "GET",
                url: "http://localhost:8085/dist/cabl.json",
                status_code: 0,
            },
            type: "http",
        },
        {
            timestamp: 1670321647.143,
            category: "fetch",
            data: {
                method: "GET",
                url: "/_next/data/development/index.json",
                __span: "b8b6c44d28d54f9f",
                status_code: 200,
            },
            type: "http",
        },
        {
            timestamp: 1670321647.145,
            category: "navigation",
            data: {
                from: "/",
                to: "/",
            },
        },
        {
            timestamp: 1670321647.146,
            category: "navigation",
            data: {
                from: "/",
                to: "/",
            },
        },
    ],
    request: {
        url: "http://localhost:3000/",
        headers: {
            Referer: "http://localhost:3000/checkout/oplysninger",
            "User-Agent":
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
        },
    },
}

contexts vs tags vs extra

Tags are small key/value pairs that are indexed for searches in Sentry; they also have lower size limits. Contexts and Extra are a bit different. They serve the same purpose as Tags, adding more information to events, except that they are not used in searches. Context is structured data and must be a dictionary or map, while Extra doesn't have to be.

In Sentry's UI (sentry.io), everything added as event.extra appears under "Additional Data", while each entry in event.contexts gets its own section named after its key.

A short explanation of when to use Context, Tags or Extra:

  • if you need to use the information in filters or searches in Sentry: use Tags

  • if the information is longer and is not used for searches: use Context.

  • avoid using Extra

Data leaking locations in the event payload

Given examples of client/server-side exceptions and transaction events, let's investigate where personal information could be leaked. Sentry mentions that sensitive data may appear in the following areas.

Stacktraces

PHP and Python expose variable values in their stack traces. That is, instead of exposing only the static code (function connectDB(process.env.ADMIN_USER)), the stack trace exposes the actual runtime value (function connectDB("Zeus")).

An example of this is PHP's PDO class, a built-in interface for connecting to a database. When the connection fails, its constructor throws an exception with the following stack trace, exposing secrets.

PDOException: SQLSTATE[HY000] [2002] No such file or directory in /var/www/html/test.php:3
Stack trace:
#0 /var/www/html/test.php(3): PDO->__construct('mysql:host=loca...', 'Zeus', 'mySecretPassword')
#1 {main}

Node.js does not expose variable values, which means we don't need to worry about data scrubbing in the stacktrace.

Nevertheless, personal information may still end up in the error message.

Stack traces are accessible in the stacktrace property, which can be found on each entry of the exception.values array.

It may be a good idea to apply data scrubbing to the whole exception object if personal information leaks are a concern.

{
    exception: {
        values: [
            {
                stacktrace: {
                    frames: [
                        // ...
                    ],
                },
            },
        ],
    },
}
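Since the error message is the most likely place for PII in a Node.js exception, a small scrubbing step for it could look like the sketch below. The names scrubExceptionMessages, scrubText and maskEmails are hypothetical and not part of Sentry; the function only relies on the exception.values[].value structure shown in the payloads above.

```javascript
// Hypothetical helper: applies `scrubText` to the message of every
// exception in the event. Stack frames are left untouched.
function scrubExceptionMessages(event, scrubText) {
    const values = (event.exception && event.exception.values) || [];
    for (const exception of values) {
        if (typeof exception.value === "string") {
            exception.value = scrubText(exception.value);
        }
    }
    return event;
}

// Example scrubber (an assumption): mask anything that looks like an e-mail.
const maskEmails = (text) =>
    text.replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[email]");
```

Inside beforeSend(event), this would be called as return scrubExceptionMessages(event, maskEmails);. Events without an exception property pass through unchanged.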

Breadcrumbs

Breadcrumbs, traces of how an error occurred, may pick up sensitive data. Sentry strongly recommends disabling breadcrumbs if you're logging PII in your codebase (e.g., console.log(user.email) which yields "john@example.com" in the console) as breadcrumbs may pick these up and send them to Sentry.

As an example, we modify /pages/sentry_sample_error.js as follows:

export function SentrySampleError() {
    return (
        <div>
            <button
                onClick={() => {
                    const userId = window.localStorage.getItem("userId");
                    console.log("Current user id: " + userId); // logging PII
                    throw new Error("Sentry Frontend Error");
                }}
            >
                Throw error
            </button>
        </div>
    );
}

The exception event shows that the variable's value is exposed in the breadcrumbs.

{
    breadcrumbs: [
        // ...,
        {
            timestamp: 1669900269.379,
            category: "console",
            data: {
                arguments: ["Current user id: 271829"],
                logger: "console",
            },
            level: "log",
            message: "Current user id: 271829",
        },
    ],
}

In short, breadcrumbs may leak runtime values, just like the PHP stack traces mentioned earlier.
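One way to keep console output out of breadcrumbs, short of disabling breadcrumbs entirely, is the SDK's beforeBreadcrumb option, which may return null to discard a breadcrumb. The following is a minimal sketch; the function name dropConsoleBreadcrumbs is hypothetical, and dropping every console breadcrumb is an assumption about what you want to keep.

```javascript
// Hypothetical filter, meant to be passed to Sentry.init() as
//   Sentry.init({ /* ... */ beforeBreadcrumb: dropConsoleBreadcrumbs })
// Returning null discards the breadcrumb; returning it keeps it.
function dropConsoleBreadcrumbs(breadcrumb) {
    if (breadcrumb.category === "console") {
        return null;
    }
    return breadcrumb;
}
```

Other breadcrumb categories seen in the payloads above (fetch, navigation, ui.click) are kept, so URLs in fetch breadcrumbs would still need their own scrubbing.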

User context

When an exception is sent to Sentry, it is possible to pass additional user data like username, userId, etc. to the exception, facilitating error tracking. This user can then be found inside the event.user property.

One should be careful using Sentry.setUser({ email: "john.doe@example.com" }); to attach user data to events. In our case, we remove event.user as we see privacy as more important than debugging context.

HTTP context

The HTTP context contains the HTTP request information. The implementation differs between the client and server sides.

On the client side, the HTTP context may consist of the user-agent, referrer, and other headers. This information is used to catalogue and tag events with specific OS, browser and version information. This can be configured with the @sentry/browser package and the HttpContext integration:

import * as Sentry from "@sentry/browser";

Sentry.init({
    // ...
    integrations: [
        new Sentry.Integrations.HttpContext({
            // ...
        }),
    ],
});

On the server side, the HTTP context wraps the http and https modules to capture all network requests as breadcrumbs and/or tracing spans. This can be configured with the @sentry/nextjs package and the Http integration:

import * as Sentry from "@sentry/nextjs"; // or @sentry/node, ...

Sentry.init({
    // ...
    integrations: [
        new Sentry.Integrations.Http({
            // ...
        }),
    ],
});

The HTTP context can be found in the request property of the event object.

{
    request: {}
}
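Based on the request objects in the payloads above, a scrubbing step for the HTTP context could be sketched as follows. The function name scrubRequest and the SENSITIVE_HEADERS list are assumptions for illustration; which headers count as sensitive depends on your application.

```javascript
// Assumed list of header names (lower-cased) that may carry PII or secrets.
const SENSITIVE_HEADERS = new Set(["cookie", "authorization", "referer"]);

// Hypothetical helper: removes cookies, query strings and sensitive headers
// from event.request, and strips the query part of the request URL.
function scrubRequest(event) {
    const request = event.request;
    if (!request) {
        return event;
    }
    delete request.cookies;
    delete request.query_string;
    if (request.headers) {
        for (const name of Object.keys(request.headers)) {
            // Header names differ in casing between client and server events.
            if (SENSITIVE_HEADERS.has(name.toLowerCase())) {
                delete request.headers[name];
            }
        }
    }
    if (typeof request.url === "string") {
        request.url = request.url.split("?")[0];
    }
    return event;
}
```

The case-insensitive comparison matters because the server payload uses lower-case header names (referer) while the client payload uses Referer.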

Transactions

A transaction represents a single instance of an activity you want to measure or track, like a page load, page navigation, or an asynchronous task. Having transaction information lets you monitor the overall performance of your application beyond when it crashes or generates an error.

Sometimes transaction names contain sensitive information. A browser's page load transaction might have the raw path, like /users/272819/details, as its name. Most of the time, Sentry's server-side data scrubbing parameterizes URLs, that is, it turns /users/272819/details into /users/:userid/details.

Sentry doesn't document in which cases it fails to parameterize URLs. One should therefore consider transactions vulnerable to personal information leaks.

Transactions are available on the event.spans object.

{
    spans: []
}

If personal information has ended up in transactions on Sentry's servers, you need to contact their support to have it deleted. This is because ClickHouse, Sentry's underlying data store for transactions, is designed for immutable data. Sentry support can help with limited mutability, as described in this blog post.
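
Since it is unclear when server-side parameterization fails, the same idea can be approximated in the SDK before the event is sent. The sketch below is an assumption-heavy illustration: the helper names are hypothetical, the placeholder names (:id, :uuid) are arbitrary, and it assumes the transaction name is exposed as event.transaction.

```javascript
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: replaces purely numeric or UUID-like path segments.
function parameterizePath(path) {
    return path
        .split("/")
        .map((segment) => {
            if (/^\d+$/.test(segment)) return ":id";
            if (UUID_SEGMENT.test(segment)) return ":uuid";
            return segment;
        })
        .join("/");
}

// Hypothetical helper: applies the parameterization to the transaction name.
function scrubTransactionName(event) {
    if (typeof event.transaction !== "string") {
        return event; // no transaction name to scrub
    }
    return { ...event, transaction: parameterizePath(event.transaction) };
}

console.log(parameterizePath("/users/272819/details")); // "/users/:id/details"
```

Segments that mix letters and digits, such as usernames or slugs, are not covered by this sketch and would need application-specific rules.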

Is that all?

When I spoke with Sentry's support, they informed me that the list above is not exhaustive.

The list doesn't include custom information which an SDK developer may add to the event payload:

Scrubbing Data

Personal information may appear throughout the whole event payload.

If the payload does not contain any custom added information, then you should be fine filtering on:

  • event.spans

  • event.exception

  • event.breadcrumbs

  • event.request

  • event.user

Just remember to check if the property exists before filtering.
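
The existence check could be sketched as follows. The helper name scrubKnownFields is hypothetical, and the scrub parameter stands for whatever scrubbing function you apply to each property; the example uses a trivial placeholder.

```javascript
const FIELDS_TO_SCRUB = ["spans", "exception", "breadcrumbs", "request"];

// Hypothetical helper: scrubs only the properties that exist on this event.
function scrubKnownFields(event, scrub) {
    const result = { ...event };
    delete result.user; // user context is removed entirely
    for (const field of FIELDS_TO_SCRUB) {
        if (result[field] !== undefined && result[field] !== null) {
            result[field] = scrub(result[field]);
        }
    }
    return result;
}

// Placeholder scrubber for demonstration purposes only
const redactAll = () => "[scrubbed]";

const partiallyFilled = { event_id: "1", user: { id: 1 }, request: { url: "x" } };
console.log(scrubKnownFields(partiallyFilled, redactAll));
// { event_id: '1', request: '[scrubbed]' }
```

Without the guard, a scrubber that expects an object or array would be handed undefined for events that lack, say, breadcrumbs.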

However, if there is a possibility that a developer will add personal information to the payload, you are better off filtering the whole event object. Only required fields should be excluded from filtering.

We could traverse all nested values, check each key against a blacklist, and remove the whole key/value pair in case of a match. Keys that are not blacklisted are subject to value filtering, which removes any sensitive information from the values.

The following code is an example of such a traverse method. Here, the scrubData function:

  • takes a large object & clones it

  • removes blacklisted keys

  • for non-blacklisted keys, their values are run through a callback function, replacing the initial value with a returned value from the callback.

  • finally, return the whole filtered object

import { normalize } from "@sentry/utils";

/**
 * Scrub sensitive data out from Objects.
 * Filters through both keys and values.
 * - deleting a key/value if the keyname is found in a blacklist
 * - applies a callback function on primitive values, replacing the original value
 *
 * @param {object} initObj Object to scrub data from
 * @param {Array<RegExp> | null} keyBlacklist Array of regular expressions matched against object keys
 * @param {function(primitive) => string | undefined} scrubFromValueCallback A function that filters through primitive values
 * @returns {object} Filtered object
 */
function scrubData(initObj, keyBlacklist, scrubFromValueCallback) {
    let eventObj;
    try {
        // clone and serialize the object, removing functions and other non JSON-conform values
        eventObj = JSON.parse(JSON.stringify(initObj));
    } catch (error) {
        // if clone failed due to non-serializable types (circular reference), use sentry's
        // internal cloning tool with a performance impact
        eventObj = normalize(initObj);
    }

    // Traverse through all nested objects and arrays until a primitive value is found (aka, no more nested values)
    // Modifies directly on the cloned object
    function traverseAndFilter(obj) {
        for (const key in obj) {
            if (obj.hasOwnProperty(key)) {
                if (typeof obj[key] === "object" && obj[key] !== null) {
                    // The current property is an object or an array,
                    // so we need to traverse it recursively
                    traverseAndFilter(obj[key]);
                } else if (isJsonObject(obj[key])) {
                    // The current property is a JSON object
                    // Parse it, apply scrubbing and serialize it again
                    const parsedJSON = JSON.parse(obj[key]);
                    traverseAndFilter(parsedJSON); // scrub directly on parsedJSON since it's already a copy and not reference
                    obj[key] = JSON.stringify(parsedJSON);
                } else {
                    // The current property has a primitive value - no more nested values
                    // Do work on this property
                    const isBlacklist =
                        keyBlacklist?.some((reg) => reg.test(key)) || false;
                    if (isBlacklist) {
                        // key has a blacklisted word. Delete value
                        delete obj[key];
                    } else {
                        // Key not blacklisted, do stuff on the value
                        if (scrubFromValueCallback) {
                            // Run the callback function over the value
                            obj[key] = scrubFromValueCallback(obj[key]);
                        }
                    }
                }
            }
        }
    }

    traverseAndFilter(eventObj);

    return eventObj; // return cloned and modified object
}

/**
 * Check if valid JSON object or array
 * Fails for any other valid json structure like null, false, true,...
 * Fails when single quotes are used instead of double quotes
 * @see https://stackoverflow.com/a/32278428
 * @param {string} str
 * @returns {boolean}
 */
function isJsonObject(str) {
    try {
        const parsedJSON = JSON.parse(str);
        // Only non-null objects and arrays count; primitives like 1, true or null do not
        return !!str && parsedJSON !== null && typeof parsedJSON === "object";
    } catch (e) {
        return false;
    }
}

// EXAMPLE
const obj = {
    hi: "there",
    jessy: [
        {
            she: "cool",
            why: [
                {
                    go: "there",
                    banana: ["big", "small", 123],
                    socialNumber: "asd3110729999asd",
                },
            ],
        },
    ],
};

const newObj = scrubData(obj, [/socialNumber/], (x) => "foo");
console.log(newObj);
// ^ Filtered object
// {
//     hi: "foo",
//     jessy: [
//         {
//             she: "foo",
//             why: [
//                 {
//                     go: "foo",
//                     banana: ["foo", "foo", "foo"],
//                 },
//             ],
//         },
//     ],
// };

With this helper function, we can now scrub personal information from the events in the beforeSend and beforeSendTransaction hooks, excluding the required fields and sdkProcessingMetadata.

const keyBlacklist = [
    // some keys are already blacklisted by Sentry's server-side scrubbing
    // https://docs.sentry.io/product/data-management-settings/scrubbing/server-side-scrubbing/
    "ssn",
    "socialNumber",
    "address",
    "email",
    "firstName",
    "lastName",
].map((word) => new RegExp(`^.*?${word}.*$`));

const ssnRegex = /((?!(000|666|9))\d{3}-(?!00)\d{2}-(?!0000)\d{4})/g;

const filterCallback = (val) => {
    // String
    if (typeof val === "string") {
        // replace part of the text if it finds a social security number
        val = val.replaceAll(ssnRegex, "__SSN_CENSORED__");

        // replace values that may have leaking urls
        if (val.includes("/password-reset?email=")) {
            val = "REDACTED";
        }
    }

    // Number
    if (typeof val === "number") {
        let num = val.toString(); // convert to string so that we can apply modifications

        num = num.replaceAll(ssnRegex, "0"); // replace with 0 if match

        val = Number(num); // convert back to number
    }

    return val;
};

/**
 * Combine the processing of 'beforeSend' and 'beforeSendTransaction'
 * @param {Event} event Sentry Event object available in the beforeSend hook
 * @returns {Event} Modified event with scrubbed data
 */
const eventHandler = (event) => {
    // separate required fields from the rest
    const {
        event_id,
        timestamp,
        platform,
        user,
        // temporary data storage for the SDK's event processing pipeline that is not sent to Sentry
        // of unserializable object type https://nodejs.org/api/http.html#class-httpincomingmessage containing circular references
        sdkProcessingMetadata,
        ...eventPayload
    } = event;

    const filteredEvent = scrubData(eventPayload, keyBlacklist, filterCallback);

    return {
        ...filteredEvent,
        event_id,
        timestamp,
        platform,
        ...(sdkProcessingMetadata ? { sdkProcessingMetadata } : {}), // reattach only when value actually exists
        // excluding 'user' as we want to remove any user data
    };
};

Sentry.init({
    // ...
    beforeSend: eventHandler,
    beforeSendTransaction: eventHandler,
});

In the above example, we

  • extract required fields + sdkProcessingMetadata from the original event payload

  • remove any properties that contain one of the blacklisted strings

  • use a regex to replace any value that looks like a social security number

  • redact values containing sensitive URL query parameters, which we found by scanning the application

  • return the modified event with the required fields and sdkProcessingMetadata if available

2.2 - Final thoughts

Sentry is a simple solution to implement compared to its competitors, which often have a wide range of monitoring products to choose from.

However, I found the documentation for Sentry to be confusing, overlapping and lacking. This made it difficult for me to understand how to set up and use the data scrubbing filters, and it took me longer than I would have liked to get things working.

That being said, I'm hopeful that more comprehensive guides will improve the developer experience and make Sentry a good choice.

Feedback from Sentry

I reached out to Sentry to provide a comment on this blog post. Their response was:

Some of the features might change in the future, and parts of this post might become invalid at some point.

One of them is that we plan to add a location option for SaaS users, meaning European organisations operating in Europe will be able to have the information stored and processed in Europe. This requires big changes to our infrastructure and the current ETA is simply: at some point in 2023 (and still subject to change).

The other is regarding the phrase “Node.js does not expose variable values”, which is a feature the team is trying to work on for future releases.

European servers are good news. Regarding the second feedback, I raised my concerns about making this feature an opt-in option rather than an opt-out option.

Additional Notes

After using this approach now for two months in an enterprise application, here are some important takeaways:

  • When writing regular expressions, it's important to verify their browser support since the scrubbing occurs both on your server and in the client's browser. For instance, older versions of Safari (before 16.4) do not support lookbehind assertions in regular expressions, which created 8k error reports in just 2 weeks (source: Related StackOverflow).

  • The JSON event payload that you can inspect in Sentry's dashboard differs from the event payload sent from the SDK.

    Some properties are created on Sentry's side by inferring them from other properties. For example, in the JSON available on Sentry, the event.title is inferred from the SDK's event.exception.values[0].value. This is essential to know when debugging Sentry-specific, rather than application-specific, features or exceptions.

  • The ignoreErrors SDK configuration is a setting that allows you to discard events in the SDK. This is useful because it means that these events won't count towards your monthly quota (for example, the Team Plan has a limit of 50k errors per month). However, it's important to note that this setting only applies to the event.message property, which is often empty and filled in later on Sentry's side (see the previous bullet point about inferred values). It's often better to create your own ignoreErrors implementation within the beforeSend hook, for example by applying "error ignoring" to the event.exception.values[0].value property.

  • By scrubbing data in Sentry's SDK to remove personal information, we also lose a lot of traceability. For example, if a customer named ABC reports an issue to customer service, we have no way to link the error captured by Sentry to that specific customer. To address this issue, one common approach is to pseudonymize customers with an internal lookup table. This table contains a random UUID for each customer. When an event is captured by Sentry, this random UUID can be added to the event payload. We can then search our internal lookup table to determine which customer was affected, without exposing any personally identifiable information.
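
Two of the takeaways above, defensive regular expressions and a custom "error ignoring" step, can be combined in one small sketch. The helper names and the ignore patterns are hypothetical examples. The sketch relies on two behaviours: the RegExp constructor throws a catchable SyntaxError at runtime for unsupported patterns (whereas an unsupported regex literal fails when the script is parsed), and returning null from beforeSend discards the event.

```javascript
// Compile a pattern defensively so unsupported syntax cannot break the bundle.
function safeRegExp(source, flags) {
    try {
        return new RegExp(source, flags);
    } catch (error) {
        return null; // pattern not supported by this JavaScript engine
    }
}

// Hypothetical ignore list; the patterns are examples only.
const ignorePatterns = [
    safeRegExp("ResizeObserver loop limit exceeded"),
    safeRegExp("^Network request failed$", "i"),
].filter(Boolean); // drop patterns that could not be compiled

// Returns null to discard the event, otherwise the unchanged event.
function ignoreByExceptionValue(event) {
    const value = event?.exception?.values?.[0]?.value;
    if (typeof value === "string" && ignorePatterns.some((re) => re.test(value))) {
        return null;
    }
    return event;
}

// Assumed wiring (not executed here): run this before any scrubbing
// Sentry.init({ beforeSend: (event) => ignoreByExceptionValue(event) });
```

Running the ignore step before scrubbing avoids spending time on events that will be discarded anyway.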