Manipulate Gitlab security reports programmatically
The Problem
Gitlab offers templates for security scans, such as secret detection and static analysis. Unfortunately, it doesn’t offer any way to see the discovered defects in the UI unless you’re paying for the Ultimate tier. You only get the report files, which you can manually download from the pipelines page.
And that is not helpful at all. In fact, this whole approach strikes me as wrong to begin with, but more on that later. For now, the problem is as follows: we want to download and process those reports programmatically.
If you’ve used Gitlab templates before, then you likely know these report files are artifacts. However, they’re not normal artifacts; they’re special, namespaced artifacts, such as artifacts:reports:sast.
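To make that concrete: under the hood, a template job declares its report roughly like this (a simplified sketch; the real template sets plenty of other options):

```yaml
# Simplified sketch of a SAST template job. The report lives under the
# special artifacts:reports namespace, not under regular artifacts:paths.
semgrep-sast:
  stage: test
  artifacts:
    reports:
      sast: gl-sast-report.json
```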
This namespacing makes it impossible to:

- use/override before_script, script or after_script: artifacts are the last thing to run in a job [an undocumented fact, BTW], so you might think it makes sense to process the report in one of those phases. You’d be wrong: overriding any of them leads to Gitlab just removing the report and cloning your repo, even if you set GIT_STRATEGY to none, presumably because it "re-uses the local working copy" and obviously a scanner needs files to scan
- do anything with them in any other job: Gitlab will remove them before cloning your repository
- download them via the API: if you look at the artifact API docs, it can deal with artifacts just fine - except these special, namespaced ones, which naturally isn’t documented either (see the sketch after this list)
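To illustrate that last point, here’s a hypothetical job poking at the documented "download a single artifact file" endpoint (PRIVATE_TOKEN is a CI/CD variable you’d supply yourself, and the job and file names match the secret detection example further down). In my experience, this works for files exposed via artifacts:paths, but 404s for files declared only under artifacts:reports:

```yaml
# Hypothetical demonstration job. The endpoint fetches a single artifact
# file from the latest successful pipeline on the given ref; curl --fail
# makes the job fail on a 404.
fetch-report:
  stage: test
  script:
    - >
      curl --fail
      --header "PRIVATE-TOKEN: $PRIVATE_TOKEN"
      "$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/master/raw/gl-secret-detection-report.json?job=secret_detection"
```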
The Solution
As you might imagine, I’m not the first to want to do something like this, which led me to the bug report for this use case.
The workaround given in the issue linked above is as follows: you can override the paths argument of the artifacts section, which makes the report behave as if it were part of your project’s repository.
Something like this:
```yaml
secret_detection:
  stage: test
  variables:
    SECRET_DETECTION_HISTORIC_SCAN: "false"
  artifacts:
    reports:
      secret_detection: gl-secret-detection-report.json
    paths:
      - gl-secret-detection-report.json
  rules:
    - if: $CI_COMMIT_BRANCH == "master" || $CI_COMMIT_BRANCH == "develop"
```
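Note that the processing job below also reads gl-sast-report.json, which assumes the same trick was applied to the SAST template’s jobs. A sketch for the Semgrep one might look like this (the job name semgrep-sast comes from the template; be aware that overriding artifacts replaces whatever artifacts configuration the template defined):

```yaml
# Assumed counterpart for the SAST template: expose the Semgrep report
# via artifacts:paths as well, so later jobs can read it.
semgrep-sast:
  artifacts:
    reports:
      sast: gl-sast-report.json
    paths:
      - gl-sast-report.json
```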
Then you may finally create a job to process the reports:
```yaml
process-reports:
  stage: do-something-with-reports
  script:
    - ls -l .
    - cat gl-secret-detection-report.json
    - cat gl-sast-report.json | tail -n 5
```
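Once the reports are plain files, "processing" can be whatever you need. As a sketch, assuming your image ships jq and that the reports follow the usual schema with a top-level vulnerabilities array (the field names here are assumptions):

```yaml
# Sketch: summarize findings with jq. The "vulnerabilities", "severity",
# "message" and "name" fields are assumed from the common report schema.
summarize-reports:
  stage: do-something-with-reports
  script:
    - echo "Secret findings $(jq '.vulnerabilities | length' gl-secret-detection-report.json)"
    - jq -r '.vulnerabilities[] | .severity + " " + (.message // .name)' gl-sast-report.json
```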
As you’ll see in the job output, the reports are now all hanging out with the rest of the files in your repository. Great, right? Right. If that’s all you need. But here’s the thing: the SAST template isn’t just a single-tool template. It detects what’s appropriate for your repository and generates as many reports, using as many tools, as it thinks are needed. In my case, it used Semgrep and NodeJsScan [which is Gitlab’s wrapper around njsscan] to generate two separate reports.
Unfortunately, it also names both the Semgrep and the NodeJsScan report the same: gl-sast-report.json. This isn’t a problem on the pipelines page, where the scanner’s job name distinguishes the reports (nodejs-scan-sast:sast, semgrep-sast:sast). But when you try to download the files? You’ll overwrite one with the other. And when you perform the above artifacts:paths trick, the same problem applies: you end up with a single file. Which one? Whichever scanner finished last, I’d assume. You can see how this would be problematic when you need to automate the process.
In conclusion, this approach works only if you want a single tool’s SAST report.
If you want multiple, you’ll have to implement the scans yourself.
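A minimal sketch of that DIY route for Semgrep, assuming its official Docker image (the image tag, ruleset, and report name are all choices of your own; also note that --json produces Semgrep’s native format, not Gitlab’s report schema):

```yaml
# DIY SAST: run the scanner directly and give each tool's report a unique
# name, so multiple scanners can no longer clobber each other's output.
semgrep-scan:
  stage: test
  image: semgrep/semgrep:latest
  script:
    - semgrep scan --config auto --json --output semgrep-report.json .
  artifacts:
    paths:
      - semgrep-report.json
```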
As hinted at in the opening paragraph, I consider this the better approach anyway. Even if you’re paying for Ultimate: what if you don’t want to use the Security dashboard because you already have a dashboard solution of your own, and you’d very much like everything available in one place?
There’s one additional benefit: now you control what happens depending on the scan result. Found a secret? Pipeline fails. Found a critical vulnerability? Pipeline fails. Found a secret, but it’s really your local dev Docker setup using default Postgres credentials? Who cares, pass.
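A minimal sketch of such a gate, assuming jq in the image, an id field on each finding, and a hand-maintained allowlist.txt of finding IDs you’ve decided to tolerate (all of these are assumptions for illustration):

```yaml
# Fail the pipeline on any finding that isn't on our own allowlist.
gate-on-findings:
  stage: do-something-with-reports
  script:
    - jq -r '.vulnerabilities[].id' gl-secret-detection-report.json > found.txt
    - |
      # grep prints findings absent from allowlist.txt; any output means failure
      if grep -vxFf allowlist.txt found.txt; then
        echo "Unexpected findings, failing the pipeline"
        exit 1
      fi
```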