Understanding gitlab-runner in Practice

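I have used GitLab pipelines on Mac and Windows before, and installed and configured gitlab-runner there. Now I needed to go through the same process on Linux. At first I assumed it would be easy: the workflow was familiar, it should be roughly the same as on the other platforms, and I would be done quickly. In practice there were differences. Some of the pitfalls came from environment restrictions (parts of the network were unreachable), and others from not understanding how gitlab-runner works. Here is a summary of the problems I ran into.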

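Background

  • The network environment is unusual: many steps that need elevated privileges cannot use sudo from a normal user account and have to be run as root
  • It is a shared Linux machine that people log in to with their own personal accounts
  • The configured pipeline calls a python script, which in turn pulls code from several project repositories
  • The official guide also has a small problem: it mixes gitlab-runner install with a different way of starting the service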

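Problems caused by account differences

Repositories cannot be cloned or accessed

If you log in to the Linux machine by hand and run git clone, and the clone fails with the error below, you immediately suspect a permission problem and usually check whether an ssh key is configured or needs updating.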

git clone git@qcp-gitlab.xxx.xyz:abc/abcdef.git
Cloning into 'abcdef'...
GitLab: The project you were looking for could not be found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights

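However, when the same error shows up while a tool such as gitlab-runner or a jenkins agent is pulling the repository on that machine, it can be confusing.

ssh keys map to GitLab accounts: the SSH key used by root (/root/.ssh/id_rsa.pub) and the SSH key of a normal user correspond to different GitLab accounts, and the GitLab account behind root's key may not have access to the repository.

Note: having your own key configured under a normal user is not enough. The key of root (or of whichever user actually runs the job) must be registered as well, and the GitLab account it maps to must be added as a project member.

To check which GitLab user an ssh connection authenticates as: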

# As root: shows the name configured for the machine account
ssh -T git@qcp-gitlab.xxxx.xyz
Welcome to GitLab, qcpmaster!

# As autotest: shows the user's own name
ssh -T git@qcp-gitlab.xxxx.xyz
Welcome to GitLab, 周星星!
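
To compare identities across accounts without reading the greeting by eye, the user name can be pulled out of the ssh -T output. This is a minimal sketch that assumes the greeting has the form shown above (Welcome to GitLab, <name>!); the function name gitlab_identity is made up for illustration.

```shell
#!/bin/sh
# gitlab_identity: read `ssh -T` output on stdin and print the GitLab user name.
# Assumes the greeting format shown above: "Welcome to GitLab, <name>!"
gitlab_identity() {
  sed -n 's/^Welcome to GitLab, \(.*\)!$/\1/p'
}

# Example (not run here):
#   ssh -T git@qcp-gitlab.xxxx.xyz 2>&1 | gitlab_identity
```

Running the same pipeline as root, as autotest, and as the user that executes jobs (for example through sudo -H -u gitlab-runner) shows which GitLab account each local user maps to.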

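Directories or files cannot be accessed

Some installation guides pass --user=gitlab-runner when installing gitlab-runner, so the service is installed for that user and later starts and runs jobs as that user as well.

That user may lack permission on many directories, so when the pipeline tries to create a directory or file, it fails with a permission error.

Solutions

For repositories that cannot be cloned

  • If the account has no key configured, switch to that user's environment, generate a key pair, and add the public key to gitlab
  • In the gitlab project, add the user that the execution environment authenticates as (qcpmaster here) as a member (at least Developer/Reporter) so that it can access the repository code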

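For directories or files that cannot be accessed

  • Give the gitlab-runner user the permissions it needs, for example on the specific directories used by the pipeline

    # Run as root, or prefix with sudo
    chown -R gitlab-runner:gitlab-runner /home/tools/gitlab-runner
    chmod -R 755 /home/tools/gitlab-runner
  • Make sure the gitlab build cache directory is accessible

    sudo chown -R gitlab-runner:gitlab-runner /home/gitlab-runner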
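
Before changing ownership, it can help to confirm which directories are actually the problem. The following is a small sketch meant to be run as the user that executes jobs; the function name check_access is hypothetical, and the directories passed to it are whatever the pipeline uses.

```shell
#!/bin/sh
# check_access DIR...: for each directory, report whether the *current* user
# can write to and enter it. Run it as the job user to see what a job would see.
check_access() {
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing  $dir"
    elif [ -w "$dir" ] && [ -x "$dir" ]; then
      echo "ok       $dir"
    else
      echo "denied   $dir"
    fi
  done
}

# Example (not run here), after saving this file as check_access.sh:
#   sudo -u gitlab-runner sh -c '. ./check_access.sh; check_access /home/tools/gitlab-runner /home/gitlab-runner'
```

Note that root passes the write test on almost any directory, so the result is only meaningful when the function runs as the unprivileged job user.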

Problems caused by how the service is started

Important: do not mix gitlab-runner start/stop with systemctl commands!

Lesson learned

# ✅ Correct (managed through systemd)
sudo systemctl start gitlab-runner
sudo systemctl stop gitlab-runner
sudo systemctl restart gitlab-runner
sudo systemctl status gitlab-runner

# ❌ Not recommended (direct commands, not going through systemd)
gitlab-runner start
gitlab-runner stop
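
One way to notice that a runner was started outside the systemd unit is to compare the unit's main PID with the gitlab-runner processes that actually exist. This is a rough sketch: stray_pids is a hypothetical helper, and it assumes that the only legitimate gitlab-runner process is the unit's main process.

```shell
#!/bin/sh
# stray_pids MAIN_PID PID...: print every PID that is not the unit's main PID.
stray_pids() {
  main_pid=$1
  shift
  for pid in "$@"; do
    if [ "$pid" != "$main_pid" ]; then
      echo "$pid"
    fi
  done
}

# Example (not run here):
#   main=$(systemctl show -p MainPID gitlab-runner | cut -d= -f2)
#   stray_pids "$main" $(pgrep -x gitlab-runner)
```

An empty result suggests that systemd owns the only runner process; any PID printed is a candidate for a runner that was launched by hand.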

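Reproducing the problem

The runner had apparently been started, yet jobs on GitLab stayed in the pending state. Checking the runner showed that the service had actually failed to start: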

(base) [root@dgvxl2905 tools]# gitlab-runner status
Runtime platform arch=amd64 os=linux pid=26119 revision=139a0ac0 version=18.4.0
gitlab-runner: Service is running
(base) [root@dgvxl2905 tools]# gitlab-runner stop
Runtime platform arch=amd64 os=linux pid=26150 revision=139a0ac0 version=18.4.0
(base) [root@dgvxl2905 tools]# gitlab-runner start
Runtime platform arch=amd64 os=linux pid=26205 revision=139a0ac0 version=18.4.0
(base) [root@dgvxl2905 tools]#
(base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
● gitlab-runner.service - GitLab Runner
Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2025-09-30 16:53:57 CST; 5s ago
Process: 26218 ExecStart=/usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner (code=exited, status=1/FAILURE)
Main PID: 26218 (code=exited, status=1/FAILURE)
Sep 30 16:53:57 dgvxl2905 systemd[1]: Unit gitlab-runner.service entered failed state.
Sep 30 16:53:57 dgvxl2905 systemd[1]: gitlab-runner.service failed.

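Cause: the gitlab-runner service had been started multiple times and the configuration was a mess. Some runners had already been deleted on the GitLab page, but the toml configuration written when they were set up was still on the machine, so every gitlab-runner start picked up the stale configuration. The runner looked like it was running, while the service itself had failed.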
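
To spot stale entries, the runners recorded in the local configuration can be listed and compared with what the GitLab page shows. This is a minimal sketch that relies only on the name = "..." lines of config.toml, as they appear in the file dumped in the log further down; runner_names is a hypothetical helper.

```shell
#!/bin/sh
# runner_names FILE: print the name of every runner entry found in a config.toml.
# Only matches lines of the form: name = "..."
runner_names() {
  sed -n 's/^[[:space:]]*name[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# Example (not run here):
#   runner_names /etc/gitlab-runner/config.toml
```

Any name printed here that no longer exists on the GitLab page is a leftover registration worth cleaning up before restarting the service.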

Fix

# Everything below is run as root, so sudo is omitted
# 1. Stop all runner processes
pkill -9 gitlab-runner
# 2. Check the user and create it if it is missing
id gitlab-runner || useradd --system --shell /bin/bash --home /home/gitlab-runner gitlab-runner
# 3. Fix the working directory, in case the folder does not exist
mkdir -p /home/tools/gitlab-runner
chown -R gitlab-runner:gitlab-runner /home/tools/gitlab-runner
# 4. Verify the registration, then start the service through systemd instead of the direct command
gitlab-runner verify
systemctl daemon-reload
systemctl start gitlab-runner
systemctl status gitlab-runner

The log recorded from the failed service start through to the fix and the successful start:

(base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
● gitlab-runner.service - GitLab Runner
Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
Active: inactive (dead) (Result: exit-code) since Tue 2025-09-30 17:01:28 CST; 12s ago
Process: 28734 ExecStart=/usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner (code=exited, status=1/FAILURE)
Main PID: 28734 (code=exited, status=1/FAILURE)

Sep 30 16:59:56 dgvxl2905 systemd[1]: Unit gitlab-runner.service entered failed state.
Sep 30 16:59:56 dgvxl2905 systemd[1]: gitlab-runner.service failed.
Sep 30 17:01:28 dgvxl2905 systemd[1]: Stopped GitLab Runner.
(base) [root@dgvxl2905 tools]# pkill -9 gitlab-runner
(base) [root@dgvxl2905 tools]# id gitlab-runner || sudo useradd --system --shell /bin/bash --home /home/gitlab-runner gitlab-runner
uid=5993(gitlab-runner) gid=5993(gitlab-runner) groups=5993(gitlab-runner)
(base) [root@dgvxl2905 tools]# mkdir -p /home/tools/gitlab-runner
(base) [root@dgvxl2905 tools]# chown -R gitlab-runner:gitlab-runner /home/tools/gitlab-runner
(base) [root@dgvxl2905 tools]# ls /home/tools/gitlab-runner/
(base) [root@dgvxl2905 tools]# cat /etc/gitlab-runner/config.toml
concurrent = 1
check_interval = 0
connection_max_age = "15m0s"
shutdown_timeout = 0

[session_server]
session_timeout = 1800

[[runners]]
name = "build aws-doc"
url = "https://gitlab.vmic.xyz/"
id = 3347
token = "_pzPmGi-Nb3WByU1uS1E"
token_obtained_at = 2025-09-30T08:31:35Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "shell"
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
(base) [root@dgvxl2905 tools]# gitlab-runner verify
Runtime platform arch=amd64 os=linux pid=30312 revision=139a0ac0 version=18.4.0
Running in system-mode.

Verifying runner... is alive correlation_id=01K6CXRD9MN5Q0VH6TCFAFAEDV runner=_pzPmGi-N
(base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
● gitlab-runner.service - GitLab Runner
Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
Active: inactive (dead) (Result: exit-code) since Tue 2025-09-30 17:01:28 CST; 1min 36s ago
Process: 28734 ExecStart=/usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner (code=exited, status=1/FAILURE)
Main PID: 28734 (code=exited, status=1/FAILURE)

Sep 30 16:59:56 dgvxl2905 systemd[1]: Unit gitlab-runner.service entered failed state.
Sep 30 16:59:56 dgvxl2905 systemd[1]: gitlab-runner.service failed.
Sep 30 17:01:28 dgvxl2905 systemd[1]: Stopped GitLab Runner.
(base) [root@dgvxl2905 tools]# systemctl daemon-reload
(base) [root@dgvxl2905 tools]# systemctl start gitlab-runner
(base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
● gitlab-runner.service - GitLab Runner
Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2025-09-30 17:03:17 CST; 22s ago
Main PID: 30494 (gitlab-runner)
Tasks: 13
Memory: 22.7M
CGroup: /system.slice/gitlab-runner.service
└─30494 /usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner

Sep 30 17:03:18 dgvxl2905 su[30521]: (to gitlab-runner) root on none
Sep 30 17:03:18 dgvxl2905 su[30549]: (to gitlab-runner) root on none
Sep 30 17:03:20 dgvxl2905 su[30609]: (to gitlab-runner) root on none
Sep 30 17:03:20 dgvxl2905 su[30637]: (to gitlab-runner) root on none
Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: WARNING: Job failed: exit status 1
Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: duration_s=0 job=508688 project=61926 runner=_pzPmGi-N
Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Appending trace to coordinator...ok code=202 correlation_id=01K6CXSBK54JNYV0JKE1GSSV9K job=508688 job-log=0-1504 job-status=ru...e-interval=3s
Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Updating job... bytesize=1504 checksum=crc32:268fa141 job=508688 runner=_pzPmGi-N
Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Submitting job to coordinator...ok bytesize=1504 checksum=crc32:268fa141 code=200 correlation_id=01K6CXSBPW8M8CX40YVG25DN7X j...e-interval=0s
Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Removed job from processing list builds=0 job=508688 max_builds=1 project=61926 queue_depth=0 queue_size=0 repo_url=https:/...eue_seconds=0
Hint: Some lines were ellipsized, use -l to show in full.

Summary

Distilled installation and startup steps

# Download the binary and make it executable
curl -L --output /usr/local/bin/gitlab-runner "https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/binaries/gitlab-runner-linux-amd64"
chmod +x /usr/local/bin/gitlab-runner
# Create the user
id gitlab-runner || useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash
# This step creates the systemd service; from here on systemd knows about the gitlab-runner service
gitlab-runner install --user=gitlab-runner --working-directory=/home/tools/gitlab-runner
# At this point, prefer systemd over gitlab-runner start
systemctl start gitlab-runner

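Note: the official guide, Install GitLab Runner manually on GNU/Linux | GitLab Docs, runs gitlab-runner install first and then gitlab-runner start, which is not a good combination.

Security considerations

  1. Installation: requires root privileges (use sudo)
  2. Running: jobs should run as a dedicated unprivileged user (gitlab-runner by default)
  3. Registration: register with sudo (the configuration file is written to a system directory)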
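
Which user the service actually runs jobs as can be read from the unit's ExecStart line, which carries the --user argument in the systemctl status output shown earlier. This is a small sketch; service_user is a hypothetical helper that only parses text.

```shell
#!/bin/sh
# service_user: read an ExecStart line on stdin and print the value of --user.
# Handles both "--user NAME" and "--user=NAME".
service_user() {
  sed -n 's/.*--user[= ]\([^ ]*\).*/\1/p'
}

# Example (not run here):
#   systemctl show -p ExecStart gitlab-runner | service_user
```

If the result is not the account you expected, permission and SSH key problems in jobs are usually explained by that mismatch.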

The gitlab-runner user

Errors

$ set -euo pipefail # collapsed multi-line command
error: could not lock config file /home/gitlab-runner/.gitconfig: No such file or directory,

mkdir: cannot create directory ‘/home/autotest’: Permission denied

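Causes

  • Git configuration file problem. GitLab Runner executes jobs as the gitlab-runner user, but that user's home directory was not fully set up, so git could not lock /home/gitlab-runner/.gitconfig
  • In the .gitlab-ci.yml configuration, paths point under /home/autotest, and the gitlab-runner user has no permission to create or access that directory and its subdirectories.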
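
Whether the home directory recorded for the account really exists can be checked directly. This is a sketch under the assumption that the account is visible through getent passwd; home_of is a hypothetical helper that parses passwd-format lines.

```shell
#!/bin/sh
# home_of USER: read passwd-format lines on stdin, print USER's home directory.
home_of() {
  awk -F: -v user="$1" '$1 == user { print $6 }'
}

# Example (not run here):
#   home=$(getent passwd gitlab-runner | home_of gitlab-runner)
#   [ -d "$home" ] || echo "home directory $home does not exist"
```

A missing directory here matches the "No such file or directory" part of the .gitconfig error above.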

Fix

Create and repair the gitlab-runner home directory.

sudo mkdir -p /home/gitlab-runner
sudo chown -R gitlab-runner:gitlab-runner /home/gitlab-runner
sudo chmod 755 /home/gitlab-runner
sudo -u gitlab-runner touch /home/gitlab-runner/.gitconfig
sudo chmod 644 /home/gitlab-runner/.gitconfig

Use CI environment variables in the yml

variables:
  # Use the CI working directory or a shared directory
  TARGET_BUILD_DIR: "${CI_PROJECT_DIR}/doc/build"
  LOG_DIR: "${CI_PROJECT_DIR}/logs"
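
With those variables in place, the job script can create the directories itself instead of depending on paths owned by another user. This is a sketch of such a script step; the fallback to the current directory when CI_PROJECT_DIR is unset is an assumption added so that the snippet also runs outside CI, and prepare_dirs is a hypothetical name.

```shell
#!/bin/sh
# prepare_dirs: create the build and log directories under the CI project dir.
# CI_PROJECT_DIR is set by GitLab during a job; outside CI we fall back to $PWD.
prepare_dirs() {
  base="${CI_PROJECT_DIR:-$PWD}"
  TARGET_BUILD_DIR="${TARGET_BUILD_DIR:-$base/doc/build}"
  LOG_DIR="${LOG_DIR:-$base/logs}"
  mkdir -p "$TARGET_BUILD_DIR" "$LOG_DIR"
}
```

Because everything lives under the project checkout, the job user already owns the parent directory and no extra chown or chmod is needed.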

Grant the gitlab-runner user access (if the /home/autotest path really must be used)

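# Option A: add gitlab-runner to the autotest group
sudo usermod -aG autotest gitlab-runner

# Option B: set directory permissions so that group members and other users can read and enter them
sudo chmod 755 /home/autotest
sudo chmod -R 755 /home/autotest/tools
sudo chmod -R 755 /home/autotest/vlt

# Restart gitlab-runner so that the new group membership takes effect
sudo systemctl restart gitlab-runner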

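What 755 means:

  • 7 (owner): read + write + execute
  • 5 (group): read + execute
  • 5 (others): read + execute ← this is the key part: any user (including gitlab-runner) can read and enter these directories without joining any group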
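
The digit-to-permission mapping can be worked out mechanically: each octal digit is a sum of 4 (read), 2 (write) and 1 (execute). The helper below is an illustrative sketch, and mode_to_rwx is a made-up name.

```shell
#!/bin/sh
# mode_to_rwx MODE: turn a three-digit octal mode such as 755 into rwxr-xr-x.
mode_to_rwx() {
  out=""
  # Split the mode into single digits: "755" -> "7 5 5"
  for digit in $(printf '%s' "$1" | sed 's/./& /g'); do
    r=-; w=-; x=-
    if [ $((digit & 4)) -ne 0 ]; then r=r; fi
    if [ $((digit & 2)) -ne 0 ]; then w=w; fi
    if [ $((digit & 1)) -ne 0 ]; then x=x; fi
    out="$out$r$w$x"
  done
  printf '%s\n' "$out"
}

mode_to_rwx 755   # prints rwxr-xr-x
mode_to_rwx 644   # prints rw-r--r--
```

On a directory, the x bit is what allows a user to enter it, which is why 755 lets other users traverse the directory while 644 would not.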

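Repository permissions

Error messages:

  • Host key verification failed - the known_hosts entry is missing
  • Could not read from remote repository - the SSH key is missing or lacks permission

Analysis

  1. Manual clone works: it runs as root or autotest, which have SSH keys configured
  2. Pipeline fails: the gitlab-runner user has no SSH key and has not recorded the GitLab server's host key

Best fix:

Switch to the gitlab-runner user directly, generate a key with ssh-keygen, and register the public key so that it has access to the repository.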
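
The key setup for the job user can be sketched as follows. The ed25519 key type, the empty passphrase, the key comment and the function name ensure_ssh_key are all assumptions chosen for illustration; the original setup on this machine used an id_rsa key for root.

```shell
#!/bin/sh
# ensure_ssh_key HOME_DIR: create HOME_DIR/.ssh (mode 700) and generate an
# ed25519 key pair without a passphrase, unless one already exists.
# Prints the public key so it can be pasted into GitLab.
ensure_ssh_key() {
  ssh_dir="$1/.ssh"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  if [ ! -f "$ssh_dir/id_ed25519" ]; then
    ssh-keygen -q -t ed25519 -N "" -C "gitlab-runner" -f "$ssh_dir/id_ed25519"
  fi
  cat "$ssh_dir/id_ed25519.pub"
}

# Example (not run here), executed as the gitlab-runner user:
#   ensure_ssh_key "$HOME"
#   ssh-keyscan qcp-gitlab.xxxx.xyz >> "$HOME/.ssh/known_hosts"
```

The ssh-keyscan line addresses the Host key verification failed error, but it trusts whatever key the server presents, so the fingerprint is worth checking against a known-good value before relying on it.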

GitLab Runner logs

For a Runner managed by systemd

# Follow the log in real time
sudo journalctl -u gitlab-runner -f
# Show the most recent log lines
sudo journalctl -u gitlab-runner -n 100
# Show logs for a specific time range
sudo journalctl -u gitlab-runner --since "2024-09-30 10:00:00" --until "2024-09-30 12:00:00"
# Show today's logs
sudo journalctl -u gitlab-runner --since today
# Show only error-level logs
sudo journalctl -u gitlab-runner -p err
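
When the journal is long, it can be useful to reduce it to the job ids it mentions and then look those jobs up in GitLab. This is a minimal sketch that relies on the job=<number> field visible in the runner log lines earlier in this post; job_ids is a hypothetical helper.

```shell
#!/bin/sh
# job_ids: read runner log lines on stdin and print each distinct job id once.
# Relies on the key=value field "job=<number>" that the runner writes in its logs.
job_ids() {
  grep -o 'job=[0-9][0-9]*' | cut -d= -f2 | sort -un
}

# Example (not run here):
#   sudo journalctl -u gitlab-runner --since today | job_ids
```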

Checking Runner status

# Show the Runner service status
sudo systemctl status gitlab-runner

# Show the Runner version and verify the registered runners
sudo gitlab-runner --version
sudo gitlab-runner verify
# Show the configuration file contents and its permissions
cat /etc/gitlab-runner/config.toml
ls -la /etc/gitlab-runner/config.toml

# List all registered Runners
sudo gitlab-runner list