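I have used GitLab pipelines on Mac and Windows before, and have installed and configured gitlab-runner there. Now I needed to go through the same process on Linux. I expected it to be easy: the flow was familiar and should be roughly the same as on the other platforms, so it would be done quickly. In practice there were differences. Some of the pitfalls came from environment restrictions (parts of the network are unreachable), others from not understanding how gitlab-runner works. This post collects the problems I ran into.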
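Background
- The network environment and the machine are restricted: many steps that need elevated privileges cannot be run with sudo from a normal user and have to be done as root
- It is a shared Linux machine that people log in to with their own personal accounts
- The configured pipeline calls a Python script, which in turn clones several project repositories
- The official guide has a small problem too: it mixes up how gitlab-runner is installed as a service and how it is started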
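Problems caused by account differences
Repositories cannot be cloned or accessed
If you log in to the Linux machine by hand and a git clone fails with the error below, you immediately suspect missing permissions, and the usual first step is to check whether an SSH key is configured or needs updating.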
    git clone git@qcp-gitlab.xxx.xyz:abc/abcdef.git
    Cloning into 'abcdef'...
    GitLab: The project you were looking for could not be found.
    fatal: Could not read from remote repository.
    Please make sure you have the correct access rights
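But when the same error shows up while gitlab-runner, a Jenkins agent, or a similar tool is cloning the repository on the machine, it can be confusing.
Each SSH key maps to a GitLab account. The SSH key used by root (/root/.ssh/id_rsa.pub) and a normal user's SSH key belong to different GitLab accounts, and the GitLab account behind root's key may not have access to the repository.
Note: configuring a key for your own normal user is not enough. The key used by root must be configured as well, and the GitLab account it maps to must be added as a member of the project.
Check which GitLab user an SSH connection authenticates as: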
    # As root: shows the name configured for the machine account
    ssh -T git@qcp-gitlab.xxxx.xyz
    Welcome to GitLab, qcpmaster!

    # As autotest: shows the user's own name
    ssh -T git@qcp-gitlab.xxxx.xyz
    Welcome to GitLab, 周星星!
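To compare several local accounts side by side, the check above can be wrapped in a small helper. This is only a sketch: the function names are hypothetical, it assumes `sudo -u` works for the caller (for example when run as root), and it accepts the greeting with or without a leading `@` before the account name.

```shell
# Extract the account name from the greeting printed by `ssh -T git@<host>`.
# Prints nothing if the text contains no greeting line.
parse_gitlab_identity() {
    printf '%s\n' "$1" |
        sed -n 's/^Welcome to GitLab, @\{0,1\}\(.*\)!$/\1/p'
}

# Ask the GitLab host which account the given local user authenticates as.
# BatchMode=yes makes ssh fail instead of prompting for a password.
gitlab_identity_for() {
    greeting=$(sudo -u "$1" -H ssh -T -o BatchMode=yes "git@$2" 2>&1)
    parse_gitlab_identity "$greeting"
}

# Usage (run as root; host taken from the example above):
#   for u in root autotest gitlab-runner; do
#       echo "$u -> $(gitlab_identity_for "$u" qcp-gitlab.xxxx.xyz)"
#   done
```

An empty result for a user means that account has no key GitLab recognizes, which is the situation the pipeline ran into.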
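Directories or files cannot be accessed
Some guides install gitlab-runner with the --user=gitlab-runner parameter. The service is then installed for that user, and jobs are later started and executed as that user too.
That user may lack access to many directories, so when a pipeline tries to create a directory or file it fails with a permission-denied error.
Fixes
When the repository cannot be cloned
- If the account has no key configured, switch to that user's environment, generate a key pair, and add the public key to GitLab
- In the GitLab project, add the account the jobs run as (qcpmaster here) as a member with at least the Reporter role (Developer also works), so that it can read the repository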
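When directories or files cannot be accessed
The fix is covered in the gitlab-runner user section later in this post.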
Problems caused by how the service is started
Important: do not mix gitlab-runner start/stop with systemctl commands!
Lesson learned
    # Manage the service through systemd
    sudo systemctl start gitlab-runner
    sudo systemctl stop gitlab-runner
    sudo systemctl restart gitlab-runner
    sudo systemctl status gitlab-runner

    # Do not mix these in
    gitlab-runner start
    gitlab-runner stop
The problem
The runner had clearly been started, yet jobs on GitLab stayed in the pending state. Checking the runner showed that the service had failed to start:
    (base) [root@dgvxl2905 tools]# gitlab-runner status
    Runtime platform    arch=amd64 os=linux pid=26119 revision=139a0ac0 version=18.4.0
    gitlab-runner: Service is running
    (base) [root@dgvxl2905 tools]# gitlab-runner stop
    Runtime platform    arch=amd64 os=linux pid=26150 revision=139a0ac0 version=18.4.0
    (base) [root@dgvxl2905 tools]# gitlab-runner start
    Runtime platform    arch=amd64 os=linux pid=26205 revision=139a0ac0 version=18.4.0
    (base) [root@dgvxl2905 tools]#
    (base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
    ● gitlab-runner.service - GitLab Runner
       Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
       Active: activating (auto-restart) (Result: exit-code) since Tue 2025-09-30 16:53:57 CST; 5s ago
      Process: 26218 ExecStart=/usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner (code=exited, status=1/FAILURE)
     Main PID: 26218 (code=exited, status=1/FAILURE)

    Sep 30 16:53:57 dgvxl2905 systemd[1]: Unit gitlab-runner.service entered failed state.
    Sep 30 16:53:57 dgvxl2905 systemd[1]: gitlab-runner.service failed.
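Cause: the gitlab-runner service had been started several times in different ways, and the configuration had become inconsistent. Some runners had already been deleted on the GitLab page, but the toml configuration written when they were registered was still on the machine. Every gitlab-runner start therefore picked up the old configuration: the status looked like running, but the service kept failing.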
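One way to spot stale entries is to list the runner names still present in config.toml and compare them with the runners shown on the GitLab page. A minimal sketch: `list_runner_names` is a hypothetical helper, and it assumes each runner's name sits on its own `name = "..."` line, as in the config.toml printed later in this post.

```shell
# Print the name of every runner entry found in a GitLab Runner config file.
# The body runs in a subshell so its variables do not leak into the caller.
list_runner_names() (
    config=${1:-/etc/gitlab-runner/config.toml}
    # Match lines of the form:   name = "some runner"
    sed -n 's/^[[:space:]]*name[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$config"
)

# Usage:
#   list_runner_names                      # default system-mode config
#   list_runner_names ~/.gitlab-runner/config.toml
```

Entries that no longer exist on the GitLab side are reported as failing by `gitlab-runner verify`; the Runner documentation also describes `gitlab-runner verify --delete` for removing them from the config.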
Fix
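    # Kill any leftover gitlab-runner processes
    pkill -9 gitlab-runner

    # Make sure the gitlab-runner user exists
    id gitlab-runner || useradd --system --shell /bin/bash --home /home/gitlab-runner gitlab-runner

    # Make sure the working directory exists and belongs to that user
    mkdir -p /home/tools/gitlab-runner
    chown -R gitlab-runner:gitlab-runner /home/tools/gitlab-runner

    # Verify the registered runners, then start through systemd only
    gitlab-runner verify
    systemctl daemon-reload
    systemctl start gitlab-runner
    systemctl status gitlab-runner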
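After a cleanup like this, it may be worth confirming that the only gitlab-runner process left is the one systemd started. A rough sketch: `stray_pids` and `find_stray_runners` are hypothetical helpers, and matching on the exact process name `gitlab-runner` is a heuristic rather than a guarantee.

```shell
# Print every PID from the argument list that differs from the first argument
# (the main PID of the systemd unit).
stray_pids() (
    main_pid=$1
    shift
    for pid in "$@"; do
        [ "$pid" = "$main_pid" ] || printf '%s\n' "$pid"
    done
)

# List gitlab-runner processes that were not started as the unit's main process.
find_stray_runners() {
    # `systemctl show -p MainPID` prints a line such as "MainPID=30494".
    main_pid=$(systemctl show -p MainPID gitlab-runner | cut -d= -f2)
    # pgrep output is left unquoted on purpose: one PID per argument.
    stray_pids "$main_pid" $(pgrep -x gitlab-runner)
}

# Usage:
#   find_stray_runners    # no output means nothing is running outside systemd
```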
The recorded log, from the failed start through the fix to a successful start:
    (base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
    ● gitlab-runner.service - GitLab Runner
       Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
       Active: inactive (dead) (Result: exit-code) since Tue 2025-09-30 17:01:28 CST; 12s ago
      Process: 28734 ExecStart=/usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner (code=exited, status=1/FAILURE)
     Main PID: 28734 (code=exited, status=1/FAILURE)

    Sep 30 16:59:56 dgvxl2905 systemd[1]: Unit gitlab-runner.service entered failed state.
    Sep 30 16:59:56 dgvxl2905 systemd[1]: gitlab-runner.service failed.
    Sep 30 17:01:28 dgvxl2905 systemd[1]: Stopped GitLab Runner.
    (base) [root@dgvxl2905 tools]# pkill -9 gitlab-runner
    (base) [root@dgvxl2905 tools]# id gitlab-runner || sudo useradd --system --shell /bin/bash --home /home/gitlab-runner gitlab-runner
    uid=5993(gitlab-runner) gid=5993(gitlab-runner) groups=5993(gitlab-runner)
    (base) [root@dgvxl2905 tools]# mkdir -p /home/tools/gitlab-runner
    (base) [root@dgvxl2905 tools]# chown -R gitlab-runner:gitlab-runner /home/tools/gitlab-runner
    (base) [root@dgvxl2905 tools]# ls /home/tools/gitlab-runner/
    (base) [root@dgvxl2905 tools]# cat /etc/gitlab-runner/config.toml
    concurrent = 1
    check_interval = 0
    connection_max_age = "15m0s"
    shutdown_timeout = 0

    [session_server]
      session_timeout = 1800

    [[runners]]
      name = "build aws-doc"
      url = "https://gitlab.vmic.xyz/"
      id = 3347
      token = "_pzPmGi-Nb3WByU1uS1E"
      token_obtained_at = 2025-09-30T08:31:35Z
      token_expires_at = 0001-01-01T00:00:00Z
      executor = "shell"
      [runners.cache]
        MaxUploadedArchiveSize = 0
        [runners.cache.s3]
        [runners.cache.gcs]
        [runners.cache.azure]
    (base) [root@dgvxl2905 tools]# gitlab-runner verify
    Runtime platform    arch=amd64 os=linux pid=30312 revision=139a0ac0 version=18.4.0
    Running in system-mode.

    Verifying runner... is alive    correlation_id=01K6CXRD9MN5Q0VH6TCFAFAEDV runner=_pzPmGi-N
    (base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
    ● gitlab-runner.service - GitLab Runner
       Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
       Active: inactive (dead) (Result: exit-code) since Tue 2025-09-30 17:01:28 CST; 1min 36s ago
      Process: 28734 ExecStart=/usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner (code=exited, status=1/FAILURE)
     Main PID: 28734 (code=exited, status=1/FAILURE)

    Sep 30 16:59:56 dgvxl2905 systemd[1]: Unit gitlab-runner.service entered failed state.
    Sep 30 16:59:56 dgvxl2905 systemd[1]: gitlab-runner.service failed.
    Sep 30 17:01:28 dgvxl2905 systemd[1]: Stopped GitLab Runner.
    (base) [root@dgvxl2905 tools]# systemctl daemon-reload
    (base) [root@dgvxl2905 tools]# systemctl start gitlab-runner
    (base) [root@dgvxl2905 tools]# systemctl status gitlab-runner
    ● gitlab-runner.service - GitLab Runner
       Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; vendor preset: disabled)
       Active: active (running) since Tue 2025-09-30 17:03:17 CST; 22s ago
     Main PID: 30494 (gitlab-runner)
        Tasks: 13
       Memory: 22.7M
       CGroup: /system.slice/gitlab-runner.service
               └─30494 /usr/local/bin/gitlab-runner run --config /etc/gitlab-runner/config.toml --working-directory /home/tools/gitlab-runner --service gitlab-runner --user gitlab-runner

    Sep 30 17:03:18 dgvxl2905 su[30521]: (to gitlab-runner) root on none
    Sep 30 17:03:18 dgvxl2905 su[30549]: (to gitlab-runner) root on none
    Sep 30 17:03:20 dgvxl2905 su[30609]: (to gitlab-runner) root on none
    Sep 30 17:03:20 dgvxl2905 su[30637]: (to gitlab-runner) root on none
    Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: WARNING: Job failed: exit status 1
    Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: duration_s=0 job=508688 project=61926 runner=_pzPmGi-N
    Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Appending trace to coordinator...ok code=202 correlation_id=01K6CXSBK54JNYV0JKE1GSSV9K job=508688 job-log=0-1504 job-status=ru...e-interval=3s
    Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Updating job... bytesize=1504 checksum=crc32:268fa141 job=508688 runner=_pzPmGi-N
    Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Submitting job to coordinator...ok bytesize=1504 checksum=crc32:268fa141 code=200 correlation_id=01K6CXSBPW8M8CX40YVG25DN7X j...e-interval=0s
    Sep 30 17:03:20 dgvxl2905 gitlab-runner[30494]: Removed job from processing list builds=0 job=508688 max_builds=1 project=61926 queue_depth=0 queue_size=0 repo_url=https:/...eue_seconds=0
    Hint: Some lines were ellipsized, use -l to show in full.
Summary
The install and start procedure, distilled
    # Download the binary and make it executable
    curl -L --output /usr/local/bin/gitlab-runner "https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/binaries/gitlab-runner-linux-amd64"
    chmod +x /usr/local/bin/gitlab-runner

    # Create the user that jobs will run as, if it does not exist yet
    id gitlab-runner || useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash

    # Install the service for that user
    gitlab-runner install --user=gitlab-runner --working-directory=/home/tools/gitlab-runner

    # Start it through systemd
    systemctl start gitlab-runner
Note: the official guide (Install GitLab Runner manually on GNU/Linux | GitLab Docs) runs gitlab-runner install first and then gitlab-runner start. I would avoid that and start the service with systemctl instead.
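Security considerations
- Install phase: requires root privileges (use sudo)
- Run phase: should use a dedicated non-privileged user (the gitlab-runner user by default)
- Registration: register with sudo (the configuration file has to be written to a system directory)
The gitlab-runner user
Errors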
    $ set -euo pipefail
    error: could not lock config file /home/gitlab-runner/.gitconfig: No such file or directory,

    mkdir: cannot create directory ‘/home/autotest’: Permission denied
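Cause
- Git configuration file problem. GitLab Runner executes jobs as the gitlab-runner user, but that user's home directory is not fully set up, so git cannot create or lock /home/gitlab-runner/.gitconfig
- Paths in .gitlab-ci.yml point under /home/autotest, and the gitlab-runner user has no permission to create or access that directory or its subdirectories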
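The first cause can be confirmed before changing anything by checking whether the home directory recorded for the user actually exists. A minimal sketch with hypothetical helper names; it assumes `getent` is available on the machine.

```shell
# Field 6 of a passwd entry is the user's home directory.
home_of_passwd_entry() {
    printf '%s\n' "$1" | cut -d: -f6
}

# Report whether the home directory recorded for a user exists on disk.
check_home() (
    user=$1
    entry=$(getent passwd "$user") || { echo "no such user: $user" >&2; exit 1; }
    home=$(home_of_passwd_entry "$entry")
    if [ -d "$home" ]; then
        echo "ok: $home exists"
    else
        echo "missing: $home"
    fi
)

# Usage:
#   check_home gitlab-runner
```

A "missing" result matches the .gitconfig error above: the account exists, but nothing ever created its home directory.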
Fix
Create and repair the gitlab-runner home directory.
    sudo mkdir -p /home/gitlab-runner
    sudo chown -R gitlab-runner:gitlab-runner /home/gitlab-runner
    sudo chmod 755 /home/gitlab-runner
    sudo -u gitlab-runner touch /home/gitlab-runner/.gitconfig
    sudo chmod 644 /home/gitlab-runner/.gitconfig
Use CI environment variables in the yml
    variables:
      TARGET_BUILD_DIR: "${CI_PROJECT_DIR}/doc/build"
      LOG_DIR: "${CI_PROJECT_DIR}/logs"
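Inside the job script, those directories can then be created under the checkout, where the gitlab-runner user already has write access. A small sketch of what a job step might do: `prepare_build_dirs` is a hypothetical name, while `CI_PROJECT_DIR` is the predefined variable GitLab sets for every job.

```shell
# Create the build and log directories under the job's checkout directory.
# Aborts with a message if CI_PROJECT_DIR is not set (i.e. outside a CI job).
prepare_build_dirs() {
    : "${CI_PROJECT_DIR:?CI_PROJECT_DIR is not set}"
    TARGET_BUILD_DIR="${CI_PROJECT_DIR}/doc/build"
    LOG_DIR="${CI_PROJECT_DIR}/logs"
    mkdir -p "$TARGET_BUILD_DIR" "$LOG_DIR"
}

# Usage in a job script:
#   prepare_build_dirs
#   some_build_command > "$LOG_DIR/build.log" 2>&1
```

Because nothing here refers to /home/autotest, the job no longer depends on which local account the runner happens to use.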
Grant the gitlab-runner user access (if the /home/autotest path must be used)
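    # Add gitlab-runner to the autotest group
    sudo usermod -aG autotest gitlab-runner

    # Open up the directories the pipeline needs
    sudo chmod 755 /home/autotest
    sudo chmod -R 755 /home/autotest/tools
    sudo chmod -R 755 /home/autotest/vlt

    # Restart so the running service picks up the new group membership
    sudo systemctl restart gitlab-runner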
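What 755 means:
7 (owner): read + write + execute
5 (group): read + execute
5 (others): read + execute ← this is the key part: any user (including gitlab-runner) can read and enter these directories without joining any group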
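The digit-by-digit reading above can be checked with a small decoder that turns an octal mode into the rwx string shown by `ls -l`. This is an illustrative helper only; `mode_to_rwx` is a hypothetical name and it handles plain three-digit modes, not setuid or sticky bits.

```shell
# Convert an octal permission mode such as 755 into its rwx form.
# The body runs in a subshell so its variables do not leak into the caller.
mode_to_rwx() (
    mode=$1
    out=""
    while [ -n "$mode" ]; do
        rest=${mode#?}            # everything after the first digit
        digit=${mode%"$rest"}     # the first digit itself
        mode=$rest
        case $digit in
            0) out="${out}---" ;;
            1) out="${out}--x" ;;
            2) out="${out}-w-" ;;
            3) out="${out}-wx" ;;
            4) out="${out}r--" ;;
            5) out="${out}r-x" ;;
            6) out="${out}rw-" ;;
            7) out="${out}rwx" ;;
            *) echo "not an octal digit: $digit" >&2; exit 1 ;;
        esac
    done
    printf '%s\n' "$out"
)

# Usage:
#   mode_to_rwx 755    # owner, group, others in that order
#   mode_to_rwx 644
```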
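Repository permissions
Error messages:
Host key verification failed - the known_hosts entry is missing
Could not read from remote repository - the SSH key or the access rights are missing
Analysis
- Manual clone works: it runs as root or autotest, which have SSH keys configured
- Pipeline fails: the gitlab-runner user has no SSH key and has not recorded the GitLab server's host key
Best fix:
Switch to the gitlab-runner user directly, generate a key with ssh-keygen, and add the public key to GitLab so that this user can access the repository.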
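The two error messages can be told apart mechanically, which helps when they are buried in a long job log. A sketch under assumptions: `diagnose_git_ssh_error` is a hypothetical helper, and the remediation commands in the comments (ssh-keyscan, ssh-keygen) are common practice rather than steps taken from this post.

```shell
# Classify a git/ssh error text into the likely missing piece.
# The host-key check comes first because git usually prints
# "Could not read from remote repository" after a host-key failure as well.
diagnose_git_ssh_error() (
    case $1 in
        *"Host key verification failed"*)
            # Likely remedy, run as the gitlab-runner user (verify the key
            # fingerprint out of band before trusting it):
            #   ssh-keyscan <gitlab-host> >> ~/.ssh/known_hosts
            echo "known_hosts" ;;
        *"Could not read from remote repository"*)
            # Likely remedy, run as the gitlab-runner user:
            #   ssh-keygen -t ed25519
            # then add the .pub file to a GitLab account with project access.
            echo "ssh_key_or_access" ;;
        *)
            echo "unknown" ;;
    esac
)

# Usage:
#   diagnose_git_ssh_error "$(git clone git@<gitlab-host>:group/repo.git 2>&1)"
```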
GitLab Runner logs
For a Runner managed by systemd
    # Follow the log in real time
    sudo journalctl -u gitlab-runner -f
    # Show the last 100 lines
    sudo journalctl -u gitlab-runner -n 100
    # Show a specific time range
    sudo journalctl -u gitlab-runner --since "2024-09-30 10:00:00" --until "2024-09-30 12:00:00"
    # Show today's entries
    sudo journalctl -u gitlab-runner --since today
    # Show errors only
    sudo journalctl -u gitlab-runner -p err
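For a quick health signal, the journal output can be piped through a counter for failed jobs. A minimal sketch: `count_failed_jobs` is a hypothetical helper, and it assumes failures are logged with the text `WARNING: Job failed`, as in the service log shown earlier in this post.

```shell
# Count "Job failed" warnings in runner log text read from standard input.
# grep -c exits non-zero when the count is 0, so normalize the exit status.
count_failed_jobs() {
    grep -c 'WARNING: Job failed' || true
}

# Usage:
#   sudo journalctl -u gitlab-runner --since today | count_failed_jobs
```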
Check Runner status
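    # Service status
    sudo systemctl status gitlab-runner

    # Version, and whether the registered runners can reach GitLab
    sudo gitlab-runner --version
    sudo gitlab-runner verify

    # Configuration file and its permissions
    cat /etc/gitlab-runner/config.toml
    ls -la /etc/gitlab-runner/config.toml

    # List the configured runners
    sudo gitlab-runner list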