Quantcast
Channel: Posts on Nova Kwok's Awesome Blog
Viewing all articles
Browse latest Browse all 90

在 Kubernetes 上运行 GitHub Actions Self-hosted Runner

$
0
0

GitHub Actions 很不错,相比较 Travis CI 而言排队不是很严重,除了用于 CI/CD 以外还可以通过提取内部的 DockerHub Credential 放到本地用于 docker pull 来避开 Docker Hub 的 429 Ratelimit 问题(参考:「 同步 docker hub library 镜像到本地 registry 」),对于一些小项目而言,GitHub Actions 提供的 Standard_DS2_v2 虚拟机确实性能还行,但是如果对于以下需求,使用 GitHub Actions 自带的机器可能就不是很合适了:

  • 编译 TiKV(Standard_DS2_v2 的 2C7G 的机器用 build dist_release 可以编译到死(或者 OOM))
  • 需要一些内部镜像协作,或使用到内网资源
  • 私有仓库,且需要大量编译(官方的 Action 对于私有仓库只有 2000 分钟的使用时间)
  • 需要更大的存储空间(官方的 GitHub Actions 只有 15G 不到的可用空间)

这种时候,我们就需要使用 Self-hosted Runner,什么是 Self-hosted Runner?

Self-hosted runners offer more control of hardware, operating system, and software tools than GitHub-hosted runners provide. With self-hosted runners, you can choose to create a custom hardware configuration with more processing power or memory to run larger jobs, install software available on your local network, and choose an operating system not offered by GitHub-hosted runners. Self-hosted runners can be physical, virtual, in a container, on-premises, or in a cloud.

对于一个 Org 而言,要添加一个 Org Level (全 Org 共享的) Runner 比较简单,只需要:

mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64-2.278.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.278.0/actions-runner-linux-x64-2.278.0.tar.gz
./config.sh --url https://github.com/some-github-org --token AF5TxxxxxxxxxxxA6PRRS
./run.sh

你就可以获得一个 Self hosted Runner 了,但是这样做会有一些局限性,比如:

  • 没法弹性扩容,只能一个个手动部署
  • 直接部署在裸机上,会有环境不一致的问题

Runner in Containter

Simple Docker

为了解决这个问题,我们需要把 GitHub Runner 给容器化,这里提供一个 Dockerfile 的 Example (魔改自:https://github.com/SanderKnape/github-runner),由于需要使用到类似 dind 的环境(在 Actions 中直接使用到 Docker 相关的指令),所以我加入了 docker 的 binary 进去,由于默认 Runner 不允许以 root 权限运行,为了避开后续挂载宿主机 Docker 的 sock 导致的权限问题,使用的 GitHub Runner 是一个经过修改的版本,修改版本中让 Runner 可以以 root 权限运行,修改的脚本如下:

wget https://github.com/actions/runner/releases/download/v2.278.0/actions-runner-linux-x64-2.278.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.278.0.tar.gz && rm -f actions-runner-linux-x64-2.278.0.tar.gz

# 这里删除了两个文件中判断是否 root 用户的部分
sed -i '3,9d' ./config.sh
sed -i '3,8d' ./run.sh
# End

# 重新打包
tar -czf actions-runner-linux-x64-2.278.0.tar.gz *

# 删除解压出来的不需要的文件
rm -rf bin config.sh env.sh externals run.sh

然后 Dockerfile 可以这么写

FROM ubuntu:18.04

ENV GITHUB_PAT ""
ENV GITHUB_ORG_NAME ""
ENV RUNNER_WORKDIR "_work"
ENV RUNNER_LABELS ""

RUN apt-get update \
    && apt-get install -y curl sudo git jq iputils-ping zip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && curl https://download.docker.com/linux/static/stable/x86_64/docker-20.10.7.tgz --output docker-20.10.7.tgz \
    && tar xvfz docker-20.10.7.tgz \
    && cp docker/* /usr/bin/

USER root
WORKDIR /root/

RUN GITHUB_RUNNER_VERSION="2.278.0" \
    && curl -Ls https://internal.knat.network/action-runner/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz | tar xz \
    && ./bin/installdependencies.sh

COPY entrypoint.sh runsvc.sh ./
RUN sudo chmod u+x ./entrypoint.sh ./runsvc.sh

ENTRYPOINT ["./entrypoint.sh"]

其中 entrypoint.sh 的内容如下:

#!/bin/sh

# 这里如果直接使用 ./config.sh --url https://github.com/some-github-org --token AF5TxxxxxxxxxxxA6PRRS 的方式注册的话,token 会动态变化,容易导致注册后无法 remove 的问题,所以参考 https://docs.github.com/en/rest/reference/actions#list-self-hosted-runners-for-an-organization 通过 Personal Access Token 动态获取 Runner 的 Token
registration_url="https://github.com/${GITHUB_ORG_NAME}"
token_url="https://api.github.com/orgs/${GITHUB_ORG_NAME}/actions/runners/registration-token"
payload=$(curl -sX POST -H "Authorization: token ${GITHUB_PAT}" ${token_url})
export RUNNER_TOKEN=$(echo $payload | jq .token --raw-output)

if [ -z "${RUNNER_NAME}" ]; then
    RUNNER_NAME=$(hostname)
fi

./config.sh --unattended --url https://github.com/${GITHUB_ORG_NAME} --token ${RUNNER_TOKEN} --labels "${RUNNER_LABELS}"

# 在容器被干掉的时候自动向 GitHub 解除注册 Runner
remove() {
    if [ -n "${GITHUB_RUNNER_TOKEN}" ]; then
        export REMOVE_TOKEN=$GITHUB_RUNNER_TOKEN
    else
        payload=$(curl -sX POST -H "Authorization: token ${GITHUB_PAT}" ${token_url%/registration-token}/remove-token)
        export REMOVE_TOKEN=$(echo $payload | jq .token --raw-output)
    fi

    ./config.sh remove --unattended --token "${RUNNER_TOKEN}"
}

trap 'remove; exit 130' INT
trap 'remove; exit 143' TERM

./runsvc.sh "$*" &

wait $!

Build + 运行:

docker build . -t n0vad3v/github-runner
docker run -v /var/run/docker.sock:/var/run/docker.sock -e GITHUB_PAT="ghp_bhxxxxxxxxxxxxx7xxxxxxxdONDT" -e GITHUB_ORG_NAME="some-github-org" -it n0vad3v/github-runner

此时你就可以看到你的 Org 下多了一个船新的 Runner 了,现在终于可以利用上自己的机器快速跑任务不排队,而且性能比 GitHub Actions 强了~

Scale with Kubernetes

但是这样并不 Scale,所有的 Runner 都需要手动管理,而且,GitHub Actions 如果同时写了多个 Job ,然后 Runner 数量小于 Job 数量的话,部分 Job 就会一直排队,对于排队时间的话:

Each job for self-hosted runners can be queued for a maximum of 24 hours. If a self-hosted runner does not start executing the job within this limit, the job is terminated and fails to complete.

那这个肯定是没法接受的,正好手边有个 k8s 集群,对于这类基本无状态的服务来说,让 k8s 来自动管理他们不是最好的嘛,于是可以想到写一个 Deployment,比如这样:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner-some-github-org
  labels:
    app: githubrunner
spec:
  replicas: 10
  selector:
    matchLabels:
      app: githubrunner
  template:
    metadata:
      labels:
        app: githubrunner
    spec:
      volumes:
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
            type: File
      containers:
        - name: github-runner-some-github-org
          imagePullPolicy: Always
          image: 'n0vad3v/github-runner'
          env:
            - name: GITHUB_PAT
              value: "ghp_bhxxxxxxxxxxxxx7xxxxxxxdONDT"
            - name: GITHUB_ORG_NAME
              value: "some-github-org"
            - name: RUNNER_LABELS
              value: "docker,internal-k8s"

          volumeMounts:
            - mountPath: /var/run/docker.sock
              name: docker-sock
              readOnly: false

kubectl apply -f action.yml -n novakwok,打上 Tag, 起飞!

[root@dev action]# kubectl get po -n novakwok
NAME                                                    READY   STATUS    RESTARTS   AGE
github-runner-some-github-org-deployment-9cfb598d9-4shrk   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-5rnj4   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-cvkr9   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-dmbnp   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-ggl24   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-gkgzx   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-jcscq   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-lrrxh   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-pn9cn   1/1     Running   0          26m
github-runner-some-github-org-deployment-9cfb598d9-wj2tj   1/1     Running   0          26m

Demo on Docker

由于我的需求比较特殊,我需要在 Runner 内使用 Docker 相关的指令(比如需要在 Runner 上 docker build/push),这里测试一下 Runner 是否可以正常工作,首先创建一个多 Job 的任务,像这样:

name: Test
on:
  push:
    branches: [ main ]

jobs:
  test-1:
    runs-on: [self-hosted,X64]

    steps:
      - uses: actions/checkout@v2
      - name: Run a one-line script
        run: |
          curl ip.sb
          df -h
          lscpu
          docker pull redis
                    
  test-2:
    runs-on: [self-hosted,X64]
    steps:
      - uses: actions/checkout@v2
      - name: Run a one-line script
        run: |
          curl ip.sb
          df -h
          lscpu
          docker pull redis
                    
  test-3:
    runs-on: [self-hosted,X64]
    steps:
      - uses: actions/checkout@v2
      - name: Run a one-line script
        run: |
          curl ip.sb
          df -h
          lscpu
          pwd
          docker pull redis          

然后跑一下看看是否可以 Work,首先确定是调度到了 Docker Runner 上:

然后看看 Docker 相关的操作是否可以 Work

好耶!

GC

有的时候会由于一些诡异的问题导致 Runner 掉线(比如 Remove 的时候网络断了之类的),这种之后 Org 下就会有一堆 Offline 的 Runner,为了解决这种情况,我们可以写一个简单的脚本来进行 GC,脚本如下:

import requests
import argparse

parser = argparse.ArgumentParser(description='GC Dead Self-hosted runners')
parser.add_argument('--github_pat', help='GitHub Personal Access Token')
parser.add_argument('--org_name', help='GitHub Org Name')
args = parser.parse_args()


def list_runners(org_name,github_pat):
    list_runner_url = 'https://api.github.com/orgs/{}/actions/runners'.format(org_name)
    headers = {"Authorization": "token {}".format(github_pat)}
    r = requests.get(list_runner_url,headers=headers)
    runner_list = r.json()['runners']
    return runner_list

def delete_offline_runners(org_name,github_pat,runner_list):
    headers = {"Authorization": "token {}".format(github_pat)}
    for runner in runner_list:
        if runner['status'] == "offline":
            runner_id = runner['id']
            delete_runner_url = 'https://api.github.com/orgs/{}/actions/runners/{}'.format(org_name,runner_id)
            print("Deleting runner " + str(runner_id) + ", with name of " + runner['name'])
            r = requests.delete(delete_runner_url,headers=headers)

if __name__ == '__main__':
    runner_list = list_runners(args.org_name,args.github_pat)
    delete_offline_runners(args.org_name,args.github_pat,runner_list)

用法是:python3 gc_runner.py --github_pat "ghp_bhxxxxxxxxxxxxx7xxxxxxxdONDT" --org_name "some-github-org"

Some limitations

除了我们自身硬件限制以外,GitHub Actions 本身还有一些限制,比如:

  • Workflow run time - Each workflow run is limited to 72 hours. If a workflow run reaches this limit, the workflow run is cancelled.
  • Job queue time - Each job for self-hosted runners can be queued for a maximum of 24 hours. If a self-hosted runner does not start executing the job within this limit, the job is terminated and fails to complete.
  • API requests - You can execute up to 1000 API requests in an hour across all actions within a repository. If exceeded, additional API calls will fail, which might cause jobs to fail.
  • Job matrix - A job matrix can generate a maximum of 256 jobs per workflow run. This limit also applies to self-hosted runners.
  • Workflow run queue - No more than 100 workflow runs can be queued in a 10 second interval per repository. If a workflow run reaches this limit, the workflow run is terminated and fails to complete.

其中 API requests 这个比较玄学,由于 GitHub Actions 的工作方法官方介绍如下:

The self-hosted runner polls GitHub to retrieve application updates and to check if any jobs are queued for processing. The self-hosted runner uses a HTTPS long poll that opens a connection to GitHub for 50 seconds, and if no response is received, it then times out and creates a new long poll.

所以不是很容易判断怎么样才算是一个 API request,这一点需要在大量使用的时候才可能暴露出问题。

Git Version

这里有个小坑,容器内的 Git 版本建议在 2.18 以上,Ubuntu 18.04 没问题(默认是 2.22.5),但是 arm64v8/ubuntu:18.04 官方源包管理工具的 Git 版本是 2.17,如果用这个版本的话,会遇到这种问题:

所以需要编译一个高版本的 Git,比如 Dockerfile 可以加上这么一行:

apt install -y gcc libssl-dev libcurl4-gnutls-dev zlib1g-dev make gettext wget
wget https://www.kernel.org/pub/software/scm/git/git-2.28.0.tar.gz && tar -xvzf git-2.28.0.tar.gz && cd git-2.28.0 && ./configure --prefix=/usr/ && make && make install

小结

如上,我们已经把 Runner 封进了 Docker 容器中,并且在需要 Scale 的情况下通过 k8s 进行水平扩展,此外,我们还有一个简单的 GC 程序对可能异常掉线的 Runner 进行 GC,看上去已经满足了一些初步的需求啦~

但是这样还是有一些问题,比如:

  1. 用 root 用户跑容器可能会有潜在的风险,尤其是还暴露了宿主机的 Docker sock,所以对于普通的任务来说,还是需要一个非 root 用户的容器来运行
  2. 还是没有实现自动化扩缩容,扩缩容依赖手动修改 replica,这里需要进行自动化(例如预留 20 个 Idle 的 Runner,如果 Idle Runner 小于 20 个就自动增加)
  3. Label 管理,由于 GitHub Actions 依赖的 Label 进行调度,所以这里打 Label 其实是一个需要长期考虑的事情

References

  1. Running self-hosted GitHub Actions runners in your Kubernetes cluster
  2. About GitHub-hosted runners
  3. Actions

Viewing all articles
Browse latest Browse all 90

Latest Images

Pangarap Quotes

Pangarap Quotes

Vimeo 10.7.0 by Vimeo.com, Inc.

Vimeo 10.7.0 by Vimeo.com, Inc.

HANGAD

HANGAD

MAKAKAALAM

MAKAKAALAM

Doodle Jump 3.11.30 by Lima Sky LLC

Doodle Jump 3.11.30 by Lima Sky LLC

Trending Articles


Ang Nobela sa “From Darna to ZsaZsa Zaturnnah: Desire and Fantasy, Essays on...


Lola Bunny para colorear


Libros para colorear


Mandalas de flores para colorear


Dibujos para colorear de perros


Dromedario para colorear


mayabang Quotes, Torpe Quotes, tanga Quotes


Love Quotes Tagalog


Tropa Quotes


Mga Tala sa “Unang Siglo ng Nobela sa Filipinas” (2009) ni Virgilio S. Almario


Gwapo Quotes : Babaero Quotes


Kung Fu Panda para colorear


Girasoles para colorear


Tiburon para colorear


Renos para colorear


Lagarto para colorear


Long Distance Relationship Tagalog Love Quotes


Tagalog Long Distance Relationship Love Quotes


RE: Mutton Pies (mely)


El Vibora (1971) by Francisco V. Coching and Federico C. Javinal





Latest Images

Pangarap Quotes

Pangarap Quotes

Vimeo 10.7.0 by Vimeo.com, Inc.

Vimeo 10.7.0 by Vimeo.com, Inc.

HANGAD

HANGAD

MAKAKAALAM

MAKAKAALAM

Doodle Jump 3.11.30 by Lima Sky LLC

Doodle Jump 3.11.30 by Lima Sky LLC