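How to repair a Docker container that will not start
After the Docker data directory was migrated, several containers failed during startup and kept cycling through the Restarting state, so their services were unavailable.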
root@db1:~# docker ps
CONTAINER ID   IMAGE                                      COMMAND                  CREATED       STATUS                            PORTS   NAMES
9f795ad40afc   redis:6.0.9                                "docker-entrypoint.s…"   3 days ago    Up 41 minutes                             redis-6378
995d1c983134   redis:6.0.9                                "docker-entrypoint.s…"   3 days ago    Up 41 minutes                             redis-6379
60ff8ad4efe8   clickhouse/clickhouse-server:23.6.1.1524   "/entrypoint.sh"         4 weeks ago   Restarting (232) 14 seconds ago           clickhouse-01
f352dacd9f1d   clickhouse/clickhouse-server:23.6.1.1524   "/entrypoint.sh"         4 weeks ago   Restarting (232) 12 seconds ago           clickhouse-02
b4a02ed016f1   clickhouse/clickhouse-keeper:23.6.1.1524   "/entrypoint.sh"         4 weeks ago   Restarting (1) 6 seconds ago              clickhouse-keeper-01
06f77e12e6bd   mysql:5.7                                  "docker-entrypoint.s…"   5 weeks ago   Restarting (1) 19 seconds ago             mysql-master1
root@db1:~#
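On a host with many containers it can help to show only the ones stuck in the restart loop. The following is a minimal sketch, not part of the original session; the function name list_restarting is made up for illustration, and it relies only on the --filter and --format options of docker ps.

```shell
# Hypothetical helper: print "name: status" for every container
# that is currently in the "restarting" state.
list_restarting() {
    docker ps --filter status=restarting --format '{{.Names}}: {{.Status}}'
}

# Usage (on the Docker host):
#   list_restarting
```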
The docker logs output shows the cause: a directory permission problem makes the startup command fail and exit immediately.
root@db1:~# docker logs -f --tail 100 clickhouse-keeper-01
Logging information to /var/log/clickhouse-keeper/clickhouse-keeper.log
Logging errors to /var/log/clickhouse-keeper/clickhouse-keeper.err.log
Processing configuration file '/etc/clickhouse-keeper/keeper_config.xml'.
Logging information to /var/log/clickhouse-keeper/clickhouse-keeper.log
Logging errors to /var/log/clickhouse-keeper/clickhouse-keeper.err.log
Necessary directory '/var/lib/clickhouse' isn't accessible by user with id '101'
Necessary directory '/var/lib/clickhouse' isn't accessible by user with id '101'
…
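To see which host directories sit behind the path named in the error, and who owns them after the migration, something like the sketch below may be useful. It is an illustration only: the function names list_mounts and show_owner are hypothetical, GNU stat is assumed on the host, and whether '/var/lib/clickhouse' is actually a mount in this setup is not stated in the original.

```shell
# Hypothetical helper: print "<host source> <container destination>"
# for each mount of the given container.
list_mounts() {
    docker inspect --format '{{range .Mounts}}{{println .Source .Destination}}{{end}}' "$1"
}

# Hypothetical helper: print "uid:gid mode path" for a host path (GNU stat).
show_owner() {
    stat -c '%u:%g %a %n' "$1"
}

# Usage (on the Docker host):
#   list_mounts clickhouse-keeper-01
#   show_owner <a source path printed by list_mounts>
```

The uid printed by show_owner can then be compared with the id '101' from the log message.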
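Normally, running docker exec -it <container ID> sh on the host gets you a shell inside the container, where you can attempt a repair.
But this container never stays up, so there is no way in: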
root@db1:~# docker exec -it -uroot clickhouse-keeper-01 bash
Error response from daemon: Container b4a02ed016f178eea7339f5c986c774dfbe766368b3d42e90338bb338f8a6869 is restarting, wait until the container is running
root@db1:~#
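Delete the container and run a new one? That would be a fresh container, and the data inside would be lost.
Copy the data out with docker cp and then run a new one? It is unclear which data inside the container actually needs copying, copying everything is not practical, and reassembling a faithful replica of the original is a hassle anyway.
Then an idea: edit the entrypoint file entrypoint.sh and add sleep 3600 at the very top. That at least "holds" the container so it no longer exits on the error.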
So, copy the script out with docker cp <container ID>:/entrypoint.sh ./
then edit it with vim entrypoint.sh:
#!/bin/bash
sleep 3600
set +x
set -eo pipefail
shopt -s nullglob
…
Copy it back with docker cp entrypoint.sh <container ID>:/
Then run docker restart <container ID>
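The copy-out, edit, copy-back, restart steps above can also be done without an interactive editor. The sketch below is one possible way, under stated assumptions: GNU sed on the host (for -i and the one-line form of the a command), and the function names add_startup_sleep and patch_entrypoint are invented for this example. Keeping an untouched entrypoint.sh.orig on the host is an extra precaution that the original steps do not include.

```shell
# Hypothetical helper: insert "sleep <seconds>" right after line 1
# (the shebang) of a script, editing the file in place (GNU sed).
add_startup_sleep() {
    file="$1"
    seconds="${2:-3600}"
    sed -i "1a sleep ${seconds}" "$file"
}

# Hypothetical wrapper around the manual steps: copy the entrypoint out,
# keep a backup on the host, add the sleep, copy it back, restart.
patch_entrypoint() {
    container="$1"
    docker cp "${container}:/entrypoint.sh" ./entrypoint.sh &&
    cp ./entrypoint.sh ./entrypoint.sh.orig &&
    add_startup_sleep ./entrypoint.sh 3600 &&
    docker cp ./entrypoint.sh "${container}:/entrypoint.sh" &&
    docker restart "$container"
}

# Usage (on the Docker host):
#   patch_entrypoint clickhouse-keeper-01
```

Note that the container will only stay up for the duration of the sleep; once it ends, the rest of the script runs and fails as before unless the permissions have been fixed in the meantime.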
root@db1:~# docker exec -it -uroot clickhouse-keeper-01 bash
clickhouse-keeper-01:/#
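Now the shell opens normally. For convenience, log in as root; otherwise the permissions still cannot be changed.
In fact, running entrypoint.sh as root is enough to "reset" the permissions to what the script expects. (Other repairs are possible here too, of course.)
One caveat: the entrypoint file has been modified, so make a copy and remove the sleep so that it runs immediately.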
cp entrypoint.sh repair-entrypoint.sh
vi repair-entrypoint.sh
#!/bin/bash
# sleep 3600
set +x
set -eo pipefail
shopt -s nullglob
…
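The copy-and-edit step can be scripted as well. This is a sketch only, assuming the sed inside the container supports -i (GNU sed and BusyBox sed both do); the function name make_repair_copy is hypothetical. It comments out the sleep line in the copy and leaves the patched entrypoint.sh untouched.

```shell
# Hypothetical helper: copy the patched entrypoint and comment out
# the "sleep 3600" line in the copy only.
make_repair_copy() {
    src="${1:-entrypoint.sh}"
    dst="${2:-repair-entrypoint.sh}"
    cp "$src" "$dst" &&
    sed -i 's/^sleep 3600$/# sleep 3600/' "$dst"
}

# Usage (inside the container, in the directory holding entrypoint.sh):
#   make_repair_copy
```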
Then run ./repair-entrypoint.sh:
my id: 1, leader: 3, term: 1020
2024.12.10 06:44:37.452752 [ 58 ] {} <Information> RaftInstance: config at index 2122055 is committed, prev config log idx 2121879
2024.12.10 06:44:37.452804 [ 58 ] {} <Information> RaftInstance: new config log idx 2122055, prev log idx 2121879, cur config log idx 2121879, prev log idx 2121643
2024.12.10 06:44:37.452845 [ 58 ] {} <Information> RaftInstance: new configuration: log idx 2122055, prev log idx 2121879
peer 1, DC ID 0, clickhouse-keeper-01:9234, voting member, 1
peer 2, DC ID 0, clickhouse-keeper-02:9234, voting member, 1
peer 3, DC ID 0, clickhouse-keeper-03:9234, voting member, 1
my id: 1, leader: 3, term: 1020
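No problems.
Switch to the default user with su clickhouse
and run ./repair-entrypoint.sh again.
It starts normally.
Finally, as root, overwrite entrypoint.sh with repair-entrypoint.sh, exit the container, and restart it. OK.
Check the logs: no errors, and the cluster is healthy again.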
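A container in a restart loop can briefly report "running" between two attempts, so a single status check may be misleading. As a rough sketch (the function name is_stable and the 30-second default are assumptions, not from the original), one can compare the RestartCount reported by docker inspect before and after a short wait:

```shell
# Hypothetical helper: succeed only if the container is "running" and its
# restart count did not change during the waiting interval (in seconds).
is_stable() {
    name="$1"
    interval="${2:-30}"
    before="$(docker inspect --format '{{.RestartCount}}' "$name")" || return 1
    sleep "$interval"
    after="$(docker inspect --format '{{.RestartCount}} {{.State.Status}}' "$name")" || return 1
    [ "$after" = "$before running" ]
}

# Usage (on the Docker host):
#   is_stable clickhouse-keeper-01 && echo "looks stable"
```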