捕获启动 bash 时的调试输出

也就是把 bash -x 的输出重定向到文件中。目的是跟踪 WSL2 Ubuntu 启动慢的问题。

为了避免当前 Shell 的影响,我打算从 bash --norc 开始。
因为在 PowerShell 中使用 ubuntu run bash -x > trace.txt 获得不到输出,只好这样:

# start WSL2 from PowerShell
ubuntu run bash --norc
# inside WSL2 bash
exec 3> trace.txt
export PS4='+ $EPOCHREALTIME\011 '  
export BASH_XTRACEFD=3
/bin/bash -lixc exit
$ hyperfine 'bash -lixc exit'
Benchmark 1: bash -lixc exit
  Time (mean ± σ):     670.1 ms ±  62.6 ms    [User: 79.9 ms, System: 63.0 ms]
  Range (minmax):   604.7 ms774.4 ms    10 runs

接下来的分析就有时间慢慢看

+ 1758313857.887787	 '[' '\s-\v\$ ' ']'
+ 1758313857.888383	 '[' /bin/bash ']'
+ 1758313857.888648	 '[' /bin/bash '!=' /bin/sh ']'
+ 1758313857.888897	 '[' -f /etc/bash.bashrc ']'
+ 1758313857.889147	 . /etc/bash.bashrc
++ 1758313857.889825	 '[' -z '\s-\v\$ ' ']'
++ 1758313857.890075	 shopt -s checkwinsize
++ 1758313857.890355	 '[' -z '' ']'
++ 1758313857.890580	 '[' -r /etc/debian_chroot ']'
++ 1758313857.890857	 '[' -n '' -a -n '' ']'
++ 1758313857.891102	 PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
++ 1758313857.891386	 '[' '!' -e /home/enihsyou/.sudo_as_admin_successful ']'
++ 1758313857.891684	 '[' -x /usr/lib/command-not-found -o -x /usr/share/command-not-found/command-not-found ']'
+ 1758313857.891970	 '[' -d /etc/profile.d ']'
+ 1758313857.892673	 for i in /etc/profile.d/*.sh

但主要还是想看到每行执行时间,好在有人已经做好介绍了

# inside WSL2 bash
exec 3>&2 2> >(tee sample-time.log | sed -u 's/^.*$/now/' | date -f - +%s.%N >sample-time.tim)
bash -lixc exit
 
# exit bash and reopen
^D
 
# then display result
paste <(
    while read tim ;do
	    [ -z "$last" ] && last=${tim//.} && first=${tim//.}
        crt=000000000$((${tim//.}-10#0$last))
        printf "%12.9f\n" ${crt:0:${#crt}-9}.${crt:${#crt}-9}
        last=${tim//.}
    done < sample-time.tim
) <(echo;cat sample-time.log) > sample-time.txt
 
# sort to get significent
sort -k1 -n -r sample-time.txt | head

WSL2 bash 加载慢的原因分析

PATH 包含 drvfs 路径,访问到 Windows 去了

 0.000003880	+++ for _bash_it_library_finalize_f in "${_bash_it_library_finalize_hook[@]:-}"
 0.000003889	+++ eval _bash_it_appearance_scm_init
 0.000003870	++++ _bash_it_appearance_scm_init
 0.000003928	+++++ type -P git
 0.000703130	++++ GIT_EXE=/usr/bin/git
 0.129232040	+++++ type -P hg
 0.000561248	+++++ true
 0.000562101	++++ HG_EXE=
 0.154041063	+++++ type -P svn
 0.000445073	+++++ true
 0.000629533	++++ SVN_EXE=
 0.132867911	+++++ type -P p4
 0.000348760	+++++ true
 0.000264307	++++ P4_EXE=
 0.000025048	++++ [[ -x '' ]]
 0.000288670	++++ return 0
$ hyperfine 'which git' 'which hg' -i
Benchmark 1: which git
  Time (mean ± σ):     619.6 µs ± 273.7 µs    [User: 352.4 µs, System: 68.0 µs]
  Range (minmax):   253.9 µs1594.0 µs    1469 runs
 
Benchmark 2: which hg
  Time (mean ± σ):      52.9 ms ±   6.2 ms    [User: 2.0 ms, System: 3.3 ms]
  Range (minmax):    43.6 ms 59.9 ms    58 runs
 
Summary
  which git ran
   85.40 ± 39.03 times faster than which hg

找不到的目标会遍历完所有的 PATH,造成缓存穿透般的效果。
再加上我这里的 echo "$PATH" | tr ':' '\n' | grep "/mnt" | wc -l 有 41 行位于 Windows 文件系统下,访问本来就慢,内容还有重复的。

目前解决方式是在 /etc/wsl.conf 设置 appendWindowsPath=false,把 Windows PATH 从 WSL 中移除,只在必要时手动加载

# 当前在 /etc/wsl.conf 设置了 appendWindowsPath=false,这个函数用来在必要时候
# 手动添加 Windows 的 PATH 到 WSL2 的 PATH 里
function win() {
  # 列表手动同步吧
  local win_path=(
    "/mnt/c/Program Files/OpenSSH/"
    "/mnt/c/WINDOWS/system32"
    "/mnt/c/WINDOWS"
    "/mnt/c/Program Files/Git/cmd"
    "/mnt/c/Users/enihsyou/.local/bin"
  )

  for p in "${win_path[@]}"; do
    if [ -d "$p" ] && [[ ":$PATH:" != *":$p:"* ]]; then
      PATH="$PATH:$p"
    fi
  done
  export PATH
}

加载时间一下就下来了,670ms 109ms

$ hyperfine 'bash -lixc exit'
Benchmark 1: bash -lixc exit
  Time (mean ± σ):     109.7 ms ±   4.1 ms    [User: 62.3 ms, System: 22.9 ms]
  Range (minmax):   106.1 ms127.0 ms    23 runs