new deploy: 2022-06-04T14:52:54+00:00

pages
Loïc Dachary 2022-06-04 14:52:54 +00:00 committed by dachary
parent cdf7368c31
commit 2021fcb3b7
15 changed files with 842 additions and 19 deletions

@ -4,10 +4,75 @@
<link href="https://hostea.org/blog/atom.xml" rel="self" type="application/atom+xml"/>
<link href="https://hostea.org/blog/"/>
<generator uri="https://www.getzola.org/">Zola</generator>
<updated>2022-06-02T00:00:00+00:00</updated>
<updated>2022-06-04T00:00:00+00:00</updated>
<id>https://hostea.org/blog/atom.xml</id>
<entry xml:lang="en">
<title>[solved] Zombies created by Gitea</title>
<published>2022-06-04T00:00:00+00:00</published>
<updated>2022-06-04T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies-part-2/" type="text/html"/>
<id>https://hostea.org/blog/zombies-part-2/</id>
<content type="html">&lt;p&gt;Gitea can &lt;a href=&quot;zombies&quot;&gt;create zombies&lt;&#x2F;a&gt;, for instance if a Git mirror takes too long. When updating a mirror, Gitea relies on the &lt;code&gt;git remote update&lt;&#x2F;code&gt; command which creates a child process, &lt;code&gt;git-remote-https&lt;&#x2F;code&gt;, to fetch data from the remote repository. Gitea has an internal timeout that will kill the child process (e.g. &lt;code&gt;git remote update&lt;&#x2F;code&gt;) when it takes too long but will not kill the grandchild. This grandchild will become an orphan and run forever or until its own timeout expires, which is about two minutes on git version 2.25.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt;$ time git clone https:&#x2F;&#x2F;4.4.4.4
&lt;&#x2F;span&gt;&lt;span&gt;Cloning into &amp;#39;4.4.4.4&amp;#39;...
&lt;&#x2F;span&gt;&lt;span&gt;fatal: unable to access &amp;#39;https:&#x2F;&#x2F;4.4.4.4&#x2F;&amp;#39;: Failed to connect to 4.4.4.4 port 443: Connection timed out
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;real 2m9.753s
&lt;&#x2F;span&gt;&lt;span&gt;user 0m0.001s
&lt;&#x2F;span&gt;&lt;span&gt;sys 0m0.009s
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;As explained in the &lt;a href=&quot;zombies&#x2F;#killing-a-child-process-and-all-its-children&quot;&gt;diagnostic blog post regarding Gitea zombies&lt;&#x2F;a&gt; there fortunately is a very simple way to avoid this by making sure each Gitea child is a &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Process_group&quot;&gt;process group leader&lt;&#x2F;a&gt;. That first step was &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;introduced in Gitea 1.17&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;backported to Gitea 1.16.9&lt;&#x2F;a&gt;. The actual bug fix can now be implemented.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;using-negative-process-id-to-kill-children&quot;&gt;Using negative process id to kill children&lt;a class=&quot;zola-anchor&quot; href=&quot;#using-negative-process-id-to-kill-children&quot; aria-label=&quot;Anchor link for: using-negative-process-id-to-kill-children&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;When Gitea times out on a child, it relies on &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;golang&#x2F;go&#x2F;blob&#x2F;f8a53df314e4af8cd350eedb0dae77d4c4fc30d0&#x2F;src&#x2F;os&#x2F;exec&#x2F;exec.go#L650&quot;&gt;os.Process.Kill&lt;&#x2F;a&gt;, which translates into using the kill(2) system call to send a SIGKILL signal that unconditionally terminates it: &lt;code&gt;kill(pid, SIGKILL)&lt;&#x2F;code&gt;. Using a negative pid with &lt;code&gt;kill(-pid, SIGKILL)&lt;&#x2F;code&gt; will also terminate every process in the child&#x27;s process group, including those created by Gitea&#x27;s child, without Gitea knowing when or why they were created. From the kill(2) manual page:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;Which is implemented as follows in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L79&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;h3 id=&quot;not-using-the-default-go-commandcontext&quot;&gt;Not using the default Go CommandContext&lt;a class=&quot;zola-anchor&quot; href=&quot;#not-using-the-default-go-commandcontext&quot; aria-label=&quot;Anchor link for: not-using-the-default-go-commandcontext&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Since &lt;a href=&quot;https:&#x2F;&#x2F;pkg.go.dev&#x2F;os&#x2F;exec#CommandContext&quot;&gt;CommandContext&lt;&#x2F;a&gt; does not allow sending a signal to the negative pid of the child process, the timeout handling has to be implemented by Gitea itself, in a way similar to how the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L71-87&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt; does it:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt; err := cmd.Start()
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; go func() {
&lt;&#x2F;span&gt;&lt;span&gt; &amp;lt;-ctx.Done()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; if killErr := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); killErr == nil {
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; }
&lt;&#x2F;span&gt;&lt;span&gt; }()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; err = cmd.Wait()
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;testing-the-bug-is-fixed-and-stays-fixed&quot;&gt;Testing the bug is fixed and stays fixed&lt;a class=&quot;zola-anchor&quot; href=&quot;#testing-the-bug-is-fixed-and-stays-fixed&quot; aria-label=&quot;Anchor link for: testing-the-bug-is-fixed-and-stays-fixed&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Long-standing bugs that are difficult to reproduce manually, such as this one, require robust testing to ensure that:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the diagnostic identifying the root cause is correct&lt;&#x2F;li&gt;
&lt;li&gt;the bug fix works&lt;&#x2F;li&gt;
&lt;li&gt;it does not resurface insidiously because of a subtle regression introduced years later&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;It is easy to implement as can be seen in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L44-76&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;. In a nutshell:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L53&quot;&gt;git clone https:&#x2F;&#x2F;4.4.4.4&lt;&#x2F;a&gt; which will hang because of firewall rules&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L60-65&quot;&gt;wait for the git-remote-https&lt;&#x2F;a&gt; grandchild process to be spawned&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L67-68&quot;&gt;cancel the context and wait for the goroutine to terminate&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L70-75&quot;&gt;verify the git-remote-https is killed&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;And with that... no more zombies!&lt;&#x2F;p&gt;
</content>
</entry>
<entry xml:lang="en">
<title>[diagnostic] Zombies created by Gitea</title>
<published>2022-06-02T00:00:00+00:00</published>
<updated>2022-06-02T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies/" type="text/html"/>

@ -204,7 +204,7 @@
<ul class="blog__list">
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<a href="https://hostea.org/blog/zombies-part-2/" class="blog__post-link">
<h2 class="blog__post-title">[solved] Zombies created by Gitea</h2>
<p class="blog__post-meta">
@ -215,6 +215,41 @@
&middot; 4
June
,
2022 &middot; <b>3 min read</b>
</p>
<p class="blog__post-description">Gitea can use process groups to kill its children using a negative PID to never create zombies. </p>
</a>
<div class="blog__post-tag-container">
<a class="blog__post-tag" href="/tags/hostea">#hostea</a>
<a class="blog__post-tag" href="/tags/gitea">#gitea</a>
<a class="blog__post-tag" href="/tags/troubleshoot">#troubleshoot</a>
<a class="blog__post-tag" href="/tags/problem">#problem</a>
</div>
</li>
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<h2 class="blog__post-title">[diagnostic] Zombies created by Gitea</h2>
<p class="blog__post-meta">
<a href="https:&#x2F;&#x2F;dachary.org" class="post__author">Loïc Dachary</a>
&middot; 2
June

@ -0,0 +1,355 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width" />
<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png" />
<link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png" />
<link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png" />
<link rel="manifest" href="/site.webmanifest" />
<link rel="stylesheet" href="https://hostea.org/main.css" />
<link
rel="stylesheet"
media="screen and (max-width: 1300px)"
href="https://hostea.org/mobile.css"
/>
<meta name="referrer" content="no-referrer-when-downgrade" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="stylesheet" href="https://hostea.org/main.css" />
<link
rel="stylesheet"
media="screen and (max-width: 1300px)"
href="https://hostea.org/mobile.css"
/>
<meta name="referrer" content="no-referrer-when-downgrade" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>[solved] Zombies created by Gitea | Hostea: Managed Gitea Hosting </title>
<meta name="referrer" content="no-referrer-when-downgrade" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="description" content="Gitea can use process groups to kill its children using a negative PID to never create zombies." />
<meta property="og:title" content="[solved] Zombies created by Gitea | Hostea: Managed Gitea Hosting " />
<meta property="og:type" content="article" />
<meta property="og:url" content="https:&#x2F;&#x2F;hostea.org" />
<meta property="og:description" content="Gitea can use process groups to kill its children using a negative PID to never create zombies." />
<meta
property="og:site_name"
content="[solved] Zombies created by Gitea | Hostea: Managed Gitea Hosting "
/>
<link
rel="apple-touch-icon"
sizes="57x57"
href="https://hostea.org/apple-icon-57x57.png?h=c21de14cfdf862a6472ae977557fa048a7c36d39337e61d3274705e9bd8e857f"
/>
<link
rel="apple-touch-icon"
sizes="60x60"
href="https://hostea.org/apple-icon-60x60.png?h=67089d9025a52d0d1ddce450078c7acefe2c150a2427dec9f5e13c6314f74281"
/>
<link
rel="apple-touch-icon"
sizes="72x72"
href="https://hostea.org/apple-icon-72x72.png?h=70725943de8884804f9da28202ced0ad6fed483ae9cf8f6d874aa133e30cb693"
/>
<link
rel="apple-touch-icon"
sizes="76x76"
href="https://hostea.org/apple-icon-76x76.png?h=1e6e8072df3b21bdcea254a42aac6e993611e845f91ddd79f6f35a6c441710a5"
/>
<link
rel="apple-touch-icon"
sizes="114x114"
href="https://hostea.org/apple-icon-114x114.png?h=c20099f8190ed3962fab5726c5594857a871cdb3ee98439343c622cd3727fed6"
/>
<link
rel="apple-touch-icon"
sizes="120x120"
href="https://hostea.org/apple-icon-120x120.png?h=4df78e402e60b58c6d44764678bdd737b5b6a836aeb85fb75fa49f706f7e8c81"
/>
<link
rel="apple-touch-icon"
sizes="144x144"
href="https://hostea.org/apple-icon-144x144.png?h=0c44e6655d714f89ee95cc151032d1f0dc3204bd24d1ca2ee9d94692d4ede84d"
/>
<link
rel="apple-touch-icon"
sizes="152x152"
href="https://hostea.org/apple-icon-152x152.png?h=157918f883ff95d4eeb6452d0ebb61ca5e21ea0dcac1aefe825f3e2f3999052f"
/>
<link
rel="apple-touch-icon"
sizes="180x180"
href="https://hostea.org/apple-icon-180x180.png?h=7d5c16d379b7db6d8ea5aae64921d7162b84f543763acd8fc7c107f80a600213"
/>
<link
rel="icon"
type="image/png"
sizes="192x192"
href="https://hostea.org/android-icon-192x192.png?h=095e3835b082dba07f606c33fa6f71bcd671a71e987b0ab2e46dcddceef52b9c"
/>
<link
rel="icon"
type="image/png"
sizes="32x32"
href="https://hostea.org/favicon-32x32.png?h=d7cd5d6390d58e729cd1f3564add60e9d8b63f54482a7f4cb5a66bb4780dfb05"
/>
<link
rel="icon"
type="image/png"
sizes="96x96"
href="https://hostea.org/favicon-96x96.png?h=5e01ce966b1d7ed88e0b01226d74ad8aaa65cea839073eb1ec6e115e76f3b2db"
/>
<link
rel="icon"
type="image/png"
sizes="16x16"
href="https://hostea.org/favicon-16x16.png?h=442e55b5177a8b501f75401b6b61bddace8d1ef8d91dab611fb1993293682ba5"
/>
<link
rel="manifest"
href="https://hostea.org/manifest.json?h=27eca3e8297eb7ff340deb3849b210185a459b3845456aa4d0036f6d966b3518"
/>
<meta name="msapplication-TileColor" content="#ffffff" />
<meta
name="msapplication-TileImage"
content="https://hostea.org/ms-icon-144x144.png?h=8170ab51b871b84b8f98bd03cf441afdffb2998b7dfffb04abb7ebf5deeb1f94"
/>
<meta name="theme-color" content="#ffffff" />
</head>
</head>
<body class="base">
<header>
<nav class="nav__container">
<input type="checkbox" class="nav__toggle" id="nav__toggle" />
<div class="nav__header">
<a class="nav__logo-container" href="/">
<img src="https://hostea.org/android-icon-48x48.png?h=5115cfa26ec433a1f436236b2842c138d9d17f0c5a6376e3102c14e949dae1cb"
alt="Hostea temporary logo"/>
<p class="nav__home-btn">
ostea
</p>
</a>
<label class="nav__hamburger-menu" for="nav__toggle">
<span class="nav__hamburger-inner"></span>
</label>
</div>
<div class="nav__spacer"></div>
<div class="nav__link-group">
<div class="nav__link-container">
<a class="nav__link" rel="noreferrer" href="&#x2F;about&#x2F;">About</a>
</div>
<div class="nav__link-container">
<a class="nav__link" rel="noreferrer" href="&#x2F;blog&#x2F;">Blog</a>
</div>
<div class="nav__link-container">
<a class="nav__link" rel="noreferrer" href="&#x2F;contact&#x2F;">Contact</a>
</div>
<div class="nav__link-container">
<a class="nav__link" rel="noreferrer" href="&#x2F;gitea-clinic&#x2F;">Gitea Clinic</a>
</div>
<div class="nav__link-container">
<a class="nav__link" rel="noreferrer" href="&#x2F;talks&#x2F;">Talks</a>
</div>
</div>
</nav>
</header>
<!-- See ../sass/main.scss. Required for pushing footer to the very
bottom of the page -->
<div class="main__content-container">
<main>
<div class="page__container">
<h1 class="page__group-title">[solved] Zombies created by Gitea</h1>
<p class="blog__post-meta">
<a href="https:&#x2F;&#x2F;dachary.org" class="post__author">Loïc Dachary</a>
&middot; 4
June
,
2022 &middot; <b>3 min read</b>
</p>
<div class="blog__content">
<p>Gitea can <a href="zombies">create zombies</a>, for instance if a Git mirror takes too long. When updating a mirror, Gitea relies on the <code>git remote update</code> command which creates a child process, <code>git-remote-https</code>, to fetch data from the remote repository. Gitea has an internal timeout that will kill the child process (e.g. <code>git remote update</code>) when it takes too long but will not kill the grandchild. This grandchild will become an orphan and run forever or until its own timeout expires, which is about two minutes on git version 2.25.</p>
<pre style="background-color:#2b303b;color:#c0c5ce;"><code><span>$ time git clone https://4.4.4.4
</span><span>Cloning into &#39;4.4.4.4&#39;...
</span><span>fatal: unable to access &#39;https://4.4.4.4/&#39;: Failed to connect to 4.4.4.4 port 443: Connection timed out
</span><span>
</span><span>real 2m9.753s
</span><span>user 0m0.001s
</span><span>sys 0m0.009s
</span></code></pre>
<p>As explained in the <a href="zombies/#killing-a-child-process-and-all-its-children">diagnostic blog post regarding Gitea zombies</a> there fortunately is a very simple way to avoid this by making sure each Gitea child is a <a href="https://en.wikipedia.org/wiki/Process_group">process group leader</a>. That first step was <a href="https://github.com/go-gitea/gitea/pull/19865">introduced in Gitea 1.17</a> and <a href="https://github.com/go-gitea/gitea/pull/19865">backported to Gitea 1.16.9</a>. The actual bug fix can now be implemented.</p>
<h3 id="using-negative-process-id-to-kill-children">Using negative process id to kill children<a class="zola-anchor" href="#using-negative-process-id-to-kill-children" aria-label="Anchor link for: using-negative-process-id-to-kill-children"
><span class="anchor-icon">#</span></a
>
</h3>
<p>When Gitea times out on a child, it relies on <a href="https://github.com/golang/go/blob/f8a53df314e4af8cd350eedb0dae77d4c4fc30d0/src/os/exec/exec.go#L650">os.Process.Kill</a>, which translates into using the kill(2) system call to send a SIGKILL signal that unconditionally terminates it: <code>kill(pid, SIGKILL)</code>. Using a negative pid with <code>kill(-pid, SIGKILL)</code> will also terminate every process in the child's process group, including those created by Gitea's child, without Gitea knowing when or why they were created. From the kill(2) manual page:</p>
<blockquote>
<p>If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.</p>
</blockquote>
<p>Which is implemented as follows in the <a href="https://lab.forgefriends.org/friendlyforgeformat/gofff/-/blob/f42a29284a5262d3e6f94801089369626c5197f6/util/exec.go#L79">Friendly Forge Format library</a>:</p>
<blockquote>
<p><code>syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)</code></p>
</blockquote>
<h3 id="not-using-the-default-go-commandcontext">Not using the default Go CommandContext<a class="zola-anchor" href="#not-using-the-default-go-commandcontext" aria-label="Anchor link for: not-using-the-default-go-commandcontext"
><span class="anchor-icon">#</span></a
>
</h3>
<p>Since <a href="https://pkg.go.dev/os/exec#CommandContext">CommandContext</a> does not allow sending a signal to the negative pid of the child process, the timeout handling has to be implemented by Gitea itself, in a way similar to how the <a href="https://lab.forgefriends.org/friendlyforgeformat/gofff/-/blob/f42a29284a5262d3e6f94801089369626c5197f6/util/exec.go#L71-87">Friendly Forge Format library</a> does it:</p>
<pre style="background-color:#2b303b;color:#c0c5ce;"><code><span> err := cmd.Start()
</span><span>...
</span><span> go func() {
</span><span> &lt;-ctx.Done()
</span><span>
</span><span> if killErr := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); killErr == nil {
</span><span>...
</span><span> }
</span><span> }()
</span><span>
</span><span> err = cmd.Wait()
</span></code></pre>
<h3 id="testing-the-bug-is-fixed-and-stays-fixed">Testing the bug is fixed and stays fixed<a class="zola-anchor" href="#testing-the-bug-is-fixed-and-stays-fixed" aria-label="Anchor link for: testing-the-bug-is-fixed-and-stays-fixed"
><span class="anchor-icon">#</span></a
>
</h3>
<p>Long-standing bugs that are difficult to reproduce manually, such as this one, require robust testing to ensure that:</p>
<ul>
<li>the diagnostic identifying the root cause is correct</li>
<li>the bug fix works</li>
<li>it does not resurface insidiously because of a subtle regression introduced years later</li>
</ul>
<p>It is easy to implement as can be seen in the <a href="https://lab.forgefriends.org/friendlyforgeformat/gofff/-/blob/f42a29284a5262d3e6f94801089369626c5197f6/util/exec_test.go#L44-76">Friendly Forge Format library</a>. In a nutshell:</p>
<ul>
<li><a href="https://lab.forgefriends.org/friendlyforgeformat/gofff/-/blob/f42a29284a5262d3e6f94801089369626c5197f6/util/exec_test.go#L53">git clone https://4.4.4.4</a> which will hang because of firewall rules</li>
<li><a href="https://lab.forgefriends.org/friendlyforgeformat/gofff/-/blob/f42a29284a5262d3e6f94801089369626c5197f6/util/exec_test.go#L60-65">wait for the git-remote-https</a> grandchild process to be spawned</li>
<li><a href="https://lab.forgefriends.org/friendlyforgeformat/gofff/-/blob/f42a29284a5262d3e6f94801089369626c5197f6/util/exec_test.go#L67-68">cancel the context and wait for the goroutine to terminate</a></li>
<li><a href="https://lab.forgefriends.org/friendlyforgeformat/gofff/-/blob/f42a29284a5262d3e6f94801089369626c5197f6/util/exec_test.go#L70-75">verify the git-remote-https is killed</a></li>
</ul>
<p>And with that... no more zombies!</p>
</div>
<br>
<br>
<div class="blog__post-tag-container">
<a class="blog__post-tag" href="/tags/hostea">#hostea</a>
<a class="blog__post-tag" href="/tags/gitea">#gitea</a>
<a class="blog__post-tag" href="/tags/troubleshoot">#troubleshoot</a>
<a class="blog__post-tag" href="/tags/problem">#problem</a>
</div>
</div>
</main>
<footer>
<div class="footer__container">
<!-- <div class="footer__column"> --->
<p class="footer__column license__conatiner">
All text <a
class="license__link"
rel="noreferrer"
href="http://creativecommons.org/licenses/by-sa/4.0/"
target="_blank"
>&nbsp;CC-BY-SA&nbsp;</a
>
&amp; code
<a
class="license__link"
rel="noreferrer"
href="https://www.gnu.org/licenses/agpl-3.0.en.html"
target="_blank"
>&nbsp;AGPL&nbsp;</a
>
|
<a
class="license__link"
rel="noreferrer"
href="https://www.eff.org/issues/do-not-track/amp/"
target="_blank"
>&nbsp;No AMP&nbsp;</a
>
</p>
<!-- </div> -->
<div class="footer__column--center">
<a href="/blog/atom.xml" target="_blank" rel="noopener" title="RSS">
<img
src="https://hostea.org/icons/rss.svg?h=f6cd584bdbcd2eb4d1b8b84c9cf083ef45f772167c33fdcee754b35ae8ff4c7d"
class="footer__icon"
alt="RSS icon"
/>
</a>
</div>
<div class="footer__column">
<a href="/about" title="About">About</a>
<a href="/coc" title="Code of Conduct">CoC</a>
<span class="footer__column-divider--mobile-only">|</span>
<a href="/legalese" title="Legalese">Legalese</a>
<a href="/privacy-policy" title="Privacy Policy">Privacy</a>
<span class="footer__column-divider--mobile-only">|</span>
<a
href="https://stats.uptimerobot.com/EQ7VJHWylx"
rel="noreferrer"
target="_blank"
title="Status"
>Status</a
>
<a href="/tos" title="Terms of Service">ToS</a>
</div>
</div>
</footer>
</div>
</body>
</html>

@ -33,21 +33,21 @@
<meta name="referrer" content="no-referrer-when-downgrade" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>[solved] Zombies created by Gitea | Hostea: Managed Gitea Hosting </title>
<title>[diagnostic] Zombies created by Gitea | Hostea: Managed Gitea Hosting </title>
<meta name="referrer" content="no-referrer-when-downgrade" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="description" content="An increasing number of zombies processes are created by Gitea because it only kills its direct children on timeout." />
<meta property="og:title" content="[solved] Zombies created by Gitea | Hostea: Managed Gitea Hosting " />
<meta property="og:title" content="[diagnostic] Zombies created by Gitea | Hostea: Managed Gitea Hosting " />
<meta property="og:type" content="article" />
<meta property="og:url" content="https:&#x2F;&#x2F;hostea.org" />
<meta property="og:description" content="An increasing number of zombies processes are created by Gitea because it only kills its direct children on timeout." />
<meta
property="og:site_name"
content="[solved] Zombies created by Gitea | Hostea: Managed Gitea Hosting "
content="[diagnostic] Zombies created by Gitea | Hostea: Managed Gitea Hosting "
/>
<link
rel="apple-touch-icon"
@ -197,7 +197,7 @@
<div class="page__container">
<h1 class="page__group-title">[solved] Zombies created by Gitea</h1>
<h1 class="page__group-title">[diagnostic] Zombies created by Gitea</h1>
<p class="blog__post-meta">

File diff suppressed because one or more lines are too long

@ -25,6 +25,10 @@
<loc>https://hostea.org/blog/unsafe-repository-is-owned-by-someone-else/</loc>
<lastmod>2022-05-15</lastmod>
</url>
<url>
<loc>https://hostea.org/blog/zombies-part-2/</loc>
<lastmod>2022-06-04</lastmod>
</url>
<url>
<loc>https://hostea.org/blog/zombies/</loc>
<lastmod>2022-06-02</lastmod>

@ -4,10 +4,75 @@
<link href="https://hostea.org/tags/gitea/atom.xml" rel="self" type="application/atom+xml"/>
<link href="https://hostea.org"/>
<generator uri="https://www.getzola.org/">Zola</generator>
<updated>2022-06-02T00:00:00+00:00</updated>
<updated>2022-06-04T00:00:00+00:00</updated>
<id>https://hostea.org/tags/gitea/atom.xml</id>
<entry xml:lang="en">
<title>[solved] Zombies created by Gitea</title>
<published>2022-06-04T00:00:00+00:00</published>
<updated>2022-06-04T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies-part-2/" type="text/html"/>
<id>https://hostea.org/blog/zombies-part-2/</id>
<content type="html">&lt;p&gt;Gitea can &lt;a href=&quot;zombies&quot;&gt;create zombies&lt;&#x2F;a&gt;, for instance if a Git mirror takes too long. When updating a mirror, Gitea relies on the &lt;code&gt;git remote update&lt;&#x2F;code&gt; command which creates a child process, &lt;code&gt;git-remote-https&lt;&#x2F;code&gt;, to fetch data from the remote repository. Gitea has an internal timeout that will kill the child process (e.g. &lt;code&gt;git remote update&lt;&#x2F;code&gt;) when it takes too long but will not kill the grandchild. This grandchild will become an orphan and run forever or until its own timeout expires, which is about two minutes on git version 2.25.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt;$ time git clone https:&#x2F;&#x2F;4.4.4.4
&lt;&#x2F;span&gt;&lt;span&gt;Cloning into &amp;#39;4.4.4.4&amp;#39;...
&lt;&#x2F;span&gt;&lt;span&gt;fatal: unable to access &amp;#39;https:&#x2F;&#x2F;4.4.4.4&#x2F;&amp;#39;: Failed to connect to 4.4.4.4 port 443: Connection timed out
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;real 2m9.753s
&lt;&#x2F;span&gt;&lt;span&gt;user 0m0.001s
&lt;&#x2F;span&gt;&lt;span&gt;sys 0m0.009s
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;As explained in the &lt;a href=&quot;zombies&#x2F;#killing-a-child-process-and-all-its-children&quot;&gt;diagnostic blog post regarding Gitea zombies&lt;&#x2F;a&gt; there fortunately is a very simple way to avoid this by making sure each Gitea child is a &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Process_group&quot;&gt;process group leader&lt;&#x2F;a&gt;. That first step was &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;introduced in Gitea 1.17&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;backported to Gitea 1.16.9&lt;&#x2F;a&gt;. The actual bug fix can now be implemented.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;using-negative-process-id-to-kill-children&quot;&gt;Using negative process id to kill children&lt;a class=&quot;zola-anchor&quot; href=&quot;#using-negative-process-id-to-kill-children&quot; aria-label=&quot;Anchor link for: using-negative-process-id-to-kill-children&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;When Gitea times out on a child, it relies on &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;golang&#x2F;go&#x2F;blob&#x2F;f8a53df314e4af8cd350eedb0dae77d4c4fc30d0&#x2F;src&#x2F;os&#x2F;exec&#x2F;exec.go#L650&quot;&gt;os.Process.Kill&lt;&#x2F;a&gt;, which translates into using the kill(2) system call to send a SIGKILL signal that unconditionally terminates it: &lt;code&gt;kill(pid, SIGKILL)&lt;&#x2F;code&gt;. Using a negative pid with &lt;code&gt;kill(-pid, SIGKILL)&lt;&#x2F;code&gt; will also terminate every process in the child&#x27;s process group, including those created by Gitea&#x27;s child, without Gitea knowing when or why they were created. From the kill(2) manual page:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;Which is implemented as follows in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L79&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;h3 id=&quot;not-using-the-default-go-commandcontext&quot;&gt;Not using the default Go CommandContext&lt;a class=&quot;zola-anchor&quot; href=&quot;#not-using-the-default-go-commandcontext&quot; aria-label=&quot;Anchor link for: not-using-the-default-go-commandcontext&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Since &lt;a href=&quot;https:&#x2F;&#x2F;pkg.go.dev&#x2F;os&#x2F;exec#CommandContext&quot;&gt;CommandContext&lt;&#x2F;a&gt; does not allow sending a signal to the negative pid of the child process, the timeout handling has to be implemented by Gitea itself, in a way similar to how the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L71-87&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt; does it:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt; err := cmd.Start()
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; go func() {
&lt;&#x2F;span&gt;&lt;span&gt; &amp;lt;-ctx.Done()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; if killErr := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); killErr == nil {
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; }
&lt;&#x2F;span&gt;&lt;span&gt; }()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; err = cmd.Wait()
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;testing-the-bug-is-fixed-and-stays-fixed&quot;&gt;Testing the bug is fixed and stays fixed&lt;a class=&quot;zola-anchor&quot; href=&quot;#testing-the-bug-is-fixed-and-stays-fixed&quot; aria-label=&quot;Anchor link for: testing-the-bug-is-fixed-and-stays-fixed&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Long-standing bugs that are difficult to reproduce manually, such as this one, require robust testing to ensure that:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the diagnostic identifying the root cause is correct&lt;&#x2F;li&gt;
&lt;li&gt;the bug fix works&lt;&#x2F;li&gt;
&lt;li&gt;it does not resurface insidiously because of a subtle regression introduced years later&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;It is easy to implement as can be seen in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L44-76&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;. In a nutshell:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L53&quot;&gt;git clone https:&#x2F;&#x2F;4.4.4.4&lt;&#x2F;a&gt; which will hang because of firewall rules&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L60-65&quot;&gt;wait for the git-remote-https&lt;&#x2F;a&gt; grandchild process to be spawned&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L67-68&quot;&gt;cancel the context and wait for the goroutine to terminate&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L70-75&quot;&gt;verify the git-remote-https is killed&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;And with that... no more zombies!&lt;&#x2F;p&gt;
</content>
</entry>
<entry xml:lang="en">
<title>[diagnostic] Zombies created by Gitea</title>
<published>2022-06-02T00:00:00+00:00</published>
<updated>2022-06-02T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies/" type="text/html"/>

View File

@ -210,8 +210,34 @@
<ul class="blog__list">
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<a href="https://hostea.org/blog/zombies-part-2/" class="blog__post-link">
<h2 class="blog__post-title">[solved] Zombies created by Gitea</h2>
<p class="blog__post-meta">
4
June
,
2022 &middot; <b>3 min read</b>
</p>
<p class="blog__post-description">Gitea can use process groups to kill its children using a negative PID to never create zombies. </p>
</a>
<div class="blog__post-tag-container">
<a class="blog__post-tag" href="/tags/hostea">#hostea</a>
<a class="blog__post-tag" href="/tags/gitea">#gitea</a>
<a class="blog__post-tag" href="/tags/troubleshoot">#troubleshoot</a>
<a class="blog__post-tag" href="/tags/problem">#problem</a>
</div>
</li>
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<h2 class="blog__post-title">[diagnostic] Zombies created by Gitea</h2>
<p class="blog__post-meta">
2
June

View File

@ -4,10 +4,75 @@
<link href="https://hostea.org/tags/hostea/atom.xml" rel="self" type="application/atom+xml"/>
<link href="https://hostea.org"/>
<generator uri="https://www.getzola.org/">Zola</generator>
<updated>2022-06-02T00:00:00+00:00</updated>
<updated>2022-06-04T00:00:00+00:00</updated>
<id>https://hostea.org/tags/hostea/atom.xml</id>
<entry xml:lang="en">
<title>[solved] Zombies created by Gitea</title>
<published>2022-06-04T00:00:00+00:00</published>
<updated>2022-06-04T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies-part-2/" type="text/html"/>
<id>https://hostea.org/blog/zombies-part-2/</id>
<content type="html">&lt;p&gt;Gitea can &lt;a href=&quot;zombies&quot;&gt;create zombies&lt;&#x2F;a&gt;, for instance if a Git mirror takes too long. When updating a mirror, Gitea relies on the &lt;code&gt;git remote update&lt;&#x2F;code&gt; command which creates a child process, &lt;code&gt;git-remote-https&lt;&#x2F;code&gt;, to fetch data from the remote repository. Gitea has an internal timeout that will kill the child process (e.g. &lt;code&gt;git remote update&lt;&#x2F;code&gt;) when it takes too long but will not kill the grandchild. This grandchild will become an orphan and run forever or until its own timeout expires, which is about two minutes on git version 2.25.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt;$ time git clone https:&#x2F;&#x2F;4.4.4.4
&lt;&#x2F;span&gt;&lt;span&gt;Clonage dans &amp;#39;4.4.4.4&amp;#39;...
&lt;&#x2F;span&gt;&lt;span&gt;fatal: impossible d&amp;#39;accéder à &amp;#39;https:&#x2F;&#x2F;4.4.4.4&#x2F;&amp;#39;: Failed to connect to 4.4.4.4 port 443: Connexion terminée par expiration du délai d&amp;#39;attente
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;real 2m9,753s
&lt;&#x2F;span&gt;&lt;span&gt;user 0m0,001s
&lt;&#x2F;span&gt;&lt;span&gt;sys 0m0,009s
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;As explained in the &lt;a href=&quot;zombies&#x2F;#killing-a-child-process-and-all-its-children&quot;&gt;diagnostic blog post regarding Gitea zombies&lt;&#x2F;a&gt; there fortunately is a very simple way to avoid this by making sure each Gitea child is a &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Process_group&quot;&gt;process group leader&lt;&#x2F;a&gt;. That first step was &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;introduced in Gitea 1.17&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;backported to Gitea 1.16.9&lt;&#x2F;a&gt;. The actual bug fix can now be implemented.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;using-negative-process-id-to-kill-children&quot;&gt;Using negative process id to kill children&lt;a class=&quot;zola-anchor&quot; href=&quot;#using-negative-process-id-to-kill-children&quot; aria-label=&quot;Anchor link for: using-negative-process-id-to-kill-children&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;When Gitea times out on a child, it relies on &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;golang&#x2F;go&#x2F;blob&#x2F;f8a53df314e4af8cd350eedb0dae77d4c4fc30d0&#x2F;src&#x2F;os&#x2F;exec&#x2F;exec.go#L650&quot;&gt;os.Process.Kill&lt;&#x2F;a&gt;, which translates into using the kill(2) system call to send a SIGKILL signal to unconditionally terminate it: &lt;code&gt;kill(pid, SIGKILL)&lt;&#x2F;code&gt;. Using a negative pid with &lt;code&gt;kill(-pid, SIGKILL)&lt;&#x2F;code&gt; will terminate every process in the child&#x27;s process group, including the processes it spawned, without Gitea knowing when or why they were created. From the kill(2) manual page:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;This is implemented as follows in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L79&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;h3 id=&quot;not-using-the-default-go-commandcontext&quot;&gt;Not using the default Go CommandContext&lt;a class=&quot;zola-anchor&quot; href=&quot;#not-using-the-default-go-commandcontext&quot; aria-label=&quot;Anchor link for: not-using-the-default-go-commandcontext&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Since &lt;a href=&quot;https:&#x2F;&#x2F;pkg.go.dev&#x2F;os&#x2F;exec#CommandContext&quot;&gt;CommandContext&lt;&#x2F;a&gt; does not allow sending a signal to the negative pid of the child process, this has to be implemented by Gitea itself, in a way similar to how the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L71-87&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt; does it:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt; err := cmd.Start()
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; go func() {
&lt;&#x2F;span&gt;&lt;span&gt; &amp;lt;-ctx.Done()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; if killErr := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); killErr == nil {
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; }
&lt;&#x2F;span&gt;&lt;span&gt; }()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; err = cmd.Wait()
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;testing-the-bug-is-fixed-and-stays-fixed&quot;&gt;Testing the bug is fixed and stays fixed&lt;a class=&quot;zola-anchor&quot; href=&quot;#testing-the-bug-is-fixed-and-stays-fixed&quot; aria-label=&quot;Anchor link for: testing-the-bug-is-fixed-and-stays-fixed&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Long-standing bugs that are difficult to reproduce manually, such as this one, require robust testing to ensure that:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the diagnostic identifying the root cause is correct&lt;&#x2F;li&gt;
&lt;li&gt;the bug fix works&lt;&#x2F;li&gt;
&lt;li&gt;it does not resurface insidiously because of a subtle regression introduced years later&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Such a test is easy to implement, as can be seen in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L44-76&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;. In a nutshell:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L53&quot;&gt;git clone https:&#x2F;&#x2F;4.4.4.4&lt;&#x2F;a&gt; which will hang because of firewall rules&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L60-65&quot;&gt;wait for the git-remote-https&lt;&#x2F;a&gt; grandchild process to be spawned&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L67-68&quot;&gt;cancel the context and wait for the goroutine to terminate&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L70-75&quot;&gt;verify the git-remote-https is killed&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;And with that... no more zombies!&lt;&#x2F;p&gt;
</content>
</entry>
<entry xml:lang="en">
<title>[diagnostic] Zombies created by Gitea</title>
<published>2022-06-02T00:00:00+00:00</published>
<updated>2022-06-02T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies/" type="text/html"/>

View File

@ -210,8 +210,34 @@
<ul class="blog__list">
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<a href="https://hostea.org/blog/zombies-part-2/" class="blog__post-link">
<h2 class="blog__post-title">[solved] Zombies created by Gitea</h2>
<p class="blog__post-meta">
4
June
,
2022 &middot; <b>3 min read</b>
</p>
<p class="blog__post-description">Gitea can use process groups to kill its children using a negative PID to never create zombies. </p>
</a>
<div class="blog__post-tag-container">
<a class="blog__post-tag" href="/tags/hostea">#hostea</a>
<a class="blog__post-tag" href="/tags/gitea">#gitea</a>
<a class="blog__post-tag" href="/tags/troubleshoot">#troubleshoot</a>
<a class="blog__post-tag" href="/tags/problem">#problem</a>
</div>
</li>
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<h2 class="blog__post-title">[diagnostic] Zombies created by Gitea</h2>
<p class="blog__post-meta">
2
June

View File

@ -269,7 +269,7 @@
<span class="tag__meta">5 entries</span>
<span class="tag__meta">6 entries</span>
</a>
<a class="tag__rss-link" href="https:&#x2F;&#x2F;hostea.org&#x2F;tags&#x2F;gitea&#x2F;atom.xml" target="_blank" rel="noopener" title="RSS">
<img
@ -287,7 +287,7 @@
<span class="tag__meta">6 entries</span>
<span class="tag__meta">7 entries</span>
</a>
<a class="tag__rss-link" href="https:&#x2F;&#x2F;hostea.org&#x2F;tags&#x2F;hostea&#x2F;atom.xml" target="_blank" rel="noopener" title="RSS">
<img
@ -305,7 +305,7 @@
<span class="tag__meta">4 entries</span>
<span class="tag__meta">5 entries</span>
</a>
<a class="tag__rss-link" href="https:&#x2F;&#x2F;hostea.org&#x2F;tags&#x2F;problem&#x2F;atom.xml" target="_blank" rel="noopener" title="RSS">
<img
@ -323,7 +323,7 @@
<span class="tag__meta">4 entries</span>
<span class="tag__meta">5 entries</span>
</a>
<a class="tag__rss-link" href="https:&#x2F;&#x2F;hostea.org&#x2F;tags&#x2F;troubleshoot&#x2F;atom.xml" target="_blank" rel="noopener" title="RSS">
<img

View File

@ -4,10 +4,75 @@
<link href="https://hostea.org/tags/problem/atom.xml" rel="self" type="application/atom+xml"/>
<link href="https://hostea.org"/>
<generator uri="https://www.getzola.org/">Zola</generator>
<updated>2022-06-02T00:00:00+00:00</updated>
<updated>2022-06-04T00:00:00+00:00</updated>
<id>https://hostea.org/tags/problem/atom.xml</id>
<entry xml:lang="en">
<title>[solved] Zombies created by Gitea</title>
<published>2022-06-04T00:00:00+00:00</published>
<updated>2022-06-04T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies-part-2/" type="text/html"/>
<id>https://hostea.org/blog/zombies-part-2/</id>
<content type="html">&lt;p&gt;Gitea can &lt;a href=&quot;zombies&quot;&gt;create zombies&lt;&#x2F;a&gt;, for instance if a Git mirror takes too long. When updating a mirror, Gitea relies on the &lt;code&gt;git remote update&lt;&#x2F;code&gt; command which creates a child process, &lt;code&gt;git-remote-https&lt;&#x2F;code&gt;, to fetch data from the remote repository. Gitea has an internal timeout that will kill the child process (e.g. &lt;code&gt;git remote update&lt;&#x2F;code&gt;) when it takes too long but will not kill the grandchild. This grandchild will become an orphan and run forever or until its own timeout expires, which is about two minutes on git version 2.25.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt;$ time git clone https:&#x2F;&#x2F;4.4.4.4
&lt;&#x2F;span&gt;&lt;span&gt;Clonage dans &amp;#39;4.4.4.4&amp;#39;...
&lt;&#x2F;span&gt;&lt;span&gt;fatal: impossible d&amp;#39;accéder à &amp;#39;https:&#x2F;&#x2F;4.4.4.4&#x2F;&amp;#39;: Failed to connect to 4.4.4.4 port 443: Connexion terminée par expiration du délai d&amp;#39;attente
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;real 2m9,753s
&lt;&#x2F;span&gt;&lt;span&gt;user 0m0,001s
&lt;&#x2F;span&gt;&lt;span&gt;sys 0m0,009s
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;As explained in the &lt;a href=&quot;zombies&#x2F;#killing-a-child-process-and-all-its-children&quot;&gt;diagnostic blog post regarding Gitea zombies&lt;&#x2F;a&gt; there fortunately is a very simple way to avoid this by making sure each Gitea child is a &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Process_group&quot;&gt;process group leader&lt;&#x2F;a&gt;. That first step was &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;introduced in Gitea 1.17&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;backported to Gitea 1.16.9&lt;&#x2F;a&gt;. The actual bug fix can now be implemented.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;using-negative-process-id-to-kill-children&quot;&gt;Using negative process id to kill children&lt;a class=&quot;zola-anchor&quot; href=&quot;#using-negative-process-id-to-kill-children&quot; aria-label=&quot;Anchor link for: using-negative-process-id-to-kill-children&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;When Gitea times out on a child, it relies on &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;golang&#x2F;go&#x2F;blob&#x2F;f8a53df314e4af8cd350eedb0dae77d4c4fc30d0&#x2F;src&#x2F;os&#x2F;exec&#x2F;exec.go#L650&quot;&gt;os.Process.Kill&lt;&#x2F;a&gt;, which translates into using the kill(2) system call to send a SIGKILL signal to unconditionally terminate it: &lt;code&gt;kill(pid, SIGKILL)&lt;&#x2F;code&gt;. Using a negative pid with &lt;code&gt;kill(-pid, SIGKILL)&lt;&#x2F;code&gt; will terminate every process in the child&#x27;s process group, including the processes it spawned, without Gitea knowing when or why they were created. From the kill(2) manual page:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;This is implemented as follows in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L79&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;h3 id=&quot;not-using-the-default-go-commandcontext&quot;&gt;Not using the default Go CommandContext&lt;a class=&quot;zola-anchor&quot; href=&quot;#not-using-the-default-go-commandcontext&quot; aria-label=&quot;Anchor link for: not-using-the-default-go-commandcontext&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Since &lt;a href=&quot;https:&#x2F;&#x2F;pkg.go.dev&#x2F;os&#x2F;exec#CommandContext&quot;&gt;CommandContext&lt;&#x2F;a&gt; does not allow sending a signal to the negative pid of the child process, this has to be implemented by Gitea itself, in a way similar to how the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L71-87&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt; does it:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt; err := cmd.Start()
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; go func() {
&lt;&#x2F;span&gt;&lt;span&gt; &amp;lt;-ctx.Done()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; if killErr := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); killErr == nil {
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; }
&lt;&#x2F;span&gt;&lt;span&gt; }()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; err = cmd.Wait()
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;testing-the-bug-is-fixed-and-stays-fixed&quot;&gt;Testing the bug is fixed and stays fixed&lt;a class=&quot;zola-anchor&quot; href=&quot;#testing-the-bug-is-fixed-and-stays-fixed&quot; aria-label=&quot;Anchor link for: testing-the-bug-is-fixed-and-stays-fixed&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Long-standing bugs that are difficult to reproduce manually, such as this one, require robust testing to ensure that:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the diagnostic identifying the root cause is correct&lt;&#x2F;li&gt;
&lt;li&gt;the bug fix works&lt;&#x2F;li&gt;
&lt;li&gt;it does not resurface insidiously because of a subtle regression introduced years later&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Such a test is easy to implement, as can be seen in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L44-76&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;. In a nutshell:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L53&quot;&gt;git clone https:&#x2F;&#x2F;4.4.4.4&lt;&#x2F;a&gt; which will hang because of firewall rules&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L60-65&quot;&gt;wait for the git-remote-https&lt;&#x2F;a&gt; grandchild process to be spawned&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L67-68&quot;&gt;cancel the context and wait for the goroutine to terminate&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L70-75&quot;&gt;verify the git-remote-https is killed&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;And with that... no more zombies!&lt;&#x2F;p&gt;
</content>
</entry>
<entry xml:lang="en">
<title>[diagnostic] Zombies created by Gitea</title>
<published>2022-06-02T00:00:00+00:00</published>
<updated>2022-06-02T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies/" type="text/html"/>

View File

@ -210,8 +210,34 @@
<ul class="blog__list">
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<a href="https://hostea.org/blog/zombies-part-2/" class="blog__post-link">
<h2 class="blog__post-title">[solved] Zombies created by Gitea</h2>
<p class="blog__post-meta">
4
June
,
2022 &middot; <b>3 min read</b>
</p>
<p class="blog__post-description">Gitea can use process groups to kill its children using a negative PID to never create zombies. </p>
</a>
<div class="blog__post-tag-container">
<a class="blog__post-tag" href="/tags/hostea">#hostea</a>
<a class="blog__post-tag" href="/tags/gitea">#gitea</a>
<a class="blog__post-tag" href="/tags/troubleshoot">#troubleshoot</a>
<a class="blog__post-tag" href="/tags/problem">#problem</a>
</div>
</li>
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<h2 class="blog__post-title">[diagnostic] Zombies created by Gitea</h2>
<p class="blog__post-meta">
2
June

View File

@ -4,10 +4,75 @@
<link href="https://hostea.org/tags/troubleshoot/atom.xml" rel="self" type="application/atom+xml"/>
<link href="https://hostea.org"/>
<generator uri="https://www.getzola.org/">Zola</generator>
<updated>2022-06-02T00:00:00+00:00</updated>
<updated>2022-06-04T00:00:00+00:00</updated>
<id>https://hostea.org/tags/troubleshoot/atom.xml</id>
<entry xml:lang="en">
<title>[solved] Zombies created by Gitea</title>
<published>2022-06-04T00:00:00+00:00</published>
<updated>2022-06-04T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies-part-2/" type="text/html"/>
<id>https://hostea.org/blog/zombies-part-2/</id>
<content type="html">&lt;p&gt;Gitea can &lt;a href=&quot;zombies&quot;&gt;create zombies&lt;&#x2F;a&gt;, for instance if a Git mirror takes too long. When updating a mirror, Gitea relies on the &lt;code&gt;git remote update&lt;&#x2F;code&gt; command which creates a child process, &lt;code&gt;git-remote-https&lt;&#x2F;code&gt;, to fetch data from the remote repository. Gitea has an internal timeout that will kill the child process (e.g. &lt;code&gt;git remote update&lt;&#x2F;code&gt;) when it takes too long but will not kill the grandchild. This grandchild will become an orphan and run forever or until its own timeout expires, which is about two minutes on git version 2.25.&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt;$ time git clone https:&#x2F;&#x2F;4.4.4.4
&lt;&#x2F;span&gt;&lt;span&gt;Clonage dans &amp;#39;4.4.4.4&amp;#39;...
&lt;&#x2F;span&gt;&lt;span&gt;fatal: impossible d&amp;#39;accéder à &amp;#39;https:&#x2F;&#x2F;4.4.4.4&#x2F;&amp;#39;: Failed to connect to 4.4.4.4 port 443: Connexion terminée par expiration du délai d&amp;#39;attente
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt;real 2m9,753s
&lt;&#x2F;span&gt;&lt;span&gt;user 0m0,001s
&lt;&#x2F;span&gt;&lt;span&gt;sys 0m0,009s
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;As explained in the &lt;a href=&quot;zombies&#x2F;#killing-a-child-process-and-all-its-children&quot;&gt;diagnostic blog post regarding Gitea zombies&lt;&#x2F;a&gt; there fortunately is a very simple way to avoid this by making sure each Gitea child is a &lt;a href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Process_group&quot;&gt;process group leader&lt;&#x2F;a&gt;. That first step was &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;introduced in Gitea 1.17&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;go-gitea&#x2F;gitea&#x2F;pull&#x2F;19865&quot;&gt;backported to Gitea 1.16.9&lt;&#x2F;a&gt;. The actual bug fix can now be implemented.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;using-negative-process-id-to-kill-children&quot;&gt;Using negative process id to kill children&lt;a class=&quot;zola-anchor&quot; href=&quot;#using-negative-process-id-to-kill-children&quot; aria-label=&quot;Anchor link for: using-negative-process-id-to-kill-children&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;When Gitea times out on a child, it relies on &lt;a href=&quot;https:&#x2F;&#x2F;github.com&#x2F;golang&#x2F;go&#x2F;blob&#x2F;f8a53df314e4af8cd350eedb0dae77d4c4fc30d0&#x2F;src&#x2F;os&#x2F;exec&#x2F;exec.go#L650&quot;&gt;os.Process.Kill&lt;&#x2F;a&gt;, which translates into using the kill(2) system call to send a SIGKILL signal to unconditionally terminate it: &lt;code&gt;kill(pid, SIGKILL)&lt;&#x2F;code&gt;. Using a negative pid with &lt;code&gt;kill(-pid, SIGKILL)&lt;&#x2F;code&gt; will terminate every process in the child&#x27;s process group, including the processes it spawned, without Gitea knowing when or why they were created. From the kill(2) manual page:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;This is implemented as follows in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L79&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;h3 id=&quot;not-using-the-default-go-commandcontext&quot;&gt;Not using the default Go CommandContext&lt;a class=&quot;zola-anchor&quot; href=&quot;#not-using-the-default-go-commandcontext&quot; aria-label=&quot;Anchor link for: not-using-the-default-go-commandcontext&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Since &lt;a href=&quot;https:&#x2F;&#x2F;pkg.go.dev&#x2F;os&#x2F;exec#CommandContext&quot;&gt;CommandContext&lt;&#x2F;a&gt; does not allow sending a signal to the negative pid of the child process, this has to be implemented by Gitea itself, in a way similar to how the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec.go#L71-87&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt; does it:&lt;&#x2F;p&gt;
&lt;pre style=&quot;background-color:#2b303b;color:#c0c5ce;&quot;&gt;&lt;code&gt;&lt;span&gt; err := cmd.Start()
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; go func() {
&lt;&#x2F;span&gt;&lt;span&gt; &amp;lt;-ctx.Done()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; if killErr := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); killErr == nil {
&lt;&#x2F;span&gt;&lt;span&gt;...
&lt;&#x2F;span&gt;&lt;span&gt; }
&lt;&#x2F;span&gt;&lt;span&gt; }()
&lt;&#x2F;span&gt;&lt;span&gt;
&lt;&#x2F;span&gt;&lt;span&gt; err = cmd.Wait()
&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;testing-the-bug-is-fixed-and-stays-fixed&quot;&gt;Testing the bug is fixed and stays fixed&lt;a class=&quot;zola-anchor&quot; href=&quot;#testing-the-bug-is-fixed-and-stays-fixed&quot; aria-label=&quot;Anchor link for: testing-the-bug-is-fixed-and-stays-fixed&quot;
&gt;&lt;span class=&quot;anchor-icon&quot;&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;a
&gt;
&lt;&#x2F;h3&gt;
&lt;p&gt;Long-standing bugs that are difficult to reproduce manually, such as this one, require robust testing to ensure that:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the diagnostic identifying the root cause is correct&lt;&#x2F;li&gt;
&lt;li&gt;the bug fix works&lt;&#x2F;li&gt;
&lt;li&gt;it does not resurface insidiously because of a subtle regression introduced years later&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Such a test is easy to implement, as can be seen in the &lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L44-76&quot;&gt;Friendly Forge Format library&lt;&#x2F;a&gt;. In a nutshell:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L53&quot;&gt;git clone https:&#x2F;&#x2F;4.4.4.4&lt;&#x2F;a&gt; which will hang because of firewall rules&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L60-65&quot;&gt;wait for the git-remote-https&lt;&#x2F;a&gt; grandchild process to be spawned&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L67-68&quot;&gt;cancel the context and wait for the goroutine to terminate&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;lab.forgefriends.org&#x2F;friendlyforgeformat&#x2F;gofff&#x2F;-&#x2F;blob&#x2F;f42a29284a5262d3e6f94801089369626c5197f6&#x2F;util&#x2F;exec_test.go#L70-75&quot;&gt;verify the git-remote-https is killed&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;And with that... no more zombies!&lt;&#x2F;p&gt;
</content>
</entry>
<entry xml:lang="en">
<title>[diagnostic] Zombies created by Gitea</title>
<published>2022-06-02T00:00:00+00:00</published>
<updated>2022-06-02T00:00:00+00:00</updated>
<link href="https://hostea.org/blog/zombies/" type="text/html"/>

View File

@ -210,8 +210,34 @@
<ul class="blog__list">
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<a href="https://hostea.org/blog/zombies-part-2/" class="blog__post-link">
<h2 class="blog__post-title">[solved] Zombies created by Gitea</h2>
<p class="blog__post-meta">
4
June
,
2022 &middot; <b>3 min read</b>
</p>
<p class="blog__post-description">Gitea can use process groups to kill its children using a negative PID to never create zombies. </p>
</a>
<div class="blog__post-tag-container">
<a class="blog__post-tag" href="/tags/hostea">#hostea</a>
<a class="blog__post-tag" href="/tags/gitea">#gitea</a>
<a class="blog__post-tag" href="/tags/troubleshoot">#troubleshoot</a>
<a class="blog__post-tag" href="/tags/problem">#problem</a>
</div>
</li>
<li class="blog__post-item">
<a href="https://hostea.org/blog/zombies/" class="blog__post-link">
<h2 class="blog__post-title">[diagnostic] Zombies created by Gitea</h2>
<p class="blog__post-meta">
2
June