<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>__Builtin_expect on Uniguri&#39;s Blog</title>
    <link>/tags/__builtin_expect/</link>
    <description>Recent content in __Builtin_expect on Uniguri&#39;s Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 09 Dec 2024 14:51:14 +0000</lastBuildDate><atom:link href="/tags/__builtin_expect/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>About __builtin_expect</title>
      <link>/posts/recording/about-builtin-expect/</link>
      <pubDate>Mon, 09 Dec 2024 14:51:14 +0000</pubDate>
      
      <guid>/posts/recording/about-builtin-expect/</guid>
      <description>&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;While learning computer architecture, I have been studying the MIPS architecture.
I learned that MIPS is pipelined and that several techniques are used for branch prediction.
One of them is a static scheme that predicts each branch as not taken (that is, execution continues without jumping).&lt;/p&gt;
&lt;p&gt;After reading about this, I wondered whether &lt;a href=&#34;https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect&#34;&gt;GCC&#39;s __builtin_expect&lt;/a&gt; and &lt;a href=&#34;https://en.cppreference.com/w/cpp/language/attributes/likely&#34;&gt;the C++20 likely and unlikely attributes&lt;/a&gt; are designed around this kind of branch prediction.
So I decided to take a quick look.&lt;/p&gt;
&lt;h2 id=&#34;__builtin_expect&#34;&gt;__builtin_expect&lt;/h2&gt;
&lt;p&gt;GCC provides a number of built-in functions that let the programmer pass extra information to the compiler.
Among them are &lt;code&gt;__builtin_expect&lt;/code&gt;, mentioned above, and &lt;code&gt;__builtin_expect_with_probability&lt;/code&gt;.&lt;/p&gt;</description>
      <content>&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;While learning computer architecture, I have been studying the MIPS architecture.
I learned that MIPS is pipelined and that several techniques are used for branch prediction.
One of them is a static scheme that predicts each branch as not taken (that is, execution continues without jumping).&lt;/p&gt;
&lt;p&gt;After reading about this, I wondered whether &lt;a href=&#34;https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect&#34;&gt;GCC&#39;s __builtin_expect&lt;/a&gt; and &lt;a href=&#34;https://en.cppreference.com/w/cpp/language/attributes/likely&#34;&gt;the C++20 likely and unlikely attributes&lt;/a&gt; are designed around this kind of branch prediction.
So I decided to take a quick look.&lt;/p&gt;
&lt;h2 id=&#34;__builtin_expect&#34;&gt;__builtin_expect&lt;/h2&gt;
&lt;p&gt;GCC provides a number of built-in functions that let the programmer pass extra information to the compiler.
Among them are &lt;code&gt;__builtin_expect&lt;/code&gt;, mentioned above, and &lt;code&gt;__builtin_expect_with_probability&lt;/code&gt;.&lt;/p&gt;
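&lt;p&gt;As a rough illustration of the second built-in, the sketch below attaches an explicit probability to the same &lt;code&gt;c != 0&lt;/code&gt; test used in this post. The function name &lt;code&gt;is_nonzero_probably&lt;/code&gt; and the value 0.9 are made up for the example, and it assumes GCC 9 or later, where &lt;code&gt;__builtin_expect_with_probability&lt;/code&gt; is available.&lt;/p&gt;

```c
/* Hypothetical helper mirroring the c != 0 test in this post.
   Assumes GCC 9 or later, where __builtin_expect_with_probability exists.
   The third argument must be a compile-time constant in [0.0, 1.0]. */
int is_nonzero_probably(int c) {
  /* Hint: the condition (c != 0) is expected to equal 1 about 90% of the time. */
  if (__builtin_expect_with_probability(c != 0, 1, 0.9)) {
    return 1;
  }
  return 0;
}
```

&lt;p&gt;The hint does not change the result: &lt;code&gt;is_nonzero_probably(5)&lt;/code&gt; returns 1 and &lt;code&gt;is_nonzero_probably(0)&lt;/code&gt; returns 0 either way.&lt;/p&gt;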
&lt;h3 id=&#34;example-code--opcodes&#34;&gt;Example code &amp;amp; Assembly&lt;/h3&gt;
&lt;p&gt;For a simple test, I compiled the following code with the &lt;code&gt;-O3&lt;/code&gt; option.&lt;/p&gt;



  &lt;div class=&#34;collapsable-code&#34;&gt;
    &lt;input id=&#34;1&#34; type=&#34;checkbox&#34; checked /&gt;
    &lt;label for=&#34;1&#34;&gt;
      &lt;span class=&#34;collapsable-code__language&#34;&gt;c&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__title&#34;&gt;__builtin_expect(-,1).c&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
    &lt;/label&gt;
    &lt;pre class=&#34;language-c&#34; &gt;&lt;code&gt;
#include &amp;lt;stdio.h&amp;gt;

int main() {
  int c;
  scanf(&amp;#34;%d&amp;#34;, &amp;amp;c);
  if(__builtin_expect(c != 0, 1)) {
    puts(&amp;#34;c != 0&amp;#34;);
  } else {
    puts(&amp;#34;c == 0&amp;#34;);
  }
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;


&lt;p&gt;The disassembly of the compiled binary is as follows.&lt;/p&gt;



  &lt;div class=&#34;collapsable-code&#34;&gt;
    &lt;input id=&#34;2&#34; type=&#34;checkbox&#34; checked /&gt;
    &lt;label for=&#34;2&#34;&gt;
      &lt;span class=&#34;collapsable-code__language&#34;&gt;assembly&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__title&#34;&gt;__builtin_expect(-,1).asm&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
    &lt;/label&gt;
    &lt;pre class=&#34;language-assembly&#34; &gt;&lt;code&gt;
.text:00000000000010A0 ; int __fastcall main(int argc, const char **argv, const char **envp)
.text:00000000000010A0                 public main
.text:00000000000010A0 main            proc near               ; DATA XREF: _start&amp;#43;18↓o
.text:00000000000010A0
.text:00000000000010A0 var_14          = dword ptr -14h
.text:00000000000010A0 var_10          = qword ptr -10h
.text:00000000000010A0
.text:00000000000010A0 ; __unwind {
.text:00000000000010A0                 endbr64
.text:00000000000010A4                 sub     rsp, 18h
.text:00000000000010A8                 lea     rdi, unk_2004
.text:00000000000010AF                 mov     rax, fs:28h
.text:00000000000010B8                 mov     [rsp&amp;#43;18h&amp;#43;var_10], rax
.text:00000000000010BD                 xor     eax, eax
.text:00000000000010BF                 lea     rsi, [rsp&amp;#43;18h&amp;#43;var_14]
.text:00000000000010C4                 call    ___isoc99_scanf
.text:00000000000010C9                 mov     eax, [rsp&amp;#43;18h&amp;#43;var_14]
.text:00000000000010CD                 test    eax, eax
.text:00000000000010CF                 jz      short loc_10F4
.text:00000000000010D1                 lea     rdi, s          ; &amp;#34;c != 0&amp;#34;
.text:00000000000010D8                 call    _puts
.text:00000000000010DD
.text:00000000000010DD loc_10DD:                               ; CODE XREF: main&amp;#43;60↓j
.text:00000000000010DD                 mov     rax, [rsp&amp;#43;18h&amp;#43;var_10]
.text:00000000000010E2                 sub     rax, fs:28h
.text:00000000000010EB                 jnz     short loc_1102
.text:00000000000010ED                 xor     eax, eax
.text:00000000000010EF                 add     rsp, 18h
.text:00000000000010F3                 retn
.text:00000000000010F4 ; ---------------------------------------------------------------------------
.text:00000000000010F4
.text:00000000000010F4 loc_10F4:                               ; CODE XREF: main&amp;#43;2F↑j
.text:00000000000010F4                 lea     rdi, aC0_0      ; &amp;#34;c == 0&amp;#34;
.text:00000000000010FB                 call    _puts
.text:0000000000001100                 jmp     short loc_10DD
.text:0000000000001102 ; ---------------------------------------------------------------------------
.text:0000000000001102
.text:0000000000001102 loc_1102:                               ; CODE XREF: main&amp;#43;4B↑j
.text:0000000000001102                 call    ___stack_chk_fail
.text:0000000000001102 ; } // starts at 10A0
.text:0000000000001102 main            endp
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;


&lt;p&gt;So what happens when the following code is compiled instead?&lt;/p&gt;



  &lt;div class=&#34;collapsable-code&#34;&gt;
    &lt;input id=&#34;3&#34; type=&#34;checkbox&#34; checked /&gt;
    &lt;label for=&#34;3&#34;&gt;
      &lt;span class=&#34;collapsable-code__language&#34;&gt;c&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__title&#34;&gt;__builtin_expect(-,0).c&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
    &lt;/label&gt;
    &lt;pre class=&#34;language-c&#34; &gt;&lt;code&gt;
#include &amp;lt;stdio.h&amp;gt;

int main() {
  int c;
  scanf(&amp;#34;%d&amp;#34;, &amp;amp;c);
  if(__builtin_expect(c != 0, 0)) {
    puts(&amp;#34;c != 0&amp;#34;);
  } else {
    puts(&amp;#34;c == 0&amp;#34;);
  }
  return 0;
}
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;


&lt;p&gt;The result is as follows.&lt;/p&gt;



  &lt;div class=&#34;collapsable-code&#34;&gt;
    &lt;input id=&#34;4&#34; type=&#34;checkbox&#34; checked /&gt;
    &lt;label for=&#34;4&#34;&gt;
      &lt;span class=&#34;collapsable-code__language&#34;&gt;assembly&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__title&#34;&gt;__builtin_expect(-,0).asm&lt;/span&gt;
      &lt;span class=&#34;collapsable-code__toggle&#34; data-label-expand=&#34;Show&#34; data-label-collapse=&#34;Hide&#34;&gt;&lt;/span&gt;
    &lt;/label&gt;
    &lt;pre class=&#34;language-assembly&#34; &gt;&lt;code&gt;
.text:00000000000010A0 ; int __fastcall main(int argc, const char **argv, const char **envp)
.text:00000000000010A0                 public main
.text:00000000000010A0 main            proc near               ; DATA XREF: _start&amp;#43;18↓o
.text:00000000000010A0
.text:00000000000010A0 var_14          = dword ptr -14h
.text:00000000000010A0 var_10          = qword ptr -10h
.text:00000000000010A0
.text:00000000000010A0 ; __unwind {
.text:00000000000010A0                 endbr64
.text:00000000000010A4                 sub     rsp, 18h
.text:00000000000010A8                 lea     rdi, unk_2004
.text:00000000000010AF                 mov     rax, fs:28h
.text:00000000000010B8                 mov     [rsp&amp;#43;18h&amp;#43;var_10], rax
.text:00000000000010BD                 xor     eax, eax
.text:00000000000010BF                 lea     rsi, [rsp&amp;#43;18h&amp;#43;var_14]
.text:00000000000010C4                 call    ___isoc99_scanf
.text:00000000000010C9                 mov     eax, [rsp&amp;#43;18h&amp;#43;var_14]
.text:00000000000010CD                 test    eax, eax
.text:00000000000010CF                 jnz     short loc_10F4
.text:00000000000010D1                 lea     rdi, s          ; &amp;#34;c == 0&amp;#34;
.text:00000000000010D8                 call    _puts
.text:00000000000010DD
.text:00000000000010DD loc_10DD:                               ; CODE XREF: main&amp;#43;60↓j
.text:00000000000010DD                 mov     rax, [rsp&amp;#43;18h&amp;#43;var_10]
.text:00000000000010E2                 sub     rax, fs:28h
.text:00000000000010EB                 jnz     short loc_1102
.text:00000000000010ED                 xor     eax, eax
.text:00000000000010EF                 add     rsp, 18h
.text:00000000000010F3                 retn
.text:00000000000010F4 ; ---------------------------------------------------------------------------
.text:00000000000010F4
.text:00000000000010F4 loc_10F4:                               ; CODE XREF: main&amp;#43;2F↑j
.text:00000000000010F4                 lea     rdi, aC0_0      ; &amp;#34;c != 0&amp;#34;
.text:00000000000010FB                 call    _puts
.text:0000000000001100                 jmp     short loc_10DD
.text:0000000000001102 ; ---------------------------------------------------------------------------
.text:0000000000001102
.text:0000000000001102 loc_1102:                               ; CODE XREF: main&amp;#43;4B↑j
.text:0000000000001102                 call    ___stack_chk_fail
.text:0000000000001102 ; } // starts at 10A0
.text:0000000000001102 main            endp
&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;


&lt;h3 id=&#34;analysis-the-result&#34;&gt;Analyzing the result&lt;/h3&gt;
&lt;p&gt;With &lt;code&gt;__builtin_expect(c != 0, 1)&lt;/code&gt;, &lt;code&gt;c != 0&lt;/code&gt; is printed when the &lt;code&gt;jz&lt;/code&gt; branch is not taken.
With &lt;code&gt;__builtin_expect(c != 0, 0)&lt;/code&gt;, on the other hand, the compiler emits &lt;code&gt;jnz&lt;/code&gt; instead, and &lt;code&gt;c == 0&lt;/code&gt; is printed when that branch is not taken.&lt;/p&gt;
&lt;p&gt;In other words, when &lt;code&gt;__builtin_expect&lt;/code&gt; tells the compiler which way a branch is expected to go, the compiler, as I had guessed, takes static branch prediction into account and places the block that is expected to run on the fall-through path, so that it executes without a jump.&lt;/p&gt;
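&lt;p&gt;The hint steers code generation, but &lt;code&gt;__builtin_expect(exp, c)&lt;/code&gt; still evaluates to &lt;code&gt;exp&lt;/code&gt;, so the result of the program does not change. As a minimal sketch, the two hypothetical functions below carry opposite hints yet return the same values; the &lt;code&gt;likely&lt;/code&gt; and &lt;code&gt;unlikely&lt;/code&gt; macro names are an assumption, following a convention that many codebases use.&lt;/p&gt;

```c
/* Conventional wrapper names (an assumption, not part of GCC itself).
   !!(x) normalizes any non-zero value to 1 before it is compared
   against the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Same hint as the first listing: the non-zero case is marked as expected. */
int nonzero_hinted_likely(int c) {
  if (likely(c != 0)) {
    return 1;
  }
  return 0;
}

/* Same hint as the second listing: the non-zero case is marked as unexpected. */
int nonzero_hinted_unlikely(int c) {
  if (unlikely(c != 0)) {
    return 1;
  }
  return 0;
}
```

&lt;p&gt;Both functions return 1 for a non-zero argument and 0 for zero. Only the order in which the compiler may arrange the two blocks differs.&lt;/p&gt;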
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;My guess was right. With static branch prediction in mind, &lt;code&gt;__builtin_expect&lt;/code&gt; makes the compiler place the expected block directly after the conditional jump, so the expected path runs without taking a branch.
According to another post, the benefit is not limited to static prediction: performance can also improve under dynamic prediction (for example, schemes that use a Branch Target Buffer)&lt;a href=&#34;https://justdoprogram.blogspot.com/2022/03/builtinexpect-likely-unlikey.html&#34;&gt;[1]&lt;/a&gt;.&lt;/p&gt;
</content>
    </item>
    
  </channel>
</rss>
