How should we view the open-sourcing of the Ark Compiler (方舟编译器) on August 31, 2019?

Huawei said it would "show you the code". I hope some expert can take this opportunity to explain what the Ark Compiler actually is.

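My point is: don't jump to conclusions. Take your time and appreciate it.

Second update:

Let's look at what a veteran of Huawei's compiler team has to say.

It cleared up my doubts.

The Ark Compiler was not dreamed up out of thin air.

Huawei previously had a lot of demand for customized compilers in areas such as DSP chips and servers. Open-source options could not meet that demand, so Huawei built its own.

Ark was originally called the MAPLE (Multiple Architecture and Programming Language Environment) project.

The grand ambition is a single unified compilation system for all programming languages and all processor architectures.

According to that account, it is the first compiler to bring objects and exception handling into the IR.

The MAPLEJS project was not dreamed up either: there was a real need to run JavaScript on embedded devices.

Whatever the case, apart from Huawei, almost no company in China pays to keep a compiler team long term. The experts have all changed careers, which is scary.

If nobody pays to keep them, the experts change careers.

I hope China gets a few more companies that spend their profits supporting compiler and operating system teams.

Actually, never mind China: not many companies in other countries pay to keep a compiler team either.

The original post by the Huawei insider: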

zhihu.com/question/3434

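Update:

The official FAQ, the source code, and the home page are all out!!

Here it comes, folks!!

[Figure: the official architecture diagram]

Huawei's ambition is for Java, JavaScript, Python, C, and C++ all to be translated into Maple IR, and Maple IR is a language designed to be optimized for execution on chips.

Here is what a C program looks like once it has been translated into Maple IR by the compiler Huawei released: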

C source:

int fact(int n) {
  if(n!=1)
    return n*fact(n-1);
  else return 1;
}

Maple IR:

func &fact (var %n i32) i32 {
  if (ne i32 (dread i32 %n, constval i32 1)) {
    call &fact (sub i32 (dread i32 %n, constval i32 1))
    return (mul i32 (dread i32 %n,regread i32 %%retval))
  }
  return(constval i32 1)
}
 

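Today let me walk through (read: translate) the official documentation and source code.

My impression after reading so far:

They have handled Java classes, interfaces, and exception handling. The amount of work is really substantial.

As far as I know, a project this hardcore, built by Chinese engineers themselves, has not existed before.

Zhihu experts compare it with the best compilers in the world, products of decades of open-source competition among the smartest minds on the planet, and find it unremarkable.

But translating Java into a language of your own design and then executing it correctly is extremely hard and an enormous amount of work. I could not do it, and it may well be the first time something at this level has been done in China.

Actually writing one yourself that can compile Java programs, run them, and be used in practice is really tough.


Imagine being asked to write a program whose input is a string (a Java program)

and whose output is a string (a Maple IR program).

Then you also have to write a program that runs the Maple IR.

That workload is terrifying.

And it is supposed to work on Android phones, bug-free, and compatible with the existing Java standard.

If I ever implemented something like this, I would surely be dreaming.


Either way, this kind of programming ability is pretty formidable.

Controlling complexity and building abstractions at this level seems to be something only foreign teams had done before.

Linux, G++, and LLVM, which we all use, are examples of this.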


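There have been plenty of compilers abroad too,

for example BCC (Borland C++), once a very popular C++ compiler.
It lost the C++ compiler wars, Borland later sold it off, and you hardly hear of it now.

The foreign open-source projects you can see, such as Linux and GCC, are the few strongest survivors of natural selection.


What you do not see are the countless failed projects buried in the dust of history.

The ones alive today are the strongest compilers that the whole planet took decades to evolve.


Even if you only build something as good as the strongest evolved open-source ones abroad, as long as you wrote it line by line yourself, that is already impressive.

If I had done a project at this level, I could brag about it for the rest of my life.


I'm glad China will be doing this kind of hardcore work from now on. Getting started is what matters.

Chinese programmers are not bad, but it takes a company willing to spend profits supporting them to work on things that, on the surface, existing foreign tools already solve. That is why few people in China work on compilers, operating system kernels, and the other so-called "four great romances of programmers".

Of course the best outcome is to join the wider ecosystem so that people stop reinventing the wheel.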

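With a complicated algorithm problem, once you pass 100 lines a single bug can take ages to find, let alone a project this large, where you also have to stay compatible with the Java standard and translate every language feature.

Do you know how many language features there are?

The whole object-oriented side of Java: classes, interfaces,

exception handling,

and all the syntax added in Java 8. Just thinking about it is scary.

Huawei also wants to write programs that translate JavaScript, Python, C, and C++ into Maple IR and execute them. Terrifying.

With that workload I would just quit.

It is a huge pit to fill.

Open "Java in 21 Days: From Beginner to Giving Up": every language feature in it has to be translated to Maple IR by a program that reads the Java program in as a text string.

I would quit and run.


You can criticize it for working on principles similar to other compilers,

but actually doing all this for the Java language is a really huge amount of work.

If I had to write that many lines, there would certainly be many bugs.

Unlike writing front ends and back ends for web pages or building apps, bugs in a project this complex are extremely hard to track down.


In principle it is nothing more than this:


But typing the code out line by line is brutally hard.

Even at a salary of 400,000 to 500,000 yuan a year, I would rather write front-end and back-end code for web pages than this. The risk is too high: partway through you find the results are wrong, and there may be too many places where things went wrong.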


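Still, open source really is great.

There are a lot of experts on Zhihu. Let's not get carried away with hype or grand-strategy theories,

and let's not sneer either.

Better to contribute out of love together,

fill in the gaps for free,

and find bugs.


From now on, whatever Huawei open-sources, I will be interested in reading every line.

My dream is to implement a feature for Huawei and send it in as a pull request.

I think China can gather plenty of reasonably capable programmers to work on this together.

Once there is a first version, it can be iterated on gradually,

and more and more programmers can be mobilized to work on it.

By then, being able to say I fixed a major bug in the Ark Compiler would be a highlight when job hunting.


Isn't the point of open source to learn from one another and to harness everyone's freely given effort?

Just like Wikipedia and Zhihu, where people with time on their hands write a lot of high-quality content out of interest.

Spare time can contribute a lot too. I think that is the true spirit of open source.

Not to mention that Huawei will also set up a fund to support people writing open source.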


There are additional design criteria at the implementation level. To be friendly to compiler developers, MAPLE IR exists in both binary and ASCII forms. Conversion between the two forms can be viewed as assembly and dis-assembly. Thus, the ASCII form can be viewed as the assembler language of the MAPLE VM instructions. The ASCII form is editable, which implies that it is possible to program in MAPLE IR directly. Thus, the ASCII form of MAPLE IR is modeled after a typical C program, which is made up of:
  • declaration statements - these represent the symbol table information
  • executable statements - these represent the executable program code

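In other words: you have probably learned Python or C.

Since Maple IR is itself a language, you can write it by hand too: type the code directly in an editor and then run it.

Someone could even publish "Maple IR in 21 Days: From Beginner to Expert", haha.

The only difference is that programs already written in Java, C++, C, or Python

are automatically rewritten into equivalent Maple IR first and then run.

A Maple IR program has two parts. The first is the declarations: definitions of variables, functions, and so on. The second is the executable statements: the code that actually runs.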


MAPLE IR is the common representation for programs compiled from different programming languages, which include general purpose languages like C, C++ and Java. MAPLE IR is extensible. As additional languages, including domain-specific languages, are compiled into MAPLE IR, more constructs will be added to represent constructs unique to each language.

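That is, C++, Java, or any other supported language gets translated into Maple IR before being executed.

As other languages are added later, Maple IR will be extended accordingly to represent their unique features.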



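Some people say Maple IR has nothing innovative in it.

But what the official documentation emphasizes is that Maple IR will later support more languages and their corresponding features.

That is genuinely hard.

Maple IR acts as a translator in the middle.

On one side it has to respect the hardware characteristics of Intel x86 as well as ARM processors such as Kirin and Snapdragon.

On the other side it has to serve C++, C, Python, Java, and JavaScript.

Both ends must be considered and both must be fast. Without many years of experience in the industry, this is very hard to pull off.

Future extensibility also has to be considered: existing instructions should ideally stay unchanged, which puts real demands on the early design choices and trade-offs.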


Since MAPLE IR is one IR that can exist at multiple levels of semantics, the level of a MAPLE IR program is dictated by the constraints that it adheres to. These constraints are of the following two types:
Opcodes allowed - The higher the level, the more types of opcodes allowed, including opcodes generated only from specific languages. At the lowest level, only opcodes that correspond one-to-one to operations in a general purpose processor are allowed.
Code structure - The program structure is hierarchical at the higher levels. The hierarchical constructs become less and less as lowering proceeds. At the lowest level, the program structure is flat, consisting of sequences of primitive instructions consumed by the general purpose processor.

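This passage is quite clever: it says Maple IR supports different levels of granularity.

Put simply, every programming language has its own code structure and its own opcodes (kinds of operations). At a higher level, the IR is closer to each language's own program structure and allows language-specific opcodes.

After lowering to a lower-level representation (finer-grained instructions), it is closer to the native instructions of processors such as x86 (Intel) and ARM (HiSilicon Kirin, Qualcomm Snapdragon).

This is quite important.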
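
The idea of lowering can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the class LoweringSketch, its label naming scheme, and the exact spelling of the branch statements are hypothetical and not taken from the Ark Compiler sources. It turns a hierarchical if/else into a flat sequence of statements connected by labels and jumps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "lowering": a hierarchical if/else becomes a flat list
// of statements with labels and branches. Opcode spellings are illustrative.
final class LoweringSketch {
    private int nextLabel = 0;

    private String freshLabel() {
        return "@L" + (nextLabel++);
    }

    // Lowers: if (cond) { thenBody } else { elseBody }
    List<String> lowerIf(String cond, List<String> thenBody, List<String> elseBody) {
        String elseLabel = freshLabel();
        String endLabel = freshLabel();
        List<String> flat = new ArrayList<>();
        flat.add("brfalse " + elseLabel + " (" + cond + ")"); // jump to the else part when cond is false
        flat.addAll(thenBody);
        flat.add("goto " + endLabel);                         // skip over the else part
        flat.add(elseLabel);
        flat.addAll(elseBody);
        flat.add(endLabel);
        return flat;
    }

    public static void main(String[] args) {
        List<String> thenBody = new ArrayList<>();
        thenBody.add("dassign %r (constval i32 1)");
        List<String> elseBody = new ArrayList<>();
        elseBody.add("dassign %r (constval i32 0)");
        String cond = "ne i32 (dread i32 %n, constval i32 1)";
        for (String line : new LoweringSketch().lowerIf(cond, thenBody, elseBody)) {
            System.out.println(line);
        }
    }
}
```

The nested block structure is gone after lowering: what remains is a straight list of statements, which is the "flat" program shape the quoted passage describes for the lowest level.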


Next comes the part everyone enjoys: the data structure definitions.

There are three kinds of executable nodes in MAPLE IR:
Leaf nodes - Also called terminal nodes, these nodes denote a value at execution time, which may be a constant or the value of a storage unit.
Expression nodes - An expression node performs an operation on its operands to compute a result. Its result is a function of the values of its operands and nothing else. Each operand can be either a leaf node or another expression node. Expression nodes are the internal nodes of expression trees. The type field in the expression node gives the type associated with the result of the operation.
Statement nodes - These represent the flow of control. Execution starts at the entry of the function and continues sequentially statement by statement until a control flow statement is executed. Apart from modifying control flow, statements can also modify data storage in the program. A statement node has operands that can be leaf, expression or statement nodes.

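A program in a programming language can be abstracted as a tree: its semantics have a tree structure.

A program written in Maple IR is such a tree, with three kinds of nodes:

Leaf node: this one is simple. Think of int x or const int x = 10 in C: it is just a numeric constant or a storage unit (a variable).

Expression node: an expression tree built by combining leaf nodes with operations such as addition and subtraction.


Statement node: roughly one statement in C. An if statement changes the flow of execution, while a statement such as x=(1+2)/3; contains an expression node.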

To enable easy visualization in the ASCII IR, whenever an operand is not a leaf node, we require the start of a new line indented to the right by two spaces relative to the last line. Thus, the statement "a = b + c" is:
dassign $a (
  add i32(dread i32 $b, dread i32 $c))
and the statement "a = b + c - d" is:
dassign $a (
  sub i32(
    add i32(dread i32 $b, dread i32 $c),
    dread i32 $d))

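This code is fairly easy to follow. An opcode is an operator in this language; you can think of it as a function whose operand and result types are spelled out explicitly.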
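
To make the "opcode as a typed function" reading concrete, here is a minimal Java sketch of the tree for a = b + c - d. All names in it (Node, Dread, BinOp, IrTreeSketch) and the map standing in for storage are hypothetical illustrations, not the Ark Compiler's actual data structures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Maple IR-style expression trees; not the real compiler classes.
interface Node {
    int eval(Map<String, Integer> vars);   // every node yields an i32 value in this sketch
    String print();                        // one-line ASCII form, loosely following the doc
}

// Leaf node: reads the value of a storage unit (like "dread i32 $b").
final class Dread implements Node {
    private final String name;
    Dread(String name) { this.name = name; }
    public int eval(Map<String, Integer> vars) { return vars.get(name); }
    public String print() { return "dread i32 $" + name; }
}

// Expression node: applies an opcode to two operand nodes.
final class BinOp implements Node {
    private final String opcode;           // only "add" and "sub" are handled here
    private final Node left;
    private final Node right;
    BinOp(String opcode, Node left, Node right) {
        this.opcode = opcode;
        this.left = left;
        this.right = right;
    }
    public int eval(Map<String, Integer> vars) {
        int l = left.eval(vars);
        int r = right.eval(vars);
        switch (opcode) {
            case "add": return l + r;
            case "sub": return l - r;
            default: throw new IllegalArgumentException("unknown opcode " + opcode);
        }
    }
    public String print() {
        return opcode + " i32 (" + left.print() + ", " + right.print() + ")";
    }
}

final class IrTreeSketch {
    // Statement node "dassign $target (...)": evaluates the tree and stores the result.
    static void dassign(String target, Node expr, Map<String, Integer> vars) {
        vars.put(target, expr.eval(vars));
    }

    // Builds the expression tree for "b + c - d".
    static Node bPlusCMinusD() {
        return new BinOp("sub", new BinOp("add", new Dread("b"), new Dread("c")), new Dread("d"));
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = new HashMap<>();
        vars.put("b", 5);
        vars.put("c", 4);
        vars.put("d", 2);
        Node expr = bPlusCMinusD();
        dassign("a", expr, vars);
        System.out.println("dassign $a (" + expr.print() + ")");
        System.out.println("a = " + vars.get("a"));
    }
}
```

The leaves only read values, the inner nodes only compute, and the single statement node is the one place where storage changes, which matches the three node kinds described earlier.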


The general rules regarding line breaks are as follows:
Each expression or statement node must occupy its own line, and each line cannot contain more than one expression or statement node.
When there is at least one operand that is not a leaf node, then all the operands of the current expression or statement node must be specified in separate new lines, including operands that are leaf nodes.
Comments can be specified via the character '#', which can be regarded as the end of line character by the IR parser.
For human-edited MAPLE IR files, the line breaks are not enforced for expressions, as they do not affect the correctness of the program, since the end of operand specification is indicated by the closing parenthesis. But there must not be more than one statement node per line, because we do not use the ';' character to delimit statement boundaries.

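All this says is that comments start with #, the same as in Python.

Statements do not need a trailing ;. The closing parenthesis marks where the operands end, and each statement goes on its own line.

Each line holds at most one expression node or statement node.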


Primitive types can be regarded as pre-defined types supported by the execution engine such that they can be directly operated on. They also play a part in conveying the semantics of operations, as addresses are distinct from unsigned integers. The number in the primitive type name indicates the storage size in bits.
The primitive types are:
  • no type - void
  • signed integers - i8, i16, i32, i64
  • unsigned integers - u8, u16, u32, u64
  • booleans - u1
  • addresses - ptr, ref, a32, a64
  • floating point numbers - f32, f64
  • complex numbers - c64, c128
  • javascript types:
    • dynany
    • dynu32
    • dyni32
    • dynundef
    • dynnull
    • dynhole
    • dynbool
    • dynptr
    • dynf64
    • dynf32
    • dynstr
    • dynobj
  • SIMD types - (to be defined)
  • unknown


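Those are the data types supported by the Ark Compiler's Maple IR. They include some JavaScript types, evidently in preparation for compiling JavaScript in the future.


There are also SIMD types (still to be defined), which can be used for data-parallel operations to speed up execution.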


Special registers are registers with special meaning. They are all specified using %% as prefix. %%SP is the stack pointer and %%FP the frame pointer in referencing the stack frame of the current function. %%GP is the global pointer used for addressing global variables.
Special registers %%retval0, %%retval1, %%retval2, etc. are used for fetching the multiple values returned by a call. They are overwritten by each call, and should only be read at most once after each call. They can assume whatever is the type of the return value.

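The Ark Compiler's special registers include %%SP, the stack pointer; %%FP, the frame pointer; %%GP, the global pointer used for addressing global variables; and %%retval0, %%retval1, %%retval2, and so on, which hold the values returned by a call.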


Some opcodes are applicable to non-primitive (or derived) types, as in an aggregate assignment. When the type is derived, the designation agg can be used. In such cases, the data size can be looked up from the type of the symbol.
The primitive types ptr and ref are the target-independent types for addresses. ref conveys the additional semantics that the address is a reference to a run-time managed block of memory or object in the heap. Uses of ptr or ref instead of a32 or a64 allow the IR to be independent of the target machine by not manifesting the size of addresses until the later target-dependent compilation phases.
The primitive type unknown is used by the language front-end when the type of a field in an object has not been fully resolved because the full definition resides in a different compilation unit.

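Interesting: ptr and ref. As everyone knows, C has pointers and C++ has references.

The address of a variable in memory, for example 0x3F234A, is a pointer.

Address sizes and memory layouts differ between machines, so the same program ends up with different addresses on an Intel processor than on ARM processors such as Kirin or Snapdragon.

Maple IR therefore provides target-independent address types: ptr and ref leave the address size unspecified until the later target-dependent compilation phases, and ref additionally marks a reference to a run-time managed object on the heap.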

Statement Labels

Label names are prefixed with '@' which serves to identify them. Any statement beginning with a label name defines that label as referring to that text position. Labels are only referred to locally by goto and branch statements.

A label is just a name attached to a statement position, so that jumps, such as those produced by if statements, have a statement to point to.


Java Class and Interface Declaration

syntax: javaclass <id-name> <class-type> <attributes>

<id-name> must have '$' as prefix as class names always have global scope. For example:
javaclass $Puppy <class {@color i32}> public final         # a java class named "Puppy" with a single field "color" and attributes public and final
A javaclass name should not be regarded as a type name as it contains additional attribute information. It cannot be enclosed in angular brackets as it cannot be referred to as a type.

A java interface has the same form as the class type, being able to extend another interface, but unlike class, an interface can extend multiple interfaces. Another difference from class is that an interface cannot be instantiated. Without instantiation, the data fields in interfaces are always allocated statically. For example,
interface <$interfaceA> {      #this interface extends interfaceA
  @s1 int32,                   # data fields inside interfaces are always statically allocated
  &method1(int32) f32 }        # a method declaration

Both Java classes and interfaces are covered.


Exceptions Handling

Described in this section are the various exception handling constructs and operations. The try statement marks the entrance to a try block. The catch statement marks the entrance to a catch block. The finally statement marks the entrance to a finally block. The endtry statement marks the end of the composite exception handling constructs that began with the try. In addition, there are two special types of labels. Handler labels are placed before catch statements, and finally labels are placed before finally statements. Handler labels are distinguished from ordinary labels via the prefix "@h@", while finally labels use the prefix "@f@". These special labels explicitly show the correspondence of try, catch and finally to each other in each try-catch-finally composite, without relying on block nesting. The special register %%thrownval contains the value being thrown, which is the operand of the throw operation that raised the current exception.

try

syntax:

try <handler-label> <finally-label>


catch

syntax:

<handler-label> catch


finally

syntax:

<finally-label> finally



Try, catch, and finally: exception handling is all covered.


A generic function is instantiated by invoking it with an instantiation vector. The instantiation vector immediately follows the name of the generic function. Since the instantiation vector is regarded as type information, it is further enclosed inside the angular brackets "<" and ">". Invocation of generic functions must be via the opcodes callinstant and callinstantassigned, which correspond to call and callassigned respectively. Example:

func &swap (var %x <!UU>, var %y <!UU>) void { # &swap is a generic function to swap the contents of its two parameters
  var %z <!UU>
  dassign %z (dread agg %x)
  dassign %x (dread agg %y)
  dassign %y (dread agg %z)
  return
}

Generics are covered too.


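That covers the IR part. Since the document is in English, I went through it in detail.

Huawei also describes how its reference counting works.

It works well for educational purposes, for computer science students.

The overall data structures and algorithms are fairly basic, hence the name Naive RC.

Reference counting (RC) is a memory management technique in programming languages: the number of references to a resource (an object, memory, disk space, and so on) is stored, and the resource is released when that count drops to zero. Reference counting makes automatic resource management possible. The term can also refer to the garbage collection algorithm that reclaims unused resources by this technique. Naive RC is a simple, direct way of inserting RC operations.
  • Before insertion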
class A {
 static Object static_field;
 Object instance_field;
 A() {
   static_field = new Object();
 }
}
Object foo(){
 A a = new A();
 bar(a, new Object());
 return a.instance_field;
}
void bar(A a, Object o) {
 a.instance_field = o;
}
  • After insertion
class A {
  A() {
   local_var t = new Object(); // t is a temporary used while assigning to static_field
   old = static_field;
   static_field = t;
   IncRef(t); DecRef(old);  // update the RC held on the heap
   DecRef(t); // release the stack RC on function exit
  }
}
Object foo(){
  A a = new A();
  bar(a, new Object());
  local_var t = a.instance_field;
  IncRef(t); // stack variable reference, RC+1
  IncRef(t); // function return, return value RC+1
  DecRef(a); // release the stack RC on function exit, freeing a
  DecRef(t); // release the stack RC on function exit
  return t;
}
void bar(A a, Object o) {
  old = a.instance_field;
  a.instance_field = o;
  IncRef(o); DecRef(old);
}
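
The IncRef/DecRef bookkeeping above can be tried out with a small hand-rolled model. The following Java sketch is purely illustrative: Java itself uses a tracing garbage collector, so the counts are tracked manually, and RcObject, NaiveRcSketch, writeField and freedLog are hypothetical names of my own, not part of the Ark Compiler runtime. It also assumes that a newly created object starts with a count of 1 for the local variable that holds it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an object with a manually tracked reference count.
final class RcObject {
    final String name;
    int rc = 1;              // assumption: the creating local variable holds one reference
    RcObject field;          // a single reference field, like instance_field above
    boolean freed = false;
    RcObject(String name) { this.name = name; }
}

final class NaiveRcSketch {
    static final List<String> freedLog = new ArrayList<>();  // order in which objects were released

    static void incRef(RcObject o) {
        if (o != null) {
            o.rc++;
        }
    }

    static void decRef(RcObject o) {
        if (o == null) {
            return;
        }
        o.rc--;
        if (o.rc == 0) {
            o.freed = true;
            freedLog.add(o.name);
            decRef(o.field);  // releasing an object drops the references it holds
        }
    }

    // Mirrors "a.instance_field = o" with the RC updates inserted around the store.
    static void writeField(RcObject a, RcObject o) {
        RcObject old = a.field;
        a.field = o;
        incRef(o);
        decRef(old);
    }

    public static void main(String[] args) {
        RcObject a = new RcObject("a");
        RcObject o = new RcObject("o");
        writeField(a, o);   // o.rc is now 2: one local reference, one from a.field
        decRef(o);          // the local variable o goes out of scope, o.rc is 1
        decRef(a);          // a is released, which in turn releases o
        System.out.println(freedLog);
    }
}
```

One known limitation of plain reference counting, which this sketch shares, is that objects referring to each other in a cycle never reach a count of zero.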

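Next is the virtual method table part. I think both Java and C++ programmers should take a look; isn't this a staple of interview questions?

The Ark Compiler generates a virtual method table for every class. The table stores the parent class's virtual methods, plus the subclass's own virtual methods and the default methods of the interfaces it implements. If the subclass overrides a parent method, the entry at the same position in the table is replaced by the subclass's method.
Here is a concrete example: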
class A {
  public int first() {
    return 0;
  }
}

class B extends A {
  public void foo() {
  }
  public int first() {
    return 1;
  }
}

class C extends A {
  public void bar() {
  }
  public int first() {
    return 2;
  }
}

public class IsEmpty {
  public static void main(String [] args) {
    A x = new B();
    x.first();
    A y = new C();
    y.first();
  }

  public void add(A x) {
    x.first();
  }
}

The virtual method tables generated by the Ark Compiler are structured as follows:

A:

__vtb_LA_3B:
        .quad   Ljava_2Flang_2FObject_3B_7Cclone_7C_28_29Ljava_2Flang_2FObject_3B - .
        .quad   Ljava_2Flang_2FObject_3B_7Cequals_7C_28Ljava_2Flang_2FObject_3B_29Z - .
        .quad   Ljava_2Flang_2FObject_3B_7Cfinalize_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7CgetClass_7C_28_29Ljava_2Flang_2FClass_3B - .
        .quad   Ljava_2Flang_2FObject_3B_7ChashCode_7C_28_29I - .
        .quad   Ljava_2Flang_2FObject_3B_7Cnotify_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7CnotifyAll_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7CtoString_7C_28_29Ljava_2Flang_2FString_3B - .
        .quad   Ljava_2Flang_2FObject_3B_7Cwait_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7Cwait_7C_28J_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7Cwait_7C_28JI_29V - .
        .quad   LA_3B_7Cfirst_7C_28_29I - .

B:

__vtb_LB_3B:
        .quad   Ljava_2Flang_2FObject_3B_7Cclone_7C_28_29Ljava_2Flang_2FObject_3B - .
        .quad   Ljava_2Flang_2FObject_3B_7Cequals_7C_28Ljava_2Flang_2FObject_3B_29Z - .
        .quad   Ljava_2Flang_2FObject_3B_7Cfinalize_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7CgetClass_7C_28_29Ljava_2Flang_2FClass_3B - .
        .quad   Ljava_2Flang_2FObject_3B_7ChashCode_7C_28_29I - .
        .quad   Ljava_2Flang_2FObject_3B_7Cnotify_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7CnotifyAll_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7CtoString_7C_28_29Ljava_2Flang_2FString_3B - .
        .quad   Ljava_2Flang_2FObject_3B_7Cwait_7C_28_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7Cwait_7C_28J_29V - .
        .quad   Ljava_2Flang_2FObject_3B_7Cwait_7C_28JI_29V - .
        .quad   LB_3B_7Cfirst_7C_28_29I - .
        .quad   LB_3B_7Cfoo_7C_28_29V - .

C:

__vtb_LC_3B:
The first 11 entries are the same as in A and B
       … …
       .quad   LC_3B_7Cfirst_7C_28_29I - .
       .quad   LC_3B_7Cbar_7C_28_29V - .
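
The way these tables come about (copy the parent's slots, overwrite overridden methods in place, append new ones) can be sketched in Java. This is my own simplified model, not the Ark Compiler's code: VtableSketch, the "Owner|method" string encoding, and the single stand-in slot for the eleven java.lang.Object methods are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of vtable construction; each slot is a string "Owner|method".
final class VtableSketch {
    // Returns the method part of a slot, i.e. everything after the first '|'.
    private static String methodOf(String slot) {
        return slot.substring(slot.indexOf('|') + 1);
    }

    static List<String> build(List<String> parentTable, String owner, List<String> declared) {
        List<String> table = new ArrayList<>(parentTable);   // start from the parent's layout
        for (String method : declared) {
            int slot = -1;
            for (int i = 0; i < table.size(); i++) {
                if (methodOf(table.get(i)).equals(method)) {
                    slot = i;
                    break;
                }
            }
            if (slot >= 0) {
                table.set(slot, owner + "|" + method);       // override: same slot, new target
            } else {
                table.add(owner + "|" + method);             // new method: new slot at the end
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> object = Arrays.asList("Object|toString()"); // stand-in for the Object slots
        List<String> a = build(object, "A", Arrays.asList("first()"));
        List<String> b = build(a, "B", Arrays.asList("foo()", "first()"));
        System.out.println(a);
        System.out.println(b);
    }
}
```

Because first() keeps the same slot index in A and in B, a call through a variable of type A can load the target from a fixed offset without knowing the dynamic type in advance.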

Ark Compiler: making Java virtual method calls static

Interface method table

Making Java interface method calls static


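During program execution, we perform the following steps:
Determine which class the object (obj) is an instance of; here it is an instance of class A;
Look up the hash value in the first-level table. If an entry exists, return the function pointer; if the slot is 0, search the second-level table. In the second-level table, search by the hash of the function signature; if found, return the function pointer, otherwise search by function name;
Make an indirect call through the function pointer, passing the arguments (args) to it.
Here is a concrete example:
The IsEmpty class implements interfaces A and B, each of which declares two methods.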
interface A{
  public int add();
  public int minus();
}

interface B{
  public int mult();
  public int div();
}

public class IsEmpty implements A, B {
    public static void main(String[]args) {
    }

    public void test(B x) {
      x.mult();
    }

    public int add() {
      return 6 + 3;
    }

    public int minus() {
      return 6 - 3;
    }

    public int mult() {
      return 6 * 3;
    }

    public int div() {
      return 6 / 3;
    }
}

First, let's see what the itable of IsEmpty looks like in Maple code:

var $__itb_LIsEmpty_3B fstatic <[24] <* void>> = [0, 0, 0, 0, 0, 0, 0, 0, addroffunc ptr &LIsEmpty_3B_7Cdiv_7C_28_29I, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, addroffunc ptr &LIsEmpty_3B_7Cadd_7C_28_29I, 0, 0, addrof ptr $__itabC_LIsEmpty_3B]

var $__itbC_LIsEmpty_3B fstatic <[6] <* void>> = [2, 1, 0xb97, addroffunc ptr &LIsEmpty_3B_7Cmult_7C_28_29I, 0x1f7f, addroffunc ptr &LIsEmpty_3B_7Cminus_7C_28_29I]

The corresponding assembly structure:

__itb_LIsEmpty_3B:
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   LIsEmpty_3B_7Cdiv_7C_28_29I - .
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   0
        .quad   LIsEmpty_3B_7Cadd_7C_28_29I - .
        .quad   0
        .quad   0
        .quad   __itabC_LIsEmpty_3B - .
__itbC_LIsEmpty_3B:
        .quad   2
        .quad   1
        .quad   2967
        .quad   LIsEmpty_3B_7Cmult_7C_28_29I - .
        .quad   8063
        .quad   LIsEmpty_3B_7Cminus_7C_28_29I - .
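The table entries are as follows:
The first-level table (__itb_LIsEmpty_3B) has 24 entries in total (indices 0 to 23). The entries at index 8 and index 20 are function addresses, and the entry at index 23 is the address of the second-level table. This shows that collisions occurred in the first-level table, so the second-level table is needed to resolve the specific function addresses;
In the second-level table, the first entry is 2, meaning there are 2 non-conflicting functions; the second entry is 1, which serves as alignment padding; the following 4 entries are the hash values derived from the function signatures and the corresponding function addresses.
Next, in this example the test function in the source produces an interface call. The corresponding Maple code is: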
if (eq u1 u64 (regread u64 %4, constval u64 0)) {
  callassigned &MCC_getFuncPtrFromItabSecondHash64 (regread ptr %3, constval u64 0xb97, conststr ptr "mult|()I") { regassign u64 %4}
}
icallassigned (regread u64 %4, regread ref %2) {}
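The call logic is as follows:
First check whether the entry at the position given by the hash value in the first-level itable is empty. If it is not empty, use that address directly; if it is empty, call the getFuncPtrFromItabSecondHash64 function.
getFuncPtrFromItabSecondHash64 takes three arguments: the itable address, the hash of the function's basename, and the function's signature. The full call process first finds the corresponding itable address through classinfo and then compares hash values. If the comparison succeeds without a conflict, the correct address is obtained; if there is a conflict, the signature name is compared directly (a string comparison).
The itable accessed here has the same form as the IsEmpty itable entries listed above.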
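
The two-level lookup can be modelled in a few lines of Java. Everything here is a simplified assumption rather than the real runtime: ItableSketch is a hypothetical class, the toy hash (signature length modulo the table size) is chosen only so that a collision is easy to see, and the second level is a plain map keyed by the signature string instead of the hash-then-string comparison described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a two-level interface method table.
final class ItableSketch {
    static final int FIRST_LEVEL_SIZE = 23;
    private final String[] firstLevel = new String[FIRST_LEVEL_SIZE];    // null means "empty slot"
    private final String[] firstLevelSig = new String[FIRST_LEVEL_SIZE]; // signature stored in each slot
    private final boolean[] collided = new boolean[FIRST_LEVEL_SIZE];
    private final Map<String, String> secondLevel = new HashMap<>();     // signature -> target

    // Toy hash, an assumption for illustration only: the signature length modulo the table size.
    static int slot(String signature) {
        return signature.length() % FIRST_LEVEL_SIZE;
    }

    void add(String signature, String target) {
        int s = slot(signature);
        if (!collided[s] && firstLevel[s] == null) {
            firstLevel[s] = target;                  // free slot: the method gets a fast-path entry
            firstLevelSig[s] = signature;
            return;
        }
        if (!collided[s]) {                          // first collision: move the occupant out
            secondLevel.put(firstLevelSig[s], firstLevel[s]);
            firstLevel[s] = null;
            firstLevelSig[s] = null;
            collided[s] = true;
        }
        secondLevel.put(signature, target);
    }

    // Assumes the caller only asks for methods that were added, as a compiler would guarantee.
    String lookup(String signature) {
        String target = firstLevel[slot(signature)];
        if (target != null) {
            return target;                           // first-level hit
        }
        return secondLevel.get(signature);           // empty slot: fall back to the second level
    }

    public static void main(String[] args) {
        ItableSketch itable = new ItableSketch();
        itable.add("add|()I", "IsEmpty.add");
        itable.add("minus|()I", "IsEmpty.minus");
        itable.add("mult|()I", "IsEmpty.mult");
        itable.add("div|()I", "IsEmpty.div");        // same length as "add|()I", so the slots collide
        System.out.println(itable.lookup("mult|()I"));
        System.out.println(itable.lookup("div|()I"));
    }
}
```

With this toy hash it is add and div that collide, whereas in the listing above it is mult and minus that end up in the second-level table; which methods collide depends entirely on the hash function.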



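May the youth of China all shake off the chill and simply go upward, paying no heed to the words of those who have given up on themselves. Those who can act, act; those who can speak, speak. If you have a bit of warmth, give off a bit of light. Even if only like a firefly, you can give off a little light in the darkness, without waiting for the torch.
Since ancient times we have had people who toil quietly, people who fight on stubbornly, people who plead on behalf of the people, and people who give up their lives in search of the truth... Even the so-called "official histories", which amount to genealogies of emperors, generals, and ministers, often cannot hide their radiance. These people are the backbone of China.
Indeed, nobody has ever found flaws or wounds on the flies. Yet a flawed warrior is still a warrior, and a perfect fly is still only a fly.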