Intrinsics and Math Functions

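TVM supports basic arithmetic operations. In many cases, though, more complicated built-in functions are needed, such as exp, which computes the exponential of its argument.

These functions are target dependent and may have different names on different target platforms. This tutorial shows how to invoke these target-specific functions and how to unify the interface through TVM's intrinsic API.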

from __future__ import absolute_import, print_function
import tvm
from tvm import te
import numpy as np

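Direct Declare Extern Math Call

The most straightforward way to call a target-specific function is through the extern function call construct in TVM. The following example uses tvm.tir.call_pure_extern to call __expf, a function that is only available under CUDA.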

n = te.var("n")
a = te.placeholder((n,), name="a")
# Call the CUDA-only fast-math function __expf directly by name.
b = te.compute(a.shape, lambda i: tvm.tir.call_pure_extern("float32", "__expf", a[i]), name="b")
s = te.create_schedule(b.op)
num_thread = 64
# Split the loop into blocks of 64 and bind the two parts to the GPU grid.
bx, tx = s[b].split(b.op.axis[0], factor=num_thread)
s[b].bind(bx, te.thread_axis("blockIdx.x"))
s[b].bind(tx, te.thread_axis("threadIdx.x"))
f = tvm.build(s, [a, b], "cuda", name="myexp")
print(f.imported_modules[0].get_source())

Out:

extern "C" __global__ void myexp_kernel0(float* __restrict__ b, float* __restrict__ a, int n, int stride, int stride1) { ... }
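The split-and-bind schedule above can be hard to picture from the generated kernel alone. The following is a minimal sketch in plain Python (it does not use TVM, and the helper names `split_index` and `launch` are made up for illustration) of how a flat index is divided into a block index and a thread index with a factor of 64, and why a bounds check is needed when n is not a multiple of 64.

```python
def split_index(i, factor=64):
    """Map a flat index to (block, thread), mirroring split(axis, factor=64)."""
    return i // factor, i % factor


def launch(n, factor=64):
    """Emulate the bound loop nest and return the indices that get processed."""
    num_blocks = (n + factor - 1) // factor  # ceil(n / factor) blocks are launched
    visited = []
    for bx in range(num_blocks):  # plays the role of blockIdx.x
        for tx in range(factor):  # plays the role of threadIdx.x
            i = bx * factor + tx
            if i < n:  # guard for the partially filled last block
                visited.append(i)
    return visited


# Every element is visited exactly once, even when n is not a multiple of 64.
print(launch(130) == list(range(130)))
print(split_index(130))
```

This is only a sequential model of the index arithmetic; on a real GPU the two loops run in parallel across blocks and threads.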

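Unified Intrinsic Call

The code above verifies that a direct external call can be used to call device-specific functions. However, that approach only works for the CUDA target with the float32 type. Ideally, we want to write the same code for any device and any data type.

TVM's intrinsic mechanism lets the user achieve this, and it is the recommended way to solve the problem. The following code uses te.exp instead, which creates an intrinsic call, tvm.te.exp(), to compute the exponential.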

n = te.var("n")
a = te.placeholder((n,), name="a")
# te.exp is target independent; TVM picks the device function during lowering.
b = te.compute(a.shape, lambda i: te.exp(a[i]), name="b")
s = te.create_schedule(b.op)
num_thread = 64
bx, tx = s[b].split(b.op.axis[0], factor=num_thread)
s[b].bind(bx, te.thread_axis("blockIdx.x"))
s[b].bind(tx, te.thread_axis("threadIdx.x"))
fcuda = tvm.build(s, [a, b], "cuda", name="myexp")
print(fcuda.imported_modules[0].get_source())

Out:

extern "C" __global__ void myexp_kernel0(float* __restrict__ b, float* __restrict__ a, int n, int stride, int stride1) { ... }

The same code works for both CUDA and OpenCL, and the same te.exp can also be used for the float64 data type.

fopencl = tvm.build(s, [a, b], "opencl", name="myexp")
print(fopencl.imported_modules[0].get_source())

Out:

__kernel void myexp_kernel0(__global float* restrict b, __global float* restrict a, int n, int stride, int stride1) { ... }

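Intrinsic Lowering Rule

When tvm.te.exp() is called, TVM creates an intrinsic call expression. TVM then uses transformation rules to turn the intrinsic call into device-specific extern calls.

TVM also allows users to customize these rules at runtime. The following example customizes the CUDA lowering rule for exp.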

def my_cuda_math_rule(op):
    """Customized CUDA intrinsic lowering rule"""
    assert isinstance(op, tvm.tir.Call)
    name = op.op.name
    assert name.startswith("tir.")
    # Strip the "tir." prefix to get the math function name, e.g. "exp".
    dispatch_name = name[4:]
    if op.dtype == "float32":
        # call float function, e.g. expf
        return tvm.tir.call_pure_extern("float32", "%sf" % dispatch_name, op.args[0])
    elif op.dtype == "float64":
        # call double function, e.g. exp
        return tvm.tir.call_pure_extern("float64", dispatch_name, op.args[0])
    else:
        # cannot do translation, return self.
        return op

tvm.target.register_intrin_rule("cuda", "exp", my_cuda_math_rule, override=True)

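Registering the rule with the override option replaces the existing rule. Note the difference between the code printed now and the code printed earlier: the new rule uses the math function expf instead of the fast-math version __expf.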

fcuda = tvm.build(s, [a, b], "cuda", name="myexp")
print(fcuda.imported_modules[0].get_source())

Out:

extern "C" __global__ void myexp_kernel0(float* __restrict__ b, float* __restrict__ a, int n, int stride, int stride1) { ... }
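To make the registration and override behavior described in this section more concrete, here is a toy model in plain Python. It is not TVM's implementation: the registry `_RULES` and the functions `register_rule`, `lower`, `fast_rule`, and `precise_rule` are hypothetical names invented for this sketch, and rules here return only a function name rather than a call expression.

```python
_RULES = {}  # hypothetical registry: (target, intrinsic name) -> rule function


def register_rule(target, name, rule, override=False):
    """Register a lowering rule; replacing an existing one requires override=True."""
    key = (target, name)
    if key in _RULES and not override:
        raise ValueError("rule for %s/%s already registered" % key)
    _RULES[key] = rule


def lower(target, name, dtype):
    """Return the device function name that the intrinsic lowers to."""
    rule = _RULES.get((target, name))
    if rule is None:
        return name  # no rule for this target: keep the plain name
    return rule(name, dtype)


def fast_rule(name, dtype):
    """Stand-in for a default rule that prefers the fast-math float function."""
    return "__%sf" % name if dtype == "float32" else name


def precise_rule(name, dtype):
    """Stand-in for the customized rule: expf for float32, exp for float64."""
    return name + "f" if dtype == "float32" else name


register_rule("cuda", "exp", fast_rule)
print(lower("cuda", "exp", "float32"))  # uses the fast-math name
register_rule("cuda", "exp", precise_rule, override=True)
print(lower("cuda", "exp", "float32"))  # now uses the precise name
```

The point of the sketch is the lookup key: a rule is selected by target and intrinsic name, and the rule itself decides what to emit based on the data type, which is why one te.exp expression can serve several devices and types.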

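Add Your Own Intrinsic

If an intrinsic is not provided by TVM, users can easily add it by using the intrinsic rule system. The following example adds an intrinsic named mylog to the system.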

def mylog(x):
    """Customized log intrinsic function"""
    return tvm.tir.call_intrin(x.dtype, "tir.mylog", x)

def my_cuda_mylog_rule(op):
    """CUDA lowering rule for log"""
    if op.dtype == "float32":
        return tvm.tir.call_pure_extern("float32", "logf", op.args[0])
    elif op.dtype == "float64":
        return tvm.tir.call_pure_extern("float64", "log", op.args[0])
    else:
        return op

# Declare the new op as side-effect free, then register its CUDA lowering rule.
tvm.ir.register_op_attr("tir.mylog", "TCallEffectKind", tvm.tir.CallEffectKind.Pure)
tvm.target.register_intrin_rule("cuda", "mylog", my_cuda_mylog_rule, override=True)
