Taichi in a Nutshell
- A domain-specific language (DSL) embedded in Python for high-performance parallel computing
- Just-in-time (JIT) compilation
- Automatically parallelizes the outermost for loops in a kernel
- Supports multiple backends (CPUs, CUDA, OpenGL, Metal, ...)
- Supports ahead-of-time (AOT) compilation
Hello, World!
1. Install Taichi:

```bash
pip install taichi
```

2. Verify the installation by launching the Taichi gallery:

```bash
ti gallery
```
3. Write your first Taichi program:

```python
import taichi as ti

ti.init(arch=ti.cpu)  # A backend can be either ti.cpu or ti.gpu
# When ti.gpu is specified, Taichi moves down the backend list:
# ti.cuda, ti.vulkan, and ti.opengl/ti.metal
```
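The snippet above only initializes the runtime; a minimal complete first program, sketched here for illustration (the field `x` and kernel `fill_squares` are not part of the original card), could look like:

```python
import taichi as ti

ti.init(arch=ti.cpu)

n = 16
x = ti.field(ti.f32, shape=n)

@ti.kernel
def fill_squares():
    for i in x:  # The outermost loop is automatically parallelized
        x[i] = i * i

fill_squares()
print(x.to_numpy())
```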
Data types
Primitive data types: `i8`, `i16`, `i32`, `i64`, `u8`, `u16`, `u32`, `u64`, `f16`, `f32`, `f64`

- i: signed integer; u: unsigned integer; f: floating-point number
- The number following i/u/f stands for precision bits
Change the default types:

```python
# Default integer type: ti.i32; default floating-point type: ti.f32
ti.init(default_ip=ti.i64)  # Sets the default integer type to ti.i64
ti.init(default_fp=ti.f64)  # Sets the default floating-point type to ti.f64
```
Explicit type casting:

```python
# Use ti.cast():
a = 3.14
b = ti.cast(a, ti.i32)  # 3
c = ti.cast(b, ti.f32)  # 3.0

# Use primitive types to convert a scalar variable to a different scalar type:
a = 3.14
x = int(a)     # 3
y = float(a)   # 3.14
z = ti.i32(a)  # 3
w = ti.f64(a)  # 3.14
```
Implicit type casting:

- Integer + floating point -> floating point
- Lower-precision bits + higher-precision bits -> higher-precision bits
- Signed integer + unsigned integer -> unsigned integer
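A short sketch of these rules in action (the kernel and variable names are illustrative):

```python
import taichi as ti

ti.init(arch=ti.cpu)

@ti.kernel
def implicit_casts():
    a = 1 + 3.14                # int + float -> float
    b = ti.i16(1) + ti.i32(2)   # lower precision + higher precision -> i32
    c = ti.i32(1) + ti.u32(2)   # signed + unsigned -> u32
    print(a, b, c)

implicit_casts()
```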
Compound data types:

Vectors and matrices:

```python
vec4d = ti.types.vector(4, ti.f64)    # A 64-bit floating-point 4D vector type
mat4x3i = ti.types.matrix(4, 3, int)  # A 4x3 integer matrix type
v = vec4d(1, 2, 3, 4)                 # Creates a vector instance: v = [1.0 2.0 3.0 4.0]
```
Structs:

```python
# Defines a compound type vec3 to represent a sphere's center
vec3 = ti.types.vector(3, float)
# Defines a compound type sphere_type to represent a sphere
sphere_type = ti.types.struct(center=vec3, radius=float)
sphere = sphere_type(center=vec3(0), radius=1.0)
```
Quantized/low-precision data types:

```python
# Defines a 5-bit unsigned integer type
u5 = ti.types.quant.int(bits=5, signed=False)
# Defines a 10-bit signed fixed-point type within the range [-20.0, 20.0]
fixed_type_a = ti.types.quant.fixed(bits=10, max_value=20.0)
# Defines a 15-bit unsigned floating-point type with six exponent bits
float_type_b = ti.types.quant.float(exp=6, frac=9, signed=False)
```
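Quantized types take effect once they are placed into a bit-packed field; a sketch assuming the `ti.BitpackedFields` API from recent Taichi releases:

```python
import taichi as ti

ti.init(arch=ti.cpu)

u5 = ti.types.quant.int(bits=5, signed=False)
x = ti.field(dtype=u5)  # Declares a field of the quantized type, without a shape

bitpack = ti.BitpackedFields(max_num_bits=32)  # Packs quantized fields into 32-bit words
bitpack.place(x)
ti.root.dense(ti.i, 8).place(bitpack)          # Places the packed field in the SNode tree
```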
Sparse matrix (pending)
Data container
Field (global data container):

Declare:

```python
# Declares a scalar field
scalar_field = ti.field(int, shape=(640, 480))
# Declares a vector field
vector_field = ti.Vector.field(n=2, dtype=float, shape=(1, 2, 3))
# Declares a matrix field
matrix_field = ti.Matrix.field(n=3, m=2, dtype=float, shape=(300, 400, 500))
```
Index:

```python
f_0d = ti.field(float, shape=())
f_0d[None] = 1.0  # Accesses the element in a 0D field

f_1d = ti.field(int, shape=10)
f_1d[5] = 1

f_2d = ti.field(int, shape=(10, 10))
f_2d[1, 2] = 255

f_3d = ti.Vector.field(3, float, shape=(10, 10, 10))
f_3d[3, 3, 3] = 1, 2, 3
f_3d[3, 3, 3][0] = 1  # Accesses the first component of the vector element
```
Interact with external arrays:

```python
x = ti.field(ti.f32, 4)
x_np = x.to_numpy()     # Exports data in Taichi fields to NumPy arrays
x.from_numpy(x_np)      # Imports data from NumPy arrays to Taichi fields
x_torch = x.to_torch()  # Exports data in Taichi fields to PyTorch tensors
x.from_torch(torch.tensor([1, 7, 3, 5]))  # Imports data from PyTorch tensors to Taichi fields

@ti.kernel
def numpy_as_ndarray(arr: ti.types.ndarray()):  # Passes a NumPy ndarray to a kernel
    for i in ti.ndrange(arr.shape[0]):
        ...
```
Ndarray: A multidimensional container of elements of the same type and size.

```python
N = 128  # Number of elements (example value)
pos = ti.Vector.ndarray(2, ti.f32, N)
vel = ti.Vector.ndarray(2, ti.f32, N)
force = ti.Vector.ndarray(2, ti.f32, N)
```
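Ndarrays can then be passed to kernels via `ti.types.ndarray()` annotations; a minimal sketch (the kernel `advance` and its update rule are illustrative):

```python
import taichi as ti

ti.init(arch=ti.cpu)

N = 8
pos = ti.Vector.ndarray(2, ti.f32, N)
vel = ti.Vector.ndarray(2, ti.f32, N)

@ti.kernel
def advance(pos: ti.types.ndarray(), vel: ti.types.ndarray(), dt: ti.f32):
    for i in range(pos.shape[0]):  # Parallelized over all elements
        pos[i] += vel[i] * dt

advance(pos, vel, 0.01)
```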
Kernels and functions
Kernel: An entry point where Taichi's runtime takes over computation tasks. The outermost for loops in a kernel are automatically parallelized.

Taichi function: A building block of kernels. You can split your tasks into multiple Taichi functions to improve readability and reuse them in different kernels. (A short sketch follows the comparison table below.)
Taichi kernel vs. Taichi function:

|  | Taichi kernel | Taichi function |
| --- | --- | --- |
| Decorated with | `@ti.kernel` | `@ti.func` |
| Called from | Python scope | Taichi scope |
| Type hints for arguments | Required | Recommended |
| Type hints for return values | Required | Recommended |
| Return type | Scalar / `ti.Vector` / `ti.Matrix` | Scalar / `ti.Vector` / `ti.Matrix` / `ti.Struct` / ... |
| Max. number of elements in arguments | 32 (OpenGL), 64 (others) | Unlimited |
| Max. number of return values | 1 | Unlimited |
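A short sketch of the split between the two (the names `square` and `sum_of_squares` are illustrative):

```python
import taichi as ti

ti.init(arch=ti.cpu)

@ti.func  # Callable only from the Taichi scope
def square(x: ti.f32) -> ti.f32:
    return x * x

@ti.kernel  # Callable from the Python scope; type hints are required
def sum_of_squares(n: ti.i32) -> ti.f32:
    total = 0.0
    for i in range(n):  # Automatically parallelized; += becomes an atomic add
        total += square(float(i))
    return total

print(sum_of_squares(10))  # 285.0
```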
Visualization
GUI system:

```python
gui = ti.GUI('Window Title', (640, 360))  # Creates a window
while not gui.get_event(ti.GUI.ESCAPE, ti.GUI.EXIT):
    gui.show()  # Displays the window
```
GGUI system:

```python
pixels = ti.Vector.field(3, float, (640, 480))
window = ti.ui.Window("Window Title", (640, 360))  # Creates a window
canvas = window.get_canvas()                       # Creates a canvas
while window.running:
    canvas.set_image(pixels)
    window.show()
```
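In practice a kernel fills `pixels` each frame; a hypothetical sketch (the `paint` kernel is not from the original card), assuming a system with Vulkan available for GGUI rendering:

```python
import taichi as ti

ti.init(arch=ti.gpu)  # GGUI rendering relies on Vulkan being available

pixels = ti.Vector.field(3, float, (640, 480))

@ti.kernel
def paint(t: ti.f32):
    for i, j in pixels:  # Parallelized over all pixels
        pixels[i, j] = ti.Vector([i / 640, j / 480, 0.5 + 0.5 * ti.sin(t)])

window = ti.ui.Window("Window Title", (640, 360))
canvas = window.get_canvas()
t = 0.0
while window.running:
    paint(t)
    canvas.set_image(pixels)
    window.show()
    t += 0.03
```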
Data-oriented programming
Data-oriented class: Used when your data is actively updated in the Python scope (such as current time and user input events) and tracked in Taichi kernels.

```python
@ti.data_oriented  # Decorates the class with @ti.data_oriented
class TiArray:
    def __init__(self, n):
        self.x = ti.field(dtype=ti.i32, shape=n)

    @ti.kernel  # Defines Taichi kernels inside the data-oriented Python class
    def inc(self):
        for i in self.x:
            self.x[i] += 1

a = TiArray(32)
a.inc()
```
Taichi dataclass: A wrapper of `ti.types.struct`. You can define Taichi functions as its methods and call these methods in the Taichi scope.

```python
import math

vec3 = ti.types.vector(3, float)  # Same vec3 as in the struct example above

@ti.dataclass
class Sphere:
    center: vec3
    radius: float

    @ti.func
    def area(self):  # Defines a Taichi function as a method
        return 4 * math.pi * self.radius ** 2

@ti.kernel
def test():
    sphere = Sphere(vec3(0), radius=1.0)
    print(sphere.area())
```
Math
Import Taichi's math module:

```python
import taichi.math as tm
```

The module supports the following:
Mathematical functions:

```python
# Calls mathematical functions in the Taichi scope
@ti.kernel
def test():
    a = tm.vec3(1, 2, 3)  # A function can take vectors and matrices
    # Element-wise operations
    x = tm.sin(a)      # [0.841471, 0.909297, 0.141120]
    y = tm.floor(a)    # [1.000000, 2.000000, 3.000000]
    z = tm.degrees(a)  # [57.295780, 114.591560, 171.887344]
```
Small vector and matrix types:

- vec2/vec3/vec4: 2D/3D/4D floating-point vector types
- ivec2/ivec3/ivec4: 2D/3D/4D integer vector types
- uvec2/uvec3/uvec4: 2D/3D/4D unsigned integer vector types
- mat2/mat3/mat4: 2D/3D/4D floating-point square matrix types
GLSL-standard functions:

```python
@ti.kernel
def example():
    # Takes vectors and matrices as arguments and operates on them element-wise
    v = tm.vec3(0, 1, 2)
    w = tm.smoothstep(0, 1, v)
    w = tm.clamp(w, 0.2, 0.8)
    w = tm.reflect(v, tm.normalize(tm.vec3(1)))
```
Complex number operations in the form of 2D vectors:

```python
@ti.kernel
def test():
    x = tm.vec2(1, 1)  # Complex number 1+1j
    y = tm.vec2(0, 1)  # Complex number 1j
    z = tm.cmul(x, y)  # vec2(-1, 1) = -1+1j
    w = tm.cdiv(x, y)  # vec2(1, -1) = 1-1j
```
Commonly used functions:

`tm.acos(x)`, `tm.asin(x)`, `tm.atan2(y, x)`, `tm.ceil(x)`, `tm.clamp(x, xmin, xmax)`, `tm.cos(x)`, `tm.cross(x, y)`, `tm.dot(x, y)`, `tm.exp(x)`, `tm.floor(x)`, `tm.fract(x)`, `tm.inverse(mat)`, `tm.length(x)`, `tm.log(x)`, `tm.max(x, y, ...)`, `tm.min(x, y, ...)`, `tm.mix(x, y, a)`, `tm.mod(x, y)`, `tm.normalize(x)`, `tm.pow(x, a)`, `tm.round(x)`, `tm.sign(x)`, `tm.sin(x)`, `tm.smoothstep(e0, e1, x)`, `tm.sqrt(x)`, `tm.step(edge, x)`, `tm.tan(x)`, `tm.tanh(x)`, `tm.degrees(x)`, `tm.radians(x)`
Performance
Profiling:

scoped_profiler (default):

```python
# Analyzes the performance of the JIT compiler
ti.profiler.print_scoped_profiler_info()
```

kernel_profiler:

```python
# Analyzes the performance of Taichi kernels
ti.init(ti.cpu, kernel_profiler=True)     # Enables the profiler
ti.profiler.print_kernel_profiler_info()  # Displays the results
```
Tuning:

`ti.loop_config()`: Configures the outermost for loop that immediately follows it

```python
ti.loop_config(serialize=True)  # Serializes the loop
ti.loop_config(parallelize=8)   # Uses 8 threads on the CPU backend
ti.loop_config(block_dim=16)    # Uses 16 threads in each block on the GPU backend
```
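Each `ti.loop_config()` call applies only to the loop that directly follows it inside a kernel; an illustrative sketch:

```python
import taichi as ti

ti.init(arch=ti.cpu)

x = ti.field(ti.i32, shape=16)

@ti.kernel
def fill():
    ti.loop_config(serialize=True)  # Affects only the loop directly below
    for i in range(16):             # This loop runs serially
        x[i] = i
    for j in range(16):             # This loop is still parallelized
        x[j] += 1

fill()
```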
Offline cache (default): Saves the compilation cache on disk for future runs

```python
ti.init(offline_cache=True)
```
Debugging
Activate debug mode:

```python
ti.init(arch=ti.cpu, debug=True)
```

Conciser tracebacks:

```python
import sys
sys.tracebacklimit = 0
```
Runtime print in the Taichi scope:

```python
@ti.kernel
def inside_taichi_scope():
    x = 256
    print('hello', x)  # => hello 256
```

Serial execution:

```python
# Serializes the program
ti.init(arch=ti.cpu, cpu_max_num_threads=1)
# Serializes the for loop that immediately follows this line
ti.loop_config(serialize=True)
```
Compile-time ti.static_print:

```python
x = ti.field(ti.f32, (2, 3))
y = 1

@ti.kernel
def inside_taichi_scope():
    ti.static_print(y)        # => 1
    ti.static_print(x.shape)  # => (2, 3)
    ti.static_print(x.dtype)  # => DataType.float32
```

Runtime assert in the Taichi scope:

```python
# Activate debug mode before using assert statements in the Taichi scope
ti.init(arch=ti.cpu, debug=True)
x = ti.field(ti.f32, 128)

@ti.kernel
def do_sqrt_all():
    for i in x:
        assert x[i] >= 0
        x[i] = ti.sqrt(x[i])
```
Compile-time ti.static_assert:

```python
@ti.func
def copy(dst: ti.template(), src: ti.template()):
    ti.static_assert(dst.shape == src.shape,
                     "copy() needs src and dst fields to be the same shape")
    for I in ti.grouped(src):
        dst[I] = src[I]
```