AMD RunCL#

RunCL is a command-line tool to build, execute, and debug OpenCL programs, with a simple, easy-to-use interface.

RunCL Usage#

Usage: runcl [platform-options] [-I<include-dir>] [[-D<name>=<value>] ...]
             <kernel.[cl|elf]> [kernel-arguments] 
             <arguments> <num_work_items>[/<work_group_size>]

   [platform-options]
       -v                    verbose
       -gpu                  use GPU device (default)
       -cpu                  use CPU device
       -device <name>|#<num> use specified device
       -bo <string>          OpenCL build option

   [kernel-options]
       -k <kernel-name>      kernel name
       -p                    use persistence flag
       -r[link] <exec-count> execution count
       -w <msec>             waiting time
       -dumpcl               dump OpenCL code after pre-processing
       -dumpilisa            dump ISA of kernel and show ISA statistics
       -dumpelf              dump ELF binary

   The <arguments> shall be given in the order as required by the kernel.
     For value arguments use   
         iv#<int/float>[,<int/float>...] or 
         iv:<file> (e.g., iv#10.2,10,0x10)
     For local memory use      
         lm#<local-memory-size> (e.g., lm#8192)
     For input buffer use      
         if[#<buffer-size>]:[<file>][#[[<checksum>][/<file>[@<offset>#<end>]]]]
         (e.g., if:input.bin)
     For output (or RW) buffer 
         of[#<buffer-size>]:[#]<file>[@<ofile>][#[[<checksum>][/[+<float-tolerance>]<file>[@<offset>#<end>]]]] 
         (e.g., of#16384:output.bin)
     For input image  use      
         ii#<width>x<height>,<stride>,<u8/s16/u16/bgra/rgba/argb>:<file> 
         (e.g., ii#1920x1080,7680,bgra:screen1920x1080.rgb)
     For output image  use     
         oi#<width>x<height>,<stride>,<u8/s16/u16/bgra/rgba/argb>:<file> 
         (e.g., oi#1920x1080,7680,bgra:screen1920x1080.rgb

Example#

% cat subtract.cl
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void subtract(
    __global float * a, 
    __global float * b, 
    __global float * c, 
    uint count)
{
    uint id = get_global_id(0);
    if(id < count) {
        c[id] = a[id] - b[id];
    }
}
% runcl subtract.cl if#4000:a.f32 if#4000:b.f32 of#4000:#out.f32 iv#1000 1024,1,1/64,1,1
OK: Using GPU device#0 [...]
OK: COMPILATION on GPU took   0.1268 sec for subtract
OK: kernel subtract info reqd_work_group_size(64,1,1)
OK: kernel subtract info work_group_size(256)
OK: kernel subtract info local_mem_size(0)
OK: kernel subtract info local_private_size(0)
OK: RUN SUCCESSFUL on GPU work:{1024,1,1}/{64,1,1} [  0.00025 sec/exec] subtract (1st execution)