[XRT Vitis-Tutorials] Testing Mixed C++/RTL Kernel Programming

1 Preface
Earlier articles in this series:
ZCU106 XRT Environment Setup
ZCU106 XRT Vivado Project Analysis
ZCU106 XRT PetaLinux Project Analysis
[XRT Vitis-Tutorials] RTL Kernels Test

Official documentation:
2019.2 Vitis™ Application Acceleration Development Flow Tutorials
Vitis Unified Software Platform Documentation Application Acceleration Development
Vitis Unified Software Platform Documentation Embedded Software Development

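2 Creating the Vitis Project

This article tests the second example in the Tutorials: Mixing C++ and RTL Kernels.

The example itself runs two experiments, sw_emu and hw_emu. As in the previous article, I will also test directly on hardware.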

This example covers the following tests:

  • sw_emu emulation, to test the C++ kernel
  • hw_emu emulation, to test Mixing C++ and RTL Kernels
  • A run on real hardware, to verify the hardware acceleration on the board
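The three run modes above are normally selected through the XCL_EMULATION_MODE environment variable, which is set to sw_emu or hw_emu for emulation and left unset for a hardware run. The sketch below shows one way a host program could report which mode it is in. The helper names describe_run_mode and current_run_mode are hypothetical and do not come from the tutorial.

```cpp
#include <cstdlib>
#include <string>

// Map the value of XCL_EMULATION_MODE to a readable run mode.
// A null pointer means the variable is unset, which is treated as a hardware run.
// (Hypothetical helper, not part of the tutorial's host code.)
std::string describe_run_mode(const char* emu_mode) {
    if (emu_mode == nullptr) return "hardware";
    const std::string mode(emu_mode);
    if (mode == "sw_emu") return "software emulation";
    if (mode == "hw_emu") return "hardware emulation";
    return "unknown (" + mode + ")";
}

// Convenience wrapper that reads the variable from the process environment.
std::string current_run_mode() {
    return describe_run_mode(std::getenv("XCL_EMULATION_MODE"));
}
```

Printing current_run_mode() at the start of the host program makes it harder to mix up an emulation log with a log from the board.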

    2.1 Project Creation

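    The tutorial creates and builds the Vitis project with the script run_sprite_mixing_c_rtl_kernels.sh. For convenience, I will use the GUI instead.

    2.1.1 New Project
    In Vitis, create a new Application Project and select zcu106vcu_base as the platform.

    2.1.2 Adding the Source Code
    Add everything that needs to be compiled directly to the src directory:

  • C++ kernel: krnl_vadd.cpp
  • RTL kernel: rtl_kernel_wizard_0.xo
  • Host app: host_step2.cpp (host_step2 can be used as-is to test the mixed-kernel functionality)

    Next, set the build configuration to Hardware and add the C++ and RTL kernels to Hardware Functions so that they are accelerated.

    The final project directory structure is shown in the figure below: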

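    2.1.3 Code Analysis
    host_step2.cpp
    The host code in this example does the following:

  • First, the C++ kernel (krnl_vector_add) accelerates c=a+b in hardware
  • Then, the RTL kernel (krnl_const_add) accelerates d=c+1 in hardware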
    //set the kernel Arguments
    krnl_vector_add.setArg(0,buffer_a);
    krnl_vector_add.setArg(1,buffer_b);
    krnl_vector_add.setArg(2,buffer_result);
    krnl_vector_add.setArg(3,DATA_SIZE);
    krnl_const_add.setArg(0,buffer_result);
    //Launch the Kernel
    q.enqueueTask(krnl_vector_add);
    q.enqueueTask(krnl_const_add);
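Because krnl_const_add receives only buffer_result, it appears to update that buffer in place, so after both kernels have run each element should equal a + b + 1. The sketch below is a host-side software reference for checking that. The names expected_result and matches_reference are hypothetical, and this is not the verification code from host_step2.cpp.

```cpp
#include <cstddef>
#include <vector>

// Expected output of the two-kernel chain: (a[i] + b[i]) + 1.
// Assumes a and b have the same length. (Hypothetical helper.)
std::vector<int> expected_result(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    std::vector<int> expected(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        expected[i] = a[i] + b[i] + 1;
    }
    return expected;
}

// Compare data read back from the device with the software reference.
bool matches_reference(const std::vector<int>& a,
                       const std::vector<int>& b,
                       const std::vector<int>& device_result) {
    if (a.size() != b.size()) return false;  // inputs must line up
    return expected_result(a, b) == device_result;
}
```

A check of this kind is what would justify a message such as "TEST WITH TWO KERNELS PASSED" once the result buffer has been read back.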

    C++ Kernel

    The source of the C++ kernel is as follows:
    //------------------------------------------------------------------------------
    //
    // kernel: vadd
    //
    // Purpose: Demonstrate Vector Add in OpenCL
    //

    #define BUFFER_SIZE 256
    extern "C" {

    void krnl_vadd(
        int* a,
        int* b,
        int* c,
        const int n_elements)
    {

    #pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=a max_read_burst_length = 256
    #pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem port=b max_read_burst_length = 256
    #pragma HLS INTERFACE m_axi offset=SLAVE bundle=gmem1 port=c max_write_burst_length = 256

    #pragma HLS INTERFACE s_axilite port=a bundle=control
    #pragma HLS INTERFACE s_axilite port=b bundle=control
    #pragma HLS INTERFACE s_axilite port=c bundle=control

    #pragma HLS INTERFACE s_axilite port=n_elements bundle=control
    #pragma HLS INTERFACE s_axilite port=return bundle=control

        // Local buffers holding one chunk of each input
        int arrayA[BUFFER_SIZE];
        int arrayB[BUFFER_SIZE];

        for (int i = 0 ; i < n_elements ; i += BUFFER_SIZE)
        {
            int size = BUFFER_SIZE;
            //boundary check: the last chunk may be shorter than BUFFER_SIZE
            if (i + size > n_elements) size = n_elements - i;

            //Burst reading A and B
            readA: for (int j = 0 ; j < size ; j++) {
    #pragma HLS pipeline ii = 1 rewind
                arrayA[j] = a[i+j];
                arrayB[j] = b[i+j];
            }

            //Calculating C and burst writing
            // to Global memory
            vadd_wrteC: for (int j = 0 ; j < size ; j++){
    #pragma HLS pipeline ii = 1 rewind
                c[i+j] = arrayA[j] + arrayB[j];
            }
        }
    }
    }

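    As you can see, this is simply Vivado HLS code. It provides:

  • Two AXI master interfaces for reading and writing data: gmem and gmem1
  • One AXI4-Lite slave interface for configuring four registers: the data addresses of a, b, and c, plus the configurable parameter n_elements
  • The actual IP function, c=a+b, pipelined by means of the HLS pipeline pragma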
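The chunked loop in the kernel can be tried on a PC without any HLS tools. The sketch below copies the kernel's loop structure, drops the pragmas, and adds a small helper that counts the chunks. The names vadd_chunked and chunk_count are hypothetical and are used only to illustrate the boundary check for lengths that are not a multiple of 256.

```cpp
#include <vector>

constexpr int BUFFER_SIZE = 256;  // same chunk size as the kernel

// Host-side copy of the kernel's loop structure, without HLS pragmas.
void vadd_chunked(const int* a, const int* b, int* c, int n_elements) {
    int arrayA[BUFFER_SIZE];
    int arrayB[BUFFER_SIZE];

    for (int i = 0; i < n_elements; i += BUFFER_SIZE) {
        int size = BUFFER_SIZE;
        // Boundary check: shrink the last chunk if fewer elements remain.
        if (i + size > n_elements) size = n_elements - i;

        // Read one chunk of both inputs into the local buffers.
        for (int j = 0; j < size; j++) {
            arrayA[j] = a[i + j];
            arrayB[j] = b[i + j];
        }
        // Compute the sums and write the chunk back.
        for (int j = 0; j < size; j++) {
            c[i + j] = arrayA[j] + arrayB[j];
        }
    }
}

// Number of outer-loop iterations needed for n_elements (rounds up).
int chunk_count(int n_elements) {
    return (n_elements + BUFFER_SIZE - 1) / BUFFER_SIZE;
}
```

For 600 elements the outer loop runs three times, with chunk sizes 256, 256, and 88.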
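
    RTL Kernel
    To see what the RTL kernel does, unpack rtl_kernel_wizard_0.xo and read the logic inside: it simply adds 1 to the data.
    The unpacked rtl_kernel_wizard_0.xo also contains a C model named rtl_kernel_wizard_0_cmodel.cpp. This C model can stand in for the RTL during emulation, since software emulation (sw_emu) does not run the RTL itself.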
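To give an idea of what a C model for an "add 1" kernel can look like, here is a minimal sketch. It is not the content of rtl_kernel_wizard_0_cmodel.cpp. The function name const_add_cmodel and the explicit n_elements parameter are assumptions made for illustration, since the real kernel is called with a single buffer argument.

```cpp
#include <cstddef>

// Hypothetical C model of the RTL kernel's behavior: add 1 to every
// element of the buffer, in place. extern "C" keeps the symbol name
// unmangled, the same convention the C++ kernel above uses.
extern "C" void const_add_cmodel(int* data, std::size_t n_elements) {
    for (std::size_t i = 0; i < n_elements; ++i) {
        data[i] += 1;
    }
}
```

The point of such a model is only functional equivalence. It says nothing about the timing of the RTL.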

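    2.2 Emulation Tests
    I will not go through the two emulation tests here. Just follow the instructions in the Tutorials.

    2.2.1 Review the Application Timeline
    This example also uses Vitis Analyzer, a new tool in Vitis, to view the emulation timeline. I have not studied it closely yet, but it looks like a useful tool for observing the data-processing flow and the run time of each kernel.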

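    2.3 On-Board Test
    2.3.1 Build
    For this project, simply select Hardware and build. Unlike last time, when the container held a single kernel, mixing_container now holds more than one.
    See the figure below:

    mixing_container contains two kernels, one RTL kernel and one C++ kernel, which is the core feature covered in this article.
    Click Build to compile.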

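    2.3.2 Vivado Project
    After the build finishes, you can open Vivado to look at the internal structure, as shown in the figure below:

    The Vivado HLS icon indicates that the IP was generated by Vivado HLS and is then used as the C++ kernel.
    rtl_kernel_wizard_0_1 is the RTL kernel.

    2.3.3 Test and Verification
    Copy the firmware to the SD card, then run the following command to test: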
    root@zcu106vcu_base:~# /mnt/mixing_ke.exe /mnt/mixing_container.xclbin
    Using FPGA binary file specfied through the command line: /mnt/mi[ 50.938732] [drm] Pid 2526 opened device
    xing_container.xclbin
    [ 50.947471] [drm] Pid 2526 closed device
    [ 50.953532] [drm] Pid 2526 opened device
    Found Platform
    Platform Name: Xilinx
    Loading: '/mnt/mixing_container.xclbin'
    [ 51.916381] [drm] Finding IP_LAYOUT section header
    [ 51.916388] [drm] Section IP_LAYOUT details:
    [ 51.921201] [drm] offset = 0x126ad88
    [ 51.925466] [drm] size = 0xa8
    [ 51.929211] [drm] Finding DEBUG_IP_LAYOUT section header
    [ 51.932348] [drm] AXLF section DEBUG_IP_LAYOUT header not found
    [ 51.937654] [drm] Finding CONNECTIVITY section header
    [ 51.943572] [drm] Section CONNECTIVITY details:
    [ 51.948616] [drm] offset = 0x126ae30
    [ 51.953136] [drm] size = 0x34
    [ 51.956882] [drm] Finding MEM_TOPOLOGY section header
    [ 51.960019] [drm] Section MEM_TOPOLOGY details:
    [ 51.965064] [drm] offset = 0x126ad58
    [ 51.969585] [drm] size = 0x30
    [ 51.974631] [drm] No ERT scheduler on MPSoC, using KDS
    [ 51.983293] [drm] scheduler config ert(0)
    [ 51.983296] [drm] cus(2)
    [ 51.987305] [drm] slots(16)
    [ 51.990008] [drm] num_cu_masks(1)
    [ 51.992970] [drm] cu_shift(16)
    [ 51.996449] [drm] cu_base(0x80000000)
    [ 51.999671] [drm] polling(0)
    [ 52.011442] [drm] User buffer is not physical contiguous
    [ 52.019813] [drm] zocl_free_userptr_bo: obj 0x000000009a50640f
    [ 52.020624] [drm] User buffer is not physical contiguous
    [ 52.031792] [drm] zocl_free_userptr_bo: obj 0x000000009f443a13
    [ 52.032500] [drm] User buffer is not physical contiguous
    TEST WITH TWO KERNELS PASSED
    [ 52.043672] [drm] zocl_free_userptr_bo: obj 0x00000000bd649846
    [ 52.054960] [drm] Pid 2526 closed device
    root@zcu106vcu_base:~#
    root@zcu106vcu_base:~# /mnt/mixing_ke.exe /mnt/mixing_container.xclbin
    Using FPGA binary file specfied through the command line: /mnt/mi[ 183.121011] [drm] Pid 2864 opened device
    [ 183.129869] [drm] Pid 2864 closed device
    xing_container.xclbin
    [ 183.133953] [drm] Pid 2864 opened device
    Found Platform
    Platform Name: Xilinx
    Loading: '/mnt/mixing_container.xclbin'
    [ 183.234246] [drm] The XCLBIN already loaded. Don't need to reload.
    [ 183.236038] [drm] Reconfiguration not supported
    [ 183.254888] [drm] User buffer is not physical contiguous
    [ 183.260203] [drm] zocl_free_userptr_bo: obj 0x00000000623f0590
    [ 183.260948] [drm] User buffer is not physical contiguous
    [ 183.272098] [drm] zocl_free_userptr_bo: obj 0x00000000b4bceb33
    [ 183.272350] [drm] User buffer is not physical contiguous
    TEST WITH TWO KERNELS PASSED
    [ 183.283494] [drm] zocl_free_userptr_bo: obj 0x000000008012c72f
    [ 183.294723] [drm] Pid 2864 closed device
    root@zcu106vcu_base:~#
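The "Loading: '/mnt/mixing_container.xclbin'" step in the log corresponds to the host reading the xclbin file named on the command line. The sketch below shows one way to read such a file into memory. The function name read_binary_file is hypothetical, and this is not necessarily how host_step2.cpp does it.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read a whole binary file (for example an .xclbin) into a byte buffer.
// Throws std::runtime_error if the file cannot be opened or read.
std::vector<char> read_binary_file(const std::string& path) {
    // Open at the end so that tellg() reports the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) throw std::runtime_error("cannot open " + path);

    const std::streamoff size = file.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    file.seekg(0, std::ios::beg);

    std::vector<char> buffer(static_cast<std::size_t>(size));
    if (size > 0 && !file.read(buffer.data(), static_cast<std::streamsize>(size))) {
        throw std::runtime_error("cannot read " + path);
    }
    return buffer;
}
```

An OpenCL host would typically pass the returned bytes on when creating the program object. That part is omitted here because it needs the XRT/OpenCL runtime.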

    3 Summary
    Using Vitis and a custom ZCU106 XRT platform, I completed the functional test of the Mixing C++ and RTL Kernels example from Vitis-Tutorials.

    ————————————————
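    Copyright notice: This is an original article by CSDN blogger "vacajk", licensed under CC 4.0 BY-SA. When reposting, please include the link to the original source and this notice.
    Original link: https://blog.csdn.net/vacajk/article/details/103549318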
