embedded-video-processor-scaled

Software Infrastructure of an Embedded Video Processor Core for Multimedia Solutions

With new-age technologies like the Internet of Things, Machine Learning, Artificial Intelligence, companies are reimagining and creating intelligent multimedia applications by merging physical reality and digital information in innovative ways. A multimedia solution involves audio/video codec, image/audio/video processing, edge/cloud applications, and in a few cases AR/VR as well. This blog will talk about the software infrastructure involved for an embedded video processor core in any multimedia solution.

The video processor is an RTL-based hardened IP block available for use in leading FPGA boards these days. With this embedded core, users can natively support video conferencing, video streaming, and ML-based image recognition and facial identification applications with low latencies and high resource efficiency. However, there are software level issues pertaining to OS support, H.264/265 processing, driver development, and so forth that could come up before deploying the video processor.

Let us begin with an overview of the video processors and see how such issues can be resolved for semiconductor companies enabling the end-users to reap its product benefits.

The Embedded Video Processor Core

The video processor is a multi-component solution, consisting of the video processing engine itself, a DDR4 block, and a Synchronization block. Together, these components are dedicated to supporting H.264/.265 encoding and decoding at resolutions up to 4k UHD (3840x2160p60) and, for the top speed grades of this FPGA device family, up to 4096x2160p60. Levels and profiles supported include up to L5.1 High Tier for HEVC and L5.2 for AVC. All three are RTL-based embedded IP products that are deployed in the programmable logic fabric of the targeted FPGA device family and are optimized/’hardened’ for maximum resource efficiency and performance.

The video processor engine is capable of simultaneous encoding and decoding of up to 32 video streams. This is achieved by splitting up the 2160p60 bandwidth across all the intended channels, supporting video streams of 480p30 resolution. H.264 decoding is supported for bitstreams up to 960Mb/s at L5.2 2160p60 High 4:2:2 profile (CAVLC) and H.265 decoding of bitstreams up to 533Mb/s L5.1 2160p60 Main 4:2:2 10b Intra profile (CABAC.)

There is also significant versatility built into the video processor engine. Rate control options include CBR, VBR, and Constant QP. Higher resolutions than 2160p60 are supported at lower frame rates. The engine can handle 8b and 10b color depths along with YCbCr Chroma formats of 4:0:0, 4:2:0, and 4:2:2.

The microarchitecture includes separate encoder and decoder sections, each administered by an embedded 32b synthesizable MCU slaved to the Host APU through a single 32b AXI-4 Lite I/F. Each MCU has its L1 instruction and data cache supported by a dedicated 32b AXI-4 master. Data transfers with system memory are across a 4 channel 128b AXI-4 master I/F that is split between the encoder and decoder. There is also an embedded AXI performance monitor which measures bus transactions and latencies directly, eliminating the need for further software overhead other than the locked firmware for each MCU.

The DDR4 block is a combined memory controller and PHY. The controller portion optimizes R/W transactions with SDRAM, while the PHY performs SerDes and clock management tasks. There are additional supporting blocks that provide initialization and calibration with system memory. Five AXI ports and a 64b SODIMM port offer performance up to 2677 MT/s.

The third block synchronizes data transactions between the video processor engine encoder and DMA. It can buffer up to 256 AXI transactions and ensures low latency performance.

The company’s Integrated Development Environment (IDE) is used to determine the number of video processor cores needed for a given application and the configuration of buffers for either encoding or decoding, based on the number of bitstreams, the selected codec, and the desired profile. Through the toolchain, users can select either AVC or HEVC codecs, I/B/P frame encoding, resolution and level, frames per second color format & depth, memory usage, and compression/decompression operations. The IDE also provides estimates for bandwidth requirements and power consumption.

Embedded Software Support

The embedded software development support for any hardware into video processing can be divided into the following general categories:

  1. Video codec validation and functional testing
  2. Linux support, including kernel development, driver development, and application support
  3. Tools & Frameworks development
  4. Reference design development and deployment
  5. Use of and contributions to open-source organizations as needed

Validation of the AVC and HEVC codecs on the video processor is extensive. It must be executed to 3840x2160p60 performance levels for both encoding and decoding in bare metal and Linux-supported environments. Low latency performance is also validated from prototyping to full production.

Linux work focused on multimedia frameworks and levels to customize kernels and drivers. This includes the v4l2 subsystem, the DRM framework, and drivers for the synchronization block to ensure low latency performance.

The codec and Linux projects lent themselves effectively to the development of a wide variety of reference designs on behalf of the client. Edge designs for both encoding and decoding, developments ranging from low latency video conferencing to 32 channel video streaming, Region of Interest-based encoding, and ML face detection, all of this can be accomplished via the use of a carefully considered selection of open-source tools, frameworks, and capabilities. Find below a summary of these offerings:

  1. GStreamer – an open-source multi-OS library of multimedia components that can be assembled pipeline-fashion, following an object-oriented design approach with a plug-in architecture, for multimedia playback, editing, recording, and streaming. It supports the rapid building of multimedia apps and is available under the GNU LGPL license.
    The GStreamer offering also includes a variety of incredibly useful tools, including gst-launch (for building and running GStreamer pipelines) and gsttrace (a basic tracer tool.)
  2. StreamEye – an open-source tool that provides data and graphical displays for in-depth analysis of video streams.
  3. Gstshark – available as an open-source project from Ridgerun, this tool provides benchmarking and tracing capabilities for analysis and debugging of GStreamer multimedia application builds.
  4. FFmpeg and FFprobe – both part of the FFmpeg open-source project, these are hardware-agnostic, multi-OS tools for multimedia software developers. FFmpeg allows users to convert multimedia files between many formats, change sampling rates, and scale video. FFprobe is a basic tool for multimedia stream analysis.
  5. OpenMAX – available thru the Khronos Group, this is a library of API and signal processing functions that allow developers to make a multimedia stack portable across hardware platforms.
  6. Yocto – a Linux Foundation open-source collaboration that creates tools (including SDKs and BSPs) and supporting capabilities to develop Linux custom implementations for embedded and IoT apps. The community and its Linux versioning are hardware agnostic.
  7. Libdrm – an open-source set of low-level libraries used to support DRM. The Direct Rendering Manager is a Linux kernel that manages GPU-based video hardware on behalf of user programs. It administers program requests in an arbitration mode through a command queue and manages hardware subsystem resources, in particular memory. The libdrm libraries include functions for supporting GPUs from Intel, AMD, and Nvidia as well.
    Libdrm includes tools such as modetest, for testing the DRM display driver.
  8. Media-ctl – a widely available open-source tool for configuring the media controller pipeline in the Linux v4l2 layer.
  9. PYUV player – another widely available open-source tool that allows users to play uncompressed video streams.
  10. Audacity – a free multi-OS audio editor.

The above tools/frameworks help design efficient and quality multimedia solutions under video processing, streaming and conferencing.

The Softnautics engineering team has a long history of developing and integrating embedded multimedia and ML software stacks for many global clients. The skillsets of the team members extend to validating designs in hardware with a wide range of system interfaces, including HDMI, SDI, MIPI, PCIe, multi-Gb Ethernet, and more. With hands-on experience in Video Processing for Multi-Core SoC-based transcoder, Streaming Solutions, Optimized DSP processing for Vision Analytics, Smart Camera Applications, Multimedia Verification & validation, Device Drivers for Video/Audio interfaces, etc. Softnautics enable multimedia companies to design and develop connected multimedia solutions.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.

[elementor-template id=”12042″]

Software Infrastructure of an Embedded Video Processor Core for Multimedia Solutions Read More »

Xilinx's versal ACAP

Versal ACAP architecture & intelligent solution design

Overview

Xilinx’s new heterogeneous compute platform, Versal Adaptive Compute Acceleration Platform (ACAP), efficiently combines the power of software and hardware programmability. Versal ACAP devices are used for a wide range of applications such as data center, Wireless 5G, AI/ML, A & D Radars, Automotive, and wired applications.

Hardware Architecture

Versal ACAP is powered by scalable, adaptable, and Intelligent engines. On-chip memory access for all the machines is enabled via network on chip (NoC).

Source: Xilinx

Scalar Engines

Scalar engines power platform computing, decision making, and control. For general-purpose computing, dual-core ARM cortex- A72 Application Processing Unit (APU) is used in versal. APU supports virtualization allowing multiple software stacks to run simultaneously. The dual core ARM cortex R5F Realtime Processing Unit (RPU) is available for real-time applications. RPU can be configured as a single/dual processor in lockstep mode. RPU can be used for variety of time-critical applications, e.g., safety in the automotive domain.

Platform Management Controller

Platform Management Controller (PMC) is responsible for boot, configuration, partial re-configuration, and general platform management tasks, including power, clock, pin control, reset management, and system monitoring. It is also responsible for device life cycle management, including security.

Adaptable Engines

The adaptable engines feature the classic FPGA technology – the programable silicon. Adaptable engines include DSP engines (Adaptable), configurable logic blocks (Intelligent), and two types of RAM (Block RAM and Ultra RAM (adaptable)). Using such a configurable structure, users can create any kind of accelerator for different kinds of applications.

Intelligent Engines

AI engines are software programable and hardware adaptable. They are an array of VLIW SIMD vector processors used for ML/AI inference and advanced signal processing. AI engine is tile-based architecture. Each tile is made of a vector processor, scaler processor, dedicated program and data memory, dedicated AXI data movement channels, DMA, and locks.

Network on Chip

Network on Chip (NoC) makes Versal ACAPs even more powerful by connecting all engines, memory hierarchy, and highspeed IOs. NoC makes each hardware component and soft IP modules accessible to each other and the software via a memory-mapped interface.

Software Support

Xilinx introduced Vitis – A unified software development platform that enables embedded software and accelerated applications on heterogeneous Xilinx platforms, including FPGAs, SoCs, and Versal ACAPs.

The Vitis unified software development platform provides sets of open-source libraries, enabling developers to build hardware-accelerated applications without hardware knowledge. It also provides Xilinx Runtime Library (XRT), including firmware, board utilities, kernel driver, user-space libraries, and APIs. Vitis also provides an AI development environment including a deep learning framework like TensorFlow, PyTorch, and Caffe, and offers comprehensive APIs to prune, quantize, optimize, debug, and compile trained networks to achieve the highest AI inference performance.

Source: Xilinx

Softnautics have a wide range of expertise on various platforms including vision and image processing on VLIW SIMD vector processor, FPGA design development, Linux kernel driver development, Platform and Power Management Multimedia development.

Softnautics is developing high-performance Vision & ML/AI solutions using Versal ACAP by utilizing high bandwidth and configurable NoC, AI engine tile array in tandem with DMA and interconnect with PL. Versal’s high bandwidth interfaces and high compute processors can improve performance. One such use-case that Softnautics is already developing Scene Text Detection solution using Vitis AI & DPU.

The Scene Text Detection use-case demands high power compute for LSTM operations. Our AI/ML engineers’ team evaluates their design to leverage the custom memory hierarchy and the multicast stream capability on AI interconnect and AI-optimized vector instructions to gain the best performance. With a powerful AI Engine DMA capability and ping-pong buffering of stream data onto local tile memory, the ability of parallel processing opens a plethora of optimized implementations. Direct memory access (DMA) in the AI Engine tile moves data from the incoming stream(s) to local memory and from local memory to outgoing stream(s). Configuration interconnect (through memory-mapped AXI4 interface) with a shared, transaction-based switched interconnect provides access from external masters to internal AI Engine tile.

Further, cascade streams across multiple AI Engine tiles allow for greater flexibility in design by accommodating multiple ML inference instances. Along with the deep understanding of Versal ACAP memory hierarchies, AI Engine Tiles, DMA, and parallel processing, Softnautics’ extensive experience in leading ML/AI frameworks TensorFlow, PyTorch, and Caffe aids in creating end to end accelerated ML/AI pipelines with a focus on pre/post-processing of streams and model customization.

Softnautics has also been an early major contributor in Versal ACAP Platform Management related developments. Some of the key contributions in this space involve developing software components on Versal, such as platform management library (xilpm), Arm Trusted Firmware, Linux device drivers, u-boot for Platform Management.

Through our hands-on experience on Versal ACAP for AI/ML, Machine Vision & Platform Management, Softnautics can help customers take their concepts to design & deployment in a seamless fashion.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.

[elementor-template id=”12042″]

Versal ACAP architecture & intelligent solution design Read More »

Scroll to Top