
Getting started with RISC-V and its Architecture

Due to rising computing demands, SoCs are becoming more complex; machine learning, multimedia, and connectivity are critical drivers of this trend. When developing an SoC, a critical decision is choosing the right Instruction Set Architecture (ISA) and processor hardware architecture. Many ISAs are available, each with different pros and cons. Some are proprietary and licensable, while others are open. ARM and Intel are two popular players in processor architectures.

Vendors provide different variants of these architectures. The significant benefits of using a licensed architecture are the already developed software and the ready-to-use ecosystem. However, design flexibility is minimal with these architectures. Open-source ISAs offer greater flexibility, and they are free. Being open source also makes room for continuous improvement: people can modify them to meet their requirements and contribute the changes back.

RISC-V is an open standard instruction set architecture based on Reduced Instruction Set Computing (RISC) principles. The RISC-V project was started in 2010 by Prof. Krste Asanović, Prof. David Patterson, and graduate students Yunsup Lee and Andrew Waterman at the University of California, Berkeley.

RISC-V is a royalty-free, license-free, high-quality ISA. Its standards are maintained by the RISC-V Foundation, a non-profit organization formed in August 2015 to maintain the standards publicly. Currently, more than 230 companies have joined the RISC-V Foundation. RISC-V is freely available under the BSD license. Note that RISC-V is neither a company nor a CPU implementation.

How is RISC-V different from other processors?

Freely available
Unlike most other ISA designs, the RISC-V ISA is provided under a free license, which means we can use, modify, and distribute it without paying any fees.

Open Source
As the RISC-V ISA is open source, people can use and improve it. This makes products based on it more reliable.

Fully Customizable
Though there are many proprietary processor cores, customizing them to specific requirements is not possible. The advantage of the RISC-V ISA (Instruction Set Architecture) is that it enables companies to develop a completely customizable product tailored to their requirements. They can start with a RISC-V core and add whatever they need on top. This ultimately saves time and money, resulting in low-cost, low-power products that can be used for a long time.

Support for user-level ISA extensions
RISC-V is very modular. Many standard extensions are available for specific purposes and can be added to the base ISA as required. Developers can also create their own non-standard extensions. The base and some of the standard RISC-V extensions are:

    • I – Base Integer instruction set (mandatory)
    • M – Integer Multiplication and Division
    • A – Atomic Operations
    • F – Single-Precision Floating Point
    • D – Double-Precision Floating Point
    • Q – Quad-Precision Floating Point
    • C – 16-bit Compressed Instructions
    • G – General Purpose, i.e., shorthand for IMAFD
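This modularity can be illustrated with a small, hypothetical helper (not part of any RISC-V toolchain) that decodes an ISA string such as RV64GC into its base width and extension names:

```python
# Hypothetical helper: decodes a RISC-V ISA string into (xlen, extensions).
# The table mirrors the standard single-letter extensions listed above.
STD_EXTENSIONS = {
    "I": "Base Integer",
    "M": "Integer Multiplication and Division",
    "A": "Atomic Operations",
    "F": "Single-Precision Floating Point",
    "D": "Double-Precision Floating Point",
    "Q": "Quad-Precision Floating Point",
    "C": "16-bit Compressed Instructions",
}

def decode_isa_string(isa):
    """Return (xlen, [extension names]); 'G' expands to IMAFD."""
    isa = isa.upper()
    if not isa.startswith("RV"):
        raise ValueError("ISA string must start with 'RV'")
    # Split the width digits (32/64/128) from the extension letters.
    i = 2
    while i < len(isa) and isa[i].isdigit():
        i += 1
    xlen = int(isa[2:i])
    letters = isa[i:].replace("G", "IMAFD")  # G is shorthand for IMAFD
    return xlen, [STD_EXTENSIONS[ch] for ch in letters]
```

For example, `decode_isa_string("RV64GC")` reports a 64-bit base with the IMAFD extensions plus compressed instructions.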

32/64/128-bit width support
RISC-V supports different register widths, providing more flexibility for product development.

 

RISC-V processor

Apart from the above benefits, RISC-V has other vital features, such as multicore support and full virtualizability for hypervisor development.

RISC-V supports different software privilege levels:
  • User Mode (U-Mode) – Generally runs user processes
  • Supervisor Mode (S-Mode) – Kernel (Including kernel modules and device drivers), Hypervisor
  • Machine Mode (M-Mode) – Bootloader and Firmware

The processor can run in only one of the privilege modes at a time. Machine mode is the highest privileged mode and the only mandatory one. The privilege level defines the capabilities of the running software during its execution. These privilege levels make RISC-V a good choice for secure and safety-critical products.

One thing to note is that RISC-V is an open-source instruction set architecture; using it, anybody can develop processor cores. A developed core can be free or proprietary, depending on the choice of its developer.

RISC-V can be used for a variety of applications because of its great flexibility, extensions, and customizability. It is suitable for everything from small microprocessing systems to supercomputing systems. It can fit into small devices, consuming little memory and power, while other variants can provide high computing capability. Its privileged modes can provide security features such as a trust zone and secure monitor calls.

Some of the applications suited for RISC-V are:
  • Machine Learning edge inference
  • Security solutions
  • IoT
RISC-V for Security Solution

Softnautics is using the Lattice Semiconductor RISC-V MC CPU soft IP to develop security solutions. The diagram below illustrates this in brief.

Security solution

In safety-critical products, RISC-V works as the root of trust to ensure the authenticity and integrity of the firmware. It provides the following functionality:

  • Protect platform firmware & critical data: It protects platform firmware and critical data from unauthorized access.
  • Ensure authenticity & integrity of firmware: On boot, it checks the firmware signature and verifies that the firmware has not been tampered with.
  • Detect corrupted platform firmware & critical data: It checks platform firmware and critical data at boot and, on request, at runtime. If the platform firmware or critical data gets corrupted for any reason, it can detect this and take corrective action.
  • Restore corrupted platform firmware and/or critical data: If platform firmware or critical data are corrupted, it restores them from the backup partition as required.
  • Runtime monitoring for unintended access: During runtime, it monitors bus traffic accessing secure memories (e.g., SPI traffic) and blocks unintended access.
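As a rough illustration of the integrity check and restore steps above, the sketch below compares a firmware image's digest against a trusted value and falls back to the backup partition on mismatch. The function names and flow are illustrative assumptions, not the actual product design, and a real root of trust verifies cryptographic signatures, not bare hashes:

```python
# Illustrative sketch only: a root of trust validates the active firmware
# image against a known-good digest and restores from backup on mismatch.
import hashlib

def verify_firmware(image: bytes, golden_digest: str) -> bool:
    """Return True if the firmware image hash matches the trusted digest."""
    return hashlib.sha256(image).hexdigest() == golden_digest

def boot_check(active: bytes, backup: bytes, golden_digest: str) -> bytes:
    """Boot-time check: use the active image if intact, else the backup."""
    if verify_firmware(active, golden_digest):
        return active
    if verify_firmware(backup, golden_digest):
        return backup  # restore from the backup partition
    raise RuntimeError("both firmware copies are corrupted")
```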
RISC-V for IoT solution

The gateway is developed with a RISC-V-based MCU to leverage the security features provided by RISC-V.

RISC-V IoT solution

The gateway performs device management, user management, and service management. A cloud agent handles all cloud activity from device to cloud via the gateway. A device agent manages all the devices connected to the gateway. A service agent monitors the interfaces and tracks the status of each interface and connected device.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.




Versal ACAP Platform & Power Management

Industry Focus Towards Getting Maximum Performance with Optimum Power

With the latest evolution of SoCs, performance per watt has become critical. SoC designers strive for energy conservation to manage the ultimate trade-off between power consumption and differentiating performance. Of greatest importance are application processors, where a mere milliwatt of power consumption can make the difference between a design win in a market-leading application and losing the opportunity to participate in a rapidly growing market. SoC power management is gaining prominence in several traditional markets, ranging from data center CPUs and automotive infotainment systems to IoT devices and wearables. In today's world, SoCs are expected to support a variety of use cases at optimum power levels.

Design Choices and Power Management Techniques in Complex SoCs

While CPUs, GPUs, and peripherals dominate SoC power management attention, other functional blocks are often overlooked. Today, interconnect technologies for accelerators and FPGA programmable logic are also gaining prominence in the SoC space with rising data center and cloud-oriented use cases. Significant advantages have been realized by those who have switched over to advanced Network-on-Chip (NoC) interconnect fabrics. An essential benefit of a NoC is managing power consumption across the different functional blocks of the SoC and the interconnects between them, not just the CPU and GPU. Since the interconnect links all major blocks, it provides an excellent opportunity to apply power management techniques such as:

  • Dynamic frequency and voltage scaling
  • Data path optimization
  • Different power islands
  • Clock gating
  • Adaptive voltage scaling
  • Standby/leakage current management
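As a toy illustration of the first technique, dynamic voltage and frequency scaling, the sketch below picks the lowest operating point whose frequency covers the current load. The operating-point table is invented for illustration; real platforms expose such tables through firmware or the OS:

```python
# Toy DVFS governor: choose the lowest (frequency, voltage) operating point
# that satisfies the current compute demand, saturating at the top point.
OPERATING_POINTS = [
    # (frequency_mhz, voltage_mv) -- illustrative values, not a real SoC table
    (300, 700),
    (600, 800),
    (1200, 900),
    (1500, 1000),
]

def select_operating_point(load_mhz):
    """Return the lowest (freq, voltage) pair that covers the demand."""
    for freq, volt in OPERATING_POINTS:
        if freq >= load_mhz:
            return freq, volt
    return OPERATING_POINTS[-1]  # saturate at the highest point
```

Running at the lowest sufficient point is what saves power: voltage drops with frequency, and dynamic power scales roughly with frequency times voltage squared.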

Thus, a NoC brings many advantages compared to older bus and crossbar interconnect technologies. To leverage the power management strengths of the NoC, an SoC needs a dedicated unit that can manage all the different resources depending on the application's requirements, like the Xilinx Versal ACAP's PLM unit and the ZynqMP UltraScale+ MPSoC's PMUFW unit.

Power management in Xilinx Versal ACAP

Source: Xilinx

Xilinx Versal ACAP has functional blocks in different power domains like LPD, FPD, PL, NoC, PMC, etc.

FP domain
The Full Power Domain (FPD) contains the high-power components. It mainly consists of the Cortex-A72 application processor for running heavy applications like Linux, along with other high-power components such as the GPU, DisplayPort, and PCIe.

LP domain
The Low Power Domain (LPD) mostly consists of small, low-power peripherals like I2C, SPI, and UART. It also includes a Cortex-R5 processor for running safety/secure/RTOS applications and on-chip memories like OCM and TCM for faster memory access. This domain also houses wakeup sources like UART, Ethernet, and USB.

PL domain
The Programmable Logic (PL) domain includes the DSP engines, Configurable Logic Blocks (CLBs), Block RAM, UltraRAM, and AI Engine blocks. These improve the application performance of the overall embedded system design.

NoC domain
The Network on Chip (NoC) provides the interconnect for all the SoC units. It connects masters and slaves over an AXI4-based network and maps the global address space onto the NoC interconnect.

PMC domain
The Platform Management Controller (PMC) runs the platform management firmware, which provides platform management for the entire ACAP. The firmware running on the PMC can manage the peripherals and power domains depending on the user application.

The power consumption of the ACAP ranges from hundreds of watts to microwatts depending on the configuration. Individual power domains can be powered down entirely so that their power requirement, including static power, becomes virtually zero. Xilinx provides extensive software support for power management in U-Boot, ATF, Xen, Linux, and bare-metal applications.

Softnautics, with its rich experience in platform management on the Xilinx UltraScale+ MPSoC, was one of the early entrants into Versal ACAP platform management, supporting Xilinx extensively in development across the software layers in ATF, Xen, Linux, and at the bare-metal level. Going further, Softnautics has scaled up to support some of the key end customers of Xilinx.

Some of the key features implemented in power management on Versal ACAP by Softnautics are:

Linux suspend/resume with different wakeup sources

  • The APU (Cortex-A72) running Linux on the Versal ACAP can configure different peripherals, such as UART, Ethernet (Wake-on-LAN), and USB, as wakeup sources. The Platform Management Controller (PMC) has been configured to handle multiple wakeup sources, which are armed when the user suspends the system. This allows a complete power-down of the Full Power Domain (FPD).
  • When a wakeup source device generates a wakeup event, the PMC gets the interrupt and powers up the FPD and the APU running Linux.
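From Linux userspace, this flow could be sketched roughly as follows. The sysfs paths follow the standard Linux power-management layout, but the device names are assumptions, and the function only builds the list of writes (a dry run) rather than touching real hardware:

```python
# Hedged sketch: arm wakeup sources, then suspend to RAM, expressed as the
# sequence of sysfs writes a userspace tool would perform. plan_suspend
# returns the (path, value) pairs instead of writing them, so the logic can
# be exercised without hardware; device names like "ff000000.uart" are
# invented for illustration.
def plan_suspend(wakeup_devices):
    """Return the (path, value) sysfs writes needed to suspend with wakeup."""
    writes = [(f"/sys/bus/platform/devices/{dev}/power/wakeup", "enabled")
              for dev in wakeup_devices]
    writes.append(("/sys/power/state", "mem"))  # suspend-to-RAM goes last
    return writes
```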

Xilinx’s Versal ACAP is a differentiating platform that will transform the horizon of power-hungry applications while smartly conserving energy and ensuring the best performance. Softnautics aims to continue playing key roles with its highly skilled talent pool to enable end-to-end solutions, leveraging Versal ACAP advancements in platform and power management.






Accelerate AI applications using VITIS AI on Xilinx ZynqMP UltraScale+ FPGA

Vitis is a unified software platform for developing software (BSP, OS, drivers, frameworks, and applications) and hardware (RTL, HLS, IPs, etc.) using Vivado and other components for Xilinx FPGA SoC platforms like the ZynqMP UltraScale+ and Alveo cards. The key component of the Vitis SDK, the Vitis AI runtime (VART), provides a unified interface for the deployment of end ML/AI applications on edge and cloud.

Vitis™ AI components:

  • Optimized IP cores
  • Tools
  • Libraries
  • Models
  • Example Reference Designs

Inference in machine learning is computation-intensive and requires high memory bandwidth and high performance compute to meet the low-latency and high-throughput requirements of various end applications.

Vitis AI Workflow

Xilinx Vitis AI provides an innovative workflow to deploy deep learning inference applications on Xilinx Deep Learning Processing Unit (DPU) using a simple process:

Source: Xilinx

  • The Deep Learning Processing Unit (DPU) is a configurable computation engine optimized for convolutional neural networks for deep learning inference applications, placed in the programmable logic (PL). The DPU contains efficient and scalable IP cores that can be customized to meet the needs of many different applications. The DPU defines its own instruction set, and the Vitis AI compiler generates its instructions.
  • VITIS AI compiler schedules the instructions in an optimized manner to get the maximum performance possible.
  • Typical workflow to run any AI Application on Xilinx ZynqMP UltraScale+ SoC platform comprises:
  1. Model Quantization
  2. Model Compilation
  3. Model Optimization (Optional)
  4. Build DPU executable
  5. Build software application
  6. Integrate VITIS AI Unified APIs
  7. Compile and link the hybrid DPU application
  8. Deploy the hybrid DPU executable on FPGA
AI Quantizer

The AI Quantizer is a compression tool that performs quantization by converting 32-bit floating-point weights and activations to fixed-point INT8. It reduces computational complexity with little loss of model accuracy. The fixed-point model needs less memory, thus providing faster execution and higher power efficiency than a floating-point implementation.
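A minimal sketch of the idea behind such quantization, a symmetric per-tensor mapping of float weights onto the INT8 range, is shown below. Real quantizers also calibrate activations and are far more elaborate:

```python
# Symmetric post-training quantization sketch: map float weights onto
# [-127, 127] with one per-tensor scale, so value * scale ~ original float.
def quantize_int8(weights):
    """Return (int8_values, scale) for a list of float weights."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w * 127.0 / max_abs))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the INT8 values back to approximate floats."""
    return [v * scale for v in q]
```

The round trip loses at most half a quantization step per weight, which is why accuracy survives the 4x reduction in storage.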

AI Quantizer

AI Compiler

The AI Compiler maps a network model to a highly efficient instruction set and dataflow. Its input is the quantized 8-bit neural network, and its output is the DPU kernel, the executable that will run on the DPU. Unsupported layers need to be deployed on the CPU, or the model can be customized to replace or remove those unsupported operations. The compiler also performs sophisticated optimizations such as layer fusion and instruction scheduling, and reuses on-chip memory as much as possible.

Once we have the executable for the DPU, we use the Vitis AI unified APIs to initialize the data structures, initialize the DPU, implement the layers not supported by the DPU on the CPU, and add pre-processing and post-processing on the PL/PS as needed.

AI Compiler

AI Optimizer

With its world-leading model compression technology, AI Optimizer can reduce model complexity by 5x to 50x with minimal impact on accuracy. This deep compression takes inference performance to the next level.

We can achieve desired sparsity and reduce runtime by 2.5x.
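Conceptually, this compression can be sketched as magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached. The real optimizer is much more sophisticated (structured pruning plus fine-tuning to recover accuracy), so treat this only as an illustration of the principle:

```python
# Magnitude-pruning sketch: zero the smallest |w| until the requested
# fraction of weights is zero. Sparse weights skip multiply-accumulates,
# which is where the inference speedup comes from.
def prune_to_sparsity(weights, sparsity):
    """Return a copy of weights with `sparsity` fraction zeroed out."""
    n_zero = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_zero]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned
```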

AI Optimizer

AI Profiler

The AI Profiler helps profile inference to find the causes of bottlenecks in the end-to-end pipeline.

The profiler gives the designer a common timeline for the DPU, CPU, and memory. It requires no code changes and can also trace functions while profiling.

AI Profiler

AI Runtime

The Vitis AI runtime (VART) enables applications to use unified high-level runtime APIs for both edge and cloud deployments, making deployment seamless and efficient. Some of the key features are:

  • Asynchronous job submission
  • Asynchronous job collection
  • C++ and Python implementations
  • Multi-threading and multi-process execution
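The asynchronous submit/collect pattern in the first two bullets can be illustrated with a pure-Python stand-in. This uses a thread pool, not the actual vart module, so the class and its methods are illustrative assumptions about the pattern, not VART's real API:

```python
# Pure-Python stand-in for asynchronous job submission and collection:
# submitting returns a job id immediately; waiting collects the result.
from concurrent.futures import ThreadPoolExecutor

class MockRunner:
    """Mimics async submit/collect: execute_async returns a job id."""
    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}
        self._next_id = 0

    def execute_async(self, fn, *args):
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def wait(self, job_id):
        return self._jobs.pop(job_id).result()
```

Decoupling submission from collection lets the caller keep several inference jobs in flight, which is how multi-stream pipelines keep the accelerator busy.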

Vitis AI also offers DSight, DExplorer, DDump, DLet, etc., for various tasks.

DSight & DExplorer
The DPU IP offers a number of configurations with specific cores to choose from as per the network model. DSight reports the percentage utilization of each DPU core. It also gives the efficiency of the scheduler so that user threads can be tuned. One can also see performance numbers like MOPS, runtime, and memory bandwidth for each layer and each DPU node.

Softnautics has a wide range of expertise on various edge and cloud platforms, including vision and image processing on VLIW SIMD vector processors, FPGAs, Linux kernel driver development, platform and power management, and multimedia development. We provide end-to-end ML/AI solutions from dataset preparation to application deployment on edge and cloud, including maintenance.

We chose the Xilinx ZynqMP UltraScale+ platform for high-performance compute deployments. It provides the best application processing, highly configurable FPGA acceleration capabilities, and the Vitis SDK to accelerate high-performance ML/AI inferencing. One such application we targeted was face-mask detection for Covid-19 screening. The intention was to deploy multi-stream inferencing to screen people wearing masks and identify non-compliance in real time, as mandated by various governments' Covid-19 precaution guidelines.

We prepared a dataset and selected pre-trained weights to design a model for mask detection and screening. We trained and pruned our custom models via the TensorFlow framework. It was a two-stage deployment: face detection followed by mask detection. The trained model was then passed through the Vitis AI workflow covered in the earlier sections. We observed a 10x speedup in inference time compared to the CPU. Xilinx provides different debugging tools and utilities that are very helpful during initial development and deployment. During our initial deployment stage, we were not getting detections for the mask and no-mask categories. To debug this, we compared PC-based inference output with the output from one of the debug utilities, DExplorer, in debug mode, and root-caused the issue. By re-running the quantizer with more calibration images and iterations, we could tune the output and achieve approximately 96% accuracy on the video feed. We also identified bottlenecks in the pipeline using the AI Profiler and took corrective actions to remove them by various means, such as using HLS acceleration for compute bottlenecks in post-processing.

Face Detection via AI






Smart OCR solution using Xilinx Ultrascale+ and Vitis AI

The rich, precise high-level semantics embodied in the text helps understand the world around us and build autonomous-capable solutions that can be deployed in a live environment. Therefore, automatic text reading from natural environments, also known as scene text detection/recognition or PhotoOCR, has become an increasingly popular and important research topic in computer vision.

As the written forms of human languages evolved, we developed thousands of unique font families. When we add case (capitals/lower case/unicase/small caps), skew (italic/roman), proportion (horizontal scale), weight, size-specific variants (display/text), swashes, and serif variations (serif/sans in super-families), the number grows into the millions, and it makes text identification an exciting discipline for machine learning.

Xilinx as a choice for OCR solutions

Today, Xilinx powers 7 out of 10 new developments through its wide variety of powerful platforms and leads the FPGA-based system design trends. Softnautics chose Xilinx for implementing this solution because of the integrated Vitis™ AI stack and strong hardware capabilities.

Xilinx Vitis™ is a free and open-source development platform that packages hardware modules as software-callable functions and is compatible with standard development environments, tools, and open-source libraries. It automatically adapts software and algorithms to Xilinx hardware without the need for VHDL or Verilog expertise.

Selecting the right Xilinx Platform

The comprehensive and rich Xilinx toolset and ecosystem make prototyping a very predictable process and expedite solution development, reducing overall development time by up to 70%.
Softnautics chose the Xilinx UltraScale+ platform as it offers the best of application processing and FPGA acceleration capabilities. It also provides impressive high-level synthesis capability, resulting in 5x system-level performance per watt compared to earlier variants. It supports Xilinx Vitis AI, which offers a wide range of capabilities to build AI inferencing using acceleration libraries.

Softnautics used the Xilinx Vitis AI stack and its acceleration software to create a hybrid application, implementing LSTM functionality for effective sequence prediction by porting TensorFlow Lite to ARM, running on the Processing System (PS) using the N2Cube software. Image pre- and post-processing was accelerated using HLS through Vivado, and Vitis was used for inferencing using CTPN (Connectionist Text Proposal Network). We eventually graduated the solution to real-time scene text detection on a video pipeline and improved the model with a robust dataset.

Scene Text Detection

There are many implementations available, and new ones are being researched. Still, a series of grand challenges may be encountered when detecting and recognizing text in the wild. The difficulties in natural scenes mainly stem from three differences compared to scripts in documents:

  • Diversity and variability arising from languages, colors, fonts, sizes, orientations, etc.
  • Vibrant background on which text is written
  • The aspect ratios and layouts of scene text may vary significantly

This type of solution has extensive applicability in various fields requiring real-time text detection on a video stream with higher accuracy and quick recognition. Few of these application areas are:

  • Parking validation — Cities and towns are using mobile OCR to validate if cars are parked according to city regulations automatically. Parking inspectors can use a mobile device with OCR to scan license plates of vehicles and check with an online database to see if they are permitted to park.
  • Mobile document scanning — A variety of mobile applications allow users to take a photo of a document and convert it to text. This OCR task is more challenging than traditional document scanners because photos have unpredictable image angles, lighting conditions, and text quality.
  • Digital asset management – The software helps organize rich media assets such as images, videos, and animations. A key aspect of DAM systems is the search-ability of rich media. By running OCR on uploaded images and video frames, DAM can make rich media searchable and enrich it with meaningful tags.

The Softnautics team has been working on Xilinx FPGA-based solutions that require design and software framework implementation. Our vast experience with Xilinx and understanding of its intricacies ensured we took this solution from conceptualization to proof of concept within 4 weeks. Using our end-to-end solution-building expertise, you can visualize your ideas with the fastest concept-realization service on Xilinx platforms and achieve greatly reduced time-to-market.



Source: Xilinx



FPGA Based Solutions With Evolving Technologies

Propelled into existence by low NRE (non-recurring engineering) cost, FPGAs began as an alternative to ASICs until a customer required more than a certain number of units, the cross-over point at which the higher NRE cost of an ASIC became justifiable. Slowly, the flexibility of programmable logic helped FPGA vendors create room for their products. An IEEE article mentions that at one point, two-thirds of designs were losing money because of changing requirements, product failures, or outright design errors.

Today, FPGAs are forming the backbone for 5G, Embedded Vision, Smart World (Cities, Factories, and so on), Cloud platforms, and safety-critical systems. FPGAs are finding application in various sectors which have been very specific about performance and compliance requirements such as defense, aerospace, and automotive.

The differentiators for FPGAs have been flexibility and operating range. On one hand, FPGAs can power high-performance cloud data centers requiring as much as several hundred watts; on the other, they power feature-light, low-power designs drawing a thousandth of a watt (1 mW). Thus, FPGAs can accelerate searches for Microsoft's Bing search engine at lower power consumption, while the same assembly could also host specialized low-power FPGAs to run specific operations such as controlling the system and securing firmware.

Machine learning and artificial intelligence adoption is going to boost the demand for FPGAs, considering these fields are still evolving: they need flexible programmability to support agile development cycles of end use cases. There are many cases where low-power FPGA designs can support object detection, counting operations, and key-phrase detection to enable complex use cases, which makes a stronger case for mass adoption than ever before. However, this requires FPGAs to meet stringent low-power demands in a much smaller form factor.

Lattice Semiconductor provides specialized low-power FPGAs for computer vision and AI inference applications. By adopting a platform-based approach to product development that maximizes design reuse, Lattice Semiconductor has launched platforms more frequently and at considerably lower cost. The Lattice Nexus and Lattice CrossLink-NX platforms are likely to extend the low-power advantage that its earlier FPGAs have enjoyed for a long time.

Edge computing devices, a backbone of ML/AI advancements, have been constrained by battery capacity and by the connectivity speed needed for fast and frequent data transmission. AI has seen limited adoption at the edge mainly because of an inadequate ability to process large datasets, caused by low transmission rates. However, with 5G adoption around the corner, this limitation will become a thing of the past. With both problems, low power and transmission speed, being addressed, these FPGA-based solutions are all set to reach mass production as more connected devices proliferate on the widespread 5G network.

Low-power FPGA providers are going to see consistently higher demand for FPGA units. However, they will be continuously challenged to serve incoming custom-design requirements, considering the large variation in end uses. That is where boutique design houses can offload the FPGA companies and become enablers for the mass adoption of AI applications running on FPGAs: someone who can handle RTL complexities, build the required ML firmware, adapt drivers, and get the system working for the desired ML use case.

Know how Softnautics can help you design an FPGA-powered ML solution for your use case.



Tracking Social Distance: What Enables Those Red and Green Boxes

Xilinx research shows its UltraScale+™ XCVU13P FPGA (38.3 INT8 TOP/s) provides nearly the same computing power as the Tesla P40 (40 INT8 TOP/s), but the flexibility and on-chip memory of the Xilinx device result in significantly higher compute capability across different workloads and applications. Another Xilinx study of general-purpose compute efficiency shows its Virtex UltraScale+ performing 4x better than the Nvidia Tesla V100. The scales tilt further in favor of FPGAs when we consider the functional safety demanded by safety-critical aviation, autonomous automotive, and defense applications, such as ADAS.

Embedded vision requires machines that can see, sense, and quickly respond. To meet these challenges, hardware designers create next-gen architectures that are highly differentiated and extremely responsive, able to adapt to ever-evolving algorithms and image sensors.

If the use cases mentioned above are compute-intensive and need custom neural networks (CNNs/DNNs), the other side of the spectrum requires extremely low-power operation with a flexible solution-building approach.

In pursuit of maximizing the efficiency of machines to achieve higher throughput of operations, the Industrial Internet of Things (IIoT) is driving Industry 4.0. Such applications require a combination of software programmability coupled with real-time processing of sensor data to leverage any-to-any connectivity in a secure and safe manner. The flexible nature of FPGA programmability and low-power consumption make FPGAs a perfect choice for Industry 4.0 solutions.

It is just the beginning of how FPGA platforms are powering ML solutions that are likely to see mass adoption and become household essentials as we create diverse use cases to help people, businesses, and governments make the world a safer and smarter place.

