Optimizing embedded software for real-time multimedia processing

The demands of multimedia processing are diverse and ever-increasing. Modern consumers expect nothing less than immediate and high-quality audio and video experiences. Everyone wants their smart speakers to recognize their voice commands swiftly, their online meetings to be smooth, and their entertainment systems to deliver clear visuals and audio. Multimedia applications are now tasked with handling a variety of data types simultaneously, such as audio, video, and text, and ensuring that these data types interact seamlessly in real-time. This necessitates not only efficient algorithms but also an underlying embedded software infrastructure capable of rapid processing and resource optimization. The global embedded system market is expected to reach around USD 173.4 billion by 2032, with a 6.8% CAGR. Embedded systems, blending hardware and software, perform specific functions and find applications in various industries. The growth is fuelled by the rising demand for optimized embedded software solutions.

The demands on these systems are substantial, and they must perform without glitches. Media and entertainment consumers anticipate uninterrupted streaming of high-definition content, while the automotive sector relies on multimedia systems for navigation, infotainment, and in-cabin experiences. Gaming, consumer electronics, security, and surveillance are other domains where multimedia applications play important roles.

Understanding embedded software optimization

Embedded software optimization is the art of fine-tuning software to ensure that it operates at its peak efficiency, responding promptly to the user’s commands. In multimedia, this optimization is about enhancing the performance of software that drives audio solutions, video solutions, multimedia systems, infotainment, and more. Embedded software acts as the bridge between the user’s commands and the hardware that carries them out. It must manage memory, allocate resources wisely, and execute complex algorithms without delay. At its core, embedded software optimization is about making sure every bit of code is utilized optimally.

Performance enhancement techniques

To optimize embedded software for real-time multimedia processing, several performance enhancement techniques come into play. These techniques ensure the software operates smoothly and delivers the highest possible performance.

  • Code optimization: Code optimization is the meticulous refinement of software code to make it more efficient. It involves using algorithms that minimize processing time, reduce resource consumption, and eliminate duplication.
  • Parallel processing: Parallel processing is an invaluable technique that allows multiple tasks to be executed simultaneously. This significantly enhances the system’s ability to handle complex operations in real-time. For example, in a multimedia player, parallel processing can be used to simultaneously decode audio and video streams, ensuring that both stay in sync for a seamless playback experience (a minimal sketch follows this list).
  • Hardware acceleration: Hardware acceleration is a game-changer in multimedia processing. It involves assigning specific tasks, such as video encoding and decoding, to dedicated hardware components designed for those functions. Hardware acceleration can dramatically enhance performance, particularly in tasks that involve intensive computation, such as video rendering and AI-based image recognition.
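As a minimal illustration of the parallel-processing idea (the decode functions below are hypothetical stand-ins for real codec calls, not any particular library), two worker threads can drain separate audio and video packet queues concurrently:

```python
import threading
import queue

audio_q = queue.Queue()
video_q = queue.Queue()

def decode_audio_frames():
    # Hypothetical stand-in for a real audio decoder loop.
    while True:
        packet = audio_q.get()
        if packet is None:          # sentinel: no more audio packets
            break
        # ... decode the packet and hand samples to the audio renderer ...

def decode_video_frames():
    # Hypothetical stand-in for a real video decoder loop.
    while True:
        packet = video_q.get()
        if packet is None:          # sentinel: no more video packets
            break
        # ... decode the packet and hand the frame to the display pipeline ...

# Run both decode loops concurrently so audio and video stay in step.
workers = [
    threading.Thread(target=decode_audio_frames),
    threading.Thread(target=decode_video_frames),
]
for w in workers:
    w.start()

# A demuxer would push packets into audio_q / video_q here, then signal
# completion with the sentinels:
audio_q.put(None)
video_q.put(None)
for w in workers:
    w.join()
```

On an embedded target the same split would typically be expressed as RTOS tasks or a C/C++ thread pool; the sketch only shows the division of work.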

Memory management

Memory management is a critical aspect of optimizing embedded software for multimedia processing. Multimedia systems require quick access to data, and memory management ensures that data is stored and retrieved efficiently. Effective memory management can make the difference between a smooth, uninterrupted multimedia experience and a system prone to lags and buffering.

Efficient memory management involves several key strategies.

  • Caching: Frequently used data is cached in memory for rapid access. This minimizes the need to fetch data from slower storage devices, reducing latency.
  • Memory leak prevention: Memory leaks, where portions of memory are allocated but never released, can gradually consume system resources. Embedded software must be precisely designed to prevent memory leaks.
  • Memory pools: Memory pools are like pre-booked blocks of memory. Instead of dynamically allocating and deallocating memory as needed, a memory pool reserves blocks in advance. This proactive approach helps to minimize memory fragmentation and reduces the overhead of constantly managing memory on the fly (a small illustrative sketch follows below).
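To make the memory-pool idea concrete, below is a small, illustrative buffer pool in Python. On a real embedded target this would typically be written in C over fixed-size blocks, but the pattern, pre-allocate once and then acquire/release, is the same:

```python
class BufferPool:
    """Pre-allocates a fixed number of equally sized buffers up front,
    so steady-state operation never hits the general-purpose allocator."""

    def __init__(self, block_size, block_count):
        self._free = [bytearray(block_size) for _ in range(block_count)]

    def acquire(self):
        if not self._free:
            raise MemoryError("buffer pool exhausted")  # or block/grow, by policy
        return self._free.pop()

    def release(self, buf):
        self._free.append(buf)   # return the block for reuse; nothing is freed


# Example: reuse 32 KiB frame buffers instead of allocating per frame.
pool = BufferPool(block_size=32 * 1024, block_count=8)
buf = pool.acquire()
# ... fill buf with decoded data and hand it to the next pipeline stage ...
pool.release(buf)
```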

Optimized embedded software for real-time multimedia processing

Real-time communication

Real-time communication is the essence of multimedia applications. Embedded software must facilitate immediate interactions between users and the system, ensuring that commands are executed without noticeable delay. This real-time capability is fundamental to providing an immersive multimedia experience.

In multimedia, real-time communication encompasses various functionalities. For example, video conferencing ensures that audio and video streams remain synchronized, preventing any awkward lags in communication. In gaming, it enables real-time rendering of complex 3D environments and instantaneous response to user input. The seamless integration of real-time communication within multimedia applications not only ensures immediate responsiveness but also lays the foundation for an enriched and immersive user experience across diverse interactive platforms.

The future of embedded software in multimedia

The future of embedded software in multimedia systems promises even more advanced features. Embedded AI solutions are becoming increasingly integral to multimedia, enabling capabilities like voice recognition, content recommendation, and automated video analysis. As embedded software development in this domain continues to advance, it will need to meet the demands of emerging trends and evolving consumer expectations.

In conclusion, optimizing embedded software for real-time multimedia processing is a complex and intricate challenge. It necessitates a deep comprehension of the demands of multimedia processing, unwavering dedication to software optimization, and the strategic deployment of performance enhancement techniques. This ensures that multimedia systems can consistently deliver seamless, immediate, and high-quality audio and video experiences. Embedded software remains the driving force behind the multimedia solutions that have seamlessly integrated into our daily lives.

At Softnautics, a MosChip company, we excel in optimizing embedded software for real-time multimedia processing. Our team of experts specializes in fine-tuning embedded systems & software to ensure peak efficiency, allowing seamless and instantaneous processing of audio, video, and diverse media types. With a focus on enhancing performance in multimedia applications, our services span designing audio/video solutions, multimedia systems & devices, media infotainment systems, and more. Operating on various architectures and platforms, including multi-core ARM, DSP, GPUs, and FPGAs, our embedded software optimization stands as a crucial element in meeting the evolving demands of the multimedia industry.

Read our success stories to know more about our multimedia engineering services.

Contact us at business@softnautics.com for any queries related to your solution design or for consultancy.



Inside HDR10: A technical exploration of High Dynamic Range

High Dynamic Range (HDR) technology has taken the world of visual entertainment, especially streaming media solutions, by storm. It’s the secret sauce that makes images and videos look incredibly lifelike and captivating. From the vibrant colors in your favourite movies to the dazzling graphics in video games, HDR has revolutionized how we perceive visuals on screens. In this blog, we’ll take you on a technical journey into the heart of HDR, focusing on one of its most popular formats – HDR10. Breaking down all the complex technical details into simple terms, this blog aims to help readers understand how HDR10 seamlessly integrates into streaming media solutions, working its magic.

What is High Dynamic Range 10 (HDR10)?

HDR10 is the most popular and widely used HDR standard for consuming digital content. Every HDR-enabled TV is compatible with HDR10. In the context of video, it primarily provides a significantly enhanced visual experience compared to standard dynamic range (SDR).

Standard Dynamic Range:

People have experienced visuals through SDR for a long time, and it has a limited dynamic range. This limitation means SDR cannot capture the full range of brightness and contrast perceivable by the human eye. One can consider it the ‘old way’ of watching movies and TV shows. The advent of HDR has changed streaming media by offering a much wider dynamic range, resulting in visuals that are more vivid and lifelike.

Understanding the basic visual concepts

Luminance and brightness: Luminance plays a pivotal role in our perception of contrast and detail in image processing. Higher luminance levels result in objects appearing brighter and contribute to the creation of striking highlights and deep shadows in HDR content. Luminance, measured in units called “nits,” is a scientific measurement of brightness. In contrast, brightness, in the context of how we perceive it, is a subjective experience influenced by individual factors and environmental conditions. It is how we interpret the intensity of light emitted or reflected by an object. It can vary from person to person.

Luminance comparison between SDR and HDR

Color-depth (bit-depth): Color-depth, often referred to as bit-depth, is a fundamental concept in digital imaging and video solutions. It determines the richness and accuracy of colors in digital content. This metric is quantified in bits per channel, effectively dictating the number of distinct colors that can be accurately represented for each channel. Common bit depths include 8-bit, 10-bit, and 12-bit per channel. Higher bit depths allow for smoother color transitions and reduce color banding, making them crucial in applications like photography, video editing, and other video solutions. However, it’s important to note that higher color depths lead to larger file sizes due to the storage of more color information.
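The arithmetic behind bit depth is straightforward; this short sketch works out the per-channel levels and the total number of representable colors for the common depths mentioned above:

```python
for bits in (8, 10, 12):
    levels = 2 ** bits          # distinct levels per channel
    colors = levels ** 3        # three channels (R, G, B)
    print(f"{bits}-bit: {levels} levels/channel, {colors:,} colors")

# 8-bit : 256 levels/channel,   16,777,216 colors
# 10-bit: 1,024 levels/channel, 1,073,741,824 colors
# 12-bit: 4,096 levels/channel, 68,719,476,736 colors
```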

Possible colors with SDR and HDR Bit-depth

Color-space: Color-space is a pivotal concept in image processing, display technology, and video solutions. It defines a specific set of colors that can be accurately represented and manipulated within a digital system. This ensures consistency and accuracy in how colors are displayed, recorded, and interpreted across different devices and platforms. Technically, it describes how an array of pixel values should be displayed on a screen, including information about pixel value storage within a file, the range, and the meaning of those values. Color spaces are essential for faithfully reproducing a wide range of colors, from the deep blues to the rich red colors, resulting in visuals that are more vibrant and truer to life. A color space is akin to a palette of colors available for use and is defined by a range of color primaries represented as points within a three-dimensional color space diagram. These color primaries determine the spectrum of colors that can be created within that color space. A broader color gamut includes a wider range of colors, while a narrower one offers a more limited selection. Various color spaces are standardized to ensure compatibility across different devices and platforms.

CIE Chromaticity Diagram representing Rec.709 vs Rec.2020

Dynamic range: Dynamic range relates to the contrast between the highest and lowest values that a specific quantity can include. This idea is commonly applied to signals such as sound and light. In the context of images, dynamic range determines how the brightest and darkest elements appear within a picture and the extent to which a camera or film can handle varying levels of light. Furthermore, it greatly affects how different aspects appear in a developed photograph, impacting the interplay between brightness and darkness. Imagine dynamic range as a scale, stretching from the soft glow of a candlelit room to the brilliance of a sunlit day. In simpler terms, dynamic range allows us to notice fine details in shadows and the brilliance of well-lit scenes in both videos and images.

Dynamic Range supported by SDR and HDR Displays

Difference between HDR and SDR

Aspect by aspect, the two standards compare as follows:

  • Luminance: HDR offers a broader luminance range, resulting in brighter highlights and deeper blacks for more lifelike visuals. SDR’s limited luminance range can lead to less dazzling bright areas and shallower dark scenes.
  • Color depth: HDR provides a 10-bit color depth per channel, allowing finer color gradations and smoother transitions between colors. SDR offers a lower color depth, resulting in fewer color gradations and potential color banding.
  • Color space: HDR incorporates a wider color gamut such as BT.2020, reproducing more vivid and lifelike colors. SDR typically uses the narrower BT.709 color space, offering a more limited color range.
  • Transfer function: HDR utilizes the perceptual quantizer (PQ) as its transfer function, accurately representing luminance levels from 0.0001 up to 10,000 nits (cd/m²). SDR relies on a gamma curve, which cannot accurately represent such extreme luminance levels.
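To make the transfer-function row concrete, the following sketch evaluates the SMPTE ST 2084 PQ EOTF using its published constants, mapping a normalized PQ code value to absolute luminance in nits:

```python
def pq_eotf(signal):
    """Map a normalized PQ code value (0..1) to absolute luminance in nits,
    using the SMPTE ST 2084 constants."""
    m1 = 2610 / 16384          # 0.1593017578125
    m2 = 2523 / 4096 * 128     # 78.84375
    c1 = 3424 / 4096           # 0.8359375
    c2 = 2413 / 4096 * 32      # 18.8515625
    c3 = 2392 / 4096 * 32      # 18.6875
    e = signal ** (1 / m2)
    return 10000 * (max(e - c1, 0) / (c2 - c3 * e)) ** (1 / m1)

print(pq_eotf(1.0))   # 10000 nits (peak of the PQ range)
print(pq_eotf(0.5))   # roughly 92 nits
```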

Metadata in HDR10

HDR10 utilizes the PQ EOTF, the BT.2020 wide color gamut, and static metadata defined by ST 2086 plus MaxFALL and MaxCLL. The HDR10 metadata structure follows the ITU Series H Supplement 18 standard for HDR and Wide Color Gamut (WCG). There are three HDR10-related Video Usability Information (VUI) parameters: color primaries, transfer characteristics, and matrix coefficients. This VUI metadata is contained in the Sequence Parameter Set (SPS) of the intra-coded frames.

In addition to the VUI parameters, there are two HDR10-related Supplemental Enhancement Information (SEI) messages: Mastering Display Color Volume (MDCV) and Content Light Level (CLL).

  • Mastering Display Color Volume (MDCV):
    MDCV, or “Mastering Display Color Volume”, is an important piece of metadata within the HDR10 standard, defined by SMPTE ST 2086. It describes the characteristics of the display on which the content was mastered (its color primaries, white point, and minimum and maximum luminance), so that HDR-compatible screens can map the content optimally to their own capabilities.
  • Max Content Light Level (MaxCLL):
    MaxCLL specifies the maximum brightness level in nits (cd/m²) for any individual frame or scene within the content. It helps your display adjust its settings for specific, exceptionally bright moments.
  • Max Frame-Average Light Level (MaxFALL):
    MaxFALL indicates the maximum frame-average brightness level in nits across the entire content, including the brightest frames. It complements MaxCLL, preventing excessive dimming or over-brightening and helping your display reproduce the content’s overall brightness consistently for an immersive viewing experience.
  • Transfer function (EOTF – electro-optical transfer function):
    The EOTF, based on the ST 2084 PQ curve in HDR10, defines how luminance values are encoded in the content and decoded by your display, ensuring that brightness levels are presented accurately on your screen.

Sample HDR10 metadata parsed using ffprobe:
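A probe along the following lines produces that kind of output. The file name is hypothetical, and the exact side-data field names can vary between FFmpeg versions, so treat this as an illustrative sketch rather than a canonical command:

```python
import json
import subprocess

# Hypothetical input; any HEVC/HDR10 stream with static metadata will do.
SRC = "input.mkv"

# Ask ffprobe for the first video frame plus its side data (mastering display,
# content light level) as JSON.
cmd = [
    "ffprobe", "-v", "quiet", "-print_format", "json",
    "-select_streams", "v:0",
    "-show_frames",
    "-read_intervals", "%+#1",          # only the first frame is needed
    "-show_entries",
    "frame=color_space,color_primaries,color_transfer,side_data_list",
    SRC,
]
probe = json.loads(subprocess.run(cmd, capture_output=True, text=True).stdout)

frame = probe["frames"][0]
print("Primaries:", frame.get("color_primaries"))   # expect bt2020 for HDR10
print("Transfer :", frame.get("color_transfer"))    # expect smpte2084 (PQ)
for side_data in frame.get("side_data_list", []):
    print(side_data.get("side_data_type"), side_data)
```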

Future of HDR10 & competing HDR formats

The effectiveness of HDR10 implementation is closely tied to the quality of the TV used for viewing. When applied correctly, HDR10 enhances the visual appeal of video content. However, there are other HDR formats gaining popularity, such as HDR10+, HLG (Hybrid Log-Gamma), and Dolby Vision. These formats have gained prominence due to their ability to further enhance the visual quality of videos.

Competing HDR formats, like Dolby Vision and HDR10+, are gaining popularity due to their utilization of dynamic metadata. Unlike HDR10, which relies on static metadata for the entire content, these formats adjust brightness and color information on a scene-by-scene or even frame-by-frame basis. This dynamic metadata approach delivers heightened precision and optimization for each scene, ultimately enhancing the viewing experience. The rivalry among HDR formats is fueling innovation in the HDR landscape as each format strives to surpass the other in terms of visual quality and compatibility. This ongoing competition may lead to the emergence of new technologies and standards, further expanding the possibilities of what HDR can achieve.

To sum it up, HDR10 isn’t just a buzzword, it’s a revolution in how we experience visuals in any multimedia solution. It’s the technology that takes your screen from good to mind-blowingly fantastic. HDR10 is very popular because there are no licensing fees (unlike some other HDR standards), it is widely adopted by many companies, and there is already a lot of compatible equipment out there. So, whether you’re a movie buff, gamer, or just someone who appreciates the beauty of visuals, HDR10 is your backstage pass to a world of incredible imagery.
With continuous advancements in technology, we at Softnautics, a MosChip Company, help businesses across various industries to provide intelligent media solutions involving the simplest to the most complex multimedia technologies. We have hands-on experience in designing high-performance media applications, architecting complete video pipelines, audio/video codecs engineering, audio/video driver development, and multimedia framework integration. Our multimedia engineering services extend across industries including media and entertainment, automotive, gaming, consumer electronics, security, and surveillance.

Read our success stories related to intelligent media solutions to know more about our multimedia engineering services.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.



Audio Validation in Multimedia Systems and its Parameters

In the massive world of multimedia, sound stands as a vital component that adds depth to the overall experience. Whether it’s streaming services, video games, or virtual reality, sound holds a crucial role in crafting immersive and captivating content. Nevertheless, ensuring top-notch audio quality comes with its own set of challenges. This is where audio validation enters the scene. Audio validation involves a series of comprehensive tests and assessments to guarantee that the sound in multimedia systems matches the desired standards of accuracy and quality. Market Research Future predicts that the consumer audio market will expand from a valuation of USD 82.1 billion in 2023 to approximately USD 274.8 billion by 2032. This growth trajectory indicates a Compound Annual Growth Rate (CAGR) of about 16.30% within the forecast duration of 2023 to 2032.

What is Audio?

Audio is an encompassing term that refers to any sound or noise perceptible to the human ear, arising from vibrations or waves at frequencies within the range of 20 to 20,000 Hz. These frequencies form the canvas upon which the symphony of life is painted, encompassing the gentlest whispers to the most vibrant melodies, weaving a sonic tapestry that enriches our auditory experiences and connects us to the vibrancy of our surroundings.

There are two types of audio.

  • Analog audio
    Analog audio refers to the representation and transmission of sound as continuous, fluctuating electrical voltage or current signals. These signals directly mirror the variations in air pressure caused by sound waves, translating the “analogous” nature of sound into electrical form. When recorded in an analog format, what is heard corresponds directly to what is stored: the continuous waveform is maintained, although its amplitude, measured in decibels (dB), can differ.
  • Digital audio
    Digital audio is at the core of modern audio solutions. It represents sound in a digital format, allowing audio signals to be transformed into numerical data that can be stored, processed, and transmitted by computers and digital devices. Unlike analog audio, which directly records sound wave fluctuations, digital audio relies on a process known as analog-to-digital conversion (ADC) to convert the continuous analog waveform into discrete values. These values, or samples, are then stored as binary data, enabling precise reproduction and manipulation of sound. Overall, digital audio offers advantages such as ease of storage, replication, and manipulation, making it the foundation of modern communication systems and multimedia technology.

Basic measurable parameters of audio

Frequency

Frequency, a fundamental concept in sound, measures the number of waves passing a fixed point in a specific unit of time. Typically measured in Hertz (Hz), it represents the rhythm of sound. Mathematically, frequency (f) is inversely proportional to the time period (T) of one wave, expressed as f = 1/T.

Sample rate

Sample rate refers to the number of digital samples captured per second to represent an audio waveform. Measured in Hz, it dictates the accuracy of audio reproduction. For instance, a sample rate of 44.1kHz means that 44,100 samples are taken each second, enabling the digital representation of the original sound. Different audio sample rates are 8kHz, 16kHz, 24kHz, 48kHz, etc.

Bit depth or word size

Bit depth, also known as word size, signifies the number of bits present in each audio sample. This parameter determines the precision with which audio is represented digitally. A higher bit depth allows for a finer representation of the sound’s amplitude and nuances. Common options include 8-bit, 16-bit, 24-bit, and 32-bit depths.

Decibels (dB)

Decibels (dB) are logarithmic units employed to measure the intensity of sound or the ratio of a sound’s power to a reference level. This unit allows us to express the dynamic range of sound, spanning from the faintest whispers to the loudest roars.

Amplitude

Amplitude relates to the magnitude or level of a signal. In the context of audio, it directly affects the volume of sound. A higher amplitude translates to a louder sound, while a lower amplitude yields a softer tone. Amplitude shapes the auditory experience, from delicate harmonies to thunderous crescendos.

Root mean square (RMS) power

RMS power is a vital metric that measures amplitude in terms of its equivalent average power content, regardless of the shape of the waveform. It helps to quantify the energy carried by an audio signal and is particularly useful for comparing signals with varying amplitudes.
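As a small illustration (assuming the samples sit in a NumPy array normalized to the -1.0 to 1.0 range), RMS level and its dBFS equivalent can be computed as follows:

```python
import numpy as np

def rms_dbfs(samples):
    """Return (RMS level, level in dB relative to full scale) for
    floating-point samples normalized to the -1.0..1.0 range."""
    rms = np.sqrt(np.mean(np.square(samples)))
    dbfs = 20 * np.log10(rms) if rms > 0 else float("-inf")
    return rms, dbfs

# A full-scale 1 kHz sine tone sampled at 48 kHz:
t = np.arange(48000) / 48000
sine = np.sin(2 * np.pi * 1000 * t)
print(rms_dbfs(sine))   # RMS about 0.707, level about -3.01 dBFS
```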

General terms used in an audio test

Silence

It denotes the complete absence of any audible sound. It is characterized by a flat line on the waveform, signifying zero amplitude. This void of sound serves as a stark contrast to the rich tapestry of auditory experiences.

Echo

An echo is an auditory effect that involves the repetitive playback of a selected audio, each iteration softer than the previous one. This phenomenon is achieved by introducing a fixed delay time between each repetition. The absence of pauses between echoes creates a captivating reverberation effect, commonly encountered in natural environments and digital audio manipulation.

Clipping

Clipping is a form of distortion that emerges when audio exceeds its dynamic range, often due to excessive loudness. When waveforms surpass the 0 dB limit, their peaks are flattened at this ceiling. This abrupt truncation not only results in a characteristic flat top but also alters the waveform’s frequency content, potentially introducing unintended harmonics.

DC-Offset

It is an alteration of a signal’s baseline from its zero point. In the waveform view, this shift is observed as the signal not being centred on the 0.0 horizontal line. This offset can lead to distortion and affect subsequent processing stages, warranting careful consideration in audio manipulation.
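A basic validation pass for the two artifacts above, clipping and DC offset, can be sketched as follows; the samples are again assumed to be normalized floating-point values, and the thresholds are illustrative rather than standardized:

```python
import numpy as np

def basic_waveform_checks(samples, clip_level=0.999):
    """Report the fraction of samples at or above the clipping ceiling and the
    mean value of the waveform (a non-zero mean indicates a DC offset)."""
    clipped_ratio = float(np.mean(np.abs(samples) >= clip_level))
    dc_offset = float(np.mean(samples))
    return {"clipped_ratio": clipped_ratio, "dc_offset": dc_offset}

# Example: a sine tone pushed 20% past full scale and shifted up by 0.1.
t = np.arange(48000) / 48000
bad = np.clip(1.2 * np.sin(2 * np.pi * 440 * t) + 0.1, -1.0, 1.0)
print(basic_waveform_checks(bad))   # noticeable clipped_ratio and dc_offset
```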

Gain

Gain signifies the ratio of output signal power to input signal power. It quantifies how much a signal is amplified, contributing to variations in its amplitude. Expressed in decibels (dB), positive gain amplifies the signal’s intensity, while negative gain reduces it, influencing the overall loudness and dynamics.

Harmonics

Harmonics are spectral components that occur at exact integer multiples of a fundamental frequency. These multiples contribute to the timbre and character of a sound, giving rise to musical richness and complexity. The interplay of harmonics forms the basis of musical instruments’ distinct voices.

Frequency response

Frequency response offers a visual depiction of how accurately an audio component reproduces sound across the audible frequency range. Represented as a line graph, it showcases the device’s output amplitude (in dB) against frequency (in Hz). This curve provides insights into how well the device captures the intricate nuances of sound.

Amplitude response

It measures the gain or loss of a signal as it traverses an audio system. This measure is depicted on the frequency response curve, showcasing the signal’s level in decibels (dB). The amplitude response unveils the system’s ability to faithfully transmit sound without distortion or alteration.

Types of testing performed for audio

Signal-to-noise ratio (SNR)

Testing the Signal-to-noise ratio (SNR) is a fundamental step in audio validation. It assesses the differentiation between the desired audio signal and the surrounding background noise, serving as a crucial metric in audio quality evaluation. SNR quantifies audio fidelity and clarity by calculating the ratio of signal power to noise power, typically expressed in decibels (dB). Higher SNR values signify a cleaner and more comprehensible auditory experience, indicating that the desired signal stands out prominently from the background noise. This vital audio parameter can be tested using specialized equipment (like audio analyzers and analog-to-digital converters) and software tools, ensuring that audio systems deliver optimal clarity and quality.
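When the clean reference signal and the residual noise can be separated (for example, by subtracting the reference from a device's recorded output), SNR reduces to a simple power ratio; a minimal sketch:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from separated signal and noise arrays."""
    signal_power = np.mean(np.square(signal))
    noise_power = np.mean(np.square(noise))
    return 10 * np.log10(signal_power / noise_power)

# Example: a reference tone and additive white noise.
rng = np.random.default_rng(0)
t = np.arange(48000) / 48000
reference = np.sin(2 * np.pi * 1000 * t)
noise = 0.01 * rng.standard_normal(t.size)
print(f"SNR = {snr_db(reference, noise):.1f} dB")   # roughly 37 dB here
```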

Latency

Latency, the time delay between initiating an audio input and its corresponding output, plays a pivotal role in audio applications where synchronization and responsiveness are critical, like live performances or interactive systems. Achieving minimal latency, often measured in milliseconds, is paramount for ensuring harmony between user actions and audio responses. Rigorous latency testing, employing methods such as hardware and software measurements, round-trip tests, real-time monitoring, and buffer size adjustments, is essential. Additionally, optimizing both software and hardware components for low-latency performance and conducting tests in real-world scenarios are crucial steps. These efforts guarantee that audio responses remain perfectly aligned with user interactions, enhancing the overall experience in various applications.

Audio synchronization

It is a fundamental element that harmonizes the outputs of various audio channels. This test ensures that multiple audio sources, such as those in surround sound setups, are precisely aligned in terms of timing and phase. The goal is to eliminate dissonance or disjointedness among the channels, creating a unified and immersive audio experience. By validating synchronization, audio engineers ensure that listeners are enveloped in a seamless soundscape where every channel works in concert.

Apart from the above tests, audio also needs to be validated according to the algorithm used for audio processing.

Audio test setup

Now let us see the types of audio distortions.

Phase Distortion

Phase distortion is a critical consideration when dealing with audio signals. It measures the phase shift between input and output signals in audio equipment. To control phase distortion, it’s essential to use high-quality components in audio equipment and ensure proper signal routing.

Clipping Distortion

Clipping distortion is another type of distortion that can degrade audio quality. This distortion occurs when the signal exceeds the maximum amplitude that a system can handle. To prevent clipping distortion, it’s important to implement a limiter or compressor in the audio chain. These tools can control signal peaks, preventing them from exceeding the clipping threshold. Additionally, adjusting input levels to ensure they stay within the system’s operational range is crucial for managing and mitigating clipping distortion.

Harmonic Distortion

Harmonic distortion introduces unwanted harmonics into audio signals, which can negatively impact audio quality. These harmonics can be odd or even: odd harmonics occur at odd integer multiples of the fundamental frequency, and even harmonics at even multiples. To mitigate harmonic distortion, it’s advisable to use high-quality amplifiers and speakers that produce less harmonic distortion.

Commonly used test files contain sine tones, sine sweeps, pink noise, and white noise.

There are different tools to create, modify, play, or analyze audio files. Below are a few of them.

Adobe Audition

Adobe Audition is a comprehensive toolset that includes multitrack, waveform, and spectral display for creating, mixing, editing, and restoring audio content.

Audacity

Audacity is a free and open-source digital audio editor and recording application software, available for Windows, macOS, Linux, and other Unix-like operating systems.

There are many devices nowadays involving audio applications, such as headphones, soundbars, speakers, earbuds, and devices with audio processors that run different audio algorithms (e.g., noise cancellation, voice wake, ANC).

At Softnautics, a MosChip company, we understand the importance of audio validation in multimedia systems. Our team of media experts enables businesses to design and develop multimedia systems and solutions involving media infotainment systems, audio/video solutions, media streaming, camera-enabled applications, immersive solutions, and more on diverse architectures and platforms, including multi-core ARM, DSP, GPUs, and FPGAs. Our multimedia engineering services extend across industries including gaming & entertainment, automotive, security, and surveillance.

Read our success stories related to multimedia engineering to know more about our services.

Contact us at business@softnautics.com for any queries related to your media solution or for consultancy.




An Industrial Overview of Open Standards for Embedded Vision and Inferencing

Embedded Vision and Inferencing are two critical technologies for many modern devices such as drones, autonomous cars, and industrial robots. Embedded vision uses computer vision to process images, video, and other visual data, while inferencing is the process of making decisions based on collected data without having to explicitly program each decision step. These workloads run on diverse architectures and platforms, including multi-core ARM, DSPs, GPUs, and FPGAs, which together provide a comprehensive foundation for developing multimedia systems. Open standards are essential to the development of interoperable systems, and they allow for transparency in the development process while providing a level playing field for all participants.

The global market for embedded vision and inferencing was valued at USD 11.22 billion in 2021, with a compound annual growth rate (CAGR) of around 7% from 2022 to 2030. It’s worth noting that the market share for embedded vision and inferencing is likely to continue evolving rapidly as these technologies are becoming increasingly pivotal for next-gen solutions across industries.

Adoption of Open Standards

Open Standards have been widely adopted by the industry, with many companies adopting them as their default. The benefits of open standards are numerous and include:

  • Reduced cost for development and integration
  • Increased interoperability between systems and components
  • Reduced time to market

Open Standards for Embedded Vision

Open standards for embedded vision are a critical component of the Internet of Things (IoT). They provide a common language that allows devices to communicate with each other, regardless of manufacturer or operating system. Open standards for embedded vision include OpenCV, OpenVX, and OpenCL.

OpenCV is a popular open-source computer vision library that includes over 2,500 optimized algorithms for image and video analysis. It is used in applications such as object recognition, facial recognition, and motion tracking. OpenVX is an open-standard API for computer vision that enables performance and power-optimized implementation, reducing development time and cost. It provides a common set of high-level abstractions for applications to access hardware acceleration for vision processing. OpenCL is an open standard for parallel programming across CPUs, GPUs, and other processors that provides a unified programming model for developing high-performance applications. It enables developers to write code that can run on a variety of devices, making it a popular choice for embedded vision applications.
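As a small OpenCV example (assuming the opencv-python package and a camera at index 0), the Haar cascade bundled with the library can run simple face detection on a live stream:

```python
import cv2

# Load the face detector that ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

cap = cv2.VideoCapture(0)            # first attached camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```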

Open Standards for Embedded Vision & Inferencing

Open Standards for Inferencing

Open standards for inferencing are also essential for the development of intelligent systems. One of the most important open standards is the Open Neural Network Exchange (ONNX), which describes how deep learning models can be exported and imported between frameworks. It is currently supported by major frameworks like TensorFlow, PyTorch, and Caffe2.

ONNX enables interoperability between deep learning frameworks and inference engines, which is critical for the development of intelligent systems that can make decisions based on collected data. It provides a common format for representing deep learning models, making it easier for developers to build and deploy models across different platforms and devices.
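A minimal sketch of the round trip that ONNX enables, assuming PyTorch and onnxruntime are installed; the tiny model here is purely illustrative:

```python
import torch
import onnxruntime as ort

# A toy model standing in for a trained network.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
)
model.eval()

# Export to the framework-neutral ONNX format.
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, "tiny.onnx",
                  input_names=["input"], output_names=["output"])

# Any ONNX-capable inference engine can now run it; onnxruntime as an example.
session = ort.InferenceSession("tiny.onnx")
result = session.run(None, {"input": dummy.numpy()})
print(result[0])   # same scores the PyTorch model would produce
```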

Another important open standard for inferencing is the Neural Network Exchange Format (NNEF), which enables interoperability between deep learning frameworks and inference engines. It provides a common format for deploying and executing trained neural networks on various devices. It allows developers to build models using their framework of choice and deploy them to a variety of devices, making it easier to build intelligent systems that can make decisions based on collected data.

Future of Open Standards

The future of open standards for embedded vision and inferencing is bright, but there are challenges ahead. One of the biggest challenges is limited support for open-source software on embedded systems, meaning many open-source libraries and frameworks cannot run on devices with constrained memory and processing power. Another challenge is the wide variety of processors and operating systems available, making it difficult to create a standard that works across all devices. However, initiatives such as hardware innovation and algorithm optimization are underway to address these challenges.

Industry Impact of Open Standards

Open standards have a significant impact on the industry and consumers. The use of open standards has enabled an ecosystem to develop around machine learning and deep learning algorithms, which is essential for innovation in this space. Open-source software has been instrumental in accelerating the adoption of AI across industries, including automotive, consumer, financial services, manufacturing, and healthcare.

Open standards also have a direct impact on consumers by lowering costs, increasing interoperability, and improving security across devices. For example, standards enable companies to build products using fewer components than proprietary solutions require, reducing costs for manufacturers and end-users who purchase products with embedded vision technology built-in (e.g., cameras).

Open Standards are the key to unlocking the full potential of embedded vision and inferencing. They allow developers to focus on their applications, rather than on the underlying hardware or software platforms. Open standards also provide a level playing field for all types of companies, from startups to large enterprises, to compete in this growing market. Overall, they are crucial for designing next-gen solutions across various industries.

At Softnautics, we help businesses across various industries to design embedded multimedia solutions based on next-gen technologies like computer vision, deep learning, cognitive computing, and more. We also have hands-on experience in designing high-performance media applications, architecting complete video pipelines, audio/video codecs engineering, application porting, and ML model design, training, optimization, testing, and deployment.

Read our success stories related to intelligent media solutions to know more about our multimedia engineering services.

Contact us at business@softnautics.com for your solution design or consultancy.




Applications and Opportunities for Video Analytics

In recent years, video analytics-based solutions have emerged as a high-end technology that has changed the way we interpret and analyze video data. Video analytics uses advanced algorithms and artificial intelligence (AI) to track behaviour and understand data in real-time, allowing necessary actions to be automated. This technology has found many applications across different industries, providing valuable insights, enhancing security, improving safety, and optimizing operations. According to the Verified Market Research group, video analytics is experiencing rapid market growth, with its global market value expected to reach USD 35.88 billion by 2030, representing a CAGR of 21.5% from its valuation of USD 5.65 billion in 2021. This growth trend highlights the increasing demand for video analytics solutions as organizations seek to enhance their security and surveillance systems.

Video analytics is closely related to video processing which is an essential part of any multimedia solution, as it involves the extraction of meaningful insights and information from video data using various computational techniques. Video analytics leverages the capabilities of video processing to analyze and interpret video content, enabling computers to automatically detect, track, and understand objects, events, and patterns within video streams. Video processing techniques are used in a wide range of applications, including surveillance, video streaming, multimedia communication, autonomous vehicles, medical imaging, entertainment, virtual reality, and many more.

In this article, we will see some Industrial applications and use cases of video analytics in different areas.

Industrial Applications of Video Analytics
Automotive
One of the most widespread uses of video analytics is in the automotive industry, for Advanced Driver Assistance Systems (ADAS) in highly automated vehicles (HAVs). HAVs use multiple cameras to identify pedestrians, traffic signals, other vehicles, lanes, and other indicators; the cameras are integrated with the ECU and programmed to identify the real-time situation and respond accordingly. Automating this process requires the integration of various systems on a chip (SoCs). These chipsets connect the sensors to the actuators through interfaces and ECUs. The data is analyzed with deep learning-based machine learning (ML) models that use neural networks to learn patterns in data. Neural networks are structured with layers of interconnected processing nodes, typically comprising multiple layers. These deep learning algorithms are used to detect and track objects in real-time video, as well as to recognize specific actions.

Sports
In the sports industry, video analytics is being utilized by coaches, personal trainers, and professional athletes to optimize performance through data-driven insights. In sports such as rugby and soccer, tracking metrics like ball possession and the number of passes has become a standard practice for understanding game patterns and team performance. Detailed research on soccer has shown that ball possession can even impact the outcome of a match. Video analytics can be used to gain insights into the playing style, strategies, passing patterns, and weaknesses of the opponent team, enabling a better understanding of their gameplay.

Video Analytics Applications

Retail
Intelligent video analytics is a valuable tool for retailers to monitor storefront events and promptly respond to improve the customer experience. Real-time video is captured by cameras, which cover areas such as shelf inventory, curbside pickup, and cashier queues. On-site IoT Edge devices analyze the video data in real-time to detect key metrics, such as the number of people in checkout queues, empty shelf space, or cars in the parking lot.

By analyzing these metrics, anomalous events can be anticipated and avoided, alerting store managers or stock supervisors to take corrective actions. Additionally, video clips or events can be stored in the cloud for long-term trend analysis, providing valuable insights for future decision-making.

Health Care
Video analytics has emerged as a transformative technology in the field of healthcare, offering significant benefits in patient care and operational efficiency. By utilizing cutting-edge machine learning algorithms and computer vision, these systems can analyze video data in real-time to automatically detect and interpret signs of various medical conditions in the human body. They can also be leveraged for patient monitoring, detecting emergencies, identifying wandering behaviour in dementia patients, and analyzing crowd behaviour in waiting areas. These capabilities enable healthcare providers to proactively address potential issues, optimize resource allocation, and enhance patient safety, leading to improved patient outcomes and a higher quality of care. With ongoing advancements in technology, video analytics is poised to play a crucial role in shaping the future of healthcare, making it more intelligent, efficient, and patient-centric.

To summarize, video analytics is a rapidly growing field that leverages various technologies such as computer vision, deep learning, image and video processing, motion detection and tracking, and data analysis to extract valuable insights. Video analytics has found applications in diverse domains, including security and surveillance, healthcare, automotive, sports, and others. By automating the analysis of video data, video analytics enables organizations to efficiently process large amounts of visual information, identify patterns and behaviours, and make data-driven decisions in a more effective and less expensive way.

With continuous advancements in technology, we at Softnautics help businesses across various industries to provide intelligent media solutions involving the simplest to the most complex multimedia technologies. We have hands-on experience in designing high-performance media applications, architecting complete video pipelines, audio/video codecs engineering, application porting, and ML model design, optimization, testing, and deployment.

We hope you enjoyed this article and got a better understanding of how video analytics-based intelligent solutions can be implemented for various businesses to automate processes, improve efficiency and accuracy, and make better decisions.

Read our success stories related to intelligent media solutions to know more about our multimedia engineering services.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.



Multimedia Intelligence: Confluence of Multimedia and Artificial Intelligence

In contrast to traditional mass media, such as printed material or audio recordings, which feature little to no interaction between users, multimedia is a form of communication that combines different content forms such as audio, text, animations, images, or video into a single interactive presentation. This definition now seems outdated because, by 2022, multimedia had exploded with far more complex forms of interaction. Alexa, Google Assistant, Twitter, Snapchat, Instagram Reels, and many more such apps are becoming a daily part of the common man’s life. Such an explosion of multimedia and the rising need for artificial intelligence are bound to collide, and that is where multimedia intelligence comes into the picture. The multimedia market is being driven forward by the increasing popularity of virtual creation in the media and entertainment industries, as well as its ability to create high-definition graphics and real-time virtual worlds. The growth is such that between 2022 and 2030, the global market for AI in media & entertainment is anticipated to expand at a 26.9% CAGR and reach about USD 99.48 billion, as per Grand View Research, Inc. reports.

What is multimedia intelligence?

The rise and consumption of ever-emerging multimedia applications and services are churning out so much data, giving rise to conducting research and analysis on it. We are seeing great forms of multimedia research already like image/video content analysis, video or image search, recommendations, multimedia streaming, etc. Also, on the other hand, Artificial Intelligence is evolving at a faster pace, making it the perfect time for tapping content-rich multimedia for more intelligent applications.
Multimedia intelligence refers to the eco-system created when we apply artificial intelligence to multimedia data. This eco-system is a two-way give-and-take relationship. In the first relation, we see how multimedia can boost research in artificial intelligence, enabling the evolution of algorithms and pushing AI toward achieving human-level perception and understanding. In the second relation, we see how artificial intelligence can make multimedia data more inferable and reliable by providing its ability to reason. For example, on-demand video streaming applications use AI algorithms to analyse user demographics and behaviour and recommend content that users enjoy streaming or watching. As a result, these AI-powered platforms focus on providing users with content tailored to their specific interests, resulting in a truly customized experience. Thus, multimedia intelligence is a closed cyclic loop between multimedia and AI, where they mutually influence and enhance each other.

Evolution and significance
The evolution of multimedia should be credited to the evolution of smartphones. Video calling through applications like Skype and WhatsApp truly marked that multimedia was here to dominate. This was a significant move because it completely revolutionized long-distance communication. Multimedia has since evolved further into even more complex applications, such as streaming platforms like Discord and Twitch. Then AR/VR technology took it a step further by integrating motion sensing and geo-sensing with audio and video.
Multimedia contains multimodal and heterogeneous data like images, audio, video, text, etc. together. Multimedia data has become very complex, and this complexity will only grow. Ordinary algorithms are not capable enough to correlate and derive insights from such data, and this is still an active area of research; even for AI algorithms it is a challenge to connect and establish relationships between different modalities of the data.

Difference between media intelligence and multimedia intelligence
There is a significant difference between media and multimedia intelligence. Text, drawings, visuals, pictures, film, video, wireless, audio, motion graphics, web, and so on are all examples of media. Simply put, multimedia is the combination of two or more types of media to convey information. So, to date, when we talk about media intelligence, we are already seeing applications that exhibit it. Voice bots like Alexa and Google Assistant are audio intelligent, chatbots are text intelligent, and drones that recognize and follow hand gestures are video intelligent. There are very few multimedia-intelligent applications; to name one, EMO, an AI desktop robot, utilizes multimedia for all its interactions.

Industrial landscape for multimedia intelligence
Multimedia is closely tied to the media and entertainment industry. Artificial Intelligence enhances and influences everything in multimedia.

Landscape for Multimedia Intelligence

Let’s walk through each stage and see how artificial intelligence is impacting them:

Media devices
The media devices that have increasingly become coherent with artificial intelligence applications are cameras and microphones. Smart cameras are no longer limited to capturing images and videos; they increasingly do more, such as detecting objects, tracking items, and applying various face filters. All of this is driven by AI algorithms and comes as part of the camera itself. Microphones are also getting smarter, with AI algorithms performing active noise cancellation and filtering out ambient sounds. Wake words are the new norm; thanks to applications like Alexa and Siri, next-gen microphones have in-built wake-word or key-phrase recognition AI models.

Image/Audio coding and compression
Autoencoders consist of two components, an encoder and a decoder, and are self-supervised machine learning models that learn to recreate their input data from a compressed representation, thereby reducing its size. They are trained like supervised models but with the input itself as the target, so no external labels are needed, hence the name self-supervised. Autoencoders can be used for image denoising, image compression, and, in some cases, even the generation of image data. This is not limited to images; autoencoders can be applied to audio data for the same requirements.
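To ground the description, here is a minimal, illustrative autoencoder in PyTorch: a toy fully-connected version trained on random stand-in data, whereas real image or audio codecs would use convolutional or sequence layers and far more training data:

```python
import torch
from torch import nn

class TinyAutoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder squeezes the input down to a small latent code...
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        # ...and the decoder reconstructs the input from that code.
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

batch = torch.rand(64, 784)            # stand-in for flattened image data
for _ in range(100):                   # the input itself is the training target
    recon = model(batch)
    loss = loss_fn(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```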
GANs (Generative Adversarial Networks) are again revolutionary deep neural networks that have made it possible to generate images from text. OpenAI’s DALL-E project can generate images from textual descriptions. GFP-GAN (Generative Facial Prior GAN) is another project that can correct and re-create degraded images. AI has shown quite promising results and has proven the feasibility of deep learning-based image/audio encoding and compression.

Audio / Video distribution
Video streaming platforms like Netflix and Disney+ Hotstar extensively use AI for improving their content delivery across a global set of users. AI algorithms dominate personalization and recommendation services for both platforms. AI algorithms are also used to generate video metadata for improving search on their platforms. Predicting content delivery and caching appropriate video content geographically is a challenging task that has been simplified to a good extent by AI algorithms. AI has proven its potential to be a game-changer for the streaming industry by offering effective ways to encode, distribute, and organize data. Not just for video streaming platforms, but also for game streaming platforms like Discord and Twitch and communication platforms like Zoom and Webex, AI will become an integrated part of AV distribution.

Categorization of content
On the internet, data is created in a wide range of formats within just a few seconds. Putting content into categories and organizing it can be a huge task. Artificial intelligence (AI) steps in to help with the successful classification of information into relevant categories, enabling users to find their preferred topic of interest faster, improving customer engagement, creating more enticing and effective targeted content, and boosting revenue.

Regulating and identifying fake content
Several websites generate and spread fake news in addition to legitimate news stories to enrage the public about events or societal issues. AI is assisting with the discovery and management of such content, as well as with its moderation or deletion before distribution on internet platforms like social media sites. All platforms, including Facebook, LinkedIn, Twitter, and Instagram, employ powerful AI algorithms in most of their features. Targeted ad services, recommendation services, job recommendations, fraudulent profile detection, harmful content detection, and more all have AI in them.

We have tried to cover how multimedia and artificial intelligence are interrelated and how they are impacting various industries. Still, this is a broad research topic: multimedia intelligence is in its early stages, where AI algorithms still learn from a single medium and we build other algorithms to correlate them. There is still scope for the evolution of AI algorithms that understand full multimedia data as a whole, the way a human does.

Softnautics has a long history of creating and integrating embedded multimedia and ML software stacks for various global clients. Our multimedia specialists have experience dealing with multimedia devices, smart camera applications, VoD & media streaming, multimedia frameworks, media infotainment systems, and immersive solutions. We work with media firms and domain chipset manufacturers to create multimedia solutions that integrate digital information with physical reality in innovative and creative ways across a wide range of platforms.

Read our success stories related to Machine Learning services around multimedia to know more about our expertise.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.



How Automotive HMI Solutions Enhance the In-Vehicle Experience?

With new-age technologies, customers now have higher expectations from their vehicles than ever before. Many are more concerned with in-car interfaces than with aesthetics or engine power. The majority of drivers desire a vehicle that makes their lives easier and supports their favourite smartphone apps. HMI (Human-Machine Interface) solutions for automobiles are features and components of car hardware and software that enable drivers and passengers to interact with the vehicle and the outside environment. Automotive HMI solutions improve driving experiences by allowing interaction with multi-touch dashboards, voice-enabled vehicle infotainment, control panels, built-in screens, and other features. They turn a vehicle into an ecosystem of interconnected parts that work together to make driving more personalized, adaptive, convenient, safe, and enjoyable. FuSa (ISO 26262) compliant HMIs, powered by embedded sensors and smart systems, enable the vehicle to respond to the driver’s intent and preferences. The global automotive HMI market size is projected to reach $33.59 billion by 2025 with a 9.90% growth rate, as per reports by the Allied Market Research group.

Let us see a few applications of HMI in the automotive industry and how it enhances the driver/passenger experience.

Application & Benefits of HMI Solutions

Digital Instrument Clusters

An instrument cluster is seen in every vehicle. It is a board that houses a variety of gauges and indicators and sits in the dashboard, right behind the steering wheel. To keep track of the vehicle’s status, the driver relies on these gauges and indicators. The digital instrument cluster gives access to the full electronics features of the modern car cockpit. With the help of digital clusters, vehicle driving information, such as speed, fuel or charge level, trip distance calculation, and route planning graphics, is combined with comfort information, such as outside temperature, the clock, and air vent control. In addition, these digital clusters connect with the vehicle’s entertainment system to control multimedia, browse a phone book, make a call, and choose a navigation destination. For instance, the tachometer indicates how fast the engine is turning.

Heads Up Display (HUD)

The Heads Up Display (HUD) is a transparent display fitted on the dashboard of a car that presents important information and data without diverting the driver's attention away from their normal viewing position. Whether it is speed or navigation, everything is available in one place. Because drivers are not forced to search for information inside the vehicle, fatigue is greatly reduced and they can concentrate more on the road.

 


Rear-Seat Entertainment (RSE)

Rear-Seat Entertainment (RSE) is a fast-growing in-car entertainment system that relies heavily on graphics, video, and audio processing. TV, DVD, Internet, digital radio, and other multimedia content sources are all integrated into RSE systems. A rear-seat entertainment system can keep the whole family engaged while travelling. Since the system comes with wireless internet connectivity, passengers can surf the web, manage their playlists, interact with their social media platforms, and access many more services.

Voice-Operated Systems

Modern voice-activated systems enable very natural communication with a vehicle. They can even understand accents and request additional information if necessary. This is made possible by the incorporation of Artificial Intelligence and Machine Learning, as well as general advances in Natural Language Processing and cognitive computing. Apple CarPlay, for example, allows users to navigate, send and receive messages, make phone calls, play music, and listen to podcasts or audiobooks. All of this is controlled by voice command, ensuring a safer atmosphere and allowing the driver to concentrate on the road.

Haptic Technology

Also known as 3D touch, haptic technology gives the user a tactile sensation by applying forces, vibrations, or motions. Haptics can be used when consumers need to touch a screen or operate certain functions. In its most basic form, a haptic system consists of a sensor, such as a touchpad key, that sends an input stimulus signal to a microprocessor. The microprocessor generates a suitable output, which is amplified and transmitted to the actuator, and the actuator then produces the required vibration. Automobiles are also becoming increasingly adept at recognizing their surroundings and reacting appropriately by issuing safety warnings and alarms. Information can be communicated to the driver through vibration alerts rather than intrusive lights or noises. For instance, when an unsignalled lane change is detected, the steering wheel can vibrate to alert the driver, and the seats can vibrate if the driver drifts across lanes too slowly. In 2015, General Motors introduced the Safety Alert Seat under the Chevrolet brand: the car shares collision risk and lane-departure warnings with the driver via haptic feedback in the seat, making it one of the first automobiles to use the sense of touch to communicate with the driver.
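
To make that sensor-to-actuator chain concrete, here is a minimal, purely illustrative Python sketch; the event names, intensities, and pulse patterns are invented for illustration and are not drawn from any production HMI stack.

    # Minimal sketch of a haptic alert chain: a detected event is mapped by the
    # controller logic to a vibration pattern that the actuator driver plays.
    # Event names and patterns are invented for illustration only.
    import time

    # Vibration patterns as (intensity 0-100, duration in seconds) pulses.
    PATTERNS = {
        "lane_departure": [(80, 0.15), (0, 0.1), (80, 0.15)],   # double pulse in the wheel
        "collision_risk": [(100, 0.5)],                         # single strong pulse in the seat
    }

    def drive_actuator(intensity: int, duration: float) -> None:
        # Placeholder for the amplified drive signal sent to the haptic actuator.
        print(f"actuator: {intensity}% for {duration:.2f}s")
        time.sleep(duration)

    def haptic_alert(event: str) -> None:
        for intensity, duration in PATTERNS.get(event, []):
            drive_actuator(intensity, duration)

    haptic_alert("lane_departure")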

In-Car Connected Payments

The concept of connected commerce is gaining popularity and creating opportunities for brands and OEMs. Here, users receive an e-wallet with biometric identity verification that allows them to pay for nearly anything on the go, from tolls and coffee to other billers and creditors. While in-car payments may not appear to be a huge advantage at present, the future of such HMI services may include far more than just parking and takeaway.


Driver Monitoring System

A driver-monitoring system is a sophisticated safety system that uses a camera positioned on the dashboard to detect driver tiredness or distraction and deliver a warning or alert to refocus the driver's attention on the road. If the system detects that the driver is distracted or drowsy, it may issue audible alarms and illuminate a visual signal on the dashboard to grab the driver's attention. If the interior sensors indicate that the driver is distracted while the vehicle's exterior sensors indicate that a collision is imminent, the system can integrate both inputs and automatically apply the brakes.
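
As a purely illustrative sketch of the decision logic described above, the fragment below fuses an interior distraction estimate with an exterior collision-risk estimate; the thresholds, field names, and actions are invented placeholders rather than a production driver-monitoring implementation.

    # Illustrative decision logic for a driver-monitoring system: combine an
    # interior camera's distraction estimate with an exterior collision-risk
    # estimate. Thresholds and sensor interfaces are invented placeholders.
    from dataclasses import dataclass

    @dataclass
    class CabinState:
        eyes_off_road_s: float      # seconds the gaze has been off the road
        drowsiness_score: float     # 0.0 (alert) .. 1.0 (asleep)

    @dataclass
    class RoadState:
        time_to_collision_s: float  # estimated from exterior radar/camera

    def assess(cabin: CabinState, road: RoadState) -> str:
        distracted = cabin.eyes_off_road_s > 2.0 or cabin.drowsiness_score > 0.7
        if distracted and road.time_to_collision_s < 1.5:
            return "AUTO_BRAKE"          # imminent collision with inattentive driver
        if distracted:
            return "AUDIO_VISUAL_ALERT"  # chime plus dashboard indicator
        return "NO_ACTION"

    print(assess(CabinState(3.1, 0.4), RoadState(1.2)))   # -> AUTO_BRAKE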

The interface between vehicle and human has been transformed as we move towards smart, interconnected, and autonomous mobility. Today's HMI solutions not only improve in-vehicle comfort and convenience but also provide personalized experiences. Smart HMI solutions surface the critical information that genuinely needs the driver's attention, reducing distraction and improving vehicle safety. HMI makes information processing and monitoring simple, intuitive, and dependable.

At Softnautics, we help automotive businesses design HMI and infotainment solutions such as gesture recognition, voice recognition, touch recognition, and infotainment sub-menu navigation and selection involving FPGAs, CPUs, and microcontrollers. Our team of experts has experience working with autonomous driving platforms, functions, middleware, and compliances such as adaptive AUTOSAR, FuSa (ISO 26262), and MISRA C. We support our clients throughout the entire journey of intelligent automotive solution design.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.


Software Infrastructure of an Embedded Video Processor Core for Multimedia Solutions

With new-age technologies like the Internet of Things, Machine Learning, and Artificial Intelligence, companies are reimagining and creating intelligent multimedia applications by merging physical reality and digital information in innovative ways. A multimedia solution involves audio/video codecs, image/audio/video processing, edge/cloud applications, and in a few cases AR/VR as well. This blog will discuss the software infrastructure required for an embedded video processor core in a multimedia solution.

The video processor is an RTL-based hardened IP block available in leading FPGAs these days. With this embedded core, users can natively support video conferencing, video streaming, and ML-based image recognition and facial identification applications with low latency and high resource efficiency. However, software-level issues pertaining to OS support, H.264/H.265 processing, driver development, and so forth can come up before the video processor is deployed.

Let us begin with an overview of the video processor and see how such issues can be resolved for semiconductor companies, enabling end-users to reap the product's benefits.

The Embedded Video Processor Core

The video processor is a multi-component solution consisting of the video processing engine itself, a DDR4 block, and a synchronization block. Together, these components support H.264/H.265 encoding and decoding at resolutions up to 4K UHD (3840x2160p60) and, for the top speed grades of the FPGA device family, up to 4096x2160p60. Supported levels and profiles go up to L5.1 High Tier for HEVC and L5.2 for AVC. All three are RTL-based embedded IP blocks deployed in the programmable logic fabric of the targeted FPGA device family and optimized ('hardened') for maximum resource efficiency and performance.

The video processor engine is capable of simultaneously encoding and decoding up to 32 video streams. This is achieved by splitting the 2160p60 bandwidth across the intended channels, each supporting a video stream of up to 480p30 resolution. H.264 decoding is supported for bitstreams up to 960 Mb/s at L5.2 2160p60 High 4:2:2 profile (CAVLC), and H.265 decoding for bitstreams up to 533 Mb/s at L5.1 2160p60 Main 4:2:2 10b Intra profile (CABAC).
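
A rough pixel-rate check helps show how the 2160p60 budget maps onto many lower-resolution channels. The sketch below is illustrative only: the 854x480 frame size is an assumption, and the real stream count also depends on codec profile, bit depth, and memory bandwidth.

    # Rough pixel-rate budget check: do N low-resolution streams fit inside one
    # 3840x2160p60 encode/decode budget? Illustrative only; not a datasheet limit.

    def pixel_rate(width: int, height: int, fps: int) -> int:
        """Pixels processed per second for one stream."""
        return width * height * fps

    BUDGET_2160P60 = pixel_rate(3840, 2160, 60)   # ~497.7 Mpixels/s
    STREAM_480P30 = pixel_rate(854, 480, 30)      # ~12.3 Mpixels/s (assumed 854x480 frame)

    streams = 32
    used = streams * STREAM_480P30
    print(f"Budget: {BUDGET_2160P60:,} px/s")
    print(f"{streams} x 480p30: {used:,} px/s "
          f"({100 * used / BUDGET_2160P60:.1f}% of budget)")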

There is also significant versatility built into the video processor engine. Rate control options include CBR, VBR, and Constant QP. Higher resolutions than 2160p60 are supported at lower frame rates. The engine can handle 8b and 10b color depths along with YCbCr Chroma formats of 4:0:0, 4:2:0, and 4:2:2.

The microarchitecture includes separate encoder and decoder sections, each administered by an embedded 32b synthesizable MCU slaved to the host APU through a single 32b AXI4-Lite interface. Each MCU has its own L1 instruction and data cache supported by a dedicated 32b AXI4 master. Data transfers with system memory run across a 4-channel 128b AXI4 master interface split between the encoder and decoder. There is also an embedded AXI performance monitor that measures bus transactions and latencies directly, eliminating the need for any software overhead beyond the locked firmware of each MCU.

The DDR4 block is a combined memory controller and PHY. The controller portion optimizes read/write transactions with the SDRAM, while the PHY performs SerDes and clock-management tasks. Additional supporting blocks provide initialization and calibration with system memory. Five AXI ports and a 64b SODIMM port offer performance up to 2677 MT/s.

The third block, the synchronization block, coordinates data transactions between the video processor engine's encoder and the DMA. It can buffer up to 256 AXI transactions and ensures low-latency performance.

The company's Integrated Development Environment (IDE) is used to determine the number of video processor cores needed for a given application and the buffer configuration for encoding or decoding, based on the number of bitstreams, the selected codec, and the desired profile. Through the toolchain, users can select the AVC or HEVC codec, I/B/P frame encoding, resolution and level, frames per second, color format and depth, memory usage, and compression/decompression operations. The IDE also provides estimates for bandwidth requirements and power consumption.

Embedded Software Support

The embedded software development support for video processing hardware can be divided into the following general categories:

  1. Video codec validation and functional testing
  2. Linux support, including kernel development, driver development, and application support
  3. Tools & Frameworks development
  4. Reference design development and deployment
  5. Use of and contributions to open-source organizations as needed

Validation of the AVC and HEVC codecs on the video processor is extensive. It must be executed at 3840x2160p60 performance levels for both encoding and decoding, in bare-metal and Linux-supported environments. Low-latency performance is also validated from prototyping through full production.

Linux work focuses on the multimedia frameworks and the kernel layers that need customization. This includes the V4L2 subsystem, the DRM framework, and drivers for the synchronization block to ensure low-latency performance.
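
As an example of the kind of quick sanity check this driver work enables, the sketch below prints the media-controller topology and the capture formats exposed by the V4L2 stack; the device nodes are placeholders, and it assumes the standard v4l-utils tools are installed on the target.

    # Quick sanity check of the V4L2 driver stack on the target board.
    # Assumes the v4l-utils package (media-ctl, v4l2-ctl) is installed;
    # /dev/media0 and /dev/video0 are placeholder device nodes.
    import subprocess

    # Print the media-controller topology (entities, pads, and links).
    subprocess.run(["media-ctl", "-d", "/dev/media0", "-p"], check=True)

    # List the capture formats the video node actually exposes.
    subprocess.run(["v4l2-ctl", "-d", "/dev/video0", "--list-formats-ext"], check=True)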

The codec and Linux work lends itself to a wide variety of reference designs on behalf of the client: edge designs for both encoding and decoding, developments ranging from low-latency video conferencing to 32-channel video streaming, region-of-interest-based encoding, and ML face detection. All of this can be accomplished with a carefully considered selection of open-source tools, frameworks, and capabilities, summarized below:

  1. GStreamer – an open-source multi-OS library of multimedia components that can be assembled pipeline-fashion, following an object-oriented design approach with a plug-in architecture, for multimedia playback, editing, recording, and streaming. It supports the rapid building of multimedia apps and is available under the GNU LGPL license.
    The GStreamer offering also includes a variety of useful tools, including gst-launch (for building and running GStreamer pipelines) and gsttrace (a basic tracer tool).
  2. StreamEye – an open-source tool that provides data and graphical displays for in-depth analysis of video streams.
  3. GstShark – available as an open-source project from RidgeRun, this tool provides benchmarking and tracing capabilities for analysis and debugging of GStreamer multimedia application builds.
  4. FFmpeg and FFprobe – both part of the FFmpeg open-source project, these are hardware-agnostic, multi-OS tools for multimedia software developers. FFmpeg allows users to convert multimedia files between many formats, change sampling rates, and scale video. FFprobe is a basic tool for multimedia stream analysis.
  5. OpenMAX – available through the Khronos Group, this is a library of APIs and signal-processing functions that allows developers to make a multimedia stack portable across hardware platforms.
  6. Yocto – a Linux Foundation open-source collaboration that creates tools (including SDKs and BSPs) and supporting capabilities to develop Linux custom implementations for embedded and IoT apps. The community and its Linux versioning are hardware agnostic.
  7. Libdrm – an open-source set of low-level libraries used to support DRM. The Direct Rendering Manager (DRM) is a Linux kernel subsystem that manages GPU-based video hardware on behalf of user programs. It administers program requests in an arbitration mode through a command queue and manages hardware subsystem resources, in particular memory. The libdrm libraries include functions supporting GPUs from Intel, AMD, and Nvidia.
    Libdrm includes tools such as modetest, for testing the DRM display driver.
  8. Media-ctl – a widely available open-source tool for configuring the media controller pipeline in the Linux v4l2 layer.
  9. PYUV player – another widely available open-source tool that allows users to play uncompressed video streams.
  10. Audacity – a free multi-OS audio editor.

The above tools and frameworks help in designing efficient, high-quality multimedia solutions for video processing, streaming, and conferencing.
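
As a small illustration of how a couple of these tools fit together, the sketch below builds a simple software H.264 encode pipeline with gst-launch-1.0 and then inspects the result with FFprobe. The output file name is hypothetical, and on a board with the hardened codec a hardware encoder element would typically replace the software x264enc used here.

    # Minimal sketch: encode a test pattern to H.264 with a GStreamer pipeline,
    # then inspect the resulting stream with FFprobe. Assumes gst-launch-1.0
    # (with the x264enc plugin) and ffprobe are on PATH; "out.mp4" is a
    # hypothetical output name.
    import subprocess

    encode = [
        "gst-launch-1.0", "-e",
        "videotestsrc", "num-buffers=300",          # 300 frames of test video
        "!", "video/x-raw,width=1280,height=720,framerate=30/1",
        "!", "x264enc", "bitrate=2000",             # software H.264 encode at ~2 Mbps
        "!", "mp4mux",
        "!", "filesink", "location=out.mp4",
    ]
    subprocess.run(encode, check=True)

    # Report the codec, resolution, and frame rate that were actually produced.
    probe = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,width,height,avg_frame_rate",
        "-of", "default=noprint_wrappers=1",
        "out.mp4",
    ]
    subprocess.run(probe, check=True)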

The Softnautics engineering team has a long history of developing and integrating embedded multimedia and ML software stacks for many global clients. The team's skillsets extend to validating designs in hardware with a wide range of system interfaces, including HDMI, SDI, MIPI, PCIe, multi-Gb Ethernet, and more. With hands-on experience in video processing for multi-core SoC-based transcoders, streaming solutions, optimized DSP processing for vision analytics, smart camera applications, multimedia verification and validation, device drivers for video/audio interfaces, and more, Softnautics enables multimedia companies to design and develop connected multimedia solutions.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.


Designing Cloud Based Multimedia Solutions

Whether private, public, or hybrid, cloud solutions are designed to give a business the freedom to grow while keeping organizational and customer data secure. For cloud-based multimedia solutions, a cloud-hosted custom transcoder IP can support automated Video-On-Demand (VOD) pipelines. Cloud services offer solutions that ingest source videos, process them for playback on a wide range of devices using a cloud media converter, and store transcoded media files for on-demand delivery to end-users.

Integrating a custom IP alongside other cloud services demonstrates the feasibility of using an open-source codec in one's own transcoder instead of the cloud provider's media-converter service. In this blog, we will see how an open-source codec like AV1 can be selected as the custom encoding IP and integrated as a cloud service.

With this approach, video files uploaded to the cloud are encoded with the AV1 codec without using the cloud media-converter service. The solution is automated so that the content provider only needs to upload a video to the cloud's input file storage; encoding then happens automatically, the result is stored in cloud storage on completion, and the end-user is notified that the content is available.

Module usage

A local PC can be used to upload the input video to the target AWS S3 bucket, and an EC2 instance transcodes the input video into AV1 output. Encoding can be done with FFmpeg as well as GStreamer; here FFmpeg is used, considering its strong support community and the extra features available. The EC2 cloud instance can be any Linux-based server. Further, the S3 output file link is integrated into AWS Sumerian so the content can be viewed with a VR headset in 3D scene mode.
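
For illustration, the EC2-side transcode step might look like the minimal sketch below; the file names are hypothetical and it assumes an FFmpeg build that includes the libaom-av1 encoder.

    # Minimal sketch of the EC2-side transcode step: encode an uploaded MP4
    # to AV1 with FFmpeg's libaom-av1 encoder. File names are hypothetical;
    # assumes FFmpeg was built with libaom support.
    import subprocess

    def encode_to_av1(src: str, dst: str) -> None:
        cmd = [
            "ffmpeg", "-y",
            "-i", src,
            "-c:v", "libaom-av1",
            "-crf", "30", "-b:v", "0",   # constant-quality mode
            "-cpu-used", "4",            # speed/quality trade-off
            "-c:a", "copy",              # leave the audio track untouched
            dst,
        ]
        subprocess.run(cmd, check=True)

    encode_to_av1("input.mp4", "output_av1.mkv")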

To overcome the limitations of a cloud media converter, one can use a custom transcoder IP alongside other cloud services. It can encode as fast as, or faster than, the cloud media converter while reducing the cost per encoding job. It also makes it easy to integrate any codec and provides a choice of multiple encoders per codec.

Benefits of using AOMedia Video 1 (AV1) codec
  • It is an Open-Source, royalty-free video coding format for video transmissions over the Internet
  • AV1 Quality and Efficiency: Based on measurements with PSNR and VMAF at 720p, AV1 was about 25% more efficient than VP9 (libvpx). Similar conclusions with respect to quality were drawn from a test conducted by Moscow State University researchers, which found that VP9 requires 31% and HEVC 22% more bitrate than AV1 for the same level of quality
  • Comparing AV1 against H.264 (x264) and VP9 (libvpx), Facebook showed about 45-50% bitrate savings using AV1 over H.264 and about 40% over VP9, when using a constant quality encoding mode

At Softnautics, we have incorporated multimedia solutions with market-trending features like image overlay, timecode burn-in, bitrate control modes, advertisement, rotation, motion image overlay, subtitles, cropping, and more. Such features are required to build solutions like end-to-end pipeline orchestration, live and recorded streaming (VOD), transcoding, cloud services, Content Delivery Network (CDN) integration, and interactive VR scene creation.

Flow Diagram:

 


In the flow diagram of the Virtual Reality solution, the user uploads a video to the watch folder of the AWS S3 bucket. The multipart-upload-complete event triggers a Lambda function, which starts the EC2 instance. Encoding is then performed through FFmpeg to produce AV1 output. Only if encoding succeeds is the encoded file uploaded to the "output" directory of the AWS S3 bucket; if encoding fails, the input media file is deleted from the "input" directory. The content provider receives an email notification of the encoding job's success or failure via the AWS SNS service. SNS in turn triggers another Lambda function, which stops the EC2 instance. That Lambda also checks whether the trigger was for an output file upload; if so, it sends an email notification to the end-user via the AWS SES service, announcing the new content's availability. The S3 output file link can then be integrated into AWS Sumerian to view the content with a VR headset in 3D scene mode. Python3 can be used for the entire automation script.
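
A heavily simplified version of the upload-triggered Lambda might look like the sketch below. The instance ID, topic ARN, and bucket prefixes are placeholders, and the real deployment described above also uses SES for end-user email, which is omitted here for brevity.

    # Simplified sketch of the S3-upload-triggered Lambda: start the encoding
    # EC2 instance and publish job events via SNS. The instance ID, topic ARN,
    # and bucket layout are placeholders, not the actual deployment.
    import boto3

    ec2 = boto3.client("ec2")
    sns = boto3.client("sns")

    INSTANCE_ID = "i-0123456789abcdef0"                                # placeholder
    TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:encoding-events"   # placeholder

    def handler(event, context):
        for record in event.get("Records", []):
            key = record["s3"]["object"]["key"]
            if key.startswith("input/"):
                # New source video: spin up the transcoding instance.
                ec2.start_instances(InstanceIds=[INSTANCE_ID])
                sns.publish(TopicArn=TOPIC_ARN,
                            Message=f"Encoding started for {key}")
            elif key.startswith("output/"):
                # Encoded file landed: announce that new content is available.
                sns.publish(TopicArn=TOPIC_ARN,
                            Message=f"New AV1 content available: {key}")
        return {"status": "ok"}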

Using cloud services with a custom IP-based solution, one can stream videos to end-users at scale, deliver low-latency content, secure videos from unauthorized downloads, remove the complexity of building each development step manually, and construct solutions in one's own environment for demo purposes.

Softnautics can help media companies design multimedia solutions across various platforms, merging physical reality and digital information in innovative ways using advanced technologies. Softnautics multimedia experts have experience working on Augmented Reality, Virtual Reality, AV codec development, image/video analytics, computer vision, image processing, and more.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.
