Implementing accelerated machine-learning applications with an advanced MCU

Historically, Artificial Intelligence (AI) depended on GPUs, CPUs, or even DSPs. More recently, however, AI has been moving into the data-acquiring systems themselves, integrated into constrained applications running on smaller microcontrollers (MCUs). This trend is mostly driven by the Internet of Things (IoT) market, in which Silicon Labs is a major player.

To address this trend, Silicon Labs has announced the EFR32xG24, a Wireless MCU that can perform hardware-accelerated AI operations. To achieve this, the MCU embeds a dedicated coprocessor called the Matrix Vector Processor (MVP).

In this article, I will first cover some AI basics to highlight the use cases the MVP was designed for, and then, most importantly, show how to use the EFR32xG24 to design an AI-enabled IoT application.

Artificial Intelligence, Machine Learning and Edge Computing in a nutshell

AI refers to a system that tries to mimic human behavior. More specifically, it is an electrical and/or mechanical entity that responds to an input much as a human would. Although the terms AI and Machine Learning (ML) are often used interchangeably, they represent two different methodologies: AI is the broader concept, while ML is a subset of AI.

Using Machine Learning, a system can make predictions and improve (or train) itself through repeated use of what is called a model. A model is a trained algorithm that is eventually used to emulate decision making. The model can be trained by collecting data or by using existing datasets. When the system applies its “trained” model to newly acquired data to make decisions, we refer to it as Machine Learning inference.
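
As a toy illustration (not part of the Silicon Labs material), a “model” can be reduced to a handful of learned parameters, and inference is simply applying those parameters to newly acquired data; the weights and threshold below are made up for the example:

/* Toy "model": a few weights learned during training. */
static const float weights[3] = { 0.8f, -0.4f, 0.15f };
static const float bias = 0.1f;

/* "Inference": apply the trained parameters to new sensor readings
 * and return 1 if an event is detected, 0 otherwise. */
int infer(const float input[3])
{
  float score = bias;
  for (int i = 0; i < 3; i++) {
    score += weights[i] * input[i];   /* apply learned weights */
  }
  return (score > 0.5f) ? 1 : 0;      /* decision threshold */
}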

As mentioned previously, inference requires computational power that was traditionally provided by high-end computers. However, we are now able to run inference on more constrained devices that do not need to be connected to such high-end machines; this is called Edge Computing.

Running inference on an MCU is therefore a form of Edge Computing. Edge Computing means running data-processing algorithms as close as possible to the point where the data is acquired. Edge devices are usually simple, constrained devices such as sensors or basic actuators (lightbulbs, thermostats, door sensors, electricity meters, and so on), typically running on low-power, Arm Cortex-M class MCUs.


Performing Edge Computing has many benefits. Arguably the most valuable is that a system using edge computing does not depend on an external entity: devices can “make their own decisions” locally.

Making decisions locally has the following practical benefits:

  • Provides lower latency
    Raw data does not need to be transferred to the cloud for processing, which means that decisions can be made in real time on the device.
  • Reduces required internet bandwidth
    Sensors create a huge amount of real-time data, which in turn places a large demand on bandwidth even when there is nothing to ‘report’, saturating the wireless spectrum and increasing running costs.
  • Reduces power consumption
    It requires significantly less power to analyze data locally (using AI) than to transmit the data.
  • Enables compliance with privacy and security requirements
    By making decisions locally, there is no need to send the detailed raw data up to the cloud, only inference results and metadata, therefore removing the potential for data privacy breaches.
  • Reduces Cost
    Analyzing sensor data locally saves the expense of utilizing the cloud infrastructure and traffic.
  • Increases resiliency
    Should the connection to the cloud go down, the edge node can still operate autonomously.

Silicon Labs’ EFR32xG24 for Edge Computing

EFR32xG24 is a Secure Wireless MCU that supports several 2.4 GHz IoT protocols (Bluetooth Low Energy, Matter, Zigbee, and OpenThread). It also includes Secure Vault, an improved security feature set common to all Silicon Labs Series 2 platforms.

In addition to improved security and connectivity, what is unique to this MCU is a hardware accelerator for Machine Learning model inference (among other accelerations), called the Matrix Vector Processor (MVP).

The MVP makes it possible to run Machine Learning inference more efficiently, with up to 6x lower power consumption and 2-4x higher speed compared with an Arm Cortex-M core without hardware acceleration (the actual improvement depends on the model and application).


The MVP is designed to offload the CPU by handling intensive floating-point operations. It is specifically designed for complex matrix floating-point multiplications and additions.
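
To make the workload concrete, the kind of kernel the MVP executes in dedicated hardware is, conceptually, a matrix multiply-and-accumulate. The plain-C reference below is purely illustrative (the function name and data layout are mine, not a Silicon Labs API); on the EFR32xG24, this is the type of loop the MVP replaces.

#include <stddef.h>

/* Reference multiply-accumulate kernel: C = C + A * B.
 * Shown in plain C for illustration only; the MVP performs this
 * type of operation in hardware instead of the Cortex-M core. */
static void matrix_mac_f32(const float *a, const float *b, float *c,
                           size_t rows, size_t cols, size_t inner)
{
  for (size_t i = 0; i < rows; i++) {
    for (size_t j = 0; j < cols; j++) {
      float acc = c[i * cols + j];
      for (size_t k = 0; k < inner; k++) {
        acc += a[i * inner + k] * b[k * cols + j];   /* multiply-accumulate */
      }
      c[i * cols + j] = acc;
    }
  }
}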

The MVP consists of a dedicated hardware arithmetic logic unit (ALU), a load/store unit (LSU) and a sequencer.


As a result, the MVP helps accelerate processing and save power in a wide variety of applications, such as Angle-of-Arrival (AoA), MUSIC algorithm computations, and Machine Learning (Eigen or Basic Linear Algebra Subprograms, BLAS).

Because this device is a simple MCU, it cannot address every use case AI/ML could cover. It is designed to address the following four categories, listed here along with real-life applications:

  • Sensor Signal Processing
    • Predictive maintenance
    • Bio Signal analysis
    • Cold chain monitoring
    • Accelerometer use cases
  • Audio pattern matching
    • Glass break detection
    • Shot detection
  • Voice commands
    • Command word sets for smart appliances
    • Wake word detection
  • Low Resolution vision
    • Presence detection
    • Counting
    • Fingerprint

To help address these, Silicon Labs delivers dedicated sample applications that are based on an AI/ML framework called TensorFlow.

TensorFlow is an end-to-end open-source platform for machine learning from Google. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications.

The TensorFlow project also has a variant optimized for embedded hardware, called TensorFlow Lite for Microcontrollers (TFLM). It is an open-source project to which most of the code is contributed by community engineers, including Silicon Labs and other silicon vendors. At the moment, this is the only framework delivered with the Silicon Labs Gecko SDK Software Suite for creating AI/ML applications.
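
As a rough sketch of how TFLM itself is used (independent of the Silicon Labs wrappers shown later), the typical flow is: map the .tflite flatbuffer, register the operators the model needs, give the interpreter a tensor arena, then fill the input tensor and invoke. The model_data symbol, the arena size, and the operator list below are placeholders, and the exact constructor signature can vary slightly between TFLM versions:

#include <cstddef>
#include <cstdint>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// model_data is the .tflite flatbuffer compiled into the firmware (placeholder name).
extern const unsigned char model_data[];

static uint8_t tensor_arena[10 * 1024];   // working memory for tensors; size is model-dependent

void run_inference_once(const float *features, size_t feature_count)
{
  const tflite::Model *model = tflite::GetModel(model_data);

  // Register only the kernels the model actually uses to keep flash usage low.
  static tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddConv2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();
  resolver.AddReshape();

  static tflite::MicroInterpreter interpreter(model, resolver,
                                              tensor_arena, sizeof(tensor_arena));
  interpreter.AllocateTensors();

  // Copy features into the input tensor, run the model, then read the per-class scores.
  TfLiteTensor *input = interpreter.input(0);
  for (size_t i = 0; i < feature_count; i++) {
    input->data.f[i] = features[i];
  }
  if (interpreter.Invoke() == kTfLiteOk) {
    TfLiteTensor *output = interpreter.output(0);
    (void)output;   // output->data.f[] holds the scores for each category
  }
}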

Available AI/ML examples from Silicon Labs are:

  • Zigbee 3.0 Light Switch with Voice Activation
  • TensorFlow Magic Wand
  • Voice Activated LED
  • TensorFlow Hello World
  • TensorFlow Micro Speech

You can start developing an application based on any of these whether you have very little experience or are an expert. Silicon Labs provides multiple Machine Learning development tools to choose from, depending on your level of Machine Learning expertise.

First-time ML developers can start from one of our examples or try one of our third-party partners. Our third-party ML partners support the full end-to-end workflow with richly featured, easy-to-use GUI interfaces for building the optimal Machine Learning model for our chips.

For experts in ML who want to work directly with the Keras/TensorFlow platform, Silicon Labs offers a self-serve, self-support reference package that organizes the model development workflow into one tailored for building ML models for Silicon Labs chips.


Developing an ML-Enabled Application Example: Voice-Controlled Zigbee Switch with EFR32xG24

To create an ML-enabled application, two main steps are necessary. The first step is to create a wireless application, which can be based on Zigbee, BLE, Matter, or any proprietary 2.4 GHz protocol; it can even be a non-connected application. The second step is to build an ML model and integrate it with the application.

As mentioned above, Silicon Labs provides several options for creating an ML application for its MCUs. The approach chosen here uses an existing sample application with a predefined model. In this example, the model is trained to detect two voice commands: “on” and “off”.

Getting Started with an EFR32xG24 Application

To get started, get the EFR32MG24 developer’s kit, BRD2601A.

This development kit is a compact board embedding several sensors (IMU, temperature, relative humidity, and more), LEDs, and stereo I2S microphones. This project uses the I2S microphones.

These devices might not be as scarce as GPUs, but if you cannot get hold of one of these kits, you can also use an older, Series 1 based devkit called the “Thunderboard Sense 2” (Ref. SLTB004A).

However, that MCU does not have an MVP and will perform all inference on the main core without acceleration.

Next, you need Silicon Labs’ IDE, Simplicity Studio, to create the ML project. It comes with a simple way of downloading Silicon Labs’ Gecko SDK Software Suite, which provides the libraries and drivers required by the application, as follows:

  • The Wireless Networking Stack (Zigbee in this case)
  • The Hardware Drivers (for I2S microphones as well as the MVP)
  • The TensorFlow Lite framework
  • An already trained model for detecting command words


The IDE also provides tools to further analyze your application power consumption or networking operations.

Creating the Zigbee 3.0 Switch Project with MVP Enabled

Silicon Labs provides a ready-to-use sample application, Z3SwitchWithVoice, that you will create and build. The application already comes with an ML model, so you do not need to create one.

Once the project is created, note that a Simplicity Studio project is made of source files brought in by components: GUI entities that simplify the integration of complex software and make Silicon Labs’ MCUs easier to use. In this case, you can see that MVP support and the Zigbee networking stack are installed by default.


The main application code is in the app.c source file.

On the networking side, the application can be paired to any existing Zigbee 3.0 network with a simple button press, a step known as “network steering”. Once on a network, the MCU will look for a compatible, pairable lighting device and pair with it, a step known as “binding”.
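
As an illustration of that networking logic, joining and binding are typically kicked off from a button callback using the Zigbee application framework’s network-steering and find-and-bind plugins. The sketch below is an assumption of how that glue might look (the callback name, header paths, and endpoint definition are illustrative, not taken from the sample):

#include "app/framework/include/af.h"
#include "network-steering.h"            /* illustrative header path */
#include "find-and-bind-initiator.h"     /* illustrative header path */

#define SWITCH_ENDPOINT  1   /* endpoint hosting the On/Off switch cluster (illustrative) */

/* Illustrative button callback: join a network first, then look for a light to bind to. */
void app_button_pressed(void)
{
  EmberStatus status;

  if (emberAfNetworkState() != EMBER_JOINED_NETWORK) {
    /* "Network steering": scan channels and join any open Zigbee 3.0 network. */
    status = emberAfPluginNetworkSteeringStart();
    sl_zigbee_app_debug_print("Network steering: 0x%02X\n", status);
  } else {
    /* "Binding": find a compatible On/Off server (a light) and create a binding to it. */
    status = emberAfPluginFindAndBindInitiatorStart(SWITCH_ENDPOINT);
    sl_zigbee_app_debug_print("Find and bind: 0x%02X\n", status);
  }
}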

When the networking part of the application is up and running, the MCU will periodically poll the microphones for audio samples and run inference on them. This code is located in keyword_detection.c:

/***************************************************************************//**
 * Processes the output from output_tensor
 ******************************************************************************/
sl_status_t process_output()
{
  // Determine whether a command was recognized based on the output of inference
  uint8_t found_command_index = 0;
  uint8_t score = 0;
  bool is_new_command = false;
  uint32_t current_time_stamp;
 
  // Get current time stamp needed by CommandRecognizer
  current_time_stamp = sl_sleeptimer_tick_to_ms(sl_sleeptimer_get_tick_count());
 
  TfLiteStatus process_status = command_recognizer->ProcessLatestResults(
    sl_tflite_micro_get_output_tensor(), current_time_stamp, &found_command_index, &score, &is_new_command);
 
  if (process_status != kTfLiteOk) {
    return SL_STATUS_FAIL;
  }
 
  if (is_new_command) {
    if (found_command_index == 0 || found_command_index == 1) {
      printf("Heard %s (%d) @%ldms\r\n", kCategoryLabels[found_command_index],
             score, current_time_stamp);
      keyword_detected(found_command_index);
    }
  }
 
  return SL_STATUS_OK;
}

Upon detection of a keyword, the handler in app.c will send the corresponding Zigbee command:

static void detected_keywork_event_handler(sl_zigbee_event_t *event)
{
  EmberStatus status;
 
  if (emberAfNetworkState() == EMBER_JOINED_NETWORK) {
    emberAfGetCommandApsFrame()->sourceEndpoint = SWITCH_ENDPOINT;
 
    if (detected_keyword_index == 0) {
      emberAfFillCommandOnOffClusterOn();
    } else if (detected_keyword_index == 1) {
      emberAfFillCommandOnOffClusterOff();
    }
 
    status = emberAfSendCommandUnicastToBindings();
    sl_zigbee_app_debug_print("%s: 0x%02X\n", "Send to bindings", status);
  }
}
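
For completeness, the keyword_detected() call in process_output() has to hand the result over to the Zigbee application task. A minimal sketch of that glue, assuming the event and index variable names (they are not shown in the excerpts above), could look like this:

/* Assumed names: the event object and keyword index shared with the handler above. */
static sl_zigbee_event_t detected_keyword_event;
static uint8_t detected_keyword_index;

/* Called from process_output() when a new "on"/"off" keyword is recognized. */
void keyword_detected(uint8_t keyword_index)
{
  detected_keyword_index = keyword_index;
  /* Defer the actual Zigbee command to the application event handler shown above. */
  sl_zigbee_event_set_active(&detected_keyword_event);
}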

At this point, you have a hardware accelerated inference running on a wireless MCU for edge computing.

Customizing the TensorFlow Model to Use Different Command Words

As mentioned before, the model was already integrated into this application and was not modified further. However, if you were integrating a model yourself, you would do so with the following steps:

  1. Collect and Label Data
  2. Design and Build Model
  3. Evaluate and Verify Model
  4. Convert Model for the Embedded Device

These steps must be followed no matter how familiar you are with Machine Learning. The difference lies in how you build the model, as follows:

  1. If you are a beginner in ML, Silicon Labs recommends using one of our easy-to-use, end-to-end third-party partner platforms, Edge Impulse or SensiML, to build your model.
  2. If you are an expert with Keras/TensorFlow and don’t want to use third-party tools, you can use the Machine Learning Toolkit (MLTK), a self-serve, self-support Python package. Silicon Labs has created this reference package around the audio use case; it can be extended, modified, or otherwise cherry-picked for the pieces the expert finds appealing. This package will be available on GitHub with documentation. You can also directly import a .tflite file that runs on the embedded version of TensorFlow Lite for Microcontrollers compiled for the EFR32 product line. In that case, you must ensure that the feature extraction applied to the data is EXACTLY the same when training the model as when running inference on the target chip.

Within Simplicity Studio, the latter is the simplest option. To change the model, copy your .tflite model file into the config/tflite folder of your project. The project configurator provides a tool that automatically converts .tflite files into sl_ml_model source and header files. The full documentation for this tool is available at Flatbuffer Conversion.
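
Once the converter has generated the model sources, the project’s TensorFlow Lite Micro component exposes an already-initialized interpreter. A minimal sketch of feeding it and reading the result is shown below; the output-tensor getter is the one already used in keyword_detection.c above, while the other sl_tflite_micro getters and the header name are assumed to follow the same pattern:

#include <stddef.h>
#include "sl_tflite_micro_init.h"   /* assumed header for the SDK's TFLM component */

/* Minimal sketch: run one inference with the auto-generated model. */
void run_model_on_features(const float *features, size_t count)
{
  /* Copy the pre-processed features into the model's input tensor. */
  TfLiteTensor *input = sl_tflite_micro_get_input_tensor();
  for (size_t i = 0; i < count; i++) {
    input->data.f[i] = features[i];
  }

  /* Invoke the interpreter that was created from the generated model. */
  if (sl_tflite_micro_get_interpreter()->Invoke() != kTfLiteOk) {
    return;
  }

  /* output->data.f[] now holds the score for each category. */
  TfLiteTensor *output = sl_tflite_micro_get_output_tensor();
  (void)output;
}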

[Note: All images and code are courtesy of Silicon Labs.]


Brian Rodrigues is a senior field applications engineer at Silicon Labs Paris dedicated to Wireless Software development support. He graduated from ESIEE Paris in 2016 with a bachelor’s degree in embedded systems engineering.

