Memory Considerations for Faster MCUs. Part 2 - Kinetis, STM32 and Stellaris MCUs

Part 1 - NXP Semiconductors LPC4000 - Real-time aids data processing

STMicroelectronics STM32 – Quick, artful memory

STMicroelectronics is another company that quickly embraced the ARM Cortex-M3 in microcontrollers with its STM32 product line, after working the earlier ARM7 and ARM9 cores into 32-bit MCUs. STMicroelectronics' latest STM32F4 series (see Figure 2) pushes the Cortex-M4 to 168 MHz in a 90 nm process while offering up to 1 MB of Flash and 192 KB of RAM on chip.

Figure 2. STMicroelectronics STM32F4 architecture (Courtesy STMicroelectronics).

To get that kind of performance, STMicroelectronics developed its adaptive real-time memory accelerator (ART Accelerator). This is a cache controller, similar to those found in microprocessor systems, tailored to the needs of programs executing from Flash. Flash is organized in 128-bit lines, so a single read contains four 32-bit instructions or, with the mix of 16-bit and 32-bit encodings in the Thumb-2 instruction set, typically six to eight instructions.

The ART Accelerator uses a prefetch queue and a 64-entry branch cache to mitigate delays from a change of flow in the instructions due to branching, subroutine calls, and possibly even system calls or interrupts. If the redirected program counter wants a recently fetched location, the target probably still resides in the branch cache, in which case it can be loaded immediately into the prefetch queue for execution, saving cycles. More intelligent (adaptive) cache management by on-chip logic should yield a higher hit rate than simpler methods.
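The branch cache lookup described above can be sketched as a small software model. This is only an illustration under stated assumptions: the direct-mapped organization, the index function, and the names bc_lookup and bc_fill are hypothetical and are not taken from STMicroelectronics documentation. Only the 64 entries and the 128-bit line width come from the text.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BC_ENTRIES 64u   /* entry count quoted in the article */
#define LINE_BYTES 16u   /* one 128-bit Flash line */

typedef struct {
    bool     valid;
    uint32_t line_addr;          /* 16-byte aligned address of the cached line */
    uint8_t  data[LINE_BYTES];   /* copy of the Flash line */
} bc_entry_t;

typedef struct {
    bc_entry_t entry[BC_ENTRIES];
    unsigned   hits;
    unsigned   misses;
} branch_cache_t;

/* Assumption: direct-mapped, indexed by the low bits of the line number. */
static uint32_t bc_index(uint32_t line_addr)
{
    return (line_addr / LINE_BYTES) % BC_ENTRIES;
}

void bc_init(branch_cache_t *bc)
{
    memset(bc, 0, sizeof *bc);
}

/* On a change of flow, check whether the target's line is already cached.
 * Returns true and copies the line on a hit; returns false on a miss, in
 * which case the caller reads Flash (paying the wait states) and calls
 * bc_fill(). */
bool bc_lookup(branch_cache_t *bc, uint32_t target, uint8_t out[LINE_BYTES])
{
    uint32_t line_addr = target & ~(LINE_BYTES - 1u);
    bc_entry_t *e = &bc->entry[bc_index(line_addr)];

    if (e->valid && e->line_addr == line_addr) {
        memcpy(out, e->data, LINE_BYTES);
        bc->hits++;
        return true;
    }
    bc->misses++;
    return false;
}

/* Store a line fetched from Flash, replacing whatever shared its index. */
void bc_fill(branch_cache_t *bc, uint32_t target, const uint8_t line[LINE_BYTES])
{
    uint32_t line_addr = target & ~(LINE_BYTES - 1u);
    bc_entry_t *e = &bc->entry[bc_index(line_addr)];

    e->valid = true;
    e->line_addr = line_addr;
    memcpy(e->data, line, LINE_BYTES);
}
```

In this model a loop that branches back to the same few targets hits on every iteration after the first, which is the case the hardware is meant to accelerate. Two targets whose lines share an index evict each other, one reason a real design might choose a smarter replacement policy.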

To alleviate Flash stalls on data accesses, such as data lookup tables or image data, the ART Accelerator has eight 128-bit buffers. Locality-of-reference is pretty poor for data, but it can be improved by cleverly arranging data based on detailed understanding of its use in the program. This is akin to hand-coding in assembly.
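One way to arrange data for better locality is to pack values that are always used together into a record exactly one Flash line wide and to align each record on a line boundary, so a single 128-bit read brings in everything one lookup needs. The sketch below assumes a C11 compiler; the record type cal_point_t, its fields, and the table contents are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_LINE_BYTES 16u   /* 128-bit Flash line, as described above */

/* Hypothetical calibration record: all four values are consumed together,
 * so they are packed into one line-sized, line-aligned record. */
typedef struct {
    alignas(16) int32_t gain;   /* scaled by 1000 */
    int32_t offset;
    int32_t temp_coeff;
    int32_t limit;
} cal_point_t;

static_assert(sizeof(cal_point_t) == FLASH_LINE_BYTES,
              "one record per Flash line");
static_assert(alignof(cal_point_t) == FLASH_LINE_BYTES,
              "records start on a line boundary");

/* const data normally lands in Flash with typical MCU toolchains. */
static const cal_point_t cal_table[4] = {
    { 1000, -12, 3, 4095 },
    { 1010,  -8, 3, 4095 },
    {  995,   5, 2, 4095 },
    { 1002,   0, 2, 4095 },
};

/* Number of Flash lines touched when reading len bytes at a byte offset. */
unsigned lines_touched(size_t offset, size_t len)
{
    assert(len > 0);
    size_t first = offset / FLASH_LINE_BYTES;
    size_t last  = (offset + len - 1u) / FLASH_LINE_BYTES;
    return (unsigned)(last - first + 1u);
}

/* Each call reads exactly one record, and therefore one Flash line. */
int32_t apply_cal(unsigned channel, int32_t raw)
{
    const cal_point_t *p = &cal_table[channel % 4u];
    int32_t v = (raw * p->gain) / 1000 + p->offset;
    return v > p->limit ? p->limit : v;
}
```

The lines_touched helper shows the cost of ignoring alignment: the same 16-byte record starting at byte offset 8 straddles two lines and needs two Flash reads instead of one.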

STMicroelectronics reports that execution from Flash at up to 168 MHz comes within 2.5 percent of execution from zero-wait-state memory. It touts CoreMark benchmarks as proof of the STM32F4's efficiency and speed, although compiler effectiveness and settings also influence those results. Two results stand out. First, a 168 MHz STM32F4 MCU executes the routines much more quickly than any other MCU in this class, and its score scales linearly with frequency. Second, its CoreMark/MHz figure (effective work done per clock cycle) is one of the highest.

A real-time clock module on the STM32F4 includes a 4 KB battery-backed SRAM for holding variable and state information during extremely low power conditions. More distinctively, 528 bytes of one-time programmable ROM is available for serial numbers, MAC addresses, cryptography keys, calibration settings, and storage of other data unique to each device shipped.

STMicroelectronics also uses a 7-layer ARM Advanced High-performance Bus (AHB) matrix that allows simultaneous data transfers between masters, such as the ARM processor, general-purpose DMAs, and DMAs associated with USB or network controllers, and slaves, such as the multitude of peripherals and memories.

STMicroelectronics has numerous MCU configurations based on the ARM Cortex-M0 and the original Cortex-M3, ranging from lower-cost, lightly loaded controllers to fast-clocked devices with sophisticated peripherals. It also has a low-power line. STMicroelectronics claims a 45 percent market share in cumulative units shipped of Cortex-M-based MCUs, so these products are widely deployed.

Freescale Semiconductor Kinetis – Flexible memory

Freescale Semiconductor took a while to get started with mainstream microcontrollers based on ARM processors, although it has sold 32-bit MCUs based on the Power Architecture and its proprietary ColdFire architecture for decades. Jumping quickly on the ARM Cortex-M4 core with its enhanced capabilities, Freescale filled out its new Kinetis product families fairly well (see Figure 3).

Figure 3. Freescale Kinetis architecture (Courtesy Freescale).

Ranging from the smallish K10 to today's full-bore K70, on-chip Flash is available from 32 KB to 1 MB, organized from 32 bits to 128 bits wide depending on the chip. Manufactured on a 90 nm process node, the Flash responds in around 30 ns depending on voltage, whereas the Kinetis MCUs run at up to 100 MHz (a 10 ns clock period) with promises of double the speed. Freescale's thin film storage (TFS) Flash can read, erase, and write at voltages down to 1.71 volts, which is nice because it is within the limits of two almost-spent 1.5 volt AA batteries in series (which degrade rapidly once they hit 0.9 volts each).
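The gap between Flash access time and CPU clock can be put into numbers with a little arithmetic. The sketch below assumes a simple model in which a read occupies ceil(access time / clock period) cycles and every cycle beyond the first is a wait state; real Flash controllers may differ, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wait states needed for a Flash with the given access time at the given
 * CPU clock. Simple model: wait states = ceil(access_ns * f / 1e9) - 1. */
unsigned flash_wait_states(uint32_t access_ns, uint32_t cpu_hz)
{
    assert(access_ns > 0 && cpu_hz > 0);
    uint64_t num    = (uint64_t)access_ns * cpu_hz;
    uint64_t cycles = (num + 999999999ull) / 1000000000ull;  /* round up */
    return cycles > 0 ? (unsigned)(cycles - 1u) : 0u;
}

/* True if a number of cells in series, each at cell_mv millivolts, still
 * meets the minimum supply voltage in millivolts. */
bool series_cells_meet_min(unsigned cells, unsigned cell_mv, unsigned min_mv)
{
    return cells * cell_mv >= min_mv;
}
```

With the figures quoted above, flash_wait_states(30, 100000000) gives 2, and doubling the clock to 200 MHz gives 5, which is why caches or wide reads are needed to hide the Flash. Likewise, series_cells_meet_min(2, 900, 1710) holds because two cells at 0.9 volts each supply 1.8 volts.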

Kinetis MCUs have their own instruction and data caches to help overcome Flash read delays, and they address off-chip memory as well. This is effective enough that Kinetis MCUs look as efficient as the others up to Kinetis' rated speed. A memory protection unit helps the operating system keep one task's program from getting into another task's memory space.

The primary Flash is supplemented with something Freescale calls FlexMemory, a special variety of Flash that can also operate as E2PROM. The programmer decides how much to use as program Flash, with the balance, up to 16 KB, being used as E2PROM. The portion that operates as E2PROM automatically engages special logic that performs wear-leveling and write algorithms to reach one million, and possibly up to 10 million, endurance cycles as more FlexFlash is dedicated to it.
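Why dedicating more Flash raises endurance can be illustrated with a toy wear-leveling model. This is a sketch under stated assumptions, not Freescale's algorithm, which the text does not describe: writes rotate round-robin through a fixed number of slots, the newest record is found by sequence number, and sequence wrap-around is ignored. All names and the slot count are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 8u   /* hypothetical: backing Flash holds 8 copies of one value */

typedef struct {
    uint32_t value[SLOTS];
    uint32_t seq[SLOTS];    /* 0 = erased, otherwise write sequence number */
    uint32_t wear[SLOTS];   /* program count per slot, for illustration */
    uint32_t next_seq;
} ee_emul_t;

void ee_init(ee_emul_t *ee)
{
    memset(ee, 0, sizeof *ee);
    ee->next_seq = 1u;
}

/* Each write goes to the next slot in rotation, spreading the wear. */
void ee_write(ee_emul_t *ee, uint32_t v)
{
    uint32_t slot = ee->next_seq % SLOTS;
    ee->value[slot] = v;
    ee->seq[slot]   = ee->next_seq++;
    ee->wear[slot]++;
}

/* Returns 1 and stores the newest value in *out, or 0 if nothing was written. */
int ee_read(const ee_emul_t *ee, uint32_t *out)
{
    uint32_t best = 0u;
    int found = 0;
    for (uint32_t i = 0; i < SLOTS; i++) {
        if (ee->seq[i] > best) {
            best  = ee->seq[i];
            *out  = ee->value[i];
            found = 1;
        }
    }
    return found;
}

/* Idealized endurance: each byte of emulated E2PROM is backed by
 * backing_bytes / eeprom_bytes Flash locations, so the rated cell
 * cycles are multiplied by that ratio. */
uint64_t ee_ideal_endurance(uint32_t cell_cycles,
                            uint32_t backing_bytes, uint32_t eeprom_bytes)
{
    assert(eeprom_bytes > 0 && backing_bytes >= eeprom_bytes);
    return (uint64_t)cell_cycles * (backing_bytes / eeprom_bytes);
}
```

In this model 80 writes to one emulated location program each of the 8 slots only 10 times, so the emulated location outlasts any single Flash cell by the ratio of backing storage to emulated storage.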

As is the case with some other vendors, Freescale utilizes a crossbar switch to let the main Flash, the FlexFlash, the SRAM, and various peripherals be accessed simultaneously by bus masters in order to keep data moving optimally.

Texas Instruments Stellaris – Firmware included

The Stellaris microcontrollers were the first products to use the new ARM Cortex-M3 architecture when they were developed by lead partner Luminary Micro, now owned by Texas Instruments. Stellaris has a rich collection of MCUs serving applications from motor control to networking and user interfaces.

The Texas Instruments MCUs run at a modest 80 MHz, have up to 512 KB of error-checking Flash memory and up to 96 KB of data RAM, and some have their own 2 KB of traditional E2PROM on chip. Stellaris' Flash memory can perform single-cycle reads up to 50 MHz, above which a prefetch buffer minimizes delays by fetching 64 bits per read and engaging speculative branching.

While ROM seems to have disappeared from most MCUs these days, many Stellaris LM3S and Cortex-M4-based LM4F MCUs (see Figure 4) make special use of compact ROM to store fundamental, often-accessed code that is likely to be used by all applications. These drivers and routines are called StellarisWare and consist of peripheral driver libraries, boot loaders and vector tables, the pre-emptive real-time scheduler SafeRTOS, cyclic redundancy check (CRC) error detection operations, and cryptography tables used for the Advanced Encryption Standard (AES) functions. Putting these useful functions and data in fast, cheap ROM (where appropriate) frees up a significant quantity of Flash that is better used for custom code that enhances the end equipment.
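The general idea of calling library code that lives in ROM can be sketched with a table of function pointers. This is not the StellarisWare API: the rom_api_t structure, the ROM_API macro, and app_checksum are hypothetical, and the table here is an ordinary constant so the example runs on a host. On a real part the table would sit at a fixed ROM address supplied by the vendor's header files.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ROM service table: the application only needs the table's
 * location, not a copy of the code behind it. */
typedef struct {
    uint32_t (*crc32)(const uint8_t *data, size_t len);
    uint32_t version;
} rom_api_t;

/* Stand-in for a routine that would be mask-programmed into ROM:
 * bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t rom_crc32_impl(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            /* XOR in the polynomial only when the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* On hardware this would be a pointer to a fixed ROM address. */
static const rom_api_t rom_api = { rom_crc32_impl, 1u };
#define ROM_API (&rom_api)

/* Application code: the call goes through the table, so no Flash is
 * spent on a private copy of the CRC routine. */
uint32_t app_checksum(const uint8_t *buf, size_t len)
{
    return ROM_API->crc32(buf, len);
}
```

The indirection costs one extra load per call, which is usually a fair trade for the Flash it frees; the table also lets the ROM contents be revised without changing the addresses the application links against.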

Figure 4. Texas Instruments Stellaris LM4F architecture (Courtesy Texas Instruments).

Remember your application – Memory may make it zing

The needs of every application are different and there are many factors to consider in choosing a microcontroller. A number of Flash, SRAM, ROM, and specialty memory features pertinent to higher-end MCUs from a variety of vendors have been reviewed here. While no one part might have precisely the ideal features to accommodate your application, many of the memory options should now be clearer.
