Microcontrollers & the STM32

From Avi's Gadgets


A digital device is one that uses only 2 discrete voltages for signalling.
These voltages are far enough apart to reject the error and noise that would affect an equivalent analog system.
This means data can be passed repeatedly from device to device without any degradation.
Common standards are:

System          Logic 1 (high)   Logic 0 (low)
5V systems      5V               0V
3.3V systems    3.3V             0V

The STM32 is a 3.3V system, but many of its input pins are 5V tolerant (check the datasheet for the pin you intend to use).

Logic Gates

Logic gates are simple digital circuits that take 1 or more (typically 2) digital inputs and output a single digital bit, depending on the type of gate and the state of the input bits.
By stringing many of these devices together, complex circuits can be created. One such circuit is the Full Adder.

Truth tables

Truth tables are tables that show all combinations of input bits, and what the output will be in each case. These are used to demonstrate how a particular gate or digital device will function.

The basic logic gates and their truth tables

Inverter (NOT) gate
Input Output
0 1
1 0
AND gate
Input 1 Input 2 Output
0 0 0
0 1 0
1 0 0
1 1 1
NAND gate
Input 1 Input 2 Output
0 0 1
0 1 1
1 0 1
1 1 0
OR gate
Input 1 Input 2 Output
0 0 0
0 1 1
1 0 1
1 1 1
NOR gate
Input 1 Input 2 Output
0 0 1
0 1 0
1 0 0
1 1 0
XOR (EOR) gate
Input 1 Input 2 Output
0 0 0
0 1 1
1 0 1
1 1 0
inverter gate symbol
Not gate.jpg
AND gate symbol
And gate.jpg
NAND gate symbol
Nand gate.jpg
OR gate symbol
Or gate.jpg
NOR gate symbol
Nor gate.jpg
XOR gate symbol
Xor gate.jpg
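
The truth tables above can also be written as small C functions. This is only a sketch to experiment with on a PC: the function names are made up for this example, and each bit is assumed to be held in a byte containing 0 or 1.

```c
#include <stdint.h>

/* Each gate takes bits that are 0 or 1 and returns a single bit. */
uint8_t gate_not(uint8_t a)             { return (uint8_t)((a ^ 1u) & 1u); }
uint8_t gate_and(uint8_t a, uint8_t b)  { return (uint8_t)(a & b & 1u); }
uint8_t gate_or(uint8_t a, uint8_t b)   { return (uint8_t)((a | b) & 1u); }
uint8_t gate_xor(uint8_t a, uint8_t b)  { return (uint8_t)((a ^ b) & 1u); }

/* NAND and NOR are simply AND and OR followed by an inverter. */
uint8_t gate_nand(uint8_t a, uint8_t b) { return gate_not(gate_and(a, b)); }
uint8_t gate_nor(uint8_t a, uint8_t b)  { return gate_not(gate_or(a, b)); }
```

Calling each function with the four input combinations 00, 01, 10 and 11 reproduces the rows of the corresponding truth table.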

Bases and counting

When we count in decimal, e.g. 0 1 2 3 4 5 6 7 8 9 10 11 12... This is counting in base 10. What this means is:

Number in decimal Digit number 1 Digit number 0
10^1 10^0
10s 1s
Number of 10s Number of 1s
0 0 0
1 0 1
2 0 2
3 0 3
4 0 4
5 0 5
6 0 6
7 0 7
8 0 8
9 0 9
10 1 0
11 1 1
12 1 2

For every digit added in base 10, we can count 10 times as many numbers.


Binary is a system of counting using digital bits (described above).
It is sometimes called base 2.
The way we count in binary is:

Number in decimal Bit number 3 Bit number 2 Bit number 1 Bit number 0
2^3 2^2 2^1 2^0
8s 4s 2s 1s
Number of 8s Number of 4s Number of 2s Number of 1s
0 0 0 0 0
1 0 0 0 1
2 0 0 1 0
3 0 0 1 1
4 0 1 0 0
5 0 1 0 1
6 0 1 1 0
7 0 1 1 1
8 1 0 0 0
9 1 0 0 1
10 1 0 1 0
11 1 0 1 1
12 1 1 0 0

This table contains 4 bits, and is thus called 4 bit binary. 4 bits can count from 0 to 2^4-1 (15), giving 2^4 (16) possible values.
It is important to note that the first bit is called bit 0, not bit 1. Additionally, many binary systems begin their counts at 0 rather than 1.
For every digit added in base 2, we can count 2 times as many numbers.
When we have a binary number, for example 11110000, the right hand bit is called the Least Significant Bit (LSB), whereas the left hand bit (the 128s column in this example) is called the Most Significant Bit (MSB) because it has the largest effect on the value.
8 bit binary example:
(bit 7) MSB→11110000←LSB (bit 0)

Some examples of 8 bit binary values (out of a possible 256 of them):

Number in decimal Bit number 7 Bit number 6 Bit number 5 Bit number 4 Bit number 3 Bit number 2 Bit number 1 Bit number 0
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
128s 64s 32s 16s 8s 4s 2s 1s
Number of 128s Number of 64s Number of 32s Number of 16s Number of 8s Number of 4s Number of 2s Number of 1s
128 1 0 0 0 0 0 0 0
170 1 0 1 0 1 0 1 0
85 0 1 0 1 0 1 0 1
255 1 1 1 1 1 1 1 1
239 1 1 1 0 1 1 1 1
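
The column weights in the table above can be added up in code. The following C sketch is an illustration only: `bits_to_value` is a made-up name, and it assumes the bits are typed as a string of exactly 8 characters with the MSB first.

```c
/* Adds up the column weights (128s, 64s ... 1s) of an 8 bit binary string.
   The string is expected to hold exactly 8 characters, MSB first. */
unsigned bits_to_value(const char *bits)
{
    unsigned value = 0;
    unsigned weight = 128;              /* bit 7 is the 128s column */

    for (int i = 0; i < 8 && bits[i] != '\0'; i++) {
        if (bits[i] == '1') {
            value += weight;            /* this column contributes its weight */
        }
        weight /= 2;                    /* the next column is worth half as much */
    }
    return value;
}
```

For example, `bits_to_value("10101010")` adds 128 + 32 + 8 + 2 to give 170, matching the table.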

Logic functions with binary

Using logic functions on binary values is useful in many ways. They can be used to:

  • Set bits (and leave the other bits undisturbed)
  • Clear bits (and leave the other bits undisturbed)
  • Invert some or all of the bits
  • Reposition the bits to the required bit position

Setting bits

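00011110 OR 11110000 (the bits you want to set must be 1, since anything ORed with 1 = 1; bits ORed with 0 are copied unchanged)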
Follow this table downwards for each bit.

0 0 0 1 1 1 1 0
OR OR OR OR OR OR OR OR ←This refers to the logical OR function of an OR gate
1 1 1 1 0 0 0 0
= = = = = = = =
1 1 1 1 1 1 1 0

One overlapping bit (bit 4) was deliberately included to show that the result bit becomes set whether or not that bit was already set.

Clearing bits

00011110 AND 11111001 (the bits you want to clear must be 0, since anything ANDed with 0 = 0; the bits you want to keep must be 1, since 0 AND 1 = 0 and 1 AND 1 = 1)
Follow this table downwards for each bit.

0 0 0 1 1 1 1 0
AND AND AND AND AND AND AND AND ←This refers to the logical AND function of an AND gate
1 1 1 1 1 0 0 1
= = = = = = = =
0 0 0 1 1 0 0 0

Inverting bits

00011110 XOR 00101000 (anything XORed with 1 becomes the opposite of what it was: 0 XOR 1 = 1, 1 XOR 1 = 0; bits XORed with 0 are unchanged)
Follow this table downwards for each bit.

0 0 0 1 1 1 1 0
XOR XOR XOR XOR XOR XOR XOR XOR ←This refers to the logical XOR function of an XOR gate
0 0 1 0 1 0 0 0
= = = = = = = =
0 0 1 1 0 1 1 0

Shifting bits

00011110 left shift by 2

00011110 original value
00111100 left shift 1
01111000 left shift 2

This gives the same result as multiplying the value by 2, 2 times (every left shift multiplies the value by 2). On CPUs without a fast hardware multiplier this is much faster than using the multiplication function, because the shift circuit is simple and can complete in 1 clock cycle.

00011110 right shift by 2

00011110 original value
00001111 right shift 1
00000111 right shift 2

This gives the same result as dividing the value by 2, 2 times (every right shift divides the value by 2). This is much faster than using the division function because the circuit is simple and can be completed in 1 clock cycle.
The rightmost bits are discarded as they 'fall off the edge'; this has the same effect as division by 2 with rounding down. For example 3/2=1.5, but 11 (bin) right shifted 1 = 1 (bin) = 1 (dec).
It should be noted that it is also possible to keep these bits and insert them back in at the other side (this is called a rotate).
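
The set, clear, invert and shift examples above map directly onto the C operators |, &, ^, << and >>. The sketch below is for experimenting on a PC; the function names are invented for this example and the values are treated as 8 bit.

```c
#include <stdint.h>

/* OR: bits that are 1 in mask become set, the rest are unchanged. */
uint8_t set_bits(uint8_t value, uint8_t mask)    { return (uint8_t)(value | mask); }

/* AND: bits that are 0 in keep become cleared, the rest are unchanged. */
uint8_t clear_bits(uint8_t value, uint8_t keep)  { return (uint8_t)(value & keep); }

/* XOR: bits that are 1 in mask are inverted, the rest are unchanged. */
uint8_t invert_bits(uint8_t value, uint8_t mask) { return (uint8_t)(value ^ mask); }

/* Shifts: bits that fall off the edge of the 8 bit value are discarded (n < 8). */
uint8_t shift_left(uint8_t value, unsigned n)    { return (uint8_t)(value << n); }
uint8_t shift_right(uint8_t value, unsigned n)   { return (uint8_t)(value >> n); }

/* Rotate: bits that fall off the right are inserted back in on the left. */
uint8_t rotate_right(uint8_t value, unsigned n)
{
    n &= 7u;    /* rotating an 8 bit value by 8 is the same as not rotating */
    return (uint8_t)((value >> n) | (value << ((8u - n) & 7u)));
}
```

With the values used above, `set_bits(0x1E, 0xF0)` gives 0xFE (11111110), `clear_bits(0x1E, 0xF9)` gives 0x18 (00011000), `invert_bits(0x1E, 0x28)` gives 0x36 (00110110), `shift_left(0x1E, 2)` gives 0x78 (01111000) and `shift_right(0x1E, 2)` gives 0x07 (00000111).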


Hexadecimal is a system of counting using letters and numbers, 0-9 and A-F.
It is sometimes called base 16.
0 through to 9 indicates decimal 0 through to 9. A through to F indicates decimal 10 through to 15. So with a single character we can count from 0 to 15.
This is very compact compared to both decimal and binary.
The way we count in hexadecimal is:

Number in decimal Digit number 1 Digit number 0
16^1 16^0
16s 1s
Number of 16s Number of 1s
0 0 0
1 0 1
2 0 2
3 0 3
4 0 4
5 0 5
6 0 6
7 0 7
8 0 8
9 0 9
10 0 A
11 0 B
12 0 C
13 0 D
14 0 E
15 0 F
16 1 0
17 1 1
18 1 2

For every digit added in base 16, we can count 16 times as many numbers.
The advantage of hex is that each character represents exactly the full range of 4 binary bits.
0000 (bin) = 0 (dec) = 0 (hex)
1111 (bin) = 15 (dec) = F (hex)
To convert FF (hex) to decimal: the leftmost character (F), which is 15, is in the 16s column, and another F (15) is in the units column, so FF=(15*16)+15=255.
FF can also be converted directly to binary by converting each character to a 4 bit value and placing the 2 groups of 4 bits together as 8 bits: F=1111 and F=1111, so FF=11111111.
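
The same column arithmetic can be sketched in C. The function names here are made up for illustration; each character is converted to its value (0 to 15), and the running total is multiplied by 16 before the next column is added.

```c
/* Value of one hex character (0 to 15), or -1 if it is not a hex character. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Converts a string of hex characters such as "FF" to its value.
   Returns -1 if any character is not valid hex. */
long hex_to_decimal(const char *hex)
{
    long value = 0;

    for (; *hex != '\0'; hex++) {
        int digit = hex_digit_value(*hex);
        if (digit < 0) {
            return -1;
        }
        value = value * 16 + digit;     /* move up one column, add the new digit */
    }
    return value;
}
```

`hex_to_decimal("FF")` works through (15*16)+15 and returns 255, and `hex_to_decimal("12")` returns 18 as in the table.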

The main components of a microcontroller


CPU

Contains the Arithmetic Logic Unit (ALU), which has digital circuits for performing mathematical and logical computations and comparisons.
A comparison is when you check if a number is equal, not equal, greater than, or less than some other number.


Program memory

Storage for your machine code; each line of code has an address in memory. The STM32 uses flash memory for this.


RAM

Holds the stack (variables and function return addresses).

Memory operation principles (RAM and Flash)

When an address is placed on the memory’s address bus, the bits are decoded through several address decoders, the output of which enables a particular byte ‘latch’ to be connected to the data bus, where the bits can be read or written. A separate circuit determines the direction (read or write), and is connected to all bytes.
Some types of memory have storage units of 16 bits or more (several bytes) per address; this unit is known as a word (a whole number of bytes, i.e. a multiple of 8 bits).
Memory is usually divided up into pages. In the case of the STM32 flash, the pages are 2kB each and there are 256 of them, giving 512kB of storage. Using pages simplifies the selection circuit.
In the case of SRAM the bits are stored within flip flops. In flash memory the bits are stored in gate charges.

An example of a simple 3 bit address bus to an 8 bit selector output
Address decoder.png
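
The behaviour of such a decoder can be sketched in a single line of C. This is only a software model of the idea (the function name is made up): for any 3 bit address, exactly one of the 8 output lines is high.

```c
#include <stdint.h>

/* 3 bit address in, 8 select lines out. Bit n of the result is select line n,
   and only the line matching the address is high. */
uint8_t decode_3_to_8(uint8_t address)
{
    return (uint8_t)(1u << (address & 7u));
}
```

For example, address 5 (101) gives 00100000, so only select line 5 is enabled.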


Peripherals

These are additional hardware devices for various functions that are interconnected with the CPU, and are used to interface with the external pins.

A block diagram of the main STM32 components

Stm32 block diagram.png

Addresses and busses

A bus is a set of wires or tracks on a circuit board or within an integrated circuit that are functionally connected.
Digital busses are grouped in bit length, sometimes called the bus width.
For example, a 32 bit system will have 32 tracks grouped together and called a single 32 bit bus.
In CPUs and microcontrollers, 2 main types of bus are used: the address bus and the data bus.
When a binary value is placed onto the address bus, it functions as a selector/router for the bits on the data bus.
When bits are then placed onto the data bus, they end up at the device (usually a register) selected by the address bits.
For example, with an 8 bit system, a total of 256 addresses exist. Therefore 256 possible endpoint devices can be connected to the address decoder and the data bus.
Note, when changing the address bus, all other registers retain their previous value and only the selected register can be changed.
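
A software model of this idea, assuming an 8 bit address bus and 8 bit registers, might look like the sketch below. The type and function names are hypothetical; in real hardware the selection is done by the address decoder rather than by an array index.

```c
#include <stdint.h>

#define NUM_REGISTERS 256               /* 8 bit address bus: 2^8 endpoints */

typedef struct {
    uint8_t registers[NUM_REGISTERS];   /* one clocked latch per address */
} bus_t;

/* Write: the address selects one register, which latches the data bus.
   Every other register keeps its previous value. */
void bus_write(bus_t *bus, uint8_t address, uint8_t data)
{
    bus->registers[address] = data;
}

/* Read: the selected register drives its value onto the data bus. */
uint8_t bus_read(const bus_t *bus, uint8_t address)
{
    return bus->registers[address];
}
```

Writing to address 0x11 after writing to address 0x10 leaves the value at 0x10 untouched, which is the behaviour described in the note above.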

Here is an example of how an address and data bus may be implemented
The register blocks are clocked latches
Bus example1.jpg
Here is an interactive version of this example. Use Alt+drag to move the schematic around.

Here is another example of how an address and data bus may be implemented
The register blocks are clocked latches. In this example, each device has its own address activation circuit
Bus example2.jpg
Here is an interactive version of this example. Use Alt+drag to move the schematic around.

The Clock Tree

The STM32 has an elaborate string of clock buses and frequency dividers.
Each peripheral block contains its own circuitry to perform its required functions, as well as the registers needed to configure it.
All of this circuitry is clocked, and after reset most peripheral clocks are switched off, so if you try to write to or read from a peripheral's registers before its clock is enabled, nothing will happen. You must enable and configure all the clocks that lead up to and including the peripheral you are trying to configure or use.
Once the peripheral receives a clock signal, the registers and the device become functional.
A common mistake is to forget to enable the clock somewhere along the line to the desired peripheral, so it is a good idea to double check this.
Note: a prescaler is a counter that divides the clock, either by half n times using a chain of flip flops (dividing by 2^n), or by any preset number using a mod n counter configuration.
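
The effect of a missing clock can be modelled on a PC with the C sketch below. Everything in it is hypothetical: `peripheral_t` and its functions only imitate the behaviour described above and are not the STM32 register map or any vendor library. In particular, returning 0 from an unclocked read is an assumption of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one peripheral and its clock gate. */
typedef struct {
    bool     clock_enabled;     /* false after reset in this model */
    uint32_t config;            /* a configuration register */
} peripheral_t;

/* A write only reaches the register if the peripheral is being clocked. */
void peripheral_write(peripheral_t *p, uint32_t value)
{
    if (p->clock_enabled) {
        p->config = value;
    }
}

/* Assumption of this model: an unclocked peripheral reads back as 0. */
uint32_t peripheral_read(const peripheral_t *p)
{
    return p->clock_enabled ? p->config : 0u;
}

/* A mod n prescaler: the output frequency is the input divided by n. */
uint32_t prescaled_hz(uint32_t input_hz, uint32_t n)
{
    return (n == 0u) ? 0u : input_hz / n;
}
```

In this model, a write made before `clock_enabled` is set is silently lost, which is the symptom to look for when a peripheral refuses to respond.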

The STM32 clock tree
STM32RET6 clock tree.png

Microcontroller vs CPU

The main difference between a CPU and a microcontroller is that microcontrollers have additional components internal to the chip.
These are mainly the peripheral devices, RAM, and flash program storage.
A classical CPU will often have many of these components external to it, connected on the circuit board or via sockets. Having these components internal reduces the board size, as well as the learning curve and the time taken to get started on development.

Components of the CPU core

Arithmetic Logic Unit

Where computations and comparisons are performed.


Registers

32 bit hardware clocked latches within the CPU that are readable and writable from assembly code (once the assembly code has been converted to machine code). There are only a limited number of these, 16 in the case of the STM32.
Each peripheral also has its own set of registers accessible via addressing. In the case of the STM32 you must enable the clock to these peripherals in order to get data to and from their registers
Registers remain set at the value you set them to until you next update that particular register, even if you change another register.
Some registers are for special purposes and should not generally be used.

Program counter

A counter that holds the address from which the CPU reads the next instruction in program memory.


Flags

Flags are special bits that are set or cleared depending on the result of a computation or comparison. They can indicate things like positive, negative, equal to, greater than, less than, overflow etc. Instructions can be modified to read these bits and act conditionally upon them.

Basic operating principle

The CPU will read some binary at the starting address of program memory; this group of binary bits is called an instruction. The CPU will read these bits and decode them (i.e. separate them) into which function it is being asked to perform, and any source and destination values or locations to use.
This is then fed to the ALU, where the function bits are fed to a binary decoder (similar to the address decoder diagram shown previously) to activate the correct circuit for the called function. It reads the source and destination bits (a destination will be a register, and each one has its own binary code), performs the computation, and places the resulting bits into the output (destination) register.
After each instruction is completed, the program counter is incremented so that upon the next read the next instruction will be loaded. The program counter (PC) is register 15 (R15). By writing to this register, you will force the next instruction to be the one at the address of the value now in R15 (this will overwrite the current address+1 value within it).
CPU block diagram.jpg
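
The fetch, decode and execute steps can be sketched as a toy CPU in C. The 16 bit instruction format used here (4 bits of function, 4 bits of destination register, 8 bits of value) is invented purely for this example and is not the STM32 instruction encoding; r[15] is used as the program counter to mirror the description above.

```c
#include <stdint.h>

/* Made-up function codes for the toy CPU. */
enum { OP_HALT = 0, OP_MOVI = 1, OP_ADDI = 2, OP_JUMP = 3 };

/* Builds one toy instruction: [15:12] function, [11:8] register, [7:0] value. */
#define ENCODE(op, rd, imm) ((uint16_t)(((op) << 12) | ((rd) << 8) | (imm)))

typedef struct {
    uint32_t r[16];             /* r[15] doubles as the program counter */
} toy_cpu_t;

/* Runs until HALT, the end of the program, or max_steps.
   Returns the number of instructions executed. */
int toy_cpu_run(toy_cpu_t *cpu, const uint16_t *program, int length, int max_steps)
{
    int steps = 0;

    while (steps < max_steps && cpu->r[15] < (uint32_t)length) {
        uint16_t instruction = program[cpu->r[15]];     /* fetch */
        unsigned op  = (instruction >> 12) & 0xFu;      /* decode: function */
        unsigned rd  = (instruction >> 8) & 0xFu;       /* decode: destination */
        unsigned imm = instruction & 0xFFu;             /* decode: value */

        cpu->r[15]++;           /* PC now points at the next instruction */
        steps++;

        switch (op) {                                   /* execute */
        case OP_MOVI: cpu->r[rd] = imm;  break;
        case OP_ADDI: cpu->r[rd] += imm; break;
        case OP_JUMP: cpu->r[15] = imm;  break;         /* writing the PC jumps */
        default:      return steps;                     /* HALT or unknown code */
        }
    }
    return steps;
}
```

A program of MOVI r0,5 then ADDI r0,3 then JUMP 4 skips whatever is stored at address 3 and leaves 8 in r0, showing how writing to the program counter changes which instruction is read next.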

Turing completeness

“A computational system that can compute every Turing-computable function is called Turing complete (or Turing powerful). Alternatively, such a system is one that can simulate a universal Turing machine.” – Wikipedia
Basically this means any system that is “Turing complete” is fully capable of simulating any other Turing complete system, given enough time and memory. For a CPU/microcontroller/computer to be Turing complete, it must, at minimum, contain 2 functions that are able to reproduce every other function.
For example, by using an AND gate and a NOT (inverter) gate, every other gate type can be created from these building blocks, and from these, even more complex functions, all the way up to an ALU.

Logic gates built from AND & NOT gates
Logic blocks.png
Interactive simulation of this diagram

The main concept is that a very old CPU can do everything a modern one can, by creating the modern functions out of several lines of code using its older instruction set. It will take longer and most likely use more memory, but it is all theoretically possible.
This is also true for programming languages. For example, if a language does not have a bit shift operation, one can be made by simply multiplying the value by 2 to shift left and dividing by 2 to shift right. Additionally, all languages eventually end up at the lowest level (machine code) before being executed by a CPU.
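
Both ideas can be tried out in C. In the sketch below (all names are made up for the example), only `not1` and `and2` use C operators directly; every other gate is built by calling those two, and the shifts are built from multiplication and division alone.

```c
#include <stdint.h>

/* The only 2 building blocks. Bits are held as 0 or 1. */
uint8_t not1(uint8_t a)             { return (uint8_t)((a ^ 1u) & 1u); }
uint8_t and2(uint8_t a, uint8_t b)  { return (uint8_t)(a & b & 1u); }

/* Everything below uses only not1() and and2(). */
uint8_t nand2(uint8_t a, uint8_t b) { return not1(and2(a, b)); }
uint8_t nor2(uint8_t a, uint8_t b)  { return and2(not1(a), not1(b)); }
uint8_t or2(uint8_t a, uint8_t b)   { return not1(nor2(a, b)); }
uint8_t xor2(uint8_t a, uint8_t b)  { return and2(or2(a, b), nand2(a, b)); }

/* A left shift made from multiplication: one doubling per position. */
uint32_t shift_left_by_multiply(uint32_t value, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        value *= 2u;
    }
    return value;
}

/* A right shift made from division: one halving (rounded down) per position. */
uint32_t shift_right_by_divide(uint32_t value, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        value /= 2u;
    }
    return value;
}
```

These versions are slower than the built-in operations, which is exactly the trade-off described above: the result is the same, it just takes more steps.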


One of the keys to programming is to have a good understanding of exactly what happens on the first and last iteration of a loop (an iteration is one pass through the loop).
Doing so will ensure you are not off by 1, which in digital programming can cause unforeseen disasters, for example trying to read an address 1 past where your table actually ends.
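
A short C sketch of a loop over a table shows where the off by 1 danger sits. The function name is made up for this example.

```c
#include <stddef.h>

/* Adds up a table of `length` entries. The valid positions are 0 to length-1. */
int sum_table(const int *table, size_t length)
{
    int total = 0;

    /* First iteration: i = 0. Last iteration: i = length - 1.
       Writing i <= length here instead would run 1 extra iteration
       and read the address 1 past the end of the table. */
    for (size_t i = 0; i < length; i++) {
        total += table[i];
    }
    return total;
}
```

Checking the first iteration (i = 0), the last iteration (i = length - 1) and the empty case (length = 0, so the loop never runs) covers the places where this kind of mistake usually hides.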

All timings will only be ideal and will vary from CPU to CPU.
Some things that may affect timing:

  • Wait states (clock cycles for memory access)
  • Bus width (the number of bits of the bus)
  • DMA operations (automatic peripheral data transfers that don't use the CPU but hold up the data bus from being used)
  • Cache (memory speed optimisations by using a small amount of fast memory)
  • Superscalar execution (multiple instructions executed at the same time)

Machine code

Machine code is binary, formatted to comply with a particular CPU's instruction decoder. This format is defined by the instruction set of that particular CPU, and each CPU will have its own standard for doing this. The instruction set is matched to all the different function circuits within the ALU of that particular CPU or microcontroller, generally 1 instruction name per function the ALU can perform. This is why machine code for one CPU will not work on a different type of CPU: they will have different functions in their ALUs and different codes for similar functions. When you download a program into a microcontroller, it is machine code that is being sent to it, regardless of the language it was written in.

Assembly code

This is the closest you can get to writing binary, without actually doing so. It is designed to be somewhat humanly readable compared to a string of 1s and 0s. This code is passed through an assembler (part of the development software on the computer, in this case included in the Keil uVision software package), which then turns it into machine code and writes it to a .hex file for downloading.

C code

C code is a higher level language, which makes complicated procedures easier to write, understand, and follow. It also incorporates the use of variables. C code must first be processed through a compiler (also included with the development software), which turns it into assembly code, which is then assembled into machine code. The disadvantage with C is that you don’t really know what assembly code it produced to perform a particular line of C code. Since there are often several ways to get something done, you don’t know which method was used. It may or may not be the fastest or most memory efficient method, or, with high optimization enabled, it may be a completely non understandable method. Often none of this matters for non critical applications.

Programming in assembly

This section will refer to information documented on this website.
I will be explaining how to use this page, rather than just telling you how the code works, so that you get an idea of how to follow it for yourself and understand how other instructions work.

As previously explained there are 16 registers, R0-R15. R0 to R12 are safe for general use.

Using an instruction

If we scroll to Thumb-2 instruction set, look at the instruction called MOV.

MOV{S} Rd, <Operand2>                   Move                          Rd = Operand2

It means that:

  • the instruction name is called MOV
  • it moves data from 1 place to another
  • its function is as follows: Rd = Operand2
As described under Register Names, Rd is the destination register; this is where the data will be placed.
An operand is a number that will be operated on by a function, e.g. in 1+2=3, 1 and 2 are operands.
MOV will move the operand to the destination register.
  • under Optional Parameters, anything inside { } is optional, this means the instruction may be written as MOV or MOVS
At the bottom of Condition Flags, it can be seen that adding S to the instruction will set or clear the appropriate flags based on the result of the function just performed.
If S is not added, the flag bits will remain as they were the last time they were set.
  • MOV{S} Rd, <Operand2> means we can set the destination register to a value and optionally set the flags.

Going back to Condition Flags, if you look at "<Operand2> may be one of the following:"
you will see a table of the ways you can specify this value.
The 2 main ways are:

  1. #imm8<<imm5 (this is the <Operand2>). The shift amount is 5 bits because 11111 is 31, and shifting by 31 places moves bit 0 all the way to bit 31, the last bit of a 32 bit value.
  2. Rm (this is the <Operand2>)

Under Immediate constants, we see that #imm8 means an immediate (a typed number, not a number stored in a register or an address) 8 bit value. x<<y means shift the bits of value x, y times to the left, so 5<<1 becomes 10 (00000101 shifted left 1 time becomes 00001010)
MOV R0,#5<<1 would load 10 into R0 (this is the MOV Rd, #imm8<<imm5 method)

Note: #imm8<<imm5 can also be written by typing the number in full after the #, as long as its set bits all fit within 8 consecutive bits of the 32, with the rest of the bits left at 0

Under Register Names, Rm indicates the Second operand register
so the MOV command can also be used like:
MOV R0,R1 This will load the value from R1 into R0 (This is the MOV Rd, Rm method)

Back under Thumb-2 instruction set, there are another 2 important forms of the MOV instruction
these are:

MOV Rd, #<imm16>                        Move wide                     Rd = imm16
MOVT Rd, #<imm16>                       Move top                      Rd[31:16] = imm16
                                                                      the constant is put in the upper 16 bits of Rd
                                                                      the lower 16 bits are unaffected

Let's look at MOV Rd, #<imm16> Move wide Rd = imm16
This allows us to move a typed 16 bit number directly into a register, for example: MOV R0,#65535 (#65535 can be written as #0xFFFF or #2_1111111111111111 in Keil uVision; each program has its own way of writing other number formats, and the 2 here describes base 2, which is what binary is). Since it loads a 16 bit number, it only uses bits 0 to 15 of R0 (which is 32 bit). A range of bits is sometimes described in the format [15:0], meaning bit 0 through to and including bit 15. This instruction overwrites the entire register's contents, including the upper bits [31:16], which are set to 0.

So how do we set the upper bits [31:16] ?
We could MOV the upper bits first (into [15:0]), LSL (left shift the bits, see Thumb-2 instruction set) to move them into the upper bits [31:16], and then fill [15:0] by combining in the lower bits (a second MOV to the same register would overwrite the upper bits again), but a much simpler way exists.

There is another variation of the MOV command, and that is the:

MOVT Rd, #<imm16>                       Move top                      Rd[31:16] = imm16

This loads 16 bits from a typed number and places them at the top bits at [31:16] in other words it takes bits [15:0] from your value and loads them into bits [31:16] of the destination register while leaving the lower bits unaffected.
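So to load 0xDEADBEEF into R0 we can simply do:

MOV R0,#0xBEEF		;loads the lower 16 bits [15:0] and sets [31:16] to 0
MOVT R0,#0xDEAD		;loads the upper 16 bits [31:16] while leaving [15:0] unaffected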


The reason we can not do the second instruction first and the first one second is because MOV will erase and overwrite what MOVT did and place 0s there in the upper bits.
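
The effect of the two instructions on a 32 bit register can be modelled in C. This is a sketch of the behaviour described above, with made-up function names; it is not how the instructions are actually used from C.

```c
#include <stdint.h>

/* Model of MOV Rd,#imm16: the whole register is replaced,
   so the upper bits [31:16] become 0. */
uint32_t mov_wide(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* Model of MOVT Rd,#imm16: the upper bits [31:16] are replaced,
   the lower bits [15:0] are kept. */
uint32_t mov_top(uint32_t rd, uint16_t imm16)
{
    return (rd & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}
```

`mov_top(mov_wide(0xBEEF), 0xDEAD)` gives 0xDEADBEEF. Doing it the other way round, with `mov_wide(0xBEEF)` applied last, gives 0x0000BEEF because the upper bits are overwritten with 0s.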

Another interesting use is the:

MVN{S} Rd, <Operand2>                   Move not                      Rd = 0xFFFFFFFF EOR Operand2

This inverts all the bits from the source (which remains unchanged) and places the result into the destination register. Remember that <Operand2> can be another register or a typed immediate value, as described under <Operand2> in Condition Flags.

Condition Flags

Condition Flags are bits that are set and cleared based on the result of a computation or comparison.
These are useful because the next instruction can read a particular flag of your choosing. If it detects that it is set, the instruction will be executed, otherwise it will be skipped and it will go to the next line. The way to do this is append (add to the end) the condition code you wish to check to the instruction name. If for example we have previously compared 2 numbers, and we want to set a register only if they were equal, you append EQ to MOV. It then becomes MOVEQ

CMP R0,#15 ;an explanation of CMP can be found under the Thumb-2 instruction set
MOVEQ R1,#1 ;R1 will be loaded with 1 only if the Z flag is true, which will be the case if R0 was 15

Note: CMP did not need an S at the end of it (CMPS) because it is inherent within that particular instruction, that is why you don’t see S listed as optional, but you can add it anyway if you like.
Condition flags can be used together with S if you want to execute an instruction based on the result of the line before, and then if it runs, set the flags again based on its own result. If this is done the S is added first and then the condition code, such as ADDSEQ, see bottom section of Condition Flags
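
How a comparison sets the flags, and how a conditional instruction then uses them, can be modelled in C. The sketch below is an illustration only: the names are made up, and the four flags follow the usual ARM meanings of N (negative), Z (zero), C (carry) and V (overflow).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n;     /* negative: bit 31 of the result is set */
    bool z;     /* zero: the result was 0, so the values were equal */
    bool c;     /* carry: for a subtraction, set when no borrow was needed */
    bool v;     /* overflow: the signed result did not fit in 32 bits */
} flags_t;

/* Model of CMP a,b: subtract b from a, keep the flags, discard the result. */
flags_t compare(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    flags_t f;

    f.n = (result >> 31) != 0;
    f.z = (result == 0);
    f.c = (a >= b);
    f.v = (((a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}

/* Model of MOVEQ Rd,#value: Rd only changes if the Z flag is set. */
uint32_t move_if_equal(flags_t f, uint32_t rd, uint32_t value)
{
    return f.z ? value : rd;
}
```

With R0 holding 15, `compare(15, 15)` sets Z, so `move_if_equal` returns the new value. With any other number in R0, Z is clear and the register keeps its old contents.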


Comments

Comments are lines of text for you, the programmer, to read; they are not part of the code. For example, in the code “MOV R0,#42 Sets R0 to 42”, “Sets R0 to 42” is the comment (usually something more meaningful would be written). This is actually invalid code. To tell the assembler that a piece of text is to be ignored as code and treated as a comment, a ; character needs to be placed in front of it, such as:

MOV R0,#42			;Sets R0 to 42

Comments can also be used to temporarily disable 1 or more lines of code while testing, by placing a ; in front of them.


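The stack

The stack is an area of memory dedicated to holding variables and function return addresses. It functions as a last in, first out system. The way this works is that a stack pointer holds the address at which data will be written or read. In the simplified model used here, it is increased by 1 when writing to the stack and decreased by 1 when reading from the stack (on the STM32 itself the stack grows downwards in memory, so the pointer decreases on a PUSH and increases on a POP, but the last in, first out behaviour is the same). The stack pointer is stored in register R13 on the STM32.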
To read these diagrams, read the entire left hand side of a diagram first, and then the entire right hand side of the same diagram.
PUSH means to place onto the stack.
POP means to read from the stack.

PUSH 2 on to the stack (write the number to stack pointer's address then increase the pointer's address)
Before & During Write After Write
0←stack pointer 2
0 0←stack pointer
0 0
0 0

PUSH 6 on to the stack (write the number to stack pointer's address then increase the pointer's address)
Before & During Write After Write
2 2
0←stack pointer 6
0 0←stack pointer
0 0

PUSH 29 on to the stack (write the number to stack pointer's address then increase the pointer's address)
Before & During Write After Write
2 2
6 6
0←stack pointer 29
0 0←stack pointer

POP a value out from the stack (decrease the stack pointer's address and then return the number at the stack pointer's address)
Before Read During & After Read
2 2
6 6
29 29←stack pointer
0←stack pointer 0

29 is returned

POP a value out from the stack (decrease the stack pointer's address and then return the number at the stack pointer's address)
Before Read During & After Read
2 2
6 6←stack pointer
29←stack pointer 29
0 0

6 is returned

PUSH 123 on to the stack (write the number to stack pointer's address then increase the pointer's address)
Before & During Write After Write
2 2
6←stack pointer 123
29 29←stack pointer
0 0

Remember that each line has its own address. This also means that each function/subroutine you create has an address for its first line; to run it, that address must be moved into the Program Counter.
After the subroutine has completed, how do we get back to where we were in the main code?
One method is to PUSH (write onto the stack) the address of the next line before jumping to a subroutine. That way, when the subroutine is complete, we just POP (read out from the stack) the address back into the PC.
The main advantage of the stack reading out values in the reverse order to that in which they were entered is that one subroutine can call another subroutine, which can call yet another subroutine, and each will always exit to the one that called it (the one above it in the stack).
Doing so avoids the user having to think about how many levels of calls they are in and where they should return; it is all handled by the stack automatically.
Before jumping to a subroutine, you PUSH the address of the next line onto the stack (the stack pointer increases), and when the subroutine is done, you POP the address back out of the stack into the PC (the stack pointer decreases).

Another use for the stack is to temporarily PUSH registers onto it if there are no more free registers. After they are backed up in the stack, the registers are free to be overwritten by a new computation, and when you are finished with that, the original values may be restored back to the registers with POP.
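
The PUSH and POP diagrams above can be reproduced with a small C model. It follows the simplified model of the diagrams (write then increase the pointer, decrease the pointer then read) with a 4 entry stack; the type and function names are made up for this example.

```c
#include <stdint.h>

#define STACK_SIZE 4

typedef struct {
    uint32_t memory[STACK_SIZE];
    uint32_t pointer;           /* position of the next free entry */
} stack_model_t;

/* PUSH: write the value at the stack pointer, then increase the pointer.
   Returns 0 on success, -1 if the stack is full. */
int stack_push(stack_model_t *s, uint32_t value)
{
    if (s->pointer >= STACK_SIZE) {
        return -1;
    }
    s->memory[s->pointer] = value;
    s->pointer++;
    return 0;
}

/* POP: decrease the stack pointer, then return the value it now points at.
   Returns 0 on success, -1 if the stack is empty. */
int stack_pop(stack_model_t *s, uint32_t *value)
{
    if (s->pointer == 0) {
        return -1;
    }
    s->pointer--;
    *value = s->memory[s->pointer];
    return 0;
}
```

Pushing 2, 6 and 29, popping twice (29, then 6) and then pushing 123 leaves the memory holding 2, 123, 29, 0, the same as the last diagram. Note that a POP does not erase anything; the old 29 stays in memory until something is pushed over it.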


Each of these lines is a separate example; together they should not be considered a program.

MOV R0,#42		;Sets R0 to 42
MOVS R0,#42		;Sets R0 to 42, and sets/clears the flags for this value
MOV R0,R1		;Sets R0 to the value within R1
MOV R0,#3<<1		;Sets R0 to 6
MOVT R0,#0xFFFF		;sets [31:16] of R0 while leaving [15:0] unaltered
MVN R0,#0		;sets R0 to 0xFFFFFFFF (all 32 0s are inverted to form 32 1s)
CMP R0,#15		;compare instruction
MOVEQ R1,#1		;R1 will be loaded with 1 only if the Z flag is true, which will be the case if R0 was 15
MOVSEQ R1,#1		;R1 will be loaded with 1 only if R0 was 15, additionally if it was 15 will also set and clear the flags based on the number 1
  • Each line is assembled by the assembler on the computer into binary.
  • Each line of binary (from a line of assembly) is placed in its own address in the program flash memory (after downloading it from the computer to the microcontroller).
  • The PC (program counter) is incremented automatically to point to the address of the next line after executing the current line
  • The stack pointer is incremented when data is pushed into it, and decremented when data is popped out (read) from it

Example Code

Rather than going through what every instruction does, I will go though a simple program, explain how it works, and several ways it can be improved.
This should give you a sense of how the instructions are used in practice in a real program.

These functions were set up to be called from C. As such, they follow the C standard of passing a number into a function through R0, and passing the result back to C through R0.

Counting how many binary 1s a value has

256 clock cycle run time

count1s			MOV r1,#0	;reset 2 registers (R1 and R2) we will be using to value 0, notice the words count1s and loop, these are labels
					;when you type the name of a label in the code the assembler will automatically replace the word
					;with the address of the line given that label (so you don't have to work it out yourself)

			MOV r2,#0
loop			ANDS r3,r0,#1	;AND the input value (R0) with 1 to separate its LSB, and store it into R3
					;the S instructs the flags to be updated based on the result that went into R3

			ADDNE r1,r1,#1	;adding NE to ADD means this line will only run if the NE condition (not equal) is true, otherwise it will be skipped.
					;If you look at Condition flags under "ARM: Cortex-M3 Thumb-2 instruction set", NE indicates that Z = 0,
					;meaning that if the number does not equal zero (z=0 means zero=false). So if the LSB (from the above instruction) was not 0 (it was a 1),
					;then add 1 to r1 and store it back into r1 (the 1s count will be in here)

			LSR r0,r0,#1	;logical shift right (shift the bits right) the value in r0 (the value we are checking the number of 1s in) 1 times.
					;so we have checked and counted the LSB, now we shift it right so the next bit becomes the LSB to check on the next loop
			ADD r2,r2,#1	;add 1 to the loop counter (using R2 for that) so we can check how many times we have looped
			CMP r2,#32	;compare (check) if the loop counter is 32 yet (we are checking a 32 bit value, so we need to loop 32 times),
					;the CMP instruction automatically updates the condition flags
					;comparison is done by subtracting the value to be compared, if they are equal the result will be 0
			BNE loop	;B is branch (jump) which loads an address into PC (R15).
					;BNE means branch if not equal, so if the above comparison was not equal (the Z flag was 0, not zero)
					;then jump to address of label loop, otherwise skip this line.
					;so if R2 was not equal to 32 then we jump to loop otherwise we continue onto next line

			MOV r0,r1	;the loop is complete, the count is in r1, now copy the result to r0 for C as per standard
			bx    lr	;jumps back to the address in the link register (R14) which C filled for the return address back to the C function that called the assembly routine
					;the link register is simply a fast method for single jump function calls compared to using the stack
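
For comparison, the same algorithm can be written in C. This is a sketch of what the assembly above does, step for step, with a made-up function name; it is not the output of any compiler.

```c
#include <stdint.h>

/* Counts the 1s in a 32 bit value by always looping 32 times. */
unsigned count_ones_fixed(uint32_t value)
{
    unsigned ones = 0;                      /* r1: the 1s count */

    for (unsigned i = 0; i < 32; i++) {     /* r2: the loop counter */
        if (value & 1u) {                   /* ANDS + ADDNE: is the LSB a 1? */
            ones++;
        }
        value >>= 1;                        /* LSR: next bit becomes the LSB */
    }
    return ones;                            /* returned to the caller in r0 */
}
```

The loop always runs 32 times, even when the remaining bits are all 0, which is what the next version improves on.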

192 clock cycle worst case run time

count1s2		MOV r1,#0	;set r1 (the 1s count to 0)

loop2			CBZ r0,ret2	;compare and branch (jump) to label ret2 (we go here when the count is complete) if the input value r0 was 0
					;This is handy to avoid looping if the input value is 0, or has become 0 due to all the 1s being shifted out already
					;to avoid additional loops being performed when not required. Additionally, this eliminates the need for a loop count
					;since it quits when there are no more 1s to count
					;CBZ can be found under Thumb-2 instruction set, and has an alternate form of CBNZ if you want to branch if something was Not Zero

			LSLS r0,r0,#1	;logical shift left the input value r0 by 1 place and update the condition flags (S at the end for set flags).
					;shifting left (making the number 2x larger) instead of right causes 1s to fall off the MSB side.
					;if a 1 falls off the left hand side it ends up in a special bit called the carry flag (meaning the result no longer fits in 32 bits)
					;this LSLS has a 16 bit encoding; Thumb-2 mixes 16 bit and 32 bit instruction encodings
					;a 16 bit encoding takes half the flash space of a 32 bit one, so there is less to fetch
					;Note: most 16 bit instructions can only access R0 to R7, while 32 bit ones can access all registers

			ADDCS r1,r1,#1	;if the carry flag (which now holds the bit that was shifted out of bit 31) is set then add 1 to the 1s count (in r1)

			B loop2	;branch to address of label loop2

ret2			MOV r0,r1	;when the loop is complete and r0 is 0, this line will be jumped to and the result placed into r0

			bx    lr	;branch back to C function
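
As a rough C model of the routine above, the carry flag can be imitated by reading bit 31 just before the shift. The function name count_ones_early_exit is hypothetical and the sketch only mirrors the logic, not the timing.

```c
#include <stdint.h>

/* C model of the shift-left loop with early exit.
   The bit that leaves bit 31 stands in for the carry flag. */
static uint32_t count_ones_early_exit(uint32_t value)
{
    uint32_t count = 0;                  /* r1: the 1s count */

    while (value != 0) {                 /* CBZ r0,ret2 */
        uint32_t carry = value >> 31;    /* bit 31 before the shift is the carry after it */
        value <<= 1;                     /* LSLS r0,r0,#1 */
        if (carry) {                     /* ADDCS r1,r1,#1 */
            count++;
        }
    }                                    /* B loop2 */
    return count;                        /* MOV r0,r1 */
}
```

For an input of 0x80000000 the loop body runs once, while for 0x00000001 it runs 32 times, so the worst case is an input whose lowest bit is set.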

160 clock cycle worst case run time

count1s3		MOV r1,#0	;set the 1s count to 0

loop3			LSLS r0,r0,#1	;shift the input value bits left once and set the flags (S at the end)

			ADDSCS r1,r1,#1	;increase the count if a 1 was shifted into carry (add if carry set); if this runs it also sets the flags (the S after ADD)

			BNE loop3	;note, this may take the flags from LSLS or from the conditional ADD, depending on whether the ADD ran or not.
					;if there was a 1 then the ADD ran, and we jump to loop3 because the count cannot be equal to 0 (NE)
					;if there was not a 1, the ADD didn't run, so the flags from LSLS are used: if the shifted value is not equal (NE) to 0 we are not done yet, so loop again

			MOV r0,r1	;done, copy the result to r0

			bx    lr	;branch back to C function

160 clock cycle worst case run time

count1s4		MOV r1,#0	;set the 1s count to 0

loop4			LSLS r0,r0,#1	;shift the input value bits left once and set the flags

			ADC r1,r1,#0	;ADC is add with carry. It will add the 2 numbers supplied and then add an extra 1 if the carry is set
					;in this case r1 is being added with 0 so it is r1+0+carry or just r1+carry

			BNE loop4	;if the shift result is not equal to 0 then we are not done yet, jump back to loop4

			MOV r0,r1	;done, copy the result to r0

			bx    lr	;branch back to C function
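
The worst case figures quoted for the shift-left routines can be approximated with simple arithmetic. The sketch below assumes 1 cycle for each non-branch instruction in the loop and 3 cycles for the branch back to the top. Both numbers are assumptions for illustration, not values taken from this page, and the setup and return instructions are ignored. The function name worst_case_cycles is hypothetical.

```c
#include <stdint.h>

/* Rough worst-case estimate for a loop: every instruction in the body is
   charged one cycle, except the branch back, which is charged branch_cycles.
   Both costs are assumptions, not measured values. */
static uint32_t worst_case_cycles(uint32_t single_cycle_instrs,
                                  uint32_t branch_cycles,
                                  uint32_t passes)
{
    return (single_cycle_instrs + branch_cycles) * passes;
}
```

Under these assumptions, a loop of CBZ, LSLS, ADDCS and B over 32 passes gives worst_case_cycles(3, 3, 32) = 192, and a loop of LSLS, one add and BNE gives worst_case_cycles(2, 3, 32) = 160.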

80 clock cycle worst case run time

count1s5		LSRS r1,r0,#31		;this means r1=r0>>31 (r0 logical shift right 31 times)
						;this instruction moves the MSB of the input value into the LSB of count for the first loop, this line will not be re-executed
loop5			LSLS r0,r0,#2		;shift the input value left 2 times and set the flags
						;since we copied the MSB already in the previous instruction,
						;it does not matter that it will be lost (this only applies to the first round of the loop)
						;shifting once puts the MSB into the Carry bit
						;Shifting once more puts bit 30 into the Carry bit and the previous carry is lost

			ADC r1,r1,r0,lsr #31	;this means r1=r1+(r0>>31)+carry
						;this shift can be applied to the last register operand of most data processing instructions
						;this is because there is a built-in shifter in the data path, and because of this it takes no extra time
						;so we get 2 operations done in the same amount of time as 1
						;on the first pass count already holds the old MSB (bit 31) in its LSB, bit 30 is in Carry and the old bit 29 has become the new MSB of r0
						;so add with carry sets count to count+(input value>>31)+carry
						;by doing this we add old MSB+new MSB+carry, counting 2 more bits on every pass

			BNE loop5		;if the Z (zero) flag is 0 (as set by LSLS r0,r0,#2) then branch to loop5
						;if the shift caused r0 (the input value) to become 0, then the Z flag will become 1
						;so we only branch back to loop5 while Z is 0
						;or in other words, we loop while the input value has not become 0 yet

			MOV r0,r1		;done, store the result in r0

			bx    lr		;branch back to C function
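
The two-bits-per-pass idea above can be checked with a small C model. This is a sketch only: the name count_ones_two_per_pass is hypothetical, and the carry flag is imitated by reading bit 30 before the shift, since that is the last bit pushed out by a shift of 2.

```c
#include <stdint.h>

/* C model of the routine that counts two bits per pass. */
static uint32_t count_ones_two_per_pass(uint32_t value)
{
    uint32_t count = value >> 31;              /* LSRS r1,r0,#31: start with bit 31 */

    do {
        uint32_t carry = (value >> 30) & 1u;   /* last bit shifted out by LSLS #2 */
        value <<= 2;                           /* LSLS r0,r0,#2 */
        count += (value >> 31) + carry;        /* ADC r1,r1,r0,lsr #31 */
    } while (value != 0);                      /* BNE loop5 */

    return count;                              /* MOV r0,r1 */
}
```

Because each pass consumes two bits, the body runs at most 16 times for a 32 bit input.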

~1Hz oscillator

This section refers to information documented on this website and in the STM32F103xx Reference manual and Specifications datasheets.
Datasheet page numbers are based on revision 13 of the Reference manual document and revision 7 of the Specifications document.
This oscillator uses pin PA0 and assumes the PLL has already been set to run at 72 MHz.

version 1

RCC_APB2ENR		equ 0x40021018		;this isn't code, but tells the assembler to replace the word RCC_APB2ENR with 0x40021018 every time it is found
						;these addresses are documented beginning at page 50 under Memory Map in the Reference manual, and page 38 of the Specifications
GPIOA_CRL		equ 0x40010800		;port A configuration register low (pins 0 to 7)
GPIOA_ODR		equ 0x4001080C		;port A output data register

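oscillator		MOV r0,#1<<2			;bit 2 of RCC_APB2ENR is IOPAEN, the clock enable for GPIO port A
			MOV32 r7,#RCC_APB2ENR		;load the 32 bit register address into r7
			STR r0,[r7,#0]			;write it: port A is now clocked (note this overwrites the other enable bits in the register with 0)
			MOV32 r0,#0x44444443		;one hex digit per pin: 3 sets PA0 to push-pull output at 50MHz, 4 leaves PA1 to PA7 as floating inputs
			MOV32 r7,#GPIOA_CRL
			STR r0,[r7,#0]			;write the pin configuration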

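			MOV r0,#1			;value that drives PA0 high (bit 0 of ODR set)
			MOV r2,#0			;value that drives PA0 low

			MOV r3,#0			;outer delay counter
			MOV r4,#0			;inner delay counter
			MOV r5,#1000			;outer delay limit
			MOV r6,#3600			;inner delay limit

			MOV32 r7,#GPIOA_ODR
loop			STR r0,[r7,#0]			;PA0 high
			BL delay			;wait; BL stores the return address in lr
			STR r2,[r7,#0]			;PA0 low (r2 holds the 0 loaded above)
			BL delay
			B loop				;repeat forever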

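delay			ADD r3,r3,#1			;count one more pass of the outer loop
			CMP r3,r5			;has the outer counter reached its limit (r5)?
			MOVEQ r3,#0			;if so, reset it for the next call
			BXEQ lr				;and return to the caller
delay2			ADD r4,r4,#1			;count one more pass of the inner loop
			CMP r4,r6			;has the inner counter reached its limit (r6)?
			MOVEQ r4,#0			;if so, reset it (MOV without S leaves the flags from CMP unchanged)
			BNE delay2			;otherwise keep spinning in the inner loop
			B delay				;inner loop finished, back to the outer loop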
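
To see how the two delay limits relate to the output frequency, the arithmetic can be sketched in C. The number of clock cycles per pass of the inner loop is not stated on this page, so it is left as a parameter, and any value passed in is an assumption rather than a measurement. The names delay_cycles and output_hz are hypothetical, and the cost of the outer loop instructions is ignored.

```c
#include <stdint.h>

/* Cycles spent inside one call to delay, counting only the inner loop.
   The inner loop runs on every outer pass except the last one, where the
   outer counter reaches its limit and the routine returns. */
static uint64_t delay_cycles(uint32_t outer_limit,      /* r5 */
                             uint32_t inner_limit,      /* r6 */
                             uint32_t cycles_per_pass)  /* assumed, not measured */
{
    return (uint64_t)(outer_limit - 1u) * inner_limit * cycles_per_pass;
}

/* One output period is two delays: one with PA0 high and one with PA0 low. */
static double output_hz(double clock_hz, uint64_t cycles_per_delay)
{
    return clock_hz / (2.0 * (double)cycles_per_delay);
}
```

With the limits used above (1000 and 3600) and a 72 MHz clock, an assumed 10 cycles per inner pass gives 35,964,000 cycles per delay and an output of about 1.0 Hz, while an assumed 6 cycles per pass gives about 1.67 Hz. The real figure depends on the actual instruction and flash timing.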