[En-Nut-Discussion] Questions regarding UART implementation
Ulrich Prinz
uprinz2 at netscape.net
Tue Oct 5 00:38:29 CEST 2010
Hi Thiago!
On 02.10.2010 20:59, Thiago A. Corrêa wrote:
> [...]
> I understand that we are not bound to how it's done on the PC, but
> that's definitely not how it works there :) You define the "packet"
> size with an ioctl, not the read call. It would make more sense
> indeed the way you propose, so you can handle different "packet"
> sizes
>
Yes, that's what I have in mind. If you do packet-oriented
communication, in many cases you know exactly what's going out and
what to expect. It then saves a lot of CPU time and memory if you write
the software around the packets instead of searching for start/end
markers, counters and lengths, and copying from one buffer to another.
>>>
>> That sounds good. There definitely must be an ioctl option to check
>> whether the return from read/write was caused by transfer errors or
>> just a timeout. I have to think about that again, but normally it
>> looks like this:
>>   rc = write(block, size, timeout);
>>   rc == size -> good transfer
>>   rc <  size -> transfer aborted because of error or timeout
>>   rc == 0   -> transfer timed out entirely
>>   rc == -1  -> transfer aborted with error
>
> Then it would be changing the public API. The timeout could be done
> as in the PC world with an ioctl. I think that's how it's done today
> in Nut/OS as well, but not sure. I would like to keep read/write as
> standard C calls.
I didn't intend to change write and read. My intention was something else:
The basic change needed for nice packet transfer is to change
*StartTx(void) and *StartRx(void) to *StartTx(RINGBUF *rbf) and
*StartRx(RINGBUF *rbf).
The functions can then check whether the data is delivered in the
ring buffer (rbf->rbf_cnt != 0) or as a packet (rbf->rbf_blkptr != NULL
or rbf->rbf_blkcnt != 0).
With that small change, a no-ringbuffer option can be adapted to all
platforms, even with IRQ-driven transfers. It makes sense as you save the
RAM for the ring buffer and the copy from ring buffer to your buffers.
The second change then would be to upgrade platforms that support DMA or
PDC to use these mechanisms as it again saves CPU time.
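A minimal sketch of how that mode check could look. The field and
function names here are assumptions for illustration, not the real
usart.h; the `is_block_transfer()` helper follows your later suggestion
of hiding the comparison behind a readable name:

```c
#include <stddef.h>

/* Hypothetical mirror of a RINGBUF extended with block-mode fields.
 * Names are invented for this sketch, not taken from the real header. */
typedef struct {
    unsigned char *rbf_blkptr; /* caller's packet buffer, NULL in ring mode */
    size_t rbf_blkcnt;         /* bytes requested for the packet transfer */
    size_t rbf_cnt;            /* bytes currently in the ring buffer */
} RINGBUF;

/* One readable test instead of magic-number comparisons in the driver. */
static int is_block_transfer(const RINGBUF *rbf)
{
    return rbf->rbf_blkptr != NULL || rbf->rbf_blkcnt != 0;
}

/* Sketch of how StartTx() could branch on the transfer mode. */
static void StartTx(RINGBUF *rbf)
{
    if (is_block_transfer(rbf)) {
        /* Hand rbf->rbf_blkptr / rbf->rbf_blkcnt to DMA or the TX ISR. */
    } else if (rbf->rbf_cnt != 0) {
        /* Classic ring-buffer transmit path. */
    }
}
```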
>
>>> True. Even if we implement the packet based API, we would have
>>> to make the serial buffer size configurable and make the call
>>> fail with some error code if it requests a packet larger than the
>>> buffer.
>>>
>> No. If you provide a buffer pointer to a buffer that is smaller
>> than the size you set for the expected bytes, it is your problem.
>> You cannot detect the size of a buffer behind a void* pointer. What
>> we need to do is to prevent the reception of more bytes than the
>> read was called for. These additional bytes need to be discarded.
>
> That would likely mess with the protocol. But the user could use
> flowcontrol with that feature to avoid losing data.
As Harald described to me recently, the mechanism works like this:
You throw bytes into the transmit buffer. Nothing happens until you do
one of two things:
- You put some more bytes into the buffer, filling it above the high
water mark, which gets the transmit routine going.
- You issue a write(fb, 0, 0), which is the same as fflush(fb).
If you send more characters than the buffer can take, your thread will
be blocked until at least the last character is in the buffer.
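A toy simulation of that high-water-mark behaviour. All names and the
watermark value are invented for this sketch; the real driver computes
its own marks and drains the buffer from an ISR:

```c
#define TXBUF_SIZE      16
#define HIGH_WATERMARK  12   /* invented value for the sketch */

static unsigned char txbuf[TXBUF_SIZE];
static int txcnt;
static int tx_running;

/* Stands in for the interrupt-driven transmitter being kicked off. */
static void start_transmit(void)
{
    tx_running = 1;
    txcnt = 0;   /* pretend the ISR drains the buffer instantly */
}

/* Buffer one byte; transmission only starts above the high-water mark. */
static void tx_put(unsigned char c)
{
    txbuf[txcnt++] = c;
    if (txcnt >= HIGH_WATERMARK)
        start_transmit();
}

/* write(fb, 0, 0) a.k.a. fflush(fb): force out whatever is buffered. */
static void tx_flush(void)
{
    if (txcnt > 0)
        start_transmit();
}
```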
Reading is even more difficult if you want to do packet communication:
You request to read 128 characters. So you call
rc = read(fb, mybuffer, 128, 1000), which tells read to block your thread
for 1 second and hope that the needed characters come in.
rc is then the number of characters that came in.
The timeout is done by a simple NutEventWait(usarthandle, timeout).
After that timeout it comes back to you with anything between 0 and
128 characters.
So what you get in that time can be less or far more than you
expect (the rest is waiting in the ring buffer). For a terminal
connection this is perfectly fine, as you don't care how many characters
have already been sent from the other side, and you are happy if you get
the string the user typed when the software arrives at your scanf() or
gets() function.
With packet communication you expect a packet of a specific length within
a specific time. Nothing less, nothing more.
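A toy model of those read() semantics, simulated rather than the real
Nut/OS implementation: the caller gets anything from 0 up to the
requested count, and whatever arrived beyond the request stays behind
in the ring buffer:

```c
#include <stddef.h>

/* The "ring buffer" here is just a counter of bytes that arrived
 * before the (simulated) timeout fired. */
static size_t avail;

static void bytes_arrive(size_t n) { avail += n; }

/* read(fb, buf, want, timeout): after the wait, return however many
 * of the wanted bytes actually showed up (0..want). */
static size_t read_sim(unsigned char *buf, size_t want)
{
    size_t rc = avail < want ? avail : want;
    for (size_t i = 0; i < rc; i++)
        buf[i] = 0xAA;   /* stand-in payload */
    avail -= rc;         /* the surplus stays waiting in the ring buffer */
    return rc;
}
```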
Especially if you need to do transceiver handling (DE/RE for RS485), it is
important to switch after the last stop bit is out and not a microsecond
earlier :)
I thought about your idea of using ioctl(). I think that works well.
You can add some flags and you'll be safe:
The rc = write(fb, packet, size, timeout) returns the number of
bytes actually written. This can be counted in the interrupt routine or
fetched from the DMA controller's counter register.
If you see that rc != size, you can request the detailed error
information from ioctl(). For the 32-bit systems there should be enough
bits left in the error flags to put in the DMA abort reasons.
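A sketch of that short-write-then-query pattern. The flag names and
both functions are invented stand-ins; in the real driver the status
word would come from an ioctl() request against the device:

```c
#include <stddef.h>

/* Hypothetical error flags; on 32-bit targets there is room to add
 * DMA abort reasons next to the classic UART error bits. */
#define UART_ERR_TIMEOUT   (1UL << 0)
#define UART_ERR_FRAMING   (1UL << 1)
#define UART_ERR_DMAABORT  (1UL << 16)

/* Stub standing in for a driver that transmitted only part of a packet. */
static size_t fake_write(const void *packet, size_t size, unsigned long *errs)
{
    (void)packet;
    *errs = UART_ERR_TIMEOUT;  /* pretend the transfer timed out */
    return size / 2;           /* short write: rc != size */
}

/* The pattern: a short write means "ask ioctl() why". */
static unsigned long check_transfer(const void *packet, size_t size)
{
    unsigned long errs = 0;
    size_t rc = fake_write(packet, size, &errs);
    if (rc != size)
        return errs;   /* in Nut/OS this would be an ioctl() status query */
    return 0;
}
```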
>
> [...]
> If one set of functions for all USARTs is possible, it would be
> worth it. Even if it's only a few of them :) Perhaps it's because of
> the duplicate code, or the AVR32 gcc compiler is not good in code
> optimization, but looks like a simple "do nothing" program using
> Nut/OS and AVR32 takes about 30kb of flash. And it's still only 2
> uarts enabled in the compilation. For the A0 series, there should be
> 4.
I ran some tests using TWI, USART and some other peripherals with the
Nut/OS port to STM32F107, and it took 14k. Now I have added a full CANopen
stack, lots of console output and a command interface, 3 USARTs and
structs for some external devices, and I am around 38k.
But I had a hard trip through linker scripts and makefiles. Maybe I can
check your scripts and Makefile in the next days.
>
> [...]
> Not sure how it works with STM32, but it looks like the DMA
> controller in AVR32 raises interrupts. This should fit well with how
> the usart handles the blocking of the "userspace" thread.
>
It is the same with the STM32.
You can choose between DMA-Half-Complete, DMA-Full-Complete and
DMA-Last-Byte interrupts. In addition, with USARTs you can still enable
the TX-Complete interrupt, which is the best choice for packet transfer,
because with this interrupt you are sure the bus is free.
So you set up a DMA transfer, enable the TXC interrupt and do whatever
you like. After the last byte is out (completely, including the STOP
bit) you get an interrupt.
As packet handling normally means that you send a packet to receive
information and then put the task to sleep until the information comes
in, the StartTx() function can do a NutEventWait(packethandle...).
The TX-Complete interrupt will then do a NutEventPost(packethandle).
StartTx() is called by write(), which in turn is called from the thread
that wants to write. So it runs in the thread's context, and therefore a
NutEventWait will pause the calling thread.
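The hand-off can be sketched like this. These are toy stand-ins for
NutEventWait()/NutEventPost() using a plain flag; the real calls block
the thread on a queue instead of returning -1 when nothing was posted:

```c
/* Simulated event: 0 = nothing posted yet, 1 = posted by the ISR. */
static int packet_done;

static void NutEventPost_sim(void) { packet_done = 1; }
static int  NutEventWait_sim(void) { return packet_done ? 0 : -1; }

/* write() path: set up DMA, enable TXC, then sleep the calling thread. */
static int StartTx_packet(void)
{
    /* 1. program the DMA channel with the packet buffer (omitted) */
    /* 2. enable the USART TX-Complete interrupt (omitted)         */
    /* 3. block this thread until the ISR posts the event          */
    return NutEventWait_sim();
}

/* TX-Complete ISR: the last stop bit is out, wake the writer. */
static void TxCompleteIrq(void)
{
    NutEventPost_sim();
}
```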
> [...]
> Ok. So, I guess for now let's go with private API's for DMA. Right
> now must implement other things that are priority for the projects
> I'm working with, but I will try to at least follow the same func.
> signature you use.
>
Fine. That's why I thought it makes more sense to split that work
into two parts. First I will modify usart.h to pass the rbf pointer to
all transmission-related functions. That can be done for all
architectures without breaking anything.
The second part then would be to hand over the dcb instead of the rbf.
That would enable smaller rbfs for packet mode and the use of one
single driver for any number of USARTs in one chip.
> [...]
> It's actually fairly common :) You could make macros/inline functions
> to make it easier to understand: is_ringbuffer() or
> is_block_transfer() or something like that. Reduces the magic number
> comparisons in the code.
>
That isn't the problem. But I already smashed Michael's brain with a short
excursus into DEVICE-BUS-NODE chains. I don't want to send you offline
too :)
No, just a short one:
Normally you write a device driver that uses a bus to communicate with
a node.
So my temperature sensor is a node on the TWI bus, and the LM75 driver
talks via the TWI bus with the registered LM75 node at address xy.
The EEPROM driver uses the same TWI bus but registers a node at 0x50, the
EEPROM.
Now, things get funny if you look at two things:
The terminal device registered as stdout registers an OLED driver that
connects to the SPI node (at /CS1) via the SPI bus.
The trick here is that drivers are chained. This is possible because
for SPI the DEVICE-BUS-NODE concept is established in Nut/OS.
With USARTs it is not, and with I2C it isn't either.
USARTs are registered as devices, which is wrong, as they are busses.
That's why someone could get dizzy while studying Nut/OS.
The terminal is the device that uses the USART bus to talk to a node,
which can be the PC at the other end.
If you implement it that way, you can write one thread that talks to
your small sensor devices at the other end of an RS485 bus. If you have
20 of them on one bus, you fire this single thread 20 times and pass the
sensor address as a parameter. It then registers the nodes at the USART
bus and happy you are.
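A sketch of that per-sensor pattern. The node struct and thread body
are hypothetical; in Nut/OS the thread would actually be started with
NutThreadCreate(), as noted in the comment:

```c
/* Hypothetical node descriptor for a sensor on the RS485/USART bus. */
typedef struct {
    int addr;   /* sensor address on the bus */
} SENSOR_NODE;

static SENSOR_NODE nodes[20];

/* Stand-in for the per-sensor thread entry; each instance would
 * register its node at the USART bus and then poll its sensor. */
static void SensorThread(void *arg)
{
    SENSOR_NODE *node = arg;
    (void)node;
}

/* One function, fired 20 times with a different address each time.
 * In real code each call would be something like
 * NutThreadCreate("sens", SensorThread, &nodes[i], 512); */
static void StartSensors(void)
{
    for (int i = 0; i < 20; i++) {
        nodes[i].addr = i + 1;
        SensorThread(&nodes[i]);
    }
}
```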
With the current USART implementation you need to write a dispatcher that
handles all traffic and distributes it to the threads.
For that reason I already rewrote the STM implementation of the TWI bus.
Let's see how it runs in the next few weeks.
>
> Well, the debug port is actually a different driver, without
> interrupts. I was thinking about sharing ring buffer handling or
> high watermark/low watermark handling.
>
I did it the other way round. With the nutconf option you just declare
one of the UARTs to be the debug port. The rest is identical. But you
have to have interrupts running. If you adapt the driver so that at least
the ringbuffer pointer is passed to all functions, you can add a polling
driver as an option for all USARTs.
>
> I tried devnut_cortexm3 at work and some other I can't remember atm.
> Didn't seem to have problems with it. I will try Monday on
> devnut_m2n
>
The other Cortex branches were just interim ones. I needed them for
cross-merging some ports I got. I'll delete them in a few days. As soon
as the STM port is running fine and Michael has done some work on his
Luminary port, I'll think about switching it to the cortex or stm32 branch.
But the best would be if we could merge it into the trunk.
>
>>>>
>>>> By the way, Option 2 is what I did for TWI because STM32 has two
>>>> interfaces and 4 interrupt providers (two per interface) that
>>>> call the same code existing only once. Old Tw*() functions are
>>>> #defined to the STM32-specific functions. Works fine here :)
>>>>
>>>
>>> Yesterday I was thinking about a platform independent TWI. So we
>>> could have platform independent drivers to access EEPROMs and
>>> Atmel QTouch chips.
>>>
>> ... Now I am sad... I already did that.
>
> That's good news :) Is it in trunk or just in one of the branches?
>
The EEPROM driver is in the trunk. The STM version is a bit more
sophisticated. It doesn't really need much more flash or RAM, but works
around a problem with heavy cross traffic (multiple devices sharing the
TWI bus) and with those small Microchip EEPROMs where the lower three bits
of the chip address are misused as a page select for the internal address.
I hate Microchip, they have no idea of what I2C is.
And they already cost us a lot of money with their whatsoever bus.
> [...]
>> The gpio.h then only includes an architecture specific xxx_gpio.h
>
> I'm starting to take a look at what you wrote in the wiki. I'm going
> to start another thread on that subject :)
>
Yes, we should keep that out of this thread :) But feel free to add
AVR32-related things to the wiki and to add corrections and opinions.
[...]
So, best regards, good nite and CU!
Ulrich