[En-Nut-Discussion] Questions regarding UART implementation

Thu Sep 30 09:00:48 CEST 2010

Sh...
eMail client crashed. So again :)

On 29.09.2010 21:29, Thiago A. Corrêa wrote:
> Hi Ulrich,
>
> There are a lot of topics here :)
>
>>
>> For the blockread / blockwrite functions I found a misleading
>> description. There should be three totally different things:
>> USART_BLOCKWRITE should control block transfer for writing.
>> USART_BLOCKREAD should control block transfer for reading.
>> USART_BLOCKING should control if the calling thread is blocked
>> until transceiving is done. I think there is an option ASYNC too
>> which I would call the one that controls if a thread is blocked or
>> not on calling a usart transfer (read or write). But that is not
>> working or not implemented.
>
> From what I understand, it helps to read from devices which send a
> fixed size "packet". AFAIK it's not implemented in any of our archs.
> Even on PCs it's fairly uncommon to use it. Even ppl who write bad
> code to read from barcode scanners, don't usually use that feature.
> It's usually much better to find the STX or ETX yourself, or
> otherwise parse the protocol properly.
>
That will cause double overhead!
First you build the packet you want to send, then usart.c copies it
into a buffer that needs to be large enough. And you need to do a flush,
because the buffer is only sent once it is full enough.
With my idea you borrow some heap, build your packet, call the usart to
send it out (which is done immediately) and then you free the heap again.
For reception, in many cases you know how many bytes to expect.
So instead of waiting, polling, and then copying or even parsing byte by
byte, you just borrow some heap and call the receiver, and your thread
wakes up when the blockread posts the blocking event on reception of the
last byte. You decode your data and free the heap again.
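
A minimal sketch of the zero-copy transmit idea described above, written
so that it runs on a host. All names here (BlockTx, block_tx_start,
block_tx_irq, uart_put, wire) are hypothetical and are not part of the
Nut/OS API. The "interrupt" is called by hand and the data register is
replaced by an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block-transfer state; not the Nut/OS RINGBUF layout. */
typedef struct {
    const uint8_t *blk_ptr;   /* next byte of the caller's buffer */
    size_t         blk_cnt;   /* bytes still to send */
    volatile int   done;      /* set when the last byte has been sent */
} BlockTx;

/* Stand-in for the UART data register so the sketch runs on a host. */
static uint8_t wire[64];
static size_t  wire_len;

static void uart_put(uint8_t c)
{
    if (wire_len < sizeof wire)
        wire[wire_len++] = c;
}

/* Arm a transfer: only pointer and length are stored, nothing is copied. */
static void block_tx_start(BlockTx *tx, const void *buf, size_t len)
{
    tx->blk_ptr = (const uint8_t *)buf;
    tx->blk_cnt = len;
    tx->done = (len == 0);
}

/* What a "transmit register empty" interrupt would do for each byte. */
static void block_tx_irq(BlockTx *tx)
{
    if (tx->blk_cnt == 0)
        return;
    uart_put(*tx->blk_ptr++);
    if (--tx->blk_cnt == 0)
        tx->done = 1;   /* a real driver would post the waiting thread here */
}
```

The caller keeps ownership of the buffer and may free it only after the
done flag (or, in a real driver, the posted event) signals completion.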

>>
>> For all those functions I miss something too: If you use transfers
>> async, you will not get a reply on the read/write that is valid, as
>> the transfer is not finished at that time. So _ioctl needs another
>> option too. Besides getting the information about the errors from
>> the last transfer one needs to get the status of the current
>> transfer, i.e. the number of bytes transferred and whether
>> the transfer is ongoing or was aborted for whatever reason.
>>
>
> I wrote a while ago a serial port class for my desktop apps and
> spent some time digging the Unix and Windows APIs. On Linux, if you
> use a non-blocking transfer, read() returns EWOULDBLOCK which is a
> define to some negative number. Much like the sockets API. Otherwise
> it returns the number of read bytes. Is that what you mean?
>
That sounds good. There definitely must be an ioctl option to check whether
the return from read/write was caused by a transfer error or just a
timeout. I have to think about that again, but normally it looks like this:
rc = write( block, size, timeout);
rc == size -> good transfer
rc < size  -> transfer aborted because of error or timeout
rc == 0    -> transfer timed out completely
rc == -1   -> transfer aborted with error

For better detection of the reason you definitely need an ioctl.
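
The return-code convention listed above can be turned into a small
helper. This is only a sketch: the enum XferResult and the function
classify_transfer are hypothetical names, and the mapping simply follows
the four cases in the list.

```c
#include <stddef.h>

typedef enum {
    XFER_COMPLETE,  /* rc == size: all bytes transferred */
    XFER_PARTIAL,   /* 0 < rc < size: aborted by error or timeout */
    XFER_TIMEOUT,   /* rc == 0: nothing transferred before the timeout */
    XFER_ERROR      /* rc < 0: aborted with error */
} XferResult;

/* Map the return value of a block write/read to one of the four cases. */
static XferResult classify_transfer(int rc, size_t size)
{
    if (rc < 0)
        return XFER_ERROR;
    if ((size_t)rc >= size)
        return XFER_COMPLETE;
    if (rc == 0)
        return XFER_TIMEOUT;
    return XFER_PARTIAL;
}
```

A partial result still does not say whether an error or a timeout ended
the transfer, which is why the extra ioctl is needed.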

>> So what we have is a usart that relies on the ringbuffer, even though
>> the ringbuffer struct supplies blockptr / blockcntr. If you use packet
>> oriented data, you cannot handle timeouts on packets because the
>> timeouts are based on the ringbuffer and, if the ringbuffer is too
>> small, two locks block your thread. No, better: something blocks you
>> that you cannot determine.
>
> True. Even if we implement the packet based API, we would have to
> make the serial buffer size configurable and make the call fail with
> some error code if it requests a packet larger than the buffer.
>
No. If you provide a buffer pointer to a buffer that is smaller than the 
size you set for the expected bytes, it is your problem. You cannot 
detect the size of a buffer behind a void* pointer.
What we need to do is to prevent the reception of more bytes than the 
read was called for. These additional bytes need to be discarded.
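
A host-runnable sketch of a receive handler that stores exactly the
requested number of bytes and discards everything beyond that. BlockRx,
block_rx_start and block_rx_irq are hypothetical names, not Nut/OS
functions, and the "interrupt" is called by hand with the received byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive state; names are illustrative, not Nut/OS fields. */
typedef struct {
    uint8_t *blk_ptr;     /* caller's buffer, NULL when no read is pending */
    size_t   blk_cnt;     /* bytes still expected */
    size_t   discarded;   /* bytes dropped because no read asked for them */
    volatile int done;    /* set when the last expected byte has arrived */
} BlockRx;

/* Arm a read of exactly len bytes into the caller's buffer. */
static void block_rx_start(BlockRx *rx, void *buf, size_t len)
{
    rx->blk_ptr = (uint8_t *)buf;
    rx->blk_cnt = len;
    rx->done = (len == 0);
}

/* Called once per received byte, as a receive interrupt would be. */
static void block_rx_irq(BlockRx *rx, uint8_t c)
{
    if (rx->blk_ptr == NULL || rx->blk_cnt == 0) {
        rx->discarded++;      /* no pending read: drop the byte */
        return;
    }
    *rx->blk_ptr++ = c;
    if (--rx->blk_cnt == 0) {
        rx->blk_ptr = NULL;   /* detach the caller's buffer */
        rx->done = 1;         /* a real driver would post the thread here */
    }
}
```

The driver never needs to know the size of the caller's buffer. It only
counts down the length it was given and stops writing when that reaches
zero.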
>>
>> If you have a function that allocated memory to form a block you
>> don't need to copy it to the ringbuffer for transfer, but the current
>> implementation does. On smaller systems that need packet oriented
>> communication (block transfer) the ringbuffer memory could be
>> freed completely.
>
> This would be a nice trick. But I'm not sure how it would fit in our
> current driver structure. Somehow we would have to get the buffer
> pointer down to the driver level.

We already have it there. The rbf->rbf_blkptr and ->rbf_blkcntr are
handed over to the functions called by usart.c, but not to the first ones:
StartTx(void) and StartRx(void).
To get a DMA or IRQ driven block transfer going, only one
modification is needed in USARTDCB: handing rbf over to
StartTx(RINGBUFFER *rbf) and StartRx(...);
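
A sketch of what that signature change could look like. DemoRingBuf and
DemoDcb are simplified stand-ins invented for this example (the real
RINGBUF and USARTDCB have more members), and the "DMA channel" is just a
pair of static variables so the code runs on a host. Only the field names
rbf_blkptr and rbf_blkcntr are taken from the discussion.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the ring buffer structure. */
typedef struct {
    uint8_t *rbf_blkptr;    /* block pointer */
    size_t   rbf_blkcntr;   /* block byte counter */
} DemoRingBuf;

/* Simplified stand-in for the device control block. */
typedef struct {
    DemoRingBuf tx_rbf;
    void (*start_tx)(DemoRingBuf *rbf);   /* previously: void (*)(void) */
} DemoDcb;

/* Fake DMA channel registers, for illustration only. */
static const uint8_t *fake_dma_src;
static size_t         fake_dma_len;

/* With rbf passed in, the start routine can program the DMA directly
 * instead of waiting for the first transmit interrupt. */
static void demo_start_tx(DemoRingBuf *rbf)
{
    fake_dma_src = rbf->rbf_blkptr;
    fake_dma_len = rbf->rbf_blkcntr;
}

/* Upper layer: store the caller's block and kick off the transfer. */
static void demo_write_block(DemoDcb *dcb, uint8_t *buf, size_t len)
{
    dcb->tx_rbf.rbf_blkptr = buf;
    dcb->tx_rbf.rbf_blkcntr = len;
    dcb->start_tx(&dcb->tx_rbf);
}
```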

Only if you want to make it more general, or have only one set of driver
functions for all USARTs, do you need to pass NUTDEVICE *dev to all
functions. That raises the question whether many more systems than just
the USARTs would be touched... NUTDEVICE is used for lots of file / stream
oriented devices...

> Then again, I wonder if there is
> real use for the packet oriented reads.
>
I currently have 20 uses for it in my development of new sensors.
I can remember at least 15 projects in my own past where this would have
been an improvement or where I implemented this sort of handling.
Most of them were in a time long before I knew Nut/OS.
>>
>> On bigger systems with DMA/PDC support, you save a lot of CPU time
>> for all those TXE Interrupts that do not appear.
>>
>> Unfortunately I cannot implement DMA in the current structure as DMA
>> should lock the calling thread until transfer is over or set a
>> signal after finishing the transfer. I tried to do that by using
>> the normal StartTx(void) function that will raise a TXE interrupt
>> and this first TxReadyIrq( RINGBUF *rbf) will set up the DMA
>> process. Unfortunately this function is out of thread as it is an
>> interrupt and therefore cannot set a NutEventWait that blocks the
>> calling thread.
>>
>
> I'm confused. Why wouldn't the calling thread keep its blocked
> state from read()?

Now things get puzzling. The ringbuffer system in usart.c blocks the
calling thread and the usart driver unblocks it again. This may work
with interrupt driven communication and a (ring)buffer that is available
at any time. If a timeout happens or a transfer is stopped, no harm is
done if an unexpected character comes in or the interrupt is
raised by any other cause (line interference, GPIO switching...).

For block transfer this should work fine, but you have to take care that
no character triggers your interrupts after a transfer, as there is
no buffer available anymore. The rbf_blkptr is pointing anywhere, or is
even NULL, if the buffer got freed by the caller. If the interrupt then
writes a character to NULL you get a HardFault.

With DMA you need to intercept all abort conditions of your transfer. So
no one but your DMA handler should post the caller thread. This is
because you need to ensure that the DMA went well, or you need to take
measures to shut down the DMA transfer if any error condition has
happened. This sounds difficult, but in fact simply writing 0 into the
channel control register cuts it off and frees it for a new start.

--- Blocking ---
I was confused by the minimalistic description of the ioctl options. So
in fact there is no non-blocking communication in Nut/OS. You decide the
blocking time by the timeout you give to the read or write function.
Sync/Async has nothing to do with the blocking of the caller thread
but with the interface setup. The STM32 provides 5 usarts that can do
synchronous mode and can be used as SPI too. I wondered how to enable
that in Nut/OS; now I know: ioctl( usart, USART_SETSYNCMODE, &mode);
If you pass Mode x, the appropriate SPI mode is set. But that's
another story.

The description of USART_SETBLOCKWRITE or so talked about blocking, but in
fact this wasn't true. So I got confused at first.

> Anyway, I thought about using DMA first with the
> EMAC driver, which should benefit the most from it, as it transfers
> at least an ethernet frame each time. I see that u-boot, linux and
> other kernels usually define a API for DMA, with dma_alloc (similar
> to malloc). The question is, should we try to do something like that,
> and have each arch provide the implementation, or should we confine
> the DMA engine within the arch folder as a private API, so each port
> does it as it pleases.

I already wrote that part for STM32. DMA_Setup( channel, dest, source,
length, options) automatically does a lot of things that you normally
have to set up with lots of bit options.
I will definitely implement DMA for the STM32F107 EMAC. I investigated
the DMA/PDC for SAM7XE a while ago, but I found out that these chips
don't support memory2memory transfer, so the external EMAC on the
internet radio is not reachable by DMA. With STM32 it is different. It
supports mem2mem DMA. You can even use it as memset by reading from a
single memory address containing the right value (byte/half word/word)
and filling a destination area with that content.
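
The memset trick can be illustrated with a small software model of a
memory-to-memory channel. This is not the DMA_Setup() function mentioned
above, whose option bits are not shown in this mail. DmaModel,
dma_model_run and dma_model_fill are hypothetical names, and the loop
only imitates what the hardware would do when source increment is
switched off.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of one memory-to-memory DMA channel (hypothetical). */
typedef struct {
    const uint8_t *src;
    uint8_t       *dst;
    size_t         count;
    int            src_increment;   /* 0 = keep reading the same address */
} DmaModel;

/* Imitates the transfer loop the DMA hardware would perform. */
static void dma_model_run(const DmaModel *ch)
{
    const uint8_t *s = ch->src;
    uint8_t *d = ch->dst;
    for (size_t i = 0; i < ch->count; i++) {
        *d++ = *s;
        if (ch->src_increment)
            s++;
    }
}

/* memset-like fill: fixed source address, incrementing destination. */
static void dma_model_fill(uint8_t *dst, const uint8_t *value, size_t n)
{
    DmaModel ch = { value, dst, n, 0 };
    dma_model_run(&ch);
}
```

With src_increment set to 1 the same model behaves like a plain memory
copy, which is the mem2mem case needed for an external EMAC.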
>
>>
>> In my STM32 implementation I expect that if I can call one set of
>> functions from all usart interrupts the code size in flash will be
>> much lower even if I implement and enable all features. All features
>> mean: HW/SW-Handshake, DTR/DSR support, XON/XOFF, STX/ETX,
>> Full-/Half-Duplex, IRDA, ...
>>
>
> It makes a lot more sense to have all functions for the USART to
> receive the DCB structure or the DEVUSART structure. That's how
> drivers in Linux and Windows Driver Model work (sort of).

Yes I agree. But I don't want to have a too big footprint in RAM and
Flash. We can do things that are awful to understand but save lots of
space. Just experiment with this idea:
The USARTDCB is a struct that contains a union that overlays the block
transfer pointers and the ring buffer pointers. There might be a
first byte to decide which member of the union is active.
It is not a problem to understand for geeks like us but...
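
A sketch of such an overlay, assuming a one-byte mode tag in front of the
union. DemoBufCtl, BufMode and demo_pending are hypothetical names, and
the ring member only carries four pointers for illustration. It is not
the actual Nut/OS RINGBUF layout.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { BUF_RING = 0, BUF_BLOCK = 1 } BufMode;

typedef struct {
    uint8_t mode;              /* first byte selects the active member */
    union {
        struct {               /* ring buffer bookkeeping */
            uint8_t *head;     /* next write position */
            uint8_t *tail;     /* next read position */
            uint8_t *start;    /* first byte of the storage */
            uint8_t *last;     /* one past the end of the storage */
        } ring;
        struct {               /* block transfer bookkeeping */
            uint8_t *ptr;      /* caller's buffer */
            size_t   cnt;      /* bytes still to transfer */
        } blk;
    } u;
} DemoBufCtl;

/* Number of bytes pending, whichever member is active. */
static size_t demo_pending(const DemoBufCtl *c)
{
    if (c->mode == BUF_BLOCK)
        return c->u.blk.cnt;
    if (c->u.ring.head >= c->u.ring.tail)
        return (size_t)(c->u.ring.head - c->u.ring.tail);
    /* head has wrapped around: tail..last plus start..head */
    return (size_t)((c->u.ring.last - c->u.ring.tail) +
                    (c->u.ring.head - c->u.ring.start));
}
```

Both bookkeeping sets share the same RAM, so a port that only uses block
transfers pays nothing extra for the ring buffer pointers. The price is
that every access has to check the mode byte first.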
>
>> The drawback of this change would be that all architectures have to
>> be modified to pass DEVUSART *dev or at least USARTDCB *dcb to all
>> functions. That would lead to one small problem, any function
>> accessing the ringbuffer needs to derive it from the dcb. For a
>> 72MHz STM32 it's not a problem to do RINGBUF *rbf = dcb->dcb_rbf;
>> at every function start. But how is that on an AVR?
>
> That could easily be offset by any deduplication we achieve in the
> code. It should not be too much per function really.
>
Yes, I hope that a single do-everything driver will use less space than
the current complete-driver-per-port version. On the other hand, every
driver can be highly optimized for every port in the current solution.
You don't want to hassle with blocks and DMA for the USARTx that you use
as terminal / debug interface. But you want to have low impact and high
speed on the three other ports you use for device communication.
>
>> Ah, by the way. I am thinking about making the things a bit
>> comfortable. So one could set "Use Interrupts" and "Use DMA"
>> independently for every USART in a system. If it stays like it is,
>> so usart1.c includes usart.c this saves some flash if the user
>> unchecks the one or the other option. If there is only one usart.c
>> called by the interrupts of usartx.c it could be an idea to include
>> portions of the code only if at least one of the usarts has enabled
>> that option. So DMA handling in the general driver is only enabled
>> and compiled if at least one usart has the option set in nutconf.
>>
>
> Wouldn't it actually make the code bigger? Some routines would be
> duplicated in the binary blob, one with DMA and another without. I'm
> not sure if there is a use-case where one would like to enable DMA
> for one USART but not for the others.
>
We have to find out about that. I am not sure about the footprint.
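
One way to try this out is a compile-time switch that is derived from the
per-port options. The macro names below (DEMO_USARTn_USE_DMA,
DEMO_USART_ANY_DMA) are made up for this sketch and are not existing
nutconf symbols. The DMA branch is compiled into the shared code only if
at least one port has the option set.

```c
/* Hypothetical per-port switches, as a configurator might emit them. */
#define DEMO_USART1_USE_DMA 0
#define DEMO_USART2_USE_DMA 1
#define DEMO_USART3_USE_DMA 0

/* Derived switch: true if any port wants DMA. */
#if DEMO_USART1_USE_DMA || DEMO_USART2_USE_DMA || DEMO_USART3_USE_DMA
#define DEMO_USART_ANY_DMA 1
#else
#define DEMO_USART_ANY_DMA 0
#endif

typedef enum { DEMO_TX_IRQ, DEMO_TX_DMA } DemoTxPath;

/* Shared driver code: the DMA branch exists in the binary only when
 * DEMO_USART_ANY_DMA is 1. */
static DemoTxPath demo_select_tx(int port_uses_dma)
{
#if DEMO_USART_ANY_DMA
    if (port_uses_dma)
        return DEMO_TX_DMA;
#else
    (void)port_uses_dma;
#endif
    return DEMO_TX_IRQ;
}
```

Comparing the map files of a build with all switches at 0 against a build
with one switch at 1 would show the real footprint difference.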

>> So now I have three options: 1 Modify usart.c / usart.h / uart.h to
>> the new structure and hope that someone is helping me to pull AVR
>> and ARM architecture to that level.
>
> I can help with AVR and AVR32.

It would help if you check out the current version of the stm32 port
devnut_m2n and see if avr32 still works. I ported the latest changes
from the trunk to there, which contain a lot of AVR32 related things. And
I already made modifications to some interfaces for stm32, and those could
have broken compatibility with other architectures. So any report would help.
>
>> 2 Just split usart.h / uart.h into stm32_usart.h and other_usart.h
>> while usart.h includes the one or the other depending on the
>> architecture selected.
>
> It can easily become a nightmare regarding to maintenance and
> portability.
>
>> 3 Leave it as it is and forget about that all :)
>
> Tempting *smile* Actually I think Nut/OS already has the most
> comprehensive USART driver of the RTOSes I know of, and for the
> applications we work with, that's a huge benefit :) But it's also
> quite hard to maintain the way it is... If a bug is found in the flow
> control code for instance, one has to remember to fix it in all other
> archs, and it only gets worse with new platforms being added.
>
Yes, but that is the same problem with all systems that cover different
architectures. You cannot get around that. The only thing that helps is
to define global interface rules and then let the developers in the
architectures do what they can. Write bug reports to the tracker so a
user of an architecture can check if there was something reported that
matches his own problem, and then he can check if the fix done for one
architecture will fix his architecture too.
>>
>> By the way, Option 2 is what I did for TWI cause STM32 has two
>> interfaces and 4 interrupt providers ( two per interface) that call
>> the same code existing only once. Old Tw*() functions are #defined
>> to the stm32 specific functions. Works fine here :)
>>
>
> Yesterday I was thinking about a platform independent TWI. So we
> could have platform independent drivers to access EEPROMs and Atmel
> QTouch chips.
>
... Now I am sad... I already did that.
TWI bus:
TwMasterTransact() send / receive any amount of bytes to / from anywhere.
TwMasterRegRead() send internal address to chip and receive data with 
automatic RESTART condition handling.
TwMasterRegWrite() write internal address and write data to chip.

We have already verified that it works with EEPROMs, IO-Expanders,
Serial-Port Expanders, Temperature and Humidity Sensors, Displays...

EEPROM:
I added some fixes and improvements to at24c.c in the STM32 port. Now you
can simply call EE_Init() and you can access your EEPROM with
EEWriteData() and EEReadData(). These routines call At24cRead() and
At24cWrite(). They implement a full low level driver for EEPROMs
including page handling and ACK-Polling. During ACK-Polling the
TWI bus is free for other transfers.
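
Page handling mostly means splitting a write so that no single bus
transfer crosses a page boundary. The sketch below shows only that
arithmetic. demo_page_chunk and demo_count_page_writes are hypothetical
helpers, not functions from at24c.c, and the page size is an assumption
that would come from the EEPROM datasheet.

```c
#include <stddef.h>

/* Bytes that can be written starting at addr without crossing a page
 * boundary. page_size must be non-zero. */
static size_t demo_page_chunk(size_t addr, size_t remaining, size_t page_size)
{
    size_t room = page_size - (addr % page_size);
    return remaining < room ? remaining : room;
}

/* Count the page writes needed for a transfer. A real driver would issue
 * one TWI write followed by ACK polling per chunk instead of counting. */
static size_t demo_count_page_writes(size_t addr, size_t len, size_t page_size)
{
    size_t writes = 0;
    while (len > 0) {
        size_t n = demo_page_chunk(addr, len, page_size);
        addr += n;
        len -= n;
        writes++;
    }
    return writes;
}
```

For example, with 32-byte pages a 40-byte write starting at address 30 is
split into chunks of 2, 32 and 6 bytes.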

The test system uses two PCA9555 IO-Expanders and an EEPROM on one bus.
While wildly toggling LEDs and reading buttons on these two port expanders,
the EEPROM is written, read and compared in large segments (bigger than
page-size). There is almost no visible impact on the LEDs and the button
reaction time with 100kBit/s. With 400kBit/s it is simply invisible.

IO-EXPANDERS (PCA9555 and others):
The IO-Expander driver fits itself into the NutGpio system. You can use
it as you can use any other GPIO of Nut/OS.
My LED and key handler functions can be enabled to use IO-Expanders on
TWI like any other GPIO port of your chip.

I have some Qtouch demo boards, so I can do some testing with that too and
write a driver that fits into the existing system as well.

> But I'm actually quite worried about the GPIO. I'm going to start
> working on a board with UC3B164 connected with sensors/relays. I
> would like to see and use an interface to set pin functions, level,
> configure interrupts, etc in a way that's standard and portable
> between current and future platforms. How are you handling this with
> STM32? Btw, which STM32 are you using? I would like to take a look at
> the datasheets :)
>
That is another story. There seems to be only one solution:
Write a document of mandatory and optional defines and functions.
The mandatory defines are the ones that are needed to get all example
applications working on all architectures. You need to write
NutPinConfigSet and NutPortConfigSet according to that.

The gpio.h then only includes an architecture specific xxx_gpio.h

On top of the mandatory defines and functions you may add optional chip 
specific things.
You have to do it that way as you cannot handle all chips in one 
function set.
AVR has only one peripheral function per pin and almost automatically
activates this function if you use the peripheral.
AT91SAMx has up to three different peripherals per pin and the pin 
configuration decides to which of the peripherals it is connected.
STM32 has up to 5 different peripherals per pin but the peripheral or 
its clock control register decides to which pin it is connected. There 
are up to 3 different sets of pins where some peripherals can be 
connected to.

I got some support from STM software division in form of hardware. 
Additionally we started our new products on STM32F series so I have some 
additional platforms at hand. So I actually use:
STM32F103R8T6
STM32F103ZET6
STM32F107VCT6

The absolutely perfect thing is that all these chips have their 
peripherals aligned. Write a driver for one chip, use it on the others 
as well. And together with the unifications ARM introduced with its 
CortexM series writing drivers is a dream.

Best regards,
Ulrich