[En-Nut-Discussion] Questions regarding UART implementation

Thiago A. Corrêa thiago.correa at gmail.com
Sat Oct 2 20:59:27 CEST 2010


On Thu, Sep 30, 2010 at 4:00 AM, Ulrich Prinz <uprinz2 at netscape.net> wrote:
> Am 29.09.2010 21:29, schrieb Thiago A. Corrêa:
>> From what I understand, it helps to read from devices which send a
>> fixed-size "packet". AFAIK it's not implemented in any of our archs.
>> Even on PCs it's fairly uncommon to use it. Even people who write bad
>> code to read from barcode scanners don't usually use that feature.
>> It's usually much better to find the STX or ETX yourself, or
>> otherwise parse the protocol properly.
>>
> That will cause double overhead!
> First you design your packet to send out, then usart.c copies it
> into a buffer that needs to be large enough. And you need to flush,
> because the data is only sent once the buffer is sufficiently full.
> With my idea you borrow some heap, design your packet, call usart to
> send it out (which is done immediately) and then free the heap again.
> For reception, in many cases you know how many bytes to expect.
> So instead of waiting, polling, and then copying or even parsing byte by
> byte, you just borrow some heap, call the receiver, and your thread
> wakes when the blockread posts the blocking event on reception of the
> last byte. You decode your data and free the heap again.

I understand that we are not bound to how it's done on the PC, but
that's definitely not how it works there :)
You define the "packet" size with an ioctl, not the read call. It
would indeed make more sense the way you propose, so you can handle
different "packet" sizes.
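Parsing the protocol yourself, as suggested earlier in the thread, can be sketched as a tiny STX/ETX framer. This is a hedged illustration only: the state machine, names, and buffer size are mine, not Nut/OS code.

```c
#include <stddef.h>

#define STX 0x02
#define ETX 0x03
#define FRAME_MAX 64

/* Hypothetical framer state: collects bytes between STX and ETX. */
typedef struct {
    unsigned char buf[FRAME_MAX];
    size_t len;
    int in_frame;
} Framer;

/* Feed one received byte; returns the payload length (> 0) when a
 * complete frame has been seen, 0 otherwise. Noise before STX is
 * silently discarded. */
static size_t framer_feed(Framer *f, unsigned char c)
{
    if (c == STX) {                 /* start of frame: reset collector */
        f->in_frame = 1;
        f->len = 0;
    } else if (c == ETX) {          /* end of frame: report payload */
        if (f->in_frame) {
            f->in_frame = 0;
            return f->len;
        }
    } else if (f->in_frame && f->len < FRAME_MAX) {
        f->buf[f->len++] = c;       /* payload byte */
    }
    return 0;
}
```

The caller would feed bytes from read() one at a time; the framer resynchronizes on every STX, so partial or garbled frames are dropped rather than misparsed.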

>>
> That sounds good. There definitely must be an ioctl option to check
> whether the return from read/write was caused by transfer errors or just a
> timeout. I have to think about that again, but normally it looks like this:
> rc = write( block, size, timeout);
> rc == size -> good transfer
> rc < size -> transfer aborted because of error or timeout
> rc == 0 -> transfer timed out entirely
> rc == -1 -> transfer aborted with error

That would mean changing the public API. The timeout could be done as
in the PC world with an ioctl. I think that's how it's done today in
Nut/OS as well, but I'm not sure.
I would like to keep read/write as standard C calls.
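For what it's worth, the return-code convention quoted above could be checked on the caller side roughly like this. A sketch only: `uart_write` stands in for whatever the blocking call ends up being named, and the enum names are my own.

```c
#include <stddef.h>

typedef enum { XFER_OK, XFER_PARTIAL, XFER_TIMEOUT, XFER_ERROR } XferResult;

/* Classify an rc/size pair per the convention in the thread:
 * rc == size    -> good transfer
 * 0 < rc < size -> aborted by error or timeout (partial)
 * rc == 0       -> timed out before anything was transferred
 * rc == -1      -> aborted with error */
static XferResult classify(int rc, size_t size)
{
    if (rc == (int)size) return XFER_OK;
    if (rc == -1)        return XFER_ERROR;
    if (rc == 0)         return XFER_TIMEOUT;
    return XFER_PARTIAL;
}
```

A helper like this keeps the "is rc < size an error or a timeout?" question out of every call site, though the ambiguity itself still needs the ioctl the thread discusses.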

>> True. Even if we implement the packet based API, we would have to
>> make the serial buffer size configurable and make the call fail with
>> some error code if it requests a packet larger than the buffer.
>>
> No. If you provide a pointer to a buffer that is smaller than the
> size you set for the expected bytes, it is your problem. You cannot
> detect the size of a buffer behind a void* pointer.
> What we need to do is prevent the reception of more bytes than the
> read was called for. These additional bytes need to be discarded.

That would likely mess with the protocol. But the user could use
flow control with that feature to avoid losing data.

>>>
>>> If you have a function that allocates memory to form a block, you
>>> don't need to copy it to the ringbuffer for transfer, but the current
>>> implementation does. On smaller systems that need packet-oriented
>>> communication (block transfer), the ringbuffer memory could be
>>> freed completely.
>>
>> This would be a nice trick. But I'm not sure how it would fit in our
>> current driver structure. Somehow we would have to get the buffer
>> pointer down to the driver level.
>
> We already have it there. The rbf->rbf_blkptr and ->rbf_blkcntr are
> handed over to the functions called by usart.c, but not to the first one:
> StartTx(void) or StartRx(void).
> To get a DMA- or IRQ-driven block transfer going, only one
> modification is needed in USARTDCB: handing the rbf over to
> StartTx(RINGBUFFER *rbf) and StartRx(...);
>
> Only if you'd like to make it more general, or to have only one set of
> driver functions for all USARTs, do you need to pass NUTDEVICE *dev to all
> functions. That raises the question of whether far more subsystems than
> just the USARTs would be touched... NUTDEVICE is used for lots of
> file-/stream-oriented devices...
>

If one set of functions for all USARTs is possible, it would be worth
it. Even if it's only a few of them :)
Perhaps it's because of the duplicate code, or because the AVR32 gcc
compiler is not good at code optimization, but it looks like a simple
"do nothing" program using Nut/OS and AVR32 takes about 30 KB of flash.
And that's with only 2 UARTs enabled in the compilation. For the A0
series, there should be 4.

>>>
>>> On bigger systems with DMA/PDC support, you save a lot of CPU time
>>> on all those TXE interrupts that no longer occur.
>>>
>>> Unfortunately I cannot implement DMA in the current structure, as DMA
>>> should block the calling thread until the transfer is over, or set a
>>> signal after finishing the transfer. I tried to do that by using
>>> the normal StartTx(void) function, which will raise a TXE interrupt,
>>> and this first TxReadyIrq( RINGBUF *rbf) will set up the DMA
>>> process. Unfortunately this function is out of thread, as it is an
>>> interrupt, and therefore cannot set a NutEventWait that blocks the
>>> calling thread.
>>>
>>
>> I'm confused. Why wouldn't the calling thread keep its blocked
>> state from read()?
>
> Now things get confusing. The ringbuffer system in usart.c blocks the
> calling thread and the usart driver unblocks it again. This works
> with interrupt-driven communication and a (ring) buffer that is available
> at any time. If a timeout happens or a transfer is stopped, there is no
> harm if an unexpected character comes in or the interrupt is
> raised by any other cause (line interference, GPIO switching...).
>
> For block transfer this should work fine, but you have to take care that
> no character triggers your interrupts after a transfer, as there is
> no buffer available anymore. The rbf_blkptr points anywhere, or even at
> NULL if it was freed by the caller. If the interrupt then throws a
> character at NULL, you get a HardFault.
>
> With DMA you need to intercept all abort conditions of your transfer. So
> nothing should post the caller thread but your DMA handler. This is
> because you need to ensure that the DMA went well, or you need to take
> measures to shut down the DMA transfer if any error condition has
> happened. This sounds difficult, but in fact simply writing 0 into the
> channel control register cuts it off and frees it for a new start.
>

Not sure how it works with STM32, but it looks like the DMA controller
in AVR32 raises interrupts. This should fit well with how the usart
handles the blocking of the "userspace" thread.


>> Anyway, I thought about using DMA first with the
>> EMAC driver, which should benefit the most from it, as it transfers
>> at least an ethernet frame each time. I see that u-boot, linux and
>> other kernels usually define an API for DMA, with dma_alloc (similar
>> to malloc). The question is, should we try to do something like that,
>> and have each arch provide the implementation, or should we confine
>> the DMA engine within the arch folder as a private API, so each port
>> does it as it pleases.
>
> I already wrote that part for STM32. DMA_Setup( channel, dest, source,
> length, options) does a lot of things automatically that you normally
> have to set up through lots of bit options.
> I will definitely implement DMA for the STM32F107 EMAC. I investigated
> DMA/PDC for the SAM7XE a while ago, but found out that those chips
> don't support memory-to-memory transfer, so the external EMAC on the
> internet radio is not reachable by DMA. With the STM32 it is different: it
> supports mem2mem DMA. You can even use it as a memset by reading from a
> single memory address containing the right value (byte/half word/word)
> and filling a destination area with that content.

Ok. So, I guess for now let's go with private APIs for DMA. Right now
I must implement other things that are a priority for the projects I'm
working on, but I will try to at least follow the same function
signature you use.
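The DMA_Setup() signature from the thread can be mocked on the host to pin down the mem2mem semantics, including the memset-style trick of reading a fixed source address. Everything here apart from the parameter list is my assumption, not the STM32 port's actual code: a real implementation would program the channel's control registers instead of copying in a loop.

```c
#include <stddef.h>

/* Hypothetical option flag: do not increment the source pointer, which
 * turns a mem2mem transfer into a memset-like fill (the STM32 trick
 * described above). */
#define DMA_OPT_SRC_FIXED 0x01

/* Host-side mock of DMA_Setup(channel, dest, source, length, options). */
static void DMA_Setup(int channel, void *dest, const void *source,
                      size_t length, unsigned options)
{
    (void)channel;                       /* unused in the mock */
    unsigned char *d = dest;
    const unsigned char *s = source;
    for (size_t i = 0; i < length; i++)
        d[i] = (options & DMA_OPT_SRC_FIXED) ? s[0] : s[i];
}
```

Mocking the call like this lets the option encoding be exercised in unit tests before any hardware register programming exists.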

>>>
>>> In my STM32 implementation I expect that if I can call one set of
>>> functions from all usart interrupts, the code in flash will be much
>>> smaller, even if I implement and enable all features. All features means:
>>> HW/SW handshake, DTR/DSR support, XON/XOFF, STX/ETX,
>>> full/half duplex, IrDA, ...
>>>
>>
>> It makes a lot more sense to have all functions for the USART
>> receive the DCB structure or the DEVUSART structure. That's how
>> drivers in Linux and the Windows Driver Model work (sort of).
>
> Yes, I agree. But I don't want too big a footprint in RAM and
> Flash. We can do things that are awful to understand but save lots of
> space. Just experiment with this idea:
> The USARTDCB is a struct that consists of a union holding the block
> transfer pointers and the ring buffer pointers overlaid. There might be a
> first byte to decide which member of the union is active.
> It is not a problem to understand for geeks like us, but...

It's actually fairly common :)
You could add macros/inline functions to make it easier to
understand: is_ringbuffer() or is_block_transfer() or something like
that.
That reduces the magic-number comparisons in the code.
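A minimal sketch of that tagged-union idea with the suggested helpers; the names and layout are illustrative stand-ins, not the real USARTDCB:

```c
/* Which half of the union is live is recorded in the first member. */
enum { DCB_MODE_RINGBUF, DCB_MODE_BLOCK };

/* Illustrative stand-ins for the real structures. */
typedef struct { unsigned char *start; unsigned size; } RingBuf;
typedef struct { unsigned char *blkptr; unsigned blkcnt; } BlockXfer;

typedef struct {
    int mode;                 /* DCB_MODE_RINGBUF or DCB_MODE_BLOCK */
    union {
        RingBuf   rbf;        /* stream mode: ring buffer pointers */
        BlockXfer blk;        /* block mode: block transfer pointers */
    } u;
} UsartDcb;

/* Helpers that hide the tag comparison, as proposed above. */
static int is_ringbuffer(const UsartDcb *dcb)
{
    return dcb->mode == DCB_MODE_RINGBUF;
}

static int is_block_transfer(const UsartDcb *dcb)
{
    return dcb->mode == DCB_MODE_BLOCK;
}
```

The union keeps RAM cost at the size of the larger member plus the tag, which is the footprint argument made above; the helpers keep driver code readable despite the overlay.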

>>
>>> The drawback of this change would be that all architectures have to
>>> be modified to pass DEVUSART *dev, or at least USARTDCB *dcb, to all
>>> functions. That would lead to one small problem: any function
>>> accessing the ringbuffer needs to derive it from the dcb. For a
>>> 72MHz STM32 it's not a problem to do RINGBUF *rbf = dcb->dcb_rbf;
>>> at every function start. But how is that on an AVR?
>>
>> That could easily be offset by any deduplication we achieve in the
>> code. It should not be too much per function really.
>>
> Yes, I hope that a single do-it-all driver will use less space than the
> current complete-driver-per-port version. On the other hand, in the
> current solution every driver can be highly optimized for its port. You
> don't want to hassle with blocks and DMA for the USART that you use as a
> terminal / debug interface. But you do want low impact and high speed
> on the three other ports you use for device communication.

Well, the debug port is actually a different driver, without interrupts.
I was thinking about sharing the ring buffer handling or the high/low
watermark handling.
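That shared watermark handling could look roughly like this. A sketch under stated assumptions: the buffer size, thresholds, and names are mine, and the XON/XOFF (or RTS) signalling is left to the caller, since that part differs per port.

```c
#include <stddef.h>

#define RBUF_SIZE      16
#define HIGH_WATERMARK 12   /* ask the peer to pause above this fill level */
#define LOW_WATERMARK   4   /* ask the peer to resume below this level */

typedef struct {
    unsigned char buf[RBUF_SIZE];
    size_t head, tail, count;
    int stopped;            /* 1 while the peer has been told to pause */
} RBuf;

/* Store one byte; returns 1 if the high watermark was just crossed and
 * the caller should signal the peer to stop (send XOFF / raise RTS). */
static int rbuf_put(RBuf *rb, unsigned char c)
{
    if (rb->count == RBUF_SIZE)
        return 0;                            /* full: byte is dropped */
    rb->buf[rb->head] = c;
    rb->head = (rb->head + 1) % RBUF_SIZE;
    rb->count++;
    if (!rb->stopped && rb->count >= HIGH_WATERMARK) {
        rb->stopped = 1;
        return 1;
    }
    return 0;
}

/* Fetch one byte into *out; returns 0 if the buffer was empty. Sets
 * *resume to 1 if the low watermark was just crossed and the caller
 * should signal the peer to resume (send XON / drop RTS). */
static int rbuf_get(RBuf *rb, unsigned char *out, int *resume)
{
    *resume = 0;
    if (rb->count == 0)
        return 0;
    *out = rb->buf[rb->tail];
    rb->tail = (rb->tail + 1) % RBUF_SIZE;
    rb->count--;
    if (rb->stopped && rb->count <= LOW_WATERMARK) {
        rb->stopped = 0;
        *resume = 1;
    }
    return 1;
}
```

Keeping the signalling decision in the return values (rather than calling the port driver directly) is what would let this one implementation be shared between ports with different flow-control mechanisms.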

>>> So now I have three options: 1. Modify usart.c / usart.h / uart.h to
>>> the new structure and hope that someone helps me pull the AVR
>>> and ARM architectures up to that level.
>>
>> I can help with AVR and AVR32.
>
> It would help if you checked out the current version of the stm32 port,
> devnut_m2n, and see if avr32 still works. I ported the latest changes
> from the trunk to there, which contain a lot of AVR32-related things. And
> I already made modifications to some interfaces for stm32, which could
> have broken compatibility with other architectures. So any report would help.

I tried devnut_cortexm3 at work, and some other branch I can't remember
at the moment. I didn't seem to have problems with it. I will try
devnut_m2n on Monday.


>>>
>>> By the way, option 2 is what I did for TWI, because the STM32 has two
>>> interfaces and 4 interrupt providers (two per interface) that call
>>> the same code, which exists only once. The old Tw*() functions are
>>> #defined to the stm32-specific functions. Works fine here :)
>>>
>>
>> Yesterday I was thinking about a platform independent TWI. So we
>> could have platform independent drivers to access EEPROMs and Atmel
>> QTouch chips.
>>
> ... Now I am sad... I already did that.

That's good news :)
Is it in trunk or just in one of the branches?

>> But I'm actually quite worried about the GPIO. I'm going to start
>> working on a board with a UC3B164 connected to sensors/relays. I
>> would like to see and use an interface to set pin functions, levels,
>> configure interrupts, etc. in a way that's standard and portable
>> between current and future platforms. How are you handling this with
>> STM32? Btw, which STM32 are you using? I would like to take a look at
>> the datasheets :)
>>
> That is another story. There seems to be only one solution:
> write a document of mandatory and optional defines and functions.
> The mandatory defines are the ones that are needed to get all example
> applications working on all architectures. You need to write
> NutPinConfigSet and NutPortConfigSet according to that.
>
> The gpio.h then only includes an architecture-specific xxx_gpio.h

I'm starting to take a look at what you wrote in the wiki. I'm going
to start another thread on that subject :)

> I got some support from STM's software division in the form of hardware.
> Additionally, we started our new products on the STM32F series, so I have
> some additional platforms at hand. So I actually use:
> STM32F103R8T6
> STM32F103ZET6
> STM32F107VCT6
>
> The absolutely perfect thing is that all these chips have their
> peripherals aligned. Write a driver for one chip and use it on the others
> as well. And together with the unifications ARM introduced with its
> Cortex-M series, writing drivers is a dream.

It's more or less the same for AVR32. There are macros in the
toolchain that already contain the chips' register addresses, as well
as structs that help access the registers/bits. The internal layout of
the devices is usually the same. There are a few exceptions, though;
they usually name those peripherals differently, for example:
FLASHC and FLASHCW.

Kind Regards,
    Thiago A. Correa


