[En-Nut-Discussion] BUG?: Strange reboots and/or incorrect assembler calls to apparently random program addresses from ICC's xicall(). NutOS library calls using ICC optimizer fail. (sprintf, sprintf_P)

Brett Abbott Brett.Abbott at digital-telemetry.com
Tue Mar 29 22:43:17 CEST 2005


re: Strange reboots and/or incorrect assembler calls to apparently 
random program addresses from ICC's xicall(). 
NutOS library calls using ICC optimizer go to incorrect program 
locations. (sprintf, sprintf_P)

I have tracked down a bug which causes the program to call the wrong 
address in memory in certain circumstances.  This behaviour comes and 
goes based on code size/shape but isn't a "big" code size problem. 

The resulting symptoms are "reboots" and "strange behaviour", with the 
wrong functions being called and the processor even trying to execute 
application data (idata_start!) or uninitialised interrupt vectors.  
This occurs inside the xicall function which is not finding the correct 
function address for some functions in specific situations.
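A minimal model may help picture the failure described above. It assumes, purely for illustration, that each call site keeps only a slot number and that the real target address sits in a separate table filled in by the linker (loosely like the func_lit entries mentioned later in this post). This is NOT how ICC's xicall is actually implemented; the names func_table, call_via_table, expected_target and unrelated_target are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model, NOT ICC's xicall: the call site holds only a slot
 * number and the real target address lives in a separate table. */
typedef int (*print_fn)(char *buf, const char *arg);

static int expected_target(char *buf, const char *arg)
{
    return sprintf(buf, "expected:%s", arg);
}

static int unrelated_target(char *buf, const char *arg)
{
    (void) arg;
    return sprintf(buf, "unrelated");
}

/* Table the linker is supposed to fill in; slot 0 should be expected_target. */
static print_fn func_table[] = { expected_target, unrelated_target };

/* The dispatcher trusts the slot blindly: if the table entry (or the slot
 * number) is wrong, it calls whatever address it finds there. */
static int call_via_table(size_t slot, char *buf, const char *arg)
{
    return func_table[slot](buf, arg);
}
```

Reassigning func_table[0] to unrelated_target reproduces the symptom in miniature: the dispatcher jumps to the wrong function without complaint, and does so identically every time, which matches the consistent-but-wrong calls observed.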

The problem can be reliably reproduced using the AVR Simulator so I am 
confident that this is not a hardware problem.  This leaves me wondering 
if I have a compiler issue, an issue with the way NutOS libraries are 
linked, or an issue with my environment.... (Hmmm, too many wonders)

The problem can be prevented if you include crt/sprintf.c, 
crt/vsprintf.c, crt/sprintf_P.c and crt/vsprintf_P.c in the application 
project source list and compile them as part of the application.  Not so 
ideal.

I detail the findings, scenarios and a bodgy workaround that seems to 
solve the problem below.  I have checked and triple-checked the 
environment, paths, object files etc., but perhaps I'm missing something.  
Any advice is appreciated, and if you have seen similar things, that 
would help me focus on the cause.  Unless I understand the cause of the 
issue, my regression testing will be forever extended.

Environment
-----------------
1. NutOS, 3.9.5
2. ICC AVR 6.31A, Code Optimiser on, resulting code @ 80% full with 4K 
Bootloader.
3. Ethernut 2
4. ATmega128, 32K SRAM
5. NutOS library code located in source directories.  Object files 
compiled into the target directory structure.  Libraries copied to 
icc/lib as expected.
6. JTAG 1, AVR Studio 4.11 (build 401)

Thank you
Brett


Symptoms/Observations
----------------------------
* Code executes as expected until an indirect call is executed using 
xicall (indirection to support code compression), at which point it 
jumps to the wrong address. 
* The address called in error is the same on every execution and every 
compilation, but may change when the source changes, i.e. it is not 
random.
* Sometimes it calculates the correct address if the code is just the 
right size.  This usually happens when you introduce enough debug code 
to materially alter the code; it then works until you take out the 
debug code or add more code.
* This behaviour occurs on hardware and on the AVR Simulator.
* Errors occur without external hardware such as Ethernet being 
accessed, i.e. on a bare ATmega128.
* Only certain library function calls fail.  Typically they are calls to 
functions which then call other functions.  Both functions are in the 
same library but compiled from separate .o files, and the .c files are 
in yet another directory.  (NutOS files: calls to sprintf() which then 
fail when calling vsprintf(), and calls to sprintf_P() which fail when 
calling vsprintf_P().  Note these are NutOS sources, not ICC's.  I 
include the source for one of these below.)
* If I add both sprintf.c and vsprintf.c to the application's project 
list, it forces a recompile of the two functions, creates local copies 
of the object files and the problem goes away.  Possibly the linker is 
losing track of where to send the indirection?  It is strange that the 
final application code size should alter this without any change to 
the libraries.
* Other NutOS library calls work OK (e.g. fprintf).
* I am uncertain whether the xicall lookup table is wrong or xicall is 
looking in the wrong place.
* Are there other people using ICC compression and sprintf?
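One way to separate "the table is wrong" from "xicall looks in the wrong place" is to print the address the application itself holds for the callee (e.g. vsprintf_P) and compare it with the linker map file and with the address the JTAG shows xicall jumping to. The helper below is a hypothetical sketch (fnptr_to_hex and any_fn are illustrative names). It dumps the raw bytes of a function pointer, lowest address first; on the little-endian ATmega128 that is least significant byte first, and the value may be a word address rather than the byte address shown in the map file.

```c
#include <stdio.h>
#include <string.h>

/* Generic function pointer type used only to carry an address around. */
typedef int (*any_fn)(void);

/* Write the raw bytes of fp as hex (lowest address first) into out.
 * Returns the number of hex digits written, or 0 if out is too small. */
static size_t fnptr_to_hex(any_fn fp, char *out, size_t outlen)
{
    unsigned char raw[sizeof fp];
    size_t i;

    if (outlen < 2 * sizeof fp + 1)
        return 0;
    memcpy(raw, &fp, sizeof fp);
    for (i = 0; i < sizeof fp; i++)
        sprintf(out + 2 * i, "%02X", raw[i]);   /* also NUL-terminates */
    return 2 * sizeof fp;
}
```

Usage on the target might look like `fnptr_to_hex((any_fn) vsprintf_P, hex, sizeof hex)` followed by writing `hex` to the debug UART. If the printed address matches the map file but xicall still lands elsewhere, the lookup rather than the table is the more likely suspect.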

How to track down this "bug"
-----------------------------------
As the bug typically occurs at exactly the same place in the code 
(complicated only by real-world events), I placed flushed "writes" to a 
UART at key points in the code until I got as close as possible to the 
problem in my app, and then used the JTAG to step through at the 
instruction level.

The "beauty" of this issue is that when you catch it, it is reproducible.
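The flushed-checkpoint technique above can be sketched as a tiny helper. The name trace_point is hypothetical. The point of the explicit fflush() is that a marker still sitting in a stdio buffer is lost when the processor jumps to a bad address, so each checkpoint must reach the UART before execution continues. In a NutOS application stdout is typically attached to the UART with NutRegisterDevice() and freopen(); that setup is assumed here and not shown.

```c
#include <stdio.h>

/* Write a checkpoint tag followed by CR/LF and flush immediately, so the
 * marker is on the wire before the next (possibly fatal) call runs. */
static void trace_point(FILE *out, const char *tag)
{
    fputs(tag, out);
    fputs("\r\n", out);
    fflush(out);
}
```

Sprinkling calls such as `trace_point(stdout, "before sprintf_P")` around the suspect code narrows the fault to a few lines, after which the JTAG can take over.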

Why could it be a....
----------------------
1) Compiler/Linker problem
The intermittent nature (by code version), combined with the reliably 
reproducible nature when it does occur, suggests that we may have found 
a scenario where the linker is confused.  NutOS has multiple levels of 
indirection; add to this compression from ICC.  The fact that it occurs 
on such widely used functions (in the NutOS world) suggests an obscure 
pagey, memory-mappy type of thing...  The latest version, 6.31A, has 
been in use here for some time (i.e. it is not a recent change).  Of 
course we may be using an unsupported method...
The problem goes away (i.e. the lookup table in func_lit is correct) when 
the source is compiled in the current application directory at 
application compile time.  This could suggest a linker problem or could 
just be masking it.
2) Library linking issue
Perhaps the move to having .o files in different directories from the .c 
files has resulted in a more complex environment for the linker, or 
perhaps the order of linking causes a mismatch in mapping tables.  
Having local copies of the .o files seems to solve this (the .c files 
stayed in the library folders).
3) Environment problem
Aha, the obvious answer and always the most likely.  I believe I've 
eliminated stale libraries and using the wrong libraries.  I've confirmed 
that changes to NutOS libraries are carried through to the executing 
code.  It is possible that something is altering the memory layout by 
including a different structure or object at library compile time than 
at application compile/link time.  I've checked that all -D options are 
the same at library make time and application make time.  I have now 
reinstalled NutOS a dozen times (tried many versions), and ICC several 
times, so I think I haven't screwed up anything silly (but who can be sure?).
4) Source code problem
The other obvious answer.  Perhaps I'm using sprintf or sprintf_P 
incorrectly?


Any help is appreciated.  Let me know if you have seen similar issues.  
I suspect this doesn't occur with gcc. 

Many Many Thanks
Brett

// main.c portion - one of the offending commands

// variable defs
char *OutText;                 // allocated from the heap, 200 bytes
prog_char P_XMLDATA1_s1[] = "<tr x=\"%s\" t=\"a\" m=\"i\"";  // prog_char is: #define prog_char const char
volatile char XID_String[15];  // typically a 4-character alphanumeric, 0-terminated string

// offending code
sprintf_P(OutText, P_XMLDATA1_s1, XID_String);
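To rule out possibility 4 (misuse of sprintf_P), the same format and argument can be checked on a PC build, where prog_char is plain const char and there is no separate program memory. This is a hedged sketch: format_xid and OUTTEXT_SIZE are hypothetical names, and the standard snprintf() stands in for the NutOS call. It also passes a plain const char * where the original passes the volatile XID_String, which vsprintf_P will read through an ordinary char pointer anyway.

```c
#include <stdio.h>

/* As in the post: on a PC the format string is an ordinary const array. */
#define prog_char const char

/* Hypothetical constant matching the 200-byte heap buffer behind OutText. */
#define OUTTEXT_SIZE 200

prog_char P_XMLDATA1_s1[] = "<tr x=\"%s\" t=\"a\" m=\"i\"";

/* Format the same line with snprintf() and return its length,
 * or -1 if it would not fit in outlen bytes. */
static int format_xid(char *out, size_t outlen, const char *xid)
{
    int n = snprintf(out, outlen, P_XMLDATA1_s1, xid);

    if (n < 0 || (size_t) n >= outlen)
        return -1;
    return n;
}
```

With a 4-character ID such as "AB12" the result is the 24-character string `<tr x="AB12" t="a" m="i"`, far inside the 200-byte buffer. If this behaves on the PC while the target still jumps elsewhere, that points away from the call itself and back toward the build (possibilities 1 to 3).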



// sprintf_P.c

#include "nut_io.h"


/*!
 * \addtogroup xgCrtStdio
 */
/*@{*/

/*!
 * \brief Write formatted data to a string.
 *
 * Similar to sprintf() except that the format string is located in
 * program memory.
 *
 * \param buffer Pointer to a buffer that receives the output string.
 * \param fmt    Format string in program space containing conversion 
 *               specifications.
 *
 * \return The number of characters written or a negative value to
 *         indicate an error.
 */
int sprintf_P(char *buffer, PGM_P fmt, ...)
{
    int rc;
    va_list ap;

    va_start(ap, fmt);
    /* Bugfix by Ralph Mason. */
    rc = vsprintf_P(buffer, (char *) fmt, ap);
    va_end(ap);

    return rc;
}


// vsprintf_P.c
#include "nut_io.h"

#include <string.h>
#include <sys/heap.h>

/*!
 * \addtogroup xgCrtStdio
 */
/*@{*/

static int _sputb(int fd, CONST void *buffer, size_t count)
{
    char **spp = (char **) ((uptr_t) fd);

    memcpy(*spp, buffer, count);
    *spp += count;

    return count;
}

/*!
 * \brief Write argument list to a string using a given format.
 *
 * Similar to vsprintf() except that the format string is located in
 * program memory.
 *
 * \param buffer Pointer to a buffer that receives the output string.
 * \param fmt    Format string in program space containing conversion 
 *               specifications.
 * \param ap     List of arguments.
 *
 * \return The number of characters written or a negative value to
 *         indicate an error.
 */
int vsprintf_P(char *buffer, PGM_P fmt, va_list ap)
{
    int rc;
    char *rp;
    size_t rl;

    rl = strlen_P(fmt) + 1;
    if ((rp = NutHeapAlloc(rl)) == 0)
        return -1;
    memcpy_P(rp, fmt, rl);
    rc = _putf(_sputb, (int) ((uptr_t) &buffer), rp, ap);
    NutHeapFree(rp);
    *buffer = 0;

    return rc;
}
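The layering in these two files (sprintf_P forwards its va_list to vsprintf_P, which copies the format out of flash and hands _putf a callback, _sputb, plus the address of its own buffer pointer disguised as an fd) can be replayed on a PC to see each hop in isolation. This is a hypothetical model, not NutOS code: model_sprintf_P, model_vsprintf_P, str_sink and mini_putf are illustrative names, mini_putf handles only literal text, %s and %%, and strlen/memcpy stand in for strlen_P/memcpy_P. It uses intptr_t for the handle because on a PC a pointer does not fit in an int (on the AVR both are 16 bits). It will not reproduce the xicall fault, since no ICC code compression is involved.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Output sink: like _sputb, receives an opaque handle plus a block of bytes. */
typedef int (*sink_fn)(intptr_t handle, const void *buf, size_t count);

/* Mirrors _sputb: the handle is the address of the caller's char pointer,
 * which is advanced past each block written. */
static int str_sink(intptr_t handle, const void *buf, size_t count)
{
    char **spp = (char **) handle;

    memcpy(*spp, buf, count);
    *spp += count;
    return (int) count;
}

/* Hypothetical stand-in for _putf: supports only literal text, %s and %%. */
static int mini_putf(sink_fn put, intptr_t handle, const char *fmt, va_list ap)
{
    int total = 0;

    while (*fmt) {
        if (fmt[0] == '%' && fmt[1] == 's') {
            const char *s = va_arg(ap, const char *);
            total += put(handle, s, strlen(s));
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == '%') {
            total += put(handle, "%", 1);
            fmt += 2;
        } else {
            total += put(handle, fmt, 1);
            fmt++;
        }
    }
    return total;
}

/* Same shape as vsprintf_P: copy the format to the heap, format, free,
 * then terminate at the position str_sink advanced buffer to. */
static int model_vsprintf_P(char *buffer, const char *fmt, va_list ap)
{
    size_t rl = strlen(fmt) + 1;
    char *rp = malloc(rl);
    int rc;

    if (rp == NULL)
        return -1;
    memcpy(rp, fmt, rl);
    rc = mini_putf(str_sink, (intptr_t) &buffer, rp, ap);
    free(rp);
    *buffer = 0;
    return rc;
}

/* Same shape as sprintf_P: gather the variable arguments and forward them. */
static int model_sprintf_P(char *buffer, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = model_vsprintf_P(buffer, fmt, ap);
    va_end(ap);
    return rc;
}
```

Calling `model_sprintf_P(out, "<tr x=\"%s\" t=\"a\" m=\"i\"", "AB12")` returns 24 and leaves the expected string in `out`. Note that the only genuine function-pointer call in this chain is the sink passed to the formatter; in the ICC build, code compression may add further indirect hops between these functions.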



(All code copyright as per original source)
-- 
-----------------------------------------------------------------
Brett Abbott, Managing Director, Digital Telemetry Limited
Email: Brett.Abbott at digital-telemetry.com
PO Box 24 036 Manners Street, Wellington, New Zealand
Phone +64 (4) 5666-860  Mobile +64 (21) 656-144
------------------- Commercial in confidence --------------------
