[En-Nut-Discussion] TCP uses too much memory with zero window?

Fri Jun 26 01:32:06 CEST 2009

On Thu, Jun 11, 2009 at 01:57:00PM +0200, Bernd Walter wrote:
> On Wed, Jun 10, 2009 at 04:09:33PM +0200, Harald Kipp wrote:
> > Bernd Walter wrote:
> > > But the network problem is still there.
> > > I started with about 26k free and it went below 3k and finally the
> > > system hung.
> > 
> > I'm not sure that this is network related. After upgrading, you are now
> > able to dump heap usage.
> > 
> > Add
> > 
> > HWDEF+=-DNUTDEBUG_HEAP
> > 
> > to UserConf.mk (build and app tree) and rebuild everything. At specific
> > locations within your application call
> > 
> > NutHeapDump(<file-pointer>);
> 
> I like that feature.
> 
> > replacing <file-pointer> with your console stream, probably stdout.
> > You'll get a list of all allocated fragments, which will help to locate
> > the memory hole.
> 
> It agrees with my assumption.
> After some time it fills up with many small netbuf allocations:
> [...]
> 0x20b420(60) netbuf.c:146

[...]

Hi Harald and others interested in this,
I dug deeper into the source to investigate this problem.

I still fail to see how the window size is enforced.

There is a memory protector in the receive path, which keeps 2048 bytes
free. I don't know why it doesn't protect my system from failing. Maybe
the memory is too fragmented; 2048 bytes is not very much.
But it is not the right answer anyway, because at some point it just
ignores incoming packets, and therefore zero window probes (ZWP) are not
answered, which then makes the peer drop the connection.
The right answer would be to resend the zero window size, so the peer
knows that we are still alive and that it didn't miss a window reopening.
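
A minimal standalone sketch of that behaviour, under stated assumptions:
this is not Nut/OS code, and the names rx_action and rx_decide as well as
the parameters are hypothetical. It only models the decision. A segment
that does not fit is dropped, but it is still acked with the current
window instead of being ignored, so a ZWP against a zero window gets an
answer.

```c
#include <assert.h>   /* only needed for quick checks of rx_decide() */
#include <stddef.h>

/* Hypothetical model of the receive decision; not taken from Nut/OS. */
enum rx_action {
    RX_NO_DATA,      /* no payload to store, e.g. a pure ack */
    RX_QUEUE,        /* keep the segment in the receive queue */
    RX_DROP_AND_ACK  /* free the segment, ack with the current window */
};

/*
 * heap_free:   bytes currently available on the heap
 * reserve:     bytes that must stay free (2048 in the case above)
 * window:      receive window we currently advertise
 * payload_len: payload bytes in the incoming segment
 */
enum rx_action rx_decide(size_t heap_free, size_t reserve,
                         size_t window, size_t payload_len)
{
    if (payload_len == 0)
        return RX_NO_DATA;

    /* Covers a ZWP arriving while we advertise a zero window. */
    if (payload_len > window)
        return RX_DROP_AND_ACK;

    /* Low memory: drop the payload, but do not stay silent. */
    if (heap_free < reserve || heap_free - reserve < payload_len)
        return RX_DROP_AND_ACK;

    return RX_QUEUE;
}
```

With 3000 bytes free, a reserve of 2048 and a zero window, a one-byte
probe yields RX_DROP_AND_ACK rather than being silently discarded.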

The packet makes it up to NutTcpStateEstablished, which is basically OK,
because it might contain a new ack and we can't just ignore a ZWP.
But it is also queued into the socket after calling NutTcpProcessAck,
which is not OK.

There is a check to handle duplicate packets (tcpsm.c, line 1207 for me):
                if (th_seq == thq_seq) {
                    NutNetBufFree(nb);
                    sock->so_tx_flags |= SO_ACK | SO_FORCE;
                    NutTcpOutput(sock, 0, 0);
                    return;
                }
If I understand it correctly, it drops the packet and then queues a
new packet to re-ack the current receiver state.
Since the enqueued packet also contains the current window size, this
is exactly what we need as a response to the ZWP as well.
Unfortunately the packet is already queued before that check is reached:
                if (th_seq < thq_seq) {
                    *nbqp = nb;
                    nb->nb_next = nbq;
                    break;
                }
                if (th_seq == thq_seq) {
                    NutNetBufFree(nb);
                    sock->so_tx_flags |= SO_ACK | SO_FORCE;
                    NutTcpOutput(sock, 0, 0);
                    return;
                }

What do you think about changing that to something like this:
                if (th_seq < thq_seq && nb->nb_ap.sz <= thq->th_win) {
                    *nbqp = nb;
                    nb->nb_next = nbq;
                    break;
                }
                if (th_seq == thq_seq || nb->nb_ap.sz > thq->th_win) {
                    NutNetBufFree(nb);
                    sock->so_tx_flags |= SO_ACK | SO_FORCE;
                    NutTcpOutput(sock, 0, 0);
                    return;
                }

This still does not protect against bad peers queuing up to a full
receive window of single-byte packets, but it prevents the problem
under normal conditions.
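
The queuing rule can be tried in isolation with a small model. This is
only a sketch under assumptions, not the Nut/OS implementation: struct
seg, enq_result and seg_enqueue are hypothetical stand-ins, the window is
passed in explicitly as the receiver's currently advertised window, and
the window check is done once before walking the queue. Like the plain
comparisons in the snippets above, it ignores sequence number wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued segment; not the Nut/OS NETBUF. */
struct seg {
    uint32_t seq;       /* sequence number of the first payload byte */
    size_t len;         /* payload length in bytes */
    struct seg *next;   /* next segment in sequence order */
};

enum enq_result {
    ENQ_QUEUED,   /* segment was linked into the queue */
    ENQ_DROPPED   /* caller should free it and force an ack */
};

/*
 * Insert s into the queue at *head, kept sorted by sequence number.
 * A segment is dropped if its payload exceeds the given window or if
 * a segment with the same sequence number is already queued.
 */
enum enq_result seg_enqueue(struct seg **head, struct seg *s, size_t window)
{
    struct seg **pp = head;

    assert(head != NULL && s != NULL);

    /* Does not fit into the advertised window, e.g. a ZWP. */
    if (s->len > window)
        return ENQ_DROPPED;

    while (*pp != NULL) {
        if (s->seq == (*pp)->seq)
            return ENQ_DROPPED;     /* duplicate */
        if (s->seq < (*pp)->seq)
            break;                  /* insert before this entry */
        pp = &(*pp)->next;
    }

    s->next = *pp;
    *pp = s;
    return ENQ_QUEUED;
}
```

On ENQ_DROPPED the caller would do what the duplicate branch above does:
free the buffer, set SO_ACK | SO_FORCE and call NutTcpOutput.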

-- 
B.Walter <bernd at bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.