Race condition in list_consume()

Raul Igrisan raul.igrisan at gmail.com
Mon Apr 16 17:36:39 CEST 2007


It is wrong to release the mutex (unlock) before the wait. The wait is
atomically releasing the lock, wait on the condition to be signalled than
acquire it back for the current thread before returning.

 

  _____  

From: Andrija Petrovic [mailto:andrija.petrovic at software888.com] 
Sent: 16 April 2007 18:19
To: devel Devel
Subject: Re: Race condition in list_consume()

 

Perhaps I'm missing the point here, but I'm not sure why B should lock() the
list and then wait for somebody else to alter the list through
gwlist_remove_producer().

It sound perfectly reasonable to me to lock() and later unlock() the list in
gwlist_remove_producer().
Furthermore, putting unlock as the first line in gwlist_consume does not
sound logical to me.
So, I see no need to change anything in gwlib.
AFAIK, the whole problem arises from problematic usage of gwlib by B.
Correct me if I'm wrong, but, perhaps...

B does:

     lock(list); /* atomic lock */

 

        list->single_operation_lock->owner = -1;
          unlock(list); /*let the list be altered from some other thread,
and then wait for that alteration*/

        pthread_cond_wait(&list->nonempty,
&list->single_operation_lock->mutex);



There is, however, the basic question behind the whole problem: why is
list->single_operation_lock->owner set to -1? If that change was the point
of lock()ing and unlock()ing the list, perhaps a change in software design
could help...

Andrija Petrovic

Andreas Fink wrote: 

We found a severe bug in gwlib.

 

We have the following scenario:

 

A  calls debug("xxx",0,"xxxx") which does :

 

                gwlist_add_producer(writers);

 

 and continues but doesnt reach yet this line:

 

                gwlist_remove_producer(writers);

 

 

at this point the list "writers" is empty but has writers->num_producers=1

 

B does:

     lock(list); /* atomic lock */

 

        list->single_operation_lock->owner = -1;

        pthread_cond_wait(&list->nonempty,
&list->single_operation_lock->mutex);

 

so it waits until A is calling gwlist_remove_producer()

 

and wait until A completes.

 

 

Now A is calling this:

 

void gwlist_remove_producer(List *list)

{

    lock(list);

    gw_assert(list->num_producers > 0);

    --list->num_producers;

    pthread_cond_broadcast(&list->nonempty);

    unlock(list);

}

 

and gets locked up because the list's atomic lock is locked by B.

 

 

C now has a new debug message and gets stopped at  gwlist_produce().

 

 

In other words, every process who wants to write to debug log gets stuck.

 

Now there is different solutions to this.

Our approach would be to do in gwlist_consume() to do this:

 

 

  unlock(list);

        pthread_cond_wait(&list->nonempty,
&list->single_operation_lock->mutex);

        lock(list);

 

 

Any other ideas?

maybe no atomic lock around gwlist_remove_producer() ?

 

 

Andreas Fink

 

Fink Consulting GmbH

Global Networks Schweiz AG

BebbiCell AG

 

---------------------------------------------------------------

Tel: +41-61-6666330 Fax: +41-61-6666331  Mobile: +41-79-2457333

Address: Clarastrasse 3, 4058 Basel, Switzerland

E-Mail:  andreas at fink.org

www.finkconsulting.com www.global-networks.ch www.bebbicell.ch

---------------------------------------------------------------

ICQ: 8239353 MSN: msn1 at gni.ch AIM: smsrelay Skype: andreasfink

Yahoo: finkconsulting SMS: +41792457333

 

 

 





 

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.kannel.org/pipermail/devel/attachments/20070416/598ef995/attachment-0001.html 


More information about the devel mailing list