Interesting issue with DLRs
st at tolj.org
Fri Oct 20 16:06:15 CEST 2006
Ben Suffolk wrote:
> I have been running kannel for a month or so and it's been great.
> I looked at the outstanding DLRs earlier today and saw a few, and
> identified some as phones that I know people are not using any more,
> hence no delivery. That's fine, but then I noticed my number was in the
> outstanding DLRs, and after a bit of investigation I knew it was a
> message that I had received.
> Looking at the debug from the smsc logs (I'm using SMPP, with PostgreSQL
> as the DLR storage BTW) I see that what has been happening is that the
> DLR is actually coming in a fraction faster than the submit_sm_resp
> with the message ID in it. (Or at least the receiver thread is before
> the transmitter thread).
> This means the DLR is being ignored as it's not in the table, then it
> gets created and put in the table immediately after. So it's then
> outstanding, and of course the DLR callback is never run.
OK, interesting indeed. We need to discuss whether this is a logical PDU flow
"problem" on the SMPP SMSC side, or whether we (Kannel) misbehave in how our
threads process PDUs. But (!) the receiver thread inside the smsc_smpp.c module
handles all PDUs from the SMSC. So if a DLR (deliver_sm or data_sm) arrives
before the submit_sm_resp, I assume this is logical misbehaviour of the SMSC.
> I wonder if a) anybody else has come across this, or b) you can think
> of any good ways to make sure the DLRs are not lost. e.g. maybe we
> store them, and then when we create them we can see the status has
> already been updated and trigger the callback?
Hmm, good point. I also face some issues when connecting 2 independent SMPP
client systems with the same SMPP upstream account. Kannel receives DLRs for
which it has no temporary data in DLR storage and hence "discards" the DLRs
without any meaningful processing.
We could put any received DLR that we can't match in the "DLR MT" storage table
into a "DLR MO" storage table, and hence run 2 tables. When we insert into the
"DLR MT" table at the point we receive the submit_sm_resp, we first check
whether there is an existing entry in the "DLR MO" table. If there is, then we
have already received a DLR for this MT message.
This solves 2 issues:
a) The DLR MO table holds any DLRs that can't be resolved. That means external
applications can "fetch" the DLRs from the DLR MO table for further processing.
b) The "race condition" between the submit_sm_resp carrying the message ID and
the DLR itself can be resolved by hooking the two together, so we get the usual
HTTP callback even when the SMSC sends the DLR before the submit_sm_resp.
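
To make the idea concrete, here is a minimal sketch of the two-table matching
logic in Python. It is not Kannel code: the class and method names (DlrStore,
on_submit_sm_resp, on_dlr) are hypothetical, plain dictionaries stand in for
the "DLR MT" and "DLR MO" tables, and it assumes a matched entry is dropped on
the first DLR (a real setup may keep it for intermediate statuses). A lock
stands in for whatever transaction handling the real storage would need.

```python
import threading


class DlrStore:
    """In-memory stand-in for the two proposed tables (hypothetical names)."""

    def __init__(self, callback):
        self.callback = callback      # called as callback(url, status)
        self.dlr_mt = {}              # message id -> callback URL, awaiting a DLR
        self.dlr_mo = {}              # message id -> status of an unmatched DLR
        self._lock = threading.Lock()

    def on_submit_sm_resp(self, msg_id, url):
        """The SMSC has acknowledged our MT message and told us its message id."""
        with self._lock:
            status = self.dlr_mo.pop(msg_id, None)
            if status is None:
                # Normal order: no DLR seen yet, remember where to report it.
                self.dlr_mt[msg_id] = url
                return
        # The DLR overtook the submit_sm_resp: fire the callback right away.
        self.callback(url, status)

    def on_dlr(self, msg_id, status):
        """A delivery report (deliver_sm or data_sm) has arrived."""
        with self._lock:
            url = self.dlr_mt.pop(msg_id, None)
            if url is None:
                # Unknown message id: park the DLR instead of discarding it.
                self.dlr_mo[msg_id] = status
                return
        self.callback(url, status)
```

With this in place, a DLR that arrives first is parked in dlr_mo, and the later
on_submit_sm_resp call for the same message id triggers the callback instead of
leaving an outstanding entry. Entries that never get matched stay in dlr_mo,
where an external application could pick them up as described in a).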
Opinions from others on this approach?
> I suspect it's because I am connected directly to an operator as opposed
> to an aggregator that I am having this occasional (about 30 messages in
> 600 over 7 days approx) issue.
> I should also say that I set up and did the operator integration
> testing with 1.4.0 as 1.4.1 was not out at the time (came out a couple
> of weeks after), so my live service is currently running 1.4.0. I will
> upgrade, but first need to be sure of the effects of the upgrade, as
> obviously having been through the integration testing I need to be
> careful about using a different version that does something unexpected
> to the connection (in which case I would be in danger of losing the
> operator connection).
1.4.1 has a limited number of COMPATIBILITY BREAKERS. Please check the section
of the NEWS file for the 1.4.1 release, which lists any serious changes.
In any case, 1.4.1 is way BETTER and more RELIABLE than 1.4.0.
> So if you think this issue is only with 1.4.0 then no problem, but I
> could not see anything in any of the release notes that suggest this
> has been identified before.
I don't think the DLR handling issue is specific to 1.4.0.
It will definitely also affect 1.4.1 and CVS HEAD.
Kölner Landstrasse 419
40589 Düsseldorf, NRW, Germany
tolj.org system architecture Kannel Software Foundation (KSF)