#785 closed defect (fixed)

BitlBee can't establish SSL connections with NSS, works when starting via "bitlbee -D -n"

Reported by: dskulinska@…
Owned by: dx
Priority: major
Milestone:
Component: BitlBee
Version: 3.0.1
Keywords: jabber, login error patch
Cc:
IRC client+version:
Operating System: Linux
OS version/distro: Fedora 14

Description

I am running a local bitlbee server version 3.0.1 on a GNU/Linux system (Fedora 14).

Three accounts have been set up for a user (ICQ, Jabber, Google Talk). When BitlBee is run in daemon mode as local user "bitlbee", the ICQ login succeeds, but both XMPP logins (Jabber, GoogleTalk) fail:

@root | jabber(XXX@XXX) - Logging in: Connected to server, logging in

@root | jabber(XXX@XXX) - Logging in: Converting stream to TLS

@root | jabber(XXX@XXX) - Login error: Could not connect to server

@root | jabber(XXX@XXX) - Logging in: Signing off..

However, when BitlBee is run via "bitlbee -n -v -D" (for debugging), all accounts are successfully connected, using the same configuration for the user (explicitly dictated via "/etc/bitlbee/bitlbee.conf").

I find it odd that starting the BitlBee server with an option ("-n") that should not affect the configuration leads to a different result. If this problem is due to my own ignorance, I would welcome any hints and pointers, otherwise I would be happy to help to resolve this issue on my own. I did find it difficult, however, to track down the problem due to the lack of debugging output. Am I missing something?

Regards

Dasia

Attachments (2)

deferred_ssl_init.patch (1.6 KB) - added by dskulinska@… at 2011-04-24T02:44:37Z.
deferred call to ssl_init() and otr_init() until after fork()
Fix-the-NSS-init-after-fork-bug-and-clean-up-lies.patch (2.5 KB) - added by dx at 2014-07-09T11:34:47Z.
A hopefully working and secure version of the patch!

Change History (22)

comment:1 Changed at 2011-04-23T20:43:32Z by wilmer

XMPP failing and ICQ working may be because ICQ doesn't need SSL anywhere while your Jabber logins seem to go wrong at the STARTTLS stage.

I do indeed also find this very odd. And without more debugging info from the SSL layer I don't know what to do.. So -n is the only flag you changed? Which SSL lib is this BTW?

Changed at 2011-04-24T02:44:37Z by dskulinska@…

Attachment: deferred_ssl_init.patch added

deferred call to ssl_init() and otr_init() until after fork()

comment:2 Changed at 2011-04-24T02:49:33Z by dskulinska@…

Thank you for the quick reply and pointing towards the SSL library, which turned out to be a fruitful hint. The "-n" flag was indeed the only flag changed (I used to add "-v", but it did not produce any more log output).

I am using bitlbee 3.0.1 as packaged for Fedora 14 (x86_64 architecture), which links against libnss3.so from NSS version 3.12.9.

I have checked out bitlbee branch "-r 3.0.1" from the repository and built two versions,

  • bitlbee-3.0.1-nss, configured with "--ssl=nss", linking against libssl3.so from NSS 3.12.9, and
  • bitlbee-3.0.1-gnutls, configured with "--ssl=gnutls", linking against libgnutls.so from GnuTLS 2.8.6.

The reported problem can be reproduced with bitlbee-3.0.1-nss, i.e. Jabber logins fail when forking into background (run with "-D"), but work in "non-fork mode" (giving "-n -D" on the command line). Interestingly, bitlbee-3.0.1-gnutls works in both cases.

When running bitlbee-3.0.1-nss in daemon mode, the call to SSL_ForceHandshake() in ssl_connected() (lib/ssl_nss.c) fails. Further investigation revealed that there might be a problem regarding NSS and the use of fork():

I am not familiar at all with the various SSL libraries or their initialization process, so I am not sure if these are at all related. A quick patch (see attached file) that moves the call to ssl_init() further down until after fork()ing into background seems to solve at least part of the problem. Now the Google Talk login succeeds, but the login to a second account (a private server running ejabberd) fails with a timeout (after requesting the buddy list -- the initial SSL handshake succeeds, however).
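
The ordering problem described above can be sketched with a toy library. This is a minimal illustration, not BitlBee or NSS code: `toy_ssl_init()`, `toy_ssl_handshake()` and `handshake_works_in_daemon()` are hypothetical names, and the PID comparison is only an assumed stand-in for whatever fork detection NSS performs internally.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an SSL library that refuses to work after a
 * fork. This is NOT the NSS API; it only mimics the observed behaviour. */
static pid_t toy_init_pid = 0;

void toy_ssl_init(void)
{
    toy_init_pid = getpid();    /* remember which process initialised us */
}

int toy_ssl_handshake(void)
{
    /* Succeeds (1) only in the process that called toy_ssl_init(). */
    return toy_init_pid != 0 && toy_init_pid == getpid();
}

/* Forks the way a daemon would and reports whether a handshake works in
 * the child. init_before_fork = 1 is the original ordering, 0 is the
 * deferred ordering. Returns 1 on success, 0 on failure, and -1 if
 * fork() or waitpid() itself failed. */
int handshake_works_in_daemon(int init_before_fork)
{
    int status;
    pid_t pid;

    toy_init_pid = 0;           /* reset so repeated calls are independent */
    if (init_before_fork)
        toy_ssl_init();         /* initialised in the parent */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (!init_before_fork)
            toy_ssl_init();     /* deferred: initialised in the child */
        _exit(toy_ssl_handshake() ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With this toy model, `handshake_works_in_daemon(1)` returns 0 and `handshake_works_in_daemon(0)` returns 1, which matches the difference seen between "-D" and "-n -D".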

Regards

comment:3 Changed at 2011-05-01T14:13:22Z by wilmer

Ah, thanks for tracking that down!

I wonder if this fixes the problem for ForkDaemon mode? I suppose not..

comment:4 Changed at 2012-05-06T16:52:48Z by castaway@…

I'm having a similar issue with 3.0.5 right now, this is using gnutls (not nss), built on gentoo with: USE="gnutls jabber msn oscar plugins ssl yahoo -debug -ipv6 -libevent -nss -otr -purple (-skype) -test -twitter -xinetd"

I previously had 3.0.3 installed which did the same thing (same options) .. it seems to eventually, after about 10 rounds of connecting, disconnecting, figure it out and stay connected:

17:43 <@root> fb - Reconnecting in 5 seconds..
17:43 <@root> fb - Logging in: Connecting
17:43 <@root> fb - Logging in: Connected to server, logging in
17:43 <@root> fb - Logging in: Converting stream to TLS
17:43 <@root> fb - Logging in: Connected to server, logging in
17:43 <@root> fb - Logging in: Authentication finished
17:43 <@root> fb - Logging in: Server changed session resource string to 
              `BitlBee_f0e1effa_4BF60DA422B82'
17:43 <@root> fb - Logging in: Authenticated, requesting buddy list
17:43 <@root> fb - Logging in: Logged in
17:43 -!- Netsplit over, joins: <names>

17:43 <@root> fb - Error: Error while reading from server
17:43 <@root> fb - Signing off..
17:43 -!- Netsplit localhost.localdomain <-> chat.facebook.com quits: 
          <names>
17:43 <@root> fb - Reconnecting in 5 seconds..

etc, around and around

comment:5 Changed at 2012-05-27T17:41:00Z by anonymous

I can confirm that the issue exists in 3.0.5 (3.0.5 from EPEL), and that running with the -n -D switch fixes the issue.

Using Daemon or ForkDaemon does not seem to make a difference.

NSS is the default SSL library on Fedora and RHEL based distributions, so there are likely to be a number of affected users.

comment:6 Changed at 2012-07-10T19:58:45Z by Matěj Cepl <mcepl@…>

Please distinguish between using the Fedora patches and using NSS (if you don't have the Fedora-packaged bitlbee, you most likely don't have NSS). The patches in #714 have not been merged yet; they first need a review from the RH security people (and apparently a rewrite).

Also, if you have the Fedora package, the only supported mode of running bitlbee is with the init scripts provided with the package. Doing anything else will most likely hit SELinux and other problems. If you use bitlbee in any other mode on Fedora, then please don't file bug reports to bugzilla.redhat.com.

comment:7 Changed at 2013-07-28T15:53:09Z by anonymous

I have the same issue with 3.2. Bitlbee is compiled with --ssl=gnutls.

comment:8 Changed at 2014-02-04T06:14:13Z by dx

Keywords: patch added

comment:9 Changed at 2014-02-11T13:17:05Z by dx

So it turns out, in the last week two people using CentOS 6.5 and RHEL 6.5 joined #bitlbee about this exact issue, although with different error messages (Unusuable response (sic) with facebook OAuth2, and Error during Passport authentication: Error while writing HTTP request with MSN, both of them HTTPS)

And we discovered the -n workaround when trying to debug it, too.

And then we noticed it was a NSS issue.

And then I pretty much replicated the research that the OP did in comment:2, and did a bunch of pointless debugging (Fun fact: debug builds of NSS do a CHECK_FORK() assert that does abort() when they detect a fork, instead of failing silently).

And then I realized that this wasn't a NSS bug and we were really doing it wrong.

And then I came up with a patch that looks very similar to the one posted here.

And then I noticed that we have this goddamn bug, with major priority, that I also tagged as "patch" seven days ago but somehow managed to avoid reading.

SIGH.

Sorry for the massive delay, redhat users!

Anyway, the patch posted here does solve it for daemon mode, but will fail for forkdaemon. Libpurple had a similar issue with forks, so it does its initialization in irc.c. I think it's an appropriate place to put both otr and ssl init, but will need to test some more.
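
A rough sketch of why per-session initialization would cover forkdaemon as well, under stated assumptions: `fs_lib_init()`, `fs_lib_usable()`, `session_new()` and `count_working_sessions()` are hypothetical names, the library is a toy that only works in the process that initialised it, and none of this is the actual irc.c code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fork-sensitive library: usable only in the process that
 * initialised it. Not a real SSL API. */
static pid_t fs_lib_owner = 0;

static void fs_lib_init(void)
{
    fs_lib_owner = getpid();
}

static int fs_lib_usable(void)
{
    return fs_lib_owner != 0 && fs_lib_owner == getpid();
}

/* Hypothetical per-session setup hook. It initialises the library on
 * first use in the current process, so a listening parent that never
 * creates a session never initialises anything. Returns 1 if the
 * library is usable afterwards. */
int session_new(void)
{
    static int initialised = 0;

    if (!initialised) {
        fs_lib_init();
        initialised = 1;
    }
    return fs_lib_usable();
}

/* Forkdaemon-style loop: one child per client, each child sets up its
 * own session. Returns the number of children whose session worked,
 * or -1 if fork() or waitpid() failed. */
int count_working_sessions(int nclients)
{
    int ok = 0;

    for (int i = 0; i < nclients; i++) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(session_new() ? 0 : 1);   /* child: init happens here */
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

The design choice here is that the parent stays uninitialised, so every forked child starts from a clean state. The same hook also runs after the single fork in daemon mode, so one code path would serve both modes.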

Also, comment:4 and comment:7 use gnutls, those are unrelated issues.

comment:10 Changed at 2014-04-10T17:59:35Z by dx

Summary: BitlBee fails loggin into Jabber when started as daemon, works when starting via "bitlbee -n -v -D" → BitlBee can't establish SSL connections with NSS, works when starting via "bitlbee -D -n"

Changed subject to make it clearer that this is *the* nss bug

comment:11 Changed at 2014-05-18T06:36:14Z by mcepl@…

I am here.

comment:12 in reply to:  11 Changed at 2014-05-18T06:36:55Z by mcepl@…

Replying to mcepl@…:

I am here.

Hmm, I am not, how to get myself to CC?

comment:13 Changed at 2014-05-18T07:00:40Z by Matěj Cepl <mcepl@…>

Couple of questions:

  • if there are hordes of CentOS/Fedora users with broken bitlbee, why don't I have anything in bugzilla.redhat.com?
  • we have bitlbee-3.2.1-3 in EPEL-6, so I would be more interested in crashes reproduced with that. Could somebody who can reproduce this test with the updated EPEL package and also try this experimental build (in the next 14 days or so, then it gets garbage collected), please?

comment:14 in reply to:  13 ; Changed at 2014-05-18T17:25:25Z by dx

Replying to Matěj Cepl <mcepl@…>:

Couple of questions:

  • if there are hordes of CentOS/Fedora users with broken bitlbee, why don't I have anything in bugzilla.redhat.com?

I already replied on irc so copypasting here for completeness:

BitlBee+nss only breaks when using forkdaemon or daemon without the -n parameter - centos only ships xinetd units, while fedora's systemd service unit starts it with -D -n (the exact workaround, but might be a coincidence) and the systemd socket unit is like xinetd, I think.

So in practice this issue only happens when someone decides to ignore what your packages include, and start it with "bitlbee -F" or "bitlbee -D"

/end copypasta

Also, just checked the source of bitlbee-3.2.1-3.el5 - the patch named bitlbee-systemd.patch changes the systemd service unit to use -D -n instead of -F -n - but it might be just so the unit can be turned into a Type=simple one, which is why i call it 'coincidence'. Fun stuff.

Replying to Matěj Cepl <mcepl@…>:

  • we have in EPEL-6 bitlbee-3.2.1-3 so I would be more interested in crashes reproduced with that.

I don't redhat but i've already reproduced this in my arch linux with 3.2.1 / current bzr, just by enabling nss. Not a crash, btw. Only ssl connections dying silently and returning weird-ass error messages.

Replying to Matěj Cepl <mcepl@…>:

Could somebody who can reproduce this test with the updated EPEL package and also try this experimental build (in the next 14 days or so, then it gets garbage collected), please?

Took me a while to find out how to get a source package out of that link (or any package for that matter. had to pick one of the architectures in descendants, and there are package links in the output section).

That seems to be the exact patch posted in this ticket, which, as i said in comment:9, fixes daemon and breaks forkdaemon.

Will attach fixed patch soon™

comment:15 in reply to:  14 Changed at 2014-05-19T07:28:28Z by Matěj Cepl <mcepl@…>

Replying to dx:

So in practice this issue only happens when someone decides to ignore what your packages include, and start it with "bitlbee -F" or "bitlbee -D"

The main reason why we prefer -D on Fedora (and consequently RHEL/EPEL) is that it works way better with SELinux (bitlbee is tightly confined on Fedora and it is much more simple to write a policy for one process than for randomly popping up ones). We will add README.Fedora explaining this to the Fedora/EPEL package.

https://bugzilla.redhat.com/show_bug.cgi?id=1098801

Also, just checked the source of bitlbee-3.2.1-3.el5 - the patch named bitlbee-systemd.patch changes the systemd service unit to use -D -n instead of -F -n - but it might be just so the unit can be turned into a Type=simple one, which is why i call it 'coincidence'. Fun stuff.

Yes, this issue happens only when somebody changes the Fedora default configuration (which works perfectly as far as I know) without understanding the consequences (SELinux issues and this bug are the only ones I can recall at the moment, but I cannot guarantee there are no others).

I don't redhat but i've already reproduced this in my arch linux with 3.2.1 / current bzr, just by enabling nss. Not a crash, btw. Only ssl connections dying silently and returning weird-ass error messages.

Cool, thank you.

That seems to be the exact patch posted in this ticket, which, as i said in comment:9, fixes daemon and breaks forkdaemon.

Will attach fixed patch soon™

Thank you.

Changed at 2014-07-09T11:34:47Z by dx

A hopefully working and secure version of the patch!

comment:16 Changed at 2014-07-09T11:40:15Z by dx

Okay that "soon™" took a bit too long.

Copy-pasting from the commit message (included in the patch):

This might look like a simple diff, but those 'lies' made this not very straightforward.

The NSS bug itself is simple: NSS detects a fork happened after the initialization, and refuses to work because shared CSPRNG state is bad. The bug has been around for long time. I've been aware of it for 5 months, which says something about this mess. Trac link: #785 [this ticket]

This wasn't a big deal because the main users of NSS (redhat) already applied a different patch in their packages that workarounded the issue somewhat accidentally. And this is the ticket for the 'lies' in unix.c: #1159

Basically a conflict with libotr that doesn't happen anymore. Read that ticket for details on why ignoring those comments is acceptable.

Anyway: yay!

comment:17 Changed at 2014-07-14T16:41:37Z by dx

Owner: set to dx
Status: new → accepted

Patch submitted, setting as 'accepted'.

comment:18 Changed at 2014-07-14T20:16:58Z by robert@…

FYI: bitlbee-3.2.2-2.fc22 is shipping this patch and switches from daemon to forkdaemon mode; older Fedora releases (Fedora < 22) are still at daemon mode for the moment.

comment:19 Changed at 2014-07-22T20:11:27Z by robert@…

As of today, bitlbee-3.2.2-3.fc19, bitlbee-3.2.2-3.fc20, bitlbee-3.2.2-3.fc21 and bitlbee-3.2.2-3.fc22 are shipping this patch to unbreak NSS. However Fedora < 22 are still at daemon mode (and without libpurple), only Fedora >= 22 is forkdaemon (and with additional libpurple).

comment:20 Changed at 2014-09-27T14:35:31Z by dx

Resolution: fixed
Status: accepted → closed

Applied in bzr rev 1038
