Ticket #53 (new enhancement)
Remote encoding...
| Reported by: | Clojster | Owned by: | jelmer |
|---|---|---|---|
| Priority: | wishlist | Milestone: | |
| Component: | OSCAR | Version: | 1.0 |
| Keywords: | charset oscar | Cc: | |
| IRC client+version: | Client-independent | Operating System: | Linux |
| OS version/distro: | | | |
Description
Hi, it would be great if BitlBee supported changing the remote charset. Not all ICQ users use UTF-8, so the conversion to my local charset produces garbled text. To be more specific: I'd like to change the charset in which BitlBee receives messages (CP1250 in my case, as most of my friends use Windows). A further step would be to set the remote charset per user, group, etc...
Change History
comment:2 Changed 6 years ago by Clojster
Yes, I agree that this is an OSCAR problem... And "no": Windows clients send messages in CP1250 (in my country, where Czech is the default language), but they can receive messages in UTF-8 without any problem. The best thing would be if BitlBee detected the encoding in which a message was sent and converted it to the local charset accordingly. But I don't know if that's even possible... does the OSCAR protocol send any information about the encoding with each message?
comment:3 Changed 6 years ago by wilmer
- Keywords charset oscar added
- Owner set to jelmer
Oops, yes, it seems I didn't read your report very well. In that case, it should be possible for the ICQ code to recognize the charset and convert it to UTF-8 (which is the internal charset for BitlBee) automatically. I hope Jelmer will be able to figure this out from the right specs? :-)
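A minimal sketch of the conversion described above, assuming the remote charset is already known (here CP1250) and using POSIX iconv(3). The function name `recode_to_utf8` is hypothetical; it is not taken from BitlBee or from any of the attached patches.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert `len` bytes of `in`, encoded in
 * `from_charset` (e.g. "CP1250"), into a newly allocated, NUL-terminated
 * UTF-8 string. Returns NULL if the charset is unknown, the input is not
 * valid in that charset, or memory runs out. The caller frees the result. */
char *recode_to_utf8(const char *from_charset, const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return NULL;

    /* Generous bound: no character needs more than 4 UTF-8 bytes
     * per input byte. */
    size_t out_size = len * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;           /* iconv() wants a non-const pointer */
    size_t in_left = len;
    char *outp = out;
    size_t out_left = out_size - 1;   /* keep one byte for the terminator */

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}
```

For example, `recode_to_utf8("CP1250", "t\xEC", 2)` should yield "tě" in UTF-8, since 0xEC is "ě" in CP1250.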
comment:4 Changed 6 years ago by Clojster
Wow, that would be GREAT! Now I got a weird message... One of my contacts (he uses the standard Mirabilis ICQ5) wrote me something that came out garbled, and the last word of the message was the encoding. I don't know why it was there, because it appeared only in that one message... The message was: "no vidÃm, Âe u tì tak napñl majà [cp1250]" The correct message would be: "no vidím, že už tě tak napůl mají" I don't think this really helps, but... whatever...
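The pasted text has probably gone through more than one conversion, so not every character can be traced. Part of it, though, is consistent with CP1250 bytes being read as ISO-8859-1, which a later comment in this ticket says is what the original OSCAR code assumes. For example, "tě" is the bytes 0x74 0xEC in CP1250, and 0xEC in ISO-8859-1 is "ì", giving "tì". The sketch below reproduces only that step; `latin1_to_utf8` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treat every byte as ISO-8859-1 (where the byte
 * value equals the Unicode code point) and encode it as UTF-8.
 * Returns a malloc'd string, or NULL if allocation fails. */
char *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(len * 2 + 1);   /* each Latin-1 byte needs <= 2 UTF-8 bytes */
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const unsigned char *s = (const unsigned char *)in; *s; s++) {
        if (*s < 0x80) {
            *p++ = (char)*s;                       /* ASCII is unchanged */
        } else {
            *p++ = (char)(0xC0 | (*s >> 6));       /* lead byte */
            *p++ = (char)(0x80 | (*s & 0x3F));     /* continuation byte */
        }
    }
    *p = '\0';
    return out;
}
```

Feeding it the CP1250 bytes for "tě" yields "tì", and CP1250 "ž" (0x9E) turns into the invisible control character U+009E, which helps explain why characters seem to vanish.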
comment:5 Changed 6 years ago by wilmer
Okay, I just asked Jelmer about this, I'll post it here FYI:
15:04:27 jelmer| wilmer: Yeah, but I'll postpone that until I've got the Win32 port up and running.
15:04:42 jelmer| wilmer: I'd like to get it right in my oscar rewrite rather than fixing it in the current implementation.
So I hope you can wait for just another while, and it'll work! :-)
comment:6 Changed 6 years ago by Clojster
Well, what else can I do but "wait" :)) But I'm glad it will work someday. BTW: You guys are doing great work! This is what I've been looking for for a long time. And as soon as you implement those "groups" features and file transfers, this piece of software will be flawless :) Keep up the great work, and I hope to see a new version with correct OSCAR charsets soon ;)
comment:7 Changed 6 years ago by wilmer
BTW, if you don't feel that much like waiting, there might be a temporary solution, at least if you use only ICQ. You can disable charset conversion by setting charset to none, and then BitlBee shouldn't do any translation at all. Then just talk cp1250 (IIRC?) to BitlBee and BitlBee will just pass it as-is.
I'm not sure if it'll work, but it might just be a solution for now. Good luck!
comment:8 Changed 6 years ago by Clojster
Thanks for the advice, but I think I'd rather wait... Because if I understand correctly, I would have to switch my terminal fonts to CP1250, my locale to CP1250, etc... or am I wrong?
comment:9 Changed 6 years ago by anonymous
hi! i wrote a little patch for bitlbee which lets you set the remote encoding and recodes a message if it's not in unicode.
the patch adds a new set variable, "oscar_recode_charset", which controls the recoding behavior (the original code just uses iso-8859-1).
so, for cp1251 you can simply type: set oscar_recode_charset cp1251
i successfully tested it with irssi and the russian cp1251 encoding.
see the attachment, and thanks for such a great piece of software!
WBR, Alexey "waker" Yakovenko <waker@…>
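A sketch of the decision the patch describes: recode only messages that are not flagged as Unicode, using `oscar_recode_charset` when it is set and ISO-8859-1 otherwise. The `msg_encoding` enum and `source_charset` function are hypothetical names for illustration, and the assumption that Unicode messages are UTF-16 in big-endian byte order is made here, not stated by the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for how the sender marked a message's encoding;
 * these are NOT the identifiers used in BitlBee's OSCAR code. */
enum msg_encoding {
    MSG_ENC_ASCII,    /* marked as plain 7-bit text */
    MSG_ENC_UNICODE,  /* marked as Unicode (assumed UTF-16BE here) */
    MSG_ENC_8BIT      /* some unnamed 8-bit charset */
};

/* Return the source charset to hand to iconv_open() for one incoming
 * message. `configured` is the oscar_recode_charset value, or NULL/""
 * if the user never set it. */
const char *source_charset(enum msg_encoding enc, const char *configured)
{
    if (enc == MSG_ENC_UNICODE)
        return "UTF-16BE";

    /* ASCII- and 8-bit-marked text both go through the configured charset,
     * falling back to ISO-8859-1 (the old behaviour) when nothing is set. */
    if (configured != NULL && configured[0] != '\0')
        return configured;
    return "ISO-8859-1";
}
```

Treating ASCII-marked messages like 8-bit ones is a deliberate choice in this sketch: CP1250, CP1251 and ISO-8859-1 are all ASCII supersets, so real 7-bit text decodes identically, while a client that mislabels 8-bit text as ASCII still gets recoded.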
Changed 6 years ago by waker@…
- attachment bitlbee-recode.diff added
patch which adds recoding capabilities to bitlbee
Changed 6 years ago by wakeroid@…
- attachment bitlbee-recode.2.diff added
patch which adds recoding capabilities to bitlbee (updated)
comment:10 Changed 6 years ago by wilmer
Hmmm, nice. Wouldn't it be good to also make this patch send a flag indicating the charset used to encode the message? I don't know how easy this is though, I haven't read the OSCAR "specs" very well yet...
Implementing this would probably be easier in the storage-xml branch by the way, since it adds support for per-account settings. So then you can just type something like "account set oscar/charset CP1250" and you're done. And you can set it per-account, if you want.
comment:11 Changed 6 years ago by waker
unfortunately i don't have enough expertise in bitlbee hacking to implement such stuff.
however, the good news is that the patch performs extremely well for me. i tested it with miranda, icq2003a (the mirabilis client), qip and centericq, and the only buggy client was &rq, which can't _receive_ utf8 text by default, though it should be possible to fix that in &rq's settings, and it's unrelated to what my patch does.
the question is why one would need different oscar charsets for 2 accounts on the same machine? though it could be done easily, i think..
comment:12 Changed 6 years ago by wilmer
With storage-xml it's the easiest. I try to keep BitlBee-wide settings completely out of the IM-modules (there are only some references to the debug setting at some places) and instead introduced per-account settings in that branch.
One advantage of having different charset settings per account could be so that you could, if necessary, have a separate account with a different charset for people who use a different charset.
Sure, it's hackish, but isn't having to use different charsets for different people hackish in general? ;-)
I'll give a shot at a storage-xml port some day then (not too much development time for the next few weeks though, unfortunately).
comment:13 Changed 6 years ago by anonymous
i've added support for recoding outgoing messages, and i'm going to add recoding of offline messages (broken too). will post the diff in 1-2 days.
after that i'll test it for a few days, and if it works i'm going to check out the latest cvs and experiment with the xml-whatever branch (really hate xml! why do you want to use it?!)
comment:14 Changed 6 years ago by wilmer
Sounds good!
And for XML, it's a pretty decent format for this kind of things. Users usually won't have to edit the files by hand (I'm not a big fan of editing XML-conffiles by hand myself either) so it doesn't matter that much.
And also XML is pretty easy to parse because there are enough parsers available. It's certainly (in many ways) a huge improvement over the old format.
comment:15 Changed 6 years ago by waker
here we go.. updated patch for recoding, recodes both incoming and outgoing messages as well as offline messages.
Changed 5 years ago by waker
- attachment bitlbee-recode-0.5.diff added
partial support for recoding offline messages (see http://bugs.bitlbee.org/bitlbee/ticket/221)
comment:16 Changed 5 years ago by Clojster
Wow, it's great to see that someone is actually doing something about this... It would be great, though, if you guys added this patch to the next release... what do you think?
Changed 5 years ago by darkk
- attachment 8bit-charset.diff added
patch for configurable encoding for icq accounts
Changed 5 years ago by waker
- attachment bitlbee-recode-0.6.diff added
added recoding of user info for 1.0.3
comment:17 Changed 4 years ago by anonymous
Could somebody please port these patches to the current 1.1.1dev? Thank you in advance!
comment:18 Changed 4 years ago by newman
Could someone please make this (= bitlbee-recode) work with the recent 1.1.1dev version? Thanks.
comment:19 Changed 4 years ago by newman
Attaching a patch for v1.1.1dev. It's not possible to change oscar_recode_charset via set (set oscar_recode_charset iso-8859-2); it's hardcoded in the patch.
Until that's fixed, run
%s/cp1250/iso-8859-1/g
on the patch, for example, substituting the charset you need.
The problem is this compiler warning:
assam bitlbee-1.1.1dev-new # make
- Compiling irc.c
irc.c: In function 'irc_new':
irc.c:112: warning: passing argument 4 of 'set_add' from incompatible pointer type
make[1]: Entering directory `/usr/src/bitlbee-1.1.1dev-new/lib'
- Compiling misc.c
Could anyone please fix it? I wasn't able to figure it out.
Changed 4 years ago by newman
- attachment bitlbee-recode-0.6.1.diff added
patch against 1.1.1dev, not fully functional (regression)
comment:20 Changed 4 years ago by newman
tested at last. the hardcoded charset works as intended. setting it via set is not possible, please fix.
comment:21 Changed 4 years ago by newman
It has been running for several weeks and seems OK to me. Once, after some weeks of continuous running, BitlBee's encoding stopped working; restarting the service fixed it.
comment:22 follow-up: ↓ 24 Changed 4 years ago by wilmer
Hmm, instead of having a hardcoded setting, this patch should be able to use per-account settings now. Actually I should probably apply the patch to the main tree like that.
comment:23 Changed 4 years ago by wilmer
BTW, is it a good idea to set the AIM_IMFLAGS_ISO_8859_1 flag while in fact the message isn't really coded in that charset?
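One possible way to keep such a flag truthful, sketched under assumptions: send 7-bit text as plain ASCII, send everything else as Unicode with the Unicode flag, and fall back to an 8-bit charset only when the user explicitly asks for it (e.g. for clients like &rq that can't receive Unicode). The enum values and the `force_8bit` switch are hypothetical; this is not taken from BitlBee or the attached patches.

```c
#include <stdbool.h>

/* Hypothetical outgoing encodings; the names are illustrative only. */
enum out_encoding {
    OUT_ASCII,    /* 7-bit text, valid in any ASCII-compatible charset */
    OUT_UNICODE,  /* convert to UTF-16 and set the Unicode flag */
    OUT_8BIT      /* convert to the configured 8-bit charset */
};

/* True if every byte of the NUL-terminated string is 7-bit. */
static bool is_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80)
            return false;
    return true;
}

/* `utf8_text` is the message in BitlBee's internal charset (UTF-8).
 * `force_8bit` is a hypothetical per-account switch for peers whose
 * client can't receive Unicode. */
enum out_encoding choose_out_encoding(const char *utf8_text, bool force_8bit)
{
    if (is_ascii(utf8_text))
        return OUT_ASCII;
    if (force_8bit)
        return OUT_8BIT;  /* any flag sent must name the charset really used */
    return OUT_UNICODE;
}
```

With this split, an 8-bit flag such as ISO-8859-1 would only accompany text that really went through an 8-bit conversion.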
comment:24 in reply to: ↑ 22 Changed 4 years ago by newman
Replying to wilmer:
> Hmm, instead of having a hardcoded setting, this patch should be able to use per-account settings now. Actually I should probably apply the patch to the main tree like that.
Yup, right. See the warning while compiling
- Compiling irc.c
irc.c: In function 'irc_new':
irc.c:112: warning: passing argument 4 of 'set_add' from incompatible pointer type
-- that should be the cause of the problem/hardcoding, but at the time I didn't know how to fix it.
Replying to wilmer:
> BTW, is it a good idea to set the AIM_IMFLAGS_ISO_8859_1 flag while in fact the message isn't really coded in that charset?
I really don't know; I just reworked the patch so that it applies and compiles cleanly. I'm not that familiar with the code.
Please report back when the patch is pushed, so I can check out the recent bzr.
comment:25 Changed 4 years ago by newman
What's up? Is it already upstream?
comment:26 Changed 4 years ago by newman
So what? Please review and push this patch into upstream.
comment:27 Changed 4 years ago by wilmer
Your attitude is broken. Please review and push it into your brain.
(And yes, this will happen at some point.)
Changed 4 years ago by newman
- attachment bitlbee-recode-0.6.3.patch added
Runs OK with bitlbee-1.2.3, if anyone else, after three years, still cares...
comment:28 Changed 4 years ago by newman
Sorry for the previous tone, but it's frustrating to have it for three years unfixed. Patch for recent version attached.
comment:29 Changed 4 years ago by wilmer
The patch still touches BitlBee's irc structure to read this setting, protocol modules really should use their own set_t now... I may try to do this myself, but don't know when I'll have time for that.
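A sketch of the lookup order described above, assuming the charset is stored as a per-account setting named `charset` (as in the "account set oscar/charset CP1250" example from comment:10). The `struct setting` list is a hypothetical stand-in and is NOT BitlBee's actual set_t API.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for per-account settings. BitlBee has its own set_t for
 * this; this struct is NOT that API, only an illustration. */
struct setting {
    const char *key;
    const char *value;
    const struct setting *next;
};

static const char *setting_get(const struct setting *list, const char *key)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->key, key) == 0)
            return list->value;
    return NULL;
}

/* Look the charset up in the account's own settings rather than in the
 * IRC-wide structure, defaulting to ISO-8859-1 when unset or empty. */
const char *account_charset(const struct setting *account_settings)
{
    const char *cs = setting_get(account_settings, "charset");
    return (cs != NULL && cs[0] != '\0') ? cs : "ISO-8859-1";
}
```

Because each account carries its own list, two ICQ accounts can use different charsets, which is the per-account case discussed in comment:12.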
comment:30 Changed 14 months ago by anonymous
I ported newman's latest patch to the current bzr version, but it won't build:
- Compiling oscar.c
oscar.c: In function ‘get_oscar_recode_charset’:
oscar.c:978:26: error: ‘struct im_connection’ has no member named ‘irc’
make[2]: *** [oscar.o] Error 1
make[2]: Leaving directory `/home/virus_found/abs/bitlbee/src/bitlbee-build/protocols/oscar'
make[1]: *** [oscar] Error 2
make[1]: Leaving directory `/home/virus_found/abs/bitlbee/src/bitlbee-build/protocols'
make: *** [protocols] Error 2
comment:31 Changed 14 months ago by Wilmer van der Gaast <wilmer@…>
Try replacing irc with bee. The structs got moved around a little.
comment:32 Changed 14 months ago by anonymous
Thank you, it builds fine now, but I have yet to test it. If anyone is interested, a patch that applies to bzr (but is untested) is here: http://sprunge.us/YDOI

Most likely this is only a problem with OSCAR, so I'll reassign this. Per-buddy settings would be extremely nasty; I hope we can avoid that. So not all recent ICQ clients support Unicode (UTF-16, actually, not UTF-8) yet?
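Since ICQ's Unicode messages are UTF-16 rather than UTF-8, they have to be decoded before BitlBee can use them internally. iconv with "UTF-16BE" could do this; the hand-written decoder below only shows what that step involves, assuming big-endian byte order (an assumption, not something confirmed in this ticket). `utf16be_to_utf8` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode `len` bytes of big-endian UTF-16 into a
 * malloc'd UTF-8 string. Returns NULL on odd length, unpaired surrogates,
 * or allocation failure. */
char *utf16be_to_utf8(const unsigned char *in, size_t len)
{
    if (len % 2 != 0)
        return NULL;

    /* One 16-bit unit needs at most 3 UTF-8 bytes; a surrogate pair
     * (two units) needs 4, so 3 bytes per unit is always enough. */
    char *out = malloc(len / 2 * 3 + 1);
    if (out == NULL)
        return NULL;
    unsigned char *p = (unsigned char *)out;

    for (size_t i = 0; i < len; i += 2) {
        unsigned long cp = ((unsigned long)in[i] << 8) | in[i + 1];

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i + 3 >= len)
                goto fail;
            unsigned long lo = ((unsigned long)in[i + 2] << 8) | in[i + 3];
            if (lo < 0xDC00 || lo > 0xDFFF)
                goto fail;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                                   /* consumed the low half */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            goto fail;
        }

        if (cp < 0x80) {
            *p++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *p++ = (unsigned char)(0xC0 | (cp >> 6));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = (unsigned char)(0xE0 | (cp >> 12));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            *p++ = (unsigned char)(0xF0 | (cp >> 18));
            *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return out;

fail:
    free(out);
    return NULL;
}
```

For example, the four bytes 0x00 0x68 0x01 0x7E decode to "hž".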