Automatic charset determination problems
When answering emails, about 1 in 25 replies ends up with obvious charset problems in the email that gets sent (for instance, 4 nonsensical characters between sentences where there should be just two plain spaces). I've spent most of the day trying to figure this out, and I'm sorry to have to ask for help on this. I'm a very long-term Thunderbird user--since the days of Netscape.
The email example that I was testing today appeared to be plain text, and looking at the source (Ctrl+U), the charset is shown as "us-ascii", and the two spaces between sentences are just plain spaces. Looking at View/Text Encoding, TB identifies it as "Western". When sending a reply, everything looks fine on my screen, but when the recipient gets the reply, there are goofy characters after the periods in all sentences (where there should be two spaces). Looking at the Sent folder at my end, everything looks fine (no goofy characters). If I try sending the reply to myself and look at View/Text Encoding of the reply, it says "Western". If I go back to the original email and use View/Text Encoding to force the charset to Unicode, and then answer the email, everything works perfectly--no goofy characters in the reply.
Again, looking at the reply in my "Sent" folder, the reply looks fine--it's the copy that actually gets sent that has visible charset problems. This happens about once in 20 or 30 replies, and I don't know how to predict it. When an experienced user receives one of these goofed-up emails, they are merely annoyed because it makes the email hard to read. But one customer that I was replying to saw all the goofy characters, was convinced that I had sent her a virus, and refused to email any more. (It cost us a $1100 sale.)
I've attached a screen shot showing the reply window with all the goofy characters, and the View/Text Encoding showing Western, and a second image showing my charset settings. I don't know exactly what's going on with this. It obviously has something to do with charsets being incorrect. A solution would be wonderful. Thank you in advance.
N
All Replies (7)
I wonder why your view and send default charsets are different? Not that they should matter, because if all were working properly, a charset declared in the message should override any default settings.
I always cringe when I see a Windows charset mentioned. I use Linux and Android devices and I don't expect them to have to cope with a proprietary Microsoft convention.
"Western" implies a left-right writing style, and the Latin (ABC…) alphabet. I don't think it's equivalent to a charset declaration. (When I first saw "western" I assumed it meant a font you might see in Deadwood Gulch, perhaps using wooden characters with bullet holes. See below.)
There have been many reports of these character mishaps recently, but I don't get them myself. I'd like to know what's different in my set-up that stops these. Whilst I may be using Stationery, or userContent.css to set font face and size preferences, these don't affect charsets.
Modified by Zenos
The view and send default charsets are where they are at the moment. I was experimenting with changing them, and found that they don't affect the issue. I normally keep them both at Unicode.
"Western" seems to be a TB abbreviation for Windows-1252 and ISO-8859-1. I don't know why the same shortened word would be used for both. Maybe they are similar enough to be interchangeable.
Yes, this charset problem has been going on (for me at least) for several months. It affects all of our systems (almost all are Windows 7 systems, with varied hardware). You don't know that it is happening until you get a reply back to your reply, and you can see what you have actually been sending out. (Your own Sent folder at your end looks fine.)
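On the question of whether Windows-1252 and ISO-8859-1 are interchangeable, a quick check is possible outside Thunderbird. The sketch below uses Python's built-in codecs, which are only an assumption about how the two charsets map; it says nothing about Thunderbird's own tables. The function name differing_bytes is made up for illustration.

```python
def differing_bytes():
    """Return the byte values that decode differently in the two charsets."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")  # every byte value is defined here
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # a few byte values are unassigned in Python's cp1252 codec
        if latin1 != cp1252:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(f"{len(diffs)} differing bytes, from 0x{diffs[0]:02X} to 0x{diffs[-1]:02X}")
# → 32 differing bytes, from 0x80 to 0x9F
```

So the two agree everywhere except the 0x80 to 0x9F range, where Windows-1252 puts printable characters such as curly quotes and the euro sign. Plain spaces and the no-break space (0xA0) are the same in both, so this difference alone would probably not explain the symptom.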
Thanks,
N
So, is what you sent broken, or did your correspondent's mail client break it?
I know, it's easy to ask this question, not so easy to study it. Particularly if your clients assume you're trying to infect them.
Zenos said
So, is what you sent broken, or did your correspondent's mail client break it? I know, it's easy to ask this question, not so easy to study it. Particularly if your clients assume you're trying to infect them.
What I'm sending out is definitely broken. I mentioned in my OP that I can send the email to myself and see that the reply I receive has charset issues. The customer who thought we were trying to infect her WAS seeing emails from us with charset issues, but of course we weren't trying to infect her. I mentioned it in the OP to show that the issue IS causing problems for us.
Thanks,
N
Is Gmail involved here at all? I've seen messages re-encoded (and occasionally broken in a manner similar to what you're describing) when being passed through a Gmail provider.
Any pattern in the use of any particular email client (or server) for those correspondents whose messages are affected?
This is a hard one to study, given that it comes about when you reply to incoming messages. I suspect there is a server playing into this. I'm reminded of another user who has messages rejected because the server fails to process messages using an encoding (8-bit MIME) that it claims to support but in fact does not.
Zenos said
Is Gmail involved here at all? I've seen messages re-encoded (and occasionally broken in a manner similar to what you're describing) when being passed through a Gmail provider.
From what I can see, gmail is not involved with this.
If I post the source code of an email that demonstrates the problem when replying to it, would that help?
Thanks,
N
Maybe. In most cases the damage has been done and it's hard to unwind it to see what was there originally.
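If a raw message source is available (saved as a .eml file or copied from the Ctrl+U view), one thing worth checking is whether the declared charset matches the bytes actually in the body. The following is a rough sketch using Python's standard email package; the function name charset_report and the sample message are invented for illustration and do not come from any message in this thread.

```python
from email import message_from_bytes
from email.policy import default


def charset_report(raw_message):
    """List (content type, declared charset, non-ASCII byte values) per text part."""
    msg = message_from_bytes(raw_message, policy=default)
    report = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        declared = part.get_content_charset()  # None when no charset is declared
        body = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
        high_bytes = sorted({b for b in body if b > 0x7F})
        report.append((part.get_content_type(), declared, high_bytes))
    return report


# Made-up sample: claims us-ascii but carries two UTF-8 no-break spaces.
sample = (
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"First sentence.=C2=A0=C2=A0Second sentence.\r\n"
)
for content_type, declared, high_bytes in charset_report(sample):
    print(content_type, declared, [hex(b) for b in high_bytes])
```

Any byte above 0x7F in a part that declares us-ascii would be a mismatch, and the receiving client is then left to guess how to display it.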
Are these messages composed in Thunderbird? Not pasted in from a word processor? I can't see any logical reason for one or two space character sequences being treated differently, by Thunderbird, at least. However, word processors do stuff like smart quotes, em-dash vs en-dash hyphens, automatic conversion from (c) to a copyright symbol and so on. It may be that a double space has an alternative single symbol representation.
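One possible mechanism, offered as a guess and not a confirmed diagnosis: editors often store a run of spaces using the no-break space (U+00A0) so that the run is not collapsed. In UTF-8 that character is two bytes, and if those bytes are later read as Windows-1252, each one shows up as two characters. The helper name misdecode below is hypothetical, and the sketch uses only Python's built-in codecs.

```python
NBSP = "\u00a0"  # no-break space


def misdecode(text, actual="utf-8", assumed="cp1252"):
    """Encode text in one charset, then decode the bytes as if they were another."""
    return text.encode(actual).decode(assumed)


sentence = "First sentence." + NBSP + NBSP + "Second sentence."
garbled = misdecode(sentence)
print(repr(garbled))
# → 'First sentence.Â\xa0Â\xa0Second sentence.'
```

Two no-break spaces turn into four characters (each "Â" followed by a no-break space), which would match the "4 nonsensical characters" where two spaces were expected. Ordinary ASCII spaces pass through this mix-up unchanged, so the effect would only appear when the reply actually contains U+00A0.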
If you'd prefer not to post here on a public forum, you can send me a sample privately at xenos @ gmx . co . uk