One of the ongoing behind-the-scenes problems we have had here at ChristianBlog.Com over the last year or so has been dealing with odd characters that show up within Blog Entries. The vast majority of the time, these odd characters appear because members write their Blog Entries in a word processor such as Microsoft Word, OpenOffice, Apple Pages, or TextEdit.
We have spent in excess of a dozen hours trying to develop a method that properly handles the idiosyncrasies of each mainstream word processor and how they interact with website software such as ChristianBlog.Com. At times it still appears that we cannot fully accomplish the goal of converting word processor characters to web characters.
Many of you are probably used to seeing the weird little character around the website from time to time. Getting rid of that has been a huge problem for us. Properly converting it to whatever the actual/intended character was has been an even bigger challenge for us.
Earlier today we decided to take a rather aggressive approach to the conversion process. Rather than trying to work out how to convert each odd character we encounter, we have decided to backtrack and defeat those issues from the very get-go.
For the better part of 2009 ChristianBlog.Com has been using the internationally recognized "UTF-8" character encoding, which represents text from whatever language or character set you might be using in one standardized format. In doing that we were able to solve about 95-98% of the problems with weird characters showing up. We have forced UTF-8 via the encoding declared within the header of our HTML. We have also forced our software pages to run under the UTF-8 charset, though this is not yet implemented throughout our entire website.
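As a rough illustration of what declaring UTF-8 "in the header" can look like, here is a minimal Python sketch. It is not the site's actual code: the function name `build_utf8_page` and the page layout are assumptions made for the example. It declares the charset both in an HTTP-style header and in an HTML meta tag, and encodes the page bytes as UTF-8 so the declaration matches the data.

```python
import html
from typing import Dict, Tuple


def build_utf8_page(body_text: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body_bytes) for a page that declares UTF-8.

    Hypothetical helper for illustration only.
    """
    page = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        # Declaration inside the HTML head.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head><body>\n"
        # Escape the text so it cannot be mistaken for markup.
        f"{html.escape(body_text)}\n"
        "</body></html>\n"
    )
    # Declaration in the response header; it should agree with the meta tag.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    # The bytes sent must really be UTF-8, or the declaration is misleading.
    return headers, page.encode("utf-8")
```

The point of the sketch is that the declaration and the actual bytes have to agree. Declaring UTF-8 while sending bytes in another encoding is one common way odd characters appear.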
What we put into place earlier today, on a trial basis, is a conversion of the entire Blog Message text to the UTF-8 character set. Doing this requires a bit more processing on our end and a slight increase in page load time (we guesstimate the increase to be about 0.0002) each time you view a Blog Entry.
By converting the actual Blog Message text to the UTF-8 character set, we believe we should be able to resolve 98-100% of the issues with unintended weird characters showing up. However, we are not positive of this, so we will be closely watching to see whether this change has any negative effects. We do know that some programs may still result in the weird little character showing up. Specifically, Microsoft Word uses several characters in the byte range 0x80-0x9F whose Unicode code points do not match the byte's value (in Unicode, code points U+0080 through U+009F are control characters, not printable characters). Because the conversion simply assumes that each byte's integer value is its Unicode code point number, those bytes may cause problems. The euro sign, curly quotes, and em dashes are among the characters we are hoping our new method will handle; however, the conversion turns 0x80 into U+0080 (a control character) rather than U+20AC (the euro sign). If this does become an issue, we may have to add a further step that converts the Windows-1252 character set (what MS Word uses) to UTF-8.
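To make the 0x80 problem concrete, here is a small Python sketch. It is an illustration only and not the site's conversion code; the function name `word_bytes_to_utf8` and the sample bytes are assumptions. It decodes the same bytes two ways: as Latin-1, where each byte value is used directly as the code point, and as Windows-1252, where bytes 0x80-0x9F map to the euro sign, curly quotes, and dashes.

```python
# Sample bytes as a Windows-1252 source might produce them:
# 0x80 = euro sign, 0x93/0x94 = curly double quotes,
# 0x92 = curly apostrophe, 0x97 = em dash.
RAW = b"\x80 5 \x93hello\x94 it\x92s \x97"


def word_bytes_to_utf8(raw: bytes) -> bytes:
    """Treat input as Windows-1252 and re-encode it as UTF-8.

    Hypothetical helper. Bytes that Windows-1252 leaves undefined
    are replaced with U+FFFD instead of raising an error.
    """
    return raw.decode("cp1252", errors="replace").encode("utf-8")


# Byte value used directly as the code point: 0x80 becomes U+0080.
as_latin1 = RAW.decode("latin-1")

# Windows-1252 interpretation: 0x80 becomes U+20AC, the euro sign.
as_cp1252 = RAW.decode("cp1252")

print(repr(as_latin1[0]))   # '\x80'
print(as_cp1252)
```

Both decodings succeed without an error, which is why this kind of problem can go unnoticed. Only the Windows-1252 decoding yields the characters the writer intended.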
We would also like to ask the ChristianBlog.Com Community to please notify us, by responding to this Blog Entry or contacting technical support, should you encounter any odd or weird characters in the hours, days and weeks ahead. With the growth of the website it has become very hard for us to review every single bit of content that is published for these types of things, so we could really use the help of our Community to notify us if you encounter such weird characters anywhere.
We would also like to ask our members to be more aware of what is presented on the "Preview" page when creating a new Blog Entry. One hundred percent of the issues concerning weird little characters could be nullified if our members were to take the time to properly review their Blog Message on the "Preview" page. A few additional seconds spent removing these odd characters will benefit you as well as all of our members and visitors. It is so easy to fix on your end, yet so hard for us to fix on our end. Simply removing the odd characters, or replacing them with a character on your keyboard, is all it will take to ensure we have a website free of those pesky little weird characters!
Many thanks once again, John. The amount of time that you put into CB, and keeping it well oiled, is so appreciated.
I have noticed these funny little characters from time to time.
May God continue to bless you, as you labour on for Him.
One problem, and I do use Microsoft Word, is that the pesky little characters DO NOT show up on the preview page. They wait to appear on the posted blog. The other problem is that long periods go by with no problems whatsoever and then suddenly they show up usually with the ', and especially the ". When I post a blog with bunches of these characters I immediately edit it and replace the " with ' and it usually works. I have found no way to correct the ' in a contraction though.
I am all for correcting the problem on the preview page but these marks NEVER show up on it for me. Any suggestions?
Thanks for all your attempts to correct this problem and this site is by no means the only one with them. I have seen many other sites with identical strange marks. One would think the people who make programs such as WORD would make it easier on us to use it. No such luck evidently.
I have noticed this sometimes happens if you cut and paste something like a passage of scripture from Biblegateway. That too seems to be intermittent.
When I first came to CB, I know it was hard to write a blog within the text box for it would time out within a few minutes. That doesn't seem to happen anymore. I type all of my blogs right within the text box rather than use a word processing program.
I know that is not always possible but I would like to point out that if you are working on a blog within the text box and you can't complete it at that time, it will show up in "my blogs" as an incomplete blog and you can click on edit and complete it at a later time.
I just wanted to report that in the blog I just posted I was able to see all the strange capital A's in the preview and correct them. Every single time I used a ' or " there was an A with a curve over the top of it. Once omitted, the final product was fine.
I too have been having problems with the funny A with a tilde over the top. In fact, today after writing my comment up using MS Word (for another blog) I copied and pasted the whole thing into CB and did the preview thing. Of course I ended up with all those funny A characters, but what was really aggravating was that even AFTER correcting I ended up going back about 10 times (literally) and removing the exact same character I had just removed. It appears that at least one of the characters was invisible to me until previewed above and I had a very hard time deleting it. (This all took maybe 20 minutes for one comment.)
By the way John, I have found that if I save my blog in MS Word first as a TXT file and then copy it to CB I will not have these problems. Can CB turn our copied text automatically into TXT, or is that what it does already?
I sure wish I had used the TXT thing today. I just forgot. :(
I think I got that stupid [wiki]a-tilde[/wiki] fixed... very very sorry you guys had to spend so much time on it.
Now, be careful because what I did was simply strip it out... I did not convert it over to anything else.
So, if some of you are getting that instead of an apostrophe, be aware that it [u]might[/u] strip both the a-tilde and the apostrophe. I do not have MS Word installed so I cannot test this, so please let me know what it actually does and I'll try to fix it ASAP!
Sorry you had to spend so long trying to fix this thing, lineman.
I realize it is a pain in the backside to have to even deal with manually changing the a-tilde to apostrophes and quotes... so, once we get this a-tilde issue worked out, I will try to see what I can do to make it so you guys don't have to spend any more time doing the conversion manually on your end.
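For anyone curious about the difference between stripping the odd characters and converting them back, here is a minimal Python sketch. It is not what the site runs: the function name `repair_mojibake` is made up for the example, and it assumes the odd characters came from UTF-8 bytes being read as Windows-1252, which is one common cause but not the only one.

```python
def repair_mojibake(text: str) -> str:
    """Try to undo UTF-8 text that was mistakenly read as Windows-1252.

    Hypothetical helper. If the text does not look like that kind of
    damage, it is returned unchanged.
    """
    try:
        # Turn the wrongly decoded characters back into the original
        # bytes, then decode those bytes the way they were meant to be.
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either a character has no Windows-1252 byte, or the bytes are
        # not valid UTF-8, so this text was not damaged in this way.
        return text


# A curly apostrophe (U+2019) whose three UTF-8 bytes were each read
# as a separate Windows-1252 character.
damaged = "it\u00e2\u20ac\u2122s"

# Stripping only the accented a leaves the other two stray characters.
stripped = damaged.replace("\u00e2", "")

# Converting recovers the single intended character.
repaired = repair_mojibake(damaged)
```

Stripping removes the visible symptom but loses the apostrophe, while converting keeps it. The conversion only works when the damage matches the stated assumption, which is why the function falls back to returning the text unchanged.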
Hi, John, I ran into this today, this being the first time I had composed a private message to several friends, wanting them all to have the same information. I caught the anomaly in the first one, manually took out the funny stuff, copied the corrected material and went on from there. Incidentally, I sent you a copy of this information, then noticed it may be addressing a private prayer request I had not shared earlier. Let me know privately if that is the case. Also, I will watch this thread to try to see how your efforts play out. Blessings, Possum
Is there any current word processor, other than actually typing the whole thing in the blog entry, that you are aware of that one could work in and that does convert 100%? WordPad, Works, Word, whatever?
Sorry John, I did not mean to be such a "whiner". And all is well because you and I are being led by a GREAT and WONDERFUL God who allows things like this to happen for a reason.
Blessings my brother!
I have been watching this pretty closely over the last 24 hours and I have noticed that a few have snuck through, but they never end up getting displayed on the website.
Can anybody confirm that you have seen any weird characters over the last twenty-four hours?
John B. Abela
I ¢ Â€ Â™m going to try to use a lot of ¢ Â€ Âœ ¢ Â€ Â˜ and such here in MS Word just to see if ¢ Â€ Âœ any turn into the funny A. I ¢ Â€ Â˜ personally have not tried to use Word ¢ Â€ Â˜ to post any comments partly because I have ¢ Â€ Âœ very little ¢ Â€ Â˜ ¢ Â€ Â™ ¢ Â€ Â™ ¢ Â€ Â™ ¢ Â€ Â™ ¢ Â€ Â™ ¢ Â€ Â™ time.
Last time I used Word (a week ago?) I ended up with maybe 7 A ¢ Â€ Â™s for each apostrophe. Sometimes less, sometimes more and evidently I could not see every one as I could not seem to totally delete all of them unless I ¢ Â€ Â˜ ¢ Â€ Â™ ¢ Â€ Â™ ¢ Â€ Â™ ¢ Â€ Â™ ¢ Â€ Â™ deleted the character before and behind the A. ( ¢ Â€ Â˜ ¢ Â€ Â™ ¢ Â€ Â™)
Now I will see what this does on CB.
PS. I see zero capital A's but I do see several lowercase a's which have turned into a cent sign. A lowercase a in this area of CB, when previewed, turns into a cent sign. (I used both apostrophes and quotation marks.)
John, I saw zero diamonds with ? in my preview, but everything else appears to be the same. It may be that the diamonds were the quotation marks. Would it help if I sent you a screen shot of MS Word?
I would just like to point out that after posting a blog, we have the option to see it. I always try to go and view my blog immediately after it posts to the website even though I have already previewed it. That way I can check it once again to make sure there are no strange characters or narrow margins due to bbcode errors (on my part) :wink:
I think it is a good habit to get into.
Well, today I used MS Word to create a new blog (because it will do spelling and grammar checks), then copied all the text and put it into WordPad, then copied it again and pasted it here in CB. Whew, but it worked! I only saw one funny A in a quote from BibleGateway.com and even that disappeared when I did the preview thing. (Nice job John!)
Well it looks like my most recent attempts to fix this issue have worked.
Let's see how long it will last this time!
Please report back if anybody comes across any more weird/odd characters!
John B. Abela
Just worked up a blog using MS Word and then copied and pasted it directly into CB. Everything worked very well, except...I found that all my double quote marks were turned into apostrophes (single quote marks). When I added a word with " (double quotes) while editing here on CB the quote marks stayed as I had them (double).
Otherwise things look really GREAT at this point!
D Kelley (@lineman),
Yeah, there is no way that I have found to determine if MS Word is trying to use a single apostrophe or a double quote... so I just had to choose one. As I was looking at all of the cases where MS Word was causing problems, the vast majority (like, probably 85%+) were people using an apostrophe. So, I went with that one.
In the end, I figured people would rather have an apostrophe than nothing at all - which was the only other option.
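For comparison, here is a small Python sketch of how the two marks could be told apart, under one big assumption: that the text has been decoded correctly before any cleanup, so the curly single and double quotes are still separate characters. This is not how the site handles it; the names `PLAIN_QUOTES` and `to_plain_quotes` are made up for the example.

```python
# Curly quote characters and the keyboard characters to use instead.
# (Hypothetical mapping for illustration.)
PLAIN_QUOTES = {
    ord("\u2018"): "'",   # left single quote
    ord("\u2019"): "'",   # right single quote / apostrophe
    ord("\u201c"): '"',   # left double quote
    ord("\u201d"): '"',   # right double quote
}


def to_plain_quotes(text: str) -> str:
    """Replace curly quotes with their plain keyboard equivalents."""
    return text.translate(PLAIN_QUOTES)


sample = "\u201cHello,\u201d she said. It\u2019s fine."
plain = to_plain_quotes(sample)
```

Once the text has already been damaged and partly stripped, this distinction is gone, which matches the either/or choice described above.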
The best solution is to just open up Notepad, copy your MS Word text into Notepad, then copy it from Notepad into CB. I would be very surprised if you took that extra little step and still had any problems.
I do realize this whole issue is a serious pain in the butt, but I just don't have any way to totally solve this, without drastic measures that would result in us totally doing away with things like BBCode and such. Or, I switch gears and go with a totally different method... but that is a whole other story that I'd rather not get into...
Thanks for the update on this, lineman!