ï»¿, utf-8, UTF-8, utf-16, UTF-16 Revisited

trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Post by trent the thief »

Hi,

I've discovered that the issue where WebHelp either does not display when served via an application server, or displays only the BOM characters, is caused by improper file encoding in the Flare Resource files.

The en-us Resource files do not follow the RFC.

There are several things that are necessary to ensure that UTF-8 files work properly:

First, the encoding type is UTF-8, not utf-8. It does make a difference. encoding="UTF-8" is how the encoding attribute should always appear.

Second, if the file has the encoding="UTF-8" attribute, then the editor that saved that file must also save it as UTF-8, not ANSI.

Third, UTF-8 does not use byte ordering and therefore should not include the Byte Order Mark, or BOM (ï»¿). It is not properly handled because its use is not required by the RFC. One of the JavaScript files even has it twice, once at the beginning and once more in the middle.

Fourth, when a file includes the unnecessary UTF-8 BOM, it should probably have encoding="UTF-8", not encoding="utf-16" (Flare.app\Resources\WebHelp\Default.flwht\Navigation.htm, being one example).

Flare's Resource files are a mish-mash of files having improper encoding attributes (utf when UTF is the requirement), unnecessary UTF-8 BOM for UTF-8 files and sometimes on UTF-16 files, and files with an encoding attribute that have been saved as some other type.
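
For anyone who wants to check their own copy of the Resource files for these problems, here is a minimal Python sketch. It is not part of Flare. The function names (`sniff_bom`, `audit`, `mismatch`, `audit_tree`) are made up for illustration, and the declaration search is deliberately crude: it only looks at the first 512 bytes.

```python
import codecs
import re
from pathlib import Path

# Byte order marks that can appear at the start of a file.
BOMS = {
    "UTF-8": codecs.BOM_UTF8,         # EF BB BF
    "UTF-16LE": codecs.BOM_UTF16_LE,  # FF FE
    "UTF-16BE": codecs.BOM_UTF16_BE,  # FE FF
}

# Matches encoding="..." in an XML declaration or charset=... in a meta tag.
DECLARED = re.compile(rb"""(?:encoding|charset)\s*=\s*["']?([A-Za-z0-9._-]+)""")


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None


def audit(data: bytes) -> dict:
    """Summarise the encoding clues found in one file's raw bytes."""
    # In UTF-16 files the declaration is stored as 2-byte units, so drop
    # NUL bytes before searching. Crude, but enough for ASCII declarations.
    head = data[:512].replace(b"\x00", b"")
    match = DECLARED.search(head)
    total = data.count(codecs.BOM_UTF8)
    leading = 1 if data.startswith(codecs.BOM_UTF8) else 0
    return {
        "bom": sniff_bom(data),
        "declared": match.group(1).decode("ascii") if match else None,
        # UTF-8 BOMs that are not at the very start of the file.
        "stray_utf8_boms": total - leading,
    }


def mismatch(report: dict) -> bool:
    """True when the BOM and the declared encoding disagree (case-insensitive)."""
    if report["bom"] is None or report["declared"] is None:
        return False
    return not report["bom"].lower().startswith(report["declared"].lower())


def audit_tree(root):
    """Yield (path, report) for every file below root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            yield path, audit(path.read_bytes())
```

Pointing `audit_tree` at a copy of the Resources folder and printing every entry where `mismatch(report)` is true or `stray_utf8_boms` is non-zero should list the suspect files.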

I have only looked through the en-us files. There may or may not be similar problems in the files for en-uk and the Resources for the UTF-16 languages. I do know that MadCap has used utf instead of UTF almost everywhere.


All of this is behind-the-scenes action that writers should never need to deal with.

I'm going to prepare a procedure that you can use to clean up the file encoding using a hex editor. If you're in need of it, PM me with your email address and I'll forward a copy to you when it's ready (~2 days). I need to publish today.

I've lost three workdays troubleshooting this issue and I don't think anyone else should need to struggle with opaque problems caused when their help files weren't properly encoded through no fault of their own. :evil:
Trent.

Certifiable.

umm...
I meant MAD Certified.

Official Propeller Beanie Owner :-)

:flare: Are you on Flare's Slack channels? PM me for an invitation! :flare:
Diane2010
Jr. Propeller Head
Posts: 2
Joined: Wed Dec 30, 2009 12:52 pm

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by Diane2010 »

I'm having this same issue. I have to deliver WebHelp from a web server and support IE 7, IE 8, Firefox, Chrome, and Safari browsers. My WebHelp launches fine when I select OPEN WITH and select a browser. It does NOT work correctly when I try to view it in Firefox from the web server as our customers will. In Firefox, all I see are the ï»¿ characters. I'm going to submit an urgent defect to MadCap as I have to deliver five Help systems by the end of March.
SteveS
Senior Propellus Maximus
Posts: 2090
Joined: Tue Mar 07, 2006 5:06 pm
Location: Adelaide, far side of the world ( 34°56'0.78"S 138°46'44.28"E).
Contact:

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by SteveS »

Hi Diane,

Welcome to the forums!
Diane2010 wrote:I'm having this same issue. I have to deliver WebHelp from a web server and support IE 7, IE 8, Firefox, Chrome, and Safari browsers. My WebHelp launches fine when I select OPEN WITH and select a browser. It does NOT work correctly when I try to view it in Firefox from the web server as our customers will. In Firefox, all I see are the ï»¿ characters. I'm going to submit an urgent defect to MadCap as I have to deliver five Help systems by the end of March.
Normally we'd say add your request via the bugbase (https://www.madcapsoftware.com/bugs/submit.aspx), but if you have a maintenance contract, go directly to support. There's nothing worse than a tight deadline when something outside of your control is not working as expected.
Steve
Life's too short for bad coffee, bad chocolate, and bad red wine.
Ryan Cerniglia

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by Ryan Cerniglia »

Please allow me to answer these issues on a one-by-one basis:
trent the thief wrote:First, the encoding type is UTF-8, not utf-8. It does make a difference. encoding="UTF-8" is how the encoding attribute should always appear.
To quote from the XML Specification:
XML 1.0 wrote:XML processors should match character encoding names in a case-insensitive way and should either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are, of course, not required to support all IANA-registered encodings).
To me this speaks to the fact that although Flare should place these as upper-case, if the declaration is lower-case it should still be parsed properly.
trent the thief wrote:Second, if the file has the encoding="UTF-8" attribute, then the editor that saved that file must also save it as UTF-8, not ANSI.
If this is appearing in the output files, please let me know either via a bug report, support case, or PM and I'll take a look. There shouldn't be any ANSI encoded documents in the output.
trent the thief wrote:Third, UTF-8 does not use byte ordering, and therefore, should not include the Byte Order Markers (ï»¿). It is not properly handled because its use is not required by the RFC.
Although the Byte Order Mark (BOM) is not absolutely needed, it is not an illegal character and is mentioned quite heavily in the UTF-8 RFC (See Section 6 of the RFC). In addition, it's explicitly allowed as a possibility in the XML Specification.
trent the thief wrote:Fourth, when a file includes the unnecessary UTF-8 BOM, it should probably have encoding="UTF-8", not encoding="utf-16" (Flare.app\Resources\WebHelp\Default.flwht\Navigation.htm, being one example)
Using Flare v5.01, I haven't been able to reproduce this in the output - it looks like the UTF-16 declaration is stripped but the BOM is still enabled. Can you confirm this?
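
As a quick illustration of the first and third points, here is a minimal Python sketch. It uses Python's own codec registry as a stand-in for an XML processor, which is an assumption: browsers and application servers may behave differently, and that is the crux of this thread.

```python
import codecs

# Python's codec registry matches encoding names case-insensitively, in line
# with the XML 1.0 recommendation quoted above. Other processors may not.
same_codec = codecs.lookup("utf-8").name == codecs.lookup("UTF-8").name

bom_bytes = codecs.BOM_UTF8  # the three bytes EF BB BF
payload = bom_bytes + "<html/>".encode("utf-8")

# A plain UTF-8 decode keeps the BOM as the character U+FEFF.
kept = payload.decode("utf-8")

# The "utf-8-sig" codec drops one leading BOM if it is present.
stripped = payload.decode("utf-8-sig")

# Misreading the same three bytes as Windows-1252 yields visible junk,
# which is what ends up on screen when a consumer ignores the BOM.
misread = bom_bytes.decode("cp1252")

print(same_codec, repr(kept[0]), repr(stripped), repr(misread))
```

The BOM is legal in UTF-8, but whether it is harmless depends entirely on which decoder the consumer applies.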
trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by trent the thief »

Ryan Cerniglia wrote:Please allow me to answer these issues on a one-by-one basis:
trent the thief wrote:First, the encoding type is UTF-8, not utf-8. It does make a difference. encoding="UTF-8" is how the encoding attribute should always appear.
Ryan Cerniglia wrote:To quote from the XML Specification:
XML 1.0 wrote:XML processors should match character encoding names in a case-insensitive way and should either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are, of course, not required to support all IANA-registered encodings). To me this speaks to the fact that although Flare should place these as upper-case, if the declaration is lower-case it should still be parsed properly.
When the RFC says should, they mean should, not must. The situation as it stands is that "Musts" are almost always implemented. The Should and Mays are spotty.
trent the thief wrote:Second, if the file has the encoding="UTF-8" attribute, then the editor that saved that file must also save it as UTF-8, not ANSI.
Ryan Cerniglia wrote:If this is appearing in the output files, please let me know either via a bug report, support case, or PM and I'll take a look. There shouldn't be any ANSI encoded documents in the output.
Flare's Resource files are a mixture. I don't think any ANSI file made it to the output, but if we're all dealing with UTF-8 in en-us and UTF-16 in other language groups, then the resource files that are the basis for our output should be saved accordingly using the proper UTF encoding. This is part of the issue for people ending up with "corrupt" fltoc files: the ones that are encoded in UTF-16 and unreadable in a UTF-8 project.
trent the thief wrote:Third, UTF-8 does not use byte ordering, and therefore, should not include the Byte Order Markers (ï»¿). It is not properly handled because its use is not required by the RFC.
Ryan Cerniglia wrote:Although the Byte Order Mark (BOM) is not absolutely needed, it is not an illegal character and is mentioned quite heavily in the UTF-8 RFC (See Section 6 of the RFC). In addition, it's explicitly allowed as a possibility in the XML Specification.
There again, there is a difference between MUST appear and allowed to appear.
trent the thief wrote:Fourth, when a file includes the unnecessary UTF-8 BOM, it should probably have encoding="UTF-8", not encoding="utf-16" (Flare.app\Resources\WebHelp\Default.flwht\Navigation.htm, being one example)
Ryan Cerniglia wrote:Using Flare v5.01, I haven't been able to reproduce this in the output - it looks like the UTF-16 declaration is stripped but the BOM is still enabled. Can you confirm this?
Yes, the BOM is still there. In fact, this issue has been brought up several times in the past three years. I'm still dealing with the UTF-8/UTF-16 issue in fltoc files. UTF-16 BOMs and UTF-16 files do make it into the output. I entered a bug on this last year. My content is 100% en-us and so is that of the other writers in my group. Yet on a random basis, Flare spits out UTF-16 fltoc and opening them is a fifty-fifty proposition.

The Flare editor and help compiler both spit out BOM every chance they get. Not to mention the fact that when MadCapAll.js is put together it has 20 or so BOMs strewn through it, so even your compiler is not expecting to see them, otherwise, it would concatenate the files cleanly, using only a single (yet unneeded) BOM at the beginning of the file.
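
The clean concatenation described above could look something like the following Python sketch. It assumes every input is UTF-8, and it is not how Flare's compiler actually builds MadCapAll.js. The helper names are hypothetical.

```python
import codecs


def strip_leading_bom(data: bytes) -> bytes:
    """Remove one UTF-8 BOM from the start of data, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def concatenate(parts, keep_single_bom=False):
    """Join UTF-8 byte strings so that no BOM ends up in the middle."""
    body = b"\n".join(strip_leading_bom(part) for part in parts)
    # Optionally put exactly one BOM back at the very start of the result.
    return codecs.BOM_UTF8 + body if keep_single_bom else body
```

Calling `concatenate(parts, keep_single_bom=True)` gives the "single (yet unneeded) BOM at the beginning" variant. The default gives a BOM-free file.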

At a bare minimum, MadCap needs to clean up the Resource files and add a switch to the editor to make the use of BOM selectable.

If you search through trouble calls since 2.0, you'll see that this problem is not new. I've been dealing with it almost since the beginning.

This situation exists in 4.2, 5, 5.01, and that which shall be unnumbered ;-)
Trent.

curlynshort
Propeller Head
Posts: 15
Joined: Wed Oct 01, 2008 12:28 am

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by curlynshort »

Is there any progress on this? I, too, have experienced the same problem (not) viewing WebHelp output in Firefox. Happy to submit a bug report if this improves the prospects of getting a fix.
Rowena
trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by trent the thief »

curlynshort wrote:Is there any progress on this? I, too, have experienced the same problem (not) viewing WebHelp output in Firefox. Happy to submit a bug report if this improves the prospects of getting a fix.
Rowena

Hi,

No news yet. If you need the Perl script I wrote to strip the BOM characters, let me know. I can email it to you with instructions. You'd need to have Perl installed on the machine running the script and be able to change the working directory inside the script. It's easy to do in a text editor.
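
For readers who cannot get hold of the Perl script, the general idea can be sketched in Python. This is not the script mentioned above. The extension list and function names are assumptions, and it should only ever be run on a copy of the output folder.

```python
import codecs
from pathlib import Path

# Hypothetical default: which file types to process is an assumption.
EXTENSIONS = {".htm", ".html", ".js", ".css", ".xml"}


def strip_all_boms(data: bytes) -> bytes:
    """Remove every UTF-8 BOM sequence (EF BB BF) from data.

    In valid UTF-8 these three bytes always encode U+FEFF, so this also
    removes any deliberate zero-width no-break spaces in the content.
    """
    return data.replace(codecs.BOM_UTF8, b"")


def strip_tree(root: Path) -> int:
    """Rewrite matching files under root in place; return how many changed."""
    changed = 0
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        data = path.read_bytes()
        # Leave UTF-16 files alone: the same bytes can occur there by chance.
        if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            continue
        cleaned = strip_all_boms(data)
        if cleaned != data:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Something like `strip_tree(Path("Output_copy"))`, where the folder name is a placeholder, would process a duplicate of the generated WebHelp.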
Trent.

curlynshort
Propeller Head
Posts: 15
Joined: Wed Oct 01, 2008 12:28 am

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by curlynshort »

Thanks, Trent.

It would be great if you could send me the script, plus instructions, if you have time.
Rowena
Last edited by curlynshort on Tue Apr 20, 2010 1:04 am, edited 1 time in total.
trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by trent the thief »

On the way. The instructions are inside the script. Open it with a real text editor (TextPad, vi, etc.) to avoid anything untoward happening :-)
Trent.

MadCapWriter
Jr. Propeller Head
Posts: 3
Joined: Thu Mar 18, 2010 8:27 am

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by MadCapWriter »

Trent, thank you so much for putting together this script. We also need to have our online help viewable on Linux and Solaris machines. Would you please send me the script as well?
Thanks in advance for your help.
Christina

HP
susan1000
Jr. Propeller Head
Posts: 1
Joined: Sun Feb 28, 2010 12:04 pm

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by susan1000 »

We are running into some of the same issues. Would you be willing to share the script with us?
Thank you,
Susan
trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by trent the thief »

Just awaiting your email
Trent.

Joseph_McMullen
Jr. Propeller Head
Posts: 7
Joined: Wed Feb 16, 2011 12:18 pm

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by Joseph_McMullen »

Hi Trent,
Can you please e-mail this script to me? This is a request from Colin Walters, the tools support person in my documentation group. He is having access problems with this forum and asked me to request the script.

thanks
trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by trent the thief »

Sure thing.

Just PM me your email address :-)
Trent.

trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by trent the thief »

Hi Everyone,

Would you please do me a favor? If you've used the BOM script, please ask MadCap to just fix this issue in Flare so that we don't need to manually adjust their broken UTF-8 output?

My rough count says about 25 people have used the script since I started this thread. Here is the bug fix/enhancement request link:

http://www.madcapsoftware.com/bugs/submit.aspx

Thanks!
Trent.

rob hollinger
Propellus Maximus
Posts: 661
Joined: Mon Mar 17, 2008 8:40 am

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by rob hollinger »

The issue is now fixed in 7.1. Byte Order Marks are no longer in the outputs.

There are a few side effects to removing the BOM characters involving IE.

If you have leftover characters in topics from a Word import that show as little boxes in the text editor, they can cause the error "Internal error: The surrogate pair (0xDBC0, 0xDBC0) is invalid" during a WebHelp build. Removing the characters or enabling BOM is the fix.
ASPX pages require the BOM to display simple characters such as asterisks.

To turn BOM back on:
1) Open the registry editor (type regedit in the windows run window)
2) Navigate to HKEY_CURRENT_USER\Software\MadCap Software\Flare
3) Right-click on the right, and select New | String Value
4) Name the string value SaveXmlWithBom
5) Double-click the value and enter True into the Value Data field
6) Re-Launch Flare and generate WebHelp. All generated files will have the BOM inserted
7) To turn it back off, change the Value Data field to False, or delete the string value
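
For teams that would rather not click through regedit on every machine, the same setting could be captured in a .reg file. The Python sketch below only builds the file text. The key path and value name are taken from the steps above, the function name is hypothetical, and nothing here has been verified against a Flare installation.

```python
def build_reg_file(enabled: bool) -> str:
    """Return .reg text that sets SaveXmlWithBom under the Flare key."""
    # Key path and value name as given in the steps above.
    key = r"HKEY_CURRENT_USER\Software\MadCap Software\Flare"
    value = "True" if enabled else "False"
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key}]\n"
        f'"SaveXmlWithBom"="{value}"\n'
    )
```

Regedit exports this format as UTF-16, so saving the text with `encoding="utf-16"` is the conservative choice before importing it.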
Rob Hollinger
MadCap Software
LTinker68
Master Propellus Maximus
Posts: 7247
Joined: Thu Feb 16, 2006 9:38 pm

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by LTinker68 »

rhollinger wrote:To turn BOM back on:
1) Open the registry editor (type regedit in the windows run window)
Will this become a configuration option in the target in v7.2 or whatever the next release is so users don't have to edit their registry?
Lisa
Eagles may soar, but weasels aren't sucked into jet engines.
Warning! Loose nut behind the keyboard.
trent the thief
Propellus Maximus
Posts: 614
Joined: Wed Feb 01, 2006 6:21 am
Location: Off in the dark....

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by trent the thief »

Thanks, Rob! You've made quite a few people happy with this change.
Trent.

eeclifford
Jr. Propeller Head
Posts: 1
Joined: Mon May 16, 2016 11:18 am

Re: , utf-8, UTF-8, uft-16, UTF-16 Revisited

Post by eeclifford »

Hi, Trent, I know it's been a while since you came up with your script to resolve this issue. I'm using an older version of Flare and have encountered this problem. Could you possibly please send me a copy of the script? I think I read that the instructions are documented inside the script.
Many thanks,

Estella