Word Import - Headings put inside <ol> elements

normir
Jr. Propeller Head
Posts: 8
Joined: Sun Jun 07, 2015 2:34 am

Word Import - Headings put inside <ol> elements

Post by normir »

I am importing a Word document and for some reason all of my headings (h1, h2, h3) are being imported as <ol><h1>blah blah blah</h1></ol>

Why - where is this list coming from - it is killing my ability to do some formatting ...
wclass
Propellus Maximus
Posts: 1238
Joined: Mon Feb 27, 2006 5:56 am
Location: Melbourne, Australia

Re: Word Import - Headings put inside <ol> elements

Post by wclass »

Well, first up, that doesn't sound right, but the Word-Flare style mapping depends on quite a number of factors, so we need to ask a few questions first.

In the import settings are you preserving Word styles?
In Word, how is your Heading 1 defined - does it have auto-numbering applied?
In Flare, what should your H1 be defined as - does it have auto-numbering?

I find that I get best results when I don't preserve Word styles, and clear out all Word auto-numbering before I attempt an import, and then apply the correct stylesheet after the import.
Margaret Hassall - Melbourne
normir
Jr. Propeller Head
Posts: 8
Joined: Sun Jun 07, 2015 2:34 am

Re: Word Import - Headings put inside <ol> elements

Post by normir »

I do not retain Word styles ... but, YES the Headings are all autonumbered ... that's what's doing it ?

Wow - any suggestions on how to fix this after the conversion has already been done? I've already pulled all the Word files in and done a lot of post-conversion work - would hate to have to go back and re-import.
wclass
Propellus Maximus
Posts: 1238
Joined: Mon Feb 27, 2006 5:56 am
Location: Melbourne, Australia

Re: Word Import - Headings put inside <ol> elements

Post by wclass »

normir wrote:...
Wow - any suggestions on how to fix this after the conversion has already been done - ...
It's hard to guess how to fix without seeing the problem exactly. What I would look at, though, is first the definition of the headings generated in the CSS file. And then I would look at what the code for the headings looked like in the topics.
When I import Word documents, I often need to do a find/replace to fix the styles behind lists, but I have to see the generated HTML to know how to set the find/replace.
Margaret Hassall - Melbourne
dorcutt
Sr. Propeller Head
Posts: 234
Joined: Thu May 15, 2014 12:16 pm

Re: Word Import - Headings put inside <ol> elements

Post by dorcutt »

Moving forward, removing the auto-numbering from the Word files first is definitely a good idea.

For fixing what's already done, you can probably do what you need to do with a good regular expression Find and Replace. While Flare does have some built-in support for this, people usually use a third-party program. Nita Beck uses FAR HTML, which appears to be a great tool but costs $72 US for one license. This is probably worth it if the accounting/purchasing bureaucratic gods are on your side. I use Notepad++, which is free and works great but requires more advanced knowledge of regular expressions. (Beware of downloading this from SourceForge or CNet though; these are apparently now run by malicious owners.)

Before doing anything like this, check in all files to source control and back up your entire project. You can seriously mess up your project if you do a regular expression find and replace in all files incorrectly, so performing a proper backup is a must.
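
If source control isn't set up yet, the backup step could be as simple as copying the project folder somewhere safe first. Here is a rough sketch in Python; the function name and the timestamped folder naming are just illustrative choices, not anything Flare provides, and it assumes the backup location is outside the project folder.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir, backup_root):
    """Copy the whole project folder into a new timestamped folder.

    Hypothetical helper: returns the path of the copy it created.
    backup_root should live OUTSIDE project_dir.
    """
    project_dir = Path(project_dir)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{project_dir.name}-backup-{stamp}"
    # copytree raises an error if the target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(project_dir, target)
    return target
```

This is a complement to checking in, not a replacement for it.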

If you need help with the regular expressions or setting up FAR HTML, post an example of the text view of a typical messed-up heading, and we'll do our best to help you.
-Dan, Propellerhead-in-training
emsachs
Propeller Head
Posts: 91
Joined: Wed Nov 19, 2014 12:49 pm

Re: Word Import - Headings put inside <ol> elements

Post by emsachs »

If it's really doing <ol><h1>stuff</h1></ol>, you don't even need a regular expression editor. First, back up and follow all warnings & caveats from Dorcutt.
Then do a Find and Replace in Files, click Find in source code, and do a replace all
<ol><h1> to <h1>
</h1></ol> to </h1>

No regular expressions necessary.

Try this on one file first and see if it fixes it. Then (but still, back up, back up, back up), do it on the whole project.
Do the same for <ol><h2>, etc.
normir
Jr. Propeller Head
Posts: 8
Joined: Sun Jun 07, 2015 2:34 am

Re: Word Import - Headings put inside <ol> elements

Post by normir »

Unfortunately the find replace option is a little tricky ... for example here is an H3

<ol style="list-style-type: decimal;margin-left: 36pt;" start="115">
<h3 MadCap:autonum="0.1.5  ">Configure Non-Transactional Data Source</h3>
</ol>

Here is another example of a H2
<ol style="list-style-type: decimal;margin-left: 29pt;" start="11">
<h2 MadCap:autonum="0.1  ">WebLogic/MSSQL Configuration</h2>
</ol>

As you see ... the actual Header text is (obviously I guess) different for each header ... in addition the conversion into flare "cleverly" put in some styling for the OL and put in a START attribute - which changes with each OL ... so the regular expression needed to replace this seems difficult - is it even possible ?

I'm thinking of writing a quick Perl script to do it ... or just bite the bullet, put on the headphones and bang them out manually ...
dorcutt
Sr. Propeller Head
Posts: 234
Joined: Thu May 15, 2014 12:16 pm

Re: Word Import - Headings put inside <ol> elements

Post by dorcutt »

First off, yes, this is very possible. I'm not really familiar with Perl scripts, but I decided to practice my regex kung fu and successfully performed a find and replace using your test code in Notepad++:

Find string:

<ol.*>(\s*<h\d.*\s*</h\d>\s*)</ol>
Replace string:

\1
Explanation
What this code does is look for all ol tags (the <ol.*> and </ol> parts of the pattern), but only the ones that have the text <h1, <h2, <h3, etc. in them (the <h\d part). By wrapping the heading in parentheses, (\s*<h\d.*\s*</h\d>\s*), it stores the entire heading element, from its opening tag through its closing tag plus any surrounding whitespace, as a string. Because this is the first saved string, it is automatically labelled as string "1", which is what \1 refers to. Then, for the replacement text, only the saved string (the heading tag) is put back in.
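
For anyone who would rather script this than use an editor, the same find and replace can be sketched in Python. This is only an illustration, assuming Python's re module treats the pattern the same way Notepad++ did for this sample (by default . does not cross a line break in Python, while \s does). The sample text is the H3 posted earlier in the thread; the variable and function names are made up.

```python
import re

# Same find string as above; group 1 is the heading plus surrounding whitespace.
HEADING_IN_OL = re.compile(r"<ol.*>(\s*<h\d.*\s*</h\d>\s*)</ol>")


def unwrap_headings(html: str) -> str:
    """Replace each matching <ol ...>heading</ol> with just the heading (group 1)."""
    return HEADING_IN_OL.sub(r"\1", html)


sample = (
    '<ol style="list-style-type: decimal;margin-left: 36pt;" start="115">\n'
    '<h3 MadCap:autonum="0.1.5  ">Configure Non-Transactional Data Source</h3>\n'
    "</ol>"
)

# The <ol ...> and </ol> lines are dropped; the <h3> line and its
# MadCap:autonum attribute are left untouched.
print(unwrap_headings(sample))
```

One caution: <ol.*> is greedy, so if a topic ever has a normal list and a wrapped heading on the same line, it could swallow more than intended. Something like <ol[^>]*> would be a more conservative opening, though I have only tried the original pattern on the samples posted here.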

I think this should do what you need it to do. Use the "find and replace in all files" and limit it to the folder your topics are in. Again, for the love of all that you hold holy, do a full backup before attempting this, and try to manually find and replace a few to make sure it's only picking up what you want. Only when you are very confident should you hit the "replace all" button.
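
One way to check what the expression would pick up before replacing anything is a read-only dry run that just lists the matches. A rough sketch in Python follows; it assumes your topics are .htm files saved as UTF-8 under one folder, and the function name is made up for illustration. It never writes to any file.

```python
import re
from pathlib import Path

# Same find string as in the post above.
HEADING_IN_OL = re.compile(r"<ol.*>(\s*<h\d.*\s*</h\d>\s*)</ol>")


def preview_matches(folder, glob="*.htm"):
    """Return (file, matched_text) pairs for every hit under folder.

    Read-only: files are opened for reading and never modified.
    """
    hits = []
    for path in sorted(Path(folder).rglob(glob)):
        text = path.read_text(encoding="utf-8")
        for match in HEADING_IN_OL.finditer(text):
            hits.append((path, match.group(0)))
    return hits


# Example usage (the folder name is a placeholder for your own topics folder):
# for path, matched in preview_matches("Content"):
#     print(path, matched, sep="\n", end="\n\n")
```

If every listed match is an ol wrapped around a heading and nothing else, that is a reasonable sign the replace is safe to run on that folder.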

PS: For reference, I use this resource for help crafting the regex strings: http://regexlib.com/CheatSheet.aspx?Asp ... eSupport=1.

EDIT: Refined code slightly so that the heading end tag is specified. I realized otherwise if you had bolds or superscripts in your heading titles it might miss some occurrences.
-Dan, Propellerhead-in-training
normir
Jr. Propeller Head
Posts: 8
Joined: Sun Jun 07, 2015 2:34 am

Re: Word Import - Headings put inside <ol> elements

Post by normir »

ok - so that really was kung fu regular expression stuff. I had decided NOT to do this as it seemed like I had found all the bad guys by hand. Today I found a whole other folder full of these import issues (lets call them !).

I ran your expression and, after remembering to tell Flare to use regular expressions, it worked great - thanks.

R
dorcutt
Sr. Propeller Head
Posts: 234
Joined: Thu May 15, 2014 12:16 pm

Re: Word Import - Headings put inside <ol> elements

Post by dorcutt »

Excellent, glad to hear it!
-Dan, Propellerhead-in-training