I am importing a Word document and for some reason all of my headings (h1, h2, h3) are being imported as <ol><h1>blah blah blah</h1></ol>
Why? Where is this list coming from? It is killing my ability to do some formatting ...
Word Import - Headings put inside <ol> elements
-
- Propellus Maximus
- Posts: 1238
- Joined: Mon Feb 27, 2006 5:56 am
- Location: Melbourne, Australia
Re: Word Import - Headings put inside <ol> elements
Well, first up, that doesn't sound right, but the Word-Flare style mapping depends on quite a number of factors, so we need to ask a few questions first.
In the import settings are you preserving Word styles?
In Word, how is your Heading 1 defined - does it have auto-numbering applied?
In Flare, what should your H1 be defined as - does it have auto-numbering?
I find that I get best results when I don't preserve Word styles, and clear out all Word auto-numbering before I attempt an import, and then apply the correct stylesheet after the import.
Margaret Hassall - Melbourne
Re: Word Import - Headings put inside <ol> elements
I do not retain Word styles ... but, YES, the headings are all autonumbered ... is that what's doing it?
Wow - any suggestions on how to fix this after the conversion has already been done? I've already pulled all the Word files in and done a lot of post-conversion work - I would hate to have to go back and re-import.
Re: Word Import - Headings put inside <ol> elements
normir wrote: ...
Wow - any suggestions on how to fix this after the conversion has already been done - ...
It's hard to guess how to fix this without seeing the problem exactly. What I would look at first, though, is the definition of the headings generated in the CSS file. Then I would look at what the code for the headings looks like in the topics.
When I import Word documents, I often need to do a find/replace to fix the styles behind lists, but I have to see the generated HTML to know how to set the find/replace.
Margaret Hassall - Melbourne
Re: Word Import - Headings put inside <ol> elements
Moving forward, removing the auto-numbering from the Word files first is definitely a good idea.
For fixing what's already done, you can probably do what you need with a good regular expression Find and Replace. While Flare does have some built-in support for this, people usually use a third-party program. Nita Beck uses FAR HTML, which appears to be a great tool but costs $72 US for one license. This is probably worth it if the accounting/purchasing bureaucratic gods are on your side. I use Notepad++, which is free and works great but requires more advanced knowledge of regular expressions. (Beware of downloading this from SourceForge or CNet though; these are apparently now run by malicious owners.)
Before doing anything like this, check in all files to source control and back up your entire project. You can seriously mess up your project if you do a regular expression find and replace in all files incorrectly, so performing a proper backup is a must.
If you need help with the regular expressions or setting up FAR HTML, post an example of the text view of a typical messed-up heading, and we'll do our best to help you.
-Dan, Propellerhead-in-training
Re: Word Import - Headings put inside <ol> elements
If it's really doing <ol><h1>stuff</h1></ol>, you don't even need a regular expression editor. First, back up and follow all warnings & caveats from Dorcutt.
Then do a Find and Replace in Files, click Find in source code, and do a Replace All for each of these:
<ol><h1> to <h1>
</h1></ol> to </h1>
No regular expressions necessary.
Try this on one file first and see if it fixes it. Then (but still, backup, backup, backup) do it on the whole project.
Do the same for <ol><h2>, etc.
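For anyone who would rather script the literal replacement than click through it, here is a minimal sketch in Python. It assumes the wrappers really are the bare strings <ol><h1> and </h1></ol>, with no attributes or whitespace between the tags; the function name strip_bare_wrappers and the sample string are made up for illustration and are not part of Flare.

```python
def strip_bare_wrappers(html: str, levels=range(1, 7)) -> str:
    """Remove <ol> wrappers that sit directly against a heading tag.

    Plain string replacement only: it matches the exact text
    "<ol><hN>" and "</hN></ol>" and nothing else.
    """
    for n in levels:
        html = html.replace(f"<ol><h{n}>", f"<h{n}>")
        html = html.replace(f"</h{n}></ol>", f"</h{n}>")
    return html


# Hypothetical sample: two wrapped headings and one genuine list.
sample = "<ol><h1>Overview</h1></ol><ol><h2>Setup</h2></ol><ol><li>Step one</li></ol>"
print(strip_bare_wrappers(sample))
# prints: <h1>Overview</h1><h2>Setup</h2><ol><li>Step one</li></ol>
```

The genuine list is left alone because neither search string occurs in it. If the imported <ol> tags carry attributes or sit on their own lines, nothing will match and the text comes back unchanged, which is a safe way to find out that a literal replace is not enough.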
Re: Word Import - Headings put inside <ol> elements
Unfortunately the find and replace option is a little tricky ... for example, here is an H3:
<ol style="list-style-type: decimal;margin-left: 36pt;" start="115">
<h3 MadCap:autonum="0.1.5 ">Configure Non-Transactional Data Source</h3>
</ol>
Here is another example, an H2:
<ol style="list-style-type: decimal;margin-left: 29pt;" start="11">
<h2 MadCap:autonum="0.1 ">WebLogic/MSSQL Configuration</h2>
</ol>
As you see ... the actual header text is (obviously, I guess) different for each header ... in addition, the conversion into Flare "cleverly" put in some styling for the OL and added a START attribute, which changes with each OL ... so the regular expression needed to replace this seems difficult - is it even possible?
I'm thinking of writing a quick Perl script to do it ... or just biting the bullet, putting on the headphones and banging them out manually ...
Re: Word Import - Headings put inside <ol> elements
First off, yes, this is very possible. I'm not really familiar with Perl scripts, but I decided to practice my regex kung fu and successfully performed a find and replace using your test code in Notepad++:
Find string:
<ol.*>(\s*<h\d.*\s*</h\d>\s*)</ol>
Replace string:
\1
Explanation
This expression looks for ol tags (the <ol.*> at the start and the </ol> at the end), but only the ones that contain a heading tag such as <h1, <h2, <h3, etc. (the <h\d.*\s*</h\d> part). Wrapping the heading tag in parentheses, (\s*<h\d.*\s*</h\d>\s*), stores the entire heading tag as a string. Because this is the first saved string, it is automatically labelled as string "1". Then, for the replacement text, only the saved string (the heading tag) is put back in, using \1.
I think this should do what you need it to do. Use "find and replace in all files" and limit it to the folders your topics are in. Again, for the love of all that you hold holy, do a full backup before attempting this, and try to manually find and replace a few to make sure it's only picking up what you want. Only when you are very confident should you hit the "replace all" button.
PS: For reference, I use this resource for help crafting the regex strings: http://regexlib.com/CheatSheet.aspx?Asp ... eSupport=1.
EDIT: Refined the code slightly so that the heading end tag is specified. I realized that otherwise, if you had bolds or superscripts in your heading titles, it might miss some occurrences.
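For anyone who wants to rehearse the expression outside Flare or Notepad++ before touching a project, here is a minimal sketch using Python's re module. The find and replace strings are the ones given above; the function name unwrap_headings is hypothetical, and the sample text combines the two headings posted earlier in the thread with a made-up genuine list. Python's regex engine is not the one Notepad++ or Flare uses, so treat this as a dry run on a copy rather than proof of how those tools will behave.

```python
import re

# Find string from the post above, unchanged. As written, "." does not
# match a newline, so "<ol.*>" stays on the line where the tag starts.
WRAPPED_HEADING = re.compile(r"<ol.*>(\s*<h\d.*\s*</h\d>\s*)</ol>")


def unwrap_headings(html: str):
    """Return (cleaned_html, number_of_wrappers_removed).

    Each <ol ...> that wraps a single heading is replaced by group 1,
    i.e. the heading tag plus its surrounding whitespace.
    """
    return WRAPPED_HEADING.subn(r"\1", html)


sample = '''<ol style="list-style-type: decimal;margin-left: 36pt;" start="115">
    <h3 MadCap:autonum="0.1.5 ">Configure Non-Transactional Data Source</h3>
</ol>
<ol>
    <li>A genuine list item</li>
</ol>
<ol style="list-style-type: decimal;margin-left: 29pt;" start="11">
    <h2 MadCap:autonum="0.1 ">WebLogic/MSSQL Configuration</h2>
</ol>'''

cleaned, count = unwrap_headings(sample)
print(count)    # number of wrappers removed (2 for this sample)
print(cleaned)  # headings are kept; the genuine list is untouched
```

Checking the count against what you expect for a file is a cheap way to confirm the expression is only picking up the wrapped headings before running a Replace All.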
-Dan, Propellerhead-in-training
Re: Word Import - Headings put inside <ol> elements
OK - so that really was kung fu regular expression stuff. I had decided NOT to do this, as it seemed like I had found all the bad guys by hand. Today I found a whole other folder full of these import issues (let's call them !).
I ran your expression and, after remembering to tell Flare to use regular expressions, it worked great - thanks.
R
Re: Word Import - Headings put inside <ol> elements
Excellent, glad to hear it!
-Dan, Propellerhead-in-training