Getting rid of </span><span+> from word imports
Hi All,
I assume the problem is with Word (it usually is). I'm starting a big conversion task from Word to MadCap, so I want all my ducks lined up and the processes clear beforehand. When I import a docx into a project, the converted output is generally OK, but often there are a bunch of </span><span+>s in the middle of the text, right in the middle of a paragraph. I have no idea why, or whether this will cause any harm.
I imagine the problem is Word keeping some secret, pointless formatting bloat that cannot be erased.
Has anyone found any clues or tricks for 'simplifying' a Word docx, so that text that shows no visible formatting changes really contains none, and the converted MadCap output has no unnecessary <span> formatting?
Many thanks.
Re: Getting rid of </span><span+> from word imports
Beyond cracking open the docx XML files and editing the XML directly, I don't know of a way (and, as you might guess, that is harder than actually removing them in Flare).
Actually, if you are familiar with Regular Expressions (RegEx), you might be able to simply remove all <span></span> tags with no content between them. Flare has a built-in RegEx tool, or you could use a program like FAR (Find And Replace).
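For anyone who wants to try the empty-tag idea outside Flare, here is a minimal Python sketch of it. It assumes the topics can be read as plain text, and the function name remove_empty_spans is made up for illustration; it is not part of Flare or FAR. It only removes spans with nothing at all between the tags, since a span holding a single space may be separating two words.

```python
import re

# A span element, with or without attributes, that has nothing between its tags.
EMPTY_SPAN = re.compile(r"<span\b[^>]*></span>", re.IGNORECASE)


def remove_empty_spans(html: str) -> str:
    """Delete empty span elements, repeating until no more can be removed.

    The loop handles nesting: removing an inner empty span can leave
    its parent span empty, so another pass is needed.
    """
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_SPAN.sub("", html)
    return html


if __name__ == "__main__":
    sample = '<p>Hello <span class="x"></span>world<span><span></span></span>!</p>'
    print(remove_empty_spans(sample))
```

Run it on a copy of a topic first and compare the result before touching the whole project.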
Flare v6.1 | Capture 4.0.0
-
- Senior Propellus Maximus
- Posts: 4293
- Joined: Thu Feb 02, 2006 9:29 am
- Location: The Electric City
Re: Getting rid of </span><span+> from word imports
The spans typically come from inline formatting done in Word. Word makes this extremely easy and encourages it. Many Word users also just shake their heads at strictly adhering to style-based formatting. One option would be to clean up the Word document first, then import. You could also do the cleanup after import in Flare. I'm not sure which way is easier.
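To get a rough feel for how much inline formatting a document carries before importing it, something like the Python sketch below may help. It relies on the fact that a .docx file is a zip archive whose main text lives in word/document.xml, where text is split into runs (w:r) and a run's formatting sits in a w:rPr child. The function names are invented for illustration, and the idea that runs with properties are the ones that turn into spans on import is an assumption, not something confirmed for Flare's importer.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def count_runs(document_xml):
    """Return (total runs, runs that carry their own run properties)."""
    root = ET.fromstring(document_xml)
    runs = list(root.iter(W + "r"))
    with_props = [run for run in runs if run.find(W + "rPr") is not None]
    return len(runs), len(with_props)


def inspect_docx(path):
    """Read the main document part from a .docx file and count its runs."""
    with zipfile.ZipFile(path) as archive:
        return count_runs(archive.read("word/document.xml"))


if __name__ == "__main__":
    sample = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body><w:p>'
        "<w:r><w:t>Plain </w:t></w:r>"
        "<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    print(count_runs(sample))
```

A paragraph that looks uniform in Word but reports many runs with properties is a likely candidate for cleanup before import.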
New Book: Creating user-friendly Online Help
Paperback http://www.amazon.com/dp/1449952038/ or https://www.createspace.com/3416509
eBook http://www.amazon.com/dp/B005XB9E3U
-
- Senior Propellus Maximus
- Posts: 3669
- Joined: Thu Feb 02, 2006 9:57 am
- Location: Pittsford, NY
Re: Getting rid of </span><span+> from word imports
Andrew wrote: Actually, if you are familiar with Regular Expressions (RegEx), you might be able to simply remove all <span></span> tags with no content between them. Flare has a built-in RegEx tool, or you could use a program like FAR (Find And Replace).
There is another way, too, if you have Analyzer 4. Use the Markup Suggestions feature, which can find empty tags (among other markup issues) in all the files in your project and prompt you to remove them.
Nita
RETIRED, but still fond of all the Flare friends I've made. See you around now and then!
Re: Getting rid of </span><span+> from word imports
Thanks for all the suggestions. I'll go through each to the best of my ability (warning: this could kick off further questions). We are also evaluating a product, 'DataExtractor', which apparently strips out unwanted code; I'll let you know how I get on. I can't believe I'm the only one to have this migration problem.
thanks again.
-
- Senior Propellus Maximus
- Posts: 3669
- Joined: Thu Feb 02, 2006 9:57 am
- Location: Pittsford, NY
Re: Getting rid of </span><span+> from word imports
owilkes wrote: ...can't believe I'm the only one to have this migration problem.
You're not. There is always some kind of code cleanup that has to be done after importing content from elsewhere, whether Word or FrameMaker or RoboHelp. Not everyone sees exactly the same issue you're seeing, but I would bet that everyone sees some kind of issue that has to be cleaned up post migration. And sometimes the trick is to clean up content in the source application before pulling it into Flare.
Nita
RETIRED, but still fond of all the Flare friends I've made. See you around now and then!
-
- Propellus Maximus
- Posts: 661
- Joined: Mon Mar 17, 2008 8:40 am
Re: Getting rid of </span><span+> from word imports
In Flare 7, we fixed the "Remove inline formatting" tool, which is on the Text Formatting toolbar.
It's the bold, underlined "B".
Open a topic that's full of span tags you want to remove.
Press CTRL + A to select all.
Click the button and they are all removed.
Tip: For really long topics, be sure you're in Layout(Web) to avoid kicking off the pagination engine.
Rob Hollinger
MadCap Software
-
- Sr. Propeller Head
- Posts: 205
- Joined: Wed Apr 28, 2010 2:51 am
Re: Getting rid of </span><span+> from word imports
Another thing to do is to make sure that "Preserve MS Word Styles" is cleared when you import your Word document...
Re: Getting rid of </span><span+> from word imports
Many thanks - the Remove Inline Formatting tool seems to have worked (will need to do more checking, but seems to do the trick).
Always nice (and rare) to try something new and have it do exactly what you want, first time!
thanks again
-
- Propeller Head
- Posts: 28
- Joined: Wed Feb 15, 2006 12:58 pm
- Location: Chicago, IL
Re: Getting rid of </span><span+> from word imports
The Unformat (Remove Inline Formatting) button works well with one or two instances, but when importing from Word you may get literally hundreds of these pairs. Consider using regex. In a tool like Notepad++, you can use:
Find what: <span class="span_[0-9]">(.*?)</span>
Replace with: \1
or variations. This does a great job at cleaning up the mess.
Why Word does this in the first place is unclear. Likely the Word user applied inline formatting at some point and Word doesn't always clean up its internal code properly. You can't see this in Word, and although removing a selection's formatting and then reapplying it works, it's hardly practical. Not with detailed formatting and not with hundreds of documents. The best you can do is handle it with Flare afterwards. On the other hand, using Notepad++'s regex feature with the multiple files option, you can clean a large number of documents in a few seconds. You can also use it to record a macro, making it even easier.
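If you would rather script the same find-and-replace than run it in an editor, a rough Python equivalent might look like the sketch below. It is one of the "variations": it uses [0-9]+ so that class names such as span_12 also match, it assumes the topics are UTF-8 .htm files, and the function names are invented for illustration. Like the original expression, it does not cope with spans nested inside other spans, so work on a backup copy of the project.

```python
import re
from pathlib import Path

# Same idea as the Notepad++ search: capture the text inside the span
# and keep only that. DOTALL lets a span stretch across line breaks.
SPAN = re.compile(r'<span class="span_[0-9]+">(.*?)</span>', re.DOTALL)


def unwrap_spans(html: str) -> str:
    """Replace each matching span with just its contents."""
    return SPAN.sub(r"\1", html)


def clean_folder(folder: str, pattern: str = "*.htm") -> int:
    """Unwrap spans in every matching file under folder; return files changed."""
    changed = 0
    for path in Path(folder).rglob(pattern):
        original = path.read_text(encoding="utf-8")
        cleaned = unwrap_spans(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    sample = '<p>Click <span class="span_3">Save</span> to finish.</p>'
    print(unwrap_spans(sample))
```

Calling clean_folder on a copy of the project's content folder rewrites only the files that actually contained matching spans.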
-
- Sr. Propeller Head
- Posts: 277
- Joined: Fri Feb 13, 2015 8:25 am
- Location: Germany
Re: Getting rid of </span><span+> from word imports
Hi,
Here is just an old "trick" to remove inline formatting in Word:
Select all the text, then press Ctrl+Space.
Kind regards,
Sabine Kamprowski
DocToHelp MVP (by ComponentOne)