Page 1 of 1

Batch removing old translation variants from TM

Posted: Thu Jan 16, 2020 5:18 am
by okonvick
I have a large TM developed over the course of many years. We have recently done a revision of the translation of one of our larger projects, so some existing segment translations were edited. As a result, we now have many cases when the TM includes two variants of how to translate the same thing. I would like to remove the old ones since Lingo often applies the older (not the updated) translation when autotranslating the project using the TM. Is there any tool for this or do I have to start programming to dig through the SQL database storing the TM?

Re: Batch removing old translation variants from TM

Posted: Fri Jan 31, 2020 12:41 am
by okonvick
To answer my own question, Lingo does not include a function for this, so I've put together a piece of SQL code that solves this problem.

The script looks for translations the source segments of which are the same - in other words for different translations of the same thing. If more than one translations are found (a duplicity), the translations are sorted from oldest to the newest and everything except the newest is removed. It could also happen that after this step, there is still more than one translation left since it can happen that two translations have the same creation datetime - if that happens, the script only leaves the one with the highest ID, which is safe to take as the newest one.

This solved the problem for me without any exceptions.

You can run this code on the translation memory database - for example in SQL Management Studio.

Code: Select all

delete from Translations where id in (
select 
t.id
from Translations t 
join Segments source on (source.id = t.sourceSegment_id)
join Segments targ on (targ.id = t.targetSegment_id)
join 
(
select 
cast(s_source.RawText as varchar(max)) as src_rawtext,
s_source.language_id,
max(s_targ.ModifiedOn) as targ_mod
from 
Translations s_t
join Segments s_source on (s_source.id = s_t.sourceSegment_id)
join Segments s_targ on (s_targ.id = s_t.targetSegment_id)
group by cast(s_source.RawText  as varchar(max)),s_source.language_id) ondra
on (ondra.src_rawtext =  cast(source.RawText  as varchar(max)) and ondra.language_id = source.language_id
and ondra.targ_mod > targ.ModifiedOn) 
);
go


delete from Translations where not id in (
select max(t.id)
from 
Translations t
join Segments source on (source.id = t.sourceSegment_id)
join Segments targ on (targ.id = t.targetSegment_id)
group by cast(source.RawText as varchar(max)),source.language_id
);
go

delete  from Segments 
where not exists (Select 1 from Translations t where t.sourceSegment_id = Segments.id or t.targetSegment_id= Segments.id);
go
Don't forget to backup your translation memory before trying this ;)

Re: Batch removing old translation variants from TM

Posted: Mon Feb 24, 2020 3:56 am
by ASTeC
Hello,

You can also avoid the creation of these double translations. To do this, update the translated segments instead of adding a new (and additional) translation (as it is done by default)...

For detailed procedure, see https://help.madcapsoftware.com/lingo11 ... Source.htm