Forum: SDL Trados support
Topic: Producing a glossary with no repetitions from a file with many repetitions
Poster: Huw Watkins
Hi Guys,
Some help with how SDL Trados Studio (2009) works under the hood would be very helpful here. I have a very large Excel file that has now been fully translated. The issue is that the agency has requested a glossary compiled from this file, but with no repetitions.
The original file was 300,000 words with 60,000 no match and a number of fuzzies. There are over 200,000 repetitions and I am using a fresh TM, so no TM matches.
Given the complexity of the task at hand, the agency has kindly agreed that I compile the glossary on a segment by segment basis (not word by word - or the next 10 years of my life would be written off(!!) or I'd have to buy some sort of term extraction tool, which is not going to happen).
My plan of action is this:
1) Recreate the project with another fresh TM.
2) Select the Export unknown segments option in the Analyze Files settings.
and possibly:
3) Select the Export frequent segments option in the Analyze Files settings.
4) Process the project as normal and use the export files from 2 and possibly 3 for my glossary.
My doubt comes in step 3. So far I have experimented with steps 1 and 2 only, which produces an unknown segments file containing solely the no-match words. It doesn't contain the fuzzies, however (this is based on looking at the analysis of the file exported during the batch processing).
If I repeat the process but include step 3, will there be any duplication with the no-match words? Does the no-match count actually include the first occurrence of a segment that is repeated numerous times throughout a file? Is this the same for fuzzies?
Am I running the risk of repetitions if I include both the unknown segments file and the frequent segments file in the final glossary? (Bear in mind that I want the fuzzies to appear, but not the reps. There are no 100% matches, which makes things easier.)
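If the two export files do turn out to overlap, one fallback would be to merge them and strip the duplicates afterwards, outside Studio. Below is a minimal Python sketch of that idea. It assumes the segments have already been pulled out of both files as (source, target) pairs; the function name and the sample pairs are made up for illustration and have nothing to do with Studio itself.

```python
def dedupe_segments(*segment_lists):
    """Merge lists of (source, target) pairs, keeping only the first
    occurrence of each source segment, in order of appearance."""
    seen = set()
    glossary = []
    for segments in segment_lists:
        for source, target in segments:
            # Compare on whitespace-normalised source text so that stray
            # double spaces do not create false "new" entries.
            key = " ".join(source.split())
            if key in seen:
                continue
            seen.add(key)
            glossary.append((source, target))
    return glossary


# Hypothetical sample data standing in for the two export files.
unknown = [("Open file", "Abrir archivo"), ("Save", "Guardar")]
frequent = [("Save", "Guardar"), ("Close window", "Cerrar ventana")]
merged = dedupe_segments(unknown, frequent)
```

Matching here is exact apart from whitespace, so fuzzies (which differ in wording) are kept as separate entries while true repetitions are dropped.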
My next question is this: I am finding that I am not able to export the unknown segments file to Excel (the format of the original file). Does anyone know how to solve this? I have attempted a good old-fashioned copy and paste into Excel with all the target segments and that seems to work, thankfully(!!!), but I'm curious to know whether I can save the target file in Excel format or not.
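As an alternative to copy and paste, the segment pairs could be written to a CSV file, which Excel opens directly. This is only a sketch under the same assumption as before, namely that the (source, target) pairs are already available as a Python list; the function names and column headers are hypothetical.

```python
import csv
import io


def glossary_to_csv_text(pairs, header=("Source", "Target")):
    """Return the glossary as CSV text with one source/target pair per row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(pairs)
    return buffer.getvalue()


def write_glossary_csv(pairs, path):
    """Write the glossary to a CSV file at the given path."""
    # "utf-8-sig" prepends a byte order mark, which helps Excel
    # recognise the file as UTF-8 and display accented characters.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(glossary_to_csv_text(pairs))
```

The csv module quotes any segment that contains a comma or a line break, so longer segments stay in a single cell.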
Any other tips on how I should approach this?
[Edited at 2013-07-26 14:31 GMT]