Translation Technology Portfolio and Lessons Learned

Introduction

I would like to use this portfolio to show what I’ve learned in my Translation Technology course. I have worked with several CAT tools, such as SDL Trados, memoQ, and Memsource. Along the way, I learned how to perform a pseudotranslation before beginning a project, create and align a Translation Memory, use Glossary Converter to build an SDL Trados-compatible Term Base from scratch, and write some rudimentary regular expressions (regex) to help with Quality Assurance. I also worked with a group of classmates on a simulation project, using all the skills I gained throughout the semester to run a multilingual translation.

Part One: Our Project

File Access

[Proposal] (PDF)

[Deliverables] (ZIP)

Statement of Work

This is our Statement of Work. While we did our research regarding market rates for the language pairs in our project, we failed to put as much effort into the presentation of our proposal. This is a mistake that I know I’ve learned from, and I will be sure to keep in mind the importance of document format and presentation in the future.

Deliverables

Here is a screenshot of our project deliverables, which can be accessed above. The screenshot includes our folder structure, where we kept all files related to the project.

Our Lessons Learned

Part Two: Regex Tips for Trados

Collaborating with classmates with the same language pair, we developed the following regular expressions to help with quality assurance checking. The expressions range from grammar to pragmatic language usage, which we believe will be useful in varying circumstances.

1. Checking for Proper Quotations

Regex101 Link [Quotation Checker]

With this simple regular expression, you can check an entire document for the proper use of guillemets, or angular quotation marks, in Spanish. Some Spanish locales do not use the same quotation marks that we use in English. Ideally, the translator would know to use the proper quotation marks, but the QA checker can use this simple “either/or” regex function to check for both the left- and right-facing guillemets. The QA checker can then use the find-and-replace function for substitutions, as shown below:
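The expression itself is not reproduced in the text, so the following is only a minimal sketch in Python, assuming the pattern is the plain alternation «|». The straight-quote replacement is a hypothetical illustration of the find-and-replace step, not the exact operation we ran in Trados. Note that Python writes backreferences as \1, where a Trados replacement would use $1.

```python
import re

# Assumed pattern: match either the left-facing or the right-facing guillemet.
GUILLEMETS = re.compile(r"«|»")

# Hypothetical follow-up: a pair of straight double quotes that should be guillemets.
STRAIGHT_PAIR = re.compile(r'"([^"]*)"')


def find_guillemets(text):
    """Return every guillemet found in the text, in order."""
    return GUILLEMETS.findall(text)


def straight_to_guillemets(text):
    """Replace each pair of straight double quotes with guillemets."""
    return STRAIGHT_PAIR.sub(r"«\1»", text)


sample = 'Dijo «hola» y luego "adiós".'
print(find_guillemets(sample))
print(straight_to_guillemets(sample))
```

An odd number of guillemets in a segment is a quick hint that one of the pair is missing.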

2. Checking for Gendered Articles

Regex101 link [Gendered Articles Checker]

Unlike English, Spanish makes use of gendered words and articles. Recently, there has been a movement to make language more inclusive of non-binary people. When translating texts that call for inclusive language from English to Spanish, it is very useful to flag each gendered article together with the noun immediately following it, so that the QA checker can adjust the language as necessary.

In our regular expression, the [Ll] portion looks for all instances of a capital or lowercase “L”, as found in the articles “Los” and “Las.” At first, we only included a lowercase “l”, but we quickly found that doing so excluded capitalized instances of the article, such as those at the start of a sentence. The next section, [o|a], looks for either an “o” or an “a” after the [Ll]. This allows us to keep the expression short and search for both “los” and “las” at the same time. (Strictly speaking, a pipe inside square brackets is treated as a literal character, so [oa] expresses the same intent more precisely.) The next portion to make note of is the \w+ followed by the [o|a]s. Since these gendered words will end with either an “o” or an “a” followed by an “s,” we decided to write the expression in a way that would search for all the characters in a word up until the ending character sequence we wanted. The \w+ covers all the potential characters that would appear before the ending sequence, as seen above with “las fronteras,” which has 7 characters before the “-as,” and “los propietarios,” which has 10 characters before the “-os.” In this context, we only developed a regular expression for words with the definite articles “los” and “las,” but the expression can certainly be adjusted to suit singular words.
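As a rough illustration, the pieces described above can be assembled and tried out in Python. The full pattern below is an assumption pieced together from the description (with [oa] in place of [o|a], and a single space between article and noun), and the sample sentence is made up for the example.

```python
import re

# Assumed assembly of the parts described above:
# [Ll]    capital or lowercase "L"
# [oa]s   "os" or "as", completing "los"/"las"
# \w+     the body of the noun
# [oa]s   the gendered plural ending
ARTICLE_NOUN = re.compile(r"[Ll][oa]s \w+[oa]s")


def find_gendered_phrases(text):
    """Return each 'los/las + noun' phrase the pattern flags."""
    return ARTICLE_NOUN.findall(text)


sample = "Las fronteras estaban cerradas, pero los propietarios esperaron."
print(find_gendered_phrases(sample))
```

Because the pattern has no word boundary at the start, it would also match inside a longer word such as “ellos amigos,” so the results still need a human check.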

3. Checking for Diaeresis in Trados

Regex101 Link [Diaeresis]

We can also use Regex to check for additional grammatical errors in Trados. As an example, we used an expression to check for the proper use of diaeresis over the “u” character in Spanish. The grammatical rule made it simple to design such an expression: the “ü” character is preceded by a “g” and followed by either an “e” or “i,” either with or without a diacritical marker over it. This is what our expression looked like:

If you look at our expression, you’ll notice that it uses a “u” instead of a “ü.” This is intentional; our aim is to find words that meet the criteria for improper use. This way, the project editor can check for any improper grammar and make adjustments as necessary. A find-and-replace function would not work as well for this, simply because there are also words that meet the criteria but do not require the diaeresis, such as “guerra” or “guitarra.” It is up to the editor or QA checker to keep an eye out for these, since as of this moment there is no way to eliminate false positives from the expression criteria.
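Since the expression is not reproduced here, the sketch below uses an assumed pattern that follows the rule as described: a “g,” a plain “u,” and then an “e” or “i” with or without an accent. The surrounding \w* is an addition for this example only, so that the whole word is returned for review; the sample sentence is also made up.

```python
import re

# Assumed pattern: "g" + plain "u" + e/i (accented or not).
# The \w* on both sides is only there to capture the whole word.
MISSING_DIAERESIS = re.compile(r"\w*gu[eéií]\w*")


def find_diaeresis_candidates(text):
    """Return words that may be missing a diaeresis over the 'u'."""
    return MISSING_DIAERESIS.findall(text)


sample = "El pinguino vio la guerra y la cigüeña."
print(find_diaeresis_candidates(sample))
```

Here “pinguino” is a real error (it should be “pingüino”), “guerra” is a false positive that the editor has to dismiss, and “cigüeña” is skipped because it already has the diaeresis.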

4. Checking for Gendered Words ES → EN in Trados

Regex101 link [Gendered Words]

When translating from Spanish to English, there are instances where certain nouns are specifically masculine or feminine and therefore lack gender inclusivity. It is fair and considerate to adjust our usage of these nouns, which society has deemed appropriate for only a woman or only a man. Certain titles also define a woman by her relationship with a man, reducing her to her marital status. For example, “Mr.” says nothing about marriage, while “Miss” and “Mrs.” specifically denote a woman’s marital status. Replacing “Miss” and “Mrs.” with “Ms.” could be seen as a more inclusive option. The regex entered in Trados, (man|Miss|Mrs)\-?\.?, would search for any words containing “man” and for the titles above, so that they can be replaced with more inclusive language.

The image above shows the regex format in Regex101. You may notice a slight difference in format between the version in Trados and the one in Regex101: in Regex101, we kept the backslashes so that “Miss” and “Mrs.” would be matched. The expression looks for “man” and the titles “Miss” and “Mrs.,” so these are grouped in parentheses () and separated by the “either/or” symbol |, which lets any one of them match. After the group, \-? optionally matches a hyphen following “man,” and \.? optionally matches the period following a title such as “Mrs.”
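To see what this expression actually picks up, here is a small sketch that runs the same pattern in Python. The helper name and sample sentences are made up for illustration.

```python
import re

# The expression from the text: one of three alternatives,
# then an optional hyphen and an optional period.
GENDERED = re.compile(r"(man|Miss|Mrs)\-?\.?")


def find_gendered_terms(text):
    """Return the full text of every match, in order."""
    return [m.group(0) for m in GENDERED.finditer(text)]


print(find_gendered_terms("Mrs. Smith met the chairman and Miss Lee."))
print(find_gendered_terms("A woman manages it."))
```

The second sentence shows the limits of the pattern: “man” is matched inside “woman” and “manages,” so the matches are candidates for review rather than automatic replacements.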

Find-and-Replace with Trados

We also developed some potential uses for the find-and-replace function in Trados, which can also make use of regex.

Removing Commas from Numerical Figures

Unlike English, Spanish does not use commas to separate thousands within numerical figures. Instead, a space is used between each group of three digits: 50,255 would be 50 255, and so on. This find-and-replace function allows us to easily detect improper usage of commas within numbers in the target text and replace the commas with a space.

This is what our expression looked like: ([0-9]+),([0-9]+). First, we put each part of the regex into parentheses, so that the digits on either side of the comma form their own “group” in the regex. The numbers in brackets match any digit from 0 to 9, and the “+” indicates that one or more digits can appear in that position. For example, in the number “50,000,” the “50” would be captured by the first group while the “000” would be captured by the second. Between these groups, we added a comma so that the function would look for numbers with commas in the target text. Next, when using the replace function, we added a space between the groupings (instead of $1,$2 we wrote $1 $2) and clicked “replace.” This gave us the end result as seen above. To handle larger numbers, you can simply duplicate the end part of the expression, so that it looks like this: ([0-9]+),([0-9]+),([0-9]+). The same logic applies to the replacement: $1,$2,$3 would be written as $1 $2 $3 (with spaces instead of commas).
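The same two expressions can be tested outside Trados. The sketch below is a rough Python equivalent, where the replacement is written with \1 \2 instead of $1 $2. The function name is hypothetical, and running the three-group expression before the two-group one is a choice made for this example.

```python
import re

# The two expressions from the text.
TWO_GROUPS = re.compile(r"([0-9]+),([0-9]+)")
THREE_GROUPS = re.compile(r"([0-9]+),([0-9]+),([0-9]+)")


def commas_to_spaces(text):
    """Replace comma separators inside numbers with spaces."""
    # Handle numbers with two commas first, then numbers with one.
    text = THREE_GROUPS.sub(r"\1 \2 \3", text)
    return TWO_GROUPS.sub(r"\1 \2", text)


print(commas_to_spaces("Cuesta 50,255 euros."))
print(commas_to_spaces("1,234,567"))
# The two-group expression alone leaves the second comma in place:
print(TWO_GROUPS.sub(r"\1 \2", "1,234,567"))
```

One caution: Spanish uses the comma as a decimal separator, so a figure like “3,5” would also be changed. It is safer to review each match than to replace them all at once.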

Ellxs

This expression searches for instances of the masculine and feminine pronouns ellos and ellas and replaces the respective vowels with an “x,” a popular gender-neutral alternative.
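The expression is not reproduced in the text, so the following is only a sketch of how such a replacement could work, using an assumed pattern. The capture group keeps the original capitalization of “Ell” or “ell,” and the \b word boundaries are an addition to avoid touching words like “aquellos.”

```python
import re

# Assumed pattern: "ell" (either case) + o/a + "s", as a whole word.
ELLXS = re.compile(r"\b([Ee]ll)[oa]s\b")


def neutralize_pronouns(text):
    """Replace 'ellos'/'ellas' with the gender-neutral 'ellxs'."""
    return ELLXS.sub(r"\1xs", text)


print(neutralize_pronouns("Ellos y ellas llegaron con aquellos libros."))
```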

Key Takeaways

While some of the regular expressions that we developed may have some niche uses, the others can certainly see a fair bit of use in a daily work environment. I hope that you can benefit from our work, whether you use it directly or use it as inspiration to develop your own regex for Trados!
