This repository has been archived by the owner on May 2, 2022. It is now read-only.
Sync missing translation keys in app/locales/*.json files to a Google Sheet #481
Open. michaelmcmillan wants to merge 16 commits into master from lost-in-translation-js
Commits (all by michaelmcmillan):
b379cbc add script for finding missing translations across all locales
06d1ef0 remove unused var
5a50bc1 port lost-in-translation.py to javascript for consistency
004d5e0 ported over script usage
721769d remove python script
432870d delimit language with ---
f70d5a6 determine locales from dir
ae00680 upload missing translations to a google sheet
490106c empty string as missing
6ecc6fa sync all available sheets
34e600a add sheet if it doesnt already exist
e2def1e avoid rate limit
1fe146f rate limit circumvension
99f1c80 prettify comment
d355fbf refactored code and made it less complex after feedback from @fossecode
be1a7e1 linted
@@ -0,0 +1,193 @@
/* eslint-disable no-await-in-loop, global-require */
const { parse } = require('path');
const { readdirSync } = require('fs');
const { spawnSync } = require('child_process');
const { GoogleSpreadsheet } = require('google-spreadsheet');

/**
 * This script finds all the english translation keys across
 * all commits (including branches and pull requests) for each
 * locale.json in app/locales. Afterwards it looks at each locale
 * separately to see if it's missing any of the english keys.
 *
 * Important: You need to run it from the root directory.
 *
 * Usage:
 *   node ops/lost-in-translation.js
 */

/**
 * Calls git cli with arguments and returns stdout.
 */
function git(...args) {
  const { stdout } = spawnSync('git', args, { encoding: 'utf-8' });
  return stdout;
}

/**
 * Returns the commit hashes for a file path across all local branches.
 */
function findCommitHashesForFile(filePath) {
  const output = git('log', '--pretty=format:"%h"', '--all', '--', filePath);
  const hashes = output.split('\n').map(line => line.replace(/"/g, ''));
  return hashes;
}

/**
 * Returns the JSON representation of the contents of a file at a certain commit hash.
 */
function retrieveJSONForFileAtCommitHash(filePath, commitHash) {
  const contentsAtCommitHash = git('show', `${commitHash}:${filePath}`);
  try {
    return JSON.parse(contentsAtCommitHash);
  } catch (error) {
    // Sometimes the file is not in a proper JSON format. Simply return an empty object in that case.
    return {};
  }
}

/**
 * Normalizes translation key like we do in app/server.ts.
 */
function normalizeTranslationKey(translationKey) {
  return translationKey.replace(/[\s\n\t]+/g, ' ').trim();
}

/**
 * Find all the locales (e.g. en-IN, no, se) in the provided directory.
 */
function retrieveAllLocales(directoryPath) {
  const filenames = readdirSync(directoryPath);
  const locales = filenames.map(filename => parse(filename).name);
  return locales;
}

/**
 * Add rows to a Google Spreadsheet that are not already added.
 */
async function addUniqueRowsToGoogleSheet(sheet, rowsToAdd) {
  const alreadyAddedRows = await sheet.getRows({ limit: 1000 });
  const uniqueRowsToAdd = rowsToAdd.filter(
    rowToAdd =>
      !alreadyAddedRows.find(
        alreadyAddedRow => alreadyAddedRow.key === rowToAdd.key
      )
  );
  await sheet.addRows(uniqueRowsToAdd);
  return uniqueRowsToAdd;
}

/**
 * Retrieve all the sheets in a Google Spreadsheet document.
 */
async function retrieveSheetsInDocument(doc) {
  const sheets = [];
  for (let sheetIndex = 0; sheetIndex < doc.sheetCount; sheetIndex += 1) {
    const sheet = await doc.sheetsByIndex[sheetIndex];
    sheets.push(sheet);
  }
  return sheets;
}

/**
 * Step 1: Find all the (english) translation keys across all branches and PRs.
 *
 * If a Dutch developer has made a feature in a branch, we expect that they added a key
 * to one or many locale files (usually the english locale (en.json) or their own locale
 * file (nl.json), or both).
 *
 * But they probably haven't added translations to all the other locales, because they don't
 * speak the other languages. And this is the problem, because now we need to find people to
 * help translate the added keys to all the other locales.
 *
 * So what we do here is simply to find *all* the english translation keys across the entire
 * project. That means looking at all the english translation keys in all the locale files
 * since the start of the project across all branches and PRs. We throw them into a set that
 * we use in the next step.
 */
const allLocales = retrieveAllLocales('app/locales/');
const allEnglishTranslationKeys = new Set([]);
for (const locale of allLocales) {
  const filePath = `app/locales/${locale}.json`;
  for (const commitHash of findCommitHashesForFile(filePath)) {
    const translation = retrieveJSONForFileAtCommitHash(filePath, commitHash);
    for (const translationKey of Object.keys(translation)) {
      allEnglishTranslationKeys.add(normalizeTranslationKey(translationKey));
    }
  }
}

/**
 * Step 2: For each locale file check if there are any missing english translation keys
 * across all branches and PRs and all commits/changes.
 *
 * If there's a missing english translation key, we know that we're most likely missing
 * a translation for that locale. So what we need to do is to translate it, or get help
 * to translate it.
 *
 * We throw it into a dictionary where the key is the locale and the value is a set of
 * missing translations from english to that locale.
 */
const translationKeysByLocale = {};
for (const locale of allLocales) {
  const filePath = `app/locales/${locale}.json`;
  for (const commitHash of findCommitHashesForFile(filePath)) {
    const translation = retrieveJSONForFileAtCommitHash(filePath, commitHash);
    const translationKeys = Object.keys(translation).map(key =>
      normalizeTranslationKey(key)
    );
    if (translationKeysByLocale[locale] === undefined) {
      translationKeysByLocale[locale] = new Set([]);
    }
    translationKeysByLocale[locale] = new Set([
      ...translationKeysByLocale[locale],
      ...translationKeys
    ]);
  }
}

/**
 * Step 3: Add rows of missing translations to Google Spreadsheet.
 *
 * https://docs.google.com/spreadsheets/d/1ILFfc1DX4ujMnLnf9UqhwQGM9Ke3s1cAWciy8VqMHZw
 */
(async () => {
  const doc = new GoogleSpreadsheet(
    '1ILFfc1DX4ujMnLnf9UqhwQGM9Ke3s1cAWciy8VqMHZw'
  );
  await doc.useServiceAccountAuth(
    require('./coronastatus-translation-486cef09736e-credentials.json')
  );
  await doc.loadInfo();
  const sheets = await retrieveSheetsInDocument(doc);

  for (const locale of allLocales) {
    // Create sheet for this locale if it doesn't already exist.
    let matchingSheet = sheets.find(sheet => sheet.title === locale);
[Review comment] Nice, much better 👍
    if (!matchingSheet) {
      matchingSheet = await doc.addSheet({
        title: locale,
        headerValues: ['key', 'translation']
      });
    }

    // Find rows that don't already exist by looking at the key column.
    const rows = [];
    for (const englishTranslationKey of allEnglishTranslationKeys) {
      if (!translationKeysByLocale[locale].has(englishTranslationKey)) {
        const row = { key: englishTranslationKey, translation: '' };
        rows.push(row);
      }
    }

    // Add and print out how many rows we added.
    const addedRows = await addUniqueRowsToGoogleSheet(matchingSheet, rows);
    console.log(
      `Added ${addedRows.length} of ${rows.length} missing translations to the ${locale} sheet.`
    );

    // Avoid getting rate limited by Google's API (max writes per 100 seconds).
    console.log('Waiting before processing the next sheet.');
    await new Promise(resolve => setTimeout(resolve, 20 * 1000));
  }
})();
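
Stripped of git and the Sheets API, steps 1 to 3 above boil down to a set difference per locale. The following is a minimal sketch of that core only, assuming the locale snapshots have already been parsed into plain objects. `collectKeys`, `findMissingRows`, and the sample data are hypothetical and not part of this PR.

```javascript
// Equivalent to normalizeTranslationKey in the diff: collapse whitespace runs and trim.
function normalizeTranslationKey(translationKey) {
  return translationKey.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: union of normalized keys over every snapshot of one locale file.
function collectKeys(snapshots) {
  const keys = new Set();
  for (const snapshot of snapshots) {
    for (const key of Object.keys(snapshot)) {
      keys.add(normalizeTranslationKey(key));
    }
  }
  return keys;
}

// Hypothetical helper: one row per key in allKeys that the locale has never contained.
function findMissingRows(allKeys, localeKeys) {
  return [...allKeys]
    .filter(key => !localeKeys.has(key))
    .map(key => ({ key, translation: '' }));
}

// Made-up sample data standing in for the parsed output of `git show <hash>:<path>`.
const snapshotsByLocale = {
  en: [{ Hello: 'Hello' }, { Hello: 'Hello', 'Good  morning': 'Good morning' }],
  no: [{ Hello: 'Hei' }]
};

// Step 2 equivalent: every key each locale has ever contained.
const keysByLocale = {};
for (const [locale, snapshots] of Object.entries(snapshotsByLocale)) {
  keysByLocale[locale] = collectKeys(snapshots);
}

// Step 1 equivalent: every key seen in any locale.
const allKeys = new Set(Object.values(keysByLocale).flatMap(keys => [...keys]));

// Step 3 equivalent: the rows that would be offered to the "no" sheet.
const missingForNo = findMissingRows(allKeys, keysByLocale.no);
console.log(missingForNo);
```

Keeping this part free of I/O would make it possible to unit test the diffing separately from the git calls and the spreadsheet upload.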
This sounds unnecessary to me, as it would also include WIPs and stale branches. But there might be some cases where this makes sense that I have not thought about?
Yeah, that's the part that sucks, but having it would make sure that all translations are added and (hopefully) translated by the time we want to merge it.
Aha, I understand. So a possible solution would be to run this as a cron job or something, so that the sheet is updated as soon as there are new texts in the repo?
Yeah, so in the case with Stano's PR, those translations are now added to the sheet (even though his PR is WIP). Not sure if we want that or not, but we could try it out?
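
One way to try out both behaviours discussed in this thread would be to make the ref scope a parameter, so the script can scan either every ref (as the PR does with `--all`) or a single branch. This is only a sketch under that assumption: `buildLogArgs` and the optional `ref` argument are hypothetical and not something the PR proposes.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: build `git log` arguments for one ref, or for all refs by default.
function buildLogArgs(filePath, ref = '--all') {
  return ['log', '--pretty=format:%h', ref, '--', filePath];
}

// Variant of findCommitHashesForFile that takes an optional ref such as 'master'.
function findCommitHashesForFile(filePath, ref) {
  const { stdout } = spawnSync('git', buildLogArgs(filePath, ref), {
    encoding: 'utf-8'
  });
  // stdout is null if git could not be spawned; empty output yields no hashes.
  return (stdout || '').split('\n').filter(Boolean);
}
```

Calling `findCommitHashesForFile('app/locales/en.json', 'master')` would then skip WIP and stale branches, while omitting the second argument keeps the current behaviour.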