
Rebase PR 1943 of git/git-scm.com #1


Open: wants to merge 22 commits into master


@dscho (Contributor) commented Mar 10, 2025

Since I offered to help with rebasing git/git-scm.com#1943, I figured I should give it a quick try, to see how hard it would be.

Narrator's voice: It was hard. Very.

When I saw just how involved it would be, I didn't want to just watch from the peanut gallery; I wanted to offer my assistance. And while I have literally no knowledge whatsoever of Farsi, I know other languages, such as JavaScript. So I wrote a Node.js script to help rebase the patches.

This is the script I used.
```javascript
const fs = require('fs')
const child_process = require('child_process')

// Runs `git` with the given arguments; returns the trimmed stdout, throws on failure
const git = (...args) => {
  const result = child_process.spawnSync('git', args)
  if (result.error) throw result.error
  if (result.status !== 0) {
    const quotedArgs = args.map(
      arg => arg.match(/[ ']/)
        ? `'${arg.replace(/'/g, "'\\''")}'`
        : arg
    )
    throw new Error(`\`git ${quotedArgs.join(' ')}\` failed(${result.status}): ${result.stderr}`)
  }
  return result.stdout.toString('utf8').trim()
}

// Finds the single tracked file that contains `needle` verbatim
// (`-e` keeps a needle that starts with `-` from being parsed as an option)
const guessFile = (needle) => {
  const file = git('grep', '-F', '-l', '-e', needle)
  if (file.includes('\n')) throw new Error(`Looking for ${needle} turned up multiple files:\n${file}`)
  return file
}

// Maps one HTML-level translation patch onto the AsciiDoc sources and commits the result
const rebasePatch = async (patch) => {
  fs.writeFileSync('a1.patch', patch) // keep a copy of the current patch for debugging
  const lines = patch.split('\n').filter(line => line !== '\\ No newline at end of file')
  if (!lines[0].match(/^From [0-9a-f]{40}/)) throw new Error(`Not a Git patch: starts with: '${lines[0]}'`)

  // Parse the mail header (author, date, subject) for the commit metadata
  let author
  let date
  let message = []
  let i
  for (i = 1; i < lines.length && !lines[i].startsWith('diff'); i++) {
    if (lines[i].startsWith('From: ')) author = lines[i].slice(6)
    else if (lines[i].startsWith('Date: ')) date = lines[i].slice(6)
    else if (lines[i].startsWith('Subject: ')) {
      let subject = lines[i].slice(9).replace(/^\[PATCH\] */, '')
      while (i + 1 < lines.length && lines[i + 1].startsWith(' ')) subject += lines[++i]
      message.push(subject)
    } else if (lines[i] === '') {
      message.push('')
      break
    } else console.error(`warning: unrecognized header line '${lines[i]}'`)
  }

  // The message body runs until the `---` separator; skip the diffstat after it
  console.error(`Parsing ${message[0]}`)
  while (i < lines.length && lines[i] !== '---' && !lines[i].startsWith('diff')) message.push(lines[i++])
  while (i < lines.length && !lines[i].startsWith('diff')) i++

  const targetFiles = []
  let targetFile
  let targetContent

  // Walk the hunks: each run of `-` lines followed by `+` lines is a pre-/post-image pair
  while (i < lines.length) {
    if (i < lines.length && lines[i].startsWith('diff ')) {
      i++
      // skip ---/+++ lines
      while (i < lines.length && !lines[i].startsWith('@')) i++
    }
    // skip context lines, but stop at the next file's `diff` header
    while (i < lines.length && !lines[i].match(/^[-+]/) && !lines[i].startsWith('diff ')) i++

    const minus = []
    const plus = []

    while (i < lines.length && lines[i].startsWith('-')) minus.push(lines[i++].slice(1))
    while (i < lines.length && lines[i].startsWith('+')) plus.push(lines[i++].slice(1))

    // Back-convert HTML to AsciiDoc where possible, then split at the remaining HTML tags
    const joinLinesAndSplitAtTags =
      array => array
        .join('\n')
        .replace(/<code>([^<]*)<\/code>/g, "`$1`")
        .replace(/<pre class="highlight"><code class="language-([^"]*)"[ \n]+data-lang="\1">/g, '[source,$1]\n----\n')
        .replace(/<\/code><\/pre>/g, '\n----')
        .replace(/<a[ \n]+href="{{< relurl " *book[^"]*\/([^"#]*)"[ \n]*>}}">[^<]*<\/a>/g, "<<$1#$1>>")
        .replace(/(\[remote rejected\] master -)&gt;/g, "$1>")
        .replace(/<a[\n ]+href/g, '<a href')
        .replace(/<img src="{{< relurl " *book[^"]*\/(images\/[^"]*)" >}}" alt="([^>]*)">[ \n]*<\/div>[ \n]*<div class="title">[^.<]*\. ([^<]*)/g, '.$3\nimage::$1[$2]')
        .replace(/ &gt; (LAST_COMMIT)/, ' > $1')
        .replace(/<em>([^<]*)<\/em>/g, '_$1_')
        .replace(/\u2009?\u2014\u2009?/g, ' -- ') // em dash (Asciidoctor's rendering of ` -- `)
        .replace('"git reset HEAD &lt;file&gt;..."', '"git reset HEAD <file>..."')
        .replace(/(mergetool\.)&lt;(tool)&gt;\./g, '$1<$2>.')
        .replace(/(--tool=)&lt;(tool)&gt;/, '$1<$2>')
        .replace(/<div id="nav"><a href="{{< previous-section >}}">[^>]*<\/a> \| <a href="{{< next-section >}}">[^>]*<\/a><\/div>/g, '')
        .replace(/(<a href="[^">]*")\n *(class="bare")/g, '$1 $2')
        .replace(/(Author: [^>\n]+ )&lt;([^&]*)&gt;/g, '$1<$2>')
        .split(/(\s*<(?!schacon)[^>]*(?!<>)>(?!>)\s*)/)
    const en = joinLinesAndSplitAtTags(minus)
    const fa = joinLinesAndSplitAtTags(plus)

    const sanitize = (line) => line
      .trim()
      // .replace(/({{< relurl ") *([^"]*")\s*(>)+/, '$1$2$3')
      .replace(/\s+(data-lang=")/, ' $1')
      .replace(/\u2019/g, "'") // typographic apostrophe
      .replace(/\u201c/g, '``') // opening double quote
      .replace(/\u201d/g, "''") // closing double quote

    // Text segments sit at even indices, HTML tags at odd ones
    let a = 0
    let b = 0
    while (a < en.length && b < fa.length) {
      let enLine = sanitize(en[a])
      let faLine = sanitize(fa[b])

      if (enLine === faLine) {
        a++
        b++
        continue
      }
      // Skip the `<div dir="rtl">` wrappers that only exist in the Farsi version
      if (faLine.match(/^\s*<div dir="rtl">\s*$/) && b + 1 < fa.length && sanitize(fa[b + 1]) === '') {
        b += 2
        continue
      }
      if ((a % 2) === 0 && (b % 2) === 0) {
        /* if (
          b + 3 < fa.length
          && sanitize(fa[b + 1]) === '<code>'
          && (a + 1 >= en.length || sanitize(en[a + 1]) !== '<code>')
          && sanitize(fa[b + 3]) === '</code>'
        ) {
          faLine = sanitize(fa.slice(b, b + 5).join(''))
          b += 4
        } */
        if (enLine !== '' || faLine !== '') {
          // translate: locate the English text in the AsciiDoc sources, replace it with the Farsi text
          const needle = enLine.replace(/^\[source,console\]\n----\n/, '').replace(/\n[^]*/, '')
          if (targetFile === undefined) {
            if (enLine === 'Summary' && minus.join('\n').includes('covered most of the major ways')) {
              targetFile = 'ch08-customizing-git.asc'
            } else targetFile = guessFile(needle)
            targetContent = fs.readFileSync(targetFile, 'utf8')
          }
          let found = targetContent.indexOf(enLine)
          if (found < 0) {
            // Known mismatches between the back-converted HTML and the AsciiDoc sources
            for (const e of [{
              pattern: '_an_example_git_enforced_policy#_an_example_git_enforced_policy',
              replacement: 'ch08-customizing-git#_an_example_git_enforced_policy',
            }, {
              pattern: 'filters_a#filters_a',
              regex: /(filters_[ab])#\1/g,
              replacement: '$1',
            }, {
              pattern: '_signing#_signing',
              replacement: 'ch07-git-tools#_signing',
            }, {
              pattern: '_ignoring#_ignoring',
              replacement: 'ch02-git-basics-chapter#_ignoring',
            }, {
              pattern: '"&lt;input&gt;"="&lt;output&gt;"',
              regex: /&lt;((in|out)put)&gt;/g,
              replacement: '<$1>',
            }, {
              pattern: '_p4_git_fusion#_p4_git_fusion',
              replacement: '_p4_git_fusion',
            }, {
              pattern: '_git_p4_branches#_git_p4_branches',
              replacement: '_git_p4_branches',
            }, {
              pattern: '\nYou can use `git filter-branch` to remove',
              replacement: '(((git commands, filter-branch)))\nYou can use `git filter-branch` to remove',
            }, {
              pattern: 'the _User_ column (the 2nd one)',
              replacement: "the 'User' column (the 2nd one)",
            }, {
              pattern: '-unified=&lt;n&gt;',
              regex: /(-u(nified=)?)&lt;n&gt;/g,
              replacement: '$1<n>',
            }, {
              pattern: 'It is invoked like `$GIT_SSH',
              regex: /&lt;([^&]*)&gt;/g,
              replacement: '<$1>',
            }, {
              pattern: '_revision_selection#_revision_selection',
              replacement: 'ch07-git-tools#_revision_selection',
            }, {
              pattern: '_credential_caching#_credential_caching',
              replacement: 'ch07-git-tools#_credential_caching',
            }]) {
              if (!enLine.includes(e.pattern)) continue
              const candidate = enLine.replace(e.regex || e.pattern, e.replacement)
              found = targetContent.indexOf(candidate)
              if (found >= 0) {
                enLine = candidate
                break
              }
            }
          }
          if (found < 0) {
            // The text may live in a different file: save the current one and switch
            console.error(`Could not find ${enLine} in ${targetFile}; looking harder`)
            fs.writeFileSync(targetFile, targetContent)
            targetFiles.push(targetFile)
            if (enLine === 'Subversion' && en[a - 1] === '<h3 id="_subversion">') {
              targetFile = 'book/09-git-and-other-scms/sections/import-svn.asc'
            } else if (enLine === 'Mercurial' && en[a - 1] === '<h3 id="_mercurial">') {
              targetFile = 'book/09-git-and-other-scms/sections/client-hg.asc'
            } else if (enLine === 'Bazaar' && en[a - 1] === '<h3 id="_bazaar">') {
              targetFile = 'book/09-git-and-other-scms/sections/import-bzr.asc'
            } else if (enLine === 'Perforce' && en[a - 1] === '<h3 id="_perforce_import">') {
              targetFile = 'book/09-git-and-other-scms/sections/import-p4.asc'
            } else targetFile = guessFile(needle)
            targetContent = fs.readFileSync(targetFile, 'utf8')
            found = targetContent.indexOf(enLine)
          }
          if (found < 0) throw new Error(`Could not find '${enLine}'`)
          targetContent = `${targetContent.slice(0, found)}${faLine}${targetContent.slice(found + enLine.length)}`
        }

        a++
        b++
        continue
      }
      throw new Error(`Stopped at a: ${a}, b: ${b}\n'${en.slice(a, a + 10).join('')}'\nvs\n'${fa.slice(b, b + 10).join('')}'`)
    }
  }

  if (!targetFile) throw new Error(`Could not find any edits in ${patch}`)
  fs.writeFileSync(targetFile, targetContent)
  targetFiles.push(targetFile)

  git('commit', '-m', message.join('\n'), `--author=${author}`, `--date=${date}`, '--', ...targetFiles)
  console.log(`Committed ${message[0]}`)
}

// Fetch the commits as patches, oldest first, and rebase them one by one
// (relies on the global `fetch()` available since Node.js 18)
(async () => {
  for (let i = 7; i >= 0; i--) {
    const response = await fetch(`https://github.com/git/git-scm.com/commit/dfd9553ba76c1b11aa978ef99a7dfc944bfb36c7~${i}.patch`)
    if (!response.ok) throw new Error(`Fetching patch ~${i} failed: ${response.status}`)
    await rebasePatch(await response.text())
  }
})().catch(e => {
  console.error(e)
  process.exitCode = 1
})
```

True to form, as a one-time hack, it lacks pretty much all of the documentation.

As one might guess, I started out with something straightforward: parse the diffs, ignore the HTML tags, and let the script figure out automatically which text snippets should be replaced with which other text snippets.
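A minimal sketch of that starting point, assuming the English and Farsi versions of a hunk contain exactly the same HTML tags in the same order. `split()` with a capturing group keeps the tags in the result array, so text ends up at even indices and tags at odd ones; `splitAtTags` and `pairTextSegments` are hypothetical names, and the tag regex is far simpler than the one in the script.

```javascript
// Hypothetical helper: split HTML into alternating text/tag segments.
// The capturing group keeps the tags, so text is at even indices, tags at odd ones.
function splitAtTags(html) {
  return html.split(/(<[^>]*>)/)
}

// Pairs up the text between tags that differs between English and its translation
function pairTextSegments(enHtml, faHtml) {
  const en = splitAtTags(enHtml)
  const fa = splitAtTags(faHtml)
  if (en.length !== fa.length) throw new Error('tag structure differs')
  const pairs = []
  for (let i = 0; i < en.length; i++) {
    if (i % 2 === 1) {
      // Odd index: a tag, which must be identical in both versions
      if (en[i] !== fa[i]) throw new Error(`tag mismatch: ${en[i]} vs ${fa[i]}`)
    } else if (en[i] !== fa[i]) {
      // Even index: text that was translated
      pairs.push([en[i], fa[i]])
    }
  }
  return pairs
}

console.log(JSON.stringify(pairTextSegments('<p>Hello <em>world</em></p>', '<p>Salam <em>donya</em></p>')))
// [["Hello ","Salam "],["world","donya"]]
```

Each resulting pair is an English snippet to search for in the AsciiDoc sources and the text to replace it with.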

However, some of the HTML, even in the text between the HTML tags, needed to be "back-converted" to AsciiDoc, so I added that.
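Such a back-conversion can be expressed as an ordered list of regex rewrites. The sketch below is illustrative only: `BACK_CONVERSIONS` and `backConvert` are hypothetical names, and the three rules (inline code, emphasis, typographic apostrophe) are a small subset of what the script needs.

```javascript
// Illustrative subset of HTML -> AsciiDoc rewrite rules, applied in order
const BACK_CONVERSIONS = [
  [/<code>([^<]*)<\/code>/g, '`$1`'], // inline code -> backticks
  [/<em>([^<]*)<\/em>/g, '_$1_'], // emphasis -> underscores
  [/\u2019/g, "'"], // typographic apostrophe -> plain apostrophe
]

// Applies every rule in turn to the HTML snippet
function backConvert(html) {
  return BACK_CONVERSIONS.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    html
  )
}

console.log(backConvert('Run <code>git status</code> to see what\u2019s <em>staged</em>.'))
// Run `git status` to see what's _staged_.
```

Keeping the rules in a table makes it easy to append yet another special case, which is how the list in the script grew.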

From there, I worked my way through the exceptions to that rule, and there were tons.
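Most of those exceptions follow one pattern: when the back-converted English text cannot be found verbatim in the AsciiDoc file, try a list of known rewrites and accept the first candidate that does occur. A minimal sketch of that lookup; `findWithFallbacks` is a hypothetical helper, and the sample rule mirrors the `_ignoring` entry in the script.

```javascript
// Hypothetical helper: locate `text` in `content`, falling back to known
// rewrites ({ pattern, regex?, replacement }) when the verbatim text is absent
function findWithFallbacks(content, text, rewrites) {
  const index = content.indexOf(text)
  if (index >= 0) return { index, text }
  for (const { pattern, regex, replacement } of rewrites) {
    if (!text.includes(pattern)) continue
    const candidate = text.replace(regex || pattern, replacement)
    const found = content.indexOf(candidate)
    if (found >= 0) return { index: found, text: candidate }
  }
  return { index: -1, text }
}

const asciidoc = 'See <<ch02-git-basics-chapter#_ignoring>> for details.'
const result = findWithFallbacks(asciidoc, '<<_ignoring#_ignoring>>', [
  { pattern: '_ignoring#_ignoring', replacement: 'ch02-git-basics-chapter#_ignoring' },
])
console.log(JSON.stringify(result))
// {"index":4,"text":"<<ch02-git-basics-chapter#_ignoring>>"}
```

Returning the rewritten text matters: the replacement must cover exactly the characters that were found, not the original needle.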

At a high level, the loop at the end of the script fetches the commits as patches and calls `rebasePatch()` on each. That function first parses the header (to learn the metadata that will later be used to create the commit), then parses the diff to obtain minimal pre- and post-images, transforms those to look a lot more like AsciiDoc than HTML, splits them at the HTML tags, and iterates over the parts between the tags (verifying that the tags themselves are identical in English and Farsi). For each part between the tags that differs between English and Farsi, the script uses `git grep -F` to figure out which file needs to be edited, finds the location of the English text (the "pre-image") in that file, and replaces it with the Farsi text. This part contains quite a few hacks stemming from my reluctance to replace `&lt;`/`&gt;` wholesale, and quite a few more because the `{{< relurl ... >}}` links no longer necessarily carry all the information needed to recreate the AsciiDoc `<<...>>` references.
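The diff-parsing step can be sketched in isolation. This hypothetical `extractImages` (not the script's exact logic) walks a unified diff, skips file headers and context lines, and collects each run of removed lines together with the added lines that follow it, i.e. the minimal pre- and post-images that then get back-converted and compared.

```javascript
// Hypothetical helper: collect minimal pre-/post-images from a unified diff.
// Each entry pairs a run of removed lines with the added lines right after it.
function extractImages(diff) {
  const images = []
  let inHunk = false
  let current = null
  for (const line of diff.split('\n')) {
    if (line.startsWith('diff ')) { inHunk = false; current = null; continue }
    if (line.startsWith('@@')) { inHunk = true; current = null; continue }
    if (!inHunk) continue // file headers: index, ---, +++
    if (line.startsWith('-')) {
      // A removed line after added lines starts a new change
      if (!current || current.plus.length > 0) images.push(current = { minus: [], plus: [] })
      current.minus.push(line.slice(1))
    } else if (line.startsWith('+')) {
      if (!current) images.push(current = { minus: [], plus: [] })
      current.plus.push(line.slice(1))
    } else {
      current = null // a context line ends the current change
    }
  }
  return images
}

const diff = [
  'diff --git a/intro.html b/intro.html',
  '--- a/intro.html',
  '+++ b/intro.html',
  '@@ -1,3 +1,3 @@',
  ' <h2>',
  '-Getting Started',
  '+Shoru be kar',
  ' </h2>',
].join('\n')
console.log(JSON.stringify(extractImages(diff)))
// [{"minus":["Getting Started"],"plus":["Shoru be kar"]}]
```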

The last commit of git/git-scm.com#1943, git/git-scm.com@dfd9553, cannot even be applied, because the AsciiDoc references have neither the link text nor the full links.

Now, @YasinDehfuli, I hope that this PR is useful in some shape or form and does not cause more work than it took to craft.

@YasinDehfuli (Collaborator)

Of course, dear @dscho.

This was a very professional and interesting move, and I truly appreciate it.

On initial review, your translation looked quite good. However, the Persian translation had some structural issues that needed correction. I'll edit them to ensure a flawless and accurate translation.

The Iranian open-source community will be grateful to you.

@dscho (Contributor, Author) commented Mar 11, 2025

> the Persian translation had some structural issues that needed correction. I'll edit them to ensure a flawless and accurate translation.

As long as I did not cause more work for you, I'm happy!

@dscho (Contributor, Author) left a comment

I cannot claim to understand Farsi, and I lack permissions to approve the PR, but I'd say: ship it!

@YasinDehfuli (Collaborator)

Dear friends, the revisions and translation of this pull request have been completed.

@jnavila, please check it and let me know whether there are any structural issues and whether the code works correctly, so that we can merge the pull request. Or you can merge it yourself.

@dscho, you can correct the untranslated sections, or those with translation issues, the same way we did here. Let's resolve the issues and apply the changes.

I am available, and you can tag me for translations of any pull requests.

@YasinDehfuli (Collaborator) commented Apr 19, 2025

Dear @dscho,
I’ve noticed an issue in the Persian translation that can be easily fixed, and resolving it could significantly improve the fluency and quality of the Persian text.

As we know, Persian, like Arabic, is a right-to-left (RTL) language, but the layout on the GitHub site is currently left-to-right (LTR).

This problem can be solved quite easily, and we can also handle the RTL formatting in our future translations.

All it takes is changing the direction of `#content` from this:

[screenshot]

to `dir="rtl"`:

[screenshot]

Alternatively, we can handle it in our files.
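Handling it in the files could, for example, mean setting the direction at build time based on the page language. The sketch below assumes the translated page is available as an HTML string whose main element carries `id="content"`; `RTL_LANGUAGES`, `textDirection`, and `setContentDirection` are hypothetical, and a real fix would more likely live in the site's templates or CSS.

```javascript
// Hypothetical, illustrative subset of languages written right to left
const RTL_LANGUAGES = new Set(['ar', 'fa', 'he', 'ur'])

// Returns the `dir` attribute value for a language code such as "fa" or "fa-IR"
function textDirection(lang) {
  return RTL_LANGUAGES.has(lang.toLowerCase().split('-')[0]) ? 'rtl' : 'ltr'
}

// Sets (or overwrites) dir="..." on the opening tag that has id="content".
// A regex is enough for this sketch; it is not a general HTML parser.
function setContentDirection(html, lang) {
  const dir = textDirection(lang)
  return html.replace(/<(\w+)([^>]*\bid="content"[^>]*)>/, (_match, tag, attrs) =>
    `<${tag}${attrs.replace(/\s+dir="[^"]*"/, '')} dir="${dir}">`)
}

console.log(setContentDirection('<div id="content"><p>...</p></div>', 'fa'))
// <div id="content" dir="rtl"><p>...</p></div>
```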
