
Help troubleshooting gettext string re-insertion script

  • I've got this script and it's not working properly.

    It is meant to take translated strings in a '.po' file and re-insert them into the source code. After that, a new '.pot' file would have to be regenerated from the code (the script doesn't do that).

    I want to make this script run successfully against the Zikula core, so that I can re-insert all the reviewed strings (from the en_GB language on the translation portal) into the Zikula core, before the 1.2 release. After that, it can be used by the international communities and by 3rd-party devs to produce localised versions of Zikula and 3rd-party modules.

    Problems:

    1) First of all, it is failing to escape single and double quotation marks. This is causing syntax errors. For instance, when I try to run 'install.php' to install a site with the processed Zikula core, I'm getting:

    Code

    Parse error: syntax error, unexpected T_STRING in /var/www/zk120rc4/includes/pnAPI.php on line 1495


    The gettext string in the file is:

    Code

    'Error! Oh! Wow! An 'unidentified system error' has occurred.'


    It is not escaped (backslashed), so it's a syntax error.

    2) After the script has run, I'm finding that not all the strings have been replaced. For example, after processing a copy of the core, the file 'includes/LogUtil.class.php' at line 438 contains the string:

    Code

    "Unknown log destination [%s] ... "


    But the reviewed string that *should* have replaced it is:

    Code

    Error! The log file destination is unrecognised: '%s'.


    The script is not processing all strings properly.

    3) This is not a bug, but it's something it should do: the script does not handle plural forms. To be a properly working tool for the community, it should be able to process plural-form gettext strings.
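
    For point 3, a minimal sketch of how plural entries might be collected. It assumes each entry follows the standard '.po' layout ('msgid', 'msgid_plural', then 'msgstr[0]', 'msgstr[1]', ...) with every string on a single line. The function name 'parsePluralEntries' and the sample strings are made up for illustration, and how the plural call is written in the source code is left open here.

```php
<?php
// Hypothetical helper: collect plural entries from the lines of a '.po' file.
// Assumes single-line strings; continuation lines are not handled here.
function parsePluralEntries(array $lines)
{
    $entries = array();
    $current = null;

    foreach ($lines as $line) {
        $line = trim($line);

        if (preg_match('/^msgid\s+"(.*)"$/', $line, $m)) {
            // A new entry starts; remember its singular ID.
            $current = array('msgid' => $m[1], 'msgid_plural' => null, 'msgstr' => array());
        } elseif ($current !== null && preg_match('/^msgid_plural\s+"(.*)"$/', $line, $m)) {
            $current['msgid_plural'] = $m[1];
        } elseif ($current !== null && preg_match('/^msgstr\[(\d+)\]\s+"(.*)"$/', $line, $m)) {
            // Store each plural case under its numeric index.
            $current['msgstr'][(int) $m[1]] = $m[2];
        } elseif ($line === '' && $current !== null) {
            // A blank line ends the entry; keep it only if it is a plural entry.
            if ($current['msgid_plural'] !== null) {
                $entries[] = $current;
            }
            $current = null;
        }
    }

    // Flush the last entry if the file does not end with a blank line.
    if ($current !== null && $current['msgid_plural'] !== null) {
        $entries[] = $current;
    }

    return $entries;
}

// Made-up sample entry, for illustration only.
$sample = array(
    'msgid "%s item"',
    'msgid_plural "%s items"',
    'msgstr[0] "%s article"',
    'msgstr[1] "%s articles"',
    '',
);
$entries = parsePluralEntries($sample);
print_r($entries);
```

    Each returned entry keeps the singular ID, the plural ID and the indexed translations together, so a later replacement step could treat the pair of IDs as one unit.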

    Any help troubleshooting these problems would be gratefully accepted.

    Please reply here or contact me by mail (I might be willing to pay for the solution if you can make it work *perfectly*).

    Thanks for any help or suggestions.

    Here's the full code of the script.

    Code

    /**
     * This is a little script to take a Zikula '.po' file
     * and insert the language strings back into the
     * source code files, so that the translated strings
     * in the '.po' file become the gettext ID strings
     * within the code files.
     * It's a tool for language reviews and obtaining
     * Zikula perfectly localised in the language of your
     * Choice.
     *
     * You should run this tool on a local system. It uses
     * too much memory to run it in a remote Web space.
     * Put this script and a '.po' file in the root directory
     * of a Zikula distribution.
     * Then run this command from the command line:
     *
     * php replacepo.php
     *
     * It will examine every file in the tree structure of the
     * distribution and perform the string replacement operations
     * on each file. You'll need plenty of memory allocated
     * for scripts (set to 128M below). Remember that you'll
     * need write permissions for all files (on Linux, you
     * might need to be logged in as root or, better still,
     * prefix the above command with 'sudo' to simply run
     * the script as root user).
     *
     * If you want to process a '.po' file other than 'zikula.po'
     * then just edit the file name in the 'file_get_contents' call below.
     *
     * Limitations: The script does not handle plural strings.
     * You'll still have to do those by hand. This is a feature
     * that should be added.
     *
     * Contact: David, commerce@traduction.biz
     * License: GNU/GPL - http://www.gnu.org/copyleft/gpl.html
     */


    // Set the script's memory requirement.
    ini_set('memory_limit', '128M');

            // Load the file we're going to process.
            $poFile  = file_get_contents ('zikula.po');
            // Break the file down into an array of individual lines.
            $poLines = explode ("\n", $poFile);

            // Declare an array to hold the list of file names that contain a given language string.
        $files   = array();
            // Declare an array of strings for the message IDs.
        $msgIDs  = array();
            // Declare an array of strings for the message strings (the actual language strings).
        $msgStrs = array();
            // Counter of 'msgstr' lines seen so far, used to skip the first one (the '.po' header entry).
        $count   = 0;

            // Start a loop of examining each string containing one line in the input '.po' file.
        foreach ($poLines as $poLine) {
            $tMsgID = '';
                // Trim any white space off the beginning and end of the string.
            $poLine = trim($poLine);

                // If the line is one of the comment lines preceding each language string 'msgid' then
                // that line gets broken into two fields: the comment tag and a file name.
                // Each file name is a file that contains that particular 'msgid'.
            if (strpos ($poLine, '#:') === 0) {
                $fields = explode (' ', $poLine);

                    // Just drop the comment tag. We don't need it.
                unset ($fields[0]);

                    // Analyse the string containing the file name and remove the line number.
                    // We just want the file name (including the path).
                    // Then we add the file name to an array of file names that contain the 'msgid'.
                foreach ($fields as $field) {
                    $fields2 = explode (':', $field);
                    $file = $fields2[0];
                    $files[$file] = $file;
            }

                // Else if the line delimits the start of a plural-case string ID or
                // one of the plural cases of the string, then just ignore it.
                // This script does not yet handle plural cases.
            } elseif (strpos ($poLine, 'msgid_plural') === 0 || strpos($poLine, 'msgstr[') === 0) {
                continue;

                // Else if the line begins with an 'msgid' then take the double-quote-bounded
                // string that follows immediately after, break it down into an array of characters.
            } elseif (strpos ($poLine, 'msgid') === 0) {
                $msg = substr (trim($poLine), 6);
                $chars = str_split ($msg);
                unset ($chars[count($chars)-1]);
                unset ($chars[0]);
            $msg = addslashes(implode('', $chars));
            $_msgID1 = '"' . $msg . '"';
            $_msgID2 = "'" . $msg . "'";

            $msgID1 = '__(' . $_msgID1 . ')';
            $msgID2 = '__(' . $_msgID2 . ')';
            $msgID3 = 'gt text=' . $_msgID1;
            $msgID4 = 'gt text=' . $_msgID2;

                // Else if the line begins with an 'msgstr' then it's the string associated with the 'msgid'.
            } elseif (strpos ($poLine, 'msgstr') === 0) {
                if (!$count++) {
                    continue;
            }
                $msg = substr (trim($poLine), 7);
                $chars = str_split ($msg);
                unset ($chars[count($chars)-1]);
                unset ($chars[0]);
            $msg = addslashes(implode('', $chars));
            $_msgString1 = '"' . $msg . '"';
            $_msgString2 = "'" . $msg . "'";

            $msgString1 = '__(' . $_msgString1 . ')';
            $msgString2 = '__(' . $_msgString2 . ')';
            $msgString3 = 'gt text=' . $_msgString1;
            $msgString4 = 'gt text=' . $_msgString2;

                $msgIDs[] = $msgID1;
                $msgStrs[] = $msgString1;
                $msgIDs[] = $msgID2;
                $msgStrs[] = $msgString2;
                $msgIDs[] = $msgID3;
                $msgStrs[] = $msgString3;
                $msgIDs[] = $msgID4;
                $msgStrs[] = $msgString4;
            }
        }

         //foreach ($msgIDs as $k=>$v) {
            //print "$v\n";
            //print "$msgStrs[$k]\n\n";
        //} exit();


        foreach ($files as $file) {
            print "Processing $file ...\n";
            $fData = file_get_contents ($file);
            $newFileData = str_replace ($msgIDs, $msgStrs, $fData);
            file_put_contents ($file, $newFileData);
        }
  • David -- one quick thing: There is no reason on earth to be logged in as root on a Linux system to do this, especially if this should only run on a local system. If someone has downloaded the distribution, they should own the files and be able to write to them.

    Where can I grab the zikula.po file?
  • Hi Chris, :)

    You can get the zikula.po file here:

    http://translate.zik…/zikula/core/1.2.0/
  • You can replace your escaping code:

    Code

    $msg = substr (trim($poLine), 6);
                $chars = str_split ($msg);
                unset ($chars[count($chars)-1]);
                unset ($chars[0]);
            $msg = addslashes(implode('', $chars));


    with one line:

    Code

    // remove 'msgid' word, leading/trailing whitespace, and outer double quotes.
    $msg = addslashes( preg_replace('/^msgid\s*"|\s*"$/','',$poLine) );
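
    One caveat with 'addslashes': it escapes single quotes, double quotes and backslashes alike, while a '.po' value already carries its own C-style escapes (a backslash before each embedded double quote, for instance). A minimal sketch of an alternative, assuming one-line entries: undo the '.po' escapes first, then escape only what a single-quoted PHP literal needs. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: extract the text of a one-line 'msgid "..."' or 'msgstr "..."' entry.
function poValue($poLine)
{
    if (!preg_match('/^(?:msgid|msgstr)\s+"(.*)"$/', trim($poLine), $m)) {
        return null;
    }
    // Undo the C-style escapes used inside '.po' strings (\" \\ \n ...).
    return stripcslashes($m[1]);
}

// Hypothetical helper: turn raw text into a single-quoted PHP string literal.
// Only the backslash and the single quote need escaping in that context.
function phpSingleQuoted($text)
{
    return "'" . str_replace(array('\\', "'"), array('\\\\', "\\'"), $text) . "'";
}

// The string from problem 1, as it might appear in the '.po' file.
$line = 'msgstr "Error! Oh! Wow! An \'unidentified system error\' has occurred."';
echo phpSingleQuoted(poValue($line)), "\n";
```

    The same idea applies to a double-quoted literal, except that the characters to escape there are the backslash, the double quote and the dollar sign.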


    Also, instead of reading the entire file into an array, simply read the file one line at a time! (How do you think batch processing got done on 14k machines?) You'll find it's actually much faster that way, and you don't have to worry about the file size: it will work on whatever file you give it, AND can then work without problems on the server side if needed.
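
    The line-at-a-time approach can be sketched as follows, assuming the '.po' file is called 'zikula.po' as in the original script and that closures are available (PHP 5.3 or later). The function name 'eachLine' and the callback shape are hypothetical.

```php
<?php
// Hypothetical helper: feed a file to a callback one line at a time,
// so that only the current line is held in memory.
function eachLine($path, $callback)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    while (($line = fgets($handle)) !== false) {
        // fgets() keeps the line ending, so strip it before handing the line over.
        call_user_func($callback, rtrim($line, "\r\n"));
    }
    fclose($handle);
    return true;
}

// Usage: count the 'msgid' lines without loading the whole file.
$count = 0;
$counter = function ($line) use (&$count) {
    if (strpos($line, 'msgid ') === 0) {
        $count++;
    }
};
if (file_exists('zikula.po')) {
    eachLine('zikula.po', $counter);
    print "Found $count msgid lines\n";
}
```

    The parsing branches of the original loop could move into the callback more or less unchanged, since they already work on one line at a time.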
  • Your second problem is because you aren't allowing for strings marked with __f() instead of __(), as is the case in line 438 of LogUtil.class.php

    I started a version that uses preg_replace to cover all cases. There is some string after the 'System error' line in zikula.po that results in a bad regex, but otherwise it's working.

    Have to do housework now -- Here is the code so far if someone else wants to look. It works if you truncate zikula.po after the rule for 'System error'.

    Code

    #!/usr/local/bin/php
    <?php
    /**
     * Contact: David, commerce@traduction.biz
     * License: GNU/GPL - http://www.gnu.org/copyleft/gpl.html
     */

    // Set the script's memory requirement and turn on error display.
    ini_set('memory_limit', '128M');
    ini_set('display_errors', 1);

            // Load the file we're going to process.
            $poFile  = file_get_contents ('zikula.po');
            // Break the file down into an array of individual lines.
            $poLines = explode ("\n", $poFile);

            // Declare an array to hold the list of file names that contain a given language string.
        $files   = array();
            // Declare an array of strings for the message IDs.
        $msgIDs  = array();
            // Declare an array of strings for the message strings (the actual language strings).
        $msgStrs = array();
            // Counter of 'msgstr' lines seen so far, used to skip the first one (the '.po' header entry).
        $count   = 0;

            // Start a loop of examining each string containing one line in the input '.po' file.
        foreach ($poLines as $poLine) {
            $tMsgID = '';
                // Trim any white space off the beginning and end of the string.
            $poLine = trim($poLine);

                // If the line is one of the comment lines preceding each language string 'msgid' then
                // that line gets broken into two fields: the comment tag and a file name.
                // Each file name is a file that contains that particular 'msgid'.
            if (strpos ($poLine, '#:') === 0) {
                $fields = explode (' ', $poLine);


                    // Just drop the comment tag. We don't need it.
                unset ($fields[0]);

                    // Analyse the string containing the file name and remove the line number.
                    // We just want the file name (including the path).
                    // Then we add the file name to an array of file names that contain the 'msgid'.
                foreach ($fields as $field) {
                    $fields2 = explode (':', $field);
                    $file = $fields2[0];
                    $files[$file] = $file;
            }

                // Else if the line delimits the start of a plural-case string ID or
                // one of the plural cases of the string, then just ignore it.
                // This script does not yet handle plural cases.
            } elseif (strpos ($poLine, 'msgid_plural') === 0 || strpos($poLine, 'msgstr[') === 0) {
                continue;

                // Else if the line begins with an 'msgid' then take the double-quote-bounded
                // string that follows immediately after, break it down into an array of characters.
            } elseif (strpos ($poLine, 'msgid') === 0) {
                // remove 'msgid' word, leading/trailing whitespace, and outer double quotes.
                $msg = preg_replace('/^msgid\s*"|\s*"$/','',$poLine);

                if ($msg ) {
                    // Escape quotes and slashes (for regex)
                    $msg = preg_replace('/([\'"\/])/', '\\\\$1', $msg);
                    $msgRegex = '/((?:__f?\(|gt text=)[\'"])' . $msg . '([\'"])/';
                }

                // Else if the line begins with an 'msgstr' then it's the string associated with the 'msgid'.
            } elseif (strpos ($poLine, 'msgstr') === 0) {
                if (!$count++) {
                    continue;
            }
                // remove 'msgstr' word, leading/trailing whitespace, and outer double quotes.
                $msg = addslashes( preg_replace('/^msgstr\s*"|\s*"$/','',$poLine) );

                if ( $msg && $msgRegex ) {

                    $msgIDs[] = $msgRegex;
                    $msgStrs[] = '${1}' . $msg . '${2}';
                    $msgRegex = '';
                }
            }
        }

    /*
         foreach ($msgIDs as $k=>$v) {
            print "$v\n";
            print "$msgStrs[$k]\n\n";
        }
    */


        foreach ($files as $file) {
            print "Processing $file ...\n";
            $fData = file_get_contents ($file);
            $newFileData = preg_replace ($msgIDs, $msgStrs, $fData);
            file_put_contents ($file, $newFileData);
        }
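
    A guess at the bad regex: a msgid containing characters that have a special meaning in a pattern (parentheses, square brackets, '?', '+' and so on) would break or change a pattern built by plain concatenation. Below is a minimal sketch that sidesteps this, assuming one-line entries and the three call styles seen so far ('__(', '__f(' and 'gt text='). 'preg_quote' neutralises the msgid, and a callback builds the replacement so that a '$' or a backslash in the translation is not read as a back-reference. The function name 'replaceGettextId' and the sample line of source code are hypothetical; the two strings come from problem 2 above. Closures need PHP 5.3 or later.

```php
<?php
// Hypothetical helper: replace one gettext ID in a chunk of source code.
// $id and $new are raw strings (no surrounding quotes).
function replaceGettextId($code, $id, $new)
{
    // preg_quote() escapes regex metacharacters as well as the '/' delimiter.
    // \2 makes sure the closing quote is the same character as the opening one.
    $pattern = '/((?:__f?\(|gt text=)\s*)([\'"])' . preg_quote($id, '/') . '\2/';

    return preg_replace_callback($pattern, function ($m) use ($new) {
        $quote = $m[2];
        // Escape the backslash plus whatever is special inside this kind of literal.
        $chars = ($quote === '"') ? '\\"$' : "\\'";
        return $m[1] . $quote . addcslashes($new, $chars) . $quote;
    }, $code);
}

// Made-up line of source code, for illustration only.
$code = 'echo __f("Unknown log destination [%s] ... ", $dest);';
echo replaceGettextId(
    $code,
    'Unknown log destination [%s] ... ',
    "Error! The log file destination is unrecognised: '%s'."
), "\n";
```

    This sketch only matches an ID that appears in the source exactly as in the '.po' file; an ID that is itself escaped in the source, or the different escaping rules of template files, would need extra handling.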
  • Script has been fixed ... David has the final result ...

    Greetings
    R
  • @Chris
    Thanks very much for the input. Sorry I didn't respond earlier, but I've been working with Robert on this (thanks, Robert), and have since been reviewing the generated distribution with all the reviewed lang strings incorporated. I need to do a final review of the result and deliver it to Drak within the next 24 hours or so.

    But, in a few days, I plan to revisit this script, because it can be a pretty useful tool for translators, the international communities and 3rd-party devs. Once it's in shape, it could probably provide a framework for another script that harmonises the usage of single quotation marks and double quotation marks throughout the code base, which might perhaps be useful as well. Regular expressions will certainly be part of that.

    I'd like to clean up the points you posted about, so I'll be coming back to this thread and would be grateful for more input (code and pseudo code would be very welcome).

    So, more very soon. Meanwhile, here's the script as it currently stands:

    Code

    /**
     * This is a little script to take a Zikula '.po' file
     * and insert the language strings back into the
     * source code files, so that the translated strings
     * in the '.po' file become the gettext ID strings
     * within the code files.
     * It's a tool for language reviews and obtaining
     * Zikula perfectly localised in the language of your
     * Choice.
     *
     * You should run this tool on a local system. It uses
     * too much memory to run it in a remote Web space.
     * Put this script and a '.po' file in the root directory
     * of a Zikula distribution.
     * Then run this command from the command line:
     *
     * php replacepo.php
     *
     * It will examine every file in the tree structure of the
     * distribution and perform the string replacement operations
     * on each file. You'll need plenty of memory allocated
     * for scripts (set to 128M below). Remember that you'll
     * need write permissions for all files (on Linux, you
     * might need to be logged in as root or, better still,
     * prefix the above command with 'sudo' to simply run
     * the script as root user).
     *
     * If you want to process a '.po' file other than 'zikula.po'
     * then just edit the file name in the 'file_get_contents' call below.
     *
     * Limitations: The script does not handle plural strings.
     * You'll still have to do those by hand. This is a feature
     * that should be added.
     *
     * Contact: David, commerce@traduction.biz
     * License: GNU/GPL - http://www.gnu.org/copyleft/gpl.html
     */


    // Set the script's memory requirement and then include the Zikula base API.
    ini_set('memory_limit', '128M');
    require_once 'includes/DataUtil.class.php';
    require_once 'includes/debug.php';

            // Load the file we're going to process.
            $poFile  = file_get_contents ('zikula.po');
            // Break the file down into an array of individual lines.
            $poLines = explode ("\n", $poFile);

            // Declare an array to hold the list of file names that contain a given language string.
        $files   = array();
            // Declare an array of strings for the message IDs.
        $msgIDs  = array();
            // Declare an array of strings for the message strings (the actual language strings).
        $msgStrs = array();
            // Counter of 'msgstr' lines seen so far, used to skip the first one (the '.po' header entry).
        $count   = 0;

            // Start a loop of examining each string containing one line in the input '.po' file.
        foreach ($poLines as $poLine) {
            $tMsgID = '';
                // Trim any white space off the beginning and end of the string.
            $poLine = trim($poLine);

                // If the line is one of the comment lines preceding each language string 'msgid' then
                // that line gets broken into two fields: the comment tag and a file name.
                // Each file name is a file that contains that particular 'msgid'.
            if (strpos ($poLine, '#:') === 0) {
                $fields = explode (' ', $poLine);

                    // Just drop the comment tag. We don't need it.
                unset ($fields[0]);

                    // Analyse the string containing the file name and remove the line number.
                    // We just want the file name (including the path).
                    // Then we add the file name to an array of file names that contain the 'msgid'.
                foreach ($fields as $field) {
                    $fields2 = explode (':', $field);
                    $file = $fields2[0];
                    $files[$file] = $file;
            }

                // Else if the line delimits the start of a plural-case string ID or
                // one of the plural cases of the string, then just ignore it.
                // This script does not yet handle plural cases.
            } elseif (strpos ($poLine, 'msgid_plural') === 0 || strpos($poLine, 'msgstr[') === 0) {
                continue;

                // Else if the line begins with an 'msgid' then take the double-quote-bounded
                // string that follows immediately after, break it down into an array of characters.
            } elseif (strpos ($poLine, 'msgid') === 0) {

                    // Get the line after the leading "msgid ". This leaves us with a double-quoted string.
                $msg = substr (trim($poLine), 6);
           
            // Split the string into an array of characters.
            $chars = str_split ($msg);

            // Remove the trailing double quotation marks.
            unset ($chars[count($chars)-1]);

            // Remove the leading double quotation marks.
            unset ($chars[0]);

            // Re-assemble the line, without the leading and trailing double quotation marks.
            $msg = implode('', $chars);

            // 1st version of string is double-quoted.
            $_msgID1 = '"' . $msg . '"';

            // 2nd version of string is single-quoted.
            $_msgID2 = "'" . $msg . "'";

            // Now build all the versions we might encounter in the code.
            $msgID1 = '__(' . $_msgID1 . ')';
            $msgID2 = '__(' . $_msgID2 . ')';
            $msgID3 = '__f(' . $_msgID1;
            $msgID4 = '__f(' . $_msgID2;
            $msgID5 = 'gt text=' . $_msgID1;
            $msgID6 = 'gt text=' . $_msgID2;

                // Else if the line begins with an 'msgstr' then it's the string associated with the 'msgid'.
            } elseif (strpos ($poLine, 'msgstr') === 0) {
                if (!$count++) {
                    continue;
            }

                    // Get the line after the leading "msgstr ". This leaves us with a double-quoted string.
            $msg = substr (trim($poLine), 7);

            // Split the string into an array of characters.
            $chars = str_split ($msg);

            // Remove the trailing double quotation marks.
            unset ($chars[count($chars)-1]);

            // Remove the leading double quotation marks.
            unset ($chars[0]);

            // Re-assemble the line, without the leading and trailing double quotation marks.
            $msg = implode('', $chars);

            // 1st version of string is double-quoted.
            $_msgString1 = '"' . $msg . '"';

            // 2nd version of string is single-quoted, with single quotes escaped in the string.
            $_msgString2 = "'" . _addslashes($msg) . "'";

            // Now build all the versions we might encounter in the code.
            $msgString1 = '__(' . $_msgString1 . ')';
            $msgString2 = '__(' . $_msgString2 . ')';
            $msgString3 = '__f(' . $_msgString1;
            $msgString4 = '__f(' . $_msgString2;
            $msgString5 = 'gt text=' . $_msgString1;
            $msgString6 = 'gt text=' . $_msgString2;

            // Now build the arrays we use for the 'str_replace' performed on the code.
                $msgIDs[] = $msgID1;
                $msgStrs[] = $msgString1;
                $msgIDs[] = $msgID2;
                $msgStrs[] = $msgString2;
                $msgIDs[] = $msgID3;
                $msgStrs[] = $msgString3;
                $msgIDs[] = $msgID4;
                $msgStrs[] = $msgString4;
                $msgIDs[] = $msgID5;
                $msgStrs[] = $msgString5;
                $msgIDs[] = $msgID6;
                $msgStrs[] = $msgString6;
            }
        }

        //foreach ($msgIDs as $k=>$v) {
            //print "$v\n";
            //print "$msgStrs[$k]\n\n";
        //} exit();


        foreach ($files as $file) {
            print "Processing $file ...\n";
            $fData = file_get_contents ($file);
            $newFileData = str_replace ($msgIDs, $msgStrs, $fData);
            file_put_contents ($file, $newFileData);
        }


        function _addslashes ($string)
        {
            return str_replace ("'", "\'", $string);
        }
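
    One more case that might explain some of the strings that were not replaced: the '.po' format allows a long string to be split over several lines, written as an empty first string followed by one quoted continuation line per chunk, and the loop above only looks at the keyword line. A minimal sketch of a pre-processing step that merges such lines before parsing; the function name 'joinContinuationLines' and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: merge continuation lines into the keyword line they belong to.
function joinContinuationLines(array $lines)
{
    $joined = array();
    $keyword = '/^(?:msgid|msgid_plural|msgstr(?:\[\d+\])?)\s+".*"$/';

    foreach ($lines as $line) {
        $line = trim($line);
        $last = count($joined) - 1;

        if ($last >= 0 && isset($line[0]) && $line[0] === '"'
            && preg_match($keyword, $joined[$last])) {
            // Drop the closing quote of the previous line and the opening quote of this one.
            $joined[$last] = substr($joined[$last], 0, -1) . substr($line, 1);
        } else {
            $joined[] = $line;
        }
    }

    return $joined;
}

// Made-up sample, for illustration only.
$merged = joinContinuationLines(array(
    'msgid ""',
    '"First part "',
    '"second part."',
    'msgstr "x"',
));
print_r($merged);
```

    After this step every entry sits on one line again, so the existing 'msgid' and 'msgstr' branches could run on the merged lines without further changes.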
  • @Chris
    I'll also be looking carefully at the code you posted. Thanks for that!
