Selecting text encoding when opening and saving files. Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint File Formats

When you open a text file in Microsoft Word or another program (for example, on a computer whose operating system language differs from the language of the text in the file), the encoding helps the program determine how to display the text on the screen so that it can be read.

Understanding text encoding

The text that you see on the screen is actually stored as numeric values in a text file. The computer translates the numeric values into visible characters. It uses an encoding standard to do this.

An encoding is a numbering scheme in which each text character in a set corresponds to a specific numeric value. An encoding may contain letters, numbers, and other symbols. Different languages often use different character sets, so many of the existing encodings are designed to represent the character sets of their respective languages.

Different encodings for different alphabets

The encoding information saved with the text file is used by the computer to display text on the screen. For example, in the "Cyrillic (Windows)" encoding, the character "Й" corresponds to the numeric value 201. When you open a file containing this character on a computer that uses the "Cyrillic (Windows)" encoding, the computer reads the number 201 and displays the character "Й".

However, if the same file is opened on a computer that uses a different encoding by default, the screen shows whichever character corresponds to the number 201 in that encoding. For example, if the computer uses the "Western European (Windows)" encoding, the character "Й" from the Cyrillic source text file is displayed as "É", because that is the character that corresponds to the number 201 in this encoding.
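The mismatch described above can be reproduced in a few lines of Python. This is only a minimal sketch; it assumes that Python's cp1251 and cp1252 codecs correspond to the "Cyrillic (Windows)" and "Western European (Windows)" encodings.

```python
# One stored byte with the numeric value 201.
raw = bytes([201])

# The same number maps to different characters in different code pages.
as_cyrillic = raw.decode("cp1251")  # assumed equivalent of "Cyrillic (Windows)"
as_western = raw.decode("cp1252")   # assumed equivalent of "Western European (Windows)"

print(as_cyrillic, as_western)
```

The stored data never changes; only the table used to interpret it does.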

Unicode: a single encoding for different alphabets

To avoid problems with encoding and decoding text files, you can save them in Unicode. This encoding includes most characters from all languages that are commonly used on modern computers.

Because Word is based on Unicode, all files in it are automatically saved in this encoding. Unicode files can be opened on any computer with an English-language operating system, regardless of the language of the text. In addition, on such a computer you can save files in Unicode that contain characters not found in Western European alphabets (for example, Greek, Cyrillic, Arabic, or Japanese).

Selecting encoding when opening a file

If the text in an opened file is garbled or appears as question marks or squares, Word may have detected the encoding incorrectly. You can specify the encoding to use for displaying (decoding) the text.

    Click the File tab.

    Click Options.

    Click Advanced.

    Go to the General section and select the Confirm file format conversion on open check box.

    Note: When this check box is selected, Word displays the Convert File dialog box whenever you open a file in a format other than Word (that is, a file that does not have a DOC, DOT, DOCX, DOCM, DOTX, or DOTM extension). If you work with such files frequently but rarely need to select an encoding, remember to clear this check box so that the dialog box does not appear.

    Close and then reopen the file.

    In the Convert File dialog box, select Encoded Text.

    In the File Conversion dialog box, select Other encoding and choose the desired encoding from the list.

    In the Preview area, check whether the text displays correctly in the selected encoding.

If almost all of the text looks the same (for example, squares or dots), your computer may not have the correct font installed. In this case, you can install additional fonts.

To install additional fonts, do the following:

    Click Start, and then click Control Panel.

    Do one of the following:

    In Windows 7

      In Control Panel, click Uninstall a program.

      In the list of programs, click Microsoft Office, or Microsoft Word if it was installed separately from Microsoft Office, and then click Change.

    In Windows Vista

      In Control Panel, click Uninstall a program.

      In the list of programs, click Microsoft Office, or Microsoft Word if it was installed separately from Microsoft Office, and then click Change.

    In Windows XP

      In Control Panel, click Add or Remove Programs.

      In the Currently installed programs list, click Microsoft Office, or Microsoft Word if it was installed separately from Microsoft Office, and then click Change.

    Under Change your installation of Microsoft Office, click Add or Remove Features, and then click Continue.

    Under Installation Options, expand Office Shared Features, and then expand International Support.

    Select the font you want, click the arrow next to it, and then select Run from My Computer.

Tip: When opening an encoded text file, Word applies the fonts defined in the Web Options dialog box. (To open the Web Options dialog box, click the Microsoft Office Button, click Word Options, and then click the Advanced category. In the General section, click Web Options.) Using the options on the Fonts tab of the Web Options dialog box, you can customize the font for each encoding.

Selecting encoding when saving a file

If you do not select an encoding when saving the file, Unicode will be used. In general, Unicode is recommended because it supports most characters in most languages.

If you plan to open the document in a program that does not support Unicode, you can select the encoding that program expects. For example, on an English-language operating system you can create a document in Traditional Chinese using Unicode. However, if the document will be opened in a program that supports Chinese but does not support Unicode, you can save the file in the "Chinese Traditional (Big5)" encoding. The text will then display correctly when the document is opened in a program that supports Traditional Chinese.

Note: Because Unicode is the most comprehensive standard, some characters may not appear when saving text in other encodings. For example, suppose that a Unicode document contains text in both Hebrew and Cyrillic. If you save the file in the "Cyrillic (Windows)" encoding, the Hebrew text will not be displayed, and if you save it in the "Hebrew (Windows)" encoding, the Cyrillic text will not be displayed.
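The loss described in this note can be illustrated with a short Python sketch. It assumes that Python's cp1251 and cp1255 codecs stand in for the "Cyrillic (Windows)" and "Hebrew (Windows)" encodings, and it uses Python's own "replace" error handling rather than anything Word does internally.

```python
# Cyrillic word, a space, then four Hebrew letters (written as escapes).
text = "Привет \u05e9\u05dc\u05d5\u05dd"

# errors="replace" substitutes "?" for every character the code page lacks.
saved_cyrillic = text.encode("cp1251", errors="replace").decode("cp1251")
saved_hebrew = text.encode("cp1255", errors="replace").decode("cp1255")

print(saved_cyrillic)  # the Hebrew letters are lost
print(saved_hebrew)    # the Cyrillic letters are lost
```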

If you select an encoding standard that doesn't support some characters in the file, Word will mark them in red. You can preview the text in the selected encoding before saving the file.

When you save a file as encoded text, the text for which the Symbol font is selected, as well as the field codes, are removed from the file.

Choosing an encoding

    Click the File tab.

    In the File name box, type a name for the new file.

    In the Save as type box, select Plain Text.

    If the Microsoft Office Word Compatibility Checker dialog box appears, click Continue.

    In the File Conversion dialog box, select the appropriate encoding.

      To use the standard encoding, select Windows (Default).

      To use MS-DOS encoding, select MS-DOS.

      To set a different encoding, select Other encoding and choose the desired item from the list. In the Preview area, you can check whether the text displays correctly in the selected encoding.

      Note: To enlarge the document display area, you can resize the File Conversion dialog box.

    If the message "Text highlighted in red cannot be saved correctly in the selected encoding" appears, you can select a different encoding or select the Allow character substitution check box.

    If character substitution is enabled, characters that cannot be displayed are replaced with the nearest equivalent characters in the selected encoding. For example, an ellipsis is replaced by three periods, and curly quotes are replaced by straight quotes.

    If the selected encoding has no equivalent for the characters highlighted in red, they are saved as out-of-context characters (for example, as question marks).

    If the document will be opened in a program that does not wrap text from one line to the next, you can insert hard line breaks. To do this, select the Insert line breaks check box and specify the break character you want (carriage return (CR), line feed (LF), or both) in the End lines with box.
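The character substitution described in the steps above can be sketched in Python. The substitution table and the function name encode_with_substitution are hypothetical and only mirror the two examples given in the text; Word's actual substitution rules are not described in this article.

```python
# Hypothetical substitution table, limited to the examples named in the text.
SUBSTITUTES = {
    "\u2026": "...",  # ellipsis -> three periods
    "\u201c": '"',    # left curly quote -> straight quote
    "\u201d": '"',    # right curly quote -> straight quote
}

def encode_with_substitution(text: str, encoding: str) -> bytes:
    """Encode text, replacing unsupported characters with a substitute or '?'."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)  # succeeds if the target encoding has this character
            out.append(ch)
        except UnicodeEncodeError:
            out.append(SUBSTITUTES.get(ch, "?"))
    return "".join(out).encode(encoding)

# Curly quotes, an ellipsis, and one Hebrew letter saved as plain ASCII.
print(encode_with_substitution("\u201cWait\u2026\u201d \u05e9", "ascii"))
```

Characters with a listed substitute keep their meaning, while the Hebrew letter has no equivalent and becomes a question mark.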

Finding encodings available in Word

Word recognizes multiple encodings and supports encodings that are included with the system software.

Below is a list of scripts and their associated encodings (code pages).

Writing system | Encodings | Font used
Multilingual | Unicode (UCS-2 little endian, UTF-8, UTF-7) | Standard font for the "Normal" style of the localized version of Word
Arabic | Windows 1256, ASMO 708 |
Chinese (Simplified) | GB2312, GBK, EUC-CN, ISO-2022-CN, HZ |
Chinese (Traditional) | BIG5, EUC-TW, ISO-2022-TW |
Cyrillic | Windows 1251, KOI8-R, KOI8-RU, ISO8859-5, DOS 866 |
English, Western European and others based on the Latin alphabet | Windows 1250, 1252-1254, 1257, ISO8859-x |
Greek | |
Japanese | Shift-JIS, ISO-2022-JP (JIS), EUC-JP |
Korean | Wansung, Johab, ISO-2022-KR, EUC-KR |
Vietnamese | |
Indic: Tamil | |
Indic: Nepali | ISCII 57002 (Devanagari) |
Indic: Konkani | ISCII 57002 (Devanagari) |
Indic: Hindi | ISCII 57002 (Devanagari) |
Indic: Assamese | |
Indic: Bengali | |
Indic: Gujarati | |
Indic: Kannada | |
Indic: Malayalam | |
Indic: Oriya | |
Indic: Marathi | ISCII 57002 (Devanagari) |
Indic: Punjabi | |
Indic: Sanskrit | ISCII 57002 (Devanagari) |
Indic: Telugu | |

    To use Indic languages, the operating system must support them and the appropriate OpenType fonts must be installed.

    Only limited support is available for Nepali, Assamese, Bengali, Gujarati, Malayalam and Oriya.

When solving everyday IT problems, such as network administration and user support, various files are often used, especially documents compiled in text editors. Unfortunately, built-in Windows tools allow you to work with documents only as files; standard tools do not handle internal Word data, such as document type conversion.

I've put together a WSH (Windows Script Host) script called ConvertWord that acts as a command-line shell for Microsoft Word and makes working with documents easier. The script can also be useful for testing documents for defects.

ConvertWord requirements

To use ConvertWord, you must have Word 97 or later installed on your computer. The full source text of ConvertWord can be downloaded from our magazine's Web site. Excerpts from the ConvertWord script are below. The convertword.wsf and convertword.cmd files should be saved in the same folder.

ConvertWord can automatically use any file format converter implemented in Word. Word comes with a basic set of file format converters for standard documents. However, this set does not include special converters, for example for Microsoft Works or WordPerfect documents. To obtain these and other optional converters, you must run a custom installation of Word.

The standard Word converters included in the Microsoft Office Resource Kits can be downloaded from the Office 2003 Editions Resource Kit page at http://www.microsoft.com/office/ork/2003/default.htm . The converters in the resource kit are compatible with Word 97 and later versions. After installing the resource kit, go to the directory it creates (%programfiles%\orktools by default), find the converter pack file (oconvpck.exe), and then run oconvpck.exe on every computer on which you want to deploy the converters.

Purpose of ConvertWord

ConvertWord was originally created to perform some tasks that were not possible with Word's Batch Conversion Wizard. The Batch Conversion Wizard is a useful addition to any administrator's toolkit. The wizard is a Word template that converts one input format to one output format. More detailed information about such conversion can be found in the Microsoft article “How to automatically convert many documents to Word 2002 format” at http://support.microsoft.com/?kbid=313714.

The Batch Conversion Wizard does many things, but it is not optimized for some of them, such as remote administration or automating simple conversions for end users who share documents on separate network nodes. ConvertWord can help you solve these distributed conversion problems by performing the following basic operations.

  • Queries the system for the installed version of Word.
  • Automatically opens lists of mixed document types of arbitrary length.
  • Saves documents under guaranteed-unique names in Word format (the default) or in other formats.
  • Tests documents for formatting problems and incorrect user passwords.

How ConvertWord works

ConvertWord converts documents in four steps. In the first step, the script creates an instance of the Word application, as shown in the fragment labeled A in Listing 1. Part of the script source is designed to suppress as many dialog boxes as possible. For example, the source text labeled B blocks dialog boxes wherever possible.

In the second step, ConvertWord opens each document. The Word object contains a Documents collection; calling this collection's Open method (the fragment labeled A in Listing 2) retrieves the document. If you know the name of the document and want Word to detect its format automatically, you can call the method with only the document name as an argument.

Or you can specify the document format as another parameter to the Open method. Unfortunately, depending on the version of Word, the Open method requires up to 16 parameters. Since the format control parameter is located in tenth place, the previous nine parameters must be specified. The result is a long, unwieldy string. Information about the parameters can be obtained at http://msdn.microsoft.com/library/default.asp?url=/library/enus/dv_wrcore/html/wrconwordobjectmodeloverview.asp or in Word Help.

The Open parameters that ConvertWord specifies are FileName, ConfirmConversions, ReadOnly, AddToRecentFiles, PasswordDocument, PasswordTemplate, Revert, WritePasswordDocument, WritePasswordTemplate, and Format. The FileName parameter is the file name of the Word document. You can use the ConfirmConversions parameter to display a dialog box when Word converts the document being opened. In ConvertWord, this parameter is always set to False to facilitate automation.

The ReadOnly parameter controls whether the document is opened read-only; ConvertWord always sets this parameter to True to keep the original document unchanged. AddToRecentFiles determines whether the opened document is added to the current user's RecentFiles list. The document may be one of tens or even hundreds, so adding it to the list is not recommended, and the parameter is set to False.

PasswordDocument is the password for opening protected documents, and PasswordTemplate is the password for templates. These values are not useful for non-Word documents, so for either parameter you can specify two double quotes ("") to indicate an empty string. The Revert parameter determines whether the script reverts to the currently open version of the document if the document to be converted is already open. ConvertWord sets this parameter to True to avoid losing changes and to only activate the open instance of the document.

The WritePasswordDocument and WritePasswordTemplate parameters specify the passwords required to save an open document or template. For the purposes of this article, these parameters are optional because ConvertWord does not overwrite the original document; so the script specifies "" for each of these arguments.

Finally, the Format parameter is a number that indicates the method Word uses to determine the format of the document being opened. Getting the number right is not easy, because the numbers and the methods they represent depend on the installed version of Word, additional document converters, and installation procedures. Let's say we need to open and convert an RTF (Rich Text Format) document, whose open format code is 3. To open the sample document using the standard RTF converter, use the following call:

Set doc = Word.Documents.Open("c:\my.rtf", False, _
    True, False, "", "", _
    True, "", "", 3)
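Because the call is purely positional, it is easy to lose track of which value fills which slot. The following Python sketch only models the argument order listed above; it does not call Word, and the function name open_arguments is hypothetical.

```python
def open_arguments(file_name: str, fmt: int) -> tuple:
    """Return the ten positional Open arguments in the order the article lists them."""
    params = {
        "FileName": file_name,
        "ConfirmConversions": False,   # no conversion dialog boxes
        "ReadOnly": True,              # keep the original unchanged
        "AddToRecentFiles": False,
        "PasswordDocument": "",
        "PasswordTemplate": "",
        "Revert": True,
        "WritePasswordDocument": "",
        "WritePasswordTemplate": "",
        "Format": fmt,                 # tenth position
    }
    return tuple(params.values())      # dicts preserve insertion order

args = open_arguments(r"c:\my.rtf", 3)
print(len(args), args[-1])
```

Naming each slot this way makes it clear why the nine preceding values must be supplied just to reach Format.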

Some lines of source code in this article are split into multiple lines due to space constraints. A list of additional document converters, with their corresponding numbers and standard extensions, can be obtained from the FileConverters collection. The source code in Listing 3 displays a list of these converters. The standard Word converters are not included in that list; they can be found in Table 1 and in Word Help.

The CreateFormatCollections routine of the ConvertWord script displays a list of Word converters. Although the script somewhat simplifies the task of determining open and save formats, the format used to open or save a document depends on the version of Word and how the converters are installed.

After the document is opened, a new version is saved using the SaveAs method (the fragment labeled A in Listing 4). The SaveAs method takes up to 16 parameters, but we need only the first two, because the required SaveFormat parameter is the second one. As with the Format parameter of the Open method, SaveFormat takes a numeric format code, in this case for the format being saved. To specify the save format, for example to save the document as the plain text file C:\my.txt, enter the command

doc.SaveAs "C:\my.txt", 2

After saving the document, ConvertWord closes it using the Close method (label B in Listing 4). A value of False specifies that Word should discard changes if the document has been changed since it was saved. Once the script has sequentially opened, saved, and closed all documents, the final step is to exit Word by calling Word's Quit method (Listing 5).

Application of ConvertWord

Before launching ConvertWord for the first time, it is useful to familiarize yourself with information about your local version of Word by running the command

convertword /version

This command shows important information, including the version number of Word installed on the machine. Microsoft stopped putting the version number in the product name starting with Office 95 (which would have been called Office 7), but the internal version number still increases by 1 with each significantly updated release. The same numbering scheme is used in Word as a component of the Office suite. Internal version numbers are 8 (Word 97), 9 (Word 2000), 10 (Word 2002), and 11 (Word 2003).
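The version mapping above is easy to capture as a lookup table. The Python sketch below assumes the version is reported as a string such as "11.0"; the function name describe_version is hypothetical and is not part of ConvertWord.

```python
# Internal version numbers as listed in the article.
WORD_VERSIONS = {8: "Word 97", 9: "Word 2000", 10: "Word 2002", 11: "Word 2003"}

def describe_version(version: str) -> str:
    """Map an internal version string (assumed form: '11.0') to a product name."""
    major = int(version.split(".")[0])  # "11.0" -> 11
    return WORD_VERSIONS.get(major, f"unknown (internal version {major})")

print(describe_version("11.0"))
```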

By default, ConvertWord automatically opens files, makes educated guesses about their format (e.g. Word, Plain Text, WordPerfect, RTF), and saves them as Word documents, giving them unique names consisting of the file name, an underscore, and a number. ConvertWord provides several ways to name documents. File names can be entered as arguments to the command:

convertword unicode.txt plain.txt otherdocs\corel.wps

With this approach, the output Word files are saved as unicode.doc, plain.doc, and otherdocs\corel.doc. Another option is to have ConvertWord read file names from standard input, like this:

convertword

The output of a command that produces a list of files can be piped to ConvertWord as follows:

dir /s /b c:\inbox\*.txt | convertword

If no input is specified, ConvertWord asks for input document names until you press Ctrl+C twice.

ConvertWord has a simple method that allows you to avoid overwriting files that have the same name. Let's say you want to save a Word file as a text file named mylist.txt. If a file with the same name already exists, ConvertWord begins to iterate through the sequence of derived names - mylist_1.txt, mylist_2.txt, etc. - until an unused name is found. This name is then assigned to the saved file. Typically, searching for a file name takes less time than manually opening and saving a document.
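The naming scheme described above amounts to a short loop. The Python sketch below illustrates the idea and is not ConvertWord's actual source; the function name unique_name and its injectable exists parameter are assumptions made so that the logic can be tried without touching the disk.

```python
import os

def unique_name(path: str, exists=os.path.exists) -> str:
    """Return path unchanged if unused, else the first free base_N.ext variant."""
    if not exists(path):
        return path
    base, ext = os.path.splitext(path)
    n = 1
    while exists(f"{base}_{n}{ext}"):
        n += 1
    return f"{base}_{n}{ext}"

# Simulate a folder that already holds mylist.txt and mylist_1.txt.
taken = {"mylist.txt", "mylist_1.txt"}
print(unique_name("mylist.txt", exists=taken.__contains__))
```

In real use the default exists check queries the file system, so the result is only unique at the moment of the check.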

Changing the storage location and file name

ConvertWord saves files in the same folder as the original file, with the same base name. This way, when converting files for many users or groups of users, the new files will be placed next to the old ones. Typically, users know “their” files and remember their names.

However, the directory in which converted documents are written can be changed. To do this, specify the /d switch with a path, which can be absolute or relative to the folder in which the script is running. ConvertWord expands the path to its full form and creates the corresponding directory if it does not already exist.

convertword /d:c:\temp\exports

You can change the base name (file name without extension) using the /b switch. If ConvertWord encounters multiple files with the same name, then ConvertWord changes the file names as explained above. You can also use the /x switch to specify a file extension other than the standard extension of the exported file type.

Creating non-Word documents

By default, ConvertWord automatically generates Word documents. If you want to create a document in a format other than Word, you can use ConvertWord's /sa option to change the default save format. The formats in which you can save files vary depending on the version of Word and the additional converters available on the system on which ConvertWord runs. The first step when saving a file in a specific format is to launch Word with the /cnv switch to view the installed converters; the converter number corresponds to the type in which you want to save the new file. If all files need to be saved in a specific format, for example RTF (number 6), then the /sa:6 switch should be added to the ConvertWord arguments. For example, to convert all WordPerfect files in the current folder to RTF, you would run the command

dir /s /b *.wpd | convertword /sa:6

Depending on the version of Word and the installed converters, the number of available formats can be large. You should always check the types before converting files, because their numbers vary from machine to machine. The only exception to this annoying rule is the standard built-in Word converters. Word 97 and later versions share the same values from 0 to 6, and the range of standard type numbers grows with newer versions. For Word 2003, numbers from 0 to 11 are the same on all machines. One value outside the standard range is -1. It does not correspond to a Word converter; instead, it instructs ConvertWord to write the data from a document file to the console. It can be set with the /sa switch, as /sa:-1 or /sa+.

Error processing

During large-scale conversion operations, some files may cause problems. You need a way to track documents that have failed to convert. If a file cannot be converted, ConvertWord writes the file name and descriptive information to the standard error stream (StdErr). An administrator can track failures by watching file names scroll across the screen, or by redirecting error data to a file for later analysis, for example:

Errors.txt

By default, ConvertWord shows errors by giving only the file name and error number:

c:\demo.rtf FAILED: 2

Using the /v+ switch (verbose output) you can get more detailed information about the error:

convertword /v+>errors.txt

The /v- switch does not display error numbers; instead, the filename is simply passed to StdErr to make subsequent processing easier.

The last error detected by ConvertWord is always returned as the final error level. Once the script has finished running, this value is available in the command environment and can be read by another script to determine whether the call to ConvertWord succeeded or failed.

To detect potential errors without converting documents, you can run ConvertWord with the /w (what if) switch. This key causes ConvertWord to open all documents without saving them. If something goes wrong with any of the files, such as internal data being corrupted, a normal error message will be displayed.

Solving the password problem

Passwords are especially problematic when processing in batches because they can be different for different documents. By default, ConvertWord uses the space character as the password, which opens all documents without passwords, but documents with passwords generate an error that does not stop further processing.

This behavior can be changed using the /p (password) switch. If you specify an empty argument (for example, /p:""), Word prompts you to enter a password for all protected documents. With the /p switch you can specify a specific password. However, you will not be able to open documents without a password or with a password different from the one specified.

Practical application of ConvertWord

I've done approximately 30K conversions using ConvertWord and found several typical problems. Unusual crashes were almost invariably caused by Word automation errors; the error number and message in most cases came from Word. Most errors (such as an incorrect password) are not difficult to understand or resolve. The following three problems recurred quite regularly.

The first is Word's pop-up dialog box for documents containing macros. By default, ConvertWord blocks macros in documents to protect the user from dangerous program code. However, when Word opens documents that contain macros, a dialog box appears telling you that macros are blocked. The only way I know of to eliminate this window is to enable macros. You can do this by launching ConvertWord with the /as (automation security) switch set to 0 (/as:0). This is the default value for programmatically opened Word documents. Before using the /as switch, you must make sure that the document you are opening does not contain dangerous program code.

The second error is related to some RTF documents that cannot be successfully opened, but still display correctly in WordPad. They are usually not formatted correctly and cannot be opened correctly in Word. ConvertWord is unable to resolve this problem, so ConvertWord cannot be used to convert such files.

The third error occurs because Word identifies Unicode text documents by the Byte Order Mark at the start of the file. If there is no mark, Word treats the document as plain text, and when opening the converted document, the user sees spaces after each visible character (the spaces actually correspond to null characters). The only solution to the problem is to convert such files with the /oa (OpenAs) switch set to Encoded or Unicode text (/oa:5 for Word 97 and later).
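A quick way to find such files before a batch run is to check for the mark yourself. The Python sketch below is a minimal illustration, not part of ConvertWord; the function name detect_bom is hypothetical, and only the UTF-8 and UTF-16 marks are checked.

```python
import codecs

# Byte Order Marks checked by this sketch (UTF-32 is deliberately left out).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

no_bom = "Hi".encode("utf-16-le")                # Unicode text without a mark
print(detect_bom(codecs.BOM_UTF16_LE + no_bom))  # mark present
print(detect_bom(no_bom))                        # no mark
# Read as a single-byte code page, every character is followed by a null.
print(repr(no_bom.decode("cp1252")))
```

The last line shows where the apparent spaces come from: each two-byte character leaves a null byte behind when the file is read one byte at a time.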

Fortunately, such errors occur relatively infrequently. ConvertWord will be extremely useful for processing a large number of documents and will help you open and convert Word documents without tedious manual work.

Network Specialist Consulting in Indiana. He has MCSE, MCP+I and MVP certificates.

If you have an older version of Microsoft Office installed (such as 97, 2003, 2007, or 2010), install the appropriate compatibility software so that files in the new formats open in the older versions.

Overview

Users of Word, Excel, or PowerPoint in Microsoft Office XP and 2003: before you download the Compatibility Pack, install the high-priority updates from the Microsoft Update website. By installing the Compatibility Pack as an add-on to Microsoft Office 2000, Office XP, or Office 2003, you can open, edit, and save files in the new file formats used in the latest versions of Word, Excel, and PowerPoint. You can also use the Compatibility Pack with the Microsoft Office Word 2003, Excel 2003, and PowerPoint 2003 viewers to view files saved in the new formats. For more information about the Compatibility Pack, see the Knowledge Base article.

Note. If you use Microsoft Word 2000 or Microsoft Word 2002 to read or write documents that contain complex characters, you should refer to the information in this article to ensure that Word documents display correctly in newer versions of the application.

Administrators: You can download the administrative template for Word, Excel, and PowerPoint converters included in the Compatibility Pack.

Update. The Microsoft Office Compatibility Pack has been updated to include Service Pack 2 (SP2). Now, if DOCX or DOCM files contain custom XML tags, then the tags are removed when the file is opened in Word 2003. For more information, see KB978951

System requirements

  • OS: Windows 2000 Service Pack 4, Windows Server 2003, Windows Vista, Windows Vista Service Pack 1, Windows XP Service Pack 1, Windows XP Service Pack 2, Windows XP Service Pack 3, Windows 7, Windows Server 2008
  • Microsoft Word 2002 SP3, Microsoft Excel 2002 SP3, and Microsoft PowerPoint 2002 SP3
  • Microsoft Office Word 2003 SP1 or later, Microsoft Office Excel 2003 SP1 or later, and Microsoft Office PowerPoint 2003 SP1 or later
  • Microsoft Office Word 2003 Viewer.
  • Microsoft Office Excel 2003 Viewer
  • Microsoft Office PowerPoint 2003 Viewer

Instructions

Installing the update

  1. Make sure your system is up to date by installing the high-priority and required updates from the Microsoft Update website (required for users of Microsoft Office XP and 2003).
  2. After installing the high-priority and required updates from the Microsoft Update website, download the Compatibility Pack by clicking the button above and saving the file to your hard drive.
  3. To run the installer, double-click the saved executable file, FileFormatConverters.exe, on your hard drive.
  4. Complete the installation by following the onscreen instructions.

Deleting a download file