
AutoFenc.vim : Tries to automatically detect file encoding


created by
Petr Zemek
 
script type
utility
 
description
This script tries to automatically detect and set the file encoding when a file is opened in Vim. Depending on the configuration, it tries the following methods in this order, moving on to the next one whenever a method fails:
  (1) detection of BOM (byte-order-mark) at the beginning of the file, only for some multibyte encodings
  (2) HTML way of encoding detection (via <meta> tag), only for HTML based file types
  (3) XML way of encoding detection (via <?xml ... ?> declaration), only for XML based file types
  (4) CSS way of encoding detection (via @charset 'at-rule'), only for CSS files
  (5) detection of an encoding specified in a comment (like '# Encoding: latin2'), for all file types
  (6) detection via a specified external program (the default one is enca), for all file types

If the autodetection fails, it's up to Vim (and your configuration) to set the encoding.
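
The "try each method in order, stop at the first one that succeeds" behaviour described above can be sketched in Vim script roughly as follows. This is only an illustration under stated assumptions: the function names (SketchDetectXml, SketchDetectComment, SketchDetect) and both patterns are hypothetical and simplified, and they are not taken from AutoFenc.vim itself.

```vim
" Illustrative sketch only; these functions and patterns are hypothetical
" and are NOT the plugin's real code.

" Returns the encoding named in an XML declaration, or '' if none is found.
function! SketchDetectXml(lines)
  for l:line in a:lines
    let l:m = matchlist(l:line, '<?xml[^>]*encoding=["'']\([^"'']\+\)["'']')
    if !empty(l:m)
      return l:m[1]
    endif
  endfor
  return ''
endfunction

" Returns the encoding named in a comment such as '# Encoding: latin2',
" or '' if none is found. Requires whitespace in front of the keyword.
function! SketchDetectComment(lines)
  for l:line in a:lines
    let l:m = matchlist(l:line, '\c\sencoding:\s*\([-A-Za-z0-9_]\+\)')
    if !empty(l:m)
      return l:m[1]
    endif
  endfor
  return ''
endfunction

" Tries each detector in order; the first non-empty result wins.
" An empty result means every method failed and Vim decides on its own.
function! SketchDetect(lines)
  for l:name in ['SketchDetectXml', 'SketchDetectComment']
    let l:enc = call(l:name, [a:lines])
    if !empty(l:enc)
      return l:enc
    endif
  endfor
  return ''
endfunction
```

After sourcing the sketch, something like `:echo SketchDetect(getline(1, 5))` could be used to try it on the first lines of the current buffer.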

Configuration options for this plugin (you can set them in your $HOME/.vimrc):
- g:autofenc_enable (0 or 1, default 1)
     Enables/disables this plugin.
- g:autofenc_emit_messages (0 or 1, default 0)
     Emits messages about the detected/used encoding upon opening a file.
- g:autofenc_max_file_size (number >= 0, default 10485760)
     If the size of a file is larger than this value (in bytes), then the autodetection will not be performed.
- g:autofenc_disable_for_files_matching (regular expression, see below)
     If the file name (with its complete path) matches this regular expression, then the autodetection will not be performed. By default, it is set to disable autodetection for non-local files (e.g. files accessed via ftp, scp etc.), because the script can't handle some kinds of autodetection for these files. The regular expression is matched case-sensitively.
- g:autofenc_disable_for_file_types (list of strings, default [])
     If the file type matches any of the file types specified in this list, then the autodetection will not be performed. The comparison is done case-sensitively.
- g:autofenc_autodetect_bom (0 or 1, default 0 if 'ucs-bom' is in 'fileencodings', 1 otherwise)
     Enables/disables detection of encoding by BOM.
- g:autofenc_autodetect_html (0 or 1, default 1)
     Enables/disables detection of encoding for HTML based documents.
- g:autofenc_autodetect_html_filetypes (regular expression, see below)
     Regular expression for all supported HTML file types.
- g:autofenc_autodetect_xml (0 or 1, default 1)
     Enables/disables detection of encoding for XML based documents.
- g:autofenc_autodetect_xml_filetypes (regular expression, see below)
     Regular expression for all supported XML file types.
- g:autofenc_autodetect_css (0 or 1, default 1)
     Enables/disables detection of encoding for CSS documents.
- g:autofenc_autodetect_css_filetypes (regular expression, see below)
     Regular expression for all supported CSS file types.
- g:autofenc_autodetect_comment (0 or 1, default 1)
     Enables/disables detection of encoding in comments.
- g:autofenc_autodetect_commentexpr (regular expression, see below)
     Pattern for detection of encodings specified in a comment.
- g:autofenc_autodetect_num_of_lines (number >= 0, default 5)
     How many lines from the beginning and from the end of the file should be searched for the possible encoding declaration.
- g:autofenc_autodetect_ext_prog (0 or 1, default 1)
     Enables/disables detection of encoding via external program (see additional settings below).
- g:autofenc_ext_prog_path (string, default 'enca')
     Path to the external program. It can be either relative or absolute. The external program can take any number of arguments, but the last one must be the path to the file whose encoding is to be detected (this plugin supplies it). The output of the program must be the name of the encoding in which the file is saved (a string on a single line).
- g:autofenc_ext_prog_args (string, default '-i -L czech')
     Additional program arguments (can be none, i.e. '').
- g:autofenc_ext_prog_unknown_fenc (string, default '???')
     If the output of the external program is this string, then the file encoding was not detected successfully. The string is compared case-sensitively.
- g:autofenc_enc_blacklist (regular expression, default '')
     If the detected encoding matches this regular expression, it will be ignored.
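
As a rough illustration of how these options fit together, a $HOME/.vimrc fragment might look like the one below. The option names and defaults come from the list above; the chosen values (the 1 MiB limit, the file types, the number of lines, and the blacklist pattern) are only assumptions for the sake of the example, not recommendations.

```vim
" Example $HOME/.vimrc fragment; all values are illustrative assumptions.

" Report the detected/used encoding when a file is opened.
let g:autofenc_emit_messages = 1

" Skip autodetection for files larger than 1 MiB (the value is in bytes).
let g:autofenc_max_file_size = 1048576

" Skip autodetection for these file types (compared case-sensitively).
let g:autofenc_disable_for_file_types = ['gitcommit', 'help']

" Search the first and the last 10 lines for an encoding declaration.
let g:autofenc_autodetect_num_of_lines = 10

" External detector. The plugin appends the file path as the last argument.
" These are the documented defaults; adjust the language to your needs.
let g:autofenc_ext_prog_path = 'enca'
let g:autofenc_ext_prog_args = '-i -L czech'
let g:autofenc_ext_prog_unknown_fenc = '???'

" Ignore a detected encoding that matches this regular expression.
" \c makes the match case-insensitive, since the exact spelling of the
" name being compared is an assumption here.
let g:autofenc_enc_blacklist = '\cutf-7'
```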

Requirements:
- filetype plugin must be enabled (a line like 'filetype plugin on' must be in your $HOME/.vimrc [*nix] or %UserProfile%\_vimrc [MS Windows])
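
For instance, the requirement above is met by a line like this in your vimrc:

```vim
" Enable file type detection and loading of filetype plugins.
filetype plugin on
```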

Notes:
  This script is by no means perfect, but it works for me and suits my needs very well, so it might also be useful for you. Feedback, opinions, suggestions, bug reports, and patches are all welcome!

  There are plugins similar to this one, so if you don't like it, you can try these:
   - FencView.vim (http://www.vim.org/scripts/script.php?script_id=1708)
     Mainly supports detection of encodings for Asian languages.
   - MultiEnc.vim (http://www.vim.org/scripts/script.php?script_id=1806)
     Obsolete, merged with the previous one.
   - charset.vim (http://www.vim.org/scripts/script.php?script_id=199)
     Not very complete/correct, and last updated in 2002.
   - http://vim.wikia.com/wiki/Detect_encoding_from_the_charset_specified_in_HTML_files
     Same basic ideas but only for HTML files.
  Let me know if there are others and I'll add them here.
 
install details
Put this file into your $HOME/.vim/plugin directory [*nix] or %UserProfile%\vimfiles\plugin folder [MS Windows].
 

script versions

package script version date Vim version user release notes
AutoFenc.vim 1.5 2012-03-17 7.0 Petr Zemek Thanks to Ingo Karkat for the updates in this version.
- Supported HTML/XML/CSS file types are now configurable, and more defaults were added.
- Do not emit the "unrecognized charset" message when the encoding is known.
AutoFenc.vim 1.4 2012-03-11 7.0 Petr Zemek Thanks to Ingo Karkat for the updates in this version.
- Improved the detection regexp for comments:
    - added "fileencoding" and "charset";
    - requires whitespace in front of the keyword, so that "daycoding" doesn't match;
    - g:autofenc_autodetect_commentexpr allows configuring the pattern for comment detection.
- Introduced g:autofenc_enc_blacklist to disable some encodings. For example, the enca tool has a tendency to detect plain text files as UTF-7. With the blacklist, AutoFenc can be instructed to ignore those encodings.
- The check for ASCII is now case-insensitive because enca reports this encoding in uppercase, so the condition failed unless 'ignorecase' was set.
- Keeps a changed CWD when the 'autochdir' setting is on by temporarily disabling that setting. For example, suppose that a user has ":lcd .." in after/ftplugin/gitcommit.vim so that the working directory is the Git root directory, not the .git subdir, when composing a commit message. The reload of the buffer by AutoFenc (via :edit) triggered the automatic change of the working dir again, and therefore the customization was lost. The 'autochdir' setting needs to be temporarily disabled to avoid that.
- Added support for plain Vim 7.0 in the shellescape() emulation from version 1.3.4. Otherwise, there were errors in Vim 7.0.
AutoFenc.vim 1.3.4 2012-02-27 7.0 Petr Zemek - Don't override when the user explicitly sets file encoding with ++enc (thanks to Benjamin Fritz).
- Fixed TOhtml version detection (again) and made sure line continuations can actually be used (thanks to Benjamin Fritz and Ingo Karkat).
- Disabled the 'shellslash' option on Windows before calling shellescape(), because it may cause problems there (thanks to Benjamin Fritz for the tip).
AutoFenc.vim 1.3.3 2011-11-29 7.0 Petr Zemek Thanks to Ingo Karkat for the updates in this version.
- Fixed a problem in the TOhtml detection when, for example, g:loaded_2html_plugin = 'vim7.3_v6'.
- The return code of the external program called via system(ext_prog_cmd) is now checked. This prevents Vim from interpreting an error message as an encoding.
- shellescape() is now used instead of quoting file_path manually.
AutoFenc.vim 1.3.2 2011-11-24 7.0 Petr Zemek Thanks to Benjamin Fritz for the updates in this version.
- Fixed the detection of the version of the TOhtml plugin.
AutoFenc.vim 1.3.1 2011-07-23 7.0 Petr Zemek Thanks to Benjamin Fritz for the updates in this version.
- Fixed the plugin behavior when reloading a file with different settings.
AutoFenc.vim 1.3 2011-04-22 7.0 Petr Zemek Thanks to Benjamin Fritz for the updates in this version.
  - Added support for HTML version 5 encoding detection.
  - The script now dies gracefully in old Vims.
  - 'g:autofenc_autodetect_comment_num_of_lines' renamed to 'g:autofenc_autodetect_num_of_lines'
AutoFenc.vim 1.2.1 2011-04-13 7.0 Petr Zemek Fixed a typo in a variable name (this resulted in an error on some occasions). Thanks to Charles Lee for pointing this bug out.
AutoFenc.vim 1.2 2011-03-31 7.0 Petr Zemek Thanks to Benjamin Fritz for the updates in this version.
  - TOhtml's IANA name/Vim encoding conversion functions are now used.
  - Changed BOM detection so it does not duplicate a check Vim already did by default (i.e. default to off if ucs-bom is in the 'fileencodings').
  - Put autocmds in the AutoFenc augroup for easier handling.
  - Made autocmd nested so we don't need to worry about restoring everything that other autocmds may set (e.g. syntax).
  - The jumplist and the cursor position are not affected by the detection.
  - The g:autofenc_autodetect_comment_num_of_lines option is now also used in the HTML/XML/CSS detection routines (previously it was used only for encodings specified in comments).
  - Improved HTML charset line regex.
  - Added an option (g:autofenc_emit_messages) to emit messages about the detected/used encoding upon opening a file.
AutoFenc.vim 1.1.1 2009-10-03 7.0 Petr Zemek Fixed the comment encoding detection function (see changelog).
AutoFenc.vim 1.1 2009-08-16 7.0 Petr Zemek Added three configuration options to disable autodetection for specific files (based on file size, file type and file path). See the script description for more info.
AutoFenc.vim 1.0.2 2009-08-11 7.0 Petr Zemek Fixed the XML encoding detection function and minor code and documentation fixes.
AutoFenc.vim 1.0.1 2009-08-02 7.0 Petr Zemek Three bugfixes (see changelog).
AutoFenc.vim 1.0 2009-07-26 7.2 Petr Zemek Initial upload
