Q&A: Plain text and HTML

Student question:

Is plain text the same as txt? I’m still having a problem with it.

Answer:

Beginning students learning how to code with web standards are sometimes confused about what plain text is and how it applies to HTML.

Plain text refers to characters drawn from a simple character set, such as ASCII or Unicode. A character set is the collection of symbols that lets us put letters, digits, and punctuation on a page; together, those symbols make up the writing of one or more languages. ASCII (American Standard Code for Information Interchange) is a character set of 128 characters that covers the letters, digits, and punctuation used in English, along with a handful of control codes. Since the web is a global network of documents, ASCII is insufficient to represent many of the languages used on it. That’s why in most HTML documents you find a tag like…

<meta charset="UTF-8" />

“UTF-8 (UCS Transformation Format—8-bit) is a variable-width encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32.”

Wikipedia
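
To make "variable-width" and "backward compatibility with ASCII" concrete, here is a minimal Python sketch. The helper names utf8_width and fits_in_ascii and the sample characters are made up for this illustration; only the built-in str.encode and ord calls are standard Python.

```python
def utf8_width(char: str) -> int:
    """Return how many bytes UTF-8 uses to store one character."""
    return len(char.encode("utf-8"))


def fits_in_ascii(text: str) -> bool:
    """Return True if every character is one of the 128 ASCII characters."""
    return all(ord(ch) < 128 for ch in text)


# Text that uses only ASCII characters produces identical bytes in both
# encodings. This is what backward compatibility with ASCII means.
english = "Plain text"
same_bytes = english.encode("ascii") == english.encode("utf-8")
print("ASCII and UTF-8 bytes match:", same_bytes)

# Characters outside ASCII need two, three, or four bytes in UTF-8.
# Only code points are printed, so the output itself stays plain ASCII.
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X} uses {utf8_width(ch)} byte(s); in ASCII: {fits_in_ascii(ch)}")
```

A word such as "café" falls outside ASCII because of a single accented letter, which is one reason to declare UTF-8 in the meta tag.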

Plain text refers to characters without any regard to formatting, i.e., font, size, color, justification, weight, style, and so on. Many applications are capable of reading plain text. We normally use plain text editors like Notepad (Windows), TextEdit (Mac), vi, or Emacs to create our documents so that we don’t introduce formatting codes into them. As web designers, we introduce the formatting into our web documents through Cascading Style Sheets.

When we use a program like Microsoft Word, OpenOffice, Google Docs, or any of a host of other word processing packages, we put simple characters onto a page and then use the program to add formatting that changes the characters’ appearance. While you normally don’t see the formatting codes that the software uses to make that happen, they are there in the background, invisible to us. When we format a web page, we see all of those codes in the form of HTML and CSS and have minute control over them. It is the browser that takes those codes and uses them to transform the plain text into formatted text.

Plain text files can have many different extensions, but they usually have a .txt extension. An HTML document is created using plain text. We create tags with plain text characters, and those tags communicate specific things to a browser. So in HTML you could say that we are giving instructions to a browser (or user agent) using the Unicode plain text character set. The HTML tags, CSS, and JavaScript code that we write need a program that understands how to interpret them. We give HTML files an extension of .html so the operating system knows to open them with a browser instead of TextEdit or Notepad. The browser knows what to do with the codes and shows us what the content looks like when they are rendered. We call that interpreting the code, or rendering the page. A text editor just shows us the characters; it does not interpret them.
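
As a rough illustration of the point about extensions, the Python sketch below writes the same characters to a .txt file and an .html file and confirms that the two files contain identical bytes. The file names page.txt and page.html, the sample markup, and the helper write_copies are all invented for this example.

```python
import tempfile
from pathlib import Path

# A tiny HTML document. Every character in it is ordinary plain text.
HTML_SOURCE = """<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<title>Hello</title>
</head>
<body>
<p>Hello, world</p>
</body>
</html>
"""


def write_copies(directory: Path) -> tuple[Path, Path]:
    """Write the same characters to a .txt file and an .html file."""
    txt_path = directory / "page.txt"
    html_path = directory / "page.html"
    for path in (txt_path, html_path):
        path.write_text(HTML_SOURCE, encoding="utf-8")
    return txt_path, html_path


with tempfile.TemporaryDirectory() as tmp:
    txt_path, html_path = write_copies(Path(tmp))
    # Compare the raw bytes on disk: only the file names differ.
    same_contents = txt_path.read_bytes() == html_path.read_bytes()
    print("Identical contents:", same_contents)
```

The two files differ only in name. The extension is a hint that tells the operating system which program to open the file with; it does not change what is stored in the file.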

For a list of text editors, see the following Wikipedia article: http://en.wikipedia.org/wiki/List_of_text_editors