Wednesday, September 14, 2016

Encoding and its consequences

If you've done your homework recently (see Exercise[6]) and your Korsakoff's Syndrome isn't acting up, then you should remember something about this funny fellow named Joel Spolsky and his insistence that every programmer know some basic things about Unicode and character sets. At the very least, you should remember that Joel will punish you by making you peel onions for six months in a submarine if he catches you blissfully coding in ignorance of how you're encoding.

If you haven't done your homework yet or if Joel's threat still makes you nervous, then:
  1. Stop!
  2. Revisit Exercise[6] . . .
  3. Repeat the homework assignment until you feel very comfortable programming in the vicinity of Mr. Spolsky.
Okay: hopefully we're all on the same page now. Let's see if we can't convince ourselves that Joel's not just another Henny Penny (or Chicken Little, if you're exceptional) hysterically broadcasting that the sky is falling!

Encoding's Effects on Web Pages


Try this:
  1. Using Notepad or another text editor, such as Notepad++, that can save files using either ANSI or UTF-8 encoding, create a new, empty file.
  2. Copy the following text and paste it into your new text document:
    <html>
      <head>
      </head>
      <body>
        <h1>Hello Theta: ϴ</h1>
      </body>
    </html>
  3. Save the file. Be sure to use UTF-8 encoding. (Recall that when using Notepad you can do this via the Save As... window.) Make sure the file name you choose includes the extension .htm or .html. For example, you could name your file theta.html.
  4. Now open the file you just saved using a web browser. (One way to do this is to find the file using Windows Explorer and then double-click it. If you've named your file as instructed and the file name extension you've used is associated with a web browser, then the file should open in that browser.) What do you see? What size is the file? Recall that you can use PowerShell to view the file size in bytes and that Windows Explorer may round up to the nearest KB (kilobyte)!
  5. Now save the same file again. This time use ANSI encoding.
  6. Open the file again in a web browser. What do you see this time? What size is the file now?
  7. Study the w3schools.com HTML meta tag description.
    • What attribute can you use with the meta tag to signify the file encoding used for a particular web page?
    • Does the encoding value specified using a meta tag have to be the same as the actual encoding of the file that contains the content of the page? Why or why not?
    • What happens if the actual file encoding is different from the encoding specified using a meta tag?
    • Does anything about specifying a file's encoding using a meta tag in the file itself seem strange to you? 
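
If you'd like to check your browser observations against the raw bytes, here is a minimal Java sketch. It makes a few assumptions that are not part of the exercise: the class name ThetaBytes and its helper methods are made up for illustration, "ANSI" is taken to mean the Windows-1252 code page (the usual case on Western-language Windows installs), and Java's handling of characters that a code page cannot represent may not match what your text editor does when it saves. The theta is written as a Unicode escape so the sketch behaves the same no matter how you save it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ThetaBytes {
    // The theta symbol from the exercise, written as an escape so that this
    // source file is pure ASCII and compiles the same way under any encoding.
    static final String THETA = "\u03F4";

    // Returns the bytes that represent the text in the given charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Interprets the bytes as if they had been written in the given charset,
    // which is roughly what a browser does with the bytes of a page.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Formats bytes as space-separated, two-digit hexadecimal values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumption: "ANSI" means Windows-1252 on your machine.
        Charset ansi = Charset.forName("windows-1252");

        byte[] asUtf8 = encode(THETA, StandardCharsets.UTF_8);
        byte[] asAnsi = encode(THETA, ansi);
        System.out.println("Theta as UTF-8 bytes: " + hex(asUtf8));
        System.out.println("Theta as ANSI bytes:  " + hex(asAnsi));

        // What a reader sees if UTF-8 bytes are interpreted as ANSI.
        // Code points are printed in hex so the console's own encoding
        // cannot muddy the result.
        String misread = decode(asUtf8, ansi);
        for (int i = 0; i < misread.length(); i++) {
            System.out.printf("Misread character %d: U+%04X%n", i, (int) misread.charAt(i));
        }
    }
}
```

Compare the byte counts it prints with the file sizes you measured in steps 4 and 6, and compare the misread characters with anything odd you saw in the browser.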

Encoding's Effects on javac.exe


Perform a variation of Hello World in Java:
  1. Instead of printing "Hello World", print "Hello Theta: ϴ"
    • Make sure that you save your .java file using UTF-8 encoding.
    • Did you have any trouble compiling your .java file using javac.exe?
    • What happens when you use java.exe to interpret the bytecode in your freshly compiled .class file?
  2. After you have successfully managed to output "Hello Theta: ϴ", change the encoding of your .java file to ANSI and recompile the file (using javac.exe).
    • If you're having trouble getting your program to compile, study the syntax of the javac command (just execute javac.exe without passing any arguments to the program). Can you figure out how to get a file encoded in ANSI to compile?
    • What happens now when you use java.exe to interpret the bytecode in your recompiled .class file?
  3. Study the Google Java Style Guide. What file encoding does Google require?
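
Once you've worked through the steps above, you might compare your results with a variation that takes the source file's encoding out of the picture entirely. The sketch below is only one possible approach, not the required solution: the class name HelloTheta and the greeting() helper are made up for illustration, and it assumes your console can display UTF-8 (on Windows, running chcp 65001 first is one way to arrange that). Because the theta is written as a Unicode escape, the .java file contains only ASCII characters.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class HelloTheta {
    // The escape below stands for the theta symbol. With it, the source file
    // is pure ASCII, so saving it as UTF-8 or as ANSI produces the same bytes.
    static String greeting() {
        return "Hello Theta: \u03F4";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: the console expects UTF-8. Wrapping System.out makes the
        // output encoding explicit instead of relying on the platform default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(greeting());
    }
}
```

Try saving this version both ways and recompiling. Then ask yourself which of the problems you ran into earlier came from how javac.exe read your file and which came from how your console displayed the output.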