Teach Yourself Web Publishing with HTML 3.2 in 14 Days
The World of the World Wide Web
- What Is the World Wide Web?
- Web Browsers
- Web Servers
- Uniform Resource Locators (URLs)
A journey of a thousand miles begins with a single step, and here
you are at Day 1, Chapter 1, of a journey
that will show you how to write, design, and publish pages on
the World Wide Web. Before beginning the actual journey, however,
it helps to start simple, with the basics:
- What the World Wide Web is and why it's really cool
- Web browsers: what they do, and some popular ones to choose from
- What a Web server is and why you need one
- Some information about Uniform Resource Locators (URLs)
If you've spent even a small amount of time exploring the Web, most, if not all, of this chapter will seem like old news. If so, feel free to skim this chapter and skip ahead to the next chapter, where you'll find an overview of things to think about when you design and organize your own Web documents.
I have a friend who likes to describe things with lots of meaningful words strung together in a chain so that it takes several minutes to sort out what he's just said.
If I were him, I'd describe the World Wide Web as a global, interactive, dynamic, cross-platform, distributed, graphical hypertext information system that runs over the Internet. Whew! Unless you understand each of those words and how they fit together, that isn't going to make much sense. (My friend often doesn't make much sense, either.)
So let's take each one of those words and see what they mean in the context of how you'll be using the Web as a publishing medium.
If you've used any sort of basic online help system, you're already familiar with the primary concept behind the World Wide Web: hypertext.
The idea behind hypertext is that instead of reading text in a rigid, linear structure (such as a book), you can skip easily from one point to another. You can get more information, go back, jump to other topics, and navigate through the text based on what interests you at the time.
Hypertext enables you to read and navigate text and visual information in a nonlinear way based on what you want to know next.
Online help systems or help stacks such as those provided by Microsoft Windows Help or HyperCard on the Macintosh use hypertext to present information. To get more information on a topic, just click on that topic. That topic might be a link that takes you to a new screen (or window, or dialog box) which contains that new information. Perhaps there are links on words or phrases that take you to still other screens, and links on those screens that take you even further away from your original topic. Figure 1.1 shows a simple diagram of how that kind of system works.
Now imagine that your online help system is linked to another online help system on another application related to yours; for example, your drawing program's help is linked to your word processor's help. Your word processor's help is then linked to an encyclopedia, where you can look up any other concepts that you don't understand. The encyclopedia is hooked into a global index of magazine articles that enables you to get the most recent information on the topics that the encyclopedia covers. The article index is then also linked into information about the writers of those articles, and some pictures of their children (see Figure 1.2).
If you had all these interlinked help systems available with every program you bought, you'd rapidly run out of disk space. You might also question whether you needed all this information when all you wanted to know was how to do one simple thing. All that information could be expensive, too.
But if the information didn't take up much disk space, and if it were freely available, and you could get it reasonably quickly anytime you wanted, then things would be more interesting. In fact, the information system might very well end up more interesting than the software you bought in the first place.
That's just what the World Wide Web is: more information than you could ever digest in a lifetime, linked together in various ways, out there on the Net, available for you to browse whenever you want. It's big, and deep, and easy to get lost in. But it's also an immense amount of fun.
One of the best parts of the Web, and arguably the reason it has become so popular, is its ability to display both text and graphics in full color on the same page. Before the Web, using the Internet involved simple text-only connections. You had to navigate the Internet's various services using typed commands and arcane tools. Although there was plenty of really exciting information on the Net, it wasn't necessarily pretty to look at.
The Web provides capabilities for graphics, sound, and video to be incorporated with the text, and newer software includes even more capabilities for multimedia and embedded applications. More importantly, the interface to all this is easily navigable: just jump from link to link, from page to page, across sites and servers.
If the Web incorporates so much more than text, why do I keep calling the Web a hypertext system? Well, if you're going to be absolutely technically correct about it, the Web is not a hypertext system; it's a hypermedia system. But, on the other hand, one could argue that the Web began as a text-only system, and much of the content is still text-heavy, with extra bits of media added in as emphasis. Many very educated people are arguing these very points at this moment, and presenting their arguments in papers and discursive rants as educated people like to do. Whatever. I prefer the term hypertext, and it's my book, so I'm going to use it. You know what I mean.
If you can access the Internet, you can access the World Wide Web regardless of whether you're running on a low-end PC, a fancy expensive graphics workstation, or a multimillion-dollar mainframe. You can be using a simple text-only modem connection, a small 14-inch black and white monitor, or a 21-inch full-color super gamma-corrected graphics-accelerated display system. If you think Windows menus and buttons look better than Macintosh menus and buttons, or vice versa (or if you think both Mac and Windows people are weenies), it doesn't matter. The World Wide Web is not limited to any one kind of machine, or developed by any one company. The Web is entirely cross-platform.
Cross-platform means that you can access Web information equally well from any computer hardware running any operating system using any display.
You gain access to the Web through an application called a browser, like Netscape's Navigator or Microsoft's Internet Explorer. There are lots of browsers out there for most existing computer systems. And once you've got a browser and a connection to the Internet, you've got it made. You're on the Web. (I explain more about what the browser actually does later in this chapter.)
A browser is used to view and navigate Web pages and other information on the World Wide Web.
Information takes up an awful lot of space, particularly when you include images and multimedia capabilities. To store all the information that the Web provides, you'd need an untold amount of disk space, and managing it would be almost impossible. Imagine if you were interested in finding out more information about alpacas (a Peruvian mammal known for its wool), but when you selected a link in your online encyclopedia your computer prompted you to insert CD-ROM #456 ALP through ALR. You could be there for a long time just looking for the right CD!
The Web is successful in providing so much information because that information is distributed globally across thousands of Web sites, each of which contributes the space for the information it publishes. You, as a consumer of that information, go to that site to view the information. When you're done, you go somewhere else, and your system reclaims the disk space. You don't have to install it, or change disks, or do anything other than point your browser at that site.
A Web site is a location on the Web that publishes some kind of information. When you view a Web page, your browser is connecting to that Web site to get that information.
Each Web site, and each page or bit of information on that site, has a unique address. This address is called a Uniform Resource Locator, or URL. When someone tells you to visit their site at http://www.coolsite.com/, they've just given you a URL. You can use your browser (with the Open command, sometimes called Open URL or Go) to enter in the URL (or just copy and paste it).
A Uniform Resource Locator (URL) is a pointer to a specific bit of information on the Internet.
URLs are alternately pronounced as if spelled out "You are Ells" or as an actual word ("earls"). Although I prefer the former pronunciation, I've heard the latter used equally often.
You'll learn more about URLs later on in this chapter.
Because information on the Web is contained on the site that published it, the people who published it in the first place can update it at any time.
If you're browsing that information, you don't have to install a new version of the help system, buy another book, or call technical support to get updated information. Just bring up your browser and check out what's up there.
If you're publishing on the Web, you can make sure your information is up to date all the time. You don't have to spend a lot of time rereleasing updated documents. There is no cost of materials. You don't have to get bids on number of copies or quality of output. Color is free. And you won't get calls from hapless customers who have a version of the book that was obsolete four years ago.
Take, for example, the development effort for a Web server called Apache. Apache is being developed and tested through a core of volunteers, has many of the features of the larger commercial servers, and is free. The Apache Web site at http://www.apache.org/ is the central location for information about the Apache software, documentation, and the server software itself (Figure 1.3 shows its home page). Because the site can be updated any time, new releases can be distributed quickly and easily. Changes and bug fixes to the documentation, which is all online, can be made directly to the files. And new information and news can be published almost immediately.
The pictures throughout this book are usually taken from a browser on the Macintosh (Netscape, most often), or using the text-only browser Lynx. The only reason for this is that I'm writing this book primarily on a Macintosh. If you're using Windows or a UNIX system, don't feel left out. As I noted earlier, the glory of the Web is that you see the same information regardless of the platform you're on. So ignore the buttons and window borders and focus on what's inside the window.
For some sites, the ability to update the site on the fly at any moment is precisely why the site exists. Figure 1.4 shows the home page for The Nando Times, an online newspaper that is updated 24 hours a day to reflect new news as it happens. Because the site is up and available all the time, it has an immediacy that neither hardcopy newspapers nor most television news programs can match. Visit The Nando Times at http://www.nando.net/nt/nando.cgi.
If you've read any of the innumerable books on how to use the Internet, you're aware of the dozens of different ways of getting at information on the Net: FTP, Gopher, Usenet news, WAIS databases, Telnet, and e-mail. Before the Web became as popular as it is now, to get to these different kinds of information you had to use different tools for each one, all of which had to be installed and all of which used different commands. Although all these choices made for a great market for "How to Use the Internet" books, they weren't really very easy to use.
Web browsers change that. Although the Web itself is its own information system, with its own Internet protocol (HTTP, the HyperText Transfer Protocol), Web browsers can also read files from other Internet services. And, even better, you can create links to information on those systems just as you would create links to information on Web pages. It's all seamless and all available through a single application.
To point your browser to different kinds of information on the Internet, you use different kinds of URLs. Most URLs start with http:, which indicates a file at an actual Web site. To get to a file on the Web using FTP, you would use a URL that looks something like this: ftp://name_of_site/directory/filename. You can also use an ftp: URL ending with a directory name, and your Web browser will show you a list of the files, as in Figure 1.5. This particular figure shows a listing of files from Simtel, a repository of Windows software at ftp://oak.oakland.edu/SimTel/win3/winsock/.
To use a Gopher server from a Web browser, use a URL that looks something like this: gopher://name_of_gopher_server/. For example, Figure 1.6 shows the Gopher server on the WELL, a popular Internet service in San Francisco. Its URL is gopher://gopher.well.com/. You'll learn more about different kinds of URLs in Chapter 4, "All About Links."
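To make the pattern behind these addresses a little more concrete, here is a minimal sketch in Python that assembles the FTP and Gopher URLs mentioned above from their pieces, using the standard library's urllib.parse module. The build_url function is a name I made up for this illustration; it isn't part of any browser or server.

```python
from urllib.parse import urlunsplit

def build_url(scheme, site, path="/"):
    """Assemble scheme://site/path from its pieces (hypothetical helper)."""
    # urlunsplit takes (scheme, host, path, query, fragment);
    # the last two are left empty here.
    return urlunsplit((scheme, site, path, "", ""))

# The Simtel FTP directory and the WELL's Gopher server from the text.
print(build_url("ftp", "oak.oakland.edu", "/SimTel/win3/winsock/"))
print(build_url("gopher", "gopher.well.com"))
```

Only the first piece changes between the two services; the site name and the location on that site are written the same way in both.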
Interactivity is the ability to "talk back" to the Web server. More traditional media such as television aren't interactive at all; all you do is sit and watch as shows are played at you. Other than changing the channel, you don't have much control over what you see.
The Web is inherently interactive; the act of selecting a link and jumping to another Web page to go somewhere else on the Web is a form of interactivity. In addition to this simple interactivity, however, the Web also enables you to communicate with the publisher of the pages you're reading and with other readers of those pages.
For example, pages can be designed that contain interactive forms that readers can fill out. Forms can contain text-entry areas, radio buttons, or simple menus of items. When the form is "submitted," the information you typed is sent back to the server where the pages originated. Figure 1.7 shows an example of an online form for a rather ridiculous census (a form you'll create later on in this book).
As a publisher of information on the Web, you can use forms for many different purposes, for example:
- To get feedback about your pages.
- To get information from your readers (survey, voting, demographic, or any other kind of data). You then can collect statistics on that data, store it in a database, or do anything you want with it.
- To provide online order forms for products or services available on the Web.
- To create "guestbooks" and conferencing systems that enable your readers to post their own information on your pages. These kinds of systems enable your readers to communicate not only with you, but with other readers of your pages as well.
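To give a rough idea of what it means for form input to be "sent back to the server," here is a small Python sketch. It packs a few fields into a single string of name=value pairs (the URL encoding commonly used for form data) and then unpacks that string the way a program on the server might. The field names and values are invented for this example, and in practice your browser and server software handle these steps for you.

```python
from urllib.parse import urlencode, parse_qs

# Invented form input: a text field, a radio button, and a menu choice.
form_input = {
    "name": "Pat Smith",
    "color": "blue",
    "topic": "feedback",
}

# Browser side: join name=value pairs with "&" and escape spaces
# and other special characters.
encoded = urlencode(form_input)
print(encoded)

# Server side: decode the string back into named fields.
# parse_qs returns a list of values for each field name.
decoded = parse_qs(encoded)
print(decoded["name"][0])
```

Once the fields are decoded, the program on the server is free to tally them, store them in a database, or do anything else with them.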
In addition to forms, which provide some of the most popular forms of interactivity on the Web, advanced features of Web development provide even more interactivity. For example, capabilities such as Java and Shockwave enable you to include entire programs and games inside Web pages. Software can run on the Web to enable real-time chat sessions between your readers. And developments in 3D worlds enable you and your readers to browse the Web as if you were wandering through real three-dimensional rooms and meeting other people. As time goes on, the Web becomes less of a medium for people passively sitting and digesting information (and becoming "net potatoes") than it is a medium for reaching and communicating with other people all over the world.
A Web browser, as I mentioned earlier, is the program you use to view pages on and navigate the World Wide Web. Web browsers are sometimes referred to as Web clients or other fancy names ("Internet navigation tools"), but Web browser is the most common term.
A wide array of Web browsers is available for just about every platform you can imagine, including graphical-user-interface-based systems (Mac, Windows, X11), and text-only for dial-up UNIX connections. Most browsers are freeware or shareware (try before you buy) or have a lenient licensing policy (Netscape allows you to evaluate its browser for some time after which you are expected to buy it). Usually all you have to do to get a browser is download it from the Net.
If you get your Internet connection through a commercial online service such as America Online or CompuServe, you may have several browsers to choose from; try a couple and see what works best for you.
Currently the most popular browser for the World Wide Web is Netscape's Navigator, developed by Netscape Communications Corporation. Netscape has become so popular that using Netscape and using the Web have become synonymous to many people. However, despite the fact that Netscape has the lion's share of the market, it is not the only browser on the Web. This will become an important point later on when you learn how to design Web pages and learn about the different capabilities of different browsers. Assuming Netscape is the only browser in use on the Web, and designing your pages accordingly, will limit the audience you can reach with the information you want to present.
Any Web browser's job is twofold: given a pointer to a piece of information on the Net (a URL), it has to be able to access that information or operate in some way based on the contents of that pointer. For hypertext Web documents, this means that the browser must be able to communicate with the Web server using the HTTP protocol. Because the Web can also manage information contained on FTP and Gopher servers, in Usenet news postings, in e-mail, and so on, browsers can often communicate with those servers or protocols as well.
What the browser does most often, however, is deal with formatting and displaying Web documents. Each Web page is a file written in a language called HTML (HyperText Markup Language) that includes the text of the page, its structure, and links to other documents, images, or other media. (You'll learn all about HTML on Days 2 and 3, because you need to know it in order to write your own Web pages.) The browser takes the information it gets from the Web server and formats and displays it for your system. Different browsers may format and display the same file differently, depending on the capabilities of that system and the default layout options for the browser itself. You'll learn more about this tomorrow in Chapter 3, "Begin with the Basics."
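As a small illustration of the idea that a Web page is just a file containing text, structure, and links, the following Python sketch reads a tiny made-up page and pulls out its title and its links with the standard library's html.parser module. The page contents and the LinkCollector class are assumptions for this example; a real browser does a great deal more than this.

```python
from html.parser import HTMLParser

# A tiny made-up page: text, some structure, and two links.
PAGE = """
<html>
<head><title>My Page</title></head>
<body>
<h1>Hello</h1>
<p>Visit <a href="http://www.w3.org/">the W3 Consortium</a> or
<a href="http://www.apache.org/">Apache</a>.</p>
</body>
</html>
"""

class LinkCollector(HTMLParser):
    """Collect the title text and the target of every <a href=...> link."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title>...</title> is kept.
        if self._in_title:
            self.title += data

collector = LinkCollector()
collector.feed(PAGE)
print(collector.title)
print(collector.links)
```

A text-only browser and a graphical one would both start from the same tags; they differ in what they do with them once they've been read.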
Retrieving documents from the Web and formatting them for your system are the two tasks that make up the core of a browser's functionality. However, depending on the browser you use and the features it includes, you may also be able to play multimedia files, view and interact with Java applets, read your mail, or use other advanced features that a particular browser offers.
This section describes a few of the more popular browsers on the Web at the time this book is being written. These are in no way all the browsers available, and if the browser you're using isn't here, don't feel that you have to use one of these. Whatever browser you have is fine as long as it works for you.
The browsers in this section can be used only if you have a direct Internet connection or a dial-up SLIP or PPP Internet connection. Getting your machine connected to the Internet is beyond the scope of this book, but plenty of books are out there to help you do so.
If your connection to the Internet is through a commercial online service (AOL, CompuServe, or Prodigy), you may have a choice of several browsers including the ones in this section and browsers that your provider supplies.
Finally, if the only connection you have to the Internet is through a dial-up text-only UNIX (or other) account, you are limited to using text-only browsers such as Lynx. You will not be able to view documents in color or view graphics online (although you usually can download them to your system and view them there).
By far the most popular browser in use on the Web today is Netscape Navigator, from Netscape Communications Corporation. Netscape Navigator is most commonly just called Netscape. The Macintosh version of Netscape is shown in Figure 1.8.
Netscape is available for Windows, Macintosh, and for many different versions of UNIX running the X Window System. It is well supported and provides up-to-the-minute features including an integrated news and mail reader, support for Java applets, and the ability to handle "plug-ins" for more new and interesting features yet to be developed.
The current version of Netscape is 2.01, which is available for downloading at Netscape's site at http://www.netscape.com/, or in boxes from your favorite computer software store.
If you're a student, faculty, or staff of an educational institution, or if you work for a charitable nonprofit organization, you can download and use Netscape for free. Otherwise, you're expected to pay for Netscape after an evaluation period (typically 90 days). If you buy Netscape from a store, you've already paid the license fee.
Only a little more than a year ago, Mosaic had Netscape's place on the Web as the most popular browser. Indeed, Mosaic was the first of the full-color graphical browsers and is usually credited with making the Web as popular as it is today.
Mosaic is developed by NCSA at the University of Illinois, with several supported commercial versions available from companies such as Spry and Spyglass. NCSA Mosaic is free for personal use and comes in versions for Windows, Macintosh, and UNIX (the X Window System); each version is colloquially called WinMosaic, MacMosaic, and XMosaic, respectively. The current version of NCSA Mosaic is 2.01 on all platforms. You can find out more information and download a copy from http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/NCSAMosaicHome.html.
Figure 1.9 shows NCSA Mosaic for the Macintosh.
Lynx ("links," get it?), originally developed by the University of Kansas and now by Foteos Macrides at the Worcester Foundation for Biological Research, is an excellent browser for text-only Internet connections such as dial-up UNIX accounts. It requires VT100 terminal emulation, which most terminal emulation programs should support. You can use arrow keys to select links in Web pages.
Because Lynx runs on systems that lack the ability to display graphics, viewing Web pages using Lynx gives you nothing but the text and the links. Designing pages that work equally well in Lynx and in graphical browsers is one of the more interesting challenges of Web page design (as you'll learn later on in this book).
Lynx should be available on the system where you have a dial-up account, or you can download it from ftp://ftp2.cc.ukans.edu/pub/lynx. The current version is 2.4FM (FM are the initials of the author supporting it). Figure 1.10 shows Lynx running on a UNIX system (from a terminal emulator).
New on the scene but expected to make a significant impact in the coming months is Microsoft's new browser Internet Explorer, usually just called Explorer. Explorer runs on Windows 3.1, Windows 95, Windows NT, and Macintosh, and it is free for downloading from Microsoft's Web site (http://www.microsoft.com/ie/). No further license fee is required.
So far, Microsoft has been the only browser developer that has come close to keeping up with Netscape's pace of development, supporting many of Netscape's features and adding a few of its own. In addition, Microsoft has made significant deals with several commercial online services, which means its share of the browser market may be growing significantly in the future.
The current versions of Explorer are 2.0 for Windows 95, 1.5 for Windows NT, 1.6 beta for Windows 3.1, and 2.0 beta for Macintosh. An experimental, developers-only alpha release of version 3.0 is also available. For more information about all these versions, see the Explorer home page.
Figure 1.11 shows Explorer 2.0b1 running on a Macintosh.
To view and browse pages on the Web, all you need is a Web browser. To publish pages on the Web, most of the time you'll need a Web server.
A Web server is the program that runs on a Web site and is responsible for replying to Web browser requests for files. You need a Web server to publish documents on the Web.
When you use a browser to request a page on a Web site, that browser is making a Web connection to a server (using the HTTP protocol). The server accepts the connection, sends the contents of the files that were requested, and then closes the connection. The browser then formats the information it got from the server.
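The exchange just described can be sketched in a few lines of Python using the standard library's http.server and http.client modules. This is only a toy, under several assumptions: the server and the "browser" run in the same program on the local machine, the server answers every request with the same one-line page, and the names OnePageHandler, fetch, and PAGE are invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello, Web!</h1></body></html>"

class OnePageHandler(BaseHTTPRequestHandler):
    """Toy server side: answer every GET request with the same page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch(host, port, path):
    """Toy browser side: connect, request one file, read the reply."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.status, body

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), OnePageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, body = fetch("127.0.0.1", server.server_address[1], "/index.html")
print(status, body.decode())

server.shutdown()
server.server_close()
```

What fetch hands back is the raw contents of the file; turning that into something nice to look at is the browser's job, not the server's.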
On the server side, many different browsers may connect to the same server to get the same information. The Web server is responsible for handling all these requests.
Web servers do more than just deposit files. They are also responsible for managing form input and for linking forms and browsers with programs such as databases running on the server.
Just like with browsers, many different servers are available for many different platforms, each with many different features and ranging in cost from free to very expensive. For now, all you need to know is what the server is there for; you'll learn more about Web servers on Day 8, "Going Live on the Web."
As you learned earlier, a URL is a pointer to some bit of data on the Web, be it a Web document, a file on FTP or Gopher, a posting on Usenet, or an e-mail address. The URL provides a universal, consistent method for finding and accessing information, not necessarily for you, but mostly for your Web browser. (If URLs were for you, they would be in a format that would make them easier to remember.)
In addition to typing URLs directly into your browser to go to a particular page, you also use URLs when you create a hypertext link within a document to another document. So, any way you look at it, URLs are important to how you and your browser get around on the Web.
URLs contain information about how to get at the information (what protocol to use: FTP, Gopher, HTTP), the Internet host name to look on (www.ncsa.uiuc.edu, or ftp.apple.com, or netcom16.netcom.com, and so on), and the directory or other location on that site to find the file. There are also special URLs for things such as sending mail to people (called mailto URLs), and for using the Telnet program.
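As a quick illustration of those parts, this Python sketch takes a URL apart with the standard library's urllib.parse module. The url_parts function is a made-up helper for this example, the first URL is a shortened form of the NCSA address used earlier in this chapter, and the mailto address is invented.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Return (protocol, host, location) for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path

# Protocol, host name, and directory on that host.
print(url_parts("http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/"))

# A mailto URL has no host part; what follows the colon is the address.
print(url_parts("mailto:someone@example.com"))
```

For the mailto URL the host comes back empty (None in Python), which is one sign that the special URLs don't all follow the protocol, host, and location pattern.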
You'll learn all about URLs and what each part of them means in Chapter 4, "All About Links."
In order to publish on the Web, you have to understand the basic concepts that make up the parts of the Web. In this chapter you learned three things. First, you learned about a few of the more useful features of the Web for publishing information. Second, you learned about Web browsers and servers and how they interact to deliver Web pages. Third, you learned about what a URL is and why it's important to Web browsing and publishing.
Q: Who runs the Web? Who controls all these protocols? Who's in charge of all this?
A: No single entity "owns" or controls the World Wide Web. Given the enormous number of independent sites that supply information to the Web, it is impossible for any single organization to set rules or guidelines. There are two groups of organizations, however, that have a great influence over the look and feel and direction of the Web itself.
The first is the World Wide Web (W3) Consortium, based at MIT in the United States and INRIA in Europe. The W3 Consortium is an organization of individuals and organizations interested in supporting and defining the languages and protocols that make up the Web (HTTP, HTML, and so on). It also provides products (browsers, servers, and so on) that are freely available to anyone who wants to use them. The W3 Consortium is the closest anyone gets to setting the standards for and enforcing rules about the World Wide Web. You can visit the Consortium's home page at http://www.w3.org/.
The second group of organizations that influences the Web is the browser developers themselves, most notably Netscape Communications Corporation and Microsoft. The competition for most popular and technically advanced browser on the Web is fierce right now, with Netscape and Microsoft as the main combatants. Although both organizations claim to support and adhere to the guidelines proposed by the W3 Consortium, both also include their own new features in new versions of their software, features that often conflict with each other and with the work the W3 Consortium is doing.
Sometimes trying to keep track of all the new and rapidly changing developments feels like being in the middle of a war zone, with Netscape on one side, Microsoft on the other, and the W3 trying to mediate and prevent global thermonuclear war. As a Web designer, you're stuck in the middle, and you'll have to make choices about which side to support, if any, and how to deal with the rapid changes. But that's what the rest of this book is for!
Q: Why would anyone use a text-only browser such as Lynx when there are graphical browsers available?
A: You need a special Internet connection in order to use a graphical browser on the Web. If your machine isn't directly hooked up to the Internet (for example, on a network at work or school), you'll need to use a modem with a special account to make your system think it's on the Net or an account with a commercial online service. These special accounts can be quite expensive, even in areas where there are a lot of Internet service providers. Even then, unless you have a very fast modem, Web pages can take a long time to load, particularly if there are lots of graphics on the page.
Lynx is the ideal solution for people who either don't have a direct Internet connection or don't want to take the time to use the Web graphically. It's fast and it enables you to get hold of just about everything on the Web; indirectly, yes, but it's there.
Q: A lot of the magazine articles I've seen about the Web mention CERN, the European Particle Physics Lab, as having a significant role in Web development. You didn't mention them. Where do they stand in Web development?
A: The Web was invented at CERN by Tim Berners-Lee, as I'm sure you know by now from all those magazine articles. And, for several years, CERN was the center for much of the development that went on. In late 1995, however, CERN passed its part in World Wide Web development to INRIA (the Institut National de Recherche en Informatique et en Automatique), in France. INRIA today is the European leg of the W3 Consortium.