Beginning HTML

Orientation topics

HTML

HTML stands for HyperText Mark-up Language.  It grew out of work at CERN (beginning with an earlier system called ENQUIRE) on creating documents in a standard way.  The idea was to embed the format in the document as "mark-up" instructions (tags) that tell a software program how to lay out the document.  This made the interchange of research papers easier than before (footnote-1).  Documents, particularly research ones, often refer to other documents, and this was handled by links embedded in the HTML as hypertext.  When displayed by the appropriate software a link would be shown in blue and underlined, marking it as a link, and if clicked on would bring up the referenced document.

Tim Berners-Lee then saw that this could be used on the Internet, reworked it as HTML and developed a browser to render (lay out) HTML files.  There have been several versions of HTML; the long-established version is HTML 4, but it is being replaced by HTML5.  To cater for things wanted on the web it has been extended and plugins added.  HTML is now too complex for just documents and XML is now used for that.  There is a second flavour of HTML called XHTML which is for the pedantic.  It has stricter rules but is largely identical.  The purists are not happy with the <br> tag, for instance, because it doesn't have a closing </br> tag, so they write it as <br /> and this makes them happy despite the fact the <br /> tag is still unpaired.  My view is that if a computer can handle HTML then so can I and there is no need for XHTML.  "Chacun a son gooch".  Don't worry about what a <br> tag is or does; this is covered later.

Filenames

An HTML document or file is just that, a file.  As such it has a filename and the convention for filenames is {name}.{extension}.  In the case of HTML files the extension used is htm or html.  In Windows, case is insignificant in filenames: upper and lower case are not distinguished.  So there are four ways of writing the extension (htm, html, HTM and HTML) and a browser treats them all the same (footnote-2).  However the filename is only a convention, and having an extension of .htm does not make a file an HTML file.  It is only a guide to the operating system that, if no particular program is specified, this file should be treated as an HTML file and passed to a browser to handle.  What makes a file an HTML file is the HTML mark-up inside it.
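As a rough illustration of the extension convention, here is a minimal Python sketch.  The function name looks_like_html is made up for this example.  It only inspects the name, so, as noted above, it cannot tell whether the contents really are HTML.

```python
from pathlib import Path

# The two conventional extensions, stored in lower case.
HTML_EXTENSIONS = {".htm", ".html"}

def looks_like_html(filename):
    """Return True if the filename has an HTML extension, ignoring case."""
    return Path(filename).suffix.lower() in HTML_EXTENSIONS

print(looks_like_html("test.htm"))
print(looks_like_html("TEST.HTML"))
print(looks_like_html("notes.txt"))
```

Lower-casing the extension before comparing mirrors the way Windows ignores case in filenames.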

Text editors

An HTML file is human readable, since it only has text (displayable) characters in it.  It is also computer readable as it has mark-up instructions within the text indicated by what they call "angle brackets", i.e. < and >.  Since it is a text file you will need a text editor to create, read and modify it.  The standard text editor in Windows is Notepad, and this will do. However you can get specialised editors for HTML as in these links :-
   Best Free HTML Editor, Gizmo
   9 Best Free HTML Editors for Windows, Lifewire
   14 Best Free HTML Editors, Capterra
   Notetab light (the free version doesn't do syntax colour coding). 

You will see the term WYSIWYG (pronounced wizzywig), What You See Is What You Get.  What this means is that if your editor has this feature you see the rendered (formatted) version (as it would be in a web browser) on the screen, but the file created is an HTML file.  Some editors only have this one view and though it may be attractive for the novice, it is not so good for those trying to learn.  You need to see the HTML code to understand what is going on.  Decent editors have both views, switchable from a menu.  Some top end editors do not have WYSIWYG built in, but provide a "view in browser" menu option instead.  If your editor does not have any of these features (e.g. if you are using Notepad), you can still see what your page looks like by opening the file in a browser, as described under "Loading a file into a browser" below.

White space

This is complicated to explain but is simple once grasped.  Basically it means that the layout on the page (in the file) does not form part of the language.  This is true of HTML, C, PHP and many others (Python, where indentation is significant, is a notable exception).  Specifically, the space character, the tab character, the newline character and the carriage return character are not treated as layout by the HTML parser (browser): any run of them is collapsed into a single space when the page is displayed.  An HTML file could be a continuous string of characters in one line with no more than a single space between words.  This would be difficult to read though.  Since the browser does not care about spacing we may as well use it to try and improve readability.  It is good practice to use indentation and blank lines to help show the logical structure of your HTML code.
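The effect can be imitated in a few lines of Python.  This is only a rough approximation of what a browser does (it takes no account of special cases such as the <pre> tag), and the function name collapse_whitespace is made up for the example.

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs and line breaks with one space."""
    # str.split() with no argument splits on any run of white space
    # and discards leading and trailing white space.
    return " ".join(text.split())

tidy_source = "<p>\n    Hello,\t\tworld.\r\n</p>"
print(collapse_whitespace(tidy_source))
```

The indented, multi-line source and the single-line result would look the same when rendered.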

Loading a file into a browser

If you have been using a WYSIWYG editor you can see your page taking shape as you code it, but this is only while you are developing it.  The end objective is for the page/file to be viewed in a browser.  There are several browsers available.  Windows comes with Internet Explorer already installed but you can use Firefox, Opera, Chrome, Safari and many others; it is a matter of personal choice.

There are 4 ways to get a file into a browser.  The first is only available if you have an editor that supports it and it is "View in Web Browser".  The second is the method used by the first behind the scenes and is to get the operating system to pass your file to the browser for you.  You do this by double clicking on the file name in Windows Explorer, or you could use the right click menu and "Open with".  The third and fourth ways are to load a browser and then get it to load your file.  You can do this in two ways: use the menu option File/Open and browse to the file using the open dialogue, or type the path name of the file in the URL address bar of the browser.  The presenter shows this latter method in the HTML tutorial linked below.
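For the curious, the second and fourth methods can be sketched in Python.  The function names file_url and open_in_browser and the filename page.htm are made up for this example.  webbrowser.open asks the operating system to hand the address to a browser, though which browser it picks depends on how the computer is set up.

```python
import webbrowser
from pathlib import Path

def file_url(path):
    """Turn a file path into the file:// address a browser understands."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()

def open_in_browser(path):
    """Ask the operating system to show the file in a web browser."""
    return webbrowser.open(file_url(path))

# Example use (not run here because it would launch a browser):
#     open_in_browser("page.htm")
print(file_url("page.htm"))
```

The address printed is the kind of thing you could type into the address bar yourself.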

File Associations

What happens when you double-click on a file depends on the file association it has.  File associations are set up for you when your operating system is first installed.  However they can get changed when you install new software.  Some software does nothing, some asks you if you want anything changed, and other software just changes things to what it thinks best without asking.  It depends on how good the author of the software was.  If your file association is wrong, i.e. Windows loads a different program to the one you want when you double click, then you can change it as follows.

  • Open Windows Explorer and browse to the file if you're not already there.
  • Right click on the file.
  • From the bottom of the drop down list select Properties.
  • In the second section click on the Change button next to Opens with:
  • In the pop up dialogue select the program you want to use and click OK.

Domains

When you start writing your first HTML page it is just a file in a directory on a computer.  By making several of these that link together we have the start of a web site.  However, on your computer it is only available to you (generally speaking), so we need to put the files on a host that can be accessed from anywhere.  We also need a means to find them easily.  That is where domains come in.  Each node of the Internet has a unique address called an IP address, but these are hard-to-remember numbers.  So for humans an alphabetic name is used.  When the Internet started the name was structured in four parts separated by full stops.

http://www.{name}.{type}.{country}

The bits in brackets were as follows.  Name was user chosen and could be anything provided it was unique among those that had the same type and country.  Type was one of co, org, gov, edu and maybe some others.  Country was a two letter code for the country where the web site is, not necessarily where the company is situated, e.g. uk=United Kingdom, de=Germany.  The USA is a special case: it uses com instead of co and normally has no country code.  There have been changes since, see below.

To convert between the domain name that humans can understand and use and the numeric IP address that is used on the Internet, the Domain Name System (DNS) is used, by way of a DNS server.  The DNS service is normally provided by the Internet Service Provider of the person making the connection.  The server could be anywhere, but it has to be specified and used at the connecting end, not the hosting end, because the address of the host is not known until the lookup has been done.  It is possible to use an IP address instead of a domain name, but this is not normally done.
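A minimal Python sketch of the lookup is shown below.  The function names lookup and as_number are made up for the example.  socket.gethostbyname asks whatever DNS service the computer is configured to use, so the answer for a real domain name depends on when and where you run it.  The demonstration below therefore passes in an IP address, which comes straight back without any lookup.

```python
import ipaddress
import socket

def lookup(name):
    """Return the IPv4 address (as text) for a domain name."""
    return socket.gethostbyname(name)

def as_number(ip_text):
    """Show the same IPv4 address as the single number it really is."""
    return int(ipaddress.IPv4Address(ip_text))

# An IP address given instead of a name is returned unchanged.
print(lookup("127.0.0.1"))
print(as_number("127.0.0.1"))
```

Calling lookup with a domain name instead would perform a real DNS query, which needs a working network connection.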

There have been changes to this system over the years.  Since the www (which stands for world wide web) is in every domain name it is superfluous (except for those with Intranets).  Some web hosting companies allow www.domain-name and domain-name to be the same, though this may be chargeable.  There has been a huge increase in the number of types, with new ones like club, school, london, etc., being added.  The names are allocated on a first come first served basis and are controlled by registrars; there are several, dealing with different types and countries.  If an individual pinched the domain name of a large company by registering it first, they might be able to sell it to the company when it decided to start an Internet presence and found its name gone.  This used to be a popular scam in the early days but registrars started demanding proof, such as letterhead paper, company registrations, etc., before granting a domain name.  Because of the administration involved registrars charge a yearly fee to register a domain.  If you stop paying you lose the right to the domain.

Anchor tag

Hypertext links are handled by the anchor tag.  There are two types; there may be more, but I only use the two.  The first is to specify where you want to go, i.e. a hypertext link, and has the form <a href="{link}">text description</a>.  The second is to specify where you want to land (i.e. which part of a document) and has the form <a name="{label}"></a>.  (In HTML5 the same job is done by putting an id attribute on any tag.)
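To make the two types concrete, here is a small Python sketch that reads a scrap of HTML and lists the links (href) and the landing places (name) it finds.  The sample HTML, the label names and the class name AnchorCollector are all made up for the example.  HTMLParser comes from Python's standard library.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href and name attributes of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.links = []   # where you can go
        self.labels = []  # where you can land

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            self.links.append(attributes["href"])
        if "name" in attributes:
            self.labels.append(attributes["name"])

SAMPLE = (
    '<a name="top"></a>'
    '<p>See the <a href="#footnotes">footnotes</a> '
    'or <a href="HTM2.htm">the next page</a>.</p>'
    '<a name="footnotes"></a>'
)

collector = AnchorCollector()
collector.feed(SAMPLE)
print(collector.links)
print(collector.labels)
```

The first list holds the places the page links to and the second holds the labels that other links can aim at.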

The hypertext link has the general form {domain}{/path}{#label}.  It may point to somewhere within the current page in which case it just has the form {#label}, somewhere within another page on the same web site in which case it has the form {/path}{#label} or somewhere on another web site in which case it has the general form beginning with the domain name of the web site.  If the {#label} part is omitted the default of the top of the page is used.
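The three parts can be picked out with Python's standard urlsplit function, as in this small sketch.  The function name link_parts and the labels #footnotes and #top are made up for the example.

```python
from urllib.parse import urlsplit

def link_parts(link):
    """Split a link into its (domain, path, label) parts."""
    parts = urlsplit(link)
    return parts.netloc, parts.path, parts.fragment

# Same page, same site and another site respectively.
print(link_parts("#footnotes"))
print(link_parts("/HTM2.htm#top"))
print(link_parts("http://sturnidae.com/HTM1.php"))
```

A part that is missing from the link comes back as an empty string, which corresponds to the defaults described above.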

If you go to another page on your own web site then you could include your domain name, which is known as a full or absolute reference.  For example the address of this page is "http://sturnidae.com/HTM1.php".  However, for reasons that I will explain in a minute, it is better to use a local (relative) reference.  That is, a reference relative to the current directory.  So another file in the same directory might be HTM2.htm, and a file in a sub directory might be images/picture.jpg.  The reason for using local references is two-fold.  If you change the directory structure of your site you will have to modify all the absolute links.  You would still need to check all the local links but would probably only need to modify some, maybe none.  The second reason is that if you are writing hand coded HTML you would probably have a test copy of the web site on your local PC.  You would want links clicked on your test version to remain on your local test version and not change over to the web hosted production site.  An exception is often the link to the home page of a web site, as an absolute link there ensures you get rid of any undesirable embedding if someone comes to your site on a referred link and then decides to explore your site further.
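The second reason can be seen by working out where a link leads from two different starting points.  This sketch uses Python's standard urljoin function, which combines a page address and a link in the way the URL rules lay down.  The local folder C:/site is made up for the example.

```python
from urllib.parse import urljoin

LIVE_PAGE = "http://sturnidae.com/HTM1.php"
TEST_PAGE = "file:///C:/site/HTM1.htm"  # hypothetical local test copy

# A local reference follows whichever copy of the site you are viewing.
print(urljoin(LIVE_PAGE, "HTM2.htm"))
print(urljoin(TEST_PAGE, "HTM2.htm"))
print(urljoin(TEST_PAGE, "images/picture.jpg"))

# An absolute reference always jumps to the hosted site.
print(urljoin(TEST_PAGE, "http://sturnidae.com/HTM2.htm"))
```

The same local link gives a hosted address from the hosted page and a local address from the test copy, whereas the absolute link leaves the test copy.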

One of the pains with external links is that if the target site is reorganised or disappears, the link no longer works and gives a "Page not found" (404) error.  If the web site is written properly it should handle this for you using redirection, but most web sites do not go to this trouble.  For this reason it is necessary to periodically check that all your links are still working.  You can get link checker programs to do this, but if you register your site with Google they will do it for you, and there are other benefits.

Web Server

On your own computer, to display an HTML page you simply load it into a browser.  Loading a page from the web is the same: you load the remote HTML file into your local browser.  Just a little more effort is needed to get it from there.  For a start there is the DNS process of converting the alphabetic human readable address into the numeric IP address used on the Internet (32 bits long in the original IPv4 scheme).  Leaving that aside, there is still the security risk of allowing an external computer to have access to the web site's computer with Windows Explorer, as it could delete files, rename them, over-write them, or cause all sorts of mayhem.  It would be open season for hackers.  To get round this security problem the web was designed so that your computer asks the computer hosting the HTML file for the file it wants and is given it in return.  This is handled by a web server, often a program called Apache, and the communication between the web server and your computer (the client) is handled under the HTTP or HTTPS protocol, where the S (standing for secure) means that encryption is used.  This protocol contains the rules that are intended to ensure everything works as it should.
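The ask-and-be-given exchange can be tried out entirely on one computer.  The sketch below uses the small web server built into Python (standing in for Apache) and a client that asks it for a page over HTTP.  The page contents, the path /index.htm and the names OnePageHandler and fetch_once are made up for the example, and it is a toy, not a pattern for a real web site.

```python
import http.client
import http.server
import threading

PAGE = b"<html><body><p>Hello from the server</p></body></html>"

class OnePageHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch_once():
    """Start the server, request one page as a client, then stop the server."""
    # Port 0 lets the operating system pick any free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), OnePageHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        client.request("GET", "/index.htm")
        response = client.getresponse()
        result = (response.status, response.read())
        client.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

status, body = fetch_once()
print(status)
print(body.decode("utf-8"))
```

The client never touches the server's files directly.  It can only ask, and the server decides what to send back.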

The server can serve HTML (with or without Javascript) but can also serve PHP or ASP pages.  These are script files which are not passed on to the user but interpreted on the server, and the generated output is given to the user.  PHP stands for PHP Hypertext Preprocessor, which is a recursive acronym.  ASP (Active Server Pages) is Microsoft's equivalent.  The use of server scripts allows for dynamic pages rather than the static pages that result from plain HTML.  Javascript also allows for dynamic pages with the logic being performed at the client end, whereas server scripts apply the logic at the server end.  The use of server side scripting and a database allows the development of Content Management Systems (CMS).
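The idea of a server script can be sketched in Python, standing in here for PHP or ASP.  The function name render_page and the wording of the page are made up for the example.  The point is that the user only ever receives the generated HTML, never the script itself.

```python
import html
from datetime import datetime

def render_page(visitor, now):
    """Build an HTML page whose contents depend on who asks and when."""
    # Escape the name so that any < or > in it cannot act as mark-up.
    safe_name = html.escape(visitor)
    stamp = now.strftime("%Y-%m-%d %H:%M")
    return f"<html><body><p>Hello {safe_name}, it is {stamp}.</p></body></html>"

print(render_page("Jo", datetime(2019, 5, 6, 14, 30)))
```

A real server script would take the current time and the visitor's details from the request rather than having them passed in by hand.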

It is possible to run a web server on your local computer.  You would do this if you wanted to host your web site on your own computer, but there is a lot involved in this.  It is not a trivial task.  A more likely reason is that you are developing in PHP or ASP and so you need a server for local testing.  There is a software bundle available to do this under Windows called WAMP, with Linux and Mac versions called LAMP and MAMP respectively.  If you are just writing HTML and not using PHP or ASP then you do not need a local server.

Alternatives to hand coded HTML

You can develop a document in Word or Excel and save it in HTML format.  This would work providing you do not need to modify the produced HTML.  Since Excel or Word need to recreate the format if the file is reloaded, they store the formatting instructions in the HTML using their own meta-language, which is easier for them to parse and, not being recognised by browsers, is ignored by them during page layout.  This makes the HTML pages produced very verbose and hard to read.  This method is not recommended for a complex site, or one which will have much maintenance, but it is an easy method to generate a simple site.

The next level of sophistication is to use a Web builder package.  These are often supplied by your web hosting provider if you have one, or there are websites that will supply the service and even host your web site.  These are usually based on a WYSIWYG editor, so you don't need to learn HTML unless you want to tweak the provided layout.  If you use Web based hosting and creation you should research them using search engines and a comparison site.  Although I have no experience of most, one that I have used is Google which has the advantage of being a large company, so unlikely to disappear overnight, and they also provide tutorials and support for SEO (Search Engine Optimisation).

The third level of sophistication is hand coded HTML (with or without server side scripting).  This is what we are looking at here so it will not be considered under alternatives other than to say it can be the most flexible method but also the most complex.

The last level of sophistication is the Content Management Systems.  They are the most powerful and are capable of delivering really large corporate systems developed by large collaborative teams.  Although in theory anything that can be achieved with a CMS could also be achieved with hand coded HTML, in practice the development cost could make it uneconomic.  There are several well known Content Management Systems among which are :-

CMS packages usually come as software install scripts that are supplied by your web hosting supplier and as such your choice is limited to those packages that they support.  However they also come as software bundles that you can install yourself, though you may need to liaise with your hosting company over compatibility issues.  You can also get a web based hosting and CMS package combined.  Some of these are free, but then you are usually limited in scope (sprat to catch a mackerel).

Tutorial videos

The headings below are hyperlinks to videos.  There may be better ones, but these were the first ones I found that checked out as OK.  There is a tutorial at W3Schools but that is better suited for reference, or for those with prior programming experience.  It is not HTML for dummies.  W3Schools tends to steer you to HTML 5.  For the experienced HTML programmer, 5 has definite advantages over 4, particularly for graphics and multimedia, but in my view 4 is easier to start with.  Furthermore HTML5 is still being developed whereas HTML 4 is stable.

HTML tutorial

This is a set of four videos, each about 9 minutes, that run sequentially.  The first video is introductory and you should make sure you understand this before moving on to video 2.  You should save the second tutorial to start as a new session, i.e. start fresh, as this covers more meat and starts to get detailed.  The web site he introduces in his second video looks to me to be better than W3Schools, certainly for beginners.  Only do the 4 HTML videos.  Do not look at CMS yet.  Good practice uses both HTML and CSS together; it is possible to use only HTML, although this is not recommended.  However it is important to know how formatting is done in HTML, particularly if you want to be able to read and amend someone else's code.

4 programming concepts

This is not an ideal video, as the presenter, although he has a good command of English, has a distinct accent, which makes for a bit more effort in understanding what he says.  However what he covers in the video is an essential starting point for all programmers.  This is a context only topic for HTML, as the HTML language only uses the first of the four concepts.  The others are important later if you get on to PHP or Javascript.  You don't need to master this video but it is worth watching once for orientation.  The last concept he calls function, but it is also known as subroutine, macro and other names.  The concept of writing code once and then reusing it is very important.  We will come back to this with CSS.

Explanation of binary

This was written after PCs were established and is not quite accurate historically.  For instance binary was used in computers with vacuum tubes (valves) before transistors were invented.  Memory now comes in special integrated circuits, not discrete transistors, but these contain the equivalent of transistors.  My first computer, the LEO, had mercury delay lines for its memory: a tube of mercury with sound pulses travelling down it.  When they got to the end they were brought back to the start on a wire and sent down again.  One tube could hold 40 bits of data.  Later memory was fabricated from tiny ferrite cores.  Only much later did the transistor memory arrive.  However what he says is OK for an introduction; the principle is right.  Just think of a bit as anything which can have two states, ON or OFF.  The first Science Foundation course with the OU (S101) had as part of its home experiment kit a BOBCAT, which stood for Ball Operated Binary Calculator and Tutor.  I never saw one but I think it used marbles to operate it.  There is a description here.

He covers a lot of ground.  I know the subject material, but I still found it made my head spin, partly because he goes too fast and doesn't allow thinking/assimilation time.  You may need to watch it more than once.  He has a series of videos; I have only looked at the first one, but you may find some of the others interesting.  As well as binary there is hexadecimal, which I will briefly describe.  This uses a radix of 16 (the name combines the Greek hex, six, with the Latin decem, ten) and uses the digits 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F.  It is useful because each hexadecimal digit represents 4 bits and there are 8 bits in a byte, hence two hexadecimal digits can represent all the possible values of a byte, 0 to 255 (256 values, i.e. 16 x 16).  You do not need to learn hexadecimal, though over time this will happen as you become familiar with using it.  I can read text expressed in hexadecimal (ASCII characters), but I have been in computing a long while.  If you do need to convert hexadecimal then there is a web site that will do it for you.
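If you have Python to hand, the conversions can also be done in a line or two each.  This is a minimal sketch and the function names are made up for the example.

```python
def byte_to_hex(value):
    """Write a byte value (0 to 255) as two hexadecimal digits."""
    return format(value, "02X")

def hex_to_byte(digits):
    """Read hexadecimal digits back as an ordinary number."""
    return int(digits, 16)

def hex_to_text(digits):
    """Read pairs of hexadecimal digits as ASCII characters."""
    return bytes.fromhex(digits).decode("ascii")

print(byte_to_hex(255))         # FF
print(hex_to_byte("FF"))        # 255
print(hex_to_text("48544D4C"))  # HTML
```

Each pair of digits in the last example is one byte: 48 is H, 54 is T, 4D is M and 4C is L.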

I also think a bit is stored in a pair of transistors rather than one, an arrangement known as a bi-stable.  However, this is nit picking and not material to the point.  The essence is that a bit is ON or OFF, representing 1 or 0.

Footnotes

Footnote-1:  Many different word processors were in use, each embedding format information, but in their own proprietary way.  They still continued but later versions had the capability to save a document in the HTML format which solved the compatibility problem on interchange of documents. (back).

Footnote-2:  If mixed case is allowed there are more than 4 possibilities.  But if we restrict ourselves to only all upper case or all lower case, then there are four.  Note that test.htm and test.html are different files, but they are both handled by a browser.  However, Test.HTM and TEST.htm, or indeed any mixture of upper and lower case, are the same file: Windows will not distinguish them, though Unix will.  (back).

JG - 17 April 2019 - revised 6 May 2019 p.m.
