ISG Tutorial

Let us demystify GIS - A New Series

Geography is the science of spatial relationships. Maps form a major constituent of geography, as they represent very large spatial relationships at a physically manageable size. Cartography, the science of map making, is a technology for organizing geographical information into a map. However, a map by itself is of no value unless it is put to use. The traditional uses are many: navigation, delineation of property boundaries and ownership, and civil engineering are a few of the commonly known applications. This range of applications is, however, changing rapidly. The need for environmentally benign and socially accountable development has placed heavy demands on the capabilities of planners. Planning and execution therefore now require more accurate, reliable and timely information, and better tools for managing such information.

Consider, for example, the siting of a steel plant. In classical engineering terms, the location of the Tata Steel plant at Jamshedpur is often quoted to engineering students as the best positioning from the point of view of raw material availability. Today, however, there are many more factors to consider. Environmental impact assessment is a major issue: how much forested area will be lost, nearness to habitations, the effect on wildlife and the impact on the local population are some of the myriad issues to be addressed. This requires not only a variety of maps but also a large amount of aspatial information, commonly known as statistics, together with the tools to handle these complex data sets and to selectively extract relevant information. It is easy to see that an enormous amount of effort goes into any spatial management activity. These activities can be classified as follows:

[Fig. 1]

Measurement

This represents the acquisition of data. Survey of natural resources, inventory of land holdings, measurement of pollutants, record of traffic flow are some examples.

Mapping

The acquired data has to be processed to yield intelligible information. For spatial data this involves the representation of the measurements in their spatial context.

Monitoring

This represents the preliminary analysis which involves the time dimension. Tracking land use changes, forest working plans, land records, municipal plans, utility maintenance are some examples.

Modeling

This involves efforts to represent a spatially related real system for the purpose of analyzing and understanding reality, making predictions and decisions about the system and, finally, controlling the system. Models help rational decision making, strategic analysis and decision support. Modeling helps reduce the cost of implementation.

These activities can be organized very efficiently using a tool such as a GIS, which integrates hardware, data entry systems, database management, analytic software and cartography in one package. Today, remote sensing provides another powerful tool for data acquisition, and digital thematic maps prepared from remotely sensed data are integrated elegantly with digital cartographic data in a GIS. The GIS should have the following functional capabilities:

Any information system has four major components: an input module, which accepts data; a database module, which organizes and stores the data; an analysis module, which selectively retrieves and manipulates the data; and an output module, which presents the analyzed information. A geographical information system differs in that it handles both spatial and aspatial data; consequently, the corresponding modules are more complex. A GIS can be manual or automated. Manual methods are traditional and involve overlaying and retracing. These techniques are not adequate when handling large data sets, when repeated analysis is required or when several alternative scenarios have to be generated. In these notes only the automated GIS will be discussed.
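The four modules can be pictured as stages of a pipeline. The Python sketch below is purely illustrative and is not taken from any GIS package: the record layout ('name,x,y,landuse'), the sample wells and the rectangular search window are all hypothetical. Note that each record carries both spatial data (x, y) and aspatial data (land use), and that the analysis step applies one test of each kind.

```python
def input_module(raw_lines):
    """Input: accept raw text records of the form 'name,x,y,landuse' (hypothetical layout)."""
    records = []
    for line in raw_lines:
        name, x, y, landuse = line.strip().split(",")
        records.append({"name": name, "x": float(x), "y": float(y), "landuse": landuse})
    return records


def database_module(records):
    """Database: organise and store the records, here indexed by land-use class."""
    index = {}
    for rec in records:
        index.setdefault(rec["landuse"], []).append(rec)
    return index


def analysis_module(index, landuse, xmin, ymin, xmax, ymax):
    """Analysis: retrieve records of one class (aspatial test) inside a window (spatial test)."""
    return [rec["name"] for rec in index.get(landuse, [])
            if xmin <= rec["x"] <= xmax and ymin <= rec["y"] <= ymax]


def output_module(landuse, names):
    """Output: present the analysed information as a short report line."""
    return f"{landuse} sites in window: {', '.join(names) if names else 'none'}"


raw = ["well-1,72.5,23.0,agriculture",
       "well-2,72.6,23.1,forest",
       "well-3,72.7,23.2,agriculture"]
db = database_module(input_module(raw))
print(output_module("agriculture", analysis_module(db, "agriculture", 72.0, 22.9, 72.65, 23.15)))
# agriculture sites in window: well-1
```

Keeping the modules as separate functions mirrors the division of labour described above: any one of them (say, a different storage scheme) can be replaced without touching the others.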

Geographic Information Systems are classified in several ways. They can be grouped by scope and objectives, type of organization, administrative hierarchy, discipline involved, data structure, programming language, type of users and inherent methodology used by the system. There can be overlaps in these classification systems. Further, in the process of evolution a GIS can move from one class to another. Many people have attempted to arrive at a broad functional definition, which encompasses all these systems. In general, a GIS can be considered to be a computer-assisted system for the capture, storage, retrieval, analysis and display of spatial data.

Alternatively, we could describe Geographic Information System (GIS) as a system which provides a computerised mechanism for integrating various geo-referenced data sets and analysing them in order to generate information relevant to planning needs in a given context.

GISs go under a plethora of names: Geo-based Information Systems, Natural Resource Information Systems, Geo-data Systems, Spatial Information Systems, etc. Since geography is the binding element in all these systems, it is suggested that they all fall under the generic term Geographical Information Systems, or GIS for short.

Published contributions to GIS research literature include works in computer science, information science, pattern recognition, image processing, remote sensing, computer graphics, photogrammetry, geodesy, cartography and geography.

Geographic Information System (GIS)

Provides a computerised mechanism (S/W and H/W) for integrating and managing various geo-referenced data sets and analysing them together in order to generate information for planning needs in a given context. Like any other information system, a GIS is primarily an input-output system and includes DBMS functions. However, it differs from conventional DBMSs and information systems in that every data element in a GIS has to be associated, directly or indirectly, with a location on the earth's surface, expressed as co-ordinates with respect to a pre-defined co-ordinate system, either as x-y data or as row-column data with an explicit co-ordinate reference. Co-ordinates can be expressed indirectly: for example, demographic data may be associated with a village, and the village in turn expressed as a polygon (a series of x-y co-ordinates) or as a point co-ordinate. In general, a GIS facilitates the handling of a variety of data, in both spatial and non-spatial form, on natural resources, environment, infrastructure and socio-economic aspects. Typical spatial data could be on general resources like topography, landuse, forests, soils, water, watersheds, geology, geomorphology and climate; administrative boundaries like state, district, taluka, village and forest ranges/compartments; infrastructure like road networks, powerlines, waterlines and sewage; locational data in the form of latitude/longitude co-ordinates (not just a postal address); and data derived from models of groundwater flow, soil productivity, timber growth, water erosion, etc. Non-spatial data could be the descriptive attributes associated with spatial features, like soil type, landuse type or village name, or the socio-economic data associated with administrative units/locations, like demography, occupation, livestock, crops, labour force and land ownership.
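The indirect association of demographic data with location can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of any GIS package: the village names, boundary co-ordinates and population figures are hypothetical, the co-ordinates are assumed to be planar x-y values in kilometres, and the polygon area is computed with the shoelace formula. The demographic table holds no co-ordinates at all; it acquires a location only through the join on the village name.

```python
from dataclasses import dataclass


@dataclass
class Village:
    name: str
    boundary: list[tuple[float, float]]  # polygon vertices (x, y) in order, assumed in km


def polygon_area(vertices):
    """Planar polygon area by the shoelace formula (vertices in order, not repeated at the end)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Spatial data: village polygons (hypothetical co-ordinates).
villages = [
    Village("Village A", [(0, 0), (2, 0), (2, 1), (0, 1)]),
    Village("Village B", [(2, 0), (5, 0), (5, 2), (2, 2)]),
]

# Non-spatial data: a census-style table keyed only by village name (hypothetical figures).
population = {"Village A": 1200, "Village B": 3000}


def population_density(villages, population):
    """Join the two tables on the name attribute, then use the polygon geometry."""
    return {v.name: population[v.name] / polygon_area(v.boundary) for v in villages}


print(population_density(villages, population))
# {'Village A': 600.0, 'Village B': 500.0}
```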

Certain terms are used in a GIS which need to be explained before the components of a GIS are described.

Data

A set of observations about any phenomenon or object(s). Data does not convey anything unless it is analyzed and converted into meaningful information.

Information

Conveys a meaning and prompts an action or decision. Information is derived from data by applying some analysis logic.

Information Systems Approach

Emphasizes systematic methods of data collection; purposeful, structured, non-redundant data storage; easy and user-friendly information retrieval; and flexible manipulation/analysis of the underlying data elements to generate and present meaningful information (not just data) that satisfies multiple planning and decision-making needs.

Knowledge

Is inherent to the human thinking process. Knowledge is not conveyed; rather, it is imparted.

Database

Is the structured and purposeful storage of data sets.

Database Management System

Provides ways and means of structured, systematic, sharable, non-redundant storage of data; storage transparency, integrity, flexible query and multiple user views of the underlying database. It forms an integral part of any Information system.

Information System

Incorporates, apart from database management functions, a set of analysis modules, which selectively retrieve and manipulate the data, and an output module, which presents the analyzed information in the context of given planning needs. Input to an information system is data, which is raw and does not convey any meaning unless analyzed and converted into meaningful information. Output from an information system is information, which conveys a meaning and prompts a set of actions or decisions. We do not deal with knowledge directly in an information system.

Data Planes

Discrete sets of data. For example, an image, a thematic map, a topographic sheet or a page of survey data each constitutes a data plane.

Themes

The different types of information contained in a map. For example, a toposheet contains contours, roads, railways, forest boundaries, etc. Each of these constitutes a theme.

Registration

Themes in a given data plane are spatially related to each other. We say that the data is registered. Data in different planes may not be so readily related. Scales may vary, there may be translational and rotational errors. The process of correction and development of an invariant spatial relationship between different data planes is called registration.
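One common way to carry out registration when two data planes differ in scale, rotation and translation is to fit a two-dimensional similarity (Helmert) transformation to control points, that is, features identified in both planes. The sketch below assumes that model; many GIS packages also offer affine or higher-order polynomial transformations, which this does not show. Points are treated as complex numbers so that a single complex factor `a` holds the scale and rotation and `b` the translation; with more than two control points the fit is least squares. The control points are hypothetical, and at least two distinct points are needed.

```python
import cmath
import math


def fit_similarity(source, target):
    """Least-squares fit of target ~ a * source + b for 2-D control points.

    Points are handled as complex numbers x + iy, so abs(a) is the scale
    factor, the phase of a is the rotation, and b is the translation.
    """
    zs = [complex(x, y) for x, y in source]
    ws = [complex(x, y) for x, y in target]
    n = len(zs)
    z_mean = sum(zs) / n
    w_mean = sum(ws) / n
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)  # zero if all source points coincide
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def register(point, a, b):
    """Map a point from the source data plane into the target co-ordinate system."""
    w = a * complex(*point) + b
    return (w.real, w.imag)


# Hypothetical control points: the same three features located in both data planes.
digitized = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
reference = [(100.0, 200.0), (100.0, 220.0), (80.0, 200.0)]

a, b = fit_similarity(digitized, reference)
print(f"scale={abs(a):.3f}, rotation={math.degrees(cmath.phase(a)):.1f} deg")
# scale=2.000, rotation=90.0 deg
```

Once `a` and `b` are known, `register` can be applied to every point of the source plane, which gives the "invariant spatial relationship" between the two planes.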

Spatial Database

A spatially registered set of data constitutes a spatial database. In addition, each spatial object has an associated attribute, which could be a name, a number, a range of values, etc. For example, a contour has a number and a road has a name. Such attributes also form part of the database. Further, there may be other data sets associated with a spatial object; for example, a village has associated demographic data.

Spatial Objects

Spatial objects can be represented by points, lines and polygons: a city is a point, a road is a line and a forest area is a polygon. The manner in which these fundamental units are represented is defined by the spatial data model. For example, we can define a chain as a set of line segments; a closed chain forms a polygon, an open chain is a line and a line segment of zero length is a point.
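The chain-based model in the example can be sketched as a small data structure. This is one possible reading, written in Python for illustration; the `Chain` class and the sample co-ordinates are hypothetical. A chain whose vertices all coincide has zero length and is treated as a point, a chain whose ends meet is a polygon, and any other chain is a line.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Chain:
    """An ordered list of vertices joined by straight line segments."""
    vertices: list[Point]

    def kind(self) -> str:
        first = self.vertices[0]
        # Every vertex coincides with the first: zero length, so a point.
        if all(v == first for v in self.vertices):
            return "point"
        # Closed chain (ends meet, with at least four vertices): a polygon.
        if len(self.vertices) >= 4 and self.vertices[-1] == first:
            return "polygon"
        # Otherwise the chain is open: a line.
        return "line"


city = Chain([(72.58, 23.03)])
road = Chain([(0, 0), (3, 4), (6, 4)])
forest = Chain([(0, 0), (4, 0), (4, 3), (0, 0)])
print(city.kind(), road.kind(), forest.kind())  # point line polygon
```

Using a single structure for all three object types is what lets a data model treat points, lines and polygons uniformly during storage and retrieval.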

Scale

The relationship between distances on the ground and distances on a map. Scales always apply to linear measures, never to areas or elevations.
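A short worked example shows why a scale is quoted for linear measures only. Assuming a map at 1:50,000 (an example value), 1 cm on the map is 50,000 cm, or 500 m, on the ground. An area on the map, however, converts by the square of that factor, so 1 cm² on the map is (500 m)² = 250,000 m², or 0.25 km². The helper functions below are hypothetical names for these two conversions.

```python
def ground_distance_m(map_distance_cm: float, scale_denominator: float) -> float:
    """Ground distance in metres for a map distance in cm on a 1:scale_denominator map."""
    return map_distance_cm * scale_denominator / 100.0  # 100 cm per metre


def ground_area_km2(map_area_cm2: float, scale_denominator: float) -> float:
    """Areas scale by the SQUARE of the linear factor, so no separate 'area scale' is quoted."""
    metres_per_cm = scale_denominator / 100.0
    return map_area_cm2 * metres_per_cm ** 2 / 1e6  # 1e6 square metres per km^2


print(ground_distance_m(4, 50_000))  # 2000.0  (4 cm on the map is 2 km on the ground)
print(ground_area_km2(1, 50_000))    # 0.25    (1 cm^2 on the map is 0.25 km^2)
```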

Resolution

The smallest element that can be distinguished in a data set. For imagery this is usually the pixel size or a multiple of it. In a map, however, the term can be confusing: it may be taken to mean the smallest mappable feature, yet some features are mapped by symbols even if they are very small.

The Components of GIS

The major components of a GIS are: the end use or management, data input, data storage and retrieval, analysis and information presentation.

The essential hardware components of a GIS are the host computer (a CPU linked to appropriate storage units such as disc/tape drives), a digitizer for converting input data from maps into digital form, a plotter for presenting processed outputs and a visual display unit (VDU) through which the user commands the system. Each of these hardware devices can vary depending on the organisation's requirements regarding throughput, accuracy, etc.

The software components of a GIS include a set of modules that facilitate operations such as data capture/input, data storage/database management, analysis and manipulation, and presentation of results, together with a language for user interaction.

GIS Software Components

Subsystem      Components
Input          Digitizing, Editing, Transformation, Import
Storage        Format, Size, Medium, Data-Structure
Retrieval      DBMS, Where-is-this, What-is-this
Analysis       Overlay, Proximity, Connectivity, 3D, Spatial, Non-spatial
Output         Display Devices, Cartography, Diagrams
Management     Evaluation, Selection, Organization, Info Use

The spatial data model/structure guides the representation of two- or three-dimensional data in (one-dimensional) computer memory. The relationships amongst spatial entities are very complex. The spatial data model provides a way of expressing these relationships and plays a major role in configuring the hardware and software components, besides being the key factor in the success or failure of a system in a given organisational context. We will deal with each of these aspects in detail in forthcoming issues of the ISG Newsletter.
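A very simple instance of mapping two- or three-dimensional data into one-dimensional memory is the row-major ordering of a raster grid, sketched below in Python. It is only one convention among several (column-major order, band-interleaved layouts and tiled storage are also used), and vector data models need quite different structures; the function names and the small grid are hypothetical.

```python
def to_offset(row: int, col: int, n_cols: int) -> int:
    """Row-major order: row 0 is stored first, then row 1, and so on."""
    return row * n_cols + col


def from_offset(offset: int, n_cols: int) -> tuple[int, int]:
    """Recover (row, col) from a position in the one-dimensional store."""
    return divmod(offset, n_cols)


def to_offset_3d(layer: int, row: int, col: int, n_rows: int, n_cols: int) -> int:
    """Third dimension: whole 2-D layers (e.g. bands) are stacked one after another."""
    return layer * n_rows * n_cols + to_offset(row, col, n_cols)


n_rows, n_cols = 3, 4
grid = [[10 * r + c for c in range(n_cols)] for r in range(n_rows)]  # 2-D raster
memory = [value for row in grid for value in row]                    # 1-D storage

print(memory[to_offset(2, 1, n_cols)], grid[2][1])  # 21 21
print(from_offset(9, n_cols))                       # (2, 1)
```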

-----------------------------------------------------------------------------------------------------------------------------------

To be continued

RK Goel, Head, Geomatics Technology Division, Geomatics Group, SIIPA, Space Applications Centre (ISRO), Ahmedabad - 380 053

Internet Access for Geomatics Professionals - A New Series

The Internet has entered almost all aspects of human activity, and Geomatics is no exception. In fact, the capabilities of the Internet make it an ideal medium for disseminating Geomatics-related information. All major organisations, vendors and service providers have a presence on the World Wide Web, and many offer free software, tips and other useful information. Several Indian organisations are also on the Web; our Society has a Web presence at http://members.rediff.com/isg. Apart from this, there are several newsgroups and mailing lists where Geomatics professionals discuss a wide range of issues. However, most members find the Internet esoteric, the realm of computer geeks and nerds. While these designers of the computer world have nurtured and developed the Internet to its present glory, access is certainly not confined to them. In fact, access to the Internet is very simple and requires equipment and skills available to any Geomatics professional.

To access the Internet, we must understand what it consists of and how it works. The Internet grew out of a US Defence effort to develop a command and control network that could survive a nuclear attack. Such a network has to have a large number of computers connected together with multiple redundant paths, so that even if a few computers are disabled and a few links are broken, the system as a whole remains functional. This network turned out to be excellent for passing messages between the developers of the system. In 1990, Tim Berners-Lee and other scientists at CERN, the European Laboratory for Particle Physics, developed a means of making documents on a server available over the network to interested scientists, or clients. These documents could contain text as well as images and graphics, and this was the genesis of the World Wide Web, or the Web for short. These two ways of using the Internet, message passing and client-server, represent two paradigms of computer communications. Message passing depends on information push: the owner of the information has to take the initiative to distribute it. Examples are email, newsgroups and mailing lists. The client-server mode depends on information pull: users access the information on their own initiative. There have been several such client-server systems, like Archie, Veronica and Gopher, but the Web has superseded them all. In both these developments the key is "open" systems. There is no proprietary hardware or software, only standards that are made known to all. No one owns the Internet; there is only a set of commonly agreed regulations, which must be followed.

One of these standards is TCP/IP, which you must have come across in networking jargon. TCP/IP stands for "Transmission Control Protocol/Internet Protocol"; it was developed for the network that grew into the Internet and became widespread through its adoption in UNIX. TCP/IP is the de facto open standard for computer communications, and all systems on the Internet conform to it. The other standard, which grew out of the CERN WWW development, is the hypertext transfer protocol, or http, together with the concept of the Uniform Resource Locator, or URL. Let us leave these terms for the moment; we will come back to them later in the article. Apart from these, the Internet uses standard protocols for remote access (Telnet) and file transfer (ftp).

The Internet consists of a large number of computers connected together using the TCP/IP protocol. To access it you need a computer and a means of connecting to the network. Keeping the "open" systems approach in mind, you can choose any computer that supports TCP/IP. Today all computers qualify, and the most economical and simple choice for an individual is a PC running the Windows operating system. To connect to the Internet you need to identify the nearest Internet computer and open an account on it. This is facilitated by an Internet Service Provider (ISP), which, for a fee, gives you an account on its computer that is connected to the Internet. In India there are several providers in both the Government and private sectors, the most common being VSNL, MTNL, DOT, NIC and DOE from the Government, and several private ones like Satyam, Wilnet and Icenet, with more coming up.

The ISP provides you with:

  1. A connection
  2. A login name & password selected by you
  3. A Domain Name Server address
  4. A POP3 mail server address
  5. An NNTP news server address

There seems to be a lot of jargon! Please bear with me as I explain them one by one.

There are several options for connecting to the Internet. The most common is a dial-up connection over telephone lines, which is the mode used by individual users. To connect the telephone line to the computer we use a modem, a device that takes digital data from the computer, compresses it and converts it to an analogue signal that it sends out on the telephone line. The modem dials the designated ISP access telephone number, negotiates a reliable channel with the ISP's modem and then hands control over to your computer. You are now connected, and the ISP computer displays its login prompt on your screen. You log in to your ISP-allotted account using your login name and password, and after verification the ISP computer displays its system prompt. Now you have to tell the ISP computer which protocol to run so that your computer can talk to it using TCP/IP. This is needed because TCP/IP was designed for computer networks rather than for a serial link over a telephone line, so a special protocol is required to carry it over a dial-up connection. The most common is the Point-to-Point Protocol, or PPP. Typing "PPP" at the system prompt launches the necessary programme at the ISP end and sets up the TCP/IP connection, over which programmes communicate through "sockets". You can have multiple sockets open, so that you can do email, ftp and browsing simultaneously. All this happens automatically between your computer and the ISP, so you need not worry; that is the beauty of TCP/IP. An older protocol is the Serial Line Internet Protocol, SLIP; use it only if PPP is not available. Large users connect to the Internet using ISDN, dedicated lines and VSAT terminals, which give high-speed connectivity.

Let us now understand how the Internet handles messages and data so that your mail does not end up on your boss's computer! It is done very much like mail handling by the postal system. Every computer on the Internet is identified by a unique address called the IP address. When you connect to your ISP's computer, you get a dynamically assigned address, which is a nice way of saying that it is a temporary address! This is done because you will be connected only for a short duration, after which the address can be reused; this way more clients can be served with a fixed number of addresses. Your ISP keeps a look-up table that helps it direct transmitted messages to the right addresses and received messages to the correct customers, so your ISP acts as the sorting section of the post office. IP addresses are strings of numbers like '202.54.4.114', which computers can understand. However, if we have to remember these strings, things get messy, so we use domain names like 'ad1.vsnl.net.in'. Here 'in' indicates India, 'vsnl.net' indicates the ISP, Videsh Sanchar Nigam Limited, and 'ad1' its computer at Ahmedabad. This is much easier for us to understand and remember but is gibberish to the computer, so we need a Domain Name Server (DNS) that converts domain names to IP addresses. Your computer needs to know only the IP address of your ISP's DNS: it sends each name to the DNS, which finds the corresponding IP address (consulting other name servers if necessary), and the network then routes the data to that address.
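Two ideas from the paragraph can be tried out with Python's standard library: a dotted IP address such as '202.54.4.114' is just a 32-bit number written as four bytes, and a domain name is read from right to left, from the most general label to the most specific. The `resolve` helper uses the standard `socket.gethostbyname` call to ask the configured name server for an address; it is defined but not called here because it needs a live network, and the 1990s host names in the text may no longer resolve. The helper names are hypothetical.

```python
import ipaddress
import socket


def domain_labels(domain: str) -> list[str]:
    """Read a domain name from the most general label (rightmost) to the most specific."""
    return list(reversed(domain.split(".")))


def resolve(name: str) -> str:
    """Ask the configured name server (DNS) for an IPv4 address; needs a live network."""
    return socket.gethostbyname(name)


addr = ipaddress.IPv4Address("202.54.4.114")
print(int(addr))                         # the same address as one 32-bit number
print(addr.packed.hex())                 # its four bytes in hexadecimal: ca360472
print(domain_labels("ad1.vsnl.net.in"))  # ['in', 'net', 'vsnl', 'ad1']
```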

The similarity to a post office ends here. When data is transmitted under TCP/IP, it is broken up into small packets, each with a header indicating the address it originated from and the address it is destined for. The receiver checks the integrity of each packet and acknowledges the ones it receives correctly; packets that fail the check, or are lost, are retransmitted. When all packets have been received successfully, they are reassembled in the correct order into a file on the receiving computer. All this happens without your intervention and at high speed, so the entire process is transparent to the user. As you can see, your computer has to do a lot of work, and it uses several programmes for these tasks. These programmes belong to the Internet suite of programmes supplied by your ISP and installed on your computer; they may also be part of an operating system like Windows 98. Installing and setting up an Internet account on your computer is really very easy, and Windows 95 and 98 have "Wizards" that guide you step by step. Once this is done you are ready to browse the Web using Internet Explorer, send and receive email using an email programme like Outlook Express or Eudora, and participate in mailing lists and newsgroups.
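The packet idea can be simulated in a few lines of Python. This is a deliberately simplified sketch, not how TCP is actually implemented: real TCP uses a 16-bit checksum, and retransmission is driven by missing acknowledgements and timeouts rather than by an explicit request. Here CRC-32 (from the standard `zlib` module) stands in for the integrity check, and the hypothetical `resend` function stands in for retransmission. The example packets arrive out of order and one is damaged in transit, yet the message is reassembled intact.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # position of this piece in the original data
    payload: bytes
    checksum: int   # CRC-32 of the payload, computed by the sender


def make_packets(data: bytes, size: int) -> list[Packet]:
    """Break data into numbered packets of at most `size` bytes."""
    packets = []
    for seq, start in enumerate(range(0, len(data), size)):
        chunk = data[start:start + size]
        packets.append(Packet(seq, chunk, zlib.crc32(chunk)))
    return packets


def receive(packets, resend) -> bytes:
    """Check each packet, fetch a fresh copy of any that fail, then reassemble in order.

    `resend(seq)` stands in for retransmission; this loop would never end
    if every copy of a packet arrived damaged.
    """
    good = {}
    for p in packets:
        while zlib.crc32(p.payload) != p.checksum:
            p = resend(p.seq)
        good[p.seq] = p.payload
    return b"".join(good[seq] for seq in sorted(good))


message = b"Maps are spatial data."
sent = make_packets(message, 5)
arrived = list(reversed(sent))                                      # out of order
arrived[0] = Packet(arrived[0].seq, b"XXXXX", arrived[0].checksum)  # damaged in transit
print(receive(arrived, resend=lambda seq: sent[seq]))  # b'Maps are spatial data.'
```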

Web browsing, or surfing, is the most popular application on the Internet. It is a graphical application needing a good high-resolution monitor and a mouse. The original Web concept and browsers were text based; two famous browsers are CERN's WWW and the University of Kansas' Lynx. The latter is still used by VSNL for its shell account holders! The first widely used graphical browser was Mosaic, from the National Center for Supercomputing Applications (NCSA). It was a free product and is still available. Today, however, two browsers rule the Web: Internet Explorer by Microsoft and Netscape Communicator by Netscape. Interestingly, the original name for Netscape's browser, now called Navigator, was Mozilla. Some people claim that the term is a contraction of Mosaic Godzilla (i.e., Mosaic killer), since Mosaic was the number one Web browser at the time Netscape began developing its product. Now Netscape and Microsoft are engaged in a similar battle for supremacy! Both browsers are free and contain a wealth of features.

The basic function of a browser is to interpret a URL and access the resource it indicates. A URL specifies an access method and an address. The URL of the Indian Society of Geomatics Web pages is http://members.rediff.com/isg/. Here 'http' indicates that the hypertext transfer protocol is to be used to access the domain 'members.rediff.com' and retrieve a file from the directory 'isg'. If no specific file is indicated, a default file, usually named 'index.html', is retrieved and displayed in the graphical window of the browser. The browser interprets the HTML code in the file and presents it as a graphical page on the computer screen. HTML can handle text, images, sound, video, animation and interactive elements like forms for entering and querying data. The beauty of HTML is that it can create hyperlinks to another part of the page, to another page or even to another computer on the network. As a user you do not see any of this, but you do get the benefits. A hyperlink is indicated by underlined text in a different colour, an image or a symbol. When you move the mouse cursor over a hyperlink it changes to a hand, and clicking the mouse activates the hyperlink, directing the browser to the new location or launching the desired action. URLs can also activate other functions like file transfer, file retrieval, email and remote login, so the browser becomes an all-in-one tool. To extend it further, the browser uses plug-ins, small programmes that bring additional functionality. There are plug-ins for Virtual Reality graphics and for showing GIS vector graphics in a browser window with functions like zoom and pan, so it is possible to run a GIS package remotely through a browser on the Web.
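The parts of a URL can be pulled apart with Python's standard `urllib.parse` module, as sketched below. The `with_default_document` helper is hypothetical and only mimics, on the client side, the default-file behaviour described above; in practice the browser simply requests the directory and the web server decides which default file to return ('index.html' is a common but configurable choice).

```python
from urllib.parse import urljoin, urlparse

url = "http://members.rediff.com/isg/"
parts = urlparse(url)
print(parts.scheme)   # http                -> the protocol to use
print(parts.netloc)   # members.rediff.com  -> the domain to contact
print(parts.path)     # /isg/               -> the directory on that server


def with_default_document(url: str, default: str = "index.html") -> str:
    """If the URL names a directory rather than a file, point it at the default document."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        return urljoin(url if path else url + "/", default)
    return url


print(with_default_document(url))  # http://members.rediff.com/isg/index.html
```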

For those of you who are interested, HTML code reads like ordinary text, and HTML editing requires nothing more than a text editor like Notepad. A typical HTML statement could be:

<p align="center"><img src="Cabinet.jpg" width="218" height="198"></p>

This is interpreted as "put an image from the file named Cabinet.jpg, 218 pixels wide and 198 pixels high, and align it centrally." HTML is an application of SGML, the Standard Generalized Markup Language of the International Organization for Standardization (ISO). There are also DHTML (Dynamic HTML), VRML (Virtual Reality Modeling Language) and so on. As a user, all this is transparent to you; you only need to ensure that the relevant plug-ins are available if you need any of these features. Another important component is Java, which allows small programmes called applets to be downloaded and run on your computer under browser control. Java is very useful for running animations and even database applications. Where do you get Java and all these plug-ins? Relax: Java is now part of all browsers, and plug-ins are add-on programmes supplied with the browser. Sites that require a particular plug-in will warn you if you do not have it and tell you where to get it.
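The first step a browser takes with such a statement, splitting the markup into tags and their attributes, can be imitated with Python's standard `html.parser` module. The `ImageFinder` class below is a hypothetical example that only collects the attributes of `<img>` tags; it does no layout or rendering. The image file is given here as a plain relative path.

```python
from html.parser import HTMLParser


class ImageFinder(HTMLParser):
    """Collect the attributes of every <img> tag in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "img":
            self.images.append(dict(attrs))


finder = ImageFinder()
finder.feed('<p align="center"><img src="Cabinet.jpg" width="218" height="198"></p>')
img = finder.images[0]
print(img["src"], int(img["width"]), int(img["height"]))  # Cabinet.jpg 218 198
```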

While surfing is a very popular Internet application, there are two older applications that pre-date the Internet and have now become part of it: electronic mail (email) and News. Both are supported by most computer operating systems, UNIX in particular. The Simple Mail Transfer Protocol (SMTP) is the granddaddy of email protocols and is still used on the Internet to send and relay mail between servers. To collect mail from the server, the most common protocol today is POP3, or Post Office Protocol 3, which also runs over TCP/IP. As the name implies, mail handling operates like a post office store-and-forward system. Mail is addressed to a person as name@domain; thus my email address is arupdg@vsnl.com. Mail sent to this address reaches the mail server of my domain, and when I log in, it is downloaded to my computer by the POP3 client software. Examples of such client programmes are Outlook Express, Eudora and Pegasus Mail. To send an email you need to indicate the receiver's address, a subject line (optional) and the message. Your client will inform you whether the mail was successfully handed over to your ISP's outgoing (SMTP) mail server, but there is no way of knowing whether it has reached its destination. One important use of email is the Mailing List: a special interest group formed by a set of individuals and hosted on a server running a programme like Listserv or Majordomo, which receives email from individual members and distributes it to all members. This acts like a discussion group where members 'meet' to exchange views, seek help, offer help or just pass the time chatting. There are literally thousands of Mailing Lists; for the geomatics professional the most relevant are GIS-L and IMAGRS-L, for GIS and Remote Sensing topics respectively. To join a list you have to subscribe by sending an email to the server with the message 'subscribe List_name Your_name@your_domain'. Some lists require a return confirmation message. Once subscribed, you get instructions on how to post messages, how to unsubscribe, how to obtain old messages, and so on. There are also rules of behaviour, called 'Netiquette', which must be followed, like no profanity and no personal attacks. Subscription is free.
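A subscription request is an ordinary email, so it can be built with Python's standard `email` package, as in the sketch below. The list server address and your own address are hypothetical, and the exact command syntax differs between Listserv, Majordomo and other list programmes, so check the instructions of the list you are joining. Sending uses SMTP and collecting replies uses POP3; those steps are shown only as comments because they need real mail servers.

```python
from email.message import EmailMessage


def subscription_request(list_server: str, list_name: str, your_address: str) -> EmailMessage:
    """Build the email that asks a list server to add you to a mailing list."""
    msg = EmailMessage()
    msg["From"] = your_address
    msg["To"] = list_server
    # The command goes in the body; the subject line is optional.
    msg.set_content(f"subscribe {list_name} {your_address}")
    return msg


request = subscription_request("listserv@lists.example.org", "GIS-L", "you@example.com")
print(request.get_content().strip())  # subscribe GIS-L you@example.com

# Sending goes through SMTP and needs a real outgoing server (hypothetical host):
#   import smtplib
#   with smtplib.SMTP("smtp.your-isp.example") as server:
#       server.send_message(request)
# Collecting replies uses POP3 (the poplib module) against your ISP's mail server.
```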

News is another UNIX feature adopted by the Internet. News allows users to post items in a common area of the computer system that can be accessed by all clients. Administrators used it to get important messages across, and others used it as a notice board to post information about events and so on. Usenet was started by two graduate students at Duke University in North Carolina: in 1979, Tom Truscott and Jim Ellis thought of hooking computers together to exchange information with the UNIX community. From this idea, and the work of several students and enthusiasts, came the final version of Usenet. Today Usenet uses NNTP, the Network News Transfer Protocol, to distribute messages. NNTP servers receive news feeds from other NNTP servers, can choose which groups and messages they carry, and may also create their own newsgroups for distribution. For some peculiar reason VSNL does not publicise its NNTP server address, but all good things must come out: some kind soul has published the address, 202.54.1.25, and it works!

Usenet works like a bulletin board. It is divided into newsgroups organised according to specific areas of interest. Since the groups form a tree structure, the various areas are called hierarchies. There are seven major categories, viz. comp, sci, misc, soc, talk, news and rec, and three other 'alternate' categories, alt, gnu and biz. Of these, the ones of interest to Geomatics professionals are:

'comp'

Topics of interest to both computer professionals and hobbyists, including computer science, software sources, and information on hardware and software systems. The most useful group is comp.infosystems.gis, which used to mirror the GIS-L list but is now de-linked. The GIS FAQ, which developed in this group, is one of the best sources of information.

'sci'

Discussions marked by special knowledge relating to research in, or application of, the established sciences. The group to look for is sci.image.processing.

These seven "world" hierarchies are (usually) circulated around the entire Usenet, which implies world-wide distribution. Not all groups actually enjoy such wide distribution, however. The European Usenet and EUnet sites take only a selected subset of the more "technical" groups, and controversial "noise" groups (primarily under the 'talk' and 'soc' classifications) are often not carried by many sites in the U.S. and Canada. Many sites do not carry some or all of the comp.binaries groups because of the typically large size of their posts (actual executable programs).

Among the alternate hierarchies, avoid 'alt', as it covers all kinds of topics and its noise level is very high. The 'biz' hierarchy is for business-related topics and has a US flavour. The only alternate hierarchy of interest is:

'gnu'

Groups concentrating on interests and software related to the GNU Project of the Free Software Foundation. These may be useful for software developers, as many GNU programs are useful for geomatics tasks.
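The tree structure behind the hierarchies comes straight from the dotted group names: each component is one level deeper in the tree. The Python sketch below builds that tree and lists the groups under one top-level hierarchy. The function names are hypothetical and the short list of group names is only illustrative; a newsreader would work from the full list downloaded from the news server.

```python
def build_hierarchy(group_names):
    """Nest dotted newsgroup names into a tree of dicts, one level per name component."""
    tree = {}
    for name in group_names:
        node = tree
        for part in name.split("."):
            node = node.setdefault(part, {})
    return tree


def groups_in(group_names, hierarchy):
    """All groups under one top-level hierarchy such as 'comp' or 'gnu'."""
    return sorted(g for g in group_names if g.split(".")[0] == hierarchy)


groups = ["comp.infosystems.gis", "comp.binaries.ibm.pc", "gnu.emacs.help"]
tree = build_hierarchy(groups)
print(sorted(tree))               # ['comp', 'gnu']
print(groups_in(groups, "comp"))  # ['comp.binaries.ibm.pc', 'comp.infosystems.gis']
```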

You access newsgroups through a newsreader programme like Outlook Express. Initially it downloads the list of all available newsgroups from the server, which takes a little time. You then scan the list and subscribe to the groups of interest. No message has to be sent; the newsreader only has to be told which groups are 'subscribed'. Subsequently, whenever the newsreader is run, it downloads only the headers of the news items in the subscribed newsgroups. You can scan the headers and read only the interesting messages, which are then downloaded. The advantage compared to a Mailing List is that you can be selective about what you read, which saves computer space and download time. Some of the more prolific Mailing Lists, GIS-L among them, can generate 30 to 40 messages a day; you have to download them all and then delete the noise. The downside of Usenet newsgroups is that most are unmoderated, i.e., posts are not subject to any editing, so a number of messages tend to be irrelevant and wading through the headers is a task. My personal view is that, whether mailing list or newsgroup, the benefits outweigh the drawbacks. You can make friends and have an enjoyable time chatting about scientific topics, sharing jokes and commenting on current events. Maybe it is because I am a Bengali and 'adda' is in my blood!

This is a personalised view of the Internet which I have gathered over ten years. I am not an expert, hence some of my conclusions may be contested by those who are. In the next instalment we will talk about setting up your computer and specifics about surfing, email and news.

If you would like to know more try:

Internet for Dummies, John R. Levine, BPB Publishers, New Delhi

http://www.webopedia.com/ is a site where you can get additional information and links to other informative sites.

--------------------------------------------------------------------------------------------------------------------------------------------------------------

To be continued

AR Dasgupta, Deputy Director, Signal, Image and Information Processing Area, Space Applications Centre (ISRO), Ahmedabad - 380 053.