Geo-Spatial Tools (GIS) (Topic)

From Tools for Applied Data Analysis


What is GIS?

GIS stands for "Geographic Information Systems," but basically it's the label we apply to any type of data or analysis that takes into account where things are in relation to one another.

When people work with GIS data, the degree to which they use the spatial information varies dramatically. Sometimes spatial data is just used to connect or merge two datasets. For example, people may use the GPS locations of survey respondents to figure out which electoral constituency each respondent lives in. They use GIS tools to link the survey data to information on election outcomes or politician identities, but once these connections have been made, they go back to working with traditional data analysis tools like R or Stata. In other cases, people want to leverage spatial data to do more complicated things, like calculating travel times between locations while taking road availability or elevation changes into account, or measuring how the impact of a new hospital on citizens varies with their distance from the hospital.
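Even the simple-sounding hospital example hides a spatial wrinkle: GPS coordinates are latitudes and longitudes in degrees, and a degree of longitude covers less ground the farther you get from the equator, so you can't just take the straight-line distance between two coordinate pairs. Here's a minimal Python sketch of the haversine (great-circle) distance. It assumes a perfectly spherical Earth with a mean radius of 6,371 km; dedicated GIS tools usually work on an ellipsoid, so their answers will differ slightly. The coordinates are made up.

```python
import math

# Mean Earth radius in km. Assumes a spherical Earth; real GIS software
# usually uses an ellipsoid (e.g. WGS84), so results will differ slightly.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    # Haversine formula: well-behaved even for very short distances.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two made-up pairs of points, each exactly one degree of longitude apart.
print(round(haversine_km(0.0, 10.0, 0.0, 11.0), 1))    # at the equator: 111.2
print(round(haversine_km(60.0, 10.0, 60.0, 11.0), 1))  # at 60 degrees north: 55.6
```

Both pairs are exactly one degree apart in longitude, so a naive distance computed in degrees treats them the same, but on the ground the second pair is only about half as far apart.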


Tools for GIS

Several software tools have been developed for GIS, though to be honest none of them are as pleasant to work with as traditional data tools like R or Stata.

One important distinction to keep in mind when choosing tools is the difference between Geospatial Analysis (doing statistical calculations based on GIS information) and Cartography (the production of visually appealing maps), as different tools have different abilities in each area.

Quick Side-By-Side

GIS Software Choices

Software | Cost | Graphical User Interface (GUI) | Geo-Spatial Analysis | Cartography | Operating Systems | Programming Language
ArcGIS | Expensive; student trials available | Yes | Yes, with the right license | Best | Windows only | Python
QGIS | Free | Yes | Yes | So-so | All | Python
R | Free | No | Yes | Weak | All | R
Python | Free | No | Yes | So-so | All | Python

ArcGIS

One of the most popular tools for GIS analysis is ArcGIS, made by ESRI. ArcGIS is very powerful and is strong at both Cartography and Geospatial Analysis.

Note that ArcGIS is extremely expensive. Many schools can provide students with licenses, and student discounts exist, but even ESRI's most basic license for a normal user (which may not even do what you want to do) currently costs more than $1,500 [1], so make sure you can afford Arc before you start using it!

ArcGIS has a nice graphical interface, but users can also script ArcGIS commands in Python, which is very useful when you discover that you need to go back and change one of the settings in step three of a twenty-step workflow. The tool for this is ArcPy, a tutorial for which can be found here.

ESRI sells several versions of Arc. These versions are actually the same software, but the tools available to you vary depending on the price you pay.

ArcGIS only runs on Windows, which can be annoying for Mac users. Many Mac users run it inside a program that allows you to run Windows programs from within OS X (like Parallels Desktop, VMware Fusion, or, if you want a free tool, the open-source VirtualBox). Or, if you don't need to run OS X and Windows simultaneously, you can use Boot Camp to set up your computer to also boot into Windows.

QGIS

QGIS is an open-source project aimed at creating a free substitute for ArcGIS. As with ArcGIS, you can script all your QGIS commands in Python (through the PyQGIS API), although the syntax is very different.

QGIS is considered to be a very good tool for Geospatial Analysis, but be aware that it still falls behind ArcGIS in its ability to make pretty maps (Cartography). Unlike ArcGIS, it also runs natively on OS X, which makes it much easier to use for Mac users.

R

Yup, R will do GIS too!

Here's one set of tutorials written by the founder of this wiki.

Another excellent set of tutorials can be found here. Make sure to click through to the page for each "Part", and then to the "notebook" page in each part. The spatial data types page in particular is a great starting point!

Python

Yup! Python will do GIS without ArcGIS!

Most (non-ArcGIS) GIS libraries for Python are built on open-source software maintained by OSGeo (see the next section for more on them), and the functionality is split across a couple of different libraries:

  • Fiona: Tools for importing and exporting vector data in various formats, like shapefiles.
  • Rasterio: Tools for importing and exporting raster data in various formats.
  • PyProj: Tools for defining and transforming the datum and projections of spatial data.
  • Shapely: Tools for spatial analytics, like testing for intersections, measuring areas, etc. Note that this is basically a tool for analyzing 2-dimensional Cartesian shapes -- it has no facilities for managing projections. You have to handle projections with PyProj before you start manipulating shapes with Shapely.
  • RTree: Spatial analytics (like intersections) can be computationally expensive and thus slow. For example, if you want to do a spatial join of millions of points to a shapefile of polygons, you want to use what's called a "Spatial Index" tool like RTree. Basically, for each point, RTree will very quickly identify a list of polygons with which that point might intersect. This list will always include the polygon that the point actually intersects with, but may include some others as well. In other words, it has no false negatives, but can have lots of false positives. You then use a slower but exact tool like Shapely to check whether your point lies in each of these candidates to find the one true intersecting polygon.
  • CartoPy and Descartes: Cartography tools for making pretty maps. CartoPy is basically the successor to Basemap, which you may also read about on some forums.
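To make the RTree workflow concrete, here's a dependency-free Python sketch of the same "filter, then refine" pattern for a spatial join of points (say, survey respondents) to polygons (say, constituencies). It is not RTree or Shapely code: it swaps in a simple uniform grid index in place of an R-tree, and a hand-written ray-casting test in place of Shapely's exact point-in-polygon check. The polygon names, cell size, and coordinates are hypothetical, the coordinates are assumed to already be in a projected (planar) system, and the polygons are assumed not to overlap.

```python
from collections import defaultdict


def point_in_polygon(x, y, poly):
    """Ray-casting test: is (x, y) inside the polygon given as (x, y) vertices?

    Points lying exactly on an edge may be classified either way, which is
    fine for this sketch.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does a horizontal ray from (x, y) going right cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(poly):
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)


class GridIndex:
    """Uniform-grid spatial index (a simpler stand-in for an R-tree)."""

    def __init__(self, polygons, cell_size):
        self.polygons = polygons
        self.cell = cell_size
        self.cells = defaultdict(set)
        # Register each polygon in every grid cell its bounding box touches.
        for name, poly in polygons.items():
            minx, miny, maxx, maxy = bbox(poly)
            for cx in range(int(minx // cell_size), int(maxx // cell_size) + 1):
                for cy in range(int(miny // cell_size), int(maxy // cell_size) + 1):
                    self.cells[(cx, cy)].add(name)

    def candidates(self, x, y):
        # Fast filter: no false negatives, but possibly false positives.
        return self.cells.get((int(x // self.cell), int(y // self.cell)), set())

    def locate(self, x, y):
        # Slow, exact refine step, run only on the short candidate list.
        # Assumes non-overlapping polygons, so the first hit is the answer.
        for name in sorted(self.candidates(x, y)):
            if point_in_polygon(x, y, self.polygons[name]):
                return name
        return None


# Hypothetical constituencies and respondent locations.
constituencies = {
    "North": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "South": [(0, 0), (10, 0), (10, 5), (0, 5)],
    "Triangle": [(20, 0), (30, 0), (20, 10)],
}
index = GridIndex(constituencies, cell_size=5.0)
respondents = {"r1": (2.0, 7.5), "r2": (8.0, 1.0), "r3": (28.0, 9.0), "r4": (22.0, 2.0)}
for rid, (x, y) in respondents.items():
    print(rid, index.locate(x, y))
# r1 North / r2 South / r3 None / r4 Triangle
```

Respondent r3 falls in a grid cell touched by Triangle's bounding box, so Triangle comes back as a candidate, but the exact test rejects it: that's a false positive being cleaned up by the refine step. A real R-tree does the same job with much better performance on uneven data.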

OSGeo, GDAL, OGR, and GRASS: Acronyms to know

If you decide to work with anything other than ArcGIS, you're gonna keep coming across a couple of abbreviations worth knowing -- OSGeo, GDAL, and OGR. OSGeo is the Open Source Geospatial Foundation, and it manages most of the open-source software for geospatial analysis. If you're working with GIS in R, Python, or QGIS, you're basically using OSGeo tools.

The core of these tools is usually referred to as GDAL (Geospatial Data Abstraction Library). It used to be the case that OSGeo had two separate sets of tools -- GDAL for raster data and OGR (which originally stood for OpenGIS Simple Features Reference Implementation) for vector data -- but now there's really only one software library, sometimes referred to as just GDAL and sometimes as GDAL / OGR.
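On the raster side, one GDAL concept worth understanding early is how a raster's pixels are tied to real-world coordinates. GDAL stores this as a six-number "geotransform" (what a raster dataset's GetGeoTransform() returns in the Python bindings), an affine mapping from (column, row) to map coordinates. The plain-Python sketch below just applies that mapping; it does not call GDAL, and the example geotransform values (a north-up raster with 30 m pixels) are made up.

```python
def pixel_to_map(gt, col, row):
    """Map a (column, row) pixel position to map (x, y) coordinates.

    Follows GDAL's geotransform convention:
      x = gt[0] + col * gt[1] + row * gt[2]
      y = gt[3] + col * gt[4] + row * gt[5]
    where (gt[0], gt[3]) is the top-left corner of the top-left pixel.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def map_to_pixel(gt, x, y):
    """Inverse mapping, only for north-up rasters (gt[2] == gt[4] == 0)."""
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated rasters need the full affine inverse")
    col = int((x - gt[0]) // gt[1])
    row = int((y - gt[3]) // gt[5])
    return col, row


# Hypothetical north-up raster: top-left corner at (500000, 4100000) and
# 30 m pixels. gt[5] is negative because row numbers increase southward.
gt = (500000.0, 30.0, 0.0, 4100000.0, 0.0, -30.0)

print(pixel_to_map(gt, 0, 0))                   # (500000.0, 4100000.0)
print(pixel_to_map(gt, 10, 20))                 # (500300.0, 4099400.0)
print(map_to_pixel(gt, 500315.0, 4099385.0))    # (10, 20)
```

Vector data has no equivalent grid: each feature carries its own coordinates, which is one reason the raster and vector halves of the library grew up separately.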

GRASS is an OSGeo platform for GIS analysis that tries to unify GDAL tools and provide an open-source graphical user interface, kind of like QGIS.

External Resources

Excellent Introductory QGIS Workshop