Friday 2 August 2013

Optimising line of sight calculations with caches


In a previous blog here http://davidrowley.blogspot.co.nz/2013/08/calculating-line-of-sight-with-postgis.html I explained about some functionality I put together to allow line of sight calculations using elevation rasters along with PostGIS2.1. Before reading on you should read the previous blog to get yourself up to speed with the problems which we’re going to try to solve here.

The function which calculated the line of sight could do it fairly quickly for average line of sight lengths. If we only had to calculate 1 or 2 short lines of sight we could probably get away with the 50 milliseconds it took for the calculations to process, though if many locations need to be calculated then performance could be a problem. In tests I saw some of my queries taking up to 3 seconds when I was performing queries to find peaks within a 150 kilometre radius of a panorama.

For this problem it’s quite practical to cache the results of the calculations. I by no means have billions of point combinations which I must store line of sight results for. In my website I have around 300 panoramas and even if every panorama had 1000 mountains in a 150 kilometre radius, I’d still only have 300,000 items to cache. Caching seems ideal in this case, our elevation rasters are not going to be out of date in the near future. It’s far more likely that a change would be made to the line of sight function to make it more accurate, but in this case it would be a simple matter of purging the cache to allow the updated function to re-calculate the results.

The first step we need to take here is to create a table which can act as the cache.

CREATE TABLE line_of_sight_cache (
  viewing_location GEOGRAPHY NOT NULL,
  target_location GEOGRAPHY NOT NULL,
  isvisible BOOLEAN NOT NULL,
  PRIMARY KEY(viewing_location, target_location)
)

Here we’re just storing 2 locations along with a Boolean variable to tell us if there is a line of sight between the 2 locations.

We then need to make a few changes to the line of sight function to allow it to write to the cache once it calculates a line of sight.
CREATE OR REPLACE FUNCTION pano_is_point_visible_from(IN startloc GEOGRAPHY, 
                                                      IN endloc GEOGRAPHY,
                                                      IN writecache BOOLEAN
) RETURNS BOOLEAN AS
$BODY$
-- This function calculates if there is a line of sight between 2 points.
-- It does this by analysing elevation data along the direct path towards
-- the destination point.
-- The comments in the function describe the process as if a person was standing
-- at the startloc (parameter) and was walking towards the endloc (parameter)
-- the walking process naturally involves steps. These steps are similar to
-- what the function does. It starts by finding the elevation at both the startloc
-- and the endloc and then calculates the pitch/angle of the line of sight.
-- The function then enters a loop, this is the walking loop, where we start
-- taking steps towards the endloc. After every step we calculate the pitch from
-- the starting location to the end location. If this pitch is ever higher or equal
-- to the pitch of the endloc then we know we cannot see the endloc from the startloc
-- as it's being obscured by our current location.
--
-- The function calculates these pitches using trig functions then it takes that
-- angle which has been calculated and subtracts the number of degrees around the
-- world that the object is away. It does this over a fixed sized sphere rather than
-- a complex spheoid. Please note that at this time atmospheric refraction is not
-- taken into account, this means that objects on the horizon could be miscalculated
-- by around half a degree! This means mountains in the distance will actually be
-- visibly slightly higher than they will be said to be by this function.

DECLARE elevationdata RASTER;
DECLARE tot_distance DOUBLE PRECISION;
DECLARE cur_distance DOUBLE PRECISION;
DECLARE bearing DOUBLE PRECISION;
DECLARE start_elevation DOUBLE PRECISION;
DECLARE end_elevation DOUBLE PRECISION;
DECLARE cur_elevation DOUBLE PRECISION;
DECLARE step_size DOUBLE PRECISION;
DECLARE curloc GEOGRAPHY;
DECLARE end_pitch DOUBLE PRECISION;
DECLARE cur_pitch DOUBLE PRECISION;
BEGIN
 -- This query builds a raster which contains all of the raster data
 -- between 2 points. We use ST_Makeline to create a line between our
 -- starting point and our destination. It is quite possible that we'll
 -- find no or just partial raster data between our 2 points. Later we
 -- do test to see if we got some and return NULL if we found no data.
 SELECT ST_UNION(rast) INTO elevationdata
 FROM demelevation e
 WHERE ST_Intersects(rast, ST_Makeline(CAST(startloc AS GEOMETRY), CAST(endloc AS GEOMETRY)));
 
 -- If we found no data at all then we can quit... At this
 -- point we have no idea if there is a line of sight, so we
 -- return NULL.
 IF elevationdata IS NULL THEN
  RAISE NOTICE 'No elevation data found.';
  RETURN NULL;
 END IF;

 -- We now set the elevation of our start point and our end point.
 -- Note that there currently is a bit of a fuzz factor here and I'm
 -- adding 2 metres to both these values. This is because at least for
 -- our start value our eyes are above the ground and not on the ground, 
 -- so we'll have slightly more things in sight. For the end elevation
 -- this is not quite the case but the 2 metres was added due to the
 -- shapes of some mountains. If for example the mountain is completely
 -- flat at the top and we're standing in a location lower than it, if
 -- the summit location is marked in the middle of that flat area then
 -- we'll not be able to see it. I did not find this ideal as in reality
 -- I could see the mountain, just maybe not quite the middle of the summit,
 -- so the 2 meters being added to the end_elevation is fuzz factor, it
 -- perhaps should be more complex and be happy enough with seeing a point
 -- within X number of meters of the summit. At the time of writing this
 -- I could not decide what that value should be, so it remains like this.
 start_elevation := ST_Value(elevationdata, CAST(startloc AS GEOMETRY)) + 2;
 end_elevation := ST_Value(elevationdata, CAST(endloc AS GEOMETRY)) + 2;
 
 -- Now calculate the bearing which we must "walk" to
 -- our far off destination. 
 bearing := ST_Azimuth(startloc, endloc);


 -- A variable to save the total distance we must walk.
 tot_distance := ST_Distance(startloc, endloc);
 
 -- Set our step size, smaller steps will mean more loops and slower to calculate.
 -- Also there is no point in going in smaller steps than the detail of the raster.
 -- This should match the raster resolution or be more than it for if performance
 -- is a problem.
 
 step_size = 30; -- metres
 
 -- We must now work out the pitch in degrees of our line of
 -- sight from our current location to the destination.
 
 -- Here we use atan which will give us a pitch, or angle on a triangle, since
 -- the earth is round we need to reduce the pitch by the number of degrees
 -- around the earth that the object is. We use a fixed radius for this which
 -- is not quite accurate but it will do for now. Variations caused by
 -- Atmospheric Refraction will likely cause much more variation than the shape
 -- of the planet.
 
 end_pitch := degrees(atan((end_elevation - start_elevation) / tot_distance)) - (tot_distance / (6370986 * pi() * 2) * 360);
 
 -- We now start a loop to walk to our destination.
 -- Note that we stop checking the distance when we're
 -- within step_size to the object, as there's no point in
 -- checking if we can see the object when we're standing
 -- on top of it.

 -- First we just need to take our first step...
 cur_distance := step_size;
 
 WHILE cur_distance <= (tot_distance - step_size) LOOP
  
  -- Now work out the location of our new step based on
  -- our starting location, the current distance we've
  -- travelled and the bearing to the destination.
  curloc := ST_Project(startloc, cur_distance, bearing);
  
  -- Now let's look at the elevation of the current location.
  cur_elevation := ST_Value(elevationdata, CAST(curloc AS GEOMETRY));
  
  --RAISE NOTICE 'start_elevation = %, end_elevation = %, cur_elevation = %, cur_distance = %, bearing = %', start_elevation, end_elevation, cur_elevation, cur_distance, degrees(bearing);
  
  -- Calculate the pitch to our current location, same as we did for
  -- the destination before the loop began.
  cur_pitch := degrees(atan((cur_elevation - start_elevation) / cur_distance)) - (cur_distance / (6370986 * pi() * 2) * 360);
  
  -- Now we simply check if the pitch to from our starting
  -- point to our current location is greater or equal to
  -- the pitch we calculated from the start point to the end
  -- point. If it is then we can't see the end location due
  -- to the current point appearing to be taller from the
  -- start point.
  IF cur_pitch >= end_pitch THEN
   RAISE NOTICE 'Cannot see over object at pitch %. Pitch to destination is %, start elevation %, object elevation = %, distination elevation = %, dist from start = %, dist from end = %',
    cur_pitch, end_pitch, start_elevation, cur_elevation, end_elevation, cur_distance, tot_distance - cur_distance;
   
   -- Write to the cache if enabled.
   IF writecache = True THEN
    -- We only write if the record does not already exist.
    INSERT INTO line_of_sight_cache (viewing_location,target_location,isvisible) 
    SELECT startloc,endloc,false
    FROM (VALUES(1)) AS dual
    WHERE NOT EXISTS(SELECT 1 FROM line_of_sight_cache WHERE viewing_location = startloc AND target_location = endloc);
   END IF;
   
   RETURN FALSE;
  END IF;
  
  -- Now we can take another step then start do the whole process again...
  -- That is, providing we've not arrived at our destination yet.
  cur_distance := cur_distance + step_size;
  
 END LOOP;

 -- Write to the cache if enabled.
 IF writecache = True THEN
  -- We only write if the record does not already exist.
  INSERT INTO line_of_sight_cache (viewing_location,target_location,isvisible) 
  SELECT startloc,endloc,true
  FROM (VALUES(1)) AS dual
  WHERE NOT EXISTS(SELECT 1 FROM line_of_sight_cache WHERE viewing_location = startloc AND target_location = endloc);
 END IF;
 
 RETURN TRUE;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 1000;


All that’s really changed is, I added an extra parameter named writecache which tells the function if it’s ok to write to the cache. I then added some statements just before we return true or false to cache the result in the cache table before resulting the result back to the calling function or user.

All we need to do now is modify the function which gives the visible peaks in the 150km radius from the panoramas location to try to lookup the cache. Here is a simplified version of what I'm using:
SELECT osm.name,
       osm.description,
       osm.alt,
FROM osmdata osm
LEFT OUTER JOIN line_of_sight_cache losc ON loc = losc.viewing_location AND osm.latlong = losc.target_location
WHERE ST_DWithin(osm.latlong, loc, 150000)
  AND COALESCE(losc.isvisible,pano_is_point_visible_from(loc,osm.latlong, true)) = true;
I have a query like this inside a function. “loc” is a parameter I pass to the function marking the location of the panorama. You can see that I’m performing a LEFT OUTER JOIN to attempt a lookup on the cache, if the lookup fails then losc.isvisible will be NULL, so since we’re using that column in the call to COALESCE, that value is used when there is a matching item in the cache, or if nothing is found then we execute the line of sight calculation function.

How well does it perform now you say?

Well first I’ll purge the cache to ensure we're going to hit the function each time.

test=# TRUNCATE TABLE line_of_sight_cache;
TRUNCATE TABLE
Time: 10.033 ms


test=# SELECT name,elevation FROM pano_find_osmdata_points('01010000A0E610000043C5387F1342654038F3AB3940DC45C00000000000309C40',150000);
       name       | elevation
------------------+-----------
 Mount Cook       |      3755
 Mount Malte-Brun |      3198
(2 rows)


Time: 2023.655 ms

So you can see we get our results back in around 2 seconds.
The result of executing the function once will have cached the data into the cache table, so if we query the same query again we should see our query will execute much faster this time.

test=# SELECT name,elevation FROM pano_find_osmdata_points('01010000A0E610000043C5387F1342654038F3AB3940DC45C00000000000309C40',150000);
       name       | elevation
------------------+-----------
 Mount Cook       |      3755
 Mount Malte-Brun |      3198
(2 rows)


Time: 5.312 ms

In this case we’ve managed to increase performance by around 380 times. Though calculating more results in a single pass will see that performance gap increase.

Now since I have elevation data for all of my New Zealand panoramas I think I’d like to cache results for all of the lines of sight that I’ll need. So I’ll just execute a quick query to fill the cache for me.

test=# SELECT pano_find_osmdata_points(latlong,150000) FROM pano_locations WHERE countrycode = 'NZ';

Time: 99265.579 ms

The cache was filled in just under 100 seconds!

It’s now time for me to download elevation data for my panoramas outside of New Zealand.



2 comments:

  1. Hi David, We need a SQL LOS function for the UK, to determine LOS between two points with a Location height in metres. We would need to run this calculation over a large dataset, so would prefer an in house solution. You article is very interesting - please let me know if this would be something you would consider considering assisting us with? We are based in Christchurch (UK!). My email is info[at]jonathanspratt[dot]com Regards, Jonathan

    ReplyDelete
    Replies
    1. Hi Jonathan. I've contacted you via email.

      Delete