A QGIS map of the more than 25,000 intersections in L.A. County that recorded a pedestrian-involved collision from 2002 through 2013.

How we did it: L.A. pedestrian accidents analysis

To analyze pedestrian accidents in L.A. County, we began by downloading 5.6 million collision records from the California Highway Patrol's Statewide Integrated Traffic Records System (SWITRS) database.

From there we narrowed the scope to L.A. County collisions, and further narrowed the pool by removing records that cited only property damage. We did so because transportation officials said such accidents were inconsistently reported.

Loading and counting data

The L.A. County accidents that met our criteria, about 665,000 records, were then loaded into an app built with Django, an open-source web framework. Using Django, we aggregated collision records by intersection to simplify the analysis. Each collision was assigned to an intersection based on the primary and secondary roads identified in the record.

The intersections were located geographically, duplicates were removed, and columns were added for the counts of different types of collisions at each location — pedestrian, bicycle, hit-and-run, etc.
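The roll-up step can be sketched in plain JavaScript. This is a hypothetical illustration, not the Times' Django code: the record fields (primaryRoad, secondaryRoad, pedestrian, bicycle, hitAndRun) are made-up stand-ins for SWITRS columns, and normalizing the two road names and sorting them into one key is just one assumed way to make "Main St & 1st St" and "1st St & Main St" count as the same intersection.

```javascript
// Hypothetical sketch: roll collision records up to one row per intersection.
// Field names are illustrative stand-ins, not the actual SWITRS schema.

// Normalize a road name so case and extra spaces don't split an intersection.
function normalizeRoad(name) {
  return name.trim().toUpperCase().replace(/\s+/g, " ");
}

// Order-independent key: "A & B" and "B & A" map to the same intersection.
function intersectionKey(primaryRoad, secondaryRoad) {
  return [normalizeRoad(primaryRoad), normalizeRoad(secondaryRoad)].sort().join(" & ");
}

// Count all collisions plus selected collision types at each intersection.
function aggregateByIntersection(records) {
  const counts = new Map();
  for (const r of records) {
    const key = intersectionKey(r.primaryRoad, r.secondaryRoad);
    if (!counts.has(key)) {
      counts.set(key, { total: 0, pedestrian: 0, bicycle: 0, hitAndRun: 0 });
    }
    const row = counts.get(key);
    row.total += 1;
    if (r.pedestrian) row.pedestrian += 1;
    if (r.bicycle) row.bicycle += 1;
    if (r.hitAndRun) row.hitAndRun += 1;
  }
  return counts;
}

// Two hypothetical records describing the same corner in different ways.
const sample = [
  { primaryRoad: "Main St", secondaryRoad: "1st St", pedestrian: true, bicycle: false, hitAndRun: true },
  { primaryRoad: "1ST  ST", secondaryRoad: "main st", pedestrian: false, bicycle: true, hitAndRun: false },
];
console.log(aggregateByIntersection(sample));
```

Geocoding each key into coordinates is a separate step and isn't shown here.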

In some cases, intersections simply could not be located. In other cases, the coordinates given would have placed them in the ocean or on top of a mountain. In all, we were able to assign a location to nearly 99.5% of intersections in L.A. County where a collision between a car and a pedestrian was reported.

How did we determine which collisions involved a pedestrian? While it would seem obvious to use the "pedestrian involved" column included in the SWITRS data, we identified accidents involving cars and people on foot via the "motor vehicle involved with" column. Why? According to the California Highway Patrol, information gathered under that category, which identifies other parties in an accident, provides a more accurate record.
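In code, the choice comes down to filtering on one column instead of another. The sketch below is a minimal illustration that rests on assumptions to verify against the SWITRS codebook for the data you download: that the "motor vehicle involved with" column is exported as MVIW with code "B" meaning pedestrian, and that the "pedestrian involved" column is PEDESTRIAN_ACCIDENT with "Y" when flagged. The disagreement check is a hypothetical spot-check, not a step the article describes.

```javascript
// ASSUMPTIONS (check the SWITRS codebook): MVIW holds "motor vehicle involved
// with", where "B" = pedestrian; PEDESTRIAN_ACCIDENT is "Y" when flagged.
const MVIW_PEDESTRIAN = "B";

// The approach described above: rely on the "motor vehicle involved with" column.
function isPedestrianCollision(record) {
  return record.MVIW === MVIW_PEDESTRIAN;
}

// Records where the two columns disagree, useful for spot-checking the choice.
function findDisagreements(records) {
  return records.filter(
    (r) => isPedestrianCollision(r) !== (r.PEDESTRIAN_ACCIDENT === "Y")
  );
}

const records = [
  { MVIW: "B", PEDESTRIAN_ACCIDENT: "Y" },
  { MVIW: "B", PEDESTRIAN_ACCIDENT: "" },
  { MVIW: "C", PEDESTRIAN_ACCIDENT: "" },
];
console.log(records.filter(isPedestrianCollision).length); // 2
console.log(findDisagreements(records).length); // 1
```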

The analysis

Distribution of pedestrian collision counts at intersections: The Times analysis included any intersection with more than 10 pedestrian-involved crashes.
Distribution of the pedestrian share of collisions at intersections: We assigned z-values (standard scores) to find where the share of collisions involving pedestrians was higher than normal. Tall bar at right includes intersections with three collisions, all involving pedestrians.
A histogram showing the distribution of z-values of fatality rates in pedestrian collisions at intersections. Tall bar at right includes intersections with three pedestrian collisions, two of which were fatal.

With every pedestrian collision assigned to an intersection, I worked with my colleagues Ryan Menezes and Doug Smith to establish what made an intersection statistically dangerous.

How did we do that? We counted the collisions at each intersection and calculated z-values, which describe how many standard deviations the count at each intersection lies above or below the mean.
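A z-value is z = (x - mean) / standard deviation. The sketch below computes it for a list of per-intersection counts. It uses the population standard deviation (dividing by n); the article doesn't say which version the Times used, and with about 25,000 intersections the difference between dividing by n and n - 1 is negligible. The counts are made up for illustration.

```javascript
// Mean of a list of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation (divides by n rather than n - 1).
function stdDev(values) {
  const m = mean(values);
  return Math.sqrt(values.reduce((sum, v) => sum + (v - m) ** 2, 0) / values.length);
}

// z = (x - mean) / standard deviation for every value.
function zScores(values) {
  const m = mean(values);
  const sd = stdDev(values);
  // If every value is identical there is no spread; treat all as average.
  return values.map((v) => (sd === 0 ? 0 : (v - m) / sd));
}

// Hypothetical pedestrian-collision counts at eight intersections:
// mean 5, standard deviation 2.
const counts = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(zScores(counts).join(", ")); // -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
```

An intersection with 9 collisions sits two standard deviations above the mean here; the cutoffs described below were more than five.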

In this case, we looked at 25,821 intersections with collisions involving a pedestrian and considered three factors:

  • The total number of pedestrian collisions.
  • The proportion of collisions that involved a pedestrian.
  • And the proportion of pedestrian accidents that were fatal.

The z-values for the two proportions were weighted by the number of collisions. This was a key step that allowed us to account for how busy or accident-prone an area was when identifying problem areas.
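The article doesn't give the exact weighting formula. One possible reading, sketched below purely as an assumption, is to compute the mean and standard deviation of each proportion weighted by the intersection's collision count, so busy intersections shape the baseline more than intersections with a single crash. The function and field names are hypothetical.

```javascript
// ASSUMPTION: one possible reading of "weighted by the number of collisions",
// not the Times' published formula. Each intersection's proportion counts
// toward the mean and standard deviation in proportion to its collision total.
// rows: [{ part, total }], e.g. part = pedestrian collisions, total = all collisions.
function weightedZScores(rows) {
  const proportions = rows.map((r) => r.part / r.total);
  const totalWeight = rows.reduce((sum, r) => sum + r.total, 0);
  const wMean =
    rows.reduce((sum, r, i) => sum + r.total * proportions[i], 0) / totalWeight;
  const wVar =
    rows.reduce((sum, r, i) => sum + r.total * (proportions[i] - wMean) ** 2, 0) /
    totalWeight;
  const wSd = Math.sqrt(wVar);
  return proportions.map((p) => (wSd === 0 ? 0 : (p - wMean) / wSd));
}

// A busy intersection with no pedestrian crashes (0 of 9) and a quiet one
// whose only crash involved a pedestrian (1 of 1).
console.log(weightedZScores([{ part: 0, total: 9 }, { part: 1, total: 1 }])); // roughly [-0.33, 3]
```

Under this reading the weighted baseline is dominated by busy intersections, so a small intersection whose few crashes all involved pedestrians stands out sharply.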

Looking at distributions, we were able to identify sensible cutoff points. We included all 583 intersections with more than 10 pedestrian collisions. (For those following the statistical analysis thread, these had z-values of 5.2 or greater.)

We also wanted to look for areas that might not have as many overall incidents but were disproportionately dangerous for pedestrians. Using this method, we found 309 intersections with three collisions, all involving pedestrians. These intersections had higher z-values than the 583 cited above (z=5.5 compared with z=5.2). And we found another 44 at which two of the three pedestrian collisions were fatal (z=5.7).

Once we eliminated any overlap, 817 intersections throughout L.A. County met the criteria we had set for being dangerous. Of these, 579 fell within or on the city's boundaries. All three cutoff points used to identify these intersections were more than five standard deviations from the mean. In plain English, this meant that all these intersections stood out as unusually dangerous compared with others in the county.
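The Times set its cutoffs by reading the z-value distributions. The sketch below restates the three resulting groups in plain-count terms, as the article describes them, and shows the de-duplication step. It's illustrative only, with hypothetical field names (id, total, pedestrian, fatalPedestrian) and made-up rows.

```javascript
// Hypothetical sketch: restate the three groups described above as filters,
// then merge them so an intersection meeting several criteria is counted once.
function dangerousIntersections(rows) {
  // More than 10 pedestrian collisions
  const highVolume = rows.filter((r) => r.pedestrian > 10);
  // Three collisions, all involving pedestrians
  const allPedestrian = rows.filter((r) => r.total === 3 && r.pedestrian === 3);
  // Three pedestrian collisions, two of them fatal
  const mostlyFatal = rows.filter((r) => r.pedestrian === 3 && r.fatalPedestrian === 2);

  // A Set of ids removes the overlap between the three lists.
  const ids = new Set([...highVolume, ...allPedestrian, ...mostlyFatal].map((r) => r.id));
  return rows.filter((r) => ids.has(r.id));
}

const rows = [
  { id: 1, total: 20, pedestrian: 12, fatalPedestrian: 0 },
  { id: 2, total: 3, pedestrian: 3, fatalPedestrian: 2 }, // meets two criteria
  { id: 3, total: 5, pedestrian: 3, fatalPedestrian: 2 },
  { id: 4, total: 3, pedestrian: 1, fatalPedestrian: 0 },
];
console.log(dangerousIntersections(rows).map((r) => r.id)); // [ 1, 2, 3 ]
```

The three filters return four matches in total, but intersection 2 appears twice, so the merged list has three entries.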

The heatmap and contours (Warning: We're about to get very technical)

A QGIS heatmap of intersections identified as dangerous in a Times analysis, with contours drawn in green highlighting clusters of intersections.

So how to represent this data visually? We wanted to highlight areas of the city with clusters of these problematic intersections. We used the open-source program QGIS to perform the geospatial analysis. QGIS can connect directly with the Postgres database backing our Django app, allowing us to import the 817 intersections we had identified as dangerous.

We used the QGIS heatmap plugin to create a heatmap of problematic areas with a radius of 750 meters. A high-value contour of the heatmap was converted from a raster into vector polygons (a shapefile) to represent the most problematic areas. These shapes were then imported back into the Django app to calculate statistics such as how many of our identified intersections each shape contained, and how many collisions happened in each area overall and at the dangerous intersections specifically.
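The Times did this counting in its Django app, and the article doesn't say how. As an illustration, a ray-casting point-in-polygon test is enough to count which intersections fall inside a contour shape. The sketch works on planar [x, y] coordinates (for example, projected meters), ignores polygon holes, and uses hypothetical names; a spatial database query would be the more typical production choice.

```javascript
// Ray casting: walk the polygon's edges and count how many a horizontal ray
// from the point crosses. An odd count means the point is inside.
// polygon: array of [x, y] vertices (closing edge is implied).
function pointInPolygon([x, y], polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // Edge straddles the ray's y, and the crossing lies to the right of the point
    const crosses = (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Summarize one contour: how many flagged intersections it contains and
// how many pedestrian collisions happened at them.
function summarizeContour(polygon, intersections) {
  const inside = intersections.filter((p) => pointInPolygon(p.coords, polygon));
  return {
    intersections: inside.length,
    collisions: inside.reduce((sum, p) => sum + p.pedestrian, 0),
  };
}

const square = [[0, 0], [1000, 0], [1000, 1000], [0, 1000]];
console.log(summarizeContour(square, [
  { coords: [500, 500], pedestrian: 12 },
  { coords: [200, 800], pedestrian: 4 },
  { coords: [1500, 500], pedestrian: 30 },
])); // { intersections: 2, collisions: 16 }
```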

After a heatmap of pedestrian collisions at all intersections was generated, the lower (green and yellow) values were removed, keeping the reds and oranges that show the more problematic areas.

Although the contours highlighted the most dangerous areas, they left out many intersections with more than a few accidents. To expand the view, we added a secondary heatmap derived from all 25,821 intersections across Los Angeles County that had seen at least one pedestrian collision. We set a smaller hotspot radius of 200 meters for each point and weighted each point by its number of pedestrian collisions. Then we weeded out the areas with the fewest accidents by plotting the hotspots on a green-to-red scale and making the green portion of each area (the lowest values) transparent. A large share of the hotspots were entirely green and so disappeared.
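A heatmap like this is a kernel density estimate: each point spreads its weight over a disc of the chosen radius, fading to zero at the edge. QGIS offers several kernel shapes; the sketch below assumes a quartic kernel and drops the kernel's normalizing constant, so the values are only relative. It weights each point by its collision count, uses planar coordinates in meters, and returns null for cells below a hypothetical threshold, mimicking the transparent green band. All names and the threshold are assumptions.

```javascript
// Quartic (biweight) kernel, unnormalized: 1 at the point, 0 at the radius.
function quartic(distance, radius) {
  if (distance >= radius) return 0;
  const u = distance / radius;
  return (1 - u * u) ** 2;
}

// Heat at one location: each point's weight times the kernel, summed.
// points: [{ x, y, weight }] in projected meters.
function heatAt(x, y, points, radius) {
  let heat = 0;
  for (const p of points) {
    heat += p.weight * quartic(Math.hypot(x - p.x, y - p.y), radius);
  }
  return heat;
}

// Rasterize onto a grid; cells below the threshold become null (transparent).
function heatGrid(points, { minX, minY, cols, rows, cellSize, radius, threshold }) {
  const grid = [];
  for (let r = 0; r < rows; r++) {
    const row = [];
    for (let c = 0; c < cols; c++) {
      // Sample each cell at its center
      const x = minX + (c + 0.5) * cellSize;
      const y = minY + (r + 0.5) * cellSize;
      const value = heatAt(x, y, points, radius);
      row.push(value < threshold ? null : value);
    }
    grid.push(row);
  }
  return grid;
}

// One intersection with 4 pedestrian collisions, a 200-meter radius.
const grid = heatGrid([{ x: 50, y: 50, weight: 4 }], {
  minX: 0, minY: 0, cols: 2, rows: 1, cellSize: 100, radius: 200, threshold: 3,
});
console.log(grid); // [ [ 4, null ] ]
```

The second cell, 100 meters away, receives 4 x 0.5625 = 2.25, which falls below the threshold of 3 and is hidden.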

The resulting map brought out another pattern: long thoroughfares such as Vermont Avenue, Western Avenue and Santa Monica Boulevard that had a steady, relatively low number of collisions along their lengths.

The resulting heatmap not only showed especially problematic areas but also highlighted streets that saw a slightly lower, but consistent, number of pedestrian collisions along their length. These include Vermont Avenue, Western Avenue and Santa Monica Boulevard.

Creating the Leaflet map

We wanted to display this heatmap as a layer directly on the map, so we saved the layer as a GeoTIFF, then used TileMill to export the GeoTIFF into map tiles that could be served from CloudFront.
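TileMill handles the slicing, but it helps to know how the resulting tiles are addressed. Leaflet's L.tileLayer fills the {z}, {x} and {y} placeholders of a URL template using the standard Web Mercator ("slippy map") scheme, in which zoom level z divides the world into 2^z by 2^z tiles. The sketch below computes the tile containing a coordinate; it's an illustration with hypothetical function names and an example URL, not part of the Times' pipeline.

```javascript
// Tile indices for a latitude/longitude at a given zoom level, using the
// standard Web Mercator XYZ scheme (x grows eastward, y grows southward).
function latLngToTile(lat, lng, zoom) {
  const n = 2 ** zoom; // tiles per side at this zoom
  const latRad = (lat * Math.PI) / 180;
  // Clamp so longitude 180 doesn't spill past the last column
  const x = Math.min(n - 1, Math.floor(((lng + 180) / 360) * n));
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { z: zoom, x, y };
}

// Fill a {z}/{x}/{y} URL template the way tile layers do.
function tileUrl(template, tile) {
  return template
    .replace("{z}", String(tile.z))
    .replace("{x}", String(tile.x))
    .replace("{y}", String(tile.y));
}

console.log(tileUrl("https://tiles.example.com/{z}/{x}/{y}.png", latLngToTile(0, 0, 1)));
// https://tiles.example.com/1/1/1.png
```

Each zoom step quadruples the tile count, which is one reason the heatmap layer below is limited to zoom levels 8 through 15.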

Creating the Leaflet map required a couple of tricks. We wanted the shaded area of the contours to sit below the heatmap but their borders to appear above it. That meant duplicating the contours layer and applying two sets of styles: one with a fill and no border, the other with a border and a transparent fill. Since the heatmap is a tile layer, we needed an SVG layer both above and below it. In addition, we wanted a label beside each contour, each with a little pointer going to the center of the contour.
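The duplicated-layer trick comes down to two complementary style objects. The sketch below derives both from one base style using Leaflet's standard path options (stroke, color, weight, fill, fillColor, fillOpacity); the helper name and colors are hypothetical, and this isn't the Times' contourStyles function.

```javascript
// Hypothetical helper: derive the two complementary contour styles from one
// base style. Option names are standard Leaflet Path options; values are placeholders.
function splitContourStyles(base) {
  // Lower copy (under the heatmap): shaded interior, no outline
  const fillOnly = Object.assign({}, base, { stroke: false, fill: true });
  // Upper copy (over the heatmap): outline only, with a fully transparent fill
  const borderOnly = Object.assign({}, base, { stroke: true, fill: true, fillOpacity: 0 });
  return { fillOnly, borderOnly };
}

const { fillOnly, borderOnly } = splitContourStyles({
  color: "#2b8a3e",
  weight: 2,
  fillColor: "#2b8a3e",
  fillOpacity: 0.3,
});
console.log(fillOnly, borderOnly);
```

In the browser, each object would be passed as the style option of its own L.geoJson layer built from the same features: the fill version in a pane below the heatmap tiles, the border version in a pane above them.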

Leaflet displays its data in different map "panes": by default there's one for tiles, one for markers, one for popups, etc. So, to address the first issue, we created custom panes on the Leaflet map: one to render the contour shapes and, above it, one to hold the heatmap tiles.

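var customSvgPane = map.createPane('customSvgPane'),
    heatMapPane = map.createPane('heatMapPane');

// Custom panes share Leaflet's default pane z-index, so make the stacking explicit:
// contours (350) under the heatmap tiles (375), both below the default overlay pane (400)
customSvgPane.style.zIndex = 350;
heatMapPane.style.zIndex = 375;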

We also needed a custom renderer, attached to the custom SVG pane, to draw the GeoJSON contours.

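// Prefer SVG, falling back to Canvas; either way, draw into the custom pane
var rendererOptions = { pane: 'customSvgPane' };
var customSvgRenderer = (L.SVG && L.svg(rendererOptions)) || (L.Canvas && L.canvas(rendererOptions));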

When adding layers, you can specify which pane you'd like the layer added to in the options, e.g.:

var heatMapLayer = L.tileLayer(heatMapTilesUrl, {
    minZoom: 8,
    maxZoom: 15,
    opacity: 0.75,
    pane: 'heatMapPane'
}).addTo(map);
heatMapLayer.setZIndex(3);

And likewise for the contours layer. Note that the renderer is also specified. The contours draw below the heatmap because customSvgPane sits below heatMapPane.

// "fill" contours layer displayed under the heatmap
var contoursLayer = L.geoJson(features, {
    onEachFeature: onEachFeature,
    style: contourStyles,
    pane: 'customSvgPane',
    renderer: customSvgRenderer
}).addTo(map);

We used the Leaflet.label plugin to create the contour labels, and wrote a small extension to L.Path to draw a pointer from the label to the centroid of the contour.

L.PointerMethods = {
    pointerStyles: {
        color: "#333",
        className: 'focus-line',
        weight: 1.0
    },
    pointerAnchorStyles: {
        className: "focus-line",
        color: "#333",
        fillColor: "#333",
        fillOpacity: 1.0,
        opacity: 1.0,
        weight: 3,
    },
    addPointer: function(opts){
        var defaults = {
            anchorOffset: [0, 0],
            labelOffset: [0, 0]
        };
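        // Merge caller options over the defaults so a partial options object
        // (e.g. only labelOffset) still has both offsets defined
        var pointerOpts = L.extend({}, defaults, opts);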
        // Get the center of the polygon
        var centroid = this.getCenter(),
            label = this.label;

        // no point in adding anything unless it has a label
        if (label) {
            var labelOpts = label.options;

            if (pointerOpts.anchorOffset) {
                // Convert the centroid to pixels so we can tweak it if need be
                centroid = map.latLngToContainerPoint(centroid);
                centroid = centroid.add(pointerOpts.anchorOffset);
                centroid = map.containerPointToLatLng(centroid);
            }

            // Add a little circle to the center of the polygon
            this.anchor = L.circleMarker(centroid, this.pointerAnchorStyles).setRadius(1).addTo(map);
            // Convert the layer's latlng to pixel coordinates
            // and cache the layer's offset values
            // Y-values take into account the label height
            var point2 = map.latLngToContainerPoint(label._latlng);
            var offset_x = labelOpts.offset[0] + pointerOpts.labelOffset[0];
            var offset_y = labelOpts.offset[1] + pointerOpts.labelOffset[1];

            // For layers that are to the right, you want to add the x value
            // For layers on the left, you want to subtract
            if (labelOpts.direction === "right") {
                point2 = point2.add([offset_x, offset_y]);
            } else {
                point2 = point2.add([-offset_x, offset_y]);
            }

            // Convert the point back to a latlng
            // and draw the line on the map
            point2 = map.containerPointToLatLng(point2);
            this.pointer = L.polyline([centroid, point2], this.pointerStyles).addTo(map);
        }
    }
};
L.Path.include(L.PointerMethods);
        

The addPointer() method is then called after defining the label, using the labelOffset option to align the pointer with the label title.

layer.bindLabel(labelHtml, {
    className: 'focus-circle-label',
    direction: focus.direction,
    noHide: true,
    pane: 'popupPane',
    offset: offset
}).addTo(map).showLabel();

layer.addPointer({labelOffset:[3,13]});
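The offset logic inside addPointer() is easier to check in isolation. The helper below is a hypothetical, Leaflet-free restatement of that arithmetic on plain [x, y] pixel arrays; the name pointerEnd and the sample values are illustrative, with the extra offset matching the [3, 13] passed above.

```javascript
// Pure version of the label-end arithmetic in addPointer(). Labels placed to
// the right add the combined x offset; labels to the left mirror it. The y
// offset is applied the same way in both cases.
function pointerEnd(labelPoint, labelOffset, extraOffset, direction) {
  const dx = labelOffset[0] + extraOffset[0];
  const dy = labelOffset[1] + extraOffset[1];
  const signedDx = direction === "right" ? dx : -dx;
  return [labelPoint[0] + signedDx, labelPoint[1] + dy];
}

console.log(pointerEnd([100, 50], [10, -5], [3, 13], "right")); // [ 113, 58 ]
console.log(pointerEnd([100, 50], [10, -5], [3, 13], "left")); // [ 87, 58 ]
```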
        

Update
July 14, 2015: Due to a geocoding error, the total number of L.A. County intersections with a pedestrian collision was changed to 25,821 from 25,823.

Full story: In L.A.’s quest to be pedestrian-friendly, safety is a major speed bump

Spreadsheet: L.A. County's dangerous intersections
