Sample of CRTC metadata output

The following is a sample of the metadata output obtained by scraping some recent documents posted by the CRTC. The information is stored in JSON, a flexible text-based format that, like XML, allows for the storage and navigation of semi-structured or inconsistent data.
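To show why this flexibility matters, here is a minimal Python sketch of reading records whose fields are not always present. The records, field names ("title", "date", "keywords") and values are hypothetical placeholders made up for illustration. They are not actual CRTC output, and the scraper's real schema may differ.

```python
import json

# Hypothetical records: field names and values are illustrative
# assumptions, not real CRTC metadata.
raw = """
[
  {"title": "Example decision", "date": "2020-01-01",
   "keywords": ["broadcasting", "licence"]},
  {"title": "Example notice"}
]
"""

records = json.loads(raw)


def summarize(record):
    """Return (title, keywords), tolerating records that omit a field."""
    title = record.get("title", "(untitled)")
    keywords = record.get("keywords", [])  # missing field -> empty list
    return title, keywords


summaries = [summarize(r) for r in records]

for title, keywords in summaries:
    print(title, "|", ", ".join(keywords) or "(no keywords)")
```

Because each record is just a dictionary, a document that lacks a field can be handled with a default value rather than breaking the whole import.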

This sample contains information that is not visible on the CRTC webpage, as illustrated here by the "keywords" field. The next step for this scraping project is to capture the actual content of each page. Recording the paragraph numbers accurately is a slight challenge, but progress so far has been promising.
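One possible way to approach the paragraph numbering is sketched below in Python. It assumes the page text has already been extracted into a list of lines and that numbered paragraphs begin with a number followed by a period (for example "12. The Commission..."). Both points are assumptions made for illustration, the function name is hypothetical, and this is not necessarily how the project handles the problem.

```python
import re

# Assumed convention: a numbered paragraph starts with digits, a period,
# then whitespace. Real pages may use a different layout.
PARA_RE = re.compile(r"^(\d+)\.\s+(.+)$")


def split_numbered_paragraphs(lines):
    """Pair each non-empty line with its paragraph number, if it has one.

    Lines without a leading number (such as headings) are kept with
    number None so that no text is silently dropped.
    """
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        match = PARA_RE.match(text)
        if match:
            result.append({"number": int(match.group(1)),
                           "text": match.group(2)})
        else:
            result.append({"number": None, "text": text})
    return result


# Made-up sample text, not taken from a CRTC document.
sample = ["Introduction", "1. First paragraph.", "", "2. Second paragraph."]
parsed = split_numbered_paragraphs(sample)
for item in parsed:
    print(item["number"], item["text"])
```

A pattern like this would misread any unnumbered line that happens to start with a number and a period, which is one reason the numbering needs careful checking against the source pages.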