Sample of CRTC metadata output
The following is a sample of the metadata obtained by scraping some recent documents posted by the CRTC. The information is in JSON, a flexible, XML-like data format that allows for the storage and navigation of unstructured or inconsistent data.
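To illustrate why this format suits inconsistent data, here is a minimal Python sketch that reads two records, one of which lacks a field. The record contents and the field names `title`, `date`, and `keywords` are invented for illustration and are not taken from the actual scraper output.

```python
import json

# Hypothetical records: titles, dates, and keywords are invented for
# illustration and do not come from the actual CRTC output.
SAMPLE = """
[
  {"title": "Example decision A", "date": "2015-01-15",
   "keywords": ["broadcasting", "licence renewal"]},
  {"title": "Example notice B", "date": "2015-01-20"}
]
"""


def summarize(records):
    """Return (title, keywords) pairs, tolerating records without keywords."""
    return [(r.get("title", ""), r.get("keywords", [])) for r in records]


records = json.loads(SAMPLE)
summary = summarize(records)

for title, keywords in summary:
    # An empty keyword list joins to "", so fall back to a placeholder.
    print(title, "->", ", ".join(keywords) or "(no keywords)")
```

Using `dict.get` with a default lets the second record, which has no `keywords` entry, pass through without special handling.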
This sample contains information that is not visible on the CRTC webpage, illustrated here by the keywords field. The next step for this scraping project is capturing the actual content of each page. Accurately recording the paragraph numbers is a slight challenge, but progress so far has been promising.
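Fields that are not visible on a rendered page often come from `<meta>` tags in the HTML head. The sketch below shows one way such a field could be collected with Python's standard library. It assumes the keywords are stored in a `<meta name="keywords">` tag, which this post does not confirm. The sample page and the names `MetaCollector` and `extract_meta` are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict of meta tag names to their content strings."""
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta


# Invented page: the tag layout is an assumption, not the CRTC's markup.
PAGE = """<html><head>
<title>Example page</title>
<meta name="keywords" content="broadcasting, telecom">
<meta name="description" content="An invented example.">
</head><body><p>Visible text only.</p></body></html>"""

meta = extract_meta(PAGE)
print(meta["keywords"])
```

The body text is ignored here. Capturing it together with paragraph numbers would need a separate pass that depends on how the real pages are marked up.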