The web is a vast and complex place with endless information: stock prices, weather data, sports stats, flight prices, news feeds, and the list goes on. People or companies who want to access this information regularly and put it to productive use have two options: manually copy and paste it into new documents, or use a technology cheat code called web scraping.
Web scraping, web automation, and you
Never heard of web scraping before? Well, you’ve come to the right place, because at Govly, we use it all the time. Web scraping is the extraction of data from a website using bots (programs we write to act on our behalf). A bot reads the site’s underlying HTML code and, through it, the data the site has pulled from its database. The scraper can then export the collected content elsewhere in a more useful format, be it a spreadsheet or JSON to send to an API (spoiler alert: Govly uses an API to store the data and make it accessible throughout our platform, but we’ll get more into that later).
This process can unearth valuable data for those who can put it to effective use. For instance, web scraping is often used for real estate listings, industry statistics and insights, comparison shopping sites, lead generation, and, you guessed it, government opportunities.
Although web scraping can be done manually, it’s neither effective nor efficient. Automated tools are preferred because they’re faster and cheaper. This is where web automation, a subset of web scraping, comes in: automation software launches or attaches to a virtual browser instance and performs tasks as though a human were performing them. Many tools can be used for web automation. For this post, we’ll focus on the two we use at Govly: Cheerio and Playwright.
Cheerio (the simpler of the two) is a JavaScript library that parses raw HTML and XML and provides a jQuery-like API for traversing and manipulating the resulting data structure. Cheerio is extremely fast and efficient because it does not launch a browser instance or do any visual rendering: a Cheerio-based scraper just requests the HTML and parses it. This works great for simple, static HTML websites, but it falls short on most modern websites, which rely heavily on JavaScript to display their content, because Cheerio never executes that JavaScript.
Playwright is a browser automation tool that can handle more complex websites. It launches an actual browser instance (such as Chromium, Firefox, or WebKit), which executes the page’s JavaScript and can mimic a user’s behavior in a way that is largely indistinguishable from an actual human.
Regardless of which technology is used, after accessing a webpage the scraper extracts either all of the data on the page or only the specific data the program selects. It then outputs the collected data in a structured format, such as a CSV/Excel spreadsheet or JSON that can be sent to an API.
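As an illustration of that output step, here is a minimal TypeScript sketch (TypeScript because Cheerio and Playwright are both JavaScript tools) that serializes scraped records to CSV and JSON. The `ScrapedRecord` shape and its field names are hypothetical, not Govly’s actual schema; the CSV quoting follows the common RFC 4180 convention.

```typescript
// Hypothetical shape of one scraped record; real field names depend on the source site.
interface ScrapedRecord {
  id: string;
  title: string;
  agency: string;
  dueDate: string; // ISO 8601 date, e.g. "2024-05-01"
}

// Quote a CSV field per RFC 4180: wrap it in double quotes if it contains
// a comma, a double quote, or a line break, and double any embedded quotes.
function escapeCsvField(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

// Serialize records to CSV: a header row, then one row per record, CRLF-separated.
function toCsv(records: ScrapedRecord[]): string {
  const columns: (keyof ScrapedRecord)[] = ["id", "title", "agency", "dueDate"];
  const header = columns.join(",");
  const rows = records.map((r) => columns.map((c) => escapeCsvField(r[c])).join(","));
  return [header, ...rows].join("\r\n");
}

// Serialize records to a JSON array, e.g. for use as an API request body.
function toJson(records: ScrapedRecord[]): string {
  return JSON.stringify(records);
}

const records: ScrapedRecord[] = [
  { id: "A-1", title: "Network upgrade, phase 2", agency: "GSA", dueDate: "2024-05-01" },
];

console.log(toCsv(records));
console.log(toJson(records));
```

The title containing a comma is quoted in the CSV output, while the JSON output needs no special handling because `JSON.stringify` escapes strings itself.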
Wait, this sounds too good to be true
Using something this straightforward to sidestep the cumbersome process of finding government opportunities may sound too good to be true, but it isn’t. Web scraping is such a valuable tool for government contracting because government websites can still be a bit archaic. Their pages often take an incredibly long time to load and are difficult to navigate, which can be a huge roadblock for companies looking to land new opportunities or maintain contract vehicle compliance.
At Govly, we provide platforms and services that help companies start or maintain business with the government more effectively and efficiently. If someone has to wait 30 seconds or more for each page to load and must go through 500 pages a day or risk losing an opportunity, or even their access to a contract vehicle, they spend more than four hours a day (30 seconds × 500 pages = 15,000 seconds, or 250 minutes) just waiting for pages to load. That isn’t a sustainable process.
Our platform and services use web scraping and web automation in a number of ways to help companies win or maintain government business. We start by using automation to ingest opportunity data from every source into a single platform. We then enrich this data with anything that’s missing, like opportunity attachments that are only accessible through a specific contract portal.
Our clients can then directly access the data that is relevant to them through filters and saved searches. Instead of browsing through thousands of contracts manually, they see only the ones that are relevant to their business. For instance, if one of our clients is a technology company specializing in cybersecurity, we can filter the data so that they see only cybersecurity-related opportunities.
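Filtering by saved search can be sketched along these lines. This is a hypothetical example, not Govly’s implementation: the `Opportunity` and `SavedSearch` shapes, the matching rule (case-insensitive substring match of any keyword against the title and description), and the optional agency list are all assumptions made for illustration.

```typescript
// Hypothetical opportunity shape; real records would carry many more fields.
interface Opportunity {
  id: string;
  title: string;
  description: string;
  agency: string;
}

// Hypothetical saved search: keywords are OR'ed together, and the optional
// agency list, when present, must also match.
interface SavedSearch {
  keywords: string[];
  agencies?: string[];
}

// True if the opportunity satisfies the saved search.
function matches(opp: Opportunity, search: SavedSearch): boolean {
  const text = `${opp.title} ${opp.description}`.toLowerCase();
  // An empty keyword list matches everything.
  const keywordHit =
    search.keywords.length === 0 ||
    search.keywords.some((k) => text.includes(k.toLowerCase()));
  const agencyHit =
    search.agencies === undefined || search.agencies.includes(opp.agency);
  return keywordHit && agencyHit;
}

// Keep only the opportunities that match the saved search.
function applySavedSearch(opps: Opportunity[], search: SavedSearch): Opportunity[] {
  return opps.filter((opp) => matches(opp, search));
}

const opportunities: Opportunity[] = [
  { id: "1", title: "Zero trust architecture support", description: "Cybersecurity assessment and monitoring", agency: "DHS" },
  { id: "2", title: "Office furniture", description: "Desks and chairs", agency: "GSA" },
];

const cyberSearch: SavedSearch = { keywords: ["cybersecurity", "zero trust"] };
console.log(applySavedSearch(opportunities, cyberSearch).map((opp) => opp.id));
```

For the cybersecurity client in the example, only the first opportunity survives the filter; a production system would more likely use a search index than a linear scan, but the matching logic is the same idea.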
Having access to all the relevant government opportunities in real time is a huge time saver, but our platform doesn’t stop there. If a client receives an opportunity by email, the platform completes that opportunity’s data by going into the portal to grab the attachments and any other information the email left out. So even when a business has already learned about an opportunity by email, nobody has to go into the government portal manually, and endure its slow and cumbersome process, to retrieve the rest. We give you a single view of all contract opportunities by aggregating data across multiple portals.
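The “complete the data” step can be pictured as merging two partial records for the same opportunity. In this hypothetical sketch, `OpportunityRecord` and `completeRecord` are illustrative names, and the merge rule (keep what the email already had, fill the gaps from the portal, combine the attachments) is an assumption rather than Govly’s actual logic.

```typescript
// Hypothetical record shape: fields a source did not provide are undefined.
interface OpportunityRecord {
  id: string;
  title?: string;
  dueDate?: string; // ISO 8601 date, e.g. "2024-06-30"
  attachments: string[]; // attachment file names
}

// Fill in fields missing from the email-derived record using the record
// scraped from the portal. Values already present in the email record win,
// and attachments from both sources are combined without duplicates.
function completeRecord(
  fromEmail: OpportunityRecord,
  fromPortal: OpportunityRecord,
): OpportunityRecord {
  if (fromEmail.id !== fromPortal.id) {
    throw new Error(`ID mismatch: ${fromEmail.id} vs ${fromPortal.id}`);
  }
  return {
    id: fromEmail.id,
    title: fromEmail.title ?? fromPortal.title,
    dueDate: fromEmail.dueDate ?? fromPortal.dueDate,
    // A Set keeps the first occurrence of each name, in insertion order.
    attachments: Array.from(new Set([...fromEmail.attachments, ...fromPortal.attachments])),
  };
}

const emailRecord: OpportunityRecord = {
  id: "X-9",
  title: "Cloud migration",
  attachments: ["sow.pdf"],
};
const portalRecord: OpportunityRecord = {
  id: "X-9",
  title: "Cloud Migration Services",
  dueDate: "2024-06-30",
  attachments: ["sow.pdf", "pricing.xlsx"],
};

console.log(completeRecord(emailRecord, portalRecord));
```

Letting the email’s values win when both sources have one is just one possible policy; a real system might instead prefer the portal as the more authoritative source.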
Lastly, our platform helps prime contractors that hold specific contract vehicles stay compliant, and thus keep their business on those vehicles. Contractors with standing contracts must complete specific tasks in government portals to stay in compliance. We automate those tasks with web automation, significantly easing the manual burden of maintaining compliance.
At Govly, our mission is to ensure the government can consistently procure the best products and services at the lowest price. We believe that the platform we’ve built and continue to improve drives this mission forward.
About the author
Nick Weiland is co-founder and CTO of Govly, where he leads the engineering and product teams. He lives in California with his partner and young daughter and enjoys surfing with his family and hiking with his dog, Ender.
About Govly
Whether you are a technology company interested in selling to the U.S. Government or one that already sells to it, the Govly application and leadership team are ready to help you accelerate your business. We help businesses that want to sell to the U.S. Government by providing the insights and partnership access they need, and we help businesses that already sell to the government by centralizing and filtering public and private opportunities into a single location that fosters B2B collaboration across the supply chain.
We’d love to talk to you.