Web Scraping In Golang



I have been flirting with Go for a few weeks now, and I built a simple forum-like website using gin, a popular web framework for Go. After building the application, I was satisfied with how much I had learned about the language, so I decided to do another little project with it. While browsing the web mindlessly (like most of us do), I stumbled upon a comment about web scraping in Python, and an idea popped into my mind: why not scrape the front page of a popular online forum and use the data to populate my own database? Scraping a website is generally not illegal, but you should know what you can and cannot scrape from it. Many websites publish a robots.txt file that says which paths automated clients may crawl. While there are tons of web scraping tutorials on the web, mostly in Python, I felt there weren't enough of them in Go, so I decided to write one. I did my research and found an elegant Go framework for scraping websites called colly, and with this tool I was able to scrape the front page of a popular Nigerian forum called Nairaland.

Go has a lot of hype around it. It is relatively new, its syntax is easy to pick up compared to other statically typed languages, it is fast, and it supports concurrency natively, which makes it a language of choice for many people building cloud services and network applications. We can leverage this speed to scrape websites quickly and easily.

Web Scraping

“Web scraping is a computer software technique of extracting information from websites.” “Web scraping focuses on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet.” In other words, web scraping is the automated extraction of human-readable data from a website; the data is gathered and copied into a local database or spreadsheet for later retrieval or analysis.

Picking the right tools, libraries, and frameworks matters. I can't stress enough the utility of browser developer tools for visually inspecting a page, and planning the scraping approach upfront can save hours of head scratching. A scraping framework extracts the data you need from websites and is used for a wide range of applications, like data mining, data processing, or archiving.

Web scraping is a form of data extraction that pulls data from websites. A web scraper is usually a bot that uses HTTP to access websites, extracts HTML elements from them, and uses the data for various purposes. I'll be sharing with you how you can scrape a website with minimal effort in Go. Let's go 🚀

First, you will need to have Go installed on your system and know the basics of the language before you can proceed. We'll start by creating a folder to house our project: open your terminal, create a folder, and change into it.

Then initialize a Go module using the go toolchain, by running go mod init with a module path such as github.com/username/projectname.

Replace the username and project name with appropriate values. By now we should have a go.mod file in our folder; a go.sum file joins it once we add a dependency, and together they track our dependencies. Next, we fetch colly with go get (its module path is github.com/gocolly/colly).

Then we can get our hands dirty. Create a new main.go file and fire up your favorite text editor or IDE.

We store each post in a single data structure, a struct that contains the necessary information about that post. This was all I needed to populate my database; I was not interested in getting the comments, since we all know how toxic the comments section of forums can be :).
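A minimal sketch of such a struct, assuming the four pieces of data this article extracts (title, URL, body, and author name). The field names and JSON tags are my assumptions for illustration, not necessarily the ones used in the original project.

```go
package main

import "fmt"

// Post holds the data scraped for a single front-page post.
// Field names and JSON tags are assumptions made for this sketch.
type Post struct {
	Title  string `json:"title"`
	URL    string `json:"url"`
	Body   string `json:"body"`
	Author string `json:"author"`
}

func main() {
	// A hand-built value, just to show the shape of the data.
	p := Post{
		Title:  "Hello",
		URL:    "https://example.com/1",
		Body:   "First post",
		Author: "ada",
	}
	fmt.Printf("%+v\n", p)
}
```

The JSON tags only matter later, when the slice of posts is serialized to disk.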

We call the NewCollector function to create our web scraper; then, using CSS selectors, we can identify specific elements to extract data from. The main idea is that we target specific nodes, extract data, build our data structure, and dump it in a JSON file. After inspecting Nairaland's HTML structure (which I think is quite messy), I was able to target the specific nodes I wanted.

The OnHTML method registers a callback function to be called every time the scraper comes across an HTML node matching the selector we passed in. The first such callback visits every link to a front-page news item.

When we visit each link to a front-page news item, we extract the title, URL, body, and author name, using CSS selectors to identify where they are located. We then build up our post struct with this data and append it to our slice. The OnRequest and OnResponse methods each register a callback to be called when our scraper makes a request and receives a response, respectively. With this data at our disposal, we can then serialize it into JSON to be dumped on disk. There are other storage backends you can use if you want to do something more advanced; check out the docs. We then call c.Visit to visit our target website.

We use the standard library's json package to serialize the data to JSON, then write it to a file on disk, and voilà, we have written our first scraping tool in Go. Easy, right? Armed with this tool, you can conquer the whole web, but remember to check the robots.txt file, which tells you which parts of a site automated clients may crawl. It is worth reading up on the robots file, and remember to visit the colly docs to learn more; there are a ton of great examples you can follow along with there. Cheers ✌️

Thank you for reading

Golang

In general programming, interfaces are contracts: a set of methods that must be implemented to fulfill the contract. Go is no different. Go has great support for interfaces, and they are implemented implicitly. They allow polymorphism in Go. In this post, we will talk about interfaces, what they are, and how they can be used.

What is an Interface?

An interface is an abstract type that enables polymorphism in Go. A variable of an interface type can hold any value whose type implements that interface. Type assertion is used to get the underlying concrete value, as we will see in this post.

Declaring an interface in GoLang

An interface is declared as a type. Here is the general form of the declaration; the method signatures go between the braces.

type interfaceName interface{}

Zero-value of an interface

The zero value of an interface is nil. That means it holds neither a value nor a type.

The empty interface in Go

An interface is empty if it has no methods at all. An empty interface can hold a value of any type, which is why it is extremely useful in many cases. Below is the declaration of a variable of the empty interface type.

var i interface{}

Implementing an interface in GoLang

An interface is implemented when a type has implemented all the methods of the interface. No explicit declaration is needed; having the methods is enough.
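A minimal sketch of implicit implementation. Vehicle and Car are hypothetical names chosen for this example.

```go
package main

import "fmt"

// Vehicle is a hypothetical interface with a single method.
type Vehicle interface {
	Start() string
}

// Car never mentions Vehicle; it implements it just by having Start.
type Car struct {
	Model string
}

// Start satisfies the Vehicle interface.
func (c Car) Start() string {
	return c.Model + " started"
}

func main() {
	var v Vehicle = Car{Model: "Sedan"} // compiles only because Car has Start
	fmt.Println(v.Start())              // prints: Sedan started
}
```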

Implementing multiple interfaces in Go

Multiple interfaces can be implemented at the same time. If a type implements all the methods of each interface, then it implements all of those interfaces. For example, a bird type can implement two interfaces by implementing the methods of both.
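A sketch of one type satisfying two interfaces. The article mentions a bird type; the interface names Flyer and Walker and their methods are assumptions made for this example.

```go
package main

import "fmt"

// Flyer and Walker are hypothetical interfaces for this illustration.
type Flyer interface {
	Fly() string
}

type Walker interface {
	Walk() string
}

// Bird implements both interfaces by having both methods.
type Bird struct {
	Name string
}

func (b Bird) Fly() string  { return b.Name + " flies" }
func (b Bird) Walk() string { return b.Name + " walks" }

func main() {
	b := Bird{Name: "Sparrow"}

	// The same value can be used through either interface.
	var f Flyer = b
	var w Walker = b
	fmt.Println(f.Fly())
	fmt.Println(w.Walk())
}
```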

Composing interfaces together

Interfaces can be composed together. Composition is one of the most important concepts in software development. In Go, an interface can embed other interfaces, and a type that implements the methods of every embedded interface also implements the composed one. This is really helpful where polymorphism is needed.
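A sketch of composing interfaces by embedding. Loader, Saver, Store, MemoryStore, and roundTrip are hypothetical names invented for this example.

```go
package main

import "fmt"

// Loader and Saver are two small hypothetical interfaces.
type Loader interface {
	Load() string
}

type Saver interface {
	Save(s string)
}

// Store is composed by embedding both; its method set is Load plus Save.
type Store interface {
	Loader
	Saver
}

// MemoryStore keeps a single string in memory.
type MemoryStore struct {
	data string
}

func (m *MemoryStore) Load() string  { return m.data }
func (m *MemoryStore) Save(s string) { m.data = s }

// roundTrip saves s through the composed interface and reads it back.
func roundTrip(st Store, s string) string {
	st.Save(s)
	return st.Load()
}

func main() {
	fmt.Println(roundTrip(&MemoryStore{}, "hello")) // prints: hello
}
```

The standard library uses the same pattern: io.ReadWriter embeds io.Reader and io.Writer.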

Values in an interface

Interface values have a concrete value and a dynamic type.

For example, a variable chirper declared with the interface type Bird can hold the concrete value {Chirpir} of a struct type that implements Bird.

Type assertion using the interface

Type assertion is a way to get the underlying value an interface holds. This means that if an interface variable is assigned a string, then the underlying value it holds is that string. A type assertion is written i.(T); the two-value form v, ok := i.(T) reports failure through ok instead of panicking.
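A short sketch of both forms of type assertion. The asString helper is a hypothetical name added for this example.

```go
package main

import "fmt"

// asString uses the two-value form, so a wrong type yields ok == false
// rather than a panic.
func asString(i interface{}) (string, bool) {
	s, ok := i.(string)
	return s, ok
}

func main() {
	var i interface{} = "hello"

	// Single-value form: panics if i does not hold a string.
	s := i.(string)
	fmt.Println(s) // prints: hello

	// Two-value form: n is the zero value of int when the assertion fails.
	n, ok := i.(int)
	fmt.Println(n, ok) // prints: 0 false
}
```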

Type switch using an interface

A type switch is a control structure very similar to an ordinary switch statement; the only difference is that the cases are types, and the switch selects a branch based on the dynamic type of an interface value.
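A minimal type switch sketch. The classify function and its return strings are made up for illustration.

```go
package main

import "fmt"

// classify branches on the dynamic type held by i.
func classify(i interface{}) string {
	switch v := i.(type) {
	case nil:
		return "nil"
	case int:
		return fmt.Sprintf("int %d", v) // v has type int here
	case string:
		return fmt.Sprintf("string %q", v) // v has type string here
	default:
		return fmt.Sprintf("other %T", v)
	}
}

func main() {
	fmt.Println(classify(3))    // prints: int 3
	fmt.Println(classify("go")) // prints: string "go"
	fmt.Println(classify(nil))  // prints: nil
}
```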

Equality of interface values

Two interface values are equal if either of the conditions below is true.

  • They both are nil.
  • They have the same underlying concrete values and the same dynamic type.
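The two conditions above can be checked with a small sketch. The equal helper is a hypothetical name used only here.

```go
package main

import "fmt"

// equal compares two interface values with ==.
func equal(a, b interface{}) bool {
	return a == b
}

func main() {
	var a, b interface{}

	fmt.Println(equal(a, b))        // both nil: true
	fmt.Println(equal(1, 1))        // same type (int), same value: true
	fmt.Println(equal(1, int64(1))) // same number, different types: false
	fmt.Println(equal(1, "1"))      // different types: false
}
```

One caveat: if both values share a dynamic type that is not comparable, such as a slice or a map, the comparison panics at run time.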

Using interfaces with functions

Interfaces can be passed to functions just like any other type. A great advantage of an interface parameter is that it accepts an argument of any type that implements the interface.
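A sketch of a function that takes an interface parameter. Shape, Rect, Square, and totalArea are hypothetical names chosen for this example.

```go
package main

import "fmt"

// Shape is a hypothetical interface with one method.
type Shape interface {
	Area() float64
}

type Rect struct {
	W, H float64
}

type Square struct {
	Side float64
}

func (r Rect) Area() float64   { return r.W * r.H }
func (s Square) Area() float64 { return s.Side * s.Side }

// totalArea accepts any mix of types, as long as each implements Shape.
func totalArea(shapes ...Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

func main() {
	fmt.Println(totalArea(Rect{W: 2, H: 3}, Square{Side: 2})) // prints: 10
}
```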

Uses of an interface

Interfaces are used in Go wherever polymorphism is needed. For example, a function that must accept multiple types can take an interface parameter.

Interfaces are a great feature in Go and should be used wisely.