Narendra Kumar's blog

Download a file from internet and gunzip it in memory in GoLang

In this blog post we I will share how to download a gzip file from a website and process it after decompressing it in memory. We won't be using disk storage.

What is Gunzip or gzip

gzip is a file format and a computer program which can compress/decompress a file. Whenever you want to reduce the size of a file, you can use gzip to compress that file. You can also compress a directory/folder using gzip along with tar: first create a tar and then compress it. Usually a file compressed with gzip ends with .gz extension and a directory compressed with gzip ends with .tar.gz. Since internet can explain gzip better than I can, we will not waste more time on this.

Download a gzip file from IMDB's website

Downloading a gzip file is no different than downloading any other binary file in GoLang. We will use GoLang's net/http package to do it.

resp, err := http.Get("https://datasets.imdbws.com/title.ratings.tsv.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != 200 {
		log.Fatalf("could not pull data from imdb, status code is %d", resp.StatusCode)
	}

The above given code, downloads ratings file from IMDB. Now we have to read the response body and decompress it using compress/gzip package of GoLang as shown below:

gReader, err := gzip.NewReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	scanner := bufio.NewScanner(gReader)

	done := false
	for scanner.Scan() {
		line := scanner.Text()
		println(line)
		done = true
		if done {
			break
		}
	}

The above code wraps the http response body in gzip reader, this reader can decompress the input. And then we wrap this gzip reader inside a scanner. Scanner let us read a io.Reader line by line. The above program, intentionally, just prints the first line of the file downloaded from IMDB because the file is too big and will screw my terminal.

I found this quite easy to be able to download and decompress a file in memory in golang.

Thanks for reading.