Miha Rekar

I’ve been scraping a bunch of websites lately and got bored with using File.write to store cached versions of them. Because I’m still developing the script, I don’t want it to hit the real website every time. A simple way to fix that is the vcr gem. While it was made primarily for testing, you can also use it for this kind of task.

First you need some kind of configuration file that loads before your actual script. I have it in config/vcr.rb:

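require 'vcr'

VCR.configure do |c|
  # Directory where recorded HTTP interactions (cassettes) are stored
  c.cassette_library_dir = 'cassettes'
  # Intercept HTTP requests through WebMock (requires the webmock gem)
  c.hook_into :webmock
  # Outside of a cassette block, let requests reach the network as usual
  c.allow_http_connections_when_no_cassette = true
end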
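
With this configuration VCR uses its default record mode, which records a cassette the first time and replays it afterwards. If the cached pages should refresh now and then, one optional tweak (not part of the setup above) is to set default cassette options. A minimal sketch, where the one-week interval is an arbitrary value chosen for illustration:

```ruby
require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'cassettes'
  c.hook_into :webmock
  c.allow_http_connections_when_no_cassette = true
  # Assumed addition: re-record any cassette older than one week (value in seconds)
  c.default_cassette_options = { re_record_interval: 7 * 24 * 60 * 60 }
end
```

Deleting a file from the cassettes directory has a similar effect for a single cassette: the next run records it again.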

Then I have a Shared module with a cache method, which I include in any class that needs this functionality:

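module Shared
  # Records HTTP requests made inside the block on the first run
  # and replays them from the cassette afterwards.
  def cache(name)
    VCR.use_cassette(name) do
      yield
    end
  end
end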

And now you can use this magic to have the website you’re scraping instantly cached:

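require 'json'
require 'open-uri'

def github_for(user)
  cache "gh-#{user}" do
    # URI.open comes from open-uri; a bare open(url) no longer works on Ruby 3.0+
    response = URI.open("https://api.github.com/users/#{user}").read
    JSON.parse(response)
  end
end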
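
For comparison, the File.write approach this replaces might look roughly like the sketch below. It is a reconstruction for illustration, not the original code: the FileCache module, the file_cache method, and the cache directory are all hypothetical names.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for the manual approach: store the block's string
# result in a file and reuse it on later calls with the same name.
module FileCache
  # Assumed location; any writable directory would do.
  CACHE_DIR = File.join(Dir.tmpdir, 'scrape-cache')

  def file_cache(name)
    path = File.join(CACHE_DIR, "#{name}.txt")
    # Cache hit: return the stored body without running the block
    return File.read(path) if File.exist?(path)

    # Cache miss: run the block (the real request) and store its result
    FileUtils.mkdir_p(CACHE_DIR)
    body = yield
    File.write(path, body)
    body
  end
end
```

This only stores the response body, and every call site has to be wrapped by hand. A VCR cassette records the whole HTTP interaction (request, status, headers, and body), so the scraping code itself stays unchanged.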