Alacrity – Web Page Scraper in Ruby

I wrote a simple web page scraper in Ruby. Feel free to fork it and contribute!

https://github.com/raycoding/alacrity

Alacrity is a simple Ruby scraper: given the URL of a web page, it finds all the information you want for the elements you search for. Alacrity depends on the Nokogiri gem and uses Nokogiri's built-in CSS selectors.

Let's say the page at your source URL contains the following snippet:

<html>
    <body>
        <h3>I want to be scraped!</h3>
        <h3>Dont forget to scrap me too!</h3>
    </body>
</html>

Running Alacrity with a search for 'h3' elements will return something like this:

{"all_h3_tags"=>{0=>"I want to be scraped!", 1=>"Dont forget to scrap me too!"}}

Sample Run:

get_me_info = Alacrity::Source.new("http://some_url.com") do
  fetch "all_h3_tags", :lookup=>'h3'
end

Custom procs and lambdas!

By default Alacrity returns the text of every element it finds, but you can pass your own Proc to shape the structured data however you want. Note that the 'elem' inside your proc/lambda is a Nokogiri::XML::Element, so read the Nokogiri documentation to see the methods and attributes available on Nokogiri::XML::Element.

Sample Run:

get_me_info = Alacrity::Source.new("http://www.infibeam.com/") do
  fetch "all_anchor_tags", :lookup=>'a',:post_fetch=>proc {|elem| elem.attributes["href"].value rescue nil}
end

get_me_info.structured_data["all_anchor_tags"] should give you the links of all the anchor tags!
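Because the proc above falls back to nil for anchors without an href, the result usually needs a little cleanup. The sketch below assumes the result has the same index-to-value shape as the h3 example; the sample links are made up for illustration and are not real Alacrity output.

```ruby
# Hypothetical result, shaped like the h3 example: index => value.
# Anchors without an href come back as nil because of the `rescue nil`.
structured_data = {
  "all_anchor_tags" => {
    0 => "/home",
    1 => nil,
    2 => "/home",
    3 => "http://example.com/about"
  }
}

# Drop the nils and the duplicates to get a clean list of links.
links = structured_data["all_anchor_tags"].values.compact.uniq
```

Here links ends up holding each distinct link once, in the order it first appeared.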

Alacrity

State Machine for Ruby Classes and Rails ActiveRecord Models

State Machine Flowchart Gem in Ruby

Hi, today I am writing about a gem I developed in Ruby/Ruby on Rails for implementing state machines in Ruby classes (plain Ruby classes or Rails Active Record models). In object-oriented programming, objects move from one state to another depending on the workflow, so it pays for a developer to harness the power of a finite state machine in object flow, as it draws on the concepts of Business Process Management. The gem works out of the box for both non-Active Record Ruby classes and Rails Active Record models!

State Flow + WorkFlow = Business Process Management

State flow on its own is not process flow. Here I have focused on implementing the state of an object via transitions, and on how much more you can do with states. It is worth mentioning that I was very much inspired by another Rubygem, AASM: https://github.com/aasm/aasm

How does it work? Here is a snippet of the Ruby code

flowchart do
  init_flowstate :init

  flowstate :init do
    preprocess Proc.new { |o| p "blah blah" }
    postprocess :some_method
  end

  flowstate :uploaded do
    preprocess :do_something
    postprocess Proc.new { |o| p "File has been uploaded" }
  end

  action :upload do
    transitions :from => :init, :to => :uploaded, :condition => :file_parsable?
  end
end

Gem – Flowchart

– Flowchart is a rubygem for State Machine (state-action-transition) workflows.
– Flowchart works out of the box both for ActiveRecord and Non-ActiveRecord Models.
– It provides an easy DSL to create a state machine flow for your model’s object.
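To make the state-action-transition idea concrete, here is a minimal plain-Ruby sketch of a single guarded transition. It is not how the Flowchart gem is implemented; TinyUpload, TRANSITIONS and fire are hypothetical names used only for illustration.

```ruby
# A tiny state-action-transition sketch in plain Ruby.
# TinyUpload and its methods are hypothetical, not part of Flowchart.
class TinyUpload
  attr_reader :state

  # action name => source state, target state and guard condition
  TRANSITIONS = {
    upload: { from: :init, to: :uploaded, condition: :file_parsable? }
  }.freeze

  def initialize(parsable)
    @parsable = parsable
    @state = :init
  end

  def file_parsable?
    @parsable
  end

  # Fire an action: move to the target state only when the object is in
  # the expected source state and the condition method returns true.
  def fire(action)
    rule = TRANSITIONS.fetch(action)
    return false unless state == rule[:from] && send(rule[:condition])

    @state = rule[:to]
    true
  end
end

good = TinyUpload.new(true)
good.fire(:upload)   # condition holds, so the state changes

bad = TinyUpload.new(false)
bad.fire(:upload)    # condition fails, so the state stays :init
```

The table of transitions plays the role of the action block in the DSL snippet: the action names the move, and the condition decides whether it is allowed.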


Nested Hash to Dotted Notation Hash in Ruby

This is a practical problem I ran into in day-to-day Ruby coding: I have a nested Hash and want to convert it to a dotted-notation Hash, say for pretty printing or even for writing the Hash out to CSV!

An example of a nested Hash:

my_nested_hash = {
 "os"=>{
        "1"=>"windows",
        "2"=>"linux",
        "3"=>"mac"
       },
 "language"=>"ruby",
 "options"=>
  {
   "1"=>{"name"=>"Color", "value"=>"Black"},
   "2"=>{"name"=>"Color", "value"=>"Blue"},
   "3"=>{"name"=>"Color", "value"=>"Red"}
  }
}

A pretty nasty nested example! When I print my Hash, I just want to display it in a simplified dotted notation, which is much easier to read and handy for other purposes you can think of.

my_dotted_hash = {
                  "options.1.name"=>"Color", "options.1.value"=>"Black",
                  "options.2.name"=>"Color", "options.2.value"=>"Blue",
                  "options.3.name"=>"Color", "options.3.value"=>"Red",
                  "language"=>"ruby",
                  "os.1"=>"windows",
                  "os.2"=>"linux",
                  "os.3"=>"mac"
                 }
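One way to get from the nested form to the dotted form is a small recursive method. This is only a sketch of the idea, not necessarily the approach taken in the full post; to_dotted_hash is a hypothetical name, and the sample input is a trimmed-down version of the Hash above.

```ruby
# Recursively flatten a nested Hash into a single-level Hash whose keys
# are the nested keys joined with dots. to_dotted_hash is a hypothetical name.
def to_dotted_hash(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), result|
    dotted_key = prefix ? "#{prefix}.#{key}" : key.to_s
    if value.is_a?(Hash)
      # Descend into nested Hashes, carrying the key path built so far.
      result.merge!(to_dotted_hash(value, dotted_key))
    else
      result[dotted_key] = value
    end
  end
end

my_nested_hash = {
  "os" => { "1" => "windows", "2" => "linux" },
  "language" => "ruby",
  "options" => { "1" => { "name" => "Color", "value" => "Black" } }
}

my_dotted_hash = to_dotted_hash(my_nested_hash)
```

Each leaf value keeps its full key path, so "options" => {"1" => {"name" => "Color"}} turns into the single key "options.1.name".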


Mounting exFAT FileSystem in Ubuntu

exFat FileSystems in Ubuntu

3TB Seagate Expansion External 3.5-inch USB3.0 Desktop Hard Drive Black

Recently I bought a 3TB external Seagate HDD and was wondering if I could use it on both Linux and Windows. Windows' default NTFS is only readable on OS X, not writable, and Windows computers can't even read Mac-formatted HFS+ drives. FAT32 works on both OSes but has a 4GB size limit per file, so it isn't ideal. The same goes for Linux distros. That said, I chose Seagate because it gave me the option to format the device as exFAT, unlike others such as Western Digital; most portable/external HDD manufacturers out there don't worry much about file systems as long as the drive works on Windows (sucks for me!).

Okay, now coming to the point. exFAT is a lot faster than FAT32/NTFS and does a better job of handling transfers of large volumes of files. In other words, if you regularly use external HDDs or portable USB drives, you are probably looking for compatibility across most OSes: Windows, Linux and Mac. Luckily exFAT is there (one thing to thank Microsoft for), and Windows 7 supports it out of the box. exFAT builds on the simplicity of FAT but is designed specifically for large-volume media, which is probably what you are looking for in mass storage.

Note: The Linux exFAT drivers support reading and writing exFAT volumes, but unfortunately exFAT is so new that there is currently no support for creating exFAT volumes in Linux. No worries for the moment, as you can format the HDD in Windows 7/Vista first and choose exFAT from the dropdown while formatting. exFAT also works by default on Mac! Pretty awesome, right?

Now that you have formatted the HDD in Windows to use exFAT, it is time to get some Linux stuff, as Linux can't read exFAT volumes by default, not without a little extra help from drivers. Mac reads exFAT by default, so there is nothing to do on the Mac side.


Writing Domain Specific Language (DSL) in Ruby – Day 1

Domain Specific Language In Ruby

After object-oriented programming gave coders and programmers the power to do more, a bunch of programmers started wondering how to push the boundaries further: defining their own language on top of an existing language, one that does the same job with much more flexibility. That is when the term domain-specific language (abbreviated DSL) came up.

Assume you are writing a coffee order machine in Ruby, and let's see how you would like to express it.

CoffeeMachine.orders do |place_order|
  place_order.mega.coffee
  place_order.strong.expresso
  place_order.extraLarge.cappuccino.sugarless.halfCup
end

Wow, I bet that if you could write a program that way, you would realize how powerful it is! Why DSLs?

  • Expressive (True Nature of High Level Language)
  • DRY (Don’t Repeat Yourself)
  • Less Code, More Flexibility
  • Scalability in terms of Functionality
  • Highly Modular
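As a rough idea of how the chained order syntax shown earlier could be supported, here is a minimal sketch built on method_missing. It is one possible approach, not the design from the full post; OrderLine, OrderPad and the Array-of-symbols result are assumptions made for illustration.

```ruby
# One order line: every unknown method call is recorded as a word of the
# order, and the line returns itself so that calls can be chained.
# OrderLine and OrderPad are hypothetical names.
class OrderLine
  attr_reader :words

  def initialize
    @words = []
  end

  def method_missing(name, *args)
    # Leave Ruby's implicit conversions (to_ary, to_str, ...) alone.
    return super if name.to_s.start_with?("to_")

    @words << name
    self
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

# The object yielded to the block: the first word starts a new order line.
class OrderPad
  attr_reader :lines

  def initialize
    @lines = []
  end

  def method_missing(name, *args)
    return super if name.to_s.start_with?("to_")

    line = OrderLine.new
    @lines << line
    line.public_send(name)
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

module CoffeeMachine
  # Runs the block against a fresh pad and returns the recorded orders.
  def self.orders
    pad = OrderPad.new
    yield pad
    pad.lines.map(&:words)
  end
end

orders = CoffeeMachine.orders do |place_order|
  place_order.mega.coffee
  place_order.strong.expresso
end
```

In this sketch each line of the block becomes an Array of symbols, such as [:mega, :coffee]. A real machine would then validate those words against the sizes and drinks it knows about.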


Array of Hashes into single Hash in Ruby

Something I use all the time! We all have use cases where we end up with an Array of Hashes and want to merge it into one single Hash in Ruby, right? It is pretty easy to do that in Ruby.

Let's take an example

a_hash=[{"key1"=>"value1"},{"key2"=>"value2"},{"key3"=>"value3"},{"key4"=>nil}]

The result we want is something like this: final_hash = {"key1"=>"value1", "key2"=>"value2", "key3"=>"value3", "key4"=>nil}

Well here’s the snippet

final_hash = Hash[*a_hash.collect{|h| h.to_a}.flatten]

and final_hash becomes {"key4"=>nil, "key3"=>"value3", "key2"=>"value2", "key1"=>"value1"}, exactly what we wanted! (The key order you see may differ depending on your Ruby version.)
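An alternative worth knowing is to merge the hashes one by one with reduce. This is a sketch of a different technique from the flatten-based snippet above, and the with_array_value sample is made up to show where the two differ: flatten also unpacks values that are themselves Arrays, while merge leaves them intact.

```ruby
a_hash = [{"key1"=>"value1"}, {"key2"=>"value2"}, {"key3"=>"value3"}, {"key4"=>nil}]

# Merge the hashes pairwise, left to right; on a duplicate key the
# later hash wins.
final_hash = a_hash.reduce({}) { |merged, h| merged.merge(h) }

# Array values survive untouched, because nothing gets flattened.
with_array_value = [{"key1"=>["a", "b"]}, {"key2"=>"value2"}]
merged_safely = with_array_value.reduce({}) { |merged, h| merged.merge(h) }
```

For the original example both approaches give the same Hash, so which one to use is mostly a matter of whether your values can be Arrays.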
