Showing posts with label Kiprosh. Show all posts

Thursday, May 31, 2012

Image crawler engine using "anemone + geospider + redirect_follower + memCached"

Recently our team at Kiprosh built an image crawler engine with the following requirements (to be developed strictly within one week or less).

1) crawl and spider all images from a given URL (http or https)
2) crawl as a background process
3) scrape up to 3 levels deep per link (configurable depth)
4) save image URLs in the DB, with caching
5) keep displaying the crawled images in the UI as they arrive
6) ability to tag these images
7) ability to "multi select" (using shift + mouse clicks), tag and untag images
8) a good-looking UI with ajax, pjax for pagination, tagging, and the ability to cache
9) search feature based on tags
10) multi size crawl feature

After thorough research and a quick PoC, we used gems such as geo-spider, anemone, redirect_follower and memcached to build this crawler engine. Thanks to these gems, the overall app turned out to be very stable, scalable, fast and elegant. There were other gems comparable to geo-spider, but for our requirement geo-spider served the specific purpose of retrieving the metadata we needed from the source URLs. Anemone is another cool gem, for depth crawling of a URL; the other gems and patterns we had tried earlier didn't let us dive as deep.
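To make the "configurable depth" requirement concrete, here is a minimal sketch of a depth-limited image crawl using only the Ruby standard library. It is not the code of our engine and not Anemone's API: the names `crawl_images` and `extract_attr`, the `fetcher` callable and the example.test pages are all assumptions made up for illustration, and the regex-based HTML handling is a simplification (a real crawler would use a proper HTML parser).

```ruby
require 'set'
require 'uri'

# Hypothetical helper: pull attribute values out of raw HTML with a regex.
# This is a simplification for the sketch, not a robust HTML parser.
def extract_attr(html, tag, attr)
  html.scan(/<#{tag}\b[^>]*\b#{attr}\s*=\s*["']([^"']+)["']/i).flatten
end

# Breadth-first crawl from start_url, following links up to max_depth levels.
# fetcher is any callable that maps a URL string to HTML (or nil on failure),
# so the network layer can be swapped out.
def crawl_images(start_url, fetcher, max_depth: 3)
  seen   = Set.new([start_url])
  images = Set.new
  queue  = [[start_url, 0]]

  until queue.empty?
    url, depth = queue.shift
    html = fetcher.call(url)
    next if html.nil?

    # Collect every image on this page as an absolute URL.
    extract_attr(html, 'img', 'src').each do |src|
      images << URI.join(url, src).to_s
    end

    # Stop following links once the configured depth is reached.
    next if depth >= max_depth

    extract_attr(html, 'a', 'href').each do |href|
      link = URI.join(url, href).to_s
      queue << [link, depth + 1] if seen.add?(link)
    end
  end

  images.to_a
end

# Usage with an in-memory "site" (made-up pages) standing in for HTTP fetches.
demo_pages = {
  'http://example.test/'      => '<a href="/about">About</a><img src="/logo.png">',
  'http://example.test/about' => '<img src="team.jpg">'
}
puts crawl_images('http://example.test/', ->(url) { demo_pages[url] })
```

Passing the fetcher in as a callable keeps the depth logic separate from HTTP, redirects and caching, which is where gems like the ones above would plug in.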

Links to these gems and their respective project pages:
GeoSpider, Anemone, Redirect_Follower, Memcached

On Heroku, we had to use the following gems for caching.

#gem "memcached-northscale", "~> 0.19.5.4"
#gem 'memcached-northscale'
#gem 'dalli'

On our dedicated node, memcached worked just fine without any customization or alternative versions.

Queue_Classic and Mechanize Gems

Creating a list of useful (indeed very useful) gems for future reference. We recently used the following gems in the Rails apps (actually products and tools) that we are developing for our clients.

1) queue_classic - Though we have used Resque (on Redis) and RabbitMQ in three or four apps in the past, for this specific requirement we wanted a fast, low-maintenance message queue that is simple and intuitive to use. queue_classic is built on PostgreSQL, which avoids the need to add Redis or 0MQ on Heroku. queue_classic doesn't noticeably increase database load; on the contrary, it is pretty efficient because it relies on inherently reliable PostgreSQL features such as LISTEN/NOTIFY. (Oracle offers comparable notification features too.) Thus, to avoid running a Resque worker on Heroku, and because of its sheer simplicity, we opted for queue_classic. BTW, queue_classic is used extensively by the Heroku Postgres team to monitor the health of their customer databases, processing hundreds of jobs per second.

RailsCasts - http://railscasts.com/episodes/344-queue-classic
Source page and more information is at - https://github.com/ryandotsmith/queue_classic

2) Mechanize - What a wonderful gem, we must say :) We have done lots of scraping, but Mechanize is really handy for automating interaction with websites. We are building a web-based tool for an enterprise to automate a large number of routine tasks for their helpdesk support staff. Mechanize scripts help us execute these routine tasks, which saves a ton of time for the support team.

RailsCasts - http://railscasts.com/episodes/191-mechanize
Source page and more information is at https://github.com/tenderlove/mechanize
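
To illustrate the idea behind the queue_classic item above, here is a small in-memory sketch of a queue that stores each job as plain data (a "Receiver.method" string plus JSON-encoded arguments) and lets a worker resolve and call it later. This is only a sketch of the pattern under our own assumptions: `TinyQueue`, `ImageJobs.tag` and the example URL are hypothetical names, the array stands in for a PostgreSQL table, and LISTEN/NOTIFY is not modelled at all.

```ruby
require 'json'

# In-memory stand-in for a jobs table. queue_classic itself keeps jobs in
# PostgreSQL; this hypothetical class only mimics the shape of the idea.
class TinyQueue
  def initialize
    @rows = []
  end

  # Store the job as plain data: a "Receiver.method" string plus JSON args.
  def enqueue(method_name, *args)
    @rows << { method: method_name, args: JSON.generate(args) }
  end

  # Take the oldest job, resolve the receiver constant, and call the method.
  # Returns nil when there is nothing to do.
  def work_one
    row = @rows.shift
    return nil if row.nil?

    receiver_name, message = row[:method].split('.', 2)
    receiver = Object.const_get(receiver_name)
    receiver.public_send(message, *JSON.parse(row[:args]))
  end

  def size
    @rows.size
  end
end

# Hypothetical job target, e.g. tagging a crawled image.
module ImageJobs
  def self.tag(url, tag)
    "#{url} tagged #{tag}"
  end
end

queue = TinyQueue.new
queue.enqueue('ImageJobs.tag', 'http://example.test/logo.png', 'logo')
puts queue.work_one
```

Because a job is just a string and some JSON, it can live in an ordinary database row, which is what makes a database-backed queue attractive when you don't want to run a separate queue server.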


Saturday, December 10, 2011

iOS Testing Strategy

Recently our team at Kiprosh started using FoneMonkey for automation testing of iOS apps. FoneMonkey is free and has strong support for both iPad and iPhone devices.

Here are the tools that we evaluated and shortlisted for iOS testing before finalizing our testing strategy for iOS apps (tools marked in green are recommended):

Unit Testing
  • Built-in Xcode unit testing using OCUnit - (a little complicated; requires too many steps to create a unit test, and the process isn't automated)
  • GTM - Google Toolbox for Mac - (suitable) http://code.google.com/p/google-toolbox-for-mac/wiki/iPhoneUnitTesting
  • GHUnit - (most suitable, easy to set up, has a GUI, but has documentation and build issues)
  • Mocking - OCMock and OCHamcrest

Automation Testing tools and framework

Integration / Automated Builds
  • Hudson with xcodebuild
  • Code coverage (gcovr) with Cobertura XML

Testing Strategy
Finally, we formulated the following testing strategy for iOS apps:
1) GHUnit for unit testing
2) For memory management, we must verify that when an allocation fails we get the expected return value of nil rather than garbage
3) Automation test suite (mostly using FoneMonkey, or Sikuli or DeviceAnywhere)
4) Finally, plug all the unit tests and the automation test suite into CI using Hudson
5) Integrate often

Monday, August 23, 2010

API - Face recognition across the Web

A very interesting and useful API launched by Face.com is making the rounds and spreading across the web quickly. CelebrityFindr, Tagger Widget and Poster Yourself already use Face.com technology.

Developers can now tap into Face.com’s technology to add facial recognition to all kinds of web apps for free via the open API.

I am trying it out in one of our web apps, so I will post a review. Face.com claims that its technology can identify faces even in poor lighting or poor focus, or when subjects have glasses, facial hair, and supposedly even Halloween costumes.

The company’s technology was recently used in an impressive social Augmented Reality app.

I couldn't find any other free API for facial recognition that is this powerful.

Monday, May 10, 2010

Allan Jardine accepted my jQuery code for his next version of the plugin

Allan Jardine, developer of the famous jQuery plugins DataTables and KeyTable, has accepted my code (a reusable function) for inclusion in the next version of his KeyTable jQuery plugin.

Here is the mail transcript with him:

________________________________

On 10 May 2010, at 1:15, Allan Jardine wrote:

Hi Rohan,

Very nice idea for an API function. I've just updated KeyTable to 1.1.5 and including a function which basically provides exactly the same functionality, but with the name fnSetPosition and it will take either x,y coordinates or a TD node to focus on.

Thanks for sending me this.

Regards,
Allan




On 8 May 2010, at 12:16, Rohan Daxini wrote:

Hi Alan,

For your information and just to keep you posted, I added this new function in KeyTable plugin to resolve multiple issues I was facing while attaching KeyTable to my DataTable. Now I just call this function as keys.fnSetPositionAndFocus(0, 0) in my JavaScript code to refocus on cell[0,0]. This can be used to reset and refocus on any cell of DataTable as the selection and focus were lost.

All the issues like reattachment, refocus, reselection, repeated cell selection and lost blue rectangle is resolved now due to the inclusion of this api function

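this.fnSetPositionAndFocus = function(x, y) {
    // Record (x, y) as the tracked cell position
    _iOldX = x;
    _iOldY = y;
    // Move focus from the previously focused cell to the cell at (x, y)
    _fnRemoveFocus(_nOldFocus);
    _fnSetFocus(_fnCellFromCoords(x, y));

    // Also add the 'focus' class to the matching table cell
    var str = 'table tbody ' + 'tr:eq(' + y + ')>td:eq(' + x + ')';
    $(str).addClass('focus');
};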

Thanks,
Rohan

Saturday, May 8, 2010

Yo Ti Fy

Sat down to wonder whether I could have a new way to use the internet. Instead of 'google'ing and 'bing'ing for the information I need, let someone deliver it to me based on my defined sets and preferences.
(I just wanted to challenge the notion that most of us see search engines as the hub of our internet experience, though it is true to a great extent.)

I started writing a small service for this task. Meanwhile I came across Yotify - a site which caters to the specific task of tracking updates and information on a topic of interest, such as an event, an item for sale on eBay or Craigslist, an RSS feed, headlines, etc. (like a flavor of Google Alerts and Yahoo Alerts, but broader in scope).

On Yotify.com, users track anything they deem worth tracking by sending out ‘scouts’ that send back regular reports on a particular sports headline, job listing, real estate posting, etc. The frequency of updates can be set to daily or hourly, and different preferences can be selected so that when users send a scout out to monitor an item on Craigslist, for example, they can find out when the price drops to the amount they’re willing to pay. Searches are also area-specific, and users can specify the state and area they want relevant results from.

Transcript of a chat with Rob Bouganim, Yotify's CEO
http://arstechnica.com/old/content/2008/09/hands-on-and-invites-yotify-your-personal-web-secretary.ars

Thursday, April 15, 2010

"Game" it up

In the hunt to break the "monotonous" routine of sprints and activities for my team, I decided to experiment with a "Gaming" contest at Kiprosh. I picked a couple of strategy games from http://freeonlinegames.com.

We had an hour full of fun and laughter, with interesting "quick" meetings to come up with new strategies for improving scores (I had set a scoring target that had to be reached to win). By the end it had become too addictive to leave, but the team quickly became more charged up and rejuvenated.

It also taught us as a team to keep changing strategies and not to repeat mistakes in order to achieve higher scores and goals.

Monday, April 12, 2010

Being KIS'ed

I started my journey towards building Kiprosh in March 2010, with a foundation, goals and a strong inclination towards
  • Innovation
  • KIS (keep it simple)
  • Write Less Do More
  • "wow" factor (my favorite and very close to my heart) in whatever we develop or deliver
Though it's very early to comment, the "KIS (keep it simple) yet effective" way of doing things is helping us achieve other important goals too.

I would appreciate your thoughts and comments about your own experience and experiments on the subject.

N2 vs. Umbraco - CMS made easy

A few days back I was evaluating open source ASP.NET-based Content Management Systems (CMS). I shortlisted two of them for my purpose, i.e. N2 and Umbraco. The competition was not even close, as N2 was the clear winner. That said, the usual advice is to choose between them based on web content management requirements, i.e. Umbraco is preferable for heavy (or large) content and N2 for light (small to medium) content.

Umbraco is well established in the developer community, while N2 is pretty new. Umbraco's user base is comparatively huge, but N2 still delivers thanks to its sheer simplicity. I like the KIS (Keep It Simple) philosophy in development, and N2 lets me deliver with KIS in mind.

Thanks to Shivani for helping me finalize one of the CMS for our purpose.