Top 10 searches in the last 30 days

by Jamie 29. January 2010 11:10

We recently added a feature to the Mugurdy homepage listing the top 10 searches in the past 30 days. Some of these are pornographic so we're working on filtering out the innumerable combinations of dirty words.

Think of it as a zeitgeist of what people want to look at; it can range from part numbers of electronic devices through to sociology homework questions.

Tags:

General

Happy Christmas!

by Jamie 25. December 2009 12:17

We'd like to wish you and your loved ones a fantastic Christmas and new year. Roll on 2010!


API for screenshots of webpages

by Jamie 5. December 2009 12:29

We're experimenting with an API that lets developers get screenshots of particular webpages. As this is in development, we can't guarantee it'll always be around in its current guise, but it's worth having a play with. To request an image of a website from the Mugurdy API, simply make a request to http://api.mugurdy.com/ImageURL.aspx and pass the web address in the url parameter, e.g.:

http://api.mugurdy.com/ImageURL.aspx?url=http://en.wikipedia.org

To specify a size (280x210, 560x426 or 1024x768), pass the image width in the size parameter, e.g.:

http://api.mugurdy.com/ImageURL.aspx?url=http://en.wikipedia.org&size=280
http://api.mugurdy.com/ImageURL.aspx?url=http://en.wikipedia.org&size=560
http://api.mugurdy.com/ImageURL.aspx?url=http://en.wikipedia.org&size=1024

This can easily be used in existing webpages by specifying the API URL as the src attribute of an <img> tag, e.g.:
<img src="http://api.mugurdy.com/ImageURL.aspx?url=http://www.irishtimes.com&size=280" border="1" title="The Irish Times" />
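If you're building these URLs in code rather than by hand, a small helper keeps the parameters tidy. This is only a sketch in Python: the helper name `build_screenshot_url` is made up for illustration, and it percent-encodes the target address (standard practice for a URL passed as a query parameter), which assumes the API decodes it in the usual way.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://api.mugurdy.com/ImageURL.aspx"
# Widths taken from the sizes listed in the post (280x210, 560x426, 1024x768).
ALLOWED_WIDTHS = (280, 560, 1024)


def build_screenshot_url(page_url, width=None):
    """Return the API URL for a screenshot of page_url (hypothetical helper)."""
    params = {"url": page_url}
    if width is not None:
        if width not in ALLOWED_WIDTHS:
            raise ValueError("width must be one of %s" % (ALLOWED_WIDTHS,))
        params["size"] = width
    # urlencode percent-encodes the target address so characters such as
    # '&' or '?' inside it cannot be mistaken for API parameters.
    return API_ENDPOINT + "?" + urlencode(params)


print(build_screenshot_url("http://en.wikipedia.org", 280))
```

The same string can then be dropped into the src attribute of an <img> tag.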

Any feedback or suggestions on this API are appreciated!

Tags:

Technical

DIY Server Cooling

by Jamie 30. November 2009 12:07

We checked in on our server room on Saturday afternoon. Thanks to the chilly Irish weather, our DIY cooling solution has lowered the temperature in our server room quite dramatically.
It actually felt warmer outside the building than it did inside!

Tags:

Technical

Practically infinitely expandable data storage [Part 2/2]

by Jamie 25. November 2009 12:38

Some background.
We nicknamed our data storage system TROVE. It's a tortured acronym of The Repository Of Vast Expanse. Each TROVE server has a RAID5 array of five 1TB hard disks, which gives 4TB of usable storage per TROVE server in the current configuration. The TROVE servers are entry-level-ish servers that we built ourselves: an entry-level motherboard with plenty of USB ports, a quad-core CPU, 4GB of RAM, onboard gigabit Ethernet, a decent ATX PSU and a tower ATX case. (Larger rack servers would have a much higher storage density than our current tower servers, but they would be considerably more expensive.) The TROVE servers run Windows Server 2008.

So right now we have 4TB of protected storage per server. Want another 8TB of storage? Throw another two TROVEs into the mix. Ultimately we will migrate all of our storage to higher density solutions - still based on our TROVE concept - but for the time being it's 4TB/server.
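The arithmetic behind those figures is simple: RAID5 gives up one disk's worth of space to parity, so usable capacity is (disks - 1) x disk size. A quick Python sketch, with function names invented purely for illustration:

```python
def raid5_usable_tb(disks, disk_size_tb):
    """Usable capacity of one RAID5 array: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID5 needs at least three disks")
    return (disks - 1) * disk_size_tb


def cluster_usable_tb(servers, disks=5, disk_size_tb=1):
    """Total usable capacity across identically configured servers."""
    return servers * raid5_usable_tb(disks, disk_size_tb)


print(raid5_usable_tb(5, 1))   # one TROVE: five 1TB disks
print(cluster_usable_tb(2))    # two extra TROVEs
```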

Each TROVE shares its storage volume on the network for other computers to access.

 

FAT.
There are a number of places in the Mugurdy web application and its supporting service applications where we read from or write to the file system. As we don't have a single contiguous large volume to write to (we have multiple network shares to store data in), we need something to act as a FAT (File Allocation Table) so that we can find the appropriate share to read from or write to.

So we built a database in Microsoft SQL Server 2008 to act as our FAT. A simplified version of the database tables is as follows:

[Image: diagram of the FileAllocationTable, FoldersAllocationTable and ServersAllocationTable tables]

 

 

Each filename in use throughout the system is based on a GUID, so it's effectively guaranteed to be unique. To find where a file is on the network we can then use the following stored procedure:

CREATE PROCEDURE [dbo].[GetFullFilePathFromName]
(
       @FileName            nvarchar(255)
)
AS
       -- Return the file name together with the folder and server that hold it
       SELECT
              FileAllocationTable.Filename, FoldersAllocationTable.FolderName, ServersAllocationTable.ServerName
       FROM
              FileAllocationTable INNER JOIN FoldersAllocationTable ON FileAllocationTable.FolderID = FoldersAllocationTable.FolderID
              INNER JOIN ServersAllocationTable ON FoldersAllocationTable.ServerID = ServersAllocationTable.ServerID
       WHERE
              FileAllocationTable.Filename = @FileName
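To play with the idea without a SQL Server instance, the same lookup can be sketched in Python against an in-memory SQLite database. The table and column names come from the stored procedure above; everything else is an assumption made for the example: the column types, the sample server and folder names (TROVE01, Store01), and the way the three values are joined into a \\server\folder\file path.

```python
import sqlite3

# Column types are guesses; only the names appear in the stored procedure.
SCHEMA = """
CREATE TABLE ServersAllocationTable (
    ServerID   INTEGER PRIMARY KEY,
    ServerName TEXT NOT NULL
);
CREATE TABLE FoldersAllocationTable (
    FolderID   INTEGER PRIMARY KEY,
    FolderName TEXT NOT NULL,
    ServerID   INTEGER NOT NULL REFERENCES ServersAllocationTable(ServerID)
);
CREATE TABLE FileAllocationTable (
    Filename TEXT PRIMARY KEY,
    FolderID INTEGER NOT NULL REFERENCES FoldersAllocationTable(FolderID)
);
"""

LOOKUP = """
SELECT f.Filename, d.FolderName, s.ServerName
FROM FileAllocationTable AS f
JOIN FoldersAllocationTable AS d ON f.FolderID = d.FolderID
JOIN ServersAllocationTable AS s ON d.ServerID = s.ServerID
WHERE f.Filename = ?
"""


def get_full_file_path(conn, filename):
    """Return a UNC-style path for filename, or None if it is not recorded."""
    row = conn.execute(LOOKUP, (filename,)).fetchone()
    if row is None:
        return None
    name, folder, server = row
    # Assumed layout: \\ServerName\FolderName\Filename
    return "\\\\{}\\{}\\{}".format(server, folder, name)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample rows, for illustration only.
conn.execute("INSERT INTO ServersAllocationTable VALUES (1, 'TROVE01')")
conn.execute("INSERT INTO FoldersAllocationTable VALUES (1, 'Store01', 1)")
conn.execute("INSERT INTO FileAllocationTable VALUES ('example.png', 1)")

print(get_full_file_path(conn, "example.png"))
```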

 

As file locations rarely (if ever) change across the system, it's safe to cache the output from this stored procedure wherever it's being used. Each TROVE server runs a Windows Service that updates the database with the amount of free disk space available on its local RAID volume. We update disk usage about once a minute.
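The free-space reporting is easy to picture in code. The sketch below is Python rather than a real Windows Service, and the `publish` callback stands in for whatever writes the figures to the database; both the function names and that callback are assumptions for illustration.

```python
import shutil
import time


def free_space_report(volume_path):
    """Measure total and free space on the volume that holds volume_path."""
    usage = shutil.disk_usage(volume_path)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
    }


def run_reporter(volume_path, publish, interval_seconds=60, iterations=None):
    """Call publish(report) every interval_seconds.

    iterations=None loops forever, as a service would; a number makes the
    loop finite, which is handy for trying it out.
    """
    count = 0
    while iterations is None or count < iterations:
        publish(free_space_report(volume_path))
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)


# Try it once against the current directory, printing instead of saving.
run_reporter(".", print, iterations=1)
```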

When we want to write a file to the system we pick a random folder (one that's available for writing) and write the file to the file system. If the write was successful, we simply add a record to the FileAllocationTable with the appropriate FolderID. If the write was not successful, we pick another random folder on the network and repeat until we get a successful write. (We haven't seen any write failures yet, but we're covered if one does happen in the future.)

As a TROVE server starts to fill up we add another TROVE server to the network. Data will then be written to that new TROVE server, and to any existing TROVEs that are still available for writes. Once a TROVE's free space drops below 20% of its capacity it is disabled for writing; it remains available for reads.
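The write path described above can be sketched roughly as follows. This is Python for illustration only: folders are plain dictionaries, the allocation table is an in-memory dict rather than the SQL table, and the 20% threshold is read as "at least 20% free". One small difference from the description: instead of re-picking at random indefinitely, the sketch shuffles the writable folders and tries each one once.

```python
import os
import random
import uuid

# Assumption: a folder stays writable while at least 20% of its volume is free.
MIN_FREE_FRACTION = 0.20


def writable_folders(folders):
    """Keep only the folders whose volume still has enough free space."""
    return [f for f in folders if f["free_fraction"] >= MIN_FREE_FRACTION]


def store_file(data, extension, folders, allocation_table, rng=random):
    """Write data under a GUID-based name and record where it ended up."""
    filename = "{}{}".format(uuid.uuid4(), extension)
    candidates = writable_folders(folders)
    rng.shuffle(candidates)  # random order; each folder is tried at most once
    for folder in candidates:
        try:
            with open(os.path.join(folder["path"], filename), "wb") as handle:
                handle.write(data)
        except OSError:
            continue  # this write failed, so move on to another folder
        # Only a successful write gets an allocation record.
        allocation_table[filename] = folder["folder_id"]
        return filename
    raise OSError("no writable folder accepted the file")
```

Recording the allocation only after the write succeeds means a failed write never leaves a record pointing at a file that isn't there.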

 

Reading files.
Here is a simplified version of reading in an image file from a hard disk:

Dim FileName As String = "SomeImage.png"

'Assume the file is in c:\
Dim MyBitmap1 = New System.Drawing.Bitmap("c:\" & FileName)

 

And to read the same file from the TROVE system:

Dim FullPath As String = (New Foundation.FileStorage).GetFullFilePathFromName(FileName)
Dim MyBitmap2 = New System.Drawing.Bitmap(FullPath)

So you can see it doesn't require much code, and it needs only minimal changes to application logic to get it working.
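Because file locations rarely change, a lookup helper like this is a natural place for a cache. As a rough Python illustration (not the Foundation.FileStorage code itself), a filename-to-path function can be wrapped with a memoising cache so repeat requests never touch the database. The `slow_lookup` function and the path it returns are stand-ins invented for the example.

```python
from functools import lru_cache


def cached(lookup, maxsize=100000):
    """Wrap a filename -> path lookup so repeat requests skip the database."""
    return lru_cache(maxsize=maxsize)(lookup)


# Stand-in for the real database call; it records how often it is hit.
calls = []


def slow_lookup(filename):
    calls.append(filename)
    return "\\\\TROVE01\\Store01\\" + filename  # hypothetical location


get_path = cached(slow_lookup)
get_path("SomeImage.png")
get_path("SomeImage.png")  # answered from the cache
print(len(calls), get_path.cache_info())
```

One thing to watch with this approach: a cached "not found" answer would also stick, so a real version would probably avoid caching misses.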

 

Looking to the future.
We're working on adding another level of redundancy to the system. At present each TROVE server can tolerate a single drive failure and still operate normally. However, if two drives failed in a single server, or the server itself failed, then all of the files stored on that server would be offline.

Soon files will be located on multiple servers across the network. This will mean that the system can keep operating normally through multiple file server failures across the network.

Tags:

Technical

CoolIris.com Plugin Integration

by Jamie 23. November 2009 08:03

We're experimenting with the really cool CoolIris.com plugin. If you haven't played with it yet head over to www.cooliris.com to pick it up, and then take a gander at: http://www.mugurdy.com/irisSearch.aspx

For example, a sample search for Dublin: http://www.mugurdy.com/IrisSearch.aspx?q=Dublin&r=&ap=false

Tags:

Technical

It's gettin' hot in here!

by Jimmy 20. November 2009 09:25


 

Hard disks, switches, routers, power supplies, and 20+ servers generate a lot of heat. In our main offices, where we built the search engine initially, we’d solved this problem by simply opening the window of the room which housed all our gear, the door in the opposite wall, and a window at the back of the building. Because our offices run from the front to the back of the building, on the fourth floor, the air flow is excellent even with the windows only slightly opened.

However, in the Guinness Enterprise Centre, where we’d set up our data centre, we only had a room with one window in it. Even with the door left open (not a realistic solution) the heat built up significantly over the course of a few days. We contacted the maintenance people for the building, and were put in touch with an AC rental company. They recommended a big beasty air conditioning unit (that itself generated quite a bit of heat) for €100 per week. The unit would barely fit in the room. It would extract hot air from the room via an exhaust pipe stuck out the window. But where would fresh air come from? The door (a fire door) would always be closed. The only source of new air would be the hot air in the room! The AC goliath wouldn't cut it.

A bit of bush mechanics was needed.

We bought a five metre length of corrugated metal cooker exhaust pipe and an electric fan which fit inside the 20cm or so diameter pipe. One end we pushed out the window so that it dangled down the wall about one metre (our room is on the third floor and overlooks a small internal courtyard). The other end (with the fan attached with gaffer tape) we positioned in the centre of the room. The racking with our equipment lay between this cold air outlet and the window. Another, larger electric fan was then placed in the open window and a seal created around it with fibre board in such a way that the whole thing was stable. The fans were connected to a separate electrical circuit from any of our own equipment.

We had fresh cold air being pumped into the room and hot air being extracted.
Voila! DIY air conditioning.

 


Practically infinitely expandable data storage [Part 1/2]

by Jamie 19. November 2009 11:10

The Mugurdy Visual Search Engine needs to store a lot of data. Big storage isn't necessarily a problem nowadays, but when you're working on a shoestring budget you can't afford any extravagant spending. We needed a storage system that would effectively, cheaply and securely store many terabytes of image and HTML data. The storage system should also expand to any size required without taking any part of it offline. For reference:

1024 MegaBytes = 1 GigaByte (1GB)
1024 GigaBytes = 1 TeraByte (1TB) (e.g. 18.75 terabytes is the amount of text in the Library of Congress, if it were all digitized)
1024 TeraBytes = 1 PetaByte (1PB) (e.g. 50 petabytes is the amount of data equivalent to the entire written works of humankind, from the beginning of recorded history, in all languages)
1024 PetaBytes = 1 ExaByte (1EB) (e.g. 5 exabytes is the amount of data equivalent to all words ever spoken by humans, in text form)
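The same ladder of 1024s is easy to express in code if you want to convert between units. A small Python sketch, with helper names chosen just for this example:

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes, using 1024 per step."""
    return value * 1024 ** UNITS.index(unit)


def human_readable(num_bytes):
    """Format a byte count using the largest unit that keeps it below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return "{:.2f} {}".format(size, unit)
        size /= 1024


print(to_bytes(1, "TB"))
print(human_readable(to_bytes(1024, "GB")))
```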

While we don't have PBs of storage in use at present, the system has the capacity to scale to pretty much any size we need. All we need is somewhere to put the hard disks!
Over the next few days I'll be posting a blog entry on how to build such a system from off-the-shelf parts.

Tags:

Technical

Welcome to our Blog

by Jamie 17. November 2009 11:47

Hi!

Welcome to the new Mugurdy Visual Search Engine Blog. Over the coming weeks we'll post about the development of Mugurdy from an idea to the live website you see today.
We'll also post some interesting technical challenges we overcame which you may come across in your own web applications.

Stay Tuned!

