Google Software Engineer Explains MapReduce Concept

January 28, 2008 00:24 by anjel
Mark Chu-Carroll is a Google software engineer, and in a personal blog post this week he explained the concept behind one of Google’s programming models: MapReduce, which splits a task across many computers in Google’s server farm (server farm, or single supercomputer, depending on how you look at it) so it can be crunched quickly.

What is MapReduce? What does it do?

Suppose you’re at work, and you need to do something that’s going to take a long time to run on your computer. You don’t want to wait. But you don’t want to go out and spend a couple of million dollars buying a supercomputer. How do you make it run faster? One way is to buy a whole bunch of cheap machines, and make it run on all of them at once. Another is to notice that your office has lots of computers – pretty much every office has a computer on the desk of every employee. And at any given moment, most of those computers aren’t doing much. So why not take advantage of that? When your machine isn’t doing much, you let your coworkers borrow the capability you’re not using; when you need to do something, you can borrow their machines. So when you need to run something big, you can easily find a pool of a dozen machines.

The problem with that approach is that most programs aren’t written to run on a dozen machines. They’re written to run on one machine. To split a hard task among a lot of computers is hard.

MapReduce is a library that lets you adopt a particular, stylized way of programming that’s easy to split among a bunch of machines. The basic idea is that you divide the job into two parts: a Map, and a Reduce. Map basically takes the problem, splits it into sub-parts, and sends the sub-parts to different machines – so all the pieces run at the same time. Reduce takes the results from the sub-parts and combines them back together to get a single answer.

The key to how MapReduce does things is to take input as, conceptually, a list of records. The records are split among the different machines by the map. The result of the map computation is a list of key/value pairs. Reduce takes each set of values that has the same key, and combines them into a single value. So Map takes a set of data chunks, and produces key/value pairs; reduce merges things, so that instead of a set of key/value pair sets, you get one result. You can’t tell whether the job was split into 100 pieces or 2 pieces; the end result looks pretty much like the result of a single map.
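
As a rough illustration of the model described above, here is a minimal single-process sketch in Python. It is not Google's library: the names `map_words`, `reduce_counts` and `run_mapreduce` are made up for this example, and the "machines" are simulated by plain list chunks. It counts words, with the map emitting (word, 1) pairs and the reduce summing the values that share a key.

```python
from collections import defaultdict


def map_words(record):
    """Map step: turn one input record (a line of text) into key/value pairs."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_counts(key, values):
    """Reduce step: combine all values that share a key into a single value."""
    return sum(values)


def run_mapreduce(records, num_chunks):
    # Split the records into chunks; in a real system each chunk
    # would be handed to a different machine.
    chunks = [records[i::num_chunks] for i in range(num_chunks)]

    # Run the map over every chunk and group the emitted values by key.
    grouped = defaultdict(list)
    for chunk in chunks:
        for record in chunk:
            for key, value in map_words(record):
                grouped[key].append(value)

    # Reduce each group of same-key values down to one value.
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


records = ["the quick brown fox", "the lazy dog", "the fox"]
result_2 = run_mapreduce(records, num_chunks=2)
result_3 = run_mapreduce(records, num_chunks=3)
```

Both calls produce the same word counts, which is the point made above: from the result alone you cannot tell how many pieces the job was split into.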

Mark adds that “The beauty of MapReduce is that it’s easy to write.” and that MapReduce (or “M/R”) programs are “really as easy as parallel programming ever gets.” For a more in-depth look at MapReduce and some actual source code, take a look at the Google research publication on the subject.



Draft of the HTML 5 specification published

January 22, 2008 22:52 by anjel
The World Wide Web Consortium (W3C) has published a draft of the specification for the fifth version of the HTML hypertext markup language.

HTML 5 will replace the obsolete HTML 4, whose specification was released in 1997. The fifth version of HTML will give developers much wider functionality and will simplify the creation of interactive sites and web applications. HTML 5 will also include additional elements for working with forms and for embedding multimedia content in web pages.

Among the most notable and important innovations in HTML 5, the W3C highlights programming interfaces for two-dimensional graphics, means of embedding video and audio in web pages, and tools that let visitors edit sites. Other changes in the fifth version of HTML are aimed at simplifying how common page elements are represented in code.

The draft was developed by a special HTML working group formed in March of last year. The group includes about five hundred participants, among them experts from companies such as Apple, Google, IBM, Microsoft, Mozilla, Nokia, Opera, BEA Systems, Cisco, France Telecom and Hewlett-Packard. The W3C emphasizes that the published specification will be further revised and refined, so web developers and programmers are invited to send their comments and suggestions to the W3C.

The draft of the HTML 5 specification is available on this page.



How Google Ranks Web Pages

January 21, 2008 22:28 by anjel

This is a “101” summary of how Google ranks web pages. The fundamentals are much the same for Yahoo and, to a lesser degree, MSN Search, but given that Google is the 800 LB. gorilla, I will concentrate on it.

In the title, you will see the word “pages” and not “sites”. Rising tides do not “lift all boats” when it comes to search engine dynamics. A page will rank on its own merits, not because it may be a part of a larger web site.

Also, to rank in Google is to rank for a particular search term or key word. A web page or web site just doesn’t “rank” in Google. A photographer in Chicago may rank #3 for the term “chicago wedding photographer”. He may also rank # 7,345 for the term “chicago”.

Before I go into the nuts and bolts of the Google machine, it is helpful to understand Google’s core motivations.

In their white papers and in their doctoral thesis, the founders of Google, Larry Page and Sergey Brin, made it clear that links were the arbiters of web page relevance more than anything else. If you look at Google’s webmaster guidelines, the majority of the words are dedicated to linking and to Google’s idea of what is a good link and what is not. Whether or not you agree with the particulars of Google’s definitions, the clue is hard to avoid: Google gives relevance to links.

Google is in business to make money. They really don’t care (and probably shouldn’t care) about your web page rankings or whether you make money. They do not own the internet – they are just another web site, albeit a very popular one. While millions of people have made their livings, and some their fortunes, directly or indirectly from the traffic Google has provided to their web pages, Google is not the 800 LB gorilla of the internet anymore. Of search, yes; of traffic, no.

Google feels that if it provides its users with fresh, relevant content, we will in turn tend to click its AdWords ads as well as hire Google to display ads on our own websites (AdSense). Google is driven to figure out the most relevant web pages it can deliver for any given search term, in hopes of satisfying the searcher. If it can do this on a regular basis, it will remain the king of search. Filtering out non-relevant results, including machine-made junk pages and pages created solely for search engine spam, is a daily task.

The 2 ways a page can get ranked.

A page can either get ranked by its on-page factors or its off-page factors. That’s it. Google has created algorithms to determine each, then more algorithms to combine the two. Hence, a web page’s rank is for a particular term at a particular moment in time.

On-page factors.

This is what our web pages are. The content. The titles. The inter-linking dynamics. What we publish on our web pages gives Google an idea of what the page is supposed to be about. Google determines this by way of its spiders, which “crawl” the web page to “read” the information. The spider can read straight and simple HTML text. It understands meta data. It understands titles of pages.

It cannot see a picture. It cannot interpret a Flash-based animation. Even if you have text in either, the text is not in ASCII code; it is a picture. To Google you have said virtually nothing. That’s why text-based HTML sites will do better in the Google index than Flash-based sites built around a similar idea. If you are reading this and have a Flash-based site, should you trash it and start over? No. Web 2.0 will help you out. Plus there are workarounds that make a Flash-based site very visible to Google and ultimately get it traffic.

Off-page factors.

Off-page factors are what the internet is doing to your page. The only dynamic thing a web page can do to another is link to it. This is what makes the world wide web. The more links any one page gets from other pages, the greater the chance that page has relevancy for a particular topic or category.

It is one thing for 2 web masters flush with cash and time to create massive and impressive web sites. They can have the “correct” structure and keyword content. They can provide for a great user experience. But one thing they do not prove to Google is the relevance of their keywords. If Google has indexed two similar sites, how is it to determine which one you would rather see, #1 or #2?

While it’s relatively easy to create a site and make it into your vision, it is quite another thing to have other web masters link to it. Google feels that when other people are willing to link to your web site, they are, in effect, voting for your page. The more links, the more votes. The more of the internet that votes for your site, the less Google has to guess about your page based on its on-page factors. After all, why would web masters place links on their sites that would lead their visitors astray? They wouldn’t. Content can be created; links have to be earned.
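
To make the "links as votes" idea concrete, here is a small Python sketch of the simplified PageRank iteration as described in Page and Brin's published paper. It is only an illustration, not Google's actual ranking system: the function name `pagerank`, the damping factor of 0.85, the iteration count and the `*.example` page names are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to c.example, which links to a.example.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(links)
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page it links to benefits in turn, while pages nobody links to keep only the baseline share.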

Links say a lot about a web page: where the link is coming from, how many links the linking page itself has and where they come from, and most importantly, what the actual link itself says. If you get a bunch of links to your pottery home page that say: Illinois pottery artist, click here (with only “click here” as the link), all Google really knows is that one web page is linking to another and saying something about “click here”. It can determine what the linking page is and that it is linking to a page about pottery, but that’s it. If that same site changed its link to: …for an Illinois Pottery Artist, click here (with “Illinois Pottery Artist” as the link), Google is now told that this particular hyperlink is definitely about an Illinois pottery artist. This example is about using anchor text, the actual text of the link, to maximize SEO for your page.
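
The difference between the two pottery links is easier to see in markup. The sketch below uses Python's standard `html.parser` module to pull out the text inside each link, which is roughly what a crawler has to work with. The class and function names and the `pottery.example` URL are made up for illustration, and nothing here says how Google actually weights that text.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects (href, anchor text) pairs from a snippet of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Start recording text when a link opens.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text that appears inside a link counts as anchor text.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # When the link closes, store its target and its visible text.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def anchor_texts(html):
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


weak = 'Illinois pottery artist, <a href="http://pottery.example/">click here</a>'
strong = 'For an <a href="http://pottery.example/">Illinois Pottery Artist</a>, click here'

weak_links = anchor_texts(weak)      # anchor text is just "click here"
strong_links = anchor_texts(strong)  # anchor text names the topic
```

Both snippets point at the same page, but only the second one attaches the words "Illinois Pottery Artist" to the link itself.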

Anchor text is probably one of the most important off-page factors that can influence your page’s ranking in Google. If you do nothing else, get a bunch of links, 50-200 with several versions of keywords you want to rank for, and you should do very well in Google.

Reciprocal links. There has been some talk about the rise and fall of reciprocal linking. Two to three years ago, reciprocal linking was all the rage. Web pages were getting pushed up the rankings largely based on their incoming anchor text links. Then Google did an “update”. This is where Google tells you one day that such and such a web page is #1 for a term, and the next day that page has dropped to #67. What?

For better or worse, Google is always trying to develop methods that they think will provide accurate and relevant results after you type in the search query. If they feel some page is getting to the top position due to too much user manipulation, well they just may knock it down a peg or two. It happened to John Chow, the internet marketing blogger. For a while Google didn’t even let him rank for his own name. Now I see he ranks for his name again and all is well at Google.

I did notice that a lot of the web pages that had hundreds of back links with proper anchor text pretty much stayed the same through the last two Google smackdowns. I’ll almost bet your next paycheck that Google can’t or does not want to mess with reciprocal linking! But they say to get “natural, editorial, one-way links”. Sure, that’s fine if you are Adobe (speaking of which, anyone want to venture a guess who ranks for the term “click here” and why?) or Apple or a super blogger like Aaron Wall, whom I respect very much. But just who in the heck is going to link to the local business in any numbers that will make Google wake up and rank them? “Natural”? Not in the real world…

I use a good piece of software called SEO Elite. It lets me track the back links of any web page I enter into it. It also tells me if both web pages link to each other – a reciprocal link.

When I do keyword research for a client, I find out who ranks #1 to #5 for that term in Google. Then I enter each URL into SEO Elite and let it do its thing. I really only care about what is indexed in Google and Yahoo. Results are often mixed, but one thing is sure: every “ranking” page (positions #1 to #5) has hundreds of links, most of which are reciprocal, usually over 85%. Of those reciprocal links, over 60% have the keyword as anchor text, and the linking pages have an average Google PR of only 2.7.
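
For anyone who wants to run the same kind of tally on their own backlink list, here is a hedged Python sketch. It does not use SEO Elite or reproduce its output format; the `Backlink` record, the `summarize` function and the sample rows are hypothetical, and the three figures it computes (reciprocal share, keyword share among reciprocal links, average PR of reciprocal links) are one reading of the numbers quoted above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Backlink:
    source_url: str    # page that links to the ranking page
    anchor_text: str   # visible text of the link
    reciprocal: bool   # True if the ranking page links back
    toolbar_pr: int    # toolbar PageRank of the linking page


def summarize(backlinks, keyword):
    """Return the three ratios discussed in the text for one ranking page."""
    reciprocal = [b for b in backlinks if b.reciprocal]
    with_keyword = [
        b for b in reciprocal if keyword.lower() in b.anchor_text.lower()
    ]
    return {
        "reciprocal_share": len(reciprocal) / len(backlinks),
        "keyword_share_of_reciprocal": (
            len(with_keyword) / len(reciprocal) if reciprocal else 0.0
        ),
        "average_pr_of_reciprocal": (
            mean(b.toolbar_pr for b in reciprocal) if reciprocal else 0.0
        ),
    }


# Made-up sample rows, only to show the shape of the calculation.
sample = [
    Backlink("http://one.example/", "Chicago wedding photographer", True, 2),
    Backlink("http://two.example/", "chicago wedding photographer studio", True, 3),
    Backlink("http://three.example/", "click here", True, 4),
    Backlink("http://four.example/", "photos", False, 5),
]
summary = summarize(sample, "chicago wedding photographer")
```

Swapping in a real export of backlinks for `sample` would give the percentages for an actual ranking page.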

That was a mouthful. What Google says is sometimes not what Google actually does.

So much for every guru telling you that you need to have 1-way links from just the right web sites with high Google PR if you want to improve your SEO.

Google PR stands for Page Rank.

Google PR is an arbitrary ranking of a page’s importance on a scale of 0 to 10. You can install the Google Toolbar to see the PR of any page. Some people feel their page has made it when it achieves a PR of 5, 6, or 7. Only a handful of web pages ever rank 8, 9, or 10. Those are usually the Googles, Yahoos, and super blogs of the world. With the onslaught of Web 2.0 marketing, Google’s PR is becoming less relevant as a barometer of a page’s real value.

It is still fair to say that Google does not punish but rather rewards reciprocal linking that has a good dose of structured anchor text. Don’t expect Google, or the heady SEO gurus, to admit this.

Source http://www.roresteen.com/how-google-ranks-web-pages/ 

 

 



Software pirate in crabs for games scam

January 18, 2008 01:49 by anjel
A man who sold counterfeit games, films and music to trawlermen in exchange for a box of crabs has been fined £3,000.

George Clarke, 61, of Seamill in North Ayrshire, made the trade with seamen returning to Troon Harbour after an extended fishing trip.

He pleaded guilty at Ayr Sheriff Court to charges of illegally selling copied games, films and music.

"Clarke originally approached the returning fleet on 3 February 2006 with the intention of exchanging his copied and counterfeit discs for cash," said the Entertainment & Leisure Software Publishers Association in a statement.

"But he soon discovered that none of the crew was carrying any cash, so he had to make do with a fish supper in the form of a selection of fresh crabs."

Officers from Strathclyde Police and investigators from North Ayrshire Trading Standards searched Clarke's premises later that day.

They found three computers and more than 200 discs containing illegally copied games for PC, PlayStation 2, PSP and Xbox.



China Software Industry Report 2007-2008

January 15, 2008 00:39 by anjel
Research and Markets (http://www.researchandmarkets.com/reports/c79470) has announced the addition of China Software Industry Report, 2007-2008 to their offering.

The operating revenue of China's software industry reached RMB378.499 billion in the first three quarters of 2007, up 23.6 percent on the same period of the previous year and above the 20.1 percent growth rate of the electronic information industry.

Since 2006, the Ministry of Information and other relevant ministries have issued a series of policies to support the development of large, well-known software enterprises. As large enterprises' demand for informatization approaches saturation, demand from small and medium-sized enterprises will be the new source of market growth. According to our forecast, investment in information systems by small and medium-sized enterprises will reach US$15.87 billion by 2010. Because the agriculture tax, the animal husbandry tax and the tax on special agricultural products have been abolished, farmers' disposable income has increased considerably. With the further opening of agriculture, the growth of farmers' income and the progress of urbanization, information systems in rural areas will be a new area of market potential.

An attractive profit model and the demand for informatization are driving the fast development of China's software industry. Chinese vendors have comparative advantages in software services and in mixed profit models. Based on market scale, profit margins and comparative advantages, we are confident about the development of the software outsourcing and management software segments.

In the first half of 2007, China's offshore outsourcing revenue amounted to RMB6.53 billion. The compound annual growth rate over the coming five years could reach 37.9 percent. Because application software is characterized by high barriers to entry and is quite attractive to users, well-known enterprises in this field are expected to grow strongly. The steady supply of qualified talent in China secures the strong growth of China's offshore outsourcing business.

China's management software market is gradually maturing, and cooperation and acquisitions are becoming the trend. Market concentration is rising steadily. The endogenous growth of the profit model, barriers to entry, economies of scale and the positive feedback effects of the management software industry mean that large companies will grow ever larger. We believe that vendors with self-developed products, rich experience and client resources will hold comparative advantages, and that the advantages of leading enterprises and specialized-product enterprises will be clear.



IT industry in India

January 13, 2008 21:43 by anjel

One of the pillars of modern India, and one the country is relying on to complete its transformation from an economic backwater to a global superpower, is its IT industry.

From humble beginnings the industry has grown beyond what anyone imagined 20 years ago. It is currently worth about US$47 billion, or about 5.4 per cent of India's gross domestic product, and is still growing at about 30 per cent per year. It employs 1.6 million people directly, and many more indirectly.

Its phenomenal growth has led India to take IT to heart as a national industry, and everyday Indians have a real sense of pride now that its large tech companies are rivalling the established IT giants like IBM. This is proof to them that India can play and win on the global stage.

India's first IT company was Tata Consultancy Services (TCS), which sprang into life in 1968 as an offshoot of the giant Tata group. The industry was virtually nonexistent until the 1980s, when India's low labour costs and many fluent English speakers made it an attractive place to set up call centres and perform routine software development.

The seeds were planted then, but the growth of the Indian IT industry really began to skyrocket in the past 10 years on the back of Y2K and the first Internet boom.

The country's IT exports grew eight-fold in seven years – from US$4 billion in 2000 to US$32 billion in 2007 – while the size of its workforce has increased tenfold. It added 300,000 new employees in the past year alone.
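
As a quick sanity check on what "eight-fold in seven years" means per year, here is a small Python sketch of the standard compound annual growth rate formula. The function name `cagr` is just a label for this example; the inputs are the export figures quoted above, in billions of US dollars.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate that turns
    start_value into end_value over the given number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# US$4 billion in 2000 to US$32 billion in 2007, i.e. seven years of growth.
export_growth = cagr(4.0, 32.0, 7)

# Applying that rate for seven years should reproduce the end value.
rebuilt = 4.0 * (1.0 + export_growth) ** 7
```

The result comes out at roughly 35 per cent a year, which sits a little above the "about 30 per cent per year" growth figure quoted earlier for the industry as a whole.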

In this period TCS's revenues have doubled every two years. It now makes more than US$4.3 billion each year, with close to US$1 billion in after-tax profit and a market capitalisation of US$26 billion, making it one of the biggest IT firms in the world. It has more than 100,000 employees, one third of them hired in the past year.

Satyam, another Indian IT giant, took 17 years to make US$1 billion in annual revenues, but only two years after that to reach US$2 billion. It should hit US$3 billion this year, one year later, while hiring another 10,000 to 12,000 staff to add to its 56,000 employees. Revenue growth sits at about 45 per cent annually.

It is tempting in the West to end the story of India's IT boom there, but it's really just getting started. Both TCS and Satyam expect revenue growth to continue at the pace set by recent years in the short term. The industry hopes to hit US$60 billion in exports by 2010, but there is every sign it will get there early.

According to Nasscom, India's national IT industry group, about 20 per cent of the world's estimated total IT spend of US$1 trillion is outsourced. About 45 per cent of this is sent offshore, and about 80 per cent of this offshored work goes to India. India's IT firms say there is still room for revenue growth in traditional outsourcing fields, but there are a lot of other IT tasks Indian companies are keen to do as well. The falling price of bandwidth makes this more and more attractive.
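
The chain of percentages above is easier to follow when multiplied out. The Python sketch below does only that arithmetic, using the shares quoted from Nasscom; the function name `offshore_funnel` and the choice to work in billions of US dollars are assumptions made for the example.

```python
def offshore_funnel(total_it_spend, outsourced_share, offshore_share, india_share):
    """Apply each quoted share in turn to the total IT spend."""
    outsourced = total_it_spend * outsourced_share   # share of all IT spend that is outsourced
    offshored = outsourced * offshore_share          # share of outsourced work sent offshore
    to_india = offshored * india_share               # share of offshored work going to India
    return outsourced, offshored, to_india


# US$1 trillion expressed as 1000 billion; shares as quoted in the text.
outsourced, offshored, to_india = offshore_funnel(1000.0, 0.20, 0.45, 0.80)
```

With the quoted shares, that works out to roughly US$200 billion outsourced, US$90 billion sent offshore, and US$72 billion of offshored work going to India.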

TCS is increasingly doing remote management of customers' infrastructure, for example managing 30,000 computers worldwide for German business software giant SAP, handling almost everything from India. "Other than plugging in a desktop, you can pretty much do everything from far away," says Pankaj Baliga, TCS's vice-president.

Today's Indian firms aren't just "body-shopping" organisations, content with call centre work, routine software development and IT support contracts, though there is still a good deal of that.

Firms like TCS and Satyam are now going after high level consulting deals – and winning them.

The consulting market in Western countries like New Zealand, long the domain of well paid locals either self-employed or belonging to multinationals, looks set to be increasingly outsourced to Indian companies using a mix of Indian and non-Indian staff, based both in India and on-site.

Only about 3.5 per cent of TCS's US$4.3 billion in annual revenues is currently from this sort of high-level consulting work, but the company expects it to grow to as much as 10 per cent within a few years.

"We're no longer just filling RFPs," says Virender Aggarwal, director of Satyam's operations in Asia Pacific, Africa, India and the Middle East. "More and more companies are expecting us to do high-end work. Now they're asking us where they need to go."

Another emerging revenue stream is knowledge process outsourcing, Mr Baliga says. This sees Western firms giving data from financial systems or clinical drug trials to Indian IT firms for analysis. They analyse it more cheaply and then send the results back.

Other Indian firms are filing and researching patents for Western firms at about one-third the cost.

The knowledge process outsourcing market is estimated to be worth about US$2.5 billion each year, and some pundits predict it will quadruple within five years.

"There's going to be a lot of work that today we are not even visualising that will have to be outsourced," Mr Baliga says.

Many companies also see huge growth ahead for the outsourcing of engineering services, such as the design, modelling, and testing of airplanes and cars.

Companies like Tata and Larsen and Toubro (India's biggest construction and engineering firm) have access to both the IT and engineering skills, which they say gives them an edge over the competition.

Indian IT firms are no longer the poor cousins of IBM and Accenture, using a low-wage economy to pick up the low-value scraps.

Companies like TCS, which operates in 45 countries with 67 nationalities on staff, are multinational IT firms that see themselves as the equals of any Western ones, that just happen to be based on the subcontinent.

As this new breed of multinational moves further up the value chain, existing IT giants will have to adapt and compete, or risk being overhauled.

 

Source http://www.stuff.co.nz/4355924a28.html 



Bill Gates announces his departure from Microsoft

January 9, 2008 02:43 by anjel

Bill Gates, founder of the software giant Microsoft, has announced his intention to leave his post as head of the company and has shared his forecasts for the future of the computer industry. Gates will leave his post at Microsoft in July 2008, as he announced in his keynote speech at the CES 2008 technology show in Las Vegas.

After leaving his post, Gates plans to concentrate his efforts on charities connected with education and public health, NewsFactor reports. "For the first time I will have some free time," said Gates, who also showed the audience a humorous video about his "last" working day.

During the speech, Gates shared his forecasts for the near future of the computer industry, saying that users of personal computers will soon no longer need such familiar devices as the mouse and the keyboard.

The Microsoft chief called the next decade a "digital era", during which the technologies behind a new generation of mobile phones, televisions and cars, which can be controlled by a single touch or by voice, will develop at a faster pace, The Times reports.

"The first digital decade was very successful, but it is only the beginning. Nothing prevents us from accelerating development and introducing high-tech advances even more actively in the second digital decade," Gates noted.

 



Offshore SOA

January 3, 2008 23:18 by anjel
Offshore outsourcing company ZenSar has announced a partnership with management and governance vendor SOA Software. Though the partnership is new, the two companies aren't strangers to each other. In May last year, ZenSar acquired SOA Software's services business for $24.9 million. As well as giving SOA Software a way to fund its growth without raising additional funds from VCs, the deal also highlighted that international outsourcing isn't always a one-way street.

ZenSar says that the deal makes it one of the few Indian outsourcing companies with a full SOA offering, which is probably true, and at first glance seems odd, considering that offshore outsourcing and SOA are both among the fastest growing trends in IT. However, this is mostly because SOA is quite difficult to move offshore. The barriers are similar to those in front of SOA as a service – namely that SOA involves integration of other systems, most of which are still in-house and onshore. Outsourcing the middleware only really makes sense if the underlying applications are outsourced too.

This is likely what ZenSar and SOA Software are betting on. While SOA itself isn't a great candidate for outsourcing, it should enable outsourcing of the underlying applications by breaking them up into more standardized components. That's good for both offshore IT shops and SaaS providers.




© Copyright 2010
