Q: What is a canonical url? Do you have to use such a weird word, anyway?
A: Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, many people would consider the urls below to be the same:
* www.welcome.com
* welcome.com/
* www.welcome.com/index.html
* welcome.com/home.asp
But technically all of these urls are different. A web server could return completely different content for all the urls above. When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.
Q: So how do I make sure that Google picks the url that I want?
A: One thing that helps is to pick the url that you want and use that url consistently across your entire site. For example, don’t make half of your links go to http://welcome.com/ and the other half go to http://www.welcome.com/ . Instead, pick the url you prefer and always use that format for your internal links.
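One way to check for mixed internal links is to scan a page's HTML and list every link that points at the hostname you did not pick. The sketch below is only an illustration: the function name `inconsistent_links` and the sample HTML are assumptions, not part of any Google tool.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inconsistent_links(html, non_preferred_hosts):
    """Return the links whose hostname is one you decided NOT to use."""
    collector = LinkCollector()
    collector.feed(html)
    unwanted = {host.lower() for host in non_preferred_hosts}
    return [href for href in collector.hrefs
            if urlsplit(href).netloc.lower() in unwanted]


# Hypothetical page that mixes both forms of the home page url.
sample_html = (
    '<a href="http://welcome.com/">Home</a>'
    '<a href="http://www.welcome.com/about">About</a>'
)
mixed = inconsistent_links(sample_html, ["welcome.com"])
```

Here `mixed` holds the one link that uses the non-www form, which is the link you would rewrite to match your preferred url.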
Q: Is there anything else I can do?
A: Yes. Suppose you want your default url to be http://www.welcome.com/ . You can configure your webserver so that if someone requests http://welcome.com/, it does a 301 (permanent) redirect to http://www.welcome.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.).
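Most sites set this up in the webserver configuration, but the logic is small enough to sketch as a WSGI application. This is a minimal illustration, assuming the preferred host is www.welcome.com and the site is served over plain http on the default port. The names `redirect_target` and `app` are made up for the example.

```python
PREFERRED_HOST = "www.welcome.com"  # the hostname chosen as canonical


def redirect_target(host, path, query="", preferred_host=PREFERRED_HOST):
    """Return the url to 301-redirect to, or None if the host is already canonical."""
    if host.lower() == preferred_host:
        return None
    target = "http://" + preferred_host + (path or "/")
    if query:
        target += "?" + query
    return target


def app(environ, start_response):
    """WSGI app: permanently redirect any non-preferred host to the preferred one."""
    target = redirect_target(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"home page\n"]
```

A request for http://welcome.com/ gets a 301 response whose Location header is http://www.welcome.com/, while a request that already uses the www host is served normally.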
Q: If I want to get rid of domain.com but keep www.domain.com, should I use the url removal tool to remove domain.com?
A: No, definitely don’t do this. If you remove one of the www vs. non-www hostnames, it can end up removing your whole domain for six months. Definitely don’t do this. If you did use the url removal tool to remove your entire domain when you actually only wanted to remove the www or non-www version of your domain, do a reinclusion request and mention that you removed your entire domain by accident using the url removal tool and that you’d like it reincluded.
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with uppercase to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
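To make these normalizations concrete, here is a rough sketch of the kind of cleanup described above. It is not how any search engine actually does it. The session parameter names in `SESSION_PARAMS` are assumptions. Only the scheme and hostname are lowercased, because paths can be case-sensitive on some servers.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of session ID parameters; real software varies.
SESSION_PARAMS = {"sid", "phpsessid", "sessionid"}


def normalize(url):
    """Lowercase scheme and host, add a slash to an empty path, drop session IDs."""
    parts = urlsplit(url)
    path = parts.path or "/"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path,
                       urlencode(query), ""))


home = normalize("HTTP://WWW.Welcome.com")
topic = normalize("http://forum.example.com/viewtopic.php?t=42&sid=abc123")
```

With these inputs, `home` becomes http://www.welcome.com/ and `topic` becomes http://forum.example.com/viewtopic.php?t=42 .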
Q: Let’s talk about the inurl: operator. Why does everyone think that if inurl:mydomain.com shows results that aren’t from mydomain.com, it must be hijacked?
A: Many months ago, if you saw someresult.com/search2.php?url=mydomain.com, that would sometimes have content from mydomain. That could happen when the someresult.com url was a 302 redirect to mydomain.com and we decided to show a result from someresult.com. Since then, we’ve changed our heuristics to make showing the source url for 302 redirects much more rare. We are moving to a framework for handling redirects in which we will almost always show the destination url. Yahoo handles 302 redirects by usually showing the destination url, and we are in the middle of transitioning to a similar set of heuristics. Note that Yahoo reserves the right to have exceptions on redirect handling, and Google does too.
Based on our analysis, we will show the source url for a 302 redirect less than half a percent of the time (basically, when we have strong reason to think the source url is correct).
Q: What are supplemental results?
A: Supplemental results usually only show up in the search results after the normal results. They are a way for Google to extend its search database while also preventing questionable pages from getting massive exposure.
Q: Okay, how about supplemental results. Do supplemental results cause a penalty in Google?
A: Nope.
Q: How do I get out of Google’s supplemental results?
A: If your pages were only recently moved into the supplemental results, the problem may be on Google’s side, and you may just want to wait. But also check that you are not making errors such as mixing www and non-www urls, content management errors that deliver the same content at multiple URLs (for example, rotating product URLs), or too much duplicate content for other reasons. You may also want to check that nobody outside your domain shows up in Google when you run a site saturation search (site:mysite.com), and you can look for duplicate content with www.copyscape.com.
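If you already have the text of your own pages on hand, a quick way to spot the same content being served at multiple URLs is to hash each page body and group the URLs that share a hash. The sketch below assumes you supply a mapping of URL to page text yourself. The function name and the sample data are hypothetical, and it only catches exact duplicates after whitespace and case are normalized.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(pages):
    """Group URLs whose page text is identical after normalizing whitespace and case."""
    groups = defaultdict(list)
    for url, body in pages.items():
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only the hashes that more than one URL maps to.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl of your own site.
pages = {
    "http://mysite.com/product?id=7": "Blue widget,  in stock",
    "http://www.mysite.com/product?id=7": "Blue widget, in stock",
    "http://www.mysite.com/about": "About us",
}
dupes = duplicate_groups(pages)
```

Here `dupes` contains one group holding the www and non-www product URLs, which is the kind of pair a 301 redirect would collapse.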



Comment Moderation, Comment Spam, Pingbacks, Trackbacks
Managing Comments In Blogging
In SEM on February 27, 2008 at 9:54 am
• Pingbacks
• Verifying Pingbacks and Trackbacks
• Comment Moderation
• Comment Spam
Trackbacks
In a nutshell, a trackback is a way for one blogger to say to another, “This is something you may be interested in,” by sending a TrackBack ping to the other blog. A typical exchange looks like this:
• Person A writes something on their blog.
• Person B wants to comment on Person A’s blog, but wants her own readers to see what she had to say, and be able to comment on her own blog
• Person B posts on her own blog and sends a trackback to Person A’s blog
• Person A’s blog receives the trackback, and displays it as a comment to the original post. This comment contains a link to Person B’s post
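A trackback ping is an ordinary HTTP POST with form-encoded fields. As a rough sketch, the request Person B's blog sends could be built like this. The field names (`url`, `title`, `excerpt`, `blog_name`) follow the TrackBack specification as commonly implemented, and the URLs and the function name are hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(trackback_url, post_url, title="", excerpt="", blog_name=""):
    """Build (but do not send) the HTTP POST that delivers a trackback ping."""
    fields = {
        "url": post_url,        # permalink of the post that is sending the ping
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return Request(trackback_url, data=body, headers=headers, method="POST")


# Hypothetical addresses: Person A's trackback URL and Person B's post.
ping = build_trackback_ping(
    "http://a.example/trackback/123",
    "http://b.example/post",
    title="My reply",
    excerpt="I disagree because...",
    blog_name="B's blog",
)
```

Passing `ping` to `urllib.request.urlopen` would actually send it. The receiving blog then shows the title, excerpt, and link as a comment on the original post.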
Pingbacks
Pingbacks differ from trackbacks in three ways:
1. Pingbacks and trackbacks use drastically different communication technologies (XML-RPC and HTTP POST, respectively).
2. Pingbacks support auto-discovery: the software automatically finds the links in a post and tries to pingback those URLs, whereas trackbacks must be sent manually by entering the trackback URL that the trackback should be sent to.
3. Pingbacks do not send any content.
• Person A posts something on his blog.
• Person B posts on her own blog, linking to Person A’s post. This automatically sends a pingback to Person A when both have pingback enabled blogs.
• Person A’s blog receives the pingback, then automatically goes to Person B’s post to confirm that the pingback did, in fact, originate there.
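The pingback steps above can be sketched in two parts: the XML-RPC call that Person B's blog sends, and the check that Person A's blog runs on the source page. The method name `pingback.ping` with a source and a target URL comes from the Pingback specification. The helper names and example URLs are assumptions, and nothing is sent over the network.

```python
import xmlrpc.client
from html.parser import HTMLParser


def build_pingback_call(source_url, target_url):
    """Serialize the pingback.ping(sourceURI, targetURI) XML-RPC request body."""
    return xmlrpc.client.dumps((source_url, target_url), methodname="pingback.ping")


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs
                              if name == "href" and value)


def source_links_to_target(source_html, target_url):
    """Receiver-side check: does the source page really link to the target post?"""
    collector = HrefCollector()
    collector.feed(source_html)
    return target_url in collector.hrefs


# Hypothetical URLs: Person B's post (source) links to Person A's post (target).
payload = build_pingback_call("http://b.example/reply", "http://a.example/post")
verified = source_links_to_target(
    '<p>See <a href="http://a.example/post">this post</a>.</p>',
    "http://a.example/post",
)
```

The payload carries only the two URLs, which matches the point that pingbacks do not send any content. The receiver fetches the source page itself before accepting the pingback.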
Verifying Pingbacks and Trackbacks
Comments on blogs are often criticized as lacking authority, since anyone can post anything using any name they like: there’s no verification process to ensure that the person is who they claim to be. Trackbacks and Pingbacks both aim to provide some verification to blog commenting.
Comment Moderation
Comment Moderation is a feature that lets the website owner or author monitor and control the comments on each post, and it can help in tackling comment spam. You can delete unwanted comments, approve good ones, and make other decisions about the comments.
Comment Spam
Comment Spam refers to useless comments (or trackbacks, or pingbacks) on blog posts. These are often irrelevant to the content of the post and can contain one or more links to other websites or domains. Spammers use Comment Spam to get a higher PageRank for their domains in Google, either so that they can sell those domains at a higher price in the future or to obtain a high ranking in search results for an existing website.
Spammers are relentless; because there can be substantial money involved, they work hard at their “job.” They even build automated tools (robots) to rapidly submit their spam to the same or multiple weblogs. Many webloggers, especially beginners, sometimes feel overwhelmed by Comment Spam.
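Because spam comments tend to carry links, a simple moderation rule is to hold any comment with several links for review and to reject comments containing words you have blocked. The sketch below is a hypothetical illustration of such a rule, not the behavior of any particular blogging platform. The threshold and the word list are assumptions you would tune yourself.

```python
import re

LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def moderation_decision(comment_text, max_links=2, blocked_words=()):
    """Return 'spam', 'hold', or 'approve' for a submitted comment."""
    lowered = comment_text.lower()
    # Any blocked word marks the comment as spam outright.
    if any(word.lower() in lowered for word in blocked_words):
        return "spam"
    # max_links or more links sends the comment to the moderation queue.
    if len(LINK_PATTERN.findall(comment_text)) >= max_links:
        return "hold"
    return "approve"


decisions = [
    moderation_decision("Nice post!"),
    moderation_decision("See http://a.example and http://b.example"),
    moderation_decision("Cheap pills here", blocked_words=["pills"]),
]
```

A rule like this does not stop determined spammers, but it keeps automated link drops out of public view until a person has looked at them.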