janos erdelyi

C# Trackback, Part 2

this part of the series on how i implemented trackback in C# revolves around the trackback receiver.

this is based on the trackback specification.

i will not be presenting one giant block of code like i did with Part 1 since this is a larger piece. i will try to retain a sense of order however.

alrighty, enough preface, let’s jump in.

first off, the trackback page is expecting a POST with a required field named “url”. please note that all of my form and querystring field names will be case-sensitive.

one thing that was initially confusing to me was that in addition to the trackback POST information, the page also needs a piece of querystring info to determine the article in question which is being trackbacked to.

for the sake of this article, the querystring info will be filtered down to a System.Int32 value which is the unique id of the article – aptly named “articleId”.
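to make the two inputs concrete, here is a small sketch of the form body a sender would POST. this is not from the original code: the class and method names are made up for illustration, and the field names are the lower-case ones from the specification.

```csharp
using System;
using System.Collections.Generic;

// hypothetical helper, not part of the original code:
// builds the form-encoded body that a trackback sender would POST
public static class PingExample
{
    public static string BuildFormBody(string url, string title, string excerpt, string blogName)
    {
        List<string> pairs = new List<string>();
        pairs.Add("url=" + Uri.EscapeDataString(url)); // the only required field
        AddIfPresent(pairs, "title", title);
        AddIfPresent(pairs, "excerpt", excerpt);
        AddIfPresent(pairs, "blog_name", blogName);
        return string.Join("&", pairs.ToArray());
    }

    // optional fields are simply left out when they are null or empty
    private static void AddIfPresent(List<string> pairs, string name, string value)
    {
        if (!string.IsNullOrEmpty(value)) {
            pairs.Add(name + "=" + Uri.EscapeDataString(value));
        }
    }
}
```

a body built this way would be POSTed as application/x-www-form-urlencoded to an address such as http://example.com/trackback.aspx?articleId=5 (the host and page name here are assumptions). the article id travels in the querystring, everything else in the form.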

i will skip the incredibly boring code revolving around converting the querystring System.String info into System.Int32. suffice it to say that if a proper integer cannot be pulled from the querystring, an error is thrown since this can be considered an improper page request.
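for anyone who does want the boring part, a minimal sketch could look like the following. it is an assumption about how the conversion might be done, not the original code; the helper name is hypothetical, and treating only positive ids as valid is also an assumption.

```csharp
using System;
using System.Globalization;

// hypothetical helper, not part of the original code
public static class ArticleIdParser
{
    // returns true only when raw is made up purely of digits,
    // fits in an Int32 and is greater than zero (assumed id range)
    public static bool TryParseArticleId(string raw, out int articleId)
    {
        articleId = 0;
        if (string.IsNullOrEmpty(raw)) {
            return false;
        }
        // NumberStyles.None rejects signs, whitespace and separators
        bool parsed = int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out articleId);
        return parsed && articleId > 0;
    }
}
```

in the page this would be fed Request.QueryString["articleId"], and a false result would be answered with an error response.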

assuming we have gotten this far, i’d also recommend checking that the article by that id does in fact exist and is active.

i made a handy quick little method to handle all of my messages, since according to the specification, messages must follow a specific format.

private void serverResponse(
	int errorCode,
	string errorMessage
) {
	Response.ClearContent();
	Response.ContentType = "text/xml";
	Response.ContentEncoding = System.Text.Encoding.UTF8;
	
	Response.Write("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
	Response.Write("<response>");
	Response.Write("<error>" + errorCode.ToString() + "</error>");
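	if (errorCode != 0) {
		//escape the message so characters like < and & cannot break the xml
		Response.Write("<message>" + System.Security.SecurityElement.Escape(errorMessage) + "</message>");
	}
	Response.Write("</response>");
	
	//Response.End() stops the page here, so nothing after a call to serverResponse runs
	Response.End();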
}

you really should take a second to familiarize yourself with the response specification if you have not already. it is very simple and very direct.
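to show the two shapes a response can take, here is a string-returning sketch of the same format that can be run outside of a web page. the class and method names are hypothetical, and escaping the message with SecurityElement.Escape is my own choice rather than something the specification mandates.

```csharp
using System.Security;
using System.Text;

// hypothetical helper, not part of the original code:
// builds the trackback response body as a plain string
public static class TrackbackXml
{
    public static string BuildResponse(int errorCode, string errorMessage)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        sb.Append("<response>");
        sb.Append("<error>").Append(errorCode).Append("</error>");
        if (errorCode != 0) {
            // the message element only appears on failure, and is xml-escaped
            sb.Append("<message>").Append(SecurityElement.Escape(errorMessage)).Append("</message>");
        }
        sb.Append("</response>");
        return sb.ToString();
    }
}
```

BuildResponse(0, "Success") yields `<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>`, while BuildResponse(1, "bad <url>") yields the same wrapper with `<error>1</error><message>bad &lt;url&gt;</message>` inside it.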

so, that error thrown as described above was really calling serverResponse instead.

moving on… once we have determined that a real and active article is being requested, the bare minimum required POST information is the url.

//an empty url is just as useless as a missing one
if (Request.Form["url"] == null || Request.Form["url"].Trim().Length == 0) {
	serverResponse(1, "The parameter 'url' is required.");
}

i created a plain little private class in which to stuff the trackback information. here it is to save you some typing if you like:

private class TB
{
	public TB () {
		
	}
	
	public string Title {
		get { return title; }
		set { title = value; }
	}
	
	public string Exerpt {
		get { return exerpt; }
		set { exerpt = value; }
	}
	
	public string Url {
		get { return url; }
		set { url = value; }
	}
	
	public string BlogName {
		get { return blogName; }
		set { blogName = value; }
	}
	
	public string IpAddress {
		get { return ipAddress; }
		set { ipAddress = value; }
	}
	
	public int ArticleId {
		get { return articleId; }
		set { articleId = value; }
	}
	
	private string title;//optional
	private string exerpt;//optional
	private string url;//required
	private string blogName;//optional
	private string ipAddress;//not part of the spec. using this for internals
	private int articleId;//also, not part of spec. using for internal needs
}

you may note that i am also storing the ip address of the POSTer. this is for other uses, such as whitelisting or blacklisting. i know that an ip address can be a bit broad, but if you’re having problems with someone, ip-based blacklisting can be handy. also, i just like to look to see who is tracking back (i’ve been thinking of hooking this up to a geo-ip system).
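as a rough idea of what ip-based blacklisting could look like, here is a minimal in-memory sketch. it is not part of the original code: the class is hypothetical, and a real list would presumably be loaded from wherever you store your data.

```csharp
using System;
using System.Collections.Generic;

// hypothetical helper, not part of the original code:
// an exact-match ban list keyed on the ip address string
public class IpBanList
{
    private readonly HashSet<string> banned = new HashSet<string>(StringComparer.Ordinal);

    public void Ban(string ipAddress)
    {
        if (!string.IsNullOrEmpty(ipAddress)) {
            banned.Add(ipAddress.Trim());
        }
    }

    // a missing or empty address is never reported as banned
    public bool IsBanned(string ipAddress)
    {
        return !string.IsNullOrEmpty(ipAddress) && banned.Contains(ipAddress.Trim());
    }
}
```

the check would run against Request.UserHostAddress before any of the more expensive work, with a banned address getting an error response straight away.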

here we go, stuffing the info in.

TB tb = new TB();
tb.Url = Request.Form["url"];
tb.Title = Request.Form["title"];
tb.BlogName = Request.Form["blog_name"];
//the specification spells the form field "excerpt"
tb.Exerpt = Request.Form["excerpt"];
tb.IpAddress = Request.UserHostAddress;
tb.ArticleId = articleId;

as i mentioned about case-sensitivity – the specification used those form names in lower-case, so i am as well. i see no need to be friendly about it and check different cases. follow the specification!

next i also check the calling url to see if it at least contains a link to my page in it. this is a minor, but fairly effective manner of suppressing trackback spam.

//check to see that the calling reference actually has our site linked
//if not, it could easily be trackback spam
if (
	checkReference(
	tb.Url,
	blog,
	articleId
) == false
) {
	serverResponse(1, "No reference for this page '" + tb.Url + "' was found on your site. Please verify that the 'url' you have provided is correct.");
}

of course you’ll want the source to the reference checker. here it goes:

private bool checkReference(
	string url,
	com.janoserdelyi.nagry.DataObjects.Blog blog,
	int articleId
) {
	/*i should be looking for two main formats:
	<my url>/ArticleDisplay.aspx?articleId=<articleId>
	<my url>/permalink/article/<articleId>.aspx
	
	let the screen scraping begin!
	*/
	string html = GetRemoteHTTPGet(url, blog);
	
	if (html.Length == 0) {
		return false;
	}
	
	string myUrl1 = "http://" + blog.BlogFqdn + "/ArticleDisplay.aspx?articleId=" + articleId.ToString();
	string myUrl2 = "http://" + blog.BlogFqdn + "/permalink/article/" + articleId.ToString() + ".aspx";
	
	if (html.IndexOf(myUrl2) != -1) {
		return true;
	}
	if (html.IndexOf(myUrl1) != -1) {
		return true;
	}
	
	return false;
}


private string GetRemoteHTTPGet(
	string url,
	com.janoserdelyi.nagry.DataObjects.Blog blog
) {
	try {
		// get the HTML from the url that was POSTed to this page
		System.Net.HttpWebRequest webRequest = System.Net.WebRequest.Create(url) as System.Net.HttpWebRequest;
		
		// only fetch over http(s); anything else (file://, ftp://) is treated as no content
		if (webRequest == null) {
			return "";
		}
		
		webRequest.UserAgent = blog.BlogFqdn + " Trackback Checker";
		webRequest.Referer = "http://" + blog.BlogFqdn + "/";
		webRequest.Timeout = 6000;
		
		// the using blocks close the response and the reader even if reading fails
		using (System.Net.WebResponse webResponse = webRequest.GetResponse())
		using (System.IO.StreamReader reader = new System.IO.StreamReader(webResponse.GetResponseStream(), System.Text.Encoding.ASCII)) {
			return reader.ReadToEnd();
		}
	} catch (System.Exception) {
		// bad url, timeout, 404 and so on: return empty so checkReference rejects the ping
		return "";
	}
}

once all of that is done, it’s up to you to insert the info into your database, whatever that may be – xml, postgresql, mysql, mssql, text files, whatever you use.

and then declare success!

serverResponse(0, "Success");

i hope that was helpful.

i’d like to note the main source i used while mucking my way through this (since i knew nothing about trackback prior to writing my own): “An ASP.NET TrackBack handler in C#”. my implementation is largely based on his, altered not only to fit my code-base but also to fit my personal coding style preferences more closely. his resource is excellent, however. i really did not find nearly as much online as i was expecting when i first set out, so i hope that adding my version to the heap will help someone out.

feel free to use the code as you see fit. all i ask is that some credit with my name and a link to this blog be present.

Comments

Hi Janos,

I was looking through my web stats and I noticed the link back to my website from yours. Thank you for including a link back to my work and the kind comments. It’s much appreciated. :-)

One of the things you can do with the remote screen scrape is to extract the page title and excerpt using regular expressions. It gives a much more effective trackback title compared to relying on the sender.

How much spam are you getting? Have you looked at blog search as a possible replacement for trackback yet?

I’ve recently added automatic blacklisting, which has cut the spam quite dramatically. If the app thinks the trackback is spam then it returns a 404. I’m guessing that the spammers have a tendency to clean out 404s to make their spamming quicker.

Regards

Ben

Ben (posted 10/18/2006 11:27 AM)

Ben,

Sorry for the late reply. i had changed my mail notification of messages to send to gmail, and it tossed it in the spam bin, ironically enough.

thankfully i’ve yet to get any spam. I’ve read up on various other technologies and approaches, but i’ve gotten so bogged down with work that i just haven’t pursued anything with my blogware in quite some time.

I had been thinking of doing some automated blacklisting. I’ve built in a banlist type of function which kills requests right as they hit the web server, but i don’t really tap into it yet.

janos (posted 10/24/2006 3:02 PM)
hi:
I’m coming from China. I came here by chance and have gotten a lot from this article. Thank you very much. I’m programming in C#, but am not a master; I hope to become a friend of yours.
My English is not very good, so please forgive me.
han wei (posted 10/27/2006 2:38 AM)

hi Janos,

I have tried to implement a trackback system using your code. There is a doubt. You have passed only two parameters to GetRemoteHTTPGet() method, one is url – Is this the trackback ping url ….in the field ‘url’ or what url ?, second is blog object. The Structure of my blog object doesn’t match with the one you have used during your side of the implementation. Tell me what should I put in place of Blog.blogFqdn?...... Here is my SITUATION – I have a website ‘site1’ which is sending a ping using SendTrackbackPing() to site2/trackback.aspx?ID=5, on the other end i.e. ‘site2‘, I have trackback.aspx which has method to handle this trackback Request from site1 via GetRemoteHTTPGet() method. I am getting a trackback fail because I am not able to pass proper Blog object. What are the mandatory fields in this Blog object. And how are you actually utilizing the parameters of Blog object????
Once I am done with this basic exchange of trackback pings… only then will I be able to get anywhere near implementing a working system.
Please help:(

NITIN

NITIN (posted 1/21/2008 12:18 PM)

NITIN, you can substitute out the Blog object with anything you like really. it is an object in my own implementations that has a few properties, but here i am just using it for the referrer to allow for better web stats and tracking.

janos (posted 5/28/2008 9:18 AM)
