PHP: Parse Email Body with Email Piping

I’ve had a lot of requests for more posts about email piping, specifically filtering the email body and obtaining just the reply content. It’s taken me a while to get around to writing this even though the code is about 6 months old. If you’re new to email piping, check out some of my previous posts: learn the basics, isolate the email headers, save attachments and process bounce emails. Now let’s get down to biz.

The Breakdown

I am assuming that you know how to obtain the email body. If not, see my previous posts on email piping. This was built for a ticketing system which keeps only the primary email (i.e. the person’s reply) and filters out anything below it. This is not ideal (for example, if someone replies to specific quotes throughout someone else’s email) but it worked for my situation.

The basic idea is that we break up the message into an array of lines by exploding on the newline character. Then we cycle through each line and, if it doesn’t match any of our end-of-email patterns, we add it to a new message body. If it matches one of our patterns for the end of the primary message, we break out of the loop and should have all of the primary email in our new body.
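Depending on the mail server, a piped message may use \r\n or plain \n line endings, and splitting on \n alone leaves a trailing \r on each line in the first case. A minimal sketch of a more tolerant split, where split_body_lines is a hypothetical helper name and not part of the original code:

```php
<?php
// Hypothetical helper (not from the original post): split a message body
// into lines whether it uses \r\n, \r, or \n line endings.
function split_body_lines(string $body): array
{
    // Try \r\n first so a Windows-style ending counts as one break, not two.
    return preg_split('/\r\n|\r|\n/', $body);
}

// Sample body made up for illustration.
$lines = split_body_lines("Hi there,\r\nThanks for the update.\r\n\r\nOn Jan 19 Skye wrote:\r\n> Hello");
print_r($lines);
```

Each element of $lines can then be compared against the end-of-email patterns without worrying about a stray carriage return.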

The Code

//get rid of any quoted text in the email body
//$fromName, $toName, $fromEmail and $toEmail are assumed to come from the parsed headers;
//escape them so characters like "." and "+" are matched literally in the patterns below
$fromNameRe = preg_quote($fromName, "/");
$toNameRe = preg_quote($toName, "/");
$fromEmailRe = preg_quote($fromEmail, "/");
$toEmailRe = preg_quote($toEmail, "/");

$body_array = explode("\n",$body);
$message = "";
foreach($body_array as $value){
	
	//strip the carriage return left behind by \r\n line endings, if any
	$value = rtrim($value, "\r");
	
	//remove hotmail sig
	if($value == "_________________________________________________________________"){
		break;
	
	//original message quote
	} elseif(preg_match("/^-*(.*)Original Message(.*)-*/i",$value)){
		break;
	
	//check for date wrote string
	} elseif(preg_match("/^On(.*)wrote:(.*)/i",$value)) {
		break;
	
	//check for From Name email section (skipped when the name is empty,
	//otherwise the pattern would match every line starting with "On")
	} elseif($fromNameRe != "" && preg_match("/^On(.*)$fromNameRe(.*)/i",$value)) {
		break;
	
	//check for To Name email section
	} elseif($toNameRe != "" && preg_match("/^On(.*)$toNameRe(.*)/i",$value)) {
		break;
	
	//check for To Email email section
	} elseif($toEmailRe != "" && preg_match("/^(.*)$toEmailRe(.*)wrote:(.*)/i",$value)) {
		break;
		
	//check for From Email email section
	} elseif($fromEmailRe != "" && preg_match("/^(.*)$fromEmailRe(.*)wrote:(.*)/i",$value)) {
		break;
		
	//check for quoted ">" section
	} elseif(preg_match("/^>(.*)/i",$value)){
		break;
	
	//check for date wrote string with dashes
	} elseif(preg_match("/^---(.*)On(.*)wrote:(.*)/i",$value)){
		break;
			
	//add line to body
	} else {
		$message .= "$value\n";
	}
	
}

//compare before and after
echo "$body


$message";

Looking back at this code, it is somewhat crude and I may end up updating it to be a little more robust. The line-by-line loop is slow and I think I could do most of it with just preg_replace statements now that my pattern matching skills are a lot better.
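As a rough sketch of that idea, the separate checks can be folded into one multiline pattern and the body cut at the first matching line. This version uses preg_split rather than preg_replace, the function name strip_quoted_reply is hypothetical, the underscore divider is loosened to 20 or more underscores as an assumption, and the name and email based checks are left out for brevity:

```php
<?php
// Hypothetical function (not the original code): cut the body at the first
// line that looks like the start of quoted material.
function strip_quoted_reply(string $body): string
{
    // Normalise line endings so ^ and $ behave predictably in multiline mode.
    $body = str_replace(["\r\n", "\r"], "\n", $body);

    $pattern = '/^(?:'
        . '_{20,}'                     // underscore divider (assumed: 20+ underscores)
        . '|-*.*Original Message.*'    // e.g. "-----Original Message-----"
        . '|(?:-{3}.*)?On.*wrote:.*'   // "On <date> <name> wrote:", optionally after dashes
        . '|>.*'                       // first ">" quoted line
        . ')$/mi';

    // Limit 2: split only at the first marker line, keep everything before it.
    $parts = preg_split($pattern, $body, 2);

    return rtrim($parts[0]) . "\n";
}

// Sample body made up for illustration.
echo strip_quoted_reply("Thanks, that fixed it.\r\n\r\nOn Jan 19, 2011, Skye wrote:\r\n> Did the patch work?\r\n");
```

Because the m modifier makes ^ and $ match at line boundaries and . does not cross a newline by default, each alternative is still tested against one line at a time, just as in the loop.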

The thing about trying to filter out quoted replies is that every mail client quotes them differently. Most quoted sections use “>” to denote a quoted line but the initial line (i.e. “On Jan. 19th 2011 Skye wrote:”) varies a lot. I’ve tested this on a lot of emails and it seems to work for ~95% of them.

It does fail when someone breaks the reply into sections and writes their response below each quoted section. I don’t consider that to be a huge problem as most people write their response at the top of the email. Plus, even if you do manage to filter out the quoted sections properly, you won’t know which part of the response corresponds to which part of the previous email.
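If inline replies do matter for your use case, one option is to keep each quoted block together with the text written beneath it instead of discarding the quotes. A minimal sketch, assuming quoted lines start with “>”; pair_inline_replies is a hypothetical name, and an attribution line such as “On ... wrote:” would show up as reply text with an empty quote unless it is filtered separately:

```php
<?php
// Hypothetical helper (not from the original post): pair each run of ">"
// quoted lines with the reply text that follows it.
function pair_inline_replies(string $body): array
{
    $lines = preg_split('/\r\n|\r|\n/', $body);
    $pairs = [];
    $current = ['quote' => [], 'reply' => []];

    foreach ($lines as $line) {
        if (preg_match('/^>\s?(.*)$/', $line, $m)) {
            // A quoted line that follows reply text starts a new pair.
            if ($current['reply'] !== []) {
                $pairs[] = $current;
                $current = ['quote' => [], 'reply' => []];
            }
            $current['quote'][] = $m[1];
        } elseif (trim($line) !== '') {
            // Non-empty, unquoted lines are treated as the sender's reply.
            $current['reply'][] = $line;
        }
    }

    // Keep the final pair if it collected anything.
    if ($current['quote'] !== [] || $current['reply'] !== []) {
        $pairs[] = $current;
    }

    return $pairs;
}

// Sample body made up for illustration.
$pairs = pair_inline_replies("> Did the patch work?\nYes, it did.\n> Can you close the ticket?\nDone.\n");
print_r($pairs);
```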

It also fails for emails written in other languages, since the patterns look for English phrases such as “wrote:” and “Original Message”, but it can easily be adapted by translating those phrases.
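One way to sketch that adaptation is to keep the attribution patterns in an array keyed by language. The non-English phrasings below are assumptions about how some clients word the line, not tested against real mail, and is_attribution_line is a hypothetical helper:

```php
<?php
// Assumed attribution phrasings per language; real clients vary, so treat
// these as a starting point. The file must be saved as UTF-8 for /u to work.
$attributionPatterns = [
    'en' => '/^On\b.*\bwrote:\s*$/i',
    'fr' => '/^Le\b.*\ba écrit\s*:\s*$/iu',
    'de' => '/^Am\b.*\bschrieb\b.*:\s*$/iu',
    'es' => '/^El\b.*\bescribió\s*:\s*$/iu',
];

// Hypothetical helper: true if the line matches any of the given patterns.
function is_attribution_line(string $line, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $line) === 1) {
            return true;
        }
    }
    return false;
}

// Sample line made up for illustration.
var_dump(is_attribution_line("Am 19.01.2011 um 10:00 schrieb Skye:", $attributionPatterns));
```

Inside the loop, a call like this could replace the English-only “wrote:” checks.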

So I hope this satisfied everyone’s appetite. Let me know if you have any questions or comments.

  • davis

    Thank you for posting such a nice solution.

  • You just saved my life:) Thanks

  • Thanks! This seems to work fine for me. I’ll be looking to build it into a fully working email reply system for my clients soon. 🙂

  • Morg.

    Just so you know, your regex is reaaallly bad 😉

    Basically, you can put all that in a single regex quite easily: do a preg_split based on ("/($string1|$string2|$string3|…)/"), and to make sure it’s all on the same line, just use \n instead of the string start delimiter and keep every .* from matching \n.

    BTW, this would be like a dozen times faster, too 😉

    • Skye

      Ya my regex was pretty terrible when I first coded this. It’s definitely better now but still lots to learn with regex. So powerful.

  • This is brilliant as I messed with piping a long time ago.. Way too long ago to remember anything useful!
    I want to strip .JPG images out of an email, upload them to a MySQL database with a time/date stamp, with a unique “event” reference for each email associated to the e-mail address of the sender.
    Renaming the images also might be useful, 1.jpg-5.jpg for example.
    A bit of checking to see if the sender’s e-mail address is a valid customer would be useful too.
    Now if someone would be so kind to write this for me while I go to the pub I’d be ever so grateful 🙂 – I don’t ask much do I!?! lol