PHP: Parse emails with email piping – Part 2

In my first post, "PHP: Parse emails with email piping – Part 1", I detailed how to set up qmail to send an email to a PHP script, read the email, and output it. Now that you have the basics down, we can get into more email processing with PHP.

When I wrote my first parsing script I did pattern matching on all the mail fields (To, From, Subject, etc). It looked pretty ugly but it worked. Then I found this great class that breaks down all the header info into an easy array you can parse.

First, let's start with what you should be familiar with from my previous script.

#!/usr/bin/php
<?php
//debug
#ini_set ("display_errors", "1");
#error_reporting(E_ALL);

//include email parser
require_once('/path/to/class/rfc822_addresses.php');
require_once('/path/to/class/mime_parser.php');

// read email in from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd)) {
    $email .= fread($fd, 1024);
}
fclose($fd);

//create the email parser class
$mime=new mime_parser_class;
$mime->ignore_syntax_errors = 1;
$parameters=array(
	'Data'=>$email,
);
	
$mime->Decode($parameters, $decoded);

print_r($decoded);

I included rfc822_addresses.php and mime_parser.php from the mime parser class linked above, read the email from stdin and fed the email into the mime parser. The mime parser provides sample emails you can use for testing. Printing out the decoded array will show you all of the different parts of the email.
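
The read loop can also be written with stream_get_contents(), which reads until the end of the stream. This is a minimal sketch, not a required change; readRawEmail() is a hypothetical helper name, and the in-memory stream only stands in for stdin so the snippet can be run without a mail server.

```php
<?php
// Hypothetical helper: read an entire raw message from an open stream.
// Pass fopen('php://stdin', 'r') when the script is run through the pipe.
function readRawEmail($stream)
{
    $raw = stream_get_contents($stream);

    // stream_get_contents() returns false on failure; normalise to ''.
    return ($raw === false) ? '' : $raw;
}

// Usage with an in-memory stream standing in for stdin:
$fd = fopen('php://memory', 'r+');
fwrite($fd, "Subject: test\r\n\r\nHello");
rewind($fd);
$email = readRawEmail($fd);
fclose($fd);
```

$email can then be passed as the 'Data' parameter exactly as in the script above.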

Next we get all our favourite header data:


//---------------------- GET EMAIL HEADER INFO -----------------------//

//get the name and email of the sender
$fromName = $decoded[0]['ExtractedAddresses']['from:'][0]['name'];
$fromEmail = $decoded[0]['ExtractedAddresses']['from:'][0]['address'];

//get the name and email of the recipient
$toEmail = $decoded[0]['ExtractedAddresses']['to:'][0]['address'];
$toName = $decoded[0]['ExtractedAddresses']['to:'][0]['name'];

//get the subject
$subject = $decoded[0]['Headers']['subject:'];

$removeChars = array('<','>');

//get the message id
$messageID = str_replace($removeChars,'',$decoded[0]['Headers']['message-id:']);

//get the reply id (only present when the email is a reply)
$replyToID = isset($decoded[0]['Headers']['in-reply-to:']) ? str_replace($removeChars,'',$decoded[0]['Headers']['in-reply-to:']) : '';
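
Headers such as In-Reply-To are only present on some messages, so it can help to centralise the isset() check. The sketch below assumes the array layout shown above (lower-case header names with a trailing colon); getHeader() and cleanMessageId() are hypothetical helpers, and the $decoded array is hand-built for illustration rather than produced by the parser.

```php
<?php
// Hypothetical helper: return a header value from the decoded array,
// or $default when the header is absent.
function getHeader(array $decoded, $name, $default = '')
{
    if (isset($decoded[0]['Headers'][$name])) {
        return $decoded[0]['Headers'][$name];
    }
    return $default;
}

// Strip the surrounding angle brackets from a message id.
function cleanMessageId($id)
{
    return str_replace(array('<', '>'), '', $id);
}

// Hand-built stand-in for the array Decode() fills in:
// a new message with no In-Reply-To header.
$decoded = array(array('Headers' => array(
    'subject:'    => 'Printer on fire',
    'message-id:' => '<abc123@example.com>',
)));

$messageID = cleanMessageId(getHeader($decoded, 'message-id:'));
$replyToID = cleanMessageId(getHeader($decoded, 'in-reply-to:'));
```

With this input $replyToID is an empty string instead of triggering an undefined index notice.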

Now it’s time to get the body.


//---------------------- FIND THE BODY ------------------//

//get the message body (stays empty if no plaintext part is found)
$body = '';
if(substr($decoded[0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Body'])){
	
	$body = $decoded[0]['Body'];

} elseif(substr($decoded[0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Body'])) {
	
	$body = $decoded[0]['Parts'][0]['Body'];

} elseif(substr($decoded[0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Parts'][0]['Body'])) {
	
	$body = $decoded[0]['Parts'][0]['Parts'][0]['Body'];

}

Depending on how the email is sent, the body could be in a variety of places. I’m looking for the plaintext version as I don’t want to deal with having to remove any HTML formatting. Most emails will provide both a plaintext and HTML version though I have run into some emails that only sent an HTML version. $body will contain the entire body of the email, including any quoted text from an ongoing thread.
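
Because the body can sit at different depths, a recursive walk over the parts is one way to avoid adding a new elseif for every nesting level. This is a sketch under the assumption that each part uses the Headers, Body and Parts keys seen in the code above; findBodyByType() and extractBody() are hypothetical names, not part of the mime parser class. Unlike the chain above, it looks at every part on each level rather than only the first, so a text/plain attachment that comes before the real body would be picked up instead.

```php
<?php
// Hypothetical helper: walk a decoded part and its nested 'Parts'
// depth-first, returning the first body whose content type starts
// with $type, or null if none is found.
function findBodyByType(array $part, $type)
{
    $contentType = '';
    if (isset($part['Headers']['content-type:']) && is_string($part['Headers']['content-type:'])) {
        $contentType = $part['Headers']['content-type:'];
    }

    if (isset($part['Body']) && stripos($contentType, $type) === 0) {
        return $part['Body'];
    }

    if (isset($part['Parts']) && is_array($part['Parts'])) {
        foreach ($part['Parts'] as $child) {
            $found = findBodyByType($child, $type);
            if ($found !== null) {
                return $found;
            }
        }
    }

    return null;
}

// Prefer plain text; fall back to HTML with the tags stripped.
function extractBody(array $decoded)
{
    if (!isset($decoded[0])) {
        return '';
    }

    $body = findBodyByType($decoded[0], 'text/plain');
    if ($body === null) {
        $html = findBodyByType($decoded[0], 'text/html');
        $body = ($html === null) ? '' : strip_tags($html);
    }

    return $body;
}

// Hand-built example of an HTML-only message:
$htmlOnly = array(array(
    'Headers' => array('content-type:' => 'text/html; charset=UTF-8'),
    'Body'    => '<p>Hi <b>there</b></p>',
));
$body = extractBody($htmlOnly);
```
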

Now let's put it all together.

#!/usr/bin/php
<?php
//debug
#ini_set ("display_errors", "1");
#error_reporting(E_ALL);

//include email parser
require_once('/path/to/class/rfc822_addresses.php');
require_once('/path/to/class/mime_parser.php');

// read email in from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd)) {
    $email .= fread($fd, 1024);
}
fclose($fd);

//create the email parser class
$mime=new mime_parser_class;
$mime->ignore_syntax_errors = 1;
$parameters=array(
	'Data'=>$email,
);
	
$mime->Decode($parameters, $decoded);

//---------------------- GET EMAIL HEADER INFO -----------------------//

//get the name and email of the sender
$fromName = $decoded[0]['ExtractedAddresses']['from:'][0]['name'];
$fromEmail = $decoded[0]['ExtractedAddresses']['from:'][0]['address'];

//get the name and email of the recipient
$toEmail = $decoded[0]['ExtractedAddresses']['to:'][0]['address'];
$toName = $decoded[0]['ExtractedAddresses']['to:'][0]['name'];

//get the subject
$subject = $decoded[0]['Headers']['subject:'];

$removeChars = array('<','>');

//get the message id
$messageID = str_replace($removeChars,'',$decoded[0]['Headers']['message-id:']);

//get the reply id (only present when the email is a reply)
$replyToID = isset($decoded[0]['Headers']['in-reply-to:']) ? str_replace($removeChars,'',$decoded[0]['Headers']['in-reply-to:']) : '';


//---------------------- FIND THE BODY -----------------------//

//get the message body (stays empty if no plaintext part is found)
$body = '';
if(substr($decoded[0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Body'])){
	
	$body = $decoded[0]['Body'];

} elseif(substr($decoded[0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Body'])) {
	
	$body = $decoded[0]['Parts'][0]['Body'];

} elseif(substr($decoded[0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Parts'][0]['Body'])) {
	
	$body = $decoded[0]['Parts'][0]['Parts'][0]['Body'];

}

//print out our data
echo "

Message ID: $messageID

Reply ID: $replyToID

Subject: $subject

To: $toName $toEmail

From: $fromName $fromEmail

Body: $body

";

//show all the decoded email info
print_r($decoded);

Now you can use this data for many different applications:

  • insert into a database
  • create an auto reply email to the sender

I use it for a support ticketing system where I log the email into the database and update the ticket it’s associated with.
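
As an illustration of the auto reply idea, the sketch below builds the pieces of a reply from the values extracted earlier without actually sending anything. buildAutoReply(), the addresses and the reply wording are all hypothetical; the In-Reply-To and References headers are added so that mail clients can thread the reply with the original message.

```php
<?php
// Hypothetical helper: build the pieces of an auto-reply without sending it.
function buildAutoReply($fromEmail, $subject, $messageID, $ticketAddress)
{
    // Avoid stacking "Re: Re:" on an ongoing thread.
    $replySubject = (stripos($subject, 're:') === 0) ? $subject : 'Re: ' . $subject;

    $headers = array('From: ' . $ticketAddress);
    if ($messageID !== '') {
        // The angle brackets were stripped earlier, so put them back.
        $headers[] = 'In-Reply-To: <' . $messageID . '>';
        $headers[] = 'References: <' . $messageID . '>';
    }

    return array(
        'to'      => $fromEmail,
        'subject' => $replySubject,
        'message' => 'Thanks, we received your message and will reply soon.',
        'headers' => implode("\r\n", $headers),
    );
}

$reply = buildAutoReply(
    'jane@example.com',
    'Printer on fire',
    'abc123@example.com',
    'support@example.com'
);

// To send it:
// mail($reply['to'], $reply['subject'], $reply['message'], $reply['headers']);
```

Keeping the building step separate from mail() makes it easy to log or inspect the reply before anything goes out.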

I may end up writing a Part 3 for removing quoted text from the email body so you only end up with what the sender wrote and saving any attachments.

If you run into any issues, drop me a comment, or let me know if there are any other topics you want covered.

Thanks.

UPDATE: Removing Quoted Body Text Article and Saving Attachments Article

  • http://livinginlucerne.blogspot.com Bruno

    Nice article.
    I’m also doing a similar work but in Perl.
    I wonder if you will also implement getting the attachments that might come together. I haven’t been successful yet in that.

    Cheers,
    Bruno

    • Skye

      Definitely. I’ll try to get a part 3 up once I have some time to do some testing. I believe I have the bulk of it coded already.

  • http://mattstuehler.com Matt

    Skye,

    This is an AWESOME article – well-written and very clear, even though this is a tricky topic.

    A few questions…

    1. You mention that some emails ONLY contain HTML in the body – I’m running into a few cases like that. How would you modify the code above to capture the body in those cases?

    2. Any chance of getting an early preview of part 3? (It’s exactly what I’m trying to do!)

    Of course, no worries if you don’t have time to respond – this article has already been immensely helpful.

    Many thanks!

  • Skye

    Thanks Matt.

    For HTML only emails, you could check to see if the body variable is empty after the code that grabs the plaintext body. If it’s empty, you would do the same check but look for text/html instead of text/plain. Once you find that, run strip_tags() to remove all the HTML formatting which should leave you with just the text.

    Looks like I better get moving on the attachment article. When you print out the decoded array, you should see any attachments in it. It will tell you the filename, type and the data. Most data is base 64 encoded so it would be a matter of taking the data and running base64_decode() on it, creating a new file with attachment name or at the very least the attachment extension and then writing the data to that file.

  • Skye

    Ok I have attachments working. I should have a post up soon.

  • MrJ

    Great article – looking forward to part 3, I hope it is still in the pipework!

    • MrJ

      Oh, I see part 3 now (http://www.damnsemicolon.com/php/php-parse-emails-email-piping-attachments-part-3) but it only covers attachments, not removing quoted text/original email – it would be great to just get only the reply while stripping everything else away :)

      • Skye

        I have the code for filtering the quoted text and just getting the reply done. I’ll try to have a post up for it soon.

        • MrJ

          Great! Thanks! I’ll look forward to seeing it sometime soon :)

  • Rob

    Hey, I know it’s only been a couple weeks since MrJ asked, but have you had a chance to figure out the quoted text issue? I’m still trying to figure it out.. hopefully its not a missing semicolon.. :)

  • Skye
  • deepti baghel

    Great article. Saved a whole lot of efforts to capture the clean body of email.
    Thanks a lot.

  • theputernerd

    Thank you. After a bit of mucking around your code works well. For me I was after the contents of the email between and – adding another section with and from text/plain to text/html did the trick.

    The code works well for cleanly created html emails, however does not work for a microsoft generated html email i.e. with a html tag like
    .

    I wonder if I can have outlook set to create W3C compliant emails.

    Now to implement to database change –

    Thanks

    • Skye

      If you have any luck with Outlook let me know. It loves to insert garbage.

  • Andrew

    Why is it not mentioned that you need to call this before you call $mime-> decode —

    $mime=new mime_parser_class;

    Thanks

    • Skye

      Ooops my fault. It was there but I changed to a different syntax highlighter which didn’t like the bracket on the php tag so it cut off part of the code. I had to HTML encode it and now it’s showing up fine.

  • http://iboamglobal.com Ray

    I found this code very helpful but i can’t save the data

    # Message ID: $messageID
    # Reply ID: $replyToID
    # Subject: $subject
    # To: $toName $toEmail
    # From: $fromName $fromEmail
    # Body: $message

    nothing is saved into a data base, i’m also did this

    mail('user@domain.com','someone sent us an email',$message);

    and the body is empty.

    Thanks

    • Skye

      Does $decoded have info in it? print_r($decoded);

  • Mr Quiet

    Thank you, thank you, thank you!

    Too much time spent on this one already!

    One small correction, in the last (complete) script, on line 79, it reads:
    “Body: $message ”
    But it should read:
    “Body: $body”

    Thanks again

    • Skye

      Good catch. Updated.

  • James

    Great script! Everything works.

    But in my case, I have other issue. I get a “Mail delivery failed: returning message to sender” mail back:

    A message that you sent could not be delivered to one or more of its
    recipients. This is a permanent error. The following address(es) failed:

    pipe to |/home/xxxxx/public_html/xxxxx/xxxx.php

    I don’t know if you have encounter this before I think it’s not about your script

    • Skye

      I haven’t received that error in a while. I believe it’s related to a syntax error in your script.

    • Troy

      To get rid of this error I uploaded the document in ASCII format and CHMOD the file to 755. This cleared this error.

  • Erich

    I’m getting this error for all of the variables that are $decode[0]….

    Notice: Undefined offset: 0 in /blah/blah/myscript.php

    It does not appear to be parsing anything at all.

    Any ideas?

  • Erich

    What do I have to do for this script to display the Body text? I have tried everything I can think of, but it just will not populate that information anywhere.

    • Skye

      Drop me an email through contact form and we’ll figure it out.

  • RalphF

    When I put and underscore in the subject I get =UTF8 characters in the subject line. Any way to convert these back to ascii text?

    • Skye

      You can try mb_convert_encoding($subject, 'ASCII', 'UTF-8');

  • RalphF

    Skye

    That function didn’t work. I ended up writing my own UTF8 decoder function. Works fine now. Thanks for the suggestion.
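
    A note on the subject issue above, assuming the "=UTF8 characters" are RFC 2047 encoded words of the form =?UTF-8?Q?...?=. mb_convert_encoding() changes the character set of a string but does not unwrap that header encoding. PHP's iconv extension provides iconv_mime_decode() for this. Inside an encoded word a literal underscore is written as =5F, because a bare underscore stands for a space. decodeSubject() is a hypothetical helper name.

    ```php
    <?php
    // Hypothetical helper: decode an RFC 2047 encoded subject to UTF-8.
    function decodeSubject($subject)
    {
        // Mode 0 is the default strict decoding; false means malformed input.
        $decoded = iconv_mime_decode($subject, 0, 'UTF-8');

        // Fall back to the raw header rather than losing the subject.
        return ($decoded === false) ? $subject : $decoded;
    }

    $subject = decodeSubject('=?UTF-8?Q?weekly=5Freport?=');
    ```

    Plain ASCII subjects pass through unchanged, so the helper can be applied to every message.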

  • http://bastianstalder.ch Bastian

    Hi there
    I already used this script for an email application. What I’m trying now is to deliver the whole mail message to a certain address. I don’t need to have everything separate. Let’s say I have a mail with text and attachments. The script takes the address the mail has been addressed to and consults a database to select the address the mail should be forwarded to.

    How do I get the same message with all attachments forwarded to another address?
    Thanks for any help.

    • Skye

      I think if you use the mail() fn and leave the first 3 fields blank then put all the $email in the headers input it should work how you want it to. ie. mail('','','',$email);

      You’ll need to do a find/replace on the TO: field in $email in order to replace the email address with the one you want.

      • http://bastianstalder.ch Bastian

        Thanks for your suggestion. But calling the mail function without the first parameters doesn’t send a mail. :-(

  • Andrew Stein

    It’s worth mentioning that Outlook can put the body in the 3rd part too (maybe this is something new with Outlook 2010). It seems to happen when the e-mail is HTML based and has an attachment. Here’s my code to get the plain text –

     //---------------------- FIND THE BODY -----------------------//  
    				   
    				 //get the message body  
    				 
    				 if(substr($decoded[0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Body'])){  
    				   
    					 $body = $decoded[0]['Body'];  
    				   
    				 } elseif(substr($decoded[0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Body'])) {  
    				   
    					 $body = $decoded[0]['Parts'][0]['Body'];  
    				   
    				 } elseif(substr($decoded[0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Parts'][0]['Body'])) {  
    				   
    					 $body = $decoded[0]['Parts'][0]['Parts'][0]['Body'];  
    				   
    				 } elseif(substr($decoded[0]['Parts'][0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Parts'][0]['Parts'][0]['Body'])) {  
    				   
    					 $body = $decoded[0]['Parts'][0]['Parts'][0]['Parts'][0]['Body'];  
    				   
    				 } elseif(substr($decoded[0]['Headers']['content-type:'],0,strlen('text/html')) == 'text/html' && isset($decoded[0]['Body'])){  
    				   
    					 $body = strip_tags($decoded[0]['Body']);  
    				   
    				 } elseif(substr($decoded[0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/html')) == 'text/html' && isset($decoded[0]['Parts'][0]['Body'])) {  
    				   
    					 $body = strip_tags($decoded[0]['Parts'][0]['Body']);  
    				   
    				 } elseif(substr($decoded[0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/html')) == 'text/html' && isset($decoded[0]['Parts'][0]['Parts'][0]['Body'])) {  
    				   
    					 $body = strip_tags($decoded[0]['Parts'][0]['Parts'][0]['Body']);  
    				   
    				 } elseif(substr($decoded[0]['Parts'][0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/html')) == 'text/html' && isset($decoded[0]['Parts'][0]['Parts'][0]['Parts'][0]['Body'])) {  
    				   
    					 $body = strip_tags($decoded[0]['Parts'][0]['Parts'][0]['Parts'][0]['Body']);  
    				   
    				 } 
    
    
    
  • Lucian S.

    I’m glad I found you!
    Yes, very nice explained, even too good.
    Thanks!

  • http://www.jaredstenquist.com Jared

    I am unable to get the body printed. (I tried your code as well @Andrew).

    Here is the result of dumping the array $decode.

    http://pastebin.com/4TmB9dLC

    Does anybody know why i’m not getting the $body to print?

  • Guilherme

    Tanks!

  • Mark Newbegin

    I just ran a test with this and get a bounceback message. I was using a simplified version like your first script but the one from. http://jamescollings.co.uk/blog/php-email-pipe-introduction/ I like that your script lets me filter out the other stuff and possibly write it to a mysql database… but how can I get from your script to a working example. Do you have possibly a zip download I can grab that everything is built. I just don’t see how the script is handling the incoming email or even forwarding the message. thanks

    • http://www.rocketcases.com/ Skye Chilton

      Hey Mark. You have to set up piping on the email address to forward the email to the script. See part 1 for details. It depends on your host and what mail program they are using though so my instructions may not be entirely correct.

      Your host’s wiki and support should be able to point you in the right direction for getting the piping set up. It can be tricky.

      Let me know how it goes.

      • Mark Newbegin

        I did setup piping. But the other directions I was following (from that other site) specifically ask the email address to forward the email data to. Your script I think was not asking for that email address, it was just storing a record that could then be injected into a database. thanks for the prompt response. Maybe I was just missing something in the scripts. Part 1 asks for the email address for testing to make sure it was receiving the data, and in part 2 its gone from the reference I believe.

        • http://www.rocketcases.com/ Skye Chilton

          Ya part 2 just breaks down the email into its parts. At that point you can do whatever you want with it: email it, store in db, etc.

          What is it you want to do?

          • Mark Newbegin

            I was interested in building a “todo” type site, that you can email yourself items and update the lists. I actually talked with a better programmer then myself, and have him looking into it, using your site as a reference to try to build it out for me. Eventually ill turn it into an app when it works.

  • sami

    thank you for the great code

    can you help me I need to know header charset. I tried:

    $decoded[0]['Headers']['charset=']

    but no luck

    I need it so I can know if I have to encode msg text or not ,some msgs I get are utf-8 and some are not

    • http://www.rocketcases.com/ Skye Chilton

      Do a print_r on $decoded[0]['Headers'] and I’m guessing it should show up in there for the actual index value. Let me know if that works.

      • sami

        thank you for the fast replay. I did print_r it’s not there

        ==================================

        Array
        (
        [from ] => myemail@gmail.com Mon Aug 25 01:49:58 2014
        [received:] => Array
        (
        [0] => from mail-ie0-f175.google.com ([209.85.223.175]:56270) by server1.server.com with esmtps (TLSv1:RC4-SHA:128) (Exim 4.82) (envelope-from ) id 1XLgbl-0007Wr-Qt for myemail@myemail.com; Mon, 25 Aug 2014 01:49:57 +0300
        [1] => by mail-ie0-f175.google.com with SMTP id x19so8935968ier.20 for ; Sun, 24 Aug 2014 15:50:00 -0700 (PDT)
        [2] => by 10.50.128.225 with HTTP; Sun, 24 Aug 2014 15:50:00 -0700 (PDT)
        )

        [dkim-signature:] => v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=jMsKCUrtrKiXViN92veCqXjCPSPECV1yOTp8T1RQ8Yg=; b=ot1zzgfH2pnsDtMppyItDdebo5nJwhKRCW/8/0K6y3aZtENdVCx4+U9pmziHDNYF++ JZYOd2G47XKcW0bB6evtYyn90M19hIShwPHuRgswOyWxV1lNkwP/tJvvcBIeNOY0meJt k8JRXlrKZpli8nUNga3e0SIJWZ86b6HkjMSyLDb6jaxSGL/KTQmfWPfoXxoUBO6nhyq2 m504lJOksd0cYtGxNH0pAGt/OczSObL+WcFzJDmeSaQWHrOcjl7lim5jcjtHPbfs88zI Zlp1DoytFhmsuO74lnHnZjp5dxWFJhmkn+JzSDjaBdr4VPjCtftGcoy1VdhiBapSVuKR zUsg==
        [mime-version:] => 1.0
        [x-received:] => by 10.50.80.116 with SMTP id q20mr12110102igx.22.1408920600190; Sun, 24 Aug 2014 15:50:00 -0700 (PDT)
        [date:] => Mon, 25 Aug 2014 01:50:00 +0300
        [message-id:] =>
        [subject:] =>
        [from:] => jon
        [to:] => myemail@myemail.com
        [content-type:] => multipart/alternative; boundary=089e01536682e42558050167e31d
        )

        ==================================

        • sami

          ok so I used preg_match to find charset and if found and not utf-8 use iconv to change encoding like so:

          if(substr($decoded[0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Body'])){

          $body = $decoded[0]['Body'];
          $found=preg_match('~charset=([-a-z0-9_"\']+)~i',$decoded[0]['Headers']['content-type:'],$charset);
          if ($found==1){
          $charset[1]=str_replace('"', "", $charset[1]);
          $charset[1]=str_replace("'", "", $charset[1]);
          //only convert when the declared charset is not already utf-8 (case-insensitive)
          if (strcasecmp($charset[1], 'UTF-8') != 0)
          $body = iconv($charset[1], 'UTF-8', $body);
          }

          } elseif(substr($decoded[0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Body'])) {

          $body = $decoded[0]['Parts'][0]['Body'];
          $found=preg_match('~charset=([-a-z0-9_"\']+)~i',$decoded[0]['Parts'][0]['Headers']['content-type:'],$charset);
          if ($found==1){
          $charset[1]=str_replace('"', "", $charset[1]);
          $charset[1]=str_replace("'", "", $charset[1]);
          if (strcasecmp($charset[1], 'UTF-8') != 0)
          $body = iconv($charset[1], 'UTF-8', $body);
          }

          } elseif(substr($decoded[0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],0,strlen('text/plain')) == 'text/plain' && isset($decoded[0]['Parts'][0]['Parts'][0]['Body'])) {

          $body = $decoded[0]['Parts'][0]['Parts'][0]['Body'];
          $found=preg_match('~charset=([-a-z0-9_"\']+)~i',$decoded[0]['Parts'][0]['Parts'][0]['Headers']['content-type:'],$charset);
          if ($found==1){
          $charset[1]=str_replace('"', "", $charset[1]);
          $charset[1]=str_replace("'", "", $charset[1]);
          if (strcasecmp($charset[1], 'UTF-8') != 0)
          $body = iconv($charset[1], 'UTF-8', $body);
          }

          }

          I’m sure there is a better way. but this is what I did :)

    • sami

      ok so I used preg_match to find charset and if found and not utf-8 use iconv to change encoding like so:

      $found=preg_match('~charset=([-a-z0-9_]+)~i',$body,$charset);
      if ($found==1){
      //only convert when the declared charset is not already utf-8 (case-insensitive)
      if (strcasecmp($charset[1], 'UTF-8') != 0)
      $body = iconv($charset[1], 'utf-8', $body);
      }

      I’m sure there is a better way. but this is what I did :)
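
      The charset handling discussed in this thread can be pulled into a pair of small functions so it does not have to be repeated in every branch. This is a sketch, assuming the charset is declared in the Content-Type header of the part that holds the body; charsetFromContentType() and bodyToUtf8() are hypothetical names. The comparison uses strcasecmp(), since a test of the form "not 'UTF-8' or not 'utf-8'" is always true.

      ```php
      <?php
      // Hypothetical helper: pull the charset parameter out of a Content-Type
      // header value, or return null when none is declared.
      function charsetFromContentType($contentType)
      {
          if (preg_match('~charset=["\']?([-a-z0-9_]+)~i', $contentType, $m)) {
              return $m[1];
          }
          return null;
      }

      // Convert $body to UTF-8 unless it is already UTF-8 or undeclared.
      function bodyToUtf8($body, $contentType)
      {
          $charset = charsetFromContentType($contentType);
          if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
              return $body;
          }

          $converted = iconv($charset, 'UTF-8', $body);

          // iconv() returns false on failure; keep the original in that case.
          return ($converted === false) ? $body : $converted;
      }

      $latin1 = "caf\xE9"; // "café" encoded as ISO-8859-1
      $utf8 = bodyToUtf8($latin1, 'text/plain; charset="ISO-8859-1"');
      ```

      On a multipart message the top-level Content-Type carries only the boundary, as in the print_r output above, so the header of the matching part is the one to pass in.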