Posted

Here's a simplified version of what I'm doing:

 

# echo "<HTML>Testing</HTML>" | fgrep -i testing

<HTML>Testing</HTML>

 

No problems there; it works great. But I need to remove any special formatting or junk that may be included in that text.

 

For instance, my script fails on this kind of grep:

 

# echo "<HTML><B>Test</B>ing</HTML>" | fgrep -i testing

#

 

Is there some sed/awk method of removing any <any characters in here> from a line before I try to do my greps? I want to grep the data, not the HTML.

 

I'm no sed/awk expert, so I'm curious what to do!

Any help is greatly appreciated. :P

Posted (edited)

sed -e 's/<[^>]*>//g' should remove all HTML tags.

 

For example:

$ echo "<HTML><B>Test</B>ing</HTML>" | sed -e 's/<[^>]*>//g'
Testing

 

So in your example above, you can add grep on the end:

$ echo "<HTML><B>Test</B>ing</HTML>" | sed -e 's/<[^>]*>//g' | fgrep -i testing
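
If you need this in more than one place in a script, it may be handy to wrap the sed expression in a small function. This is only a sketch: the name strip_tags is made up for illustration, and grep -F is used as the POSIX spelling of fgrep.

```shell
# Hypothetical helper (the name is not from this thread): delete anything
# that looks like <...> from each input line, using the sed expression above.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# grep -F matches a fixed string (same as fgrep); -i ignores case.
echo "<HTML><B>Test</B>ing</HTML>" | strip_tags | grep -F -i testing
```

Keep in mind that sed works one line at a time, so a tag that starts on one line and ends on the next will not be removed by this pattern, and entities such as &amp; are left as they are.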

Edited by TCH-MikeJ
