thehemi
Posted September 10, 2004

Here's a simplified version of what I'm doing:

    # echo "<HTML>Testing</HTML>" | fgrep -i testing
    <HTML>Testing</HTML>

No problems, it's working great. But I need to remove any special formatting or junk that may be included in that text. For instance, my script fails on this kind of grep:

    # echo "<HTML><B>Test</B>ing</HTML>" | fgrep -i testing
    #

Is there some sed/awk method of removing any <any character in here> from a line before I try to do my greps? I want to grep data, not HTML. I'm no sed/awk expert, so I'm curious what to do! Any help is greatly appreciated.
MikeJ
Posted September 10, 2004 (edited)

    sed -e 's/<[^>]*>//g'

should remove all HTML tags. For example:

    $ echo "<HTML><B>Test</B>ing</HTML>" | sed -e 's/<[^>]*>//g'
    Testing

So in your example above, you can add grep on the end:

    echo "<HTML><B>Test</B>ing</HTML>" | sed -e 's/<[^>]*>//g' | fgrep -i testing

Edited September 10, 2004 by TCH-MikeJ
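If the same cleanup is needed in several places in a script, the sed expression can be wrapped in a small shell function. This is only a sketch: the name strip_tags is made up for illustration, and it uses grep -F, which is the same fixed-string search as fgrep. Note that sed works one line at a time, so a tag that is split across two lines would not be removed, and any literal "<" followed later on the same line by ">" in the data would be treated as a tag.

```shell
#!/bin/sh
# strip_tags: hypothetical helper name, not something from the posts above.
# Reads text on stdin and deletes every <...> sequence found on each line.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# Strip the markup first, then search what is left.
# grep -F -i is the fixed-string, case-insensitive search that fgrep -i performs.
echo "<HTML><B>Test</B>ing</HTML>" | strip_tags | grep -F -i testing
```

Run as written, the last line should print Testing, matching the sed example above.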
thehemi (Author)
Posted September 10, 2004

Awesome, that should work perfectly, thanks!