Weird search results

Started by Charlie, April 25, 2005, 04:54:00 PM

Charlie

I am trying to find messages with the words "font" and "size" in them. I'm entering the following search term in the Message box on the Find All dialog, with Exact match selected:

"font" AND "size"

However, I get thousands of hits, most of which don't contain either of these terms, let alone both of them. For example, one message returned was "Thanks a lot to all of you. it was very helpful for me. thomas"

How can I get the search to work as intended?

Wilson Logan

Hi Charlie,

Ahhh... PGO is searching the full message (including all the HTML). Obviously "font" and "size" are going to appear in nearly every message, because they are part of the markup.

Maybe search for " font " and " size ".  


Cheers,

Wilson.
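
To illustrate the behaviour described above, here is a minimal sketch of a substring search run against the raw stored body. The message markup and the `contains_all` helper are made up for illustration, and case-insensitive matching is an assumption; PGO's actual search code is not shown in this thread.

```python
# Hypothetical stored body: the visible text from Charlie's example,
# wrapped in the kind of markup an HTML message typically carries.
raw_body = (
    '<font size="2" face="Arial">Thanks a lot to all of you. '
    'it was very helpful for me. thomas</font>'
)


def contains_all(text, terms):
    """Return True if every term occurs in text (case-insensitive substring)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


# Plain terms hit the markup, not the visible text.
print(contains_all(raw_body, ["font", "size"]))      # True

# Space-padded terms no longer match the tag or its attribute...
print(contains_all(raw_body, [" font ", " size "]))  # False

# ...but they also miss genuine uses that sit next to punctuation.
genuine = "<p>Which font? I want a bigger size.</p>"
print(contains_all(genuine, [" font ", " size "]))   # False
```

The last case is the weakness of the padded-term workaround: any occurrence followed by punctuation or a tag boundary is skipped.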

Charlie

Thanks. That makes sense, but obviously the search isn't working as intended and your suggested get-round is going to miss messages. Are there any plans to fix this by stripping out the HTML before searching?

Wilson Logan

>Thanks. That makes sense, but obviously the search isn't working as intended and your suggested get-round is going to miss messages.

Will it? I guess so... maybe try:

"font" and "size" and not "<font" and  "size="

> Are there any plans to fix this by stripping out the HTML before searching?

That's possible, but it'd make big searches very CPU intensive. I can't say I'm planning to add this.

Cheers,

Wilson.

Charlie

>"font" and "size" and not "<font" and  "size="

That does not work either, although I'm not sure why. It returns far fewer hits, but none seem to contain the search words.
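
One possible explanation, offered only as a guess: if the search follows conventional boolean precedence, `not` applies to "<font" alone, so the trailing `and "size="` becomes a requirement instead of an exclusion. The sketch below models that reading in plain Python. The helper names and sample bodies are hypothetical, and nothing in this thread confirms how PGO parses the expression or whether it accepts grouping.

```python
def has(body, term):
    """Case-insensitive substring test (an assumption about the matching)."""
    return term.lower() in body.lower()


def as_written(body):
    """"font" AND "size" AND (NOT "<font") AND "size=" """
    return (has(body, "font") and has(body, "size")
            and not has(body, "<font") and has(body, "size="))


def as_probably_intended(body):
    """"font" AND "size" AND NOT ("<font" OR "size=") """
    return (has(body, "font") and has(body, "size")
            and not (has(body, "<font") or has(body, "size=")))


plain = "What font and size should I use?"
markup_only = '<span style="font-family: Arial"><input size=20>thanks</span>'

print(as_written(plain), as_written(markup_only))                      # False True
print(as_probably_intended(plain), as_probably_intended(markup_only))  # True False
```

Under this reading the query only returns messages whose markup contains `size=` without a `<font` tag, which would fit "far fewer hits, but none seem to contain the search words". Even the grouped form still searches raw HTML, so it would drop any message that uses the words in its text and also has font markup.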

> it'd make big searches very CPU intensive.

Firstly, so what? Secondly, why not simply have a duplicate copy of the message body in the database. One will be with HTML for display, and the other would be plain text for searching. That would solve your CPU problem.

At the moment, you in effect need a good knowledge of HTML in order to search messages correctly, although a lot of the time you'll get away with it. But PG Offline would be useless for searching a group about web site construction, for example.
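
The duplicate-copy idea above could be sketched roughly as follows, using SQLite from the Python standard library. The table name, column names, and the `crude_strip` helper are all hypothetical; PGO's real storage format is not described in this thread, and the regex-based tag removal is deliberately crude.

```python
import html
import re
import sqlite3


def crude_strip(markup):
    """Very rough tag removal; real-world HTML needs a proper parser."""
    text = re.sub(r"<[^>]+>", " ", markup)
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, body_html TEXT, body_text TEXT)"
)


def store(markup):
    """Pay the stripping cost once, at import time, not on every search."""
    conn.execute(
        "INSERT INTO messages (body_html, body_text) VALUES (?, ?)",
        (markup, crude_strip(markup)),
    )


store('<font size="2">Thanks a lot to all of you.</font>')
store("<p>Which font and size do you use for headings?</p>")

# Search the plain-text column; the HTML column stays untouched for display.
rows = conn.execute(
    "SELECT id FROM messages WHERE body_text LIKE ? AND body_text LIKE ?",
    ("%font%", "%size%"),
).fetchall()
print(rows)  # [(2,)]
```

The trade-off is storage: every message body is kept twice, in exchange for searches that never touch the markup.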

Wilson Logan

>"font" and "size" and not "<font" and  "size="

> That does not work either, although I'm not sure why.

>>>> I'm not sure why either. I'm just a bit hampered by the fact that I don't have any suitable test data. Can you send me an archive (say 1000 messages) including some that should be hits?

> It returns far fewer hits, but none seem to contain the search words.

> it'd make big searches very CPU intensive.

> Firstly, so what?

>>>> I'm not sure I'd like to wait 10 minutes for a search that would have taken 30 seconds because some guy once wanted to do a search for "font" and "size". Also, you might like to look at this thread to see what's actually involved in removing HTML & just leaving the text behind.

http://pgoffline.com/forum/index.php?board=5;action=display;threadid=338;start=msg1647#msg1647

> Secondly, why not simply have a duplicate copy of the message body in the database. One will be with HTML for display, and the other would be plain text for searching. That would solve your CPU problem.

>>>> I refer you to my last comment about what this involves.


> At the moment, you in effect need a good knowledge of HTML in order to search messages correctly, although a lot of the time you'll get away with it.

>>>> Almost always in fact.

> But PG Offline would be useless for searching a group about web site construction, for example.

>>>> Ok, now that I can accept.

>>>> There is one more issue to deal with.... I've tried very hard to stop PGO being used as a 'Ripper'. I'd have included a txt only version of the messages from the first version but for that problem.

The real hurdle to stealing a group is the HTML. Once the messages are in txt format it's easy to strip Yahoo messages & port them to another forum format or BBS.

On balance, I think the best I can do is to offer an option in the search facility to ignore the HTML (at a CPU penalty). Then if you need it, you've got it.  

Cheers,

Wilson.
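
As a rough sketch of what such an option might look like, the snippet below strips markup at search time only when asked to. The `find_all` function, its `ignore_html` flag, and the sample bodies are invented for illustration and are not PGO's actual interface. Text extraction uses `html.parser.HTMLParser` from the Python standard library.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects text nodes and discards tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def find_all(bodies, terms, ignore_html=False):
    """Return indexes of bodies containing every term.

    With ignore_html=True each body is parsed first, which is the extra
    CPU cost mentioned above.
    """
    hits = []
    for index, body in enumerate(bodies):
        haystack = visible_text(body) if ignore_html else body
        haystack = haystack.lower()
        if all(term.lower() in haystack for term in terms):
            hits.append(index)
    return hits


bodies = [
    '<font size="2">Thanks a lot to all of you.</font>',
    '<font size="2">Try a larger font size.</font>',
]
print(find_all(bodies, ["font", "size"]))                    # [0, 1]
print(find_all(bodies, ["font", "size"], ignore_html=True))  # [1]
```

Note that this simple parser also passes through the contents of `<script>` and `<style>` elements as text, so a fuller version would need to skip those.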




