Attachments

Started by danb, July 06, 2014, 04:58:27 AM

danb

For the upcoming release 4.0.811 I have attachment downloading working.

Attachments are placed within [data-dir]\Attachments\[groupname]\

Deleting a message will not delete its attachment file on your computer. But if you download the same message again and the attachment file already exists, the file will be skipped.
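The folder layout and skip rule described above could be sketched roughly as follows. This is only an illustration in Python, not PG Offline's actual code; the function names `attachment_path` and `save_attachment` are made up for the example.

```python
from pathlib import Path


def attachment_path(data_dir: Path, group_name: str, file_name: str) -> Path:
    """Build [data-dir]/Attachments/[groupname]/[file name]."""
    return data_dir / "Attachments" / group_name / file_name


def save_attachment(data_dir: Path, group_name: str, file_name: str, content: bytes) -> bool:
    """Write the attachment unless a file of that name already exists.

    Returns True if the file was written, False if it was skipped.
    (Hypothetical helper, not part of PG Offline.)
    """
    target = attachment_path(data_dir, group_name, file_name)
    if target.exists():
        return False  # already on disk, so skip it as the post describes
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return True
```

Note that a rule like this keys only on the file name, so it cannot tell a re-download of the same message from a different message that happens to reuse the name.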

Still to be added, I'm thinking that attachment files will show in the message window as links or possibly buttons to open the attachments.

P.S. You'll notice a new menu item under File called Open Data Folder, which opens File Explorer at your data directory.

danb

(if you have version 4.0.808 or greater simply check for updates from the Help menu.)

4.0.811:
https://www.file2.me/pgoffline4/pg-offline-4.0.811-x86.exe

Attachment downloading added.

martin_lists

I notice with version 4.0.817 (and previous back to .814) that the 'copy attachments' option in the Export window is greyed out.  Can this be "un-greyed", please?

Wilson Logan

Dan & I have been discussing this.

The messages are in a database but attachments are files (docs, jpgs, xls, etc).

How do you export various formats simultaneously?

We decided to have an export file format called ".pgz"

.pgz files will be opened automatically by PG Offline and it will know what to do with the things it finds in the file.

Also, you can open it using 7zip or any other zip utility.

So, the 'copy attachments' option will be un-greyed when the new .pgz format is available.
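Since the post says a .pgz file can be opened with any zip utility, one way to picture it is as an ordinary zip archive holding the message database next to the attachment files. The sketch below assumes exactly that; the internal layout (a database file at the top level, attachments under `Attachments/`) and the function names are guesses for illustration, not the real .pgz specification.

```python
import zipfile
from pathlib import Path


def write_bundle(bundle_path: Path, database_file: Path, attachment_files: list[Path]) -> None:
    """Pack one database file plus attachment files into a zip container.

    The layout is an assumption made for this example only.
    """
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(database_file, arcname=database_file.name)
        for attachment in attachment_files:
            # Keep attachments in their own folder inside the archive.
            bundle.write(attachment, arcname=f"Attachments/{attachment.name}")


def list_bundle(bundle_path: Path) -> list[str]:
    """Return the names stored in the archive, as any zip tool would show them."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return bundle.namelist()
```

A single container like this sidesteps the "various formats simultaneously" question, because the database and the docs, jpgs and so on travel together as plain files.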

Cheers,

Wilson.


Patch

I'm a little confused by how this is supposed to work.
I may just need to be patient.

The two problems I have are:

Link between attachment and original message
Documents and photos are attached to individual messages in Yahoo (most users post by emailing the group, sometimes with attached documents). The attached documents and images keep the names of the original poster's files. The attached files from multiple messages appear to be all put together in
[data-dir]\Attachments\[groupname]\
The problem I have with that approach is that if different messages use the same file name, both can't be stored in the same directory. Also, I don't know (and I can't see how the program is going to know) how to find the attachments for an individual message. I would have thought PGO4 would need to prepend a PGO4 message number to the file name, create a directory for each message containing an attachment, or keep an internal copy within the SQLite database.
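To make the first suggestion concrete, here is a minimal Python sketch of prepending the message number to the file name. It is not how PGO4 works; the helper names and the underscore separator are invented for the example.

```python
from pathlib import Path


def prefixed_name(message_number: int, original_name: str) -> str:
    """Prepend the message number so equal file names from different messages differ."""
    return f"{message_number}_{original_name}"


def prefixed_path(data_dir: Path, group_name: str, message_number: int, original_name: str) -> Path:
    """Full path under [data-dir]/Attachments/[groupname]/ (hypothetical helper)."""
    return data_dir / "Attachments" / group_name / prefixed_name(message_number, original_name)
```

With this scheme an `image001.jpg` from message 11038 and one from message 11121 no longer clash, and the prefix also shows which message a file belongs to.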

Downloading missed photos
Yahoo appears to have tighter restrictions on downloading photos than other attachments. Initially images downloaded; now most give an error in the format
Failed to download https://xa.yimg.com/kg/Groups/[7digit ?groupId]/or/[9 or 10 digit ?messageId]/name/[image name (space replaced by "+")].jpg to [data-dir]\Attachments\[groupname]\[image name].jpg. The remote server returned an error: (404) Not found

A file is created in [data-dir]\Attachments\[groupname]\ and rapidly deleted.
The relevant message in PGO4 has the correct number of attachments listed, and if I select the message and
Messages -> Open In Yahoo --> the message shows in my web browser complete with images

My question is how do I get these images into PGO4? If I try to re-download the message by changing the settings to include messages without downloaded attachments, the downloaded message is skipped.
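For anyone comparing links by hand, the URL pattern in the error message above can be assembled like this. It is a sketch based only on the pattern as quoted in this post; the meaning of the two numeric parts is a guess (hence the question marks in the pattern), the ID values in the test are placeholders, and `attachment_url` is a made-up name.

```python
from urllib.parse import quote_plus


def attachment_url(group_id: str, message_id: str, image_name: str) -> str:
    """Rebuild the URL shape seen in the error message.

    quote_plus turns spaces into "+", matching the observation in the post.
    Whether Yahoo encodes other characters the same way is an assumption.
    """
    return (
        "https://xa.yimg.com/kg/Groups/"
        f"{group_id}/or/{message_id}/name/{quote_plus(image_name)}"
    )
```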

Patch

Quote from: Patch on November 01, 2014, 02:39:07 AM
...missed photos
Yahoo appears to have tighter restrictions on downloading photos than other attachments. Initially images downloaded, now most give an error...
Update:
Some photos continue to come through; it is just the larger photos that do not. Photos less than about 40KB download. The majority of my forum's images are JPGs straight from a digital camera, so they are much larger and don't download.
Makes me wonder if PGO4 needs to wait long enough for the standard screen image to display before downloading the original.
Just trying to guess why doing it manually works fine but PGO4 gets a server error.

Wilson Logan

Hi Patch,

I think the issue with Photos is that PGO captures the thumbnails but cannot download the underlying source image.

This is an issue with cookies. You can see it yourself if you display a full image from Yahoo, right click it, choose Properties, copy the URL and try pasting it into a new browser window. It will not download.

AFAIK attachments currently do not download so what happens with them vis-à-vis their filenames is moot.

I suspect that attachments will be saved in a different folder from Files & Photos.

Attachments has a bit of history behind it. Yahoo initially allowed them & early versions of PGO captured them. Then (IIRC in 2003) Yahoo banned attachments and only relatively recently resurrected this feature.

Cheers,

Wilson.

Patch

Quote from: Wilson Logan on November 02, 2014, 03:56:35 PM
I think the issue with Photos is that PGO captures the thumbnails but cannot download the underlying source image.

This is an issue with cookies. You can see it yourself if you display a full image from Yahoo, right click it, choose Properties, copy the URL and try pasting it into a new browser window. It will not download.

You surprise me.
I thought that had been fixed.

Quote from: danb on July 01, 2014, 09:07:08 PM
Try the photo download in the latest build 4.0.808

The maximum sized photos should download.  Let me know of any group where it fails.

So I had a closer look.
There are 3 image resolutions used by Yahoo: a message page thumbnail, the Yahoo image viewer size, and full resolution.
It is true that if you copy the final link into a web browser it doesn't work, even if ?download=1 is appended.
If you follow the links to the message and display the standard Yahoo sized image, you can then download the full resolution.

PGO4 does download some attached photos (jpg files). Comparing the file links in Yahoo, PGO4 and an email client for images that do and don't download, I couldn't see any difference. The only consistent difference I could find was image size. Large attached images mostly don't download, but small attached images and non-image attachments seem to work.
It is possible Yahoo has specifically blocked the downloading of high resolution images. I was hoping it was just that the PGO4 image scraper is almost, but not yet fully, compatible with Yahoo image download.

Perhaps I'm overly optimistic.
Anyhow, thanks for a great program.


Wilson Logan

I surprise myself. You are quite correct. It was fixed.

Well, that part of it was fixed.

Do you have a rough idea of the cut off point (size wise) for images that fail to download?




Wilson Logan

I just ran a test. I uploaded 35 images ranging in size from 2.4MB to 3.2MB in a newly created folder to a group I own (hobbicast_coffee_lounge).

I was able to download them all at full resolution using 4.0.819.

My initial test was to download images manually from old albums. At first I thought there was a problem as the images were all under 100KB. Having checked the source images I can see that they are simply small images. Most date from over 10 years ago.





Patch

Attached images do not download reliably for me from hobbicast_coffee_lounge either.
To test I downloaded all messages (attachments start at message 10988 so starting from there is a far more efficient way of testing).
The following images didn't download

Message | Image names | Post date | # | Total | PGO4 behaviour
11038 | image021.jpg image020.jpg image019.jpg image018.jpg | 08/14/2009 | 20 | 879KB | 5/20 detected, none downloaded
11087 | RedneckSanta.JPG | 12/14/2009 | 1 | 1.9MB | detected, not downloaded
11121 | ATT76098874.jpg ATT76098863.jpg ATT76098852.jpg ATT76098841.jpg | 02/09/2010 | 5 | 442KB | 5/5 detected, 0/4 images downloaded
11145 | circuit_diagram.jpg | 04/29/2010 | 1 | 134KB | detected, not downloaded
11146 | DSCF1312.JPG | 04/29/2010 | 1 | 1.8MB | detected, not downloaded
11228 | earthquake.jpg | 11/07/2011 | 1 | 775KB | detected, not downloaded
11232 | relief.jpg DSCF1970.JPG | 01/20/2012 | 2 | 2.2MB | detected, not downloaded

The 5 images PGO4 did download were (message number not listed as I'm not sure how to readily find it)

Image name | Size
IMG_0282.jpg | 3.2MB
watch-makers-lathe.jpg | 43KB
image001.jpg | 18KB
humidity1-day.png | 131KB
humidity2-month.png | 78KB

Not sure why one large image worked. Most of the attached images that have downloaded for me are less than 150KB, but I haven't checked the size of all the images not downloaded in other groups.

Hope this helps

Patch

PS
Just to make sure we are talking about the same thing:
Images that are uploaded explicitly to a yahoo folder appear to work with PGO4 via
Group -> Download Photo -> list-> <Folders_list> -> Download

In groups where most users do not have a Yahoo account, images are instead attached in an email client and emailed to
<group_Name>@yahoogroups.com  eg hobbicast_coffee_lounge@yahoogroups.com
The images are then shown as attachments in PGO4 (message panel 7th column)
It is with larger images posted in this manner that PGO4 is having trouble downloading



Wilson Logan

Hi Patch,

That's interesting. I was unaware that any attachment downloading worked.

I tried to get both of the attachments in my posts 11266 & 11269 without success.

IMG_0282.jpg is one of mine (from 11269).

Odd that you can download it & not me.

Cheers,

Wilson.

Patch

My guess is you are closer to the yahoo servers than me so my requests are slightly slower than yours.
For the large IMG_0282.jpg image I think I was just lucky.

Have you noticed that when viewing images in the Yahoo message browser, a large image sometimes shows a spinning wheel before the download button becomes available? (I guess Yahoo is converting the raw image to the Yahoo image viewer resolution.)

My theory is that PGO4 is requesting the raw image before the download button is displayed (while the user would still see the spinning wheel and Yahoo is still processing the raw image), resulting in the Yahoo image server reporting the image as not available.

This would explain why larger images mostly don't download (longer processing time), and I'm doing better (further from yahoo servers, so slightly longer request delay). Inexact file size for failure is also consistent with variable transmission times and server load.

I'm hoping strategic delays in PGO4 attachment download will dramatically decrease the download error rate.
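The kind of strategic delay I have in mind could look something like the Python sketch below: try the download, and if the server refuses, wait and try again with a longer pause each time. This only illustrates the idea and is not PGO4 code; `download_with_retries`, its parameters and the delay values are all invented, and whether retrying actually cures the 404 is untested.

```python
import time
from typing import Callable, Optional


def download_with_retries(
    fetch: Callable[[], bytes],
    attempts: int = 4,
    initial_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() until it succeeds, pausing longer after each failure.

    fetch is any function that returns the image bytes or raises OSError
    (urllib's HTTPError, e.g. a 404, is a subclass of OSError).
    Returns None if every attempt fails.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                return None  # out of attempts, give up
            sleep(delay)  # give the server time to finish processing
            delay *= 2    # wait twice as long before the next try
    return None
```

Passing `sleep` in as a parameter is just so the waiting can be swapped out when testing; in real use the default `time.sleep` applies.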

Wilson Logan

I hadn't actually noticed the spinning wheel.

What you say sounds right though. If there's a delay between the request and serving the image that may be interpreted by PGO as a failure.

I'll put it to the developer.

Cheers,

Wilson.

Patch

Quote from: Wilson Logan on November 09, 2014, 09:41:51 AM
I'll put it to the developer.
Thanks, I would really appreciate it if we can get attachment download working reliably.

To answer some of my own questions
Quote from: Patch on November 01, 2014, 02:39:07 AM
Link between attachment and original message
if different messages use the same file name, then both can't be stored in the same directory.
As described in the FAQ, older versions of PGO handled this by appending a random number to the end of the name of any attachment with a duplicate name. The format used was
(original name)rad(five digit random hex number)
This or a similar solution is yet to be implemented in PGO4.
Currently, if an attachment uses the same name as an older attachment, the new attachment is not downloaded. I suspect PGO4 detects that a file of that name has already been downloaded and assumes it does not need to be downloaded again, rather than recognising that the attachment for this new message is yet to be downloaded and will need to be renamed (a bug / yet to be implemented feature).
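As a rough Python illustration of the old "rad" naming scheme: the sketch below assumes the suffix goes before the file extension so the file still opens normally, which the FAQ wording does not actually confirm. The function names are made up and this is not taken from PGO's code.

```python
import secrets
from pathlib import Path


def rad_name(original_name: str) -> str:
    """Return (original name)rad(five digit random hex number).

    Placing the suffix before the extension is an assumption.
    """
    path = Path(original_name)
    token = f"{secrets.randbelow(16 ** 5):05x}"  # five hex digits, zero padded
    return f"{path.stem}rad{token}{path.suffix}"


def non_clashing_name(folder: Path, original_name: str) -> str:
    """Keep the original name if it is free, otherwise pick an unused rad name."""
    if not (folder / original_name).exists():
        return original_name
    candidate = rad_name(original_name)
    while (folder / candidate).exists():
        candidate = rad_name(original_name)
    return candidate
```

The important difference from the current PGO4 behaviour is that a clash leads to a rename and a download rather than a skip.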

Quote from: Patch on November 01, 2014, 02:39:07 AM
I don't know (and I can't see how the program is going to know) how to find attachments for an individual message.
PGO4 records the full name (including path) for each attachment and a link to the message it was posted in. It is planned to display a link to each attachment when displaying a message with attachments in PGO4, but this feature is not implemented yet.
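To show what recording a path plus a link to the message might look like in a SQLite database, here is a small Python sketch. The table and column names are purely hypothetical; I have not seen PGO4's real schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical attachments table: one row per attachment file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attachments ("
        " id INTEGER PRIMARY KEY,"
        " message_number INTEGER NOT NULL,"
        " file_path TEXT NOT NULL)"
    )


def record_attachment(conn: sqlite3.Connection, message_number: int, file_path: str) -> None:
    """Store the full path of one attachment against its message number."""
    conn.execute(
        "INSERT INTO attachments (message_number, file_path) VALUES (?, ?)",
        (message_number, file_path),
    )


def attachments_for(conn: sqlite3.Connection, message_number: int) -> list[str]:
    """Return the stored paths for one message, in the order they were recorded."""
    rows = conn.execute(
        "SELECT file_path FROM attachments WHERE message_number = ? ORDER BY id",
        (message_number,),
    ).fetchall()
    return [row[0] for row in rows]
```

A lookup like `attachments_for` is all the message window would need in order to show a link for each attachment.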
