
Open a .webarchive, Modify It, and Save It

I'm developing an app for Lion and what I want to do is open a .webarchive file, modify a snippet of the DOM, and then write out the modified DOM to the same file.

Here is my code thus far. It opens the webarchive, modifies it, and then saves it back to the file.

    NSString *archivePath = @"/Users/tigger/Library/Mail/V2/MailData/Signatures/1216DD8D-C7E2-4DE1-9FCD-0A9A3412C788.webarchive";
    NSData *plistData = [NSData dataWithContentsOfFile:archivePath];
    NSString *error;
    NSPropertyListFormat format;
    NSMutableDictionary *plist;

    plist = (NSMutableDictionary *)[NSPropertyListSerialization propertyListFromData:plistData
                                             mutabilityOption:NSPropertyListMutableContainersAndLeaves
                                                       format:&format
                                             errorDescription:&error];
    if(!plist){
        printf("no plist");
        [error release];
    }else{
        NSString *s = [NSString stringWithUTF8String:[[[plist objectForKey:@"WebMainResource"] objectForKey:@"WebResourceData"] bytes]];
        NSString *new = [s stringByReplacingOccurrencesOfString:@"</body>" withString:@"hey there!</body>"];

        [[plist objectForKey:@"WebMainResource"] setObject:new forKey:@"WebResourceData"];
        printf("Archive: %s", [[plist description] UTF8String]);       
        NSData *data = [NSPropertyListSerialization dataFromPropertyList:plist format:NSPropertyListBinaryFormat_v1_0 errorDescription:nil];
        [data writeToURL:[NSURL fileURLWithPath:@"/Users/tigger/Library/Mail/V2/MailData/Signatures/test.webarchive"] atomically:YES];

    }

The problem is that the resulting webarchive is invalid. The original looks like this:

bplist00—_WebMainResource’  
_WebResourceTextEncodingName_WebResourceFrameName^WebResourceURL_WebResourceData_WebResourceMIMETypeUUTF-8PUdata:O<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Dan Shipper</div><div>[email protected]</div><div><br></div></body></span><br class="Apple-interchange-newline">Ytext/html(F]l~îöõ°™
¥

While the resulting webarchive looks like this:

bplist00—_WebMainResource’  
^WebResourceURL_WebResourceFrameName_WebResourceMIMEType_WebResourceData_WebResourceTextEncodingNameUdata:PYtext/html_<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Dan Shipper</div><div>[email protected]</div><div><br></div>hey there!</body></span><br class="Apple-interchange-newline">UUTF-8(7Ndvîöõ•∏
æ

Anyone have any ideas on why it's invalid or how to fix it? Thanks so much for your help!

I've also tried to use the textutil convert command to generate the webarchive, but it doesn't work because in my original HTML file I have an image like this:

<img src="http://www.domainpolish.com/images/crowd.png">

But when I use textutil it downloads the image and saves it like this:

<img src="file:///1.png">

Even though I don't want it to download or change the url. I've used the noload, nostore and baseurl options to no avail.

EDIT: Fixed it! The problem was that when I replaced the HTML, I was inserting it as an NSString instead of an NSData:

    NSString *s = [NSString stringWithUTF8String:[[[plist objectForKey:@"WebMainResource"] objectForKey:@"WebResourceData"] bytes]];
    NSString *new = [s stringByReplacingOccurrencesOfString:@"</body>" withString:@"hi there!</body>"];
    NSData *sourceData = [new dataUsingEncoding:NSUTF8StringEncoding];
    [[plist objectForKey:@"WebMainResource"] setObject:sourceData forKey:@"WebResourceData"];
dshipper asked Oct 27 '11 14:10
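For reference, the fix described in the EDIT can be sketched as one self-contained function. This is a minimal sketch, not the poster's code: the function name WebArchiveByInsertingText is hypothetical, and it assumes ARC, UTF-8 HTML, and the newer NSError-based NSPropertyListSerialization methods in place of the errorDescription: variants used above. It also decodes the HTML with initWithData:encoding: rather than stringWithUTF8String:, because the buffer returned by -bytes is not guaranteed to be NUL-terminated.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name and signature are assumptions for illustration).
// Takes the contents of a .webarchive (a binary property list) and returns a
// new archive in which `text` is inserted before every </body> tag of the
// main resource. Returns nil on failure; *error is only filled in when the
// property list itself cannot be read or written.
static NSData *WebArchiveByInsertingText(NSData *archiveData, NSString *text, NSError **error) {
    id plist = [NSPropertyListSerialization propertyListWithData:archiveData
                                                         options:NSPropertyListMutableContainersAndLeaves
                                                          format:NULL
                                                           error:error];
    if (![plist isKindOfClass:[NSMutableDictionary class]]) return nil;

    id mainResource = [plist objectForKey:@"WebMainResource"];
    if (![mainResource isKindOfClass:[NSMutableDictionary class]]) return nil;

    // The HTML is stored as raw bytes (NSData), not as a string.
    id htmlData = [mainResource objectForKey:@"WebResourceData"];
    if (![htmlData isKindOfClass:[NSData class]]) return nil;

    // Decode using the data's length instead of relying on a terminator.
    NSString *html = [[NSString alloc] initWithData:htmlData encoding:NSUTF8StringEncoding];
    if (html == nil) return nil; // assumed UTF-8, but the bytes did not decode

    NSString *replacement = [text stringByAppendingString:@"</body>"];
    NSString *newHTML = [html stringByReplacingOccurrencesOfString:@"</body>"
                                                        withString:replacement];

    // Store the result back as NSData so the value keeps its original type.
    [mainResource setObject:[newHTML dataUsingEncoding:NSUTF8StringEncoding]
                     forKey:@"WebResourceData"];

    return [NSPropertyListSerialization dataWithPropertyList:plist
                                                      format:NSPropertyListBinaryFormat_v1_0
                                                     options:0
                                                       error:error];
}
```

It could be fed with [NSData dataWithContentsOfFile:archivePath] and its result written out with writeToURL:atomically:, as in the question.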


1 Answer

Update: I just re-read the question and saw the solution...

You are replacing the main resource data with the wrong object in this line:

[[plist objectForKey:@"WebMainResource"] setObject:new forKey:@"WebResourceData"];

new is an NSString, but it should be an NSData object. After the replacement, convert the string content to binary data:

[[plist objectForKey:@"WebMainResource"] setObject:[new dataUsingEncoding:NSUTF8StringEncoding] forKey:@"WebResourceData"];
Laurent Etiemble answered Oct 09 '22 10:10
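The type mismatch is also visible in the two dumps in the question. In the original file the HTML is preceded by O (0x4F, the binary property list marker for a data object), while in the rewritten file it is preceded by _ (0x5F, the marker for an ASCII string). A small guard like the following could catch this before the file is written. It is only a sketch, and the function name WebArchiveMainResourceDataIsBinary is made up for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical check (the name is an assumption): returns YES only when the
// main resource's WebResourceData value is an NSData object, which is how the
// original archive in the question stores the HTML.
static BOOL WebArchiveMainResourceDataIsBinary(NSDictionary *plist) {
    id mainResource = [plist objectForKey:@"WebMainResource"];
    if (![mainResource isKindOfClass:[NSDictionary class]]) return NO;

    id resourceData = [mainResource objectForKey:@"WebResourceData"];
    return [resourceData isKindOfClass:[NSData class]];
}
```

Calling it on the modified plist just before serializing would return NO for the code in the question and YES once the string is converted with dataUsingEncoding:.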