
Java : Compare, mark and interpret HTML texts in Java

I am working on a Java project with an HTML editor (CKEditor). The user enters text in the editor, and the resulting HTML is saved in the database.

When the user comes back later and edits the same text, I would like to show the difference between the two versions by comparing the new text with the one stored in the database.

The main problem I am facing is this: even when a comparison tool detects that the style changed from italic to bold, its output strikes through the word Italic and shows that Bold was inserted in its place.

But that doesn't convey the intention or action behind the edit, which is that the user changed the text from italic to bold. What I am looking for is a tool that, instead of showing that the word Italic was removed and Bold was added in its place, shows the italic word or sentence first, struck through, followed by its bold replacement.

I hope it is clear what I mean. I have been trying to achieve this for quite some time. I tried diff_match_patch, daisydiff, and others; nothing helped.
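
One way to make that intent explicit, as a rough sketch rather than a finished solution: compare the visible text of both versions with the tags stripped. If the text is identical but the markup differs, the edit is a formatting change, and the old fragment can be shown struck through, followed by the new one. The class name EditIntent, its methods, and the sample strings are hypothetical. The regex tag stripper is a simplification that ignores entities, comments, and script content, so a real HTML parser such as Jsoup would likely be a better fit in practice.

```java
import java.util.regex.Pattern;

// Hypothetical helper: classifies an edit and renders old/new fragments side by side.
class EditIntent {

    enum Kind { NONE, FORMAT_ONLY, TEXT_CHANGED }

    // Matches anything that looks like a tag; crude, for illustration only.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the text with tags removed and whitespace collapsed.
    static String visibleText(String html) {
        return TAG.matcher(html).replaceAll("").replaceAll("\\s+", " ").trim();
    }

    // Same visible text but different markup means only the formatting changed.
    static Kind classify(String oldHtml, String newHtml) {
        if (oldHtml.equals(newHtml)) {
            return Kind.NONE;
        }
        if (visibleText(oldHtml).equals(visibleText(newHtml))) {
            return Kind.FORMAT_ONLY;
        }
        return Kind.TEXT_CHANGED;
    }

    // Old fragment struck through (keeping its own styling), then the new fragment.
    static String render(String oldHtml, String newHtml) {
        return "<del>" + oldHtml + "</del> <ins>" + newHtml + "</ins>";
    }

    public static void main(String[] args) {
        String oldHtml = "<em>Hello</em> world";
        String newHtml = "<strong>Hello</strong> world";
        System.out.println(classify(oldHtml, newHtml)); // prints FORMAT_ONLY
        System.out.println(render(oldHtml, newHtml));
        // prints <del><em>Hello</em> world</del> <ins><strong>Hello</strong> world</ins>
    }
}
```

This works on a whole fragment at a time. Narrowing the struck-through part to the single word whose style changed would need the fragment split into smaller units (for example per paragraph or per inline element) before classifying each one.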

My attempts:

/*

            String oldTextHtml = mnotes1.getMnotetext();
            String newTextHTML = mnotes.getMnotetext();


            oldTextHtml = oldTextHtml.replace("<br>","\n");
            oldTextHtml = Jsoup.clean(oldTextHtml, Whitelist.basic());
            oldTextHtml = Jsoup.parse(oldTextHtml).text();

            newTextHTML = newTextHTML.replace("<br>","\n");
            newTextHTML = Jsoup.clean(newTextHTML,Whitelist.basic());
            newTextHTML = Jsoup.parse(newTextHTML).text();


            diff_match_patch diffMatchPatch = new diff_match_patch();
            LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(oldTextHtml, newTextHTML);
            diffMatchPatch.diff_cleanupSemantic(deltas);
            newText += diffMatchPatch.diff_prettyHtml(deltas);
            groupNoteHistory.setWhatHasChanged("textchange");
            groupNoteHistory.setNewNoteText(newText);
            noEdit = true;
*/


            List<String> oldTextList = Arrays.asList(mnotes1.getMnotetext().split("(\\.|\\n)"));
            List<String> newTextList = Arrays.asList(mnotes.getMnotetext().split("(\\.|\\n)"));
            if (oldTextList.size() == newTextList.size()) {

                for (int current = 0; current < oldTextList.size(); current++) {
                    if (isLineDifferent(oldTextList.get(current), newTextList.get(current))) {
                        noEdit = true;
                        diff_match_patch diffMatchPatch = new diff_match_patch();
                        LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(oldTextList.get(current), newTextList.get(current));
                        diffMatchPatch.diff_cleanupSemantic(deltas);
                        newText += diffMatchPatch.diff_prettyHtml(deltas);
                        groupNoteHistory.setWhatHasChanged("textchange");
                        groupNoteHistory.setNewNoteText(newText);
                    }
                }
            } else {
                if (!(mnotes.getMnotetext().equals(mnotes1.getMnotetext()))) {
                    if (isLineDifferent(mnotes1.getMnotetext(), mnotes.getMnotetext())) {
                        diff_match_patch diffMatchPatch = new diff_match_patch();

                        LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(mnotes1.getMnotetext(),
                                mnotes.getMnotetext());
                        diffMatchPatch.diff_cleanupSemantic(deltas);
                        newText += diffMatchPatch.diff_prettyHtml(deltas);
                        groupNoteHistory.setWhatHasChanged("textchange");
                        noEdit = true;
                    }
                    groupNoteHistory.setNewNoteText(newText);
                    groupNoteHistory.setWhatHasChanged("textchange");
                }
            }

If anyone has any idea how I can achieve this, kindly let me know. Thanks a lot. :-)

Edit

I was asked for an image. Explanation and then the image.

Old text : <style= bold>Hello</style>
new Text : <style = Italic>Hello</style>

Difference output expected :

As in this image.

asked Nov 30 '15 10:11 by We are Borg



1 Answer

Recently I did a proof of concept with an open source library that implements the diff command in Java, along with many other features.

I compared two Java files and retrieved the lines modified between them. With that information, I think it would be easy to achieve what you want.
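
To illustrate what a line diff has to track for the output the question asks for, here is a minimal sketch that pairs the removed lines of each change with the lines that replaced them. It is a plain longest-common-subsequence diff written from scratch, not the algorithm or API of java-diff-utils; the names LineChanges, Change, and diff are hypothetical. In the library itself, each Delta exposes both sides through getOriginal() and getRevised(), whereas the FileComparator class shown further down keeps only the revised chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone line diff; not part of java-diff-utils.
class LineChanges {

    /** One changed region: lines removed from the original and lines that took their place. */
    static final class Change {
        final int originalPosition; // zero-based line index in the original
        final List<String> removed = new ArrayList<>();
        final List<String> added = new ArrayList<>();

        Change(int originalPosition) {
            this.originalPosition = originalPosition;
        }
    }

    static List<Change> diff(List<String> a, List<String> b) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<Change> changes = new ArrayList<>();
        Change open = null; // region currently being collected, if any
        int i = 0;
        int j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a.get(i).equals(b.get(j))) {
                open = null; // a matching line closes the current region
                i++;
                j++;
            } else {
                if (open == null) {
                    open = new Change(i);
                    changes.add(open);
                }
                if (j < m && (i == n || lcs[i][j + 1] >= lcs[i + 1][j])) {
                    open.added.add(b.get(j++));
                } else {
                    open.removed.add(a.get(i++));
                }
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<String> oldLines = Arrays.asList("line 1", "<style= bold>Hello</style>", "line 3");
        List<String> newLines = Arrays.asList("line 1", "<style = Italic>Hello</style>", "line 3");
        for (Change c : diff(oldLines, newLines)) {
            System.out.println("at " + c.originalPosition + ": " + c.removed + " -> " + c.added);
        }
    }
}
```

Keeping the position in the original next to both line lists is what lets a renderer print the old text struck through and the new text right after it.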

I have two Java files under the src/test/resources/files folder:

File1

package com.onuba.car.javadiff;

import difflib.Chunk;
import difflib.Delta;
import difflib.DiffUtils;
import difflib.Patch;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileComparator {

    private final File original;

    private final File revised;

    public FileComparator(File original, File revised) {
        this.original = original;
        this.revised = revised;
    }

    public List<Chunk> getChangesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.CHANGE);
    }

    public List<Chunk> getInsertsFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.INSERT);
    }

    public List<Chunk> getDeletesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.DELETE);
    }

    private List<Chunk> getChunksByType(Delta.TYPE type) throws IOException {
        final List<Chunk> listOfChanges = new ArrayList<Chunk>();
        final List<Delta> deltas = getDeltas();
        for (Delta delta : deltas) {
            if (delta.getType() == type) {
                listOfChanges.add(delta.getRevised());
            }
        }
        return listOfChanges;
    }

    private List<Delta> getDeltas() throws IOException {

        final List<String> originalFileLines = fileToLines(original);
        final List<String> revisedFileLines = fileToLines(revised);

        final Patch patch = DiffUtils.diff(originalFileLines, revisedFileLines);

        return patch.getDeltas();
    }

    private List<String> fileToLines(File file) throws IOException {
        final List<String> lines = new ArrayList<String>();
        String line;
        final BufferedReader in = new BufferedReader(new FileReader(file));
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }

        return lines;
    }

    <style= bold>Hello</style>

}

File2

package com.onuba.car.javadiff;

import difflib.Chunk;
import difflib.Delta;
import difflib.DiffUtils;
import difflib.Patch;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileComparator {

    private final File original;

    private final File revised;

    public FileComparator(File original, File revised) {
        this.original = original;
        this.revised = revised;
    }

    public List<Chunk> getChangesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.CHANGE);
    }

    public List<Chunk> getInsertsFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.INSERT);
    }

    public List<Chunk> getDeletesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.DELETE);
    }

    private List<Chunk> getChunksByType(Delta.TYPE type) throws IOException {
        final List<Chunk> listOfChanges = new ArrayList<Chunk>();
        final List<Delta> deltas = getDeltas();
        for (Delta delta : deltas) {
            if (delta.getType() == type) {
                listOfChanges.add(delta.getRevised());
            }
        }
        return listOfChanges;
    }

    private List<Delta> getDeltas(String nuevoParam) throws IOException {

        final List<String> originalFileLines = fileToLines(original);
        final List<String> revisedFileLines = fileToLines(revised);

        final Patch patch = DiffUtils.diff(originalFileLines, revisedFileLines);

        return patch.getDeltas();
    }

    private List<String> fileToLines(File file, String nuevoParam) throws IOException {
        final List<String> lines = new ArrayList<String>();
        String line;
        final BufferedReader in = new BufferedReader(new FileReader(file));
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }

        return lines;
    }

    <style = Italic>Hello</style>

    private void nuevoMetodoCool(File file) {

    }

}

A brief FileComparator class (remember it was a POC :D)

package com.onuba.car.javadiff;

import difflib.Chunk;
import difflib.Delta;
import difflib.DiffUtils;
import difflib.Patch;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileComparator {

    private final File original;

    private final File revised;

    public FileComparator(File original, File revised) {
        this.original = original;
        this.revised = revised;
    }

    public List<Chunk> getChangesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.CHANGE);
    }

    public List<Chunk> getInsertsFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.INSERT);
    }

    public List<Chunk> getDeletesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.DELETE);
    }

    private List<Chunk> getChunksByType(Delta.TYPE type) throws IOException {
        final List<Chunk> listOfChanges = new ArrayList<Chunk>();
        final List<Delta> deltas = getDeltas();
        for (Delta delta : deltas) {
            if (delta.getType() == type) {
                listOfChanges.add(delta.getRevised());
            }
        }
        return listOfChanges;
    }

    private List<Delta> getDeltas() throws IOException {

        final List<String> originalFileLines = fileToLines(original);
        final List<String> revisedFileLines = fileToLines(revised);

        final Patch patch = DiffUtils.diff(originalFileLines, revisedFileLines);

        return patch.getDeltas();
    }

    private List<String> fileToLines(File file) throws IOException {
        final List<String> lines = new ArrayList<String>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }

        return lines;
    }

}

And a JUnit test for it

package com.onuba.car.javadiff.test;

import static org.junit.Assert.fail;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.junit.Test;

import com.onuba.car.javadiff.FileComparator;

import difflib.Chunk;

public class FileComparatorTest {

    private final File original = new File("./src/test/resources/files/FileComparatorv1.java");

    private final File revised = new File("./src/test/resources/files/FileComparatorv2.java");

    @Test
    public void shouldGetChangesBetweenFiles() {

        final FileComparator comparator = new FileComparator(original, revised);

        try {
            final List<Chunk> changesFromOriginal = comparator.getChangesFromOriginal();

            final int changeNum = changesFromOriginal.size();
            System.out.println("Tamaño de cambios: " + changeNum);

            for (int i = 0; i < changeNum; i++) {

                final Chunk change = changesFromOriginal.get(i);
                final int firstLineOfFirstChange = change.getPosition() + 1;
                final int changeSize = change.size();
                //final String changeText = change.getLines().get(0).toString();

                System.out.println("Cambio nº " + i);
                System.out.println("firstLineOfFirstChange: " + firstLineOfFirstChange);
                System.out.println("changeSize: " + changeSize);
                System.out.println("change text: ");
                showTest(change.getLines());

            }

            /*assertEquals(3, changesFromOriginal.size());

            final Chunk firstChange = changesFromOriginal.get(0);
            final int firstLineOfFirstChange = firstChange.getPosition() + 1;
            final int firstChangeSize = firstChange.size();
            assertEquals(2, firstLineOfFirstChange);
            assertEquals(1, firstChangeSize);
            final String firstChangeText = firstChange.getLines().get(0).toString();
            assertEquals("Line 3 with changes", firstChangeText);

            final Chunk secondChange = changesFromOriginal.get(1);
            final int firstLineOfSecondChange = secondChange.getPosition() + 1;
            final int secondChangeSize = secondChange.size();
            assertEquals(4, firstLineOfSecondChange);
            assertEquals(2, secondChangeSize);
            final String secondChangeFirstLineText = secondChange.getLines().get(0).toString();
            final String secondChangeSecondLineText = secondChange.getLines().get(1).toString();
            assertEquals("Line 5 with changes and", secondChangeFirstLineText);
            assertEquals("a new line", secondChangeSecondLineText);

            final Chunk thirdChange = changesFromOriginal.get(2);
            final int firstLineOfThirdChange = thirdChange.getPosition() + 1;
            final int thirdChangeSize = thirdChange.size();
            assertEquals(11, firstLineOfThirdChange);
            assertEquals(1, thirdChangeSize);
            final String thirdChangeText = thirdChange.getLines().get(0).toString();
            assertEquals("Line 10 with changes", thirdChangeText);*/

        } catch (IOException ioe) {
            fail("Error running test shouldGetChangesBetweenFiles " + ioe.toString());
        }
    }

    @Test
    public void shouldGetInsertsBetweenFiles() {

        final FileComparator comparator = new FileComparator(original, revised);

        try {
            final List<Chunk> insertsFromOriginal = comparator.getInsertsFromOriginal();

            final int changeNum = insertsFromOriginal.size();
            System.out.println("Tamaño de inserciones: " + changeNum);

            for (int i = 0; i < changeNum; i++) {

                final Chunk change = insertsFromOriginal.get(i);
                final int firstLineOfFirstChange = change.getPosition() + 1;
                final int changeSize = change.size();
                //final String changeText = change.getLines().get(0).toString();

                System.out.println("insercion nº " + i);
                System.out.println("firstLineOfFirstInsertion: " + firstLineOfFirstChange);
                System.out.println("insertion Size: " + changeSize);
                System.out.println("insertion text: ");
                showTest(change.getLines());

            }
        } catch (IOException ioe) {
            fail("Error running test shouldGetInsertsBetweenFiles " + ioe.toString());
        }
        /*try {
            final List<Chunk> insertsFromOriginal = comparator.getInsertsFromOriginal();
            assertEquals(1, insertsFromOriginal.size());

            final Chunk firstInsert = insertsFromOriginal.get(0);
            final int firstLineOfFirstInsert = firstInsert.getPosition() + 1;
            final int firstInsertSize = firstInsert.size();
            assertEquals(7, firstLineOfFirstInsert);
            assertEquals(1, firstInsertSize);
            final String firstInsertText = firstInsert.getLines().get(0).toString();
            assertEquals("new line 6.1", firstInsertText);

        } catch (IOException ioe) {
            fail("Error running test shouldGetInsertsBetweenFiles " + ioe.toString());
        }*/
    }

    @Test
    public void shouldGetDeletesBetweenFiles() {

        final FileComparator comparator = new FileComparator(original, revised);

        try {
            final List<Chunk> deletesFromOriginal = comparator.getDeletesFromOriginal();

            final int changeNum = deletesFromOriginal.size();
            System.out.println("Tamaño de deletes: " + changeNum);

            for (int i = 0; i < changeNum; i++) {

                final Chunk change = deletesFromOriginal.get(i);
                final int firstLineOfFirstChange = change.getPosition() + 1;
                final int changeSize = change.size();
                //final String changeText = change.getLines().get(0).toString();

                System.out.println("delete nº " + i);
                System.out.println("firstLineOfFirstDelete: " + firstLineOfFirstChange);
                System.out.println("delete Size: " + changeSize);
                System.out.println("delete text: ");
                showTest(change.getLines());

            }
        } catch (IOException ioe) {
            fail("Error running test shouldGetDeletesBetweenFiles " + ioe.toString());
        }

        /*try {
            final List<Chunk> deletesFromOriginal = comparator.getDeletesFromOriginal();
            assertEquals(1, deletesFromOriginal.size());

            final Chunk firstDelete = deletesFromOriginal.get(0);
            final int firstLineOfFirstDelete = firstDelete.getPosition() + 1;
            assertEquals(1, firstLineOfFirstDelete);

        } catch (IOException ioe) {
            fail("Error running test shouldGetDeletesBetweenFiles " + ioe.toString());
        }*/
    }

    private void showTest(List<?> texts) {

        if (texts != null) {
            for (Object s : texts) {
                System.out.println(s.toString());
            }
        }
    }
}

Finally my pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.onuba.car</groupId>
    <artifactId>javadiffpoc</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>JavaDiff ::  POC</name>

    <url>http://maven.apache.org</url>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>

        <!-- GUAVA -->
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>15.0</version>
        </dependency>

        <dependency>
            <groupId>com.googlecode.java-diff-utils</groupId>
            <artifactId>diffutils</artifactId>
            <version>1.2.1</version>
        </dependency>

        <!-- Logger -->
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-access</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.6.4</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
            </plugin>
        </plugins>
    </build>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
</project>

Sorry about the logs and a few other bits in Spanish :D. Maybe with this you can achieve what you want.

The library's home page: https://code.google.com/p/java-diff-utils/ There is a tutorial link at the end of the page (in Spanish).

Hope this helps!

UPDATE

I wrote a simple class that generates a file showing the differences as struck-out lines, using the code below. (I don't quite understand your desired format; you can add more decorators if you need.)

package com.onuba.car.javadiff;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

import difflib.Chunk;

public class Comparer {

    private final File original = new File("./src/test/resources/files/FileComparatorv1.java");

    private final File revised = new File("./src/test/resources/files/FileComparatorv2.java");

    public static void main(String[] args) {

        final Comparer comparer = new Comparer();

        comparer.createDiffFile();
    }

    private void createDiffFile() {

        PrintWriter diffFile = null;
        //RandomAccessFile diffFile = null;
        RandomAccessFile oldFile = null;

        try {

            //diffFile = new RandomAccessFile(new File("./diffFile_" + System.currentTimeMillis()), "rw");
            diffFile = new PrintWriter("./diffFile_" + System.currentTimeMillis(), "UTF-8");
            oldFile = new RandomAccessFile(original, "r");

            final FileComparator comparator = new FileComparator(original, revised);

            final List<Chunk> changesFromOriginal = comparator.getChangesFromOriginal();

            final int changeNum = changesFromOriginal.size();
            System.out.println("Tamaño de cambios: " + changeNum);

            final List<Integer> changesIndex = new ArrayList<Integer>();

            for (Chunk change : changesFromOriginal) {

                changesIndex.add(change.getPosition());
            }

            String line = oldFile.readLine();
            int lineIndex = 0;
            while (line != null) {

                if (changesIndex.contains(lineIndex)) {

                    String strikeLine = "From: <strike-through color=yellow>" + line + "</strike-through>";
                    diffFile.print(strikeLine + " To: <strong>");

                    for (Object s : changesFromOriginal.get(changesIndex.indexOf(lineIndex)).getLines()) {
                        diffFile.println(s.toString());
                    }
                    diffFile.print("</strong>");

                } else {

                    diffFile.println(line);
                }

                line = oldFile.readLine();
                lineIndex++;
            }

        } catch (IOException e) {
            // Don't swallow the failure silently; at least report it
            e.printStackTrace();
        } finally {
            try {
                if (diffFile != null) {
                    diffFile.close();
                }

                if (oldFile != null) {
                    oldFile.close();
                }
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
}

The output file is

package com.onuba.car.javadiff;

import difflib.Chunk;
import difflib.Delta;
import difflib.DiffUtils;
import difflib.Patch;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileComparator {

    private final File original;

    private final File revised;

    public FileComparator(File original, File revised) {
        this.original = original;
        this.revised = revised;
    }

    public List<Chunk> getChangesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.CHANGE);
    }

    public List<Chunk> getInsertsFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.INSERT);
    }

    public List<Chunk> getDeletesFromOriginal() throws IOException {
        return getChunksByType(Delta.TYPE.DELETE);
    }

    private List<Chunk> getChunksByType(Delta.TYPE type) throws IOException {
        final List<Chunk> listOfChanges = new ArrayList<Chunk>();
        final List<Delta> deltas = getDeltas();
        for (Delta delta : deltas) {
            if (delta.getType() == type) {
                listOfChanges.add(delta.getRevised());
            }
        }
        return listOfChanges;
    }

From: <strike-through color=yellow>    private List<Delta> getDeltas() throws IOException {</strike-through> To: <strong>    private List<Delta> getDeltas(String nuevoParam) throws IOException {
</strong> 
        final List<String> originalFileLines = fileToLines(original);
        final List<String> revisedFileLines = fileToLines(revised);

        final Patch patch = DiffUtils.diff(originalFileLines, revisedFileLines);

        return patch.getDeltas();
    }

From: <strike-through color=yellow>    private List<String> fileToLines(File file) throws IOException {</strike-through> To: <strong>    private List<String> fileToLines(File file, String nuevoParam) throws IOException {
</strong>        final List<String> lines = new ArrayList<String>();
        String line;
        final BufferedReader in = new BufferedReader(new FileReader(file));
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }

        return lines;
    }

From: <strike-through color=yellow>    <style= bold>Hello</style></strike-through> To: <strong>    <style = Italic>Hello</style>

    private void nuevoMetodoCool(File file) {

    }
</strong> 
}
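
The <strike-through> element written in the output above is not a standard HTML tag, and the source lines are written unescaped, so a browser would interpret fragments such as <style= bold> as markup. A minimal sketch of a renderer that escapes the text and uses the standard <del> and <strong> elements might look like this. The class and method names are hypothetical, and the list arguments stand in for the original and revised lines of one change.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical renderer for one change; not part of java-diff-utils.
class ChangeRenderer {

    // Escapes the characters that would otherwise be read as HTML markup.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Old lines struck through, immediately followed by the new lines in bold.
    static String render(List<String> oldLines, List<String> newLines) {
        return "<del>" + escape(String.join("\n", oldLines)) + "</del>"
                + "<strong>" + escape(String.join("\n", newLines)) + "</strong>";
    }

    public static void main(String[] args) {
        System.out.println(render(
                Arrays.asList("<style= bold>Hello</style>"),
                Arrays.asList("<style = Italic>Hello</style>")));
        // prints <del>&lt;style= bold&gt;Hello&lt;/style&gt;</del><strong>&lt;style = Italic&gt;Hello&lt;/style&gt;</strong>
    }
}
```

Escaping suits a diff of source files like the one above. For the editor content in the question, where the formatting itself should stay visible, the fragments would be emitted unescaped instead.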

Is that useful for you?

answered Oct 24 '22 11:10 by Francisco Hernandez