I have a process in Talend which gets the search result of a page, saves the HTML and writes it into files, as seen here:
Initially I had a two-step process that parsed the date out of the HTML files in Java and wrote it to a MySQL database. Here is the code that does exactly that. (I'm a beginner, sorry for the lack of elegance.)
package org.jsoup.examples;

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class parse2 {

    static parse2 parseIt2 = new parse2();

    String companyName = "Platzhalter";
    String jobTitle = "Platzhalter";
    String location = "Platzhalter";
    String timeAdded = "Platzhalter";

    public static void main(String[] args) throws IOException {
        parseIt2.getData();
    }

    public void getData() throws IOException {
        Document document = Jsoup.parse(
                new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
        Elements elements = document.select(".joblisting");
        for (Element element : elements) {
            // Parse data into elements
            Elements jobTitleElement = element.select(".job_title span");
            Elements companyNameElement = element.select(".company_name span[itemprop=name]");
            Elements locationElement = element.select(".locality span[itemprop=addressLocality]");
            Elements dateElement = element.select(".job_date_added [datetime]");
            // Strip data from unnecessary tags
            String companyName = companyNameElement.text();
            String jobTitle = jobTitleElement.text();
            String location = locationElement.text();
            String timeAdded = dateElement.attr("datetime");
            System.out.println("Firma:\t" + companyName + "\t" + jobTitle + "\t in:\t"
                    + location + " \t Erstellt am \t" + timeAdded);
        }
    }
}
Now I want to do the process end-to-end in Talend, and I was assured this works. I tried this (which looks quite shady to me):
Basically I put all the imports in the "advanced settings" section and the code in the "basic settings" section. The importLibrary call is supposed to load the jsoup parsing library as well as the MySQL connector (though I might do the connection with Talend components instead).
Obviously this isn't working. I tried to strip the base code of classes and such, and it was even worse. Can you help me get the generated .txt files parsed with Java here?
EDIT: Here is the Link to the talend Job http://www.share-online.biz/dl/8M5MD99NR1
EDIT2: I changed the code to the one I tried in tJavaFlex, but it didn't work (the imports went in the "start" part of the code, the rest in "body/main", and nothing in "end").
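For what it's worth, a tJavaFlex layout matching the description above might look like the following sketch (an assumption on my part, not a tested Job: it presumes the jsoup jar has been made available to the Job, e.g. via a tLibraryLoad component; the comments name the tJavaFlex sections, and the imports go into the "Import" tab under advanced settings):

```java
// Advanced settings > Import tab:
//   import org.jsoup.nodes.Document;
//   import org.jsoup.nodes.Element;
//   import org.jsoup.select.Elements;

// Start code (runs once, before the component's loop):
Document document = org.jsoup.Jsoup.parse(
        new java.io.File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
Elements elements = document.select(".joblisting");

// Main code (the per-iteration section; here the whole loop over
// the listings can live in one pass):
for (Element element : elements) {
    String companyName = element.select(".company_name span[itemprop=name]").text();
    String jobTitle = element.select(".job_title span").text();
    String location = element.select(".locality span[itemprop=addressLocality]").text();
    String timeAdded = element.select(".job_date_added [datetime]").attr("datetime");
    System.out.println(companyName + "\t" + jobTitle + "\t" + location + "\t" + timeAdded);
}

// End code: nothing needed here
```

Note that the body contains no class or method declarations: tJavaFlex wraps these sections into the generated Job class itself.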
In Talend, you can load text file data into a database table in two ways: drag and drop a tFileInputDelimited component and browse to the text file, then create a schema (column names) for it; or create metadata for the text file and use that File Delimited metadata.
Encoding: by default, Talend selects a suitable encoding, but you can use the drop-down button to pick a different one. Field Separator: choose the character that separates the columns in your text file. If the desired separator is not available in the options, select Custom and use the Corresponding Character field to specify it.
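To illustrate what the field separator setting does, here is a plain-Java sketch of the split tFileInputDelimited performs on each row (the semicolon separator and the sample columns are assumptions chosen for the example, not taken from the question's files):

```java
import java.util.regex.Pattern;

public class DelimitedRowSketch {
    // Split one row on the given separator; the -1 limit keeps
    // trailing empty fields, so empty columns are preserved.
    static String[] splitRow(String row, String separator) {
        return row.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        String row = "ACME;SOA Architect;Berlin;2014-05-01";
        for (String field : splitRow(row, ";")) {
            System.out.println(field);
        }
    }
}
```

Each field then maps onto one column of the schema you defined for the component.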
Thanks for sharing your experience of using the tMap component with us (indeed, Talend generates Java code). In addition, for the user routine mentioned by @boulayj, please refer to How+to+create+user+routines and Calling+a+routine+from+a+Job
First, drag and drop EmpInfo from the File Delimited folder into the Talend Job design. In the screenshot below, you can see that the file component properties use the Repository values. Next, drag and drop tDBConnection, tDBCommit, and tDBOutput from the Palette into the Job design workspace. Here you could also use tDBOutput on its own.
This is a problem related to how Talend generates code: in your code, use fully qualified names, including their packages. For your document parsing, for example, you can use:
Document document = org.jsoup.Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
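Extending that idea, the whole parsing step can be written with fully qualified names so no import declarations are needed in the component body (a sketch, assuming the jsoup jar is on the Job's classpath; the file path and selectors are the ones from the question):

```java
org.jsoup.nodes.Document document = org.jsoup.Jsoup.parse(
        new java.io.File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"), "utf-8");
for (org.jsoup.nodes.Element element : document.select(".joblisting")) {
    // .attr("datetime") reads the attribute value rather than the tag text
    String timeAdded = element.select(".job_date_added [datetime]").attr("datetime");
    System.out.println("Erstellt am: " + timeAdded);
}
```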