We had assumed that only Python has good NLP support, so today we explored what Java offers for NLP, starting with data extraction. For data extraction we found: jsoup (for reading web pages), bliki (for reading Wikipedia-like documents), PDFBox (for extracting data from PDF files), opencsv (for CSV files), and Jackson (which we have used already). Will put them in a podcast so we can go over them whenever needed.