Parsing a CSV in Java where the text qualifier is applied only when the content contains a comma

I have a CSV file with the following content:

1,"hello, there",I have a csv in which,"only when ""double quote"" or comma are there in the content",it will be wrapped in the double quotes,otherwise not,something like 1/2" will not be wrapped up in double quotes. 

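I tried parsing it with OpenCSV and other CSV libraries, but none of them worked.

I also tried the regular expression quoted in a StackOverflow question, but that did not work either.

However, the file opens correctly in Excel. Can someone give me a hint on how to parse this CSV file?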

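Note that content is enclosed in the text qualifier only when it contains a comma. When such content is enclosed in double quotes and a double quote is itself part of the content, that double quote is escaped with another double quote; in other words, it becomes a doubled double quote. But if the content contains a double quote and no comma, it is not enclosed in the text qualifier at all.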
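The rules above can be sketched as a small hand-written splitter. This is only an illustration under stated assumptions, not a replacement for a CSV library: `LenientCsvSplitter` and `splitLine` are hypothetical names, a field is treated as qualified only when its very first character is a double quote, and records are assumed never to span multiple lines.

```java
import java.util.ArrayList;
import java.util.List;

class LenientCsvSplitter {

    // Hypothetical helper (not a library API): splits one line into fields.
    // Assumption: a field is qualified only if it STARTS with a double quote;
    // a quote anywhere else (e.g. 1/2") is ordinary content.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            field.setLength(0);
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening qualifier
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote becomes one literal quote
                        i += 2;
                    } else if (c == '"') {
                        i++; // closing qualifier
                        break;
                    } else {
                        field.append(c);
                        i++;
                    }
                }
            }
            // Unqualified content (or stray text after a closing qualifier)
            // runs up to the next comma or the end of the line.
            while (i < n && line.charAt(i) != ',') {
                field.append(line.charAt(i));
                i++;
            }
            fields.add(field.toString());
            i++; // step over the comma (or past the end of the line)
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"hello, there\",plain text,\"say \"\"hi\"\"\",1/2\" drill";
        for (String f : splitLine(line)) {
            System.out.println(f);
        }
    }
}
```

Empty fields come back as empty strings, including a trailing one after a final comma. A field that begins with a quote but is never closed would swallow the rest of the line, so this sketch only suits data that follows the rules described above.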

Please advise.

The parsed output for the content above should look like this:

    1
    hello, there
    I have a csv in which
    only when "double quote" or comma are there in the content
    it will be wrapped in the double quotes
    otherwise not
    something like 1/2" will not be wrapped up in double quotes.

I tried OpenCSV, and I also tried splitting with a regular expression:

 ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)" 

But neither worked.
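A likely reason the split misbehaves on this data: the lookahead accepts a comma as a delimiter only when an even number of double quotes remains to its right, so it assumes quotes always come in pairs. A single unpaired quote such as the one in `1/2"` shifts that count for every comma before it. The sketch below reproduces the effect on a shortened, made-up line; `RegexSplitDemo` and its `split` method are hypothetical names.

```java
class RegexSplitDemo {

    // The pattern from the question: split on a comma only if the rest of
    // the line contains an even number of double quotes.
    static final String SPLIT_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String line) {
        return line.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // Three intended fields: a | b, c | 1/2" drill
        String line = "a,\"b, c\",1/2\" drill";
        for (String part : split(line)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

On that line the split returns two pieces instead of three: the real delimiters are skipped because an odd number of quotes follows them, while the comma inside the quoted field is treated as a delimiter. With balanced quotes (for example `a,"b, c",d`) the same pattern splits as intended.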

My data looks like this:

 PRODUCT,,1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL, 

I would like it to be parsed as follows (I use <BLANK> to denote an empty cell, as it appears when the file is opened in Excel):

    PRODUCT
    <BLANK>
    1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE
    P
    2510906459
    <BLANK>
    DEWALT TOOLS
    <BLANK>
    <BLANK>
    <br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL

I parsed your input with the uniVocity parser without any problem:

    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.Reader;
    import java.io.StringReader;

    String input = "PRODUCT,,1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,<br><img src=\"http://example.com/image.png\"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL,";
    Reader reader = new StringReader(input);

    CsvParserSettings settings = new CsvParserSettings(); // many options here, check the tutorial
    settings.setNullValue("<BLANK>"); // use that to obtain <BLANK> to represent nulls

    // Parse everything and take the first (and only) row.
    String[] row = new CsvParser(settings).parseAll(reader).get(0);
    for (String element : row) {
        System.out.println(element);
    }

Output:

    PRODUCT
    <BLANK>
    1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE
    P
    2510906459
    <BLANK>
    DEWALT TOOLS
    <BLANK>
    <BLANK>
    <br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL
    <BLANK>

Disclaimer: I am the author of this library. It is open source and free (Apache 2.0 license).

Try the following regular expression:

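    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.Stream;

    // Files.lines throws IOException, so the enclosing method must declare or handle it.
    Stream<String> lines = Files.lines(Paths.get("path to csv file"));

    // First alternative: a quoted field; group 1 is the text between the quotes.
    // Second alternative: an unquoted field running up to the next comma or end of line.
    Pattern regex = Pattern.compile(
            "\"(.*?)\"(?=,|$)|(?<=(?:,|^))(.*?)(?=,|$)",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    lines.forEach(line -> {
        Matcher matcher = regex.matcher(line);
        while (matcher.find()) {
            // Prefer the quoted capture; otherwise take the whole match.
            String content = matcher.group(1) == null ? matcher.group() : matcher.group(1);
            System.out.println(content);
        }
    });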

For the sample input text

 1,"hello, there",I have a csv in which, "only when ""double quote"" or comma are there in the content", it will be wrapped in the double quotes,otherwise not, something like 1/2" will not be wrapped up in double quotes. 

it produces:

 1 hello, there I have a csv in which only when ""double quote"" or comma are there in the content it will be wrapped in the double quotes otherwise not something like 1/2" will not be wrapped up in double quotes.
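The output above still shows the escaped form `""double quote""`, whereas the question asks for `"double quote"`. One possible follow-up step, sketched here with the hypothetical names `QuoteCollapse` and `collapseDoubledQuotes`, is to collapse each doubled quote in content that was captured from a quoted field. It assumes the helper is applied only to such content, since an unquoted field may legitimately contain two adjacent quotes.

```java
class QuoteCollapse {

    // Hypothetical helper: turns each escaped pair "" back into a single ".
    // Intended for text captured from inside a quoted field only.
    static String collapseDoubledQuotes(String quotedContent) {
        return quotedContent.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String captured = "only when \"\"double quote\"\" or comma are there in the content";
        System.out.println(collapseDoubledQuotes(captured));
    }
}
```

In the loop from the answer, this would mean calling the helper when `matcher.group(1)` is not null and leaving `matcher.group()` untouched otherwise.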