Parsing a CSV in Java where the text qualifier is applied only when the content contains a comma
I have a CSV file with the following content:
1,"hello, there",I have a csv in which,"only when ""double quote"" or comma are there in the content",it will be wrapped in the double quotes,otherwise not,something like 1/2" will not be wrapped up in double quotes.
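I tried parsing it with OpenCSV and other CSV libraries, but none of them worked.
I also tried the regular expression quoted in a StackOverflow question, but that did not work either.
However, the file opens correctly in Excel. Can someone give me a hint on how to parse this CSV file?
Note that a field is enclosed in the text qualifier only when its content contains a comma. When such a field is enclosed in double quotes and a double quote is part of the content, that quote is escaped with another double quote; in other words, it becomes a doubled double quote. But if the content contains a double quote and no comma, the field is not enclosed in the text qualifier.
Please advise.
The parsed output for the content above should look like this: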
1 hello, there I have a csv in which only when "double quote" or comma are there in the content it will be wrapped in the double quotes otherwise not something like 1/2" will not be wrapped up in double quotes.
I tried OpenCSV, and I also tried splitting with this regular expression:
",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
Neither worked.
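A minimal reproduction of why the lookahead split breaks, assuming the cause is the unbalanced quote in values such as 1/2": the pattern splits on a comma only when an even number of double quotes follows it, so a single stray quote flips that parity for every comma before it. The class name and the shortened sample line are made up for illustration.

```java
public class RegexSplitDemo {
    // The lookahead regex from the question: split on a comma only if an even
    // number of double quotes follows it up to the end of the input.
    static final String SPLIT_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String line) {
        // limit -1 keeps trailing empty fields
        return line.split(SPLIT_REGEX, -1);
    }

    public static void main(String[] args) {
        // Hypothetical shortened sample: one quoted field plus a stray inch mark.
        String line = "a,\"b,c\",1/2\" drill";
        for (String part : split(line)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

On this sample the split yields two pieces, `a,"b` and `c",1/2" drill`, rather than the three fields a,  b,c  and  1/2" drill: the only comma with an even quote count after it is the one inside the quoted field.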
My data looks like this:
PRODUCT,,1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL,
I would like it to be parsed as follows (I use <BLANK> to denote a cell that appears empty when the file is opened in Excel):
PRODUCT <BLANK> 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE P 2510906459 <BLANK> DEWALT TOOLS <BLANK> <BLANK> <br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL
I parsed your input with the uniVocity parser without any problems:
import java.io.Reader;
import java.io.StringReader;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

String input = "PRODUCT,,1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVE,P,2510906459,,DEWALT TOOLS,,,<br><img src=\"http://example.com/image.png\"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2\" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL,";
Reader reader = new StringReader(input);

CsvParserSettings settings = new CsvParserSettings(); // many options here, check the tutorial
settings.setNullValue("<BLANK>"); // use this to get <BLANK> in place of nulls

// parse everything and take the first (and only) row
String[] row = new CsvParser(settings).parseAll(reader).get(0);
for (String element : row) {
    System.out.println(element);
}
Output:
PRODUCT
<BLANK>
1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVE
P
2510906459
<BLANK>
DEWALT TOOLS
<BLANK>
<BLANK>
<br><img src="http://example.com/image.png"><br><br><p><b>UNIT OF MEASURE: EA<br><br> QTY PER UNIT OF MEASURE: 1<br><br> MINIMUM ORDER QUANTITY: 1<br></P></b>DEWALT TOOLS DCD960KL - 1/2" 18V CORDLESS XRP LI-LON DRILL/DRIVER KIT - XRP™ CORDLESS DRILLS - BEST IN CLASS LENGTH FOR IMPROVED BALANCE AND BETTER CONTROL|LED WORKLIGHT PROVIDES INCREASED VISIBILITY IN CONFINED SPACES|PATENTED 3-SPEED ALL-METAL TRANSMISSION MATCHES THE TOOL TO TASK FOR FASTEST APPLICATION SPEED AND IMPROVED - EQUAL TO 115-DCD960KL
<BLANK>
Disclaimer: I am the author of this library. It is open source and free (Apache 2.0 license).
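For readers who would rather not add a dependency, the quoting rule described in the question can also be sketched by hand. This is only an illustration under stated assumptions, not how any particular library works: a field counts as quoted only when its first character is a double quote, a doubled quote inside a quoted field stands for one literal quote, and a quote anywhere else (as in 1/2") is ordinary text. The class and method names are invented, empty fields come back as empty strings rather than <BLANK>, and quoted fields that span several lines are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class LenientCsvLine {
    /** Splits one line; a field is treated as quoted only if it starts with a double quote. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;     // currently inside a quoted field
        boolean fieldStart = true;  // no character of the current field consumed yet
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');  // doubled quote becomes one literal quote
                    i++;
                } else {
                    quoted = false;     // closing qualifier
                }
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
                fieldStart = true;
                continue;               // next character starts a new field
            } else if (c == '"' && fieldStart) {
                quoted = true;          // opening qualifier
            } else {
                field.append(c);        // plain text, including stray quotes
            }
            fieldStart = false;
        }
        fields.add(field.toString());   // last field (empty after a trailing comma)
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"hello, there\",\"say \"\"hi\"\"\",1/2\" drill,,";
        for (String f : parse(line)) {
            System.out.println(f.isEmpty() ? "<BLANK>" : f);
        }
    }
}
```

A single pass with two flags is enough here because the decision "is this quote a qualifier?" depends only on whether it is the first character of a field or sits inside an already opened quoted field.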
Try the following regular expression:
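import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

Pattern regex = Pattern.compile("\"(.*?)\"(?=,|$)|(?<=(?:,|^))(.*?)(?=,|$)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
// Files.lines throws IOException; try-with-resources closes the file when done
try (Stream<String> lines = Files.lines(Paths.get("path to csv file"))) {
    lines.forEach(line -> {
        Matcher matcher = regex.matcher(line);
        while (matcher.find()) {
            // group(1) is the content of a quoted field; otherwise use the whole match
            String content = matcher.group(1) == null ? matcher.group() : matcher.group(1);
            System.out.println(content);
        }
    });
}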
Given the sample input text
1,"hello, there",I have a csv in which, "only when ""double quote"" or comma are there in the content", it will be wrapped in the double quotes,otherwise not, something like 1/2" will not be wrapped up in double quotes.
it prints:
1
hello, there
I have a csv in which
only when ""double quote"" or comma are there in the content
it will be wrapped in the double quotes
otherwise not
something like 1/2" will not be wrapped up in double quotes.
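The regex output still carries the doubled quotes, whereas the output requested in the question has them collapsed to single quotes. A possible follow-up step, sketched here with an invented class name, is to unescape only the fields matched by the quoted alternative (group 1). One caveat of the pattern itself: because the quoted alternative is lazy, a field in which an escaped quote is immediately followed by a comma can be cut short, so treat this as an illustration rather than a complete parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldsUnescaped {
    // Same pattern as in the answer; flags are omitted because each line is matched on its own.
    static final Pattern FIELD =
            Pattern.compile("\"(.*?)\"(?=,|$)|(?<=(?:,|^))(.*?)(?=,|$)");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                // quoted field: collapse each doubled quote into one
                out.add(m.group(1).replace("\"\"", "\""));
            } else {
                // unquoted field: keep as-is, stray quotes included
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "1,\"hello, there\",\"only when \"\"double quote\"\" or comma\",1/2\" drill";
        fields(line).forEach(System.out::println);
    }
}
```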