How do I read and write non-English characters (such as Marathi, Tamil, or Hindi) in Java?

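I am reading non-English text (say, Marathi) from an Excel file and then writing it to an XML file. When I read the Marathi text from Excel and inspect it in my Java code, it is displayed correctly. However, after writing it to XML from the same Java code, the file contains unrelated symbols in place of the Marathi text. Please suggest how to handle this. The code is below.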

    public void excelToXML(String path) {
        FileWriter fostream;
        PrintWriter out = null;
        String strOutputPath = "C:\\Temp\\";
        try {
            File file = new File(path);
            InputStream inputStream = new FileInputStream(file);
            Workbook wb = WorkbookFactory.create(inputStream);
            List<String> sheetNames = new ArrayList<String>();
            for (int i = 0; i < wb.getNumberOfSheets(); i++) {
                sheetNames.add(wb.getSheetName(i));
            }
            fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");
            out = new PrintWriter(new BufferedWriter(fostream));
            // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
            out.println("<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">");
            for (String sheetName : sheetNames) {
                if (sheetName.equals("Sheet3")) {
                    System.out.println(sheetName);
                    break;
                }
                Sheet sheet = wb.getSheet(sheetName);
                boolean firstRow = true;
                ArrayList<String> myStringArray = new ArrayList<String>();
                Iterator<Cell> cells = sheet.getRow(0).cellIterator();
                while (cells.hasNext()) {
                    myStringArray.add(cells.next().toString());
                }
                for (Row row : sheet) {
                    if (firstRow == true) {
                        firstRow = false;
                        continue;
                    }
                    if (!sheetName.equals("Sheet1")) {
                        out.println("\t<element>");
                    }
                    for (int i = 0; i < myStringArray.size(); i++) {
                        if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                                && row.getCell(i).toString().length() > 0) {
                            if (!(myStringArray.get(i) != null
                                    && myStringArray.get(i).toString().equals("Start_Epoch_Time")
                                    || myStringArray.get(i).toString().equals("End_Epoch_Time"))) {
                                out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                            } else {
                                long ePochValue = EpochConverter.getepochValue(row.getCell(i).toString());
                                out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                            }
                        } else {
                            blankValues.add(sheetName + ":" + "column header" + ":" + myStringArray.get(i)
                                    + ":" + "row no:" + row.getRowNum() + " " + "is blank.");
                        }
                    }
                    if (!sheetName.equals("Sheet1")) {
                        out.println("\t</element>");
                    }
                }
            }
            out.write("</root>");
            out.flush();
            out.close();
            if (blankValues != null && blankValues.size() > 0) {
                FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
            }
        } catch (Exception e) {
            new DTHException(e.getMessage());
            e.printStackTrace();
        }
    }

    private static String formatCell(Cell cell) {
        if (cell == null) {
            return "";
        }
        switch (cell.getCellType()) {
            case Cell.CELL_TYPE_BLANK:
                return "";
            case Cell.CELL_TYPE_BOOLEAN:
                return Boolean.toString(cell.getBooleanCellValue());
            case Cell.CELL_TYPE_ERROR:
                return "*error*";
            case Cell.CELL_TYPE_NUMERIC:
                return df.format(cell.getNumericCellValue());
            case Cell.CELL_TYPE_STRING:
                return cell.getStringCellValue();
            default:
                return "<unknown value>";
        }
    }

    private static String formatElement(String prefix, String tag, String value) {
        StringBuilder sb = new StringBuilder(prefix);
        sb.append("<");
        sb.append(tag);
        if (value != null && value.length() > 0) {
            sb.append(">");
            sb.append(value);
            sb.append("</");
            sb.append(tag);
            sb.append(">");
        } else {
            sb.append("/>");
        }
        return sb.toString();
    }

In the line below, when I inspect the value of row.getCell(i), I see the exact Marathi text, but the output after writing that value is different.

    out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));

Your code has two major problems.

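1) You are evidently on Windows (the path is C:\\Temp), and, as Axel Richter already pointed out in the comments, you are writing the output file with the default encoding. Creating a FileWriter directly from a file name gives you the platform's default encoding, which on Windows is a Windows ANSI code page. That is not what you want, because the XML declaration you write afterwards states that the file is encoded in UTF-8.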

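You should not rely on the platform's default encoding. Create the PrintWriter with an explicit encoding, through an OutputStreamWriter wrapped around a FileOutputStream, like this: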

    PrintWriter writer = new PrintWriter(
            new BufferedWriter(
                    new OutputStreamWriter(
                            new FileOutputStream("iTicker.xml"), StandardCharsets.UTF_8)));
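To make the effect of the encoding choice concrete, here is a minimal sketch that encodes a Marathi word and decodes it again with the same charset, once with a single-byte charset and once with UTF-8. The class and method names are made up for illustration, and ISO-8859-1 is only a stand-in for a legacy Windows code page (an assumption, chosen because every Java runtime is guaranteed to provide it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // The Marathi word for "Marathi", written with Unicode escapes so the
    // example does not depend on the encoding of this source file.
    static final String MARATHI = "\u092E\u0930\u093E\u0920\u0940";

    // Encodes the text and decodes the bytes with the same charset, which is
    // what effectively happens when a file is written and later read back.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // ISO-8859-1 stands in for a single-byte legacy code page (assumption).
        String legacy = roundTrip(MARATHI, StandardCharsets.ISO_8859_1);
        String utf8 = roundTrip(MARATHI, StandardCharsets.UTF_8);

        System.out.println("legacy charset keeps the text: " + MARATHI.equals(legacy));
        System.out.println("UTF-8 keeps the text:          " + MARATHI.equals(utf8));
        // Shows what FileWriter(String) would use on this machine.
        System.out.println("platform default charset:      " + Charset.defaultCharset());
    }
}
```

A single-byte charset has no mapping for Devanagari, so each character is replaced during encoding and the original text cannot be recovered, whereas UTF-8 can represent it losslessly.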

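2) Writing XML by hand, as you do, is bad practice. If you do it anyway, you have to take care of special characters such as "<", ">" and "&". It is always advisable to use a library that escapes them automatically. The Java standard library includes, for example, implementations of the XMLStreamWriter interface.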
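To see why the special characters matter when XML is concatenated by hand, the following sketch builds an element from a raw value and from an escaped value, and asks the JDK's parser whether each result is well-formed. The helper names escapeText, element, and isWellFormed are hypothetical and not part of the question's code; the escaping covers element text only, not attribute values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ManualEscapeDemo {

    // Hypothetical helper: escapes the characters that are unsafe in element text.
    // '&' is replaced first so the entities added afterwards are not escaped again.
    static String escapeText(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Builds <tag>value</tag> by plain string concatenation.
    static String element(String tag, String value) {
        return "<" + tag + ">" + value + "</" + tag + ">";
    }

    // Returns true if the JDK's XML parser accepts the string as a document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "Tom & Jerry <live>";
        System.out.println("raw value well-formed:     " + isWellFormed(element("title", value)));
        System.out.println("escaped value well-formed: " + isWellFormed(element("title", escapeText(value))));
    }
}
```

A single cell containing "&" or "<" is enough to make the whole output file unreadable for an XML parser, which is the main argument for leaving the escaping to a library.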

Here is an example that is easy to use:

    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamWriter;

    public class WriteXml {

        public static void main(String[] args) {
            File outFile = new File("iTicker.xml");
            // Output stream for the XML document; try-with-resources closes it automatically.
            try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile))) {
                // Passing the encoding explicitly keeps the bytes consistent with the XML declaration.
                XMLStreamWriter xmlWriter = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
                xmlWriter.writeStartDocument("UTF-8", "1.0");
                xmlWriter.writeCharacters("\n");
                xmlWriter.writeStartElement("root");
                xmlWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
                xmlWriter.writeCharacters("\n ");
                xmlWriter.writeStartElement("element");
                // Some special characters and (I hope) some Marathi letters
                xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला");
                xmlWriter.writeEndElement(); // element
                xmlWriter.writeCharacters("\n");
                xmlWriter.writeEndElement(); // root
                xmlWriter.writeEndDocument();
                xmlWriter.close(); // flushes the writer; it does not close the underlying stream
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

This creates the following XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
    </root>
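The same approach could be applied to the header-and-row loop from the question. The sketch below is only an illustration under some assumptions: the class RowsToXml and the helpers writeRow and toXml are hypothetical, the Excel cells are represented as plain strings so that no Apache POI dependency is needed, and the output goes to an in-memory stream instead of a file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class RowsToXml {

    // Hypothetical helper: writes one <element> whose children are named after the headers.
    static void writeRow(XMLStreamWriter xml, List<String> headers, List<String> values)
            throws XMLStreamException {
        xml.writeStartElement("element");
        for (int i = 0; i < headers.size(); i++) {
            xml.writeStartElement(headers.get(i));
            xml.writeCharacters(values.get(i)); // special characters are escaped by the writer
            xml.writeEndElement();
        }
        xml.writeEndElement();
    }

    // Serializes all rows into a UTF-8 encoded XML document held in memory.
    static byte[] toXml(List<String> headers, List<List<String>> rows) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("root");
        for (List<String> row : rows) {
            writeRow(xml, headers, row);
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws XMLStreamException {
        List<String> headers = Arrays.asList("Title", "Language");
        // The Marathi word for "Marathi", as Unicode escapes (sample data, not from the question).
        String marathi = "\u092E\u0930\u093E\u0920\u0940";
        List<List<String>> rows = Arrays.asList(Arrays.asList("News & Sports", marathi));

        byte[] xml = toXml(headers, rows);
        // Decode with the same charset that was used for writing.
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

Note that the column headers are used as element names here, as in the question, so a header containing spaces or other characters that are not allowed in XML names would still need to be cleaned up first.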