Regular expression to remove doubled double quotes from a CSV
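I have an Excel sheet with a large amount of data in a single column, stored in the form of Python dictionaries that came from a SQL database. I don't have access to the original database, and because the keys/values are not in the same order on every row of the CSV, I can't import the CSV into SQL with the LOAD DATA LOCAL INFILE command. When I export the Excel sheet to CSV, I get:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"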
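What is the best way to remove the " before and after the curly braces, as well as the extra " around the keys and values?
I also need to leave the integers, which have no quotes around them, alone.
I am then trying to load the result into Python with the json module so that I can print specific keys, but I can't load the records while they have the doubled double quotes. I ultimately need the data saved in a file that looks like this: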
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Any help is much appreciated!
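If the input files look exactly as shown, and are as small as you mention, you can load the whole file into memory, do the replacements, and then save it. IMHO, you don't need a RegEx to do this. The easiest-to-read code is: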
    with open(filename) as f:
        text = f.read()
    # Collapse the doubled quotes, then drop the quotes wrapping each object.
    text = text.replace('""', '"')
    text = text.replace('"{', '{')
    text = text.replace('}"', '}')
    with open(filename, "w") as f:
        f.write(text)
I tested it with the sample input, and it produces:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
which is exactly what you want.
If you like, you can also pack the code into a single expression and write
    # "if" is a reserved word in Python, so the file handles need other names.
    with open(inputFilename) as infile, open(outputFilename, "w") as outfile:
        outfile.write(infile.read().replace('""', '"').replace('"{', '{').replace('}"', '}'))
but I think the first version is clearer, and both do exactly the same thing.
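Since the stated goal is to load the cleaned records with the json module and print specific keys, the follow-up step might look roughly like the sketch below. It assumes one record per line; `clean_line` and `raw_lines` are hypothetical names introduced for illustration, and the sample data is copied from the question.

```python
import json


def clean_line(line):
    """Undo the CSV quoting on one exported line (hypothetical helper)."""
    return line.strip().replace('""', '"').replace('"{', '{').replace('}"', '}')


# Sample lines as exported from Excel, taken from the question.
raw_lines = [
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"',
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"',
]

records = [json.loads(clean_line(line)) for line in raw_lines]
for record in records:
    # age is parsed as an int because it was never quoted in the source
    print(record["first_name"], record["age"])
```

Because each record becomes a dict, the differing key order between rows no longer matters.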
Easy:
text = re.sub(r'"(?!")', '', text)
Given the input file TEST.TXT:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"
the script:
    import re

    with open("TEST.TXT", "r") as f:
        text_in = f.read()
    # Delete every quote that is not immediately followed by another quote.
    text_out = re.sub(r'"(?!")', '', text_in)
    print(text_out)
produces the following output:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
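The pattern `"(?!")` uses a negative lookahead: it matches a quote only when the next character is not another quote, so each `""` pair collapses to a single `"` and the lone quotes around the braces disappear. The sketch below illustrates this on a shortened sample, and also on an assumed edge case that does not appear in the question's data (an empty string value, which Excel exports as four quotes in a row), where this pattern leaves one quote too many. The helper name `strip_lone_quotes` is made up for the example.

```python
import json
import re


def strip_lone_quotes(text):
    # Same pattern as above: delete a quote unless another quote follows it.
    return re.sub(r'"(?!")', '', text)


sample = '"{""first_name"":""John"",""age"":30}"'
cleaned = strip_lone_quotes(sample)
print(cleaned)

# Assumed edge case (not in the question's data): an empty string value.
tricky = '"{""note"":"""",""age"":30}"'
broken = strip_lone_quotes(tricky)
try:
    json.loads(broken)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print(broken, parsed_ok)
```

If empty strings can occur in the data, the chained `replace` calls from the first answer handle that case, because `""""` collapses pairwise to `""`.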
This should do it:
    import re

    # Stream line by line so the whole file never has to sit in memory.
    with open('old.csv') as old, open('new.csv', 'w') as new:
        new.writelines(re.sub(r'"(?!")', '', line) for line in old)
I think you are overthinking the problem. Why not just replace the data?
    l = list()
    with open('foo.txt') as f:
        for line in f:
            l.append(line.replace('""', '"').replace('"{', '{').replace('}"', '}'))
    s = ''.join(l)
    print(s)  # or save it to file
It produces:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Storing the intermediate lines in a list and then calling .join improves performance, as explained in "Good way to append to a string".
You can actually use the csv module together with a regular expression to do this:
    import csv, re

    st = '''\
    "{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
    "{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\
    '''

    data = []
    reader = csv.reader(st, dialect='excel')
    for line in reader:
        data.extend(line)
    # Re-quote every word, then put each {...} object on its own line.
    s = re.sub(r'(\w+)', r'"\1"', ''.join(data))
    s = re.sub(r'({[^}]+})', r'\1\n', s).strip()
    print(s)
which prints:
{"first_name":"John","last_name":"Smith","age":"30"}
{"first_name":"Tim","last_name":"Johnson","age":"34"}
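Note that the output above quotes the ages ("30", "34"), which the question wanted left as bare integers. A doubled quote is simply how the CSV format escapes a literal quote inside a quoted field, so the csv module on its own can undo it when it is given whole lines. The following is a minimal sketch of that idea, assuming one record per line and one column per row; the variable names and the in-memory `io.StringIO` stand-in for the exported file are illustrative only.

```python
import csv
import io
import json

# In-memory stand-in for the exported CSV file (data taken from the question).
exported = (
    '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"\n'
    '"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\n'
)

# csv.reader takes an iterable of lines; the default excel dialect
# un-doubles the quotes and keeps each quoted object as a single field.
rows = [row[0] for row in csv.reader(io.StringIO(exported)) if row]
records = [json.loads(field) for field in rows]

print(rows[0])
print(records[1]["age"])
```

With a real file, the same reader can be built from `open(path, newline='')` instead of the `StringIO` object.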