Python - How To Stream Large (11 Gb) Json File To Be Broken Up
Solution 1:
jq 1.5 and later have a streaming parser (documented at http://stedolan.github.io/jq/manual/#Streaming). In one sense it's easy to use: if your 1G file is named 1G.json, the following command produces a stream of lines, including one line per "leaf" value:
jq -c --stream . 1G.json
(Sample output is shown further down. Notice that each line is itself valid JSON.)
Using the streamed output takes more work, though how much depends on what you want to do with it :-)
The key to understanding the streamed output is that most lines have the form:
[ PATH, VALUE ]
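where "PATH" is an array representation of the path to a leaf value. (When using jq, this array can in fact be used as a path, e.g. with getpath.) The remaining lines have the form [ PATH ], with no value; they mark the point where an array or object closes. Sample output: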
[["actor","classification",0],"suggested"]
[["actor","classification",0]]
[["actor","displayName"],"myself"]
[["actor","followersCount"],0]
[["actor","followingCount"],0]
[["actor","followingStocksCount"],0]
[["actor","id"],"person:stocktwits:183087"]
[["actor","image"],"http://avatars.stocktwits.com/production/183087/thumb-1350332393.png"]
[["actor","link"],"http://stocktwits.com/myselfbtc"]
[["actor","links",0,"href"],null]
[["actor","links",0,"rel"],"me"]
[["actor","links",0,"rel"]]
[["actor","links",0]]
[["actor","objectType"],"person"]
[["actor","preferredUsername"],"myselfbtc"]
[["actor","statusesCount"],2]
[["actor","summary"],null]
[["actor","tradingStrategy","approach"],"Technical"]
[["actor","tradingStrategy","assetsFrequentlyTraded",0],"Forex"]
[["actor","tradingStrategy","assetsFrequentlyTraded",0]]
[["actor","tradingStrategy","experience"],"Novice"]
[["actor","tradingStrategy","holdingPeriod"],"Day Trader"]
[["actor","tradingStrategy","holdingPeriod"]]
[["actor","tradingStrategy"]]
[["body"],"$BCOIN and macd is going down ..... http://stks.co/iDEB"]
[["entities","chart","fullImage","link"],"http://charts.stocktwits.com/production/original_10047145.png"]
[["entities","chart","fullImage","link"]]
[["entities","chart","image","link"],"http://charts.stocktwits.com/production/small_10047145.png"]
[["entities","chart","image","link"]]
[["entities","chart","link"],"http://stks.co/iDEB"]
[["entities","chart","objectType"],"image"]
[["entities","chart","objectType"]]
[["entities","sentiment","basic"],"Bearish"]
[["entities","sentiment","basic"]]
[["entities","stocks",0,"displayName"],"Bitcoin"]
[["entities","stocks",0,"exchange"],"PRIVATE"]
[["entities","stocks",0,"industry"],null]
[["entities","stocks",0,"sector"],null]
[["entities","stocks",0,"stocktwits_id"],9659]
[["entities","stocks",0,"symbol"],"BCOIN"]
[["entities","stocks",0,"symbol"]]
[["entities","stocks",0]]
[["entities","video"],null]
[["entities","video"]]
[["gnip","language","value"],"en"]
[["gnip","language","value"]]
[["gnip","language"]]
[["id"],"tag:gnip.stocktwits.com:2012:note/10047145"]
[["inReplyTo","id"],"tag:gnip.stocktwits.com:2012:note/10046953"]
[["inReplyTo","objectType"],"comment"]
[["inReplyTo","objectType"]]
[["link"],"http://stocktwits.com/myselfbtc/message/10047145"]
[["object","id"],"note:stocktwits:10047145"]
[["object","link"],"http://stocktwits.com/myselfbtc/message/10047145"]
[["object","objectType"],"note"]
[["object","postedTime"],"2012-10-17T19:13:50Z"]
[["object","summary"],"$BCOIN and macd is going down ..... http://stks.co/iDEB"]
[["object","updatedTime"],"2012-10-17T19:13:50Z"]
[["object","updatedTime"]]
[["provider","displayName"],"StockTwits"]
[["provider","link"],"http://stocktwits.com"]
[["provider","link"]]
[["verb"],"post"]
[["verb"]]
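Since the question is about Python, here is a minimal sketch of how the streamed lines could be consumed there. It rebuilds each top-level value from the [ PATH, VALUE ] lines and emits it when it sees a closing [ PATH ] line whose path has length 1. The function names (set_path, objects_from_stream) are made up for this illustration, and the sketch assumes the input is exactly what `jq -c --stream .` prints, one event per line.

```python
import json

def set_path(root, path, value):
    """Store value at path inside root, creating containers on the way down."""
    node = root
    for key, next_key in zip(path, path[1:]):
        # An integer next key means the child is an array, otherwise an object.
        child = [] if isinstance(next_key, int) else {}
        if isinstance(node, list):
            if key == len(node):
                node.append(child)
        elif key not in node:
            node[key] = child
        node = node[key]
    last = path[-1]
    if isinstance(node, list) and last == len(node):
        node.append(value)
    else:
        node[last] = value

def objects_from_stream(lines):
    """Yield one rebuilt top-level value per top-level closing event."""
    root = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        path = event[0]
        if len(event) == 2:
            if not path:
                # Top-level scalar or empty container: no closing event follows.
                yield event[1]
                continue
            if root is None:
                root = [] if isinstance(path[0], int) else {}
            set_path(root, path, event[1])
        elif len(path) == 1:
            # A closing event with a one-element path ends a top-level value.
            yield root
            root = None
```

Only one top-level value is held in memory at a time. Passing `sys.stdin` as `lines` and piping the jq command above into the script would be one way to feed it.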
Solution 2:
I think you need a streaming parser, i.e. one that reads the file incrementally instead of loading all of it into memory. ijson may work:
https://changelog.com/ijson-parse-streams-of-json-in-python/
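A rough sketch of how this could look, assuming ijson 3.1 or later is installed. The helper names (iter_records, write_chunks), the output file pattern and the chunk size are made up for illustration. The layout of the 11 GB file is not stated in the question, so the defaults below assume a sequence of top-level JSON objects; for a single top-level array, prefix="item" with multiple_values=False would be the usual choice.

```python
import json
from itertools import islice

def iter_records(path, prefix="", multiple_values=True):
    """Lazily yield records from a large JSON file without loading it whole."""
    # Imported lazily so write_chunks below can be used without ijson installed.
    import ijson  # third-party: pip install ijson
    with open(path, "rb") as f:
        # use_float=True returns plain floats rather than Decimal,
        # which json.dumps cannot serialize.
        yield from ijson.items(
            f, prefix, multiple_values=multiple_values, use_float=True
        )

def write_chunks(records, out_pattern, chunk_size):
    """Write records to JSON Lines files of at most chunk_size records each.

    out_pattern is a format string that receives the chunk index.
    Returns the list of file paths written.
    """
    it = iter(records)
    paths = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        out_path = out_pattern.format(len(paths))
        with open(out_path, "w", encoding="utf-8") as out:
            for rec in chunk:
                out.write(json.dumps(rec) + "\n")
        paths.append(out_path)
    return paths
```

Something like `write_chunks(iter_records("big.json"), "part-{:05d}.jsonl", 100000)` would then break the file into pieces of 100000 records each, holding only one chunk in memory at a time ("big.json" is a placeholder name).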